no code implementations • 1 Apr 2024 • Yechi Ma, Shuoquan Wei, Churun Zhang, Wei Hua, Yanan Li, Shu Kong
Our method builds on a key insight that, compared with 3D detectors, a 2D detector is much easier to train and performs significantly better w.r.t. detection on the 2D image plane.
no code implementations • 11 Mar 2024 • Xiaogang Xu, Shu Kong, Tao Hu, Zhe Liu, Hujun Bao
Pre-trained models with large-scale training data, such as CLIP and Stable Diffusion, have demonstrated remarkable performance in various high-level computer vision tasks such as image understanding and generation from language descriptions.
no code implementations • 29 Jan 2024 • Nahyun Kwon, Qian Lu, Muhammad Hasham Qazi, Joanne Liu, Changhoon Oh, Shu Kong, Jeeeun Kim
In our increasingly diverse society, everyday physical interfaces often present barriers, impacting individuals across various contexts.
no code implementations • 23 Jan 2024 • Shubham Parashar, Zhiqiu Lin, Tian Liu, Xiangjue Dong, Yanan Li, Deva Ramanan, James Caverlee, Shu Kong
We address this by using large language models (LLMs) to count the number of pretraining texts that contain synonyms of these concepts.
1 code implementation • 22 Dec 2023 • Anish Madan, Neehar Peri, Shu Kong, Deva Ramanan
In this work, we propose Foundational FSOD, a new benchmark protocol that evaluates detectors pre-trained on any external datasets and fine-tuned on K-shots per target class.
no code implementations • 18 Dec 2023 • Yechi Ma, Neehar Peri, Shuoquan Wei, Wei Hua, Deva Ramanan, Yanan Li, Shu Kong
Autonomous vehicles (AVs) must accurately detect objects from both common and rare classes for safe navigation, motivating the problem of Long-Tailed 3D Object Detection (LT3D).
1 code implementation • 7 Dec 2023 • Yunhan Zhao, Haoyu Ma, Shu Kong, Charless Fowlkes
We explore this problem by first introducing a new benchmark dataset, consisting of RGB and depth videos, per-frame camera pose, and instance-level annotations in both 2D camera and 3D world coordinates.
1 code implementation • 6 Dec 2023 • Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
Alpha-CLIP not only preserves the visual recognition ability of CLIP but also enables precise control over the emphasis of image contents.
1 code implementation • 30 Oct 2023 • Qianqian Shen, Yunhan Zhao, Nahyun Kwon, Jeeeun Kim, Yanan Li, Shu Kong
Instance detection (InsDet) is a long-standing problem in robotics and computer vision, aiming to detect object instances (predefined by some visual examples) in a cluttered scene.
no code implementations • 15 Oct 2023 • Shubham Parashar, Zhiqiu Lin, Yanan Li, Shu Kong
We find that common names are more likely to be included in CLIP's training set, and prompting them achieves 2$\sim$5 times higher accuracy on benchmarking datasets of fine-grained species recognition.
1 code implementation • NeurIPS 2023 • Meng Wei, Xiaoyu Yue, Wenwei Zhang, Shu Kong, Xihui Liu, Jiangmiao Pang
Secondly, part segmentation introduces an open granularity challenge due to the diverse and often ambiguous definitions of parts in the open world.
1 code implementation • 26 May 2023 • Yuzhu Wang, Lechao Cheng, Manni Duan, Yongheng Wang, Zunlei Feng, Shu Kong
Finally, we propose a rather simple loss term (dubbed ND loss) to simultaneously (1) encourage the student to produce large-\emph{norm} features, and (2) align the \emph{direction} of student features with teacher class-means.
Ranked #1 on Knowledge Distillation on ImageNet
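A minimal numpy sketch of an ND-style loss as described above; the function name, the weighting `lam`, and the shapes are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def nd_loss(student_feats, labels, teacher_class_means, lam=1.0):
    """Sketch: encourage large-norm student features and align their
    direction with the teacher's class-mean directions (assumed names)."""
    norms = np.linalg.norm(student_feats, axis=1)      # per-sample feature norms
    norm_term = -norms.mean()                          # larger norms -> lower loss
    means = teacher_class_means[labels]                # teacher class-mean per sample
    cos = np.sum(student_feats * means, axis=1) / (
        norms * np.linalg.norm(means, axis=1) + 1e-8)
    dir_term = (1.0 - cos).mean()                      # penalize direction mismatch
    return norm_term + lam * dir_term
```

When student features already point at the teacher class-means, only the norm term remains, so the loss simply rewards larger feature norms.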
no code implementations • 25 Nov 2022 • Shubham Gupta, Jeet Kanjani, Mengtian Li, Francesco Ferroni, James Hays, Deva Ramanan, Shu Kong
We focus on the task of far-field 3D detection (Far3Det) of objects beyond a certain distance from an observer, e.g., $>$50 m.
1 code implementation • 16 Nov 2022 • Neehar Peri, Achal Dave, Deva Ramanan, Shu Kong
Moreover, semantic classes are often organized within a hierarchy, e.g., tail classes such as child and construction-worker are arguably subclasses of pedestrian.
no code implementations • 10 Oct 2022 • Zhiqiu Lin, Deepak Pathak, Yu-Xiong Wang, Deva Ramanan, Shu Kong
LECO requires learning classifiers in distinct time periods (TPs); each TP introduces a new ontology of "fine" labels that refines old ontologies of "coarse" labels (e.g., dog breeds that refine the previous ${\tt dog}$).
1 code implementation • 4 May 2022 • Samia Shafique, Bailey Kong, Shu Kong, Charless C. Fowlkes
We develop a method termed ShoeRinsics that learns to predict depth by leveraging a mix of fully supervised synthetic data and unsupervised retail image data.
1 code implementation • CVPR 2022 • Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, Shu Kong
In contrast, weight decay penalizes larger weights more heavily and so learns small balanced weights; the MaxNorm constraint encourages growing small weights within a norm ball but caps all the weights by the radius.
Ranked #9 on Long-tail Learning on CIFAR-100-LT (ρ=10)
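The MaxNorm constraint mentioned above can be sketched as a projection applied after each gradient step; the `radius` hyperparameter and function name are assumptions:

```python
import numpy as np

def maxnorm_project(W, radius=1.0):
    """Project each per-class weight vector (row of W) back into an
    L2 norm ball of the given radius (MaxNorm constraint sketch)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return W * scale  # small weights grow freely; larger ones are capped
```

Unlike weight decay, which shrinks all weights proportionally at every step, this projection only touches rows whose norm exceeds the radius.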
2 code implementations • 7 Apr 2021 • Yi-Ting Chen, Jinghao Shi, Zelin Ye, Christoph Mertz, Deva Ramanan, Shu Kong
Object detection with multimodal inputs can improve many safety-critical systems such as autonomous vehicles (AVs).
1 code implementation • ICCV 2021 • Shu Kong, Deva Ramanan
However, the former generalizes poorly to diverse open test data due to overfitting to the training outliers, which are unlikely to exhaustively span the open world.
no code implementations • 1 Jan 2021 • Shu Kong, Deva Ramanan
Machine-learned safety-critical systems need to be self-aware and reliably know their unknowns in the open world.
1 code implementation • CVPR 2021 • Yunhan Zhao, Shu Kong, Charless Fowlkes
We show that jointly applying the two methods improves depth prediction on images captured under uncommon and even never-before-seen camera poses.
no code implementations • 21 Jun 2020 • Zhiyuan Fang, Shu Kong, Zhe Wang, Charless Fowlkes, Yezhou Yang
The referring attention is our designed mechanism acting as a scoring function for grounding the given queries over frames temporally.
1 code implementation • 11 May 2020 • Linfeng Wang, Shu Kong, Zachary Pincus, Charless Fowlkes
The nematode Caenorhabditis elegans (C. elegans) serves as an important model organism in a wide variety of biological studies.
no code implementations • CVPR 2020 • Yunhan Zhao, Shu Kong, Daeyun Shin, Charless Fowlkes
In this setting, we find that existing domain translation approaches are difficult to train and offer little advantage over simple baselines that use a mix of real and synthetic data.
1 code implementation • CVPR 2019 • Zhiyuan Fang, Shu Kong, Charless Fowlkes, Yezhou Yang
Computer Vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries.
2 code implementations • 2 Apr 2019 • Shu Kong, Charless Fowlkes
We introduce multigrid Predictive Filter Flow (mgPFF), a framework for unsupervised learning on videos.
2 code implementations • 28 Nov 2018 • Shu Kong, Charless Fowlkes
We propose a simple, interpretable framework for solving a wide range of image reconstruction problems such as denoising and deconvolution.
Ranked #18 on Image Super-Resolution on Set14 - 4x upscaling
1 code implementation • 3 May 2018 • Shu Kong, Charless Fowlkes
To achieve parsimonious inference in per-pixel labeling tasks with a limited computational budget, we propose a \emph{Pixel-wise Attentional Gating} unit (\emph{PAG}) that learns to selectively process a subset of spatial locations at each layer of a deep convolutional network.
Ranked #7 on Semantic Segmentation on KITTI Semantic Segmentation
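The selective spatial processing in PAG can be sketched with a hard top-k gate over spatial locations; the hard selection here stands in for the learned gating, and all names and the `keep_frac` parameter are assumptions:

```python
import numpy as np

def pixelwise_attentional_gate(feat, gate_logits, keep_frac=0.5):
    """Sketch of Pixel-wise Attentional Gating: keep only the top
    keep_frac fraction of spatial locations (by gate score) for further
    processing, zeroing the rest.  feat: (H, W, C), gate_logits: (H, W)."""
    h, w, _ = feat.shape
    k = max(1, int(keep_frac * h * w))
    thresh = np.sort(gate_logits.ravel())[::-1][k - 1]  # k-th best score
    mask = (gate_logits >= thresh).astype(feat.dtype)
    return feat * mask[..., None]  # only selected pixels pass through
```

Downstream layers then only need to compute on the surviving locations, which is where the computational savings come from.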
no code implementations • 2 May 2018 • Feng Zhou, Shu Kong, Charless Fowlkes, Tao Chen, Baiying Lei
Specifically, we first mapped facial expressions into dimensional measures, transforming facial expression analysis from a classification problem into a regression one.
no code implementations • 1 May 2018 • Zhiyuan Fang, Shu Kong, Tianshu Yu, Yezhou Yang
Grounding textual phrases in visual content is a meaningful yet challenging problem with various potential applications such as image-text inference or text-driven multimedia interaction.
2 code implementations • CVPR 2018 • Shu Kong, Charless Fowlkes
We introduce a differentiable, end-to-end trainable framework, consisting of two novel components, for solving pixel-level grouping problems such as instance segmentation.
1 code implementation • CVPR 2018 • Shu Kong, Charless Fowlkes
We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth) so that small details are preserved for distant objects while larger receptive fields are used for those nearby.
Ranked #32 on Semantic Segmentation on SUN-RGBD (using extra training data)
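The depth-aware selection of pooling field sizes can be sketched as a per-pixel lookup that is inversely related to depth; the candidate `sizes` and the binning scheme are illustrative assumptions:

```python
import numpy as np

def depth_aware_pool_size(depth_map, sizes=(1, 3, 5, 7)):
    """Sketch of depth-aware gating: pick a pooling field size per pixel
    inversely proportional to scene depth, so distant (small) objects get
    small receptive fields and nearby objects get larger ones."""
    d = depth_map / (depth_map.max() + 1e-8)  # normalize depth to [0, 1]
    # near pixels (small depth) -> large pooling; far pixels -> small pooling
    bins = np.clip(((1.0 - d) * len(sizes)).astype(int), 0, len(sizes) - 1)
    return np.asarray(sizes)[bins]
```

In the paper's actual module the selection is soft and learned; this hard lookup only illustrates the inverse depth-to-scale relationship.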
no code implementations • CVPR 2017 • Shu Kong, Charless Fowlkes
To address the computational demands of high feature dimensionality, we propose to represent the covariance features as a matrix and apply a low-rank bilinear classifier.
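A minimal sketch of scoring with such a low-rank bilinear classifier, assuming a factorization $W = UV^\top$ so the full $d \times d$ weight matrix is never materialized (names and shapes are assumptions):

```python
import numpy as np

def low_rank_bilinear_score(X, U, V):
    """Score a d x d covariance feature X with a bilinear classifier
    score = tr(W^T X), where W = U V^T is rank-r with U, V of shape (d, r).
    Costs O(d^2 r) compute and O(d r) memory instead of O(d^2) for W."""
    return np.trace(U.T @ X @ V)
```

For rank $r \ll d$ this makes the classifier tractable even when the covariance feature dimension $d$ is large.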
2 code implementations • 6 Jun 2016 • Shu Kong, Xiaohui Shen, Zhe Lin, Radomir Mech, Charless Fowlkes
In this work, we propose to learn a deep convolutional neural network to rank photo aesthetics, in which the relative ranking of photo aesthetics is directly modeled in the loss function.
Ranked #7 on Aesthetics Quality Assessment on AVA
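Modeling relative ranking directly in the loss is typically done with a pairwise margin loss; a hedged sketch, where the function name and `margin` value are assumptions:

```python
import numpy as np

def pairwise_rank_loss(score_hi, score_lo, margin=1.0):
    """Sketch of a pairwise margin ranking loss: the network's score for
    the more aesthetic photo in each pair should exceed the less
    aesthetic photo's score by at least `margin`."""
    return np.maximum(0.0, margin - (score_hi - score_lo)).mean()
```

Pairs already separated by the margin contribute zero loss, so training focuses on pairs the network still ranks incorrectly or too closely.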
no code implementations • 3 May 2016 • Shu Kong, Surangi Punyasena, Charless Fowlkes
We propose a robust approach for performing automatic species-level recognition of fossil pollen grains in microscopy images that exploits both global shape and local texture characteristics in a patch-based matching methodology.
1 code implementation • 2 Feb 2014 • Shu Kong, Zhuolin Jiang, Qiang Yang
However, measuring the pairwise distance between RFs for building the similarity graph is a nontrivial problem.
no code implementations • 22 Jan 2014 • Shu Kong, Zhuolin Jiang, Qiang Yang
We now know that mid-level features can greatly enhance the performance of image learning, but how to learn image features automatically, efficiently, and in an unsupervised manner is still an open question.