no code implementations • 1 Apr 2024 • Yechi Ma, Shuoquan Wei, Churun Zhang, Wei Hua, Yanan Li, Shu Kong
Our method builds on a key insight that, compared with 3D detectors, a 2D detector is much easier to train and performs significantly better w.r.t. detection on the 2D image plane.
no code implementations • 11 Mar 2024 • Xiaogang Xu, Shu Kong, Tao Hu, Zhe Liu, Hujun Bao
Pre-trained models with large-scale training data, such as CLIP and Stable Diffusion, have demonstrated remarkable performance in various high-level computer vision tasks such as image understanding and generation from language descriptions.
no code implementations • 29 Jan 2024 • Nahyun Kwon, Qian Lu, Muhammad Hasham Qazi, Joanne Liu, Changhoon Oh, Shu Kong, Jeeeun Kim
In our increasingly diverse society, everyday physical interfaces often present barriers, impacting individuals across various contexts.
no code implementations • 23 Jan 2024 • Shubham Parashar, Zhiqiu Lin, Tian Liu, Xiangjue Dong, Yanan Li, Deva Ramanan, James Caverlee, Shu Kong
We address this by using large language models (LLMs) to count the number of pretraining texts that contain synonyms of these concepts.
1 code implementation • 22 Dec 2023 • Anish Madan, Neehar Peri, Shu Kong, Deva Ramanan
In this work, we propose Foundational FSOD, a new benchmark protocol that evaluates detectors pre-trained on any external datasets and fine-tuned on K-shots per target class.
no code implementations • 18 Dec 2023 • Yechi Ma, Neehar Peri, Shuoquan Wei, Wei Hua, Deva Ramanan, Yanan Li, Shu Kong
Autonomous vehicles (AVs) must accurately detect objects from both common and rare classes for safe navigation, motivating the problem of Long-Tailed 3D Object Detection (LT3D).
1 code implementation • 7 Dec 2023 • Yunhan Zhao, Haoyu Ma, Shu Kong, Charless Fowlkes
We explore this problem by first introducing a new benchmark dataset, consisting of RGB and depth videos, per-frame camera pose, and instance-level annotations in both 2D camera and 3D world coordinates.
1 code implementation • 6 Dec 2023 • Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
Alpha-CLIP not only preserves the visual recognition ability of CLIP but also enables precise control over the emphasis of image contents.
1 code implementation • 30 Oct 2023 • Qianqian Shen, Yunhan Zhao, Nahyun Kwon, Jeeeun Kim, Yanan Li, Shu Kong
Instance detection (InsDet) is a long-standing problem in robotics and computer vision, aiming to detect object instances (predefined by some visual examples) in a cluttered scene.
no code implementations • 15 Oct 2023 • Shubham Parashar, Zhiqiu Lin, Yanan Li, Shu Kong
We find that common names are more likely to be included in CLIP's training set, and prompting them achieves 2$\sim$5 times higher accuracy on benchmarking datasets of fine-grained species recognition.
1 code implementation • NeurIPS 2023 • Meng Wei, Xiaoyu Yue, Wenwei Zhang, Shu Kong, Xihui Liu, Jiangmiao Pang
Secondly, part segmentation introduces an open granularity challenge due to the diverse and often ambiguous definitions of parts in the open world.
1 code implementation • 26 May 2023 • Yuzhu Wang, Lechao Cheng, Manni Duan, Yongheng Wang, Zunlei Feng, Shu Kong
Finally, we propose a rather simple loss term (dubbed ND loss) to simultaneously (1) encourage the student to produce large-\emph{norm} features, and (2) align the \emph{direction} of student features with teacher class-means.
Ranked #1 on Knowledge Distillation on ImageNet
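A minimal numpy sketch of an ND-style loss as described above; the function name, the weighting `lam`, and the shapes are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def nd_loss(student_feats, labels, teacher_class_means, lam=1.0):
    """Sketch: encourage large-norm student features and align their
    direction with the teacher's class-mean directions (assumed names)."""
    norms = np.linalg.norm(student_feats, axis=1)      # per-sample feature norms
    norm_term = -norms.mean()                          # larger norms -> lower loss
    means = teacher_class_means[labels]                # teacher class-mean per sample
    cos = np.sum(student_feats * means, axis=1) / (
        norms * np.linalg.norm(means, axis=1) + 1e-8)
    dir_term = (1.0 - cos).mean()                      # penalize direction mismatch
    return norm_term + lam * dir_term
```

When student features already point at the teacher class-means, only the norm term remains, so the loss simply rewards larger feature norms.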
no code implementations • 25 Nov 2022 • Shubham Gupta, Jeet Kanjani, Mengtian Li, Francesco Ferroni, James Hays, Deva Ramanan, Shu Kong
We focus on the task of far-field 3D detection (Far3Det) of objects beyond a certain distance from an observer, e.g., $>$50 m.
1 code implementation • 16 Nov 2022 • Neehar Peri, Achal Dave, Deva Ramanan, Shu Kong
Moreover, semantic classes are often organized within a hierarchy, e.g., tail classes such as child and construction-worker are arguably subclasses of pedestrian.
no code implementations • 10 Oct 2022 • Zhiqiu Lin, Deepak Pathak, Yu-Xiong Wang, Deva Ramanan, Shu Kong
LECO requires learning classifiers in distinct time periods (TPs); each TP introduces a new ontology of "fine" labels that refines old ontologies of "coarse" labels (e.g., dog breeds that refine the previous ${\tt dog}$).
1 code implementation • 4 May 2022 • Samia Shafique, Bailey Kong, Shu Kong, Charless C. Fowlkes
We develop a method termed ShoeRinsics that learns to predict depth by leveraging a mix of fully supervised synthetic data and unsupervised retail image data.
1 code implementation • CVPR 2022 • Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, Shu Kong
In contrast, weight decay penalizes larger weights more heavily and so learns small balanced weights; the MaxNorm constraint encourages growing small weights within a norm ball but caps all the weights by the radius.
Ranked #9 on Long-tail Learning on CIFAR-100-LT (ρ=10)
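The MaxNorm constraint mentioned above can be sketched as a projection applied after each gradient step; the `radius` hyperparameter and function name are assumptions:

```python
import numpy as np

def maxnorm_project(W, radius=1.0):
    """Project each per-class weight vector (row of W) back into an
    L2 norm ball of the given radius (MaxNorm constraint sketch)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return W * scale  # small weights grow freely; larger ones are capped
```

Unlike weight decay, which shrinks all weights proportionally at every step, this projection only touches rows whose norm exceeds the radius.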
2 code implementations • 7 Apr 2021 • Yi-Ting Chen, Jinghao Shi, Zelin Ye, Christoph Mertz, Deva Ramanan, Shu Kong
Object detection with multimodal inputs can improve many safety-critical systems such as autonomous vehicles (AVs).
1 code implementation • ICCV 2021 • Shu Kong, Deva Ramanan
However, the former generalizes poorly to diverse open test data due to overfitting to the training outliers, which are unlikely to exhaustively span the open world.
no code implementations • 1 Jan 2021 • Shu Kong, Deva Ramanan
Machine-learned safety-critical systems need to be self-aware and reliably know their unknowns in the open world.
1 code implementation • CVPR 2021 • Yunhan Zhao, Shu Kong, Charless Fowlkes
We show that jointly applying the two methods improves depth prediction on images captured under uncommon and even never-before-seen camera poses.
no code implementations • 21 Jun 2020 • Zhiyuan Fang, Shu Kong, Zhe Wang, Charless Fowlkes, Yezhou Yang
The referring attention is our designed mechanism acting as a scoring function for grounding the given queries over frames temporally.
1 code implementation • 11 May 2020 • Linfeng Wang, Shu Kong, Zachary Pincus, Charless Fowlkes
The nematode Caenorhabditis elegans (C. elegans) serves as an important model organism in a wide variety of biological studies.
no code implementations • CVPR 2020 • Yunhan Zhao, Shu Kong, Daeyun Shin, Charless Fowlkes
In this setting, we find that existing domain translation approaches are difficult to train and offer little advantage over simple baselines that use a mix of real and synthetic data.
1 code implementation • CVPR 2019 • Zhiyuan Fang, Shu Kong, Charless Fowlkes, Yezhou Yang
Computer Vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries.
2 code implementations • 2 Apr 2019 • Shu Kong, Charless Fowlkes
We introduce multigrid Predictive Filter Flow (mgPFF), a framework for unsupervised learning on videos.
2 code implementations • 28 Nov 2018 • Shu Kong, Charless Fowlkes
We propose a simple, interpretable framework for solving a wide range of image reconstruction problems such as denoising and deconvolution.
Ranked #18 on Image Super-Resolution on Set14 - 4x upscaling
1 code implementation • 3 May 2018 • Shu Kong, Charless Fowlkes
To achieve parsimonious inference in per-pixel labeling tasks with a limited computational budget, we propose a \emph{Pixel-wise Attentional Gating} unit (\emph{PAG}) that learns to selectively process a subset of spatial locations at each layer of a deep convolutional network.
Ranked #7 on Semantic Segmentation on KITTI Semantic Segmentation
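The selective spatial processing in PAG can be sketched with a hard top-k gate over spatial locations; the hard selection here stands in for the learned gating, and all names and the `keep_frac` parameter are assumptions:

```python
import numpy as np

def pixelwise_attentional_gate(feat, gate_logits, keep_frac=0.5):
    """Sketch of Pixel-wise Attentional Gating: keep only the top
    keep_frac fraction of spatial locations (by gate score) for further
    processing, zeroing the rest.  feat: (H, W, C), gate_logits: (H, W)."""
    h, w, _ = feat.shape
    k = max(1, int(keep_frac * h * w))
    thresh = np.sort(gate_logits.ravel())[::-1][k - 1]  # k-th best score
    mask = (gate_logits >= thresh).astype(feat.dtype)
    return feat * mask[..., None]  # only selected pixels pass through
```

Downstream layers then only need to compute on the surviving locations, which is where the computational savings come from.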
no code implementations • 2 May 2018 • Feng Zhou, Shu Kong, Charless Fowlkes, Tao Chen, Baiying Lei
Specifically, we first mapped facial expressions into dimensional measures, transforming facial expression analysis from a classification problem into a regression one.
no code implementations • 1 May 2018 • Zhiyuan Fang, Shu Kong, Tianshu Yu, Yezhou Yang
Grounding textual phrases in visual content is a meaningful yet challenging problem with various potential applications such as image-text inference or text-driven multimedia interaction.
2 code implementations • CVPR 2018 • Shu Kong, Charless Fowlkes
We introduce a differentiable, end-to-end trainable framework, consisting of two novel components, for solving pixel-level grouping problems such as instance segmentation.
1 code implementation • CVPR 2018 • Shu Kong, Charless Fowlkes
We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth) so that small details are preserved for distant objects while larger receptive fields are used for those nearby.
Ranked #32 on Semantic Segmentation on SUN-RGBD (using extra training data)
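The depth-aware selection of pooling field sizes can be sketched as a per-pixel lookup that is inversely related to depth; the candidate `sizes` and the binning scheme are illustrative assumptions:

```python
import numpy as np

def depth_aware_pool_size(depth_map, sizes=(1, 3, 5, 7)):
    """Sketch of depth-aware gating: pick a pooling field size per pixel
    inversely proportional to scene depth, so distant (small) objects get
    small receptive fields and nearby objects get larger ones."""
    d = depth_map / (depth_map.max() + 1e-8)  # normalize depth to [0, 1]
    # near pixels (small depth) -> large pooling; far pixels -> small pooling
    bins = np.clip(((1.0 - d) * len(sizes)).astype(int), 0, len(sizes) - 1)
    return np.asarray(sizes)[bins]
```

In the paper's actual module the selection is soft and learned; this hard lookup only illustrates the inverse depth-to-scale relationship.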
no code implementations • CVPR 2017 • Shu Kong, Charless Fowlkes
To address the computational demands of high feature dimensionality, we propose to represent the covariance features as a matrix and apply a low-rank bilinear classifier.
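A minimal sketch of scoring with such a low-rank bilinear classifier, assuming a factorization $W = UV^\top$ so the full $d \times d$ weight matrix is never materialized (names and shapes are assumptions):

```python
import numpy as np

def low_rank_bilinear_score(X, U, V):
    """Score a d x d covariance feature X with a bilinear classifier
    score = tr(W^T X), where W = U V^T is rank-r with U, V of shape (d, r).
    Costs O(d^2 r) compute and O(d r) memory instead of O(d^2) for W."""
    return np.trace(U.T @ X @ V)
```

For rank $r \ll d$ this makes the classifier tractable even when the covariance feature dimension $d$ is large.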
2 code implementations • 6 Jun 2016 • Shu Kong, Xiaohui Shen, Zhe Lin, Radomir Mech, Charless Fowlkes
In this work, we propose to learn a deep convolutional neural network to rank photo aesthetics, in which the relative ranking of photo aesthetics is directly modeled in the loss function.
Ranked #7 on Aesthetics Quality Assessment on AVA
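Modeling relative ranking directly in the loss is typically done with a pairwise margin loss; a hedged sketch, where the function name and `margin` value are assumptions:

```python
import numpy as np

def pairwise_rank_loss(score_hi, score_lo, margin=1.0):
    """Sketch of a pairwise margin ranking loss: the network's score for
    the more aesthetic photo in each pair should exceed the less
    aesthetic photo's score by at least `margin`."""
    return np.maximum(0.0, margin - (score_hi - score_lo)).mean()
```

Pairs already separated by the margin contribute zero loss, so training focuses on pairs the network still ranks incorrectly or too closely.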
no code implementations • 3 May 2016 • Shu Kong, Surangi Punyasena, Charless Fowlkes
We propose a robust approach for performing automatic species-level recognition of fossil pollen grains in microscopy images that exploits both global shape and local texture characteristics in a patch-based matching methodology.
1 code implementation • 2 Feb 2014 • Shu Kong, Zhuolin Jiang, Qiang Yang
However, measuring the pairwise distance between RFs for building the similarity graph is a nontrivial problem.
no code implementations • 22 Jan 2014 • Shu Kong, Zhuolin Jiang, Qiang Yang
We now know that mid-level features can greatly enhance the performance of image learning, but how to learn image features automatically, efficiently, and in an unsupervised manner is still an open question.