no code implementations • 14 Dec 2023 • Kai Qiu, Huishuai Zhang, Zhirong Wu, Stephen Lin
However, model robustness, a critical aspect of safety, is often optimized for each specific task rather than at the pretraining stage.
no code implementations • 11 Oct 2023 • Chenguo Lin, Xumeng Wen, Wei Cao, Congrui Huang, Jiang Bian, Stephen Lin, Zhirong Wu
In this work, we make key technical contributions that are tailored to the numerical properties of time-series data and allow the model to scale to large datasets, e.g., millions of temporal sequences.
1 code implementation • 22 Sep 2023 • Yuwei Sun, Hideya Ochiai, Zhirong Wu, Stephen Lin, Ryota Kanai
Existing studies such as the Coordination method employ iterative cross-attention mechanisms with a bottleneck to enable the sparse association of inputs.
1 code implementation • CVPR 2023 • Long Lian, Zhirong Wu, Stella X. Yu
The Gestalt law of common fate, i.e., that things moving at the same speed belong together, has inspired unsupervised object discovery based on motion segmentation.
Ranked #1 on Unsupervised Object Segmentation on FBMS-59
1 code implementation • ICCV 2023 • Huimin Wu, Chenyang Lei, Xiao Sun, Peng-Shuai Wang, Qifeng Chen, Kwang-Ting Cheng, Stephen Lin, Zhirong Wu
Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part.
no code implementations • 17 Dec 2022 • Long Lian, Zhirong Wu, Stella X. Yu
Previous methods in unsupervised video object segmentation (UVOS) have demonstrated the effectiveness of motion as either input or supervision for segmentation.
no code implementations • 21 Nov 2022 • Zhihang Zhong, Mingxi Cheng, Zhirong Wu, Yuhui Yuan, Yinqiang Zheng, Ji Li, Han Hu, Stephen Lin, Yoichi Sato, Imari Sato
Image cropping has progressed tremendously under the data-driven paradigm.
1 code implementation • 20 Jul 2022 • Zhihang Zhong, Xiao Sun, Zhirong Wu, Yinqiang Zheng, Stephen Lin, Imari Sato
Existing solutions to this problem estimate a single image sequence without considering the motion ambiguity for each region.
1 code implementation • 9 Jun 2022 • Zhirong Wu, Zihang Lai, Xiao Sun, Stephen Lin
The paper presents a scalable approach for learning spatially distributed visual representations over individual tokens and a holistic instance representation simultaneously.
1 code implementation • 12 Mar 2022 • Zhihang Zhong, Mingdeng Cao, Xiao Sun, Zhirong Wu, Zhongyi Zhou, Yinqiang Zheng, Stephen Lin, Imari Sato
In this paper, instead of two consecutive frames, we propose to exploit a pair of images captured by dual RS cameras with reversed RS directions for this highly challenging task.
4 code implementations • CVPR 2022 • Yutong Chen, Fangyun Wei, Xiao Sun, Zhirong Wu, Stephen Lin
Concretely, we pretrain the sign-to-gloss visual network on the general domain of human actions and the within-domain of a sign-to-gloss dataset, and pretrain the gloss-to-text translation network on the general domain of a multilingual corpus and the within-domain of a gloss-to-text corpus.
Ranked #3 on Sign Language Translation on CSL-Daily
1 code implementation • CVPR 2022 • Xudong Wang, Zhirong Wu, Long Lian, Stella X. Yu
Our key insight is that pseudo-labels are naturally imbalanced due to intrinsic data similarity, even when a model is trained on balanced source data and evaluated on balanced target data.
Ranked #1 on Few-Shot Image Classification on ImageNet - 0-Shot (using extra training data)
1 code implementation • 22 Nov 2021 • Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin
For human action understanding, a popular research direction is to analyze short video clips with unambiguous semantic content, such as jumping and drinking.
no code implementations • ICCV 2023 • Ceyuan Yang, Yujun Shen, Zhiyi Zhang, Yinghao Xu, Jiapeng Zhu, Zhirong Wu, Bolei Zhou
We then equip the well-learned discriminator backbone with an attribute classifier to ensure that the generator captures the appropriate characters from the reference.
1 code implementation • NeurIPS 2021 • Runtao Liu, Zhirong Wu, Stella X. Yu, Stephen Lin
Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.
Ranked #7 on Unsupervised Object Segmentation on FBMS-59
no code implementations • 29 Sep 2021 • Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin
However, methods for understanding short semantic actions cannot be directly translated to long kinematic sequences such as dancing, where it becomes challenging even to semantically label the human movements.
1 code implementation • NeurIPS 2021 • Fangyun Wei, Yue Gao, Zhirong Wu, Han Hu, Stephen Lin
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
1 code implementation • CVPR 2021 • Ceyuan Yang, Zhirong Wu, Bolei Zhou, Stephen Lin
The pretext task is to predict the instance category given the composited images as well as the foreground bounding boxes.
no code implementations • 1 Jan 2021 • Depu Meng, Zigang Geng, Zhirong Wu, Bin Xiao, Houqiang Li, Jingdong Wang
The proposed consistent instance classification (ConIC) approach simultaneously optimizes the classification loss and an additional consistency loss explicitly penalizing the feature dissimilarity between the augmented views from the same instance.
1 code implementation • 3 Aug 2020 • Peng-Shuai Wang, Yu-Qi Yang, Qian-Fang Zou, Zhirong Wu, Yang Liu, Xin Tong
Although unsupervised feature learning has demonstrated its advantages in reducing the workload of data labeling and network design in many fields, existing unsupervised 3D learning methods still cannot offer a generic network for various shape analysis tasks with performance competitive with supervised methods.
Ranked #2 on 3D Semantic Segmentation on PartNet
3D Point Cloud Linear Classification • 3D Semantic Segmentation
no code implementations • ICLR 2021 • Nanxuan Zhao, Zhirong Wu, Rynson W. H. Lau, Stephen Lin
Contrastive visual pretraining based on the instance discrimination pretext task has made significant progress.
1 code implementation • CVPR 2020 • Yizhuo Zhang, Zhirong Wu, Houwen Peng, Stephen Lin
Semi-supervised video object segmentation aims to separate a target object from a video sequence, given the mask in the first frame.
no code implementations • 14 Apr 2020 • Nanxuan Zhao, Zhirong Wu, Rynson W. H. Lau, Stephen Lin
To address this problem, we propose a data-driven approach for learning invariance to backgrounds.
1 code implementation • 20 Dec 2018 • Bin Liu, Zhirong Wu, Han Hu, Stephen Lin
In this paper, we propose a generic framework that utilizes unlabeled data to aid generalization for all three tasks.
2 code implementations • ECCV 2018 • Zhirong Wu, Alexei A. Efros, Stella X. Yu
Current major approaches to visual recognition follow an end-to-end formulation that classifies an input image into one of a predetermined set of semantic categories.
4 code implementations • CVPR 2018 • Zhirong Wu, Yuanjun Xiong, Stella X. Yu, Dahua Lin
Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so.
Ranked #40 on Semi-Supervised Image Classification on ImageNet - 1% labeled data (Top 5 Accuracy metric)
14 code implementations • 5 May 2018 • Zhirong Wu, Yuanjun Xiong, Stella Yu, Dahua Lin
Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so.
Ranked #13 on Contrastive Learning on imagenet-1k
6 code implementations • ICCV 2017 • Yue Zhao, Yuanjun Xiong, Li-Min Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin
Detecting actions in untrimmed videos is an important yet challenging task.
Ranked #6 on Action Recognition on THUMOS’14
1 code implementation • 7 Sep 2016 • Zhirong Wu, Dahua Lin, Xiaoou Tang
Markov Random Fields (MRFs), a formulation widely used in generative image modeling, have long been plagued by the lack of expressive power.
no code implementations • 19 Nov 2015 • Zhirong Wu, Dahua Lin, Xiaoou Tang
This suggests that the semantic structure of a neural network may be manifested through a guided binarization process.
no code implementations • CVPR 2015 • Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, Jianxiong Xiao
Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses from raw CAD data, and discovers hierarchical compositional part representations automatically.
Ranked #35 on 3D Point Cloud Classification on ModelNet40 (Mean Accuracy metric)