no code implementations • 5 May 2024 • Yu Gao, Jiancheng Huang, Xiaopeng Sun, Zequn Jie, Yujie Zhong, Lin Ma
In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation.
2 code implementations • 12 Apr 2024 • Cong Wei, Haoxian Tan, Yujie Zhong, Yujiu Yang, Lin Ma
Recent advancements have empowered Large Language Models for Vision (vLLMs) to generate detailed perceptual outcomes, including bounding boxes and masks.
1 code implementation • 7 Apr 2024 • Yingsen Zeng, Yujie Zhong, Chengjian Feng, Lin Ma
Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos.
Ranked #2 on Natural Language Moment Retrieval on ActivityNet Captions (R@5,IoU=0.5 metric)
no code implementations • 8 Feb 2024 • Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma
The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector, and a novel self-training scheme on (novel) categories not covered by the detector.
2 code implementations • 12 Sep 2023 • Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, Chen Chen, Fabian Deuser, Feng Yan, Fufu Yu, Gal Shitrit, Guanshuo Wang, Gyusik Choi, Hankyul Kim, Hao Guo, Hasby Fahrudin, Hidenari Koguchi, Håkan Ardö, Ibrahim Salah, Ido Yerushalmy, Iftikar Muhammad, Ikuma Uchida, Ishay Be'ery, Jaonary Rabarisoa, Jeongae Lee, Jiajun Fu, Jianqin Yin, Jinghang Xu, Jongho Nang, Julien Denize, Junjie Li, Junpei Zhang, Juntae Kim, Kamil Synowiec, Kenji Kobayashi, Kexin Zhang, Konrad Habel, Kota Nakajima, Licheng Jiao, Lin Ma, Lizhi Wang, Luping Wang, Menglong Li, Mengying Zhou, Mohamed Nasr, Mohamed Abdelwahed, Mykola Liashuha, Nikolay Falaleev, Norbert Oswald, Qiong Jia, Quoc-Cuong Pham, Ran Song, Romain Hérault, Rui Peng, Ruilong Chen, Ruixuan Liu, Ruslan Baikulov, Ryuto Fukushima, Sergio Escalera, Seungcheon Lee, Shimin Chen, Shouhong Ding, Taiga Someya, Thomas B. Moeslund, Tianjiao Li, Wei Shen, Wei zhang, Wei Li, Wei Dai, Weixin Luo, Wending Zhao, Wenjie Zhang, Xinquan Yang, Yanbiao Ma, Yeeun Joo, Yingsen Zeng, Yiyang Gan, Yongqiang Zhu, Yujie Zhong, Zheng Ruan, Zhiheng Li, Zhijian Huang, Ziyu Meng
More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org.
3 code implementations • 11 Sep 2023 • Dingfeng Shi, Qiong Cao, Yujie Zhong, Shan An, Jian Cheng, Haogang Zhu, DaCheng Tao
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
Ranked #1 on Temporal Action Localization on MultiTHUMOS
no code implementations • 5 Jun 2023 • Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, DaCheng Tao
This challenge arises from two main factors: the insufficient discriminability of ReID features and the predominant utilization of linear motion models in MOT.
Ranked #4 on Multi-Object Tracking on SportsMOT
1 code implementation • 1 Jun 2023 • Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie
Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.
2 code implementations • 22 May 2023 • Feng Yan, Weixin Luo, Yujie Zhong, Yiyang Gan, Lin Ma
Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not surpassed non-end-to-end tracking-by-detection methods.
Ranked #1 on Video Object Tracking on SoccerNet-v2
1 code implementation • ICCV 2023 • Cong Han, Yujie Zhong, Dengjie Li, Kai Han, Lin Ma
Recently, the open-vocabulary semantic segmentation problem has attracted increasing attention and the best performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pretrained visual-language model.
1 code implementation • CVPR 2023 • Xiao Zhou, Yujie Zhong, Zhen Cheng, Fan Liang, Lin Ma
To address this problem, we propose a novel loss paradigm termed Sparse Pairwise (SP) loss that leverages only a few appropriate pairs for each class in a mini-batch, and empirically demonstrate that this is sufficient for ReID tasks.
1 code implementation • CVPR 2023 • Dingfeng Shi, Yujie Zhong, Qiong Cao, Lin Ma, Jia Li, DaCheng Tao
In this paper, we present a one-stage framework TriDet for temporal action detection.
Ranked #2 on Temporal Action Localization on EPIC-KITCHENS-100
1 code implementation • 24 Dec 2022 • Dengjie Li, Siyu Chen, Yujie Zhong, Lin Ma
In person re-identification (ReID) tasks, many works explore the learning of part features to improve the performance over global image features.
Ranked #2 on Person Re-Identification on Occluded-DukeMTMC
1 code implementation • CVPR 2023 • Chengjian Feng, Zequn Jie, Yujie Zhong, Xiangxiang Chu, Lin Ma
However, the typical convolution ignores the radial symmetry of the BEV features and increases the difficulty of the detector optimization.
no code implementations • 10 Oct 2022 • Zixu Wang, Yujie Zhong, Yishu Miao, Lin Ma, Lucia Specia
However, even in paired video-text segments, only a subset of the frames is semantically relevant to the corresponding text, with the remainder representing noise; the ratio of noisy frames is higher for longer videos.
7 code implementations • 5 Oct 2022 • Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li
The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.
1 code implementation • 29 Aug 2022 • Chang Liu, Yujie Zhong, Andrew Zisserman, Weidi Xie
In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of "exemplars", i.e., zero-shot or few-shot counting.
Ranked #3 on Object Counting on CARPK
1 code implementation • 14 Jul 2022 • Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li, DaCheng Tao
Moreover, we propose two losses to facilitate and stabilize the training of action classification.
Ranked #15 on Temporal Action Localization on THUMOS’14
1 code implementation • CVPR 2022 • Sheng Guo, Zihua Xiong, Yujie Zhong, LiMin Wang, Xiaobo Guo, Bing Han, Weilin Huang
In this paper, we present a new cross-architecture contrastive learning (CACL) framework for self-supervised video representation learning.
no code implementations • CVPR 2022 • Xianing Chen, Qiong Cao, Yujie Zhong, Jing Zhang, Shenghua Gao, DaCheng Tao
Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation.
2 code implementations • 30 Mar 2022 • Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma
The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.
1 code implementation • 2 Dec 2021 • Zelu Deng, Yujie Zhong, Sheng Guo, Weilin Huang
This work aims at improving instance retrieval with self-supervision.
no code implementations • 23 Sep 2021 • Xianing Chen, Chunlin Xu, Qiong Cao, Jialang Xu, Yujie Zhong, Jiale Xu, Zhengxin Li, Jingya Wang, Shenghua Gao
Transformers have shown preferable performance on many vision tasks.
1 code implementation • ICCV 2021 • Chengjian Feng, Yujie Zhong, Weilin Huang
Specifically, EBL increases the intensity of the adjustment of the decision boundary for the weak classes by a designed score-guided loss margin between any two classes.
Ranked #10 on Object Detection on LVIS v1.0 val
5 code implementations • ICCV 2021 • Chengjian Feng, Yujie Zhong, Yu Gao, Matthew R. Scott, Weilin Huang
One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks.
Ranked #3 on 2D Object Detection on CeyMo
no code implementations • 9 Jul 2021 • Haoxian Tan, Sheng Guo, Yujie Zhong, Matthew R. Scott, Weilin Huang
In this paper, we propose a conceptually simple yet efficient method to bridge these two paradigms, referred to as Mutually-aware Sub-Graphs Differentiable Architecture Search (MSG-DAS).
1 code implementation • 11 Jan 2021 • Guanting Liu, Yujie Zhong, Sheng Guo, Matthew R. Scott, Weilin Huang
To overcome this limitation, in this paper, we propose a Hierarchical Differentiable Architecture Search (H-DAS) that performs architecture search both at the cell level and at the stage level.
1 code implementation • 19 Nov 2020 • Yujie Zhong, Linhai Xie, Sen Wang, Lucia Specia, Yishu Miao
In this paper, we teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
1 code implementation • ECCV 2020 • Yujie Zhong, Zelu Deng, Sheng Guo, Matthew R. Scott, Weilin Huang
FAD consists of a designed search space and an efficient architecture search algorithm.
no code implementations • 26 Mar 2020 • Yujie Zhong, Relja Arandjelović, Andrew Zisserman
The objective of this work is to learn a compact embedding of a set of descriptors that is suitable for efficient retrieval and ranking, whilst maintaining discriminability of the individual descriptors.
2 code implementations • 23 Oct 2018 • Yujie Zhong, Relja Arandjelović, Andrew Zisserman
The objective of this paper is to learn a compact representation of image sets for template-based face recognition.
Ranked #3 on Face Verification on IJB-A