Search Results for author: Yujie Zhong

Found 31 papers, 23 papers with code

Matten: Video Generation with Mamba-Attention

no code implementations • 5 May 2024 • Yu Gao, Jiancheng Huang, Xiaopeng Sun, Zequn Jie, Yujie Zhong, Lin Ma

In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation.

Video Generation

LaSagnA: Language-based Segmentation Assistant for Complex Queries

2 code implementations • 12 Apr 2024 • Cong Wei, Haoxian Tan, Yujie Zhong, Yujiu Yang, Lin Ma

Recent advancements have empowered Large Language Models for Vision (vLLMs) to generate detailed perceptual outcomes, including bounding boxes and masks.

Segmentation Semantic Segmentation

UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection

1 code implementation • 7 Apr 2024 • Yingsen Zeng, Yujie Zhong, Chengjian Feng, Lin Ma

Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos.

Ranked #2 on Natural Language Moment Retrieval on ActivityNet Captions (R@5,IoU=0.5 metric)

Action Detection Moment Queries +4

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

no code implementations • 8 Feb 2024 • Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma

The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector, and a novel self-training scheme on (novel) categories not covered by the detector.

Object object-detection +1

SoccerNet 2023 Challenges Results

2 code implementations • 12 Sep 2023 • Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, Chen Chen, Fabian Deuser, Feng Yan, Fufu Yu, Gal Shitrit, Guanshuo Wang, Gyusik Choi, Hankyul Kim, Hao Guo, Hasby Fahrudin, Hidenari Koguchi, Håkan Ardö, Ibrahim Salah, Ido Yerushalmy, Iftikar Muhammad, Ikuma Uchida, Ishay Be'ery, Jaonary Rabarisoa, Jeongae Lee, Jiajun Fu, Jianqin Yin, Jinghang Xu, Jongho Nang, Julien Denize, Junjie Li, Junpei Zhang, Juntae Kim, Kamil Synowiec, Kenji Kobayashi, Kexin Zhang, Konrad Habel, Kota Nakajima, Licheng Jiao, Lin Ma, Lizhi Wang, Luping Wang, Menglong Li, Mengying Zhou, Mohamed Nasr, Mohamed Abdelwahed, Mykola Liashuha, Nikolay Falaleev, Norbert Oswald, Qiong Jia, Quoc-Cuong Pham, Ran Song, Romain Hérault, Rui Peng, Ruilong Chen, Ruixuan Liu, Ruslan Baikulov, Ryuto Fukushima, Sergio Escalera, Seungcheon Lee, Shimin Chen, Shouhong Ding, Taiga Someya, Thomas B. Moeslund, Tianjiao Li, Wei Shen, Wei zhang, Wei Li, Wei Dai, Weixin Luo, Wending Zhao, Wenjie Zhang, Xinquan Yang, Yanbiao Ma, Yeeun Joo, Yingsen Zeng, Yiyang Gan, Yongqiang Zhu, Yujie Zhong, Zheng Ruan, Zhiheng Li, Zhijian Huang, Ziyu Meng

More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org.

Action Spotting Camera Calibration +3

Temporal Action Localization with Enhanced Instant Discriminability

3 code implementations • 11 Sep 2023 • Dingfeng Shi, Qiong Cao, Yujie Zhong, Shan An, Jian Cheng, Haogang Zhu, DaCheng Tao

Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.

Ranked #1 on Temporal Action Localization on MultiTHUMOS

Action Detection Temporal Action Localization

MotionTrack: Learning Motion Predictor for Multiple Object Tracking

no code implementations • 5 Jun 2023 • Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, DaCheng Tao

This challenge arises from two main factors: the insufficient discriminability of ReID features and the predominant utilization of linear motion models in MOT.

Ranked #4 on Multi-Object Tracking on SportsMOT

motion prediction Multi-Object Tracking +2

Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

1 code implementation • 1 Jun 2023 • Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.

Story Visualization Style Transfer +2

Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object Tracking

2 code implementations • 22 May 2023 • Feng Yan, Weixin Luo, Yujie Zhong, Yiyang Gan, Lin Ma

Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not surpassed non-end-to-end tracking-by-detection methods.

Ranked #1 on Video Object Tracking on SoccerNet-v2

Multi-Object Tracking Video Object Tracking

Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network

1 code implementation • ICCV 2023 • Cong Han, Yujie Zhong, Dengjie Li, Kai Han, Lin Ma

Recently, the open-vocabulary semantic segmentation problem has attracted increasing attention and the best performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pretrained visual-language model.

Classification Language Modelling +3

Adaptive Sparse Pairwise Loss for Object Re-Identification

1 code implementation • CVPR 2023 • Xiao Zhou, Yujie Zhong, Zhen Cheng, Fan Liang, Lin Ma

To address this problem, we propose a novel loss paradigm termed Sparse Pairwise (SP) loss that leverages only a few appropriate pairs for each class in a mini-batch, and empirically demonstrate that it is sufficient for the ReID tasks.

Object

TriDet: Temporal Action Detection with Relative Boundary Modeling

1 code implementation • CVPR 2023 • Dingfeng Shi, Yujie Zhong, Qiong Cao, Lin Ma, Jia Li, DaCheng Tao

In this paper, we present a one-stage framework TriDet for temporal action detection.

Ranked #2 on Temporal Action Localization on EPIC-KITCHENS-100

Action Detection Temporal Action Localization

DiP: Learning Discriminative Implicit Parts for Person Re-Identification

1 code implementation • 24 Dec 2022 • Dengjie Li, Siyu Chen, Yujie Zhong, Lin Ma

In person re-identification (ReID) tasks, many works explore the learning of part features to improve the performance over global image features.

Ranked #2 on Person Re-Identification on Occluded-DukeMTMC

Person Re-Identification Position

AeDet: Azimuth-invariant Multi-view 3D Object Detection

1 code implementation • CVPR 2023 • Chengjian Feng, Zequn Jie, Yujie Zhong, Xiangxiang Chu, Lin Ma

However, the typical convolution ignores the radial symmetry of the BEV features and increases the difficulty of the detector optimization.

3D Object Detection Depth Estimation +3

Contrastive Video-Language Learning with Fine-grained Frame Sampling

no code implementations • 10 Oct 2022 • Zixu Wang, Yujie Zhong, Yishu Miao, Lin Ma, Lucia Specia

However, even in paired video-text segments, only a subset of the frames is semantically relevant to the corresponding text, with the remainder representing noise, and the ratio of noisy frames is higher for longer videos.

Question Answering Representation Learning +3

SoccerNet 2022 Challenges Results

7 code implementations • 5 Oct 2022 • Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li

The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.

Action Spotting Camera Calibration +3

CounTR: Transformer-based Generalised Visual Counting

1 code implementation • 29 Aug 2022 • Chang Liu, Yujie Zhong, Andrew Zisserman, Weidi Xie

In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of "exemplars", i.e. zero-shot or few-shot counting.

Ranked #3 on Object Counting on CARPK

Object Counting Self-Supervised Learning

ReAct: Temporal Action Detection with Relational Queries

1 code implementation • 14 Jul 2022 • Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li, DaCheng Tao

Moreover, we propose two losses to facilitate and stabilize the training of action classification.

Ranked #15 on Temporal Action Localization on THUMOS’14

Action Classification Action Detection +5

Cross-Architecture Self-supervised Video Representation Learning

1 code implementation • CVPR 2022 • Sheng Guo, Zihua Xiong, Yujie Zhong, LiMin Wang, Xiaobo Guo, Bing Han, Weilin Huang

In this paper, we present a new cross-architecture contrastive learning (CACL) framework for self-supervised video representation learning.

Action Recognition Contrastive Learning +4

DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers

no code implementations • CVPR 2022 • Xianing Chen, Qiong Cao, Yujie Zhong, Jing Zhang, Shenghua Gao, DaCheng Tao

Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation.

Knowledge Distillation

PromptDet: Towards Open-vocabulary Detection using Uncurated Images

2 code implementations • 30 Mar 2022 • Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.

Language Modelling Object

InsCLR: Improving Instance Retrieval with Self-Supervision

1 code implementation • 2 Dec 2021 • Zelu Deng, Yujie Zhong, Sheng Guo, Weilin Huang

This work aims at improving instance retrieval with self-supervision.

Retrieval

OH-Former: Omni-Relational High-Order Transformer for Person Re-Identification

no code implementations • 23 Sep 2021 • Xianing Chen, Chunlin Xu, Qiong Cao, Jialang Xu, Yujie Zhong, Jiale Xu, Zhengxin Li, Jingya Wang, Shenghua Gao

Transformers have shown preferable performance on many vision tasks.

Person Re-Identification Vocal Bursts Intensity Prediction

Exploring Classification Equilibrium in Long-Tailed Object Detection

1 code implementation • ICCV 2021 • Chengjian Feng, Yujie Zhong, Weilin Huang

Specifically, EBL increases the intensity of the adjustment of the decision boundary for the weak classes by a designed score-guided loss margin between any two classes.

Ranked #10 on Object Detection on LVIS v1.0 val

Classification imbalanced classification +5

TOOD: Task-aligned One-stage Object Detection

5 code implementations • ICCV 2021 • Chengjian Feng, Yujie Zhong, Yu Gao, Matthew R. Scott, Weilin Huang

One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks.

Ranked #3 on 2D Object Detection on CeyMo

Object object-detection +1

Mutually-aware Sub-Graphs Differentiable Architecture Search

no code implementations • 9 Jul 2021 • Haoxian Tan, Sheng Guo, Yujie Zhong, Matthew R. Scott, Weilin Huang

In this paper, we propose a conceptually simple yet efficient method to bridge these two paradigms, referred to as Mutually-aware Sub-Graphs Differentiable Architecture Search (MSG-DAS).

Unchain the Search Space with Hierarchical Differentiable Architecture Search

1 code implementation • 11 Jan 2021 • Guanting Liu, Yujie Zhong, Sheng Guo, Matthew R. Scott, Weilin Huang

To overcome this limitation, in this paper, we propose a Hierarchical Differentiable Architecture Search (H-DAS) that performs architecture search both at the cell level and at the stage level.

Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision

1 code implementation • 19 Nov 2020 • Yujie Zhong, Linhai Xie, Sen Wang, Lucia Specia, Yishu Miao

In this paper, we teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.

Retrieval Self-Supervised Learning

Representation Sharing for Fast Object Detector Search and Beyond

1 code implementation • ECCV 2020 • Yujie Zhong, Zelu Deng, Sheng Guo, Matthew R. Scott, Weilin Huang

FAD consists of a designed search space and an efficient architecture search algorithm.

FAD Instance Segmentation +6

Compact Deep Aggregation for Set Retrieval

no code implementations • 26 Mar 2020 • Yujie Zhong, Relja Arandjelović, Andrew Zisserman

The objective of this work is to learn a compact embedding of a set of descriptors that is suitable for efficient retrieval and ranking, whilst maintaining discriminability of the individual descriptors.

Retrieval

GhostVLAD for set-based face recognition

2 code implementations • 23 Oct 2018 • Yujie Zhong, Relja Arandjelović, Andrew Zisserman

The objective of this paper is to learn a compact representation of image sets for template-based face recognition.

Ranked #3 on Face Verification on IJB-A

Face Recognition Face Verification
