no code implementations • 30 Apr 2024 • Longlong Jing, Ruichi Yu, Xu Chen, Zhengli Zhao, Shiwei Sheng, Colin Graber, Qi Chen, Qinru Li, Shangxuan Wu, Han Deng, Sangjin Lee, Chris Sweeney, Qiurui He, Wei-Chih Hung, Tong He, Xingyi Zhou, Farshid Moussavi, Zijian Guo, Yin Zhou, Mingxing Tan, Weilong Yang, CongCong Li
In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately.
1 code implementation • 27 Sep 2023 • Libo Zhang, Xin Gu, CongCong Li, Tiejian Luo, Heng Fan
Specifically, we use lightweight ConvNets to extract features of the P-frames in the GOPs, and a spatial-channel attention module (SCAM) is designed to refine the feature representations of the P-frames based on the compressed information with bidirectional information flow.
no code implementations • 1 Jun 2023 • Jiachen Li, Xinwei Shi, Feiyu Chen, Jonathan Stroud, Zhishuai Zhang, Tian Lan, Junhua Mao, Jeonhyung Kang, Khaled S. Refaat, Weilong Yang, Eugene Ie, CongCong Li
Accurate understanding and prediction of human behaviors are critical prerequisites for autonomous vehicles, especially in highly dynamic and interactive scenarios such as intersections in dense urban areas.
1 code implementation • 25 Jun 2022 • Dexiang Hong, Xiaoqi Ma, Xinyao Wang, CongCong Li, YuFei Wang, Longyin Wen
This report presents the algorithm used in the submission to the Generic Event Boundary Detection (GEBD) Challenge at CVPR 2022.
no code implementations • 8 Jun 2022 • Longlong Jing, Ruichi Yu, Henrik Kretzschmar, Kang Li, Charles R. Qi, Hang Zhao, Alper Ayvaci, Xu Chen, Dillon Cower, Yingwei Li, Yurong You, Han Deng, CongCong Li, Dragomir Anguelov
Monocular image-based 3D perception has become an active research area in recent years owing to its applications in autonomous driving.
no code implementations • 7 Jun 2022 • CongCong Li, Xinyao Wang, Dexiang Hong, YuFei Wang, Libo Zhang, Tiejian Luo, Longyin Wen
To capture the temporal context information of each frame, we design the structure context transformer (SC-Transformer) by re-partitioning the input frame sequence.
no code implementations • CVPR 2022 • CongCong Li, Xinyao Wang, Longyin Wen, Dexiang Hong, Tiejian Luo, Libo Zhang
Generic event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks.
no code implementations • 22 Dec 2021 • Jingxiao Zheng, Xinwei Shi, Alexander Gorban, Junhua Mao, Yang Song, Charles R. Qi, Ting Liu, Visesh Chari, Andre Cornman, Yin Zhou, CongCong Li, Dragomir Anguelov
3D human pose estimation (HPE) in autonomous vehicles (AV) differs from other use cases in many factors, including the 3D resolution and range of data, absence of dense depth maps, failure modes for LiDAR, relative location between the camera and LiDAR, and a high bar for estimation accuracy.
1 code implementation • 1 Jul 2021 • Dexiang Hong, CongCong Li, Longyin Wen, Xinyao Wang, Libo Zhang
In this work, we design a Cascaded Temporal Attention Network (CASTANET) for GEBD, which consists of three parts: the backbone network, the temporal attention module, and the classification module.
Ranked #1 on Boundary Detection on Kinetics-400
no code implementations • 18 Dec 2019 • Shuang Zhang, Liyao Xiang, CongCong Li, YiXuan Wang, Quanshi Zhang, Wei Wang, Bo Li
Powered by machine learning services in the cloud, numerous learning-driven mobile applications are gaining popularity in the market.