no code implementations • 14 Sep 2023 • Yunshui Li, Binyuan Hui, Zhaochao Yin, Wanwei He, Run Luo, Yuxing Long, Min Yang, Fei Huang, Yongbin Li
Visually grounded dialog systems, which integrate multiple modes of communication such as text and visual inputs, have become an increasingly popular area of research.
1 code implementation • 19 Aug 2023 • Run Luo, Zikai Song, Lintao Ma, JinLin Wei, Wei Yang, Min Yang
During inference, the model refines a set of randomly generated paired boxes into the detection and tracking results through a flexible one-step or multi-step denoising diffusion process.
1 code implementation • 26 Jan 2023 • Zikai Song, Run Luo, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang
The Transformer framework has shown superior performance in visual object tracking because the attention mechanism is highly effective at aggregating information across the template and search images.
no code implementations • 12 Mar 2022 • Run Luo, JinLin Wei, Qiao Lin
Multi-object tracking (MOT) aims to estimate the bounding boxes and identities of objects in videos.