no code implementations • 14 Mar 2023 • Xing Cheng, Xiangyu Wu, Dong Shen, Hezheng Lin, Fan Yang
Video grounding aims to locate the timestamps best matching the query description within an untrimmed video.
2 code implementations • 24 Dec 2021 • Gang Li, Di Xu, Xing Cheng, Lingyu Si, Changwen Zheng
Although vision Transformers have achieved excellent performance as backbone models in many vision tasks, most of them aim to capture global relations among all tokens in an image or a window, which disrupts the inherent spatial and local correlations between patches in the 2D structure.
2 code implementations • 9 Sep 2021 • Xing Cheng, Hezheng Lin, Xiangyu Wu, Fan Yang, Dong Shen
In this paper, we propose a multi-stream Corpus Alignment network with single-gate Mixture-of-Experts (CAMoE) and a novel Dual Softmax Loss (DSL) to address the two kinds of heterogeneity.
Ranked #9 on Video Retrieval on MSVD (using extra training data)
1 code implementation • 11 Jun 2021 • Xing Cheng, Hezheng Lin, Xiangyu Wu, Fan Yang, Dong Shen, Zhongyuan Wang, Nian Shi, Honglin Liu
The task of multi-label image classification is to recognize all the object labels present in an image.
Ranked #12 on Multi-Label Classification on MS-COCO
1 code implementation • 10 Jun 2021 • Hezheng Lin, Xing Cheng, Xiangyu Wu, Fan Yang, Dong Shen, Zhongyuan Wang, Qing Song, Wei Yuan
In this paper, we propose a new attention mechanism in Transformer termed Cross Attention, which alternates between attention within each image patch, rather than over the whole image, to capture local information, and attention between image patches divided from single-channel feature maps to capture global information.
1 code implementation • 12 Jun 2020 • Shiqi Yang, Xiaolong Xu, Yaozheng Zhu, Ruirui Niu, Chunqiang Xu, Yuxuan Peng, Xing Cheng, Xionghui Jia, Xiaofeng Xu, Jianming Lu, Yu Ye
However, the layer-dependent magnetism of MnBi2Te4, which is fundamental and crucial for further exploration of quantum phenomena in this system, remains elusive.
Materials Science