no code implementations • 26 Jun 2023 • Jiaxin Deng, Dong Shen, Shiyao Wang, Xiangyu Wu, Fan Yang, Guorui Zhou, Gaofeng Meng
However, most previous works treat the live as a whole item and explore the Click-through-Rate (CTR) prediction framework on item-level, neglecting that the dynamic changes that occur even within the same live room.
no code implementations • 14 Mar 2023 • Xing Cheng, Xiangyu Wu, Dong Shen, Hezheng Lin, Fan Yang
Video grounding aims to locate the timestamps best matching the query description within an untrimmed video.
no code implementations • 19 Nov 2022 • Jiaxin Deng, Dong Shen, Haojie Pan, Xiangyu Wu, Ximan Liu, Gaofeng Meng, Fan Yang, Size Li, Ruiji Fu, Zhongyuan Wang
Furthermore, based on this dataset, we propose an end-to-end model that jointly optimizes the video understanding objective with knowledge graph embedding, which can not only better inject factual knowledge into video understanding but also generate effective multi-modal entity embedding for KG.
2 code implementations • 9 Sep 2021 • Xing Cheng, Hezheng Lin, Xiangyu Wu, Fan Yang, Dong Shen
In this paper, we propose a multi-stream Corpus Alignment network with single gate Mixture-of-Experts (CAMoE) and a novel Dual Softmax Loss (DSL) to solve the two heterogeneity.
Ranked #9 on Video Retrieval on MSVD (using extra training data)
1 code implementation • 11 Jun 2021 • Xing Cheng, Hezheng Lin, Xiangyu Wu, Fan Yang, Dong Shen, Zhongyuan Wang, Nian Shi, Honglin Liu
The task of multi-label image classification is to recognize all the object labels presented in an image.
Ranked #12 on Multi-Label Classification on MS-COCO
1 code implementation • 10 Jun 2021 • Hezheng Lin, Xing Cheng, Xiangyu Wu, Fan Yang, Dong Shen, Zhongyuan Wang, Qing Song, Wei Yuan
In this paper, we propose a new attention mechanism in Transformer termed Cross Attention, which alternates attention inner the image patch instead of the whole image to capture local information and apply attention between image patches which are divided from single-channel feature maps capture global information.
no code implementations • 10 Mar 2021 • Dong Shen, Shuai Zhao, Jinming Hu, Hao Feng, Deng Cai, Xiaofei He
In this paper, we propose a novel network, Erasing-Salient Net (ES-Net), to learn comprehensive features by erasing the salient areas in an image.
no code implementations • 29 Jan 2021 • Hao Feng, Minghao Chen, Jinming Hu, Dong Shen, Haifeng Liu, Deng Cai
In this paper, to complement these low recall neighbor pseudo labels, we propose a joint learning framework to learn better feature embeddings via high precision neighbor pseudo labels and high recall group pseudo labels.
1 code implementation • 7 Aug 2019 • Zhengxu Yu, Dong Shen, Zhongming Jin, Jianqiang Huang, Deng Cai, Xian-Sheng Hua
Model fine-tuning is a widely used transfer learning approach in person Re-identification (ReID) applications, which fine-tuning a pre-trained feature extraction model into the target scenario instead of training a model from scratch.