no code implementations • 15 Mar 2023 • Chenbin Pan, Zhiqi Zhang, Senem Velipasalar, Yi Xu
Different from previous video transformers, which use the same static embedding as the class token for diverse inputs, we propose a dynamic class token generator that produces a class token for each input video by analyzing the hand-object interaction and the related motion information.
no code implementations • 25 Oct 2022 • Zhiqi Zhang, Nitin Bansal, Changjiang Cai, Pan Ji, Qingan Yan, Xiangyu Xu, Yi Xu
To this end, we propose CLIP-FLow, a semi-supervised iterative pseudo-labeling framework to transfer the pretraining knowledge to the target real domain.
no code implementations • 28 Jul 2022 • Zhiqi Zhang, Wen Lu, Jinshan Cao, Guangqi Xie
Limited by hardware computational resources and memory capacity, most existing studies preprocess the original remote sensing images by downsampling them or cropping them into small patches of less than 512×512 pixels before sending them to a deep neural network.
no code implementations • 18 Jun 2021 • Rui Song, Xingbing Chen, Zelong Liu, Haining An, Zhiqi Zhang, Xiaoguang Wang, Hao Xu
In this paper, we propose a Label Mask multi-label text classification model (LM-MTC), which is inspired by the cloze questions used in language models.
31 code implementations • 23 May 2019 • Tongwen Huang, Zhiqi Zhang, Junlin Zhang
In this paper, a new model named FiBiNET (an abbreviation for Feature Importance and Bilinear feature Interaction NETwork) is proposed to dynamically learn feature importance and fine-grained feature interactions.
Ranked #18 on Click-Through Rate Prediction on Criteo
12 code implementations • 15 May 2019 • Junlin Zhang, Tongwen Huang, Zhiqi Zhang
Although some CTR models, such as the Attentional Factorization Machine (AFM), have been proposed to model the weights of second-order interaction features, we posit that evaluating feature importance before the explicit feature interaction procedure is also important for CTR prediction tasks, because the model can learn to selectively highlight informative features and suppress less useful ones when the task has many input features.
Ranked #17 on Click-Through Rate Prediction on Criteo
no code implementations • 9 Aug 2014 • Andrei Barbu, Alexander Bridge, Zachary Burchill, Dan Coroian, Sven Dickinson, Sanja Fidler, Aaron Michaux, Sam Mussman, Siddharth Narayanaswamy, Dhaval Salvi, Lara Schmidt, Jiangnan Shangguan, Jeffrey Mark Siskind, Jarrell Waggoner, Song Wang, Jinlian Wei, Yifan Yin, Zhiqi Zhang
We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it.