no code implementations • 25 Mar 2024 • Yingshan Chang, Yasi Zhang, Zhiyuan Fang, YingNian Wu, Yonatan Bisk, Feng Gao
We hypothesize that the underlying phenomenological coverage has not been proportionally scaled up, leading to a skew of the presented phenomenon which harms generalization.
1 code implementation • 1 Jun 2023 • Man Luo, Zhiyuan Fang, Tejas Gokhale, Yezhou Yang, Chitta Baral
We investigate knowledge retrieval with multi-modal queries, i.e., queries containing information split across image and text inputs, a challenging task that differs from previous work on cross-modal retrieval.
1 code implementation • 13 Nov 2022 • Zekang Zhang, Guangyu Gao, Zhiyuan Fang, Jianbo Jiao, Yunchao Wei
Our MicroSeg is based on the assumption that background regions with strong objectness possibly belong to those concepts in the historical or future stages.
Class-Incremental Semantic Segmentation, Continual Learning
1 code implementation • 28 Apr 2022 • Arnav Chakravarthy, Zhiyuan Fang, Yezhou Yang
In videos that contain actions performed unintentionally, agents do not achieve their desired goals.
1 code implementation • CVPR 2022 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu
In this paper, we are concerned with a better-performing detector-free image captioning model, and propose a pure vision transformer-based image captioning model, dubbed ViTCAP, in which grid representations are used without extracting regional features.
no code implementations • ICCV 2021 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu
In this paper, we study knowledge distillation (KD) to effectively compress a transformer-based large VL model into a small VL model.
1 code implementation • ICLR 2021 • Zhiyuan Fang, JianFeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu
This paper is concerned with self-supervised learning for small models.
no code implementations • 21 Jun 2020 • Zhiyuan Fang, Shu Kong, Zhe Wang, Charless Fowlkes, Yezhou Yang
Referring attention is a mechanism we design to act as a scoring function for temporally grounding the given queries over frames.
no code implementations • 13 Jun 2020 • Ziming Liu, Guangyu Gao, Lin Sun, Zhiyuan Fang
By extracting various features from high to low resolutions, the MD-IPN is able to improve the performance of small object detection while maintaining the performance on medium and large objects.
2 code implementations • ECCV 2020 • Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang
Person search by natural language aims at retrieving a specific person in a large-scale image pool that matches the given textual descriptions.
Ranked #18 on Text based Person Retrieval on CUHK-PEDES
2 code implementations • EMNLP 2020 • Zhiyuan Fang, Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
In videos that involve active agents such as humans, the agent's actions can bring about myriad changes in the scene.
no code implementations • 28 May 2019 • Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral
The process of identifying changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence.
1 code implementation • CVPR 2019 • Zhiyuan Fang, Shu Kong, Charless Fowlkes, Yezhou Yang
Computer Vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries.
no code implementations • 1 May 2018 • Zhiyuan Fang, Shu Kong, Tianshu Yu, Yezhou Yang
Grounding textual phrases in visual content is a meaningful yet challenging problem with various potential applications such as image-text inference or text-driven multimedia interaction.
no code implementations • ICCV 2017 • Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao
Unlike these works, this paper investigates how long-tailed data impact the training of face CNNs and develops a novel loss function, called range loss, to effectively utilize the tailed data in the training process.
2 code implementations • 28 Nov 2016 • Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao
Convolutional neural networks have achieved great improvement on face recognition in recent years because of their extraordinary ability to learn discriminative features of people with different identities.