no code implementations • 18 Apr 2024 • Suyuan Huang, Haoxin Zhang, Yan Gao, Yao Hu, Zengchang Qin
Multimodal Large Language Models (MLLMs) have demonstrated profound capabilities in understanding multimodal information, ranging from Image LLMs to the more complex Video LLMs.
no code implementations • 18 Mar 2024 • Yuhe Liu, Mengxue Kang, Zengchang Qin, Xiangxiang Chu
Experiments show that our model achieves better logical performance, and that the extracted logical knowledge can be effectively applied to other scenarios.
no code implementations • 6 Feb 2024 • Zijie Zhong, Yunhui Zhang, Ziyi Chang, Zengchang Qin
CADReN is also shown to match the performance of previous models on the single-graph NIE task.
no code implementations • ICCV 2023 • Yuhe Liu, Chuanjian Liu, Kai Han, Quan Tang, Zengchang Qin
Following this observation, we propose ECENet, a new segmentation paradigm in which class embeddings are obtained and enhanced explicitly through interaction with multi-stage image features.
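The abstract gives no implementation details, but one common way to let class embeddings interact with image features is cross-attention with a residual update. The sketch below is an illustrative NumPy toy, not ECENet's actual architecture; the function name and the single-stage, single-head form are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def enhance_class_embeddings(cls_emb, feats):
    # Hypothetical single-stage enhancement step: each of the C class
    # embeddings cross-attends over N flattened image features (both of
    # dimension d) and is updated residually.
    d = cls_emb.shape[1]
    attn = softmax(cls_emb @ feats.T / np.sqrt(d), axis=-1)  # (C, N) weights
    return cls_emb + attn @ feats                            # residual update

C, N, d = 3, 16, 8
cls_emb = rng.normal(size=(C, d))
feats = rng.normal(size=(N, d))
enhanced = enhance_class_embeddings(cls_emb, feats)
```

In a multi-stage version this step would be repeated with feature maps from several backbone stages, refining the class embeddings progressively.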
1 code implementation • 17 Jun 2022 • Zheng He, Zeke Xie, Quanzhi Zhu, Zengchang Qin
People usually believe that network pruning not only reduces the computational cost of deep networks, but also prevents overfitting by decreasing model capacity.
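The belief being examined refers to standard network pruning; its simplest form is global magnitude pruning, sketched below in NumPy. This is the generic textbook technique, not this paper's contribution, and the function name is made up for illustration.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Zero out (approximately) the smallest-magnitude fraction `sparsity`
    # of the entries; ties at the threshold may prune slightly more.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold            # keep only larger weights
    return weights * mask

w = np.array([[0.05, -0.8], [0.3, -0.02]])
pruned = magnitude_prune(w, 0.5)  # half of the entries are zeroed
```

Pruned weights reduce multiply-accumulate work and shrink the hypothesis space, which is why pruning is often assumed to act as a regularizer.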
no code implementations • 10 Apr 2022 • Shunyu Zhang, Xiaoze Jiang, Zequn Yang, Tao Wan, Zengchang Qin
In our model, the external knowledge is represented with sentence-level facts and graph-level facts to properly suit the composite scenario of dialog history and image.
no code implementations • 29 Sep 2021 • Zheng He, Quanzhi Zhu, Zengchang Qin
Network pruning is a widely-used technique to reduce the computational cost of over-parameterized neural networks.
no code implementations • 11 Aug 2020 • Xiaoze Jiang, Siyi Du, Zengchang Qin, Yajing Sun, Jing Yu
Visual dialogue is a challenging task that needs to extract implicit information from both visual (image) and textual (dialogue history) contexts.
4 code implementations • 7 Jul 2020 • Xiaoze Jiang, Jing Yu, Yajing Sun, Zengchang Qin, Zihao Zhu, Yue Hu, Qi Wu
The ability to generate detailed and non-repetitive responses is crucial for the agent to achieve human-like conversation.
no code implementations • 26 Nov 2019 • Ying Huang, Jiankai Zhuang, Zengchang Qin
In multi-person pose estimation, discriminating left/right joint types remains a hard problem because of their similar appearance.
no code implementations • 19 Nov 2019 • Ying Huang, Bin Sun, Haipeng Kan, Jiankai Zhuang, Zengchang Qin
Human pose estimation has made significant advancement in recent years.
1 code implementation • 17 Nov 2019 • Xiaoze Jiang, Jing Yu, Zengchang Qin, Yingying Zhuang, Xingxing Zhang, Yue Hu, Qi Wu
More importantly, we can tell which modality (visual or semantic) has more contribution in answering the current question by visualizing the gate values.
Ranked #6 on Visual Dialog on VisDial v0.9 val
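Visualizing gate values to attribute an answer to the visual or semantic modality implies a learned convex combination of the two pathways. The NumPy sketch below shows the generic gating pattern under that assumption; the function names and the single scalar gate are illustrative, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(visual, semantic, w, b):
    # Scalar gate in (0, 1): the weight given to the visual pathway.
    # Inspecting `g` per question tells which modality dominated.
    g = sigmoid(np.concatenate([visual, semantic]) @ w + b)
    return g * visual + (1.0 - g) * semantic, g

d = 4
visual = rng.normal(size=d)    # stand-in for an image feature vector
semantic = rng.normal(size=d)  # stand-in for a dialog/text feature vector
w = rng.normal(size=2 * d)
fused, gate = gated_fusion(visual, semantic, w, 0.0)
```

A gate near 1 indicates the fused representation is mostly visual, near 0 mostly semantic, which is what makes the gate values directly interpretable.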
no code implementations • 23 Dec 2018 • Zhuoqian Yang, Zengchang Qin, Jing Yu, Yue Hu
On top of the constructed graph, we propose a Scene Graph Convolutional Network (SceneGCN) to jointly reason over object properties and relational semantics for the correct answer.
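The core operation of any graph convolutional network is message passing over an adjacency matrix. The sketch below is the standard Kipf–Welling-style GCN layer in NumPy as a reference point, not SceneGCN itself, whose relation-aware propagation rule is more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_layer(A, H, W):
    # One graph-convolution layer: H' = ReLU(D^{-1/2} (A+I) D^{-1/2} H W),
    # i.e. each node averages its (self-looped, degree-normalized)
    # neighbors' features and applies a linear map plus ReLU.
    A_hat = A + np.eye(A.shape[0])       # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(deg ** -0.5)    # symmetric normalization
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy scene graph: 4 objects on a path, 5-d features mapped to 3-d.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 5))
W = rng.normal(size=(5, 3))
H2 = gcn_layer(A, H, W)
```

Stacking such layers lets information about object properties propagate along the scene graph's relations before answer prediction.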
no code implementations • 1 Nov 2018 • Shuangting Liu, Jia-Qi Zhang, Yuxin Chen, Yifan Liu, Zengchang Qin, Tao Wan
Semantic segmentation is one of the fundamental topics in computer vision: it aims to assign a semantic label to every pixel of an image.
no code implementations • 1 Nov 2018 • Daouda Sow, Zengchang Qin, Mouhamed Niasse, Tao Wan
The recent advances of deep learning in both computer vision (CV) and natural language processing (NLP) provide us with a new way of understanding semantics, by which we can tackle more challenging tasks such as automatic description generation from natural images.
1 code implementation • 31 Oct 2018 • Jing Yu, Chenghao Yang, Zengchang Qin, Zhuoqian Yang, Yue Hu, Yanbing Liu
A joint neural model is proposed to learn feature representations individually in each modality.
no code implementations • 3 Feb 2018 • Jing Yu, Yuhang Lu, Zengchang Qin, Yanbing Liu, Jianlong Tan, Li Guo, Weifeng Zhang
A dual-path neural network model is proposed for coupled feature learning in cross-modal information retrieval.
1 code implementation • 1 Dec 2017 • Heng Wang, Zengchang Qin, Tao Wan
We propose the VGAN model, in which the generative model is composed of a recurrent neural network and a VAE.
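The VAE component of such a generator relies on the reparameterization trick to sample latent codes while keeping the model differentiable. The minimal NumPy sketch below shows just that trick; it is a generic VAE building block under stated assumptions, not VGAN's full RNN+VAE generator.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    # Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, I),
    # so gradients can flow through mu and logvar (here purely numeric).
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

mu = np.zeros(3)
logvar = np.zeros(3)  # log-variance 0, i.e. unit sigma
z = reparameterize(mu, logvar)
```

In an RNN+VAE generator, an encoder RNN would produce `mu` and `logvar` per step, and the sampled `z` would condition the decoder.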
no code implementations • 2 Nov 2017 • Xinyue Zhu, Yifan Liu, Zengchang Qin, Jiahong Li
In this paper, we propose a data augmentation method using generative adversarial networks (GANs).
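The overall recipe for GAN-based augmentation is: sample noise, run it through a trained generator, and append the synthetic samples to the real training set. The sketch below assumes a toy `tanh` generator as a stand-in for a trained network; the function names and shapes are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, W):
    # Stand-in for a trained GAN generator mapping noise to sample space;
    # a real generator would be a trained neural network.
    return np.tanh(z @ W)

def gan_augment(real, W, n_fake, noise_dim):
    # Append n_fake generated samples to the real training set.
    z = rng.normal(size=(n_fake, noise_dim))
    return np.concatenate([real, generator(z, W)], axis=0)

real = rng.normal(size=(10, 6))       # 10 real samples, 6 features each
W = rng.normal(size=(4, 6))           # toy generator weights
augmented = gan_augment(real, W, n_fake=5, noise_dim=4)
```

The downstream classifier is then trained on `augmented` instead of `real`, which is the essence of augmentation when labeled data is scarce.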
no code implementations • 9 May 2017 • Liang Li, Pengyu Li, Yifan Liu, Tao Wan, Zengchang Qin
Under our learning policy, the Seq2Seq model can gradually learn mappings in the presence of noise.
no code implementations • 8 May 2017 • Qiangeng Xu, Zengchang Qin, Tao Wan
In this paper, we explore a generative model for the task of generating unseen images with desired features.
4 code implementations • 4 May 2017 • Yifan Liu, Zengchang Qin, Zhenbo Luo, Hua Wang
Learning to generate colorful cartoon images from black-and-white sketches is not only an interesting research problem, but also a potential application in digital entertainment.