no code implementations • 28 Mar 2024 • Yanglin Feng, Yang Qin, Dezhong Peng, Hongyuan Zhu, Xi Peng, Peng Hu
We observe that the data is challenging and exhibits noisy correspondence due to the sparsity, noise, or disorder of point clouds and the ambiguity, vagueness, or incompleteness of texts, which makes existing cross-modal matching methods ineffective for PTM.
1 code implementation • 29 Jan 2024 • Zhijing Wan, Zhixiang Wang, Yuran Wang, Zheng Wang, Hongyuan Zhu, Shin'ichi Satoh
Existing methods typically measure both the representation and diversity of data based on similarity metrics, such as the L2-norm.
no code implementations • 12 Jan 2024 • Jialiang Tang, Shuo Chen, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou, Chen Gong, Masashi Sugiyama
Then, we build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network, while simultaneously encouraging the adapter within the teacher network to learn the domain-specific knowledge of the target data.
1 code implementation • 17 Dec 2023 • Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Gang Yu, Tao Chen
Furthermore, we establish a new benchmark for assessing the performance of large models in understanding multi-modal 3D prompts.
1 code implementation • 30 Nov 2023 • Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen
However, developing LMMs that can comprehend, reason, and plan in complex and diverse 3D environments remains a challenging topic, especially considering the demand for understanding permutation-invariant point cloud representations of 3D scenes.
no code implementations • 10 Oct 2023 • Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng
We attribute this issue to the inappropriate alignment criteria, which disrupt the semantic distance consistency between the feature space and the input space.
no code implementations • 17 Sep 2023 • Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim
Many studies focus on improving pretraining or developing new backbones in text-video retrieval.
1 code implementation • 6 Sep 2023 • Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen
Moreover, we argue that object localization and description generation require different levels of scene understanding, which could be challenging for a shared set of queries to capture.
no code implementations • 19 Jul 2023 • Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng
Therefore, considerable efforts have been made to address the challenge of insufficient data in deep learning by leveraging SSL algorithms.
no code implementations • 7 Jun 2023 • Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim
Text-video retrieval contains various challenges, including biases coming from diverse sources.
no code implementations • 20 Apr 2023 • Haoyang Peng, Baopu Li, Bo Zhang, Xin Chen, Tao Chen, Hongyuan Zhu
Then, a novel multi-view prompt fusion module is developed to effectively fuse information from different views to bridge the gap between 3D point cloud data and 2D pre-trained models.
1 code implementation • 31 Mar 2023 • Chuangguan Ye, Hongyuan Zhu, Yongbin Liao, Yanggang Zhang, Tao Chen, Jiayuan Fan
Due to the emergence of powerful computing resources and large-scale annotated datasets, deep learning has seen wide applications in our daily life.
1 code implementation • 31 Mar 2023 • Chuangguan Ye, Hongyuan Zhu, Bo Zhang, Tao Chen
In recent years, research on few-shot learning (FSL) has been growing fast in the 2D image domain due to its lower requirement for labeled training data and better generalization to novel classes.
1 code implementation • CVPR 2023 • Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu
Compared with prior art, our framework has several appealing advantages: 1) Without resorting to numerous hand-crafted components, our method is based on a full transformer encoder-decoder architecture with a learnable vote query driven object decoder, and a caption decoder that produces the dense captions in a set-prediction manner.
1 code implementation • CVPR 2023 • Yanglin Feng, Hongyuan Zhu, Dezhong Peng, Xi Peng, Peng Hu
Recently, with the advent of the Metaverse and AI Generated Content, cross-modal retrieval has become popular amid a burst of 2D and 3D data.
no code implementations • CVPR 2023 • Yuanbiao Gou, Peng Hu, Jiancheng Lv, Hongyuan Zhu, Xi Peng
Existing studies have empirically observed that the resolution of the low-frequency region is easier to enhance than that of the high-frequency one.
1 code implementation • ICCV 2023 • Yuwei Yang, Munawar Hayat, Zhao Jin, Hongyuan Zhu, Yinjie Lei
Given only the class-level semantic information for unseen objects, we strive to enhance the correspondence, alignment and consistency between the visual and semantic spaces, to synthesise diverse, generic and transferable visual features.
1 code implementation • 29 Jun 2022 • Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim
In this report, we present our approach for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.
Ranked #9 on Multi-Instance Retrieval on EPIC-KITCHENS-100
1 code implementation • 26 Jun 2022 • Burak Satar, Hongyuan Zhu, Xavier Bresson, Joo Hwee Lim
With the emergence of social media, voluminous video clips are uploaded every day, and retrieving the most relevant visual content with a language query becomes critical.
Ranked #13 on Video Retrieval on YouCook2
1 code implementation • 26 Jun 2022 • Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim
Most methods consider only one joint embedding space between global visual and textual features without considering the local structures of each modality.
Ranked #12 on Video Retrieval on YouCook2
no code implementations • 23 May 2022 • Peng Hu, Xi Peng, Hongyuan Zhu, Mohamed M. Sabry Aly, Jie Lin
Numerous network compression methods such as pruning and quantization have been proposed to reduce the model size significantly, of which the key is to find a suitable compression allocation (e.g., pruning sparsity and quantization codebook) for each layer.
1 code implementation • CVPR 2022 • Xiuchao Sui, Shaohua Li, Xue Geng, Yan Wu, Xinxing Xu, Yong Liu, Rick Goh, Hongyuan Zhu
This is mainly because the correlation volume, the basis of pixel matching, is computed as the dot product of the convolutional features of the two images.
Ranked #9 on Optical Flow Estimation on KITTI 2015 (train)
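As a rough illustration of the correlation volume mentioned in the entry above, the following minimal sketch computes an all-pairs dot product between two feature maps. It is not the authors' implementation: the function name `correlation_volume`, the nested-list layout of shape (H, W, C), and the toy values are all assumptions made for illustration, and a real optical flow model would use tensors produced by a learned convolutional encoder.

```python
# Sketch: all-pairs correlation volume between two feature maps.
# Assumption: features are nested lists of shape (H, W, C); the name
# `correlation_volume` and the toy inputs below are hypothetical.

def correlation_volume(feat1, feat2):
    """Return corr[i][j][k][l] = dot(feat1[i][j], feat2[k][l])."""

    def dot(u, v):
        # Dot product of two per-pixel feature vectors.
        return sum(a * b for a, b in zip(u, v))

    return [
        [
            [[dot(f1, f2) for f2 in row2] for row2 in feat2]
            for f1 in row1
        ]
        for row1 in feat1
    ]


# Toy 1x2 feature maps with 2 channels per pixel (made-up values).
feat1 = [[[1.0, 0.0], [0.0, 1.0]]]
feat2 = [[[1.0, 0.0], [0.5, 0.5]]]
corr = correlation_volume(feat1, feat2)
```

Each entry `corr[i][j][k][l]` scores how well pixel (i, j) of the first image matches pixel (k, l) of the second, so the volume grows with the product of the two image sizes.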
no code implementations • 13 Feb 2022 • En Yen Puang, Hao Zhang, Hongyuan Zhu, Wei Jing
In this paper we present SA-CNN, a hierarchical and lightweight self-attention based encoding and decoding architecture for representation learning of point cloud data.
1 code implementation • 30 Nov 2021 • Yongbin Liao, Hongyuan Zhu, Yanggang Zhang, Chuangguan Ye, Tao Chen, Jiayuan Fan
For stage two, the bounding box proposals with SPCR are grouped into some subsets, and the instance masks are mined inside each subset with a novel semantic propagation module and a property consistency graph module.
1 code implementation • CVPR 2021 • Peng Hu, Xi Peng, Hongyuan Zhu, Liangli Zhen, Jie Lin
Recently, cross-modal retrieval has been emerging with the help of deep multimodal learning.
no code implementations • 8 Mar 2021 • Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, Cheston Tan
This paper aims to provide an encyclopedic survey for the field of embodied AI, from its simulators to its research.
no code implementations • 11 Dec 2019 • Tianying Wang, Hao Zhang, Wei Qi Toh, Hongyuan Zhu, Cheston Tan, Yan Wu, Yong Liu, Wei Jing
The proposed method is able to efficiently generalize the previously learned task by model fusion to solve the environment adaptation problem.
1 code implementation • NeurIPS 2019 • Jianwei Yang, Zhile Ren, Chuang Gan, Hongyuan Zhu, Devi Parikh
Convolutional neural networks process input data by sending channel-wise feature response maps to subsequent layers.
no code implementations • 24 Sep 2019 • Yi Cheng, Hongyuan Zhu, Ying Sun, Cihan Acar, Wei Jing, Yan Wu, Liyuan Li, Cheston Tan, Joo-Hwee Lim
To the best of our knowledge, this is the first work to explore effective intra- and inter-modality fusion in 6D pose estimation.
no code implementations • ACL 2019 • Joey Tianyi Zhou, Hao Zhang, Di Jin, Hongyuan Zhu, Meng Fang, Rick Siow Mong Goh, Kenneth Kwok
We propose a new neural transfer method termed Dual Adversarial Transfer Network (DATNet) for addressing low-resource Named Entity Recognition (NER).
no code implementations • 21 May 2019 • Zhao Kang, Honghui Xu, Boyu Wang, Hongyuan Zhu, Zenglin Xu
A key step of graph-based approaches is the construction of the similarity graph.
no code implementations • ICLR 2019 • Joey Tianyi Zhou, Hao Zhang, Di Jin, Hongyuan Zhu, Rick Siow Mong Goh, Kenneth Kwok
We propose a new architecture termed Dual Adversarial Transfer Network (DATNet) for addressing low-resource Named Entity Recognition (NER).
no code implementations • 26 Jan 2019 • Changgong Zhang, Fangneng Zhan, Hongyuan Zhu, Shijian Lu
Experiments over a number of public datasets demonstrate the effectiveness of our proposed image synthesis technique: using our synthesized images in deep network training achieves scene text detection and scene text recognition performance similar to or even better than that obtained with real images.
no code implementations • CVPR 2019 • Fangneng Zhan, Hongyuan Zhu, Shijian Lu
Recent advances in generative adversarial networks (GANs) have shown great potential in realistic image synthesis, whereas most existing works address synthesis realism in either appearance space or geometry space but few in both.
no code implementations • 12 Nov 2018 • Anran Wang, Anh Tuan Luu, Chuan-Sheng Foo, Hongyuan Zhu, Yi Tay, Vijay Chandrasekhar
In this paper, we present the Holistic Multi-modal Memory Network (HMMN) framework which fully considers the interactions between different input sources (multi-modal context, question) in each hop.
no code implementations • 22 Aug 2018 • Xi Peng, Yunnan Li, Ivor W. Tsang, Hongyuan Zhu, Jiancheng Lv, Joey Tianyi Zhou
The second is implementing discrete $k$-means with a differentiable neural network that embraces the advantages of parallel computing, online clustering, and clustering-favorable representation learning.
no code implementations • ICCV 2017 • Hongyuan Zhu, Romain Vial, Shijian Lu
Recently, regression-based object detectors and long-term recurrent convolutional networks (LRCN) have demonstrated superior performance in human action detection and recognition.
no code implementations • 26 Jun 2017 • Hongyuan Zhu, Romain Vial, Shijian Lu, Yonghong Tian, Xian-Bin Cao
In this paper, we present YoTube, a novel network fusion framework for searching action proposals in untrimmed videos, where each action proposal corresponds to a spatial-temporal video tube that potentially locates one human action.
1 code implementation • 17 Jun 2017 • Zhe Wang, Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Sibo Song, Yuan Fang, Seokhwan Kim, Nancy Chen, Luis Fernando D'Haro, Luu Anh Tuan, Hongyuan Zhu, Zeng Zeng, Ngai Man Cheung, Georgios Piliouras, Jie Lin, Vijay Chandrasekhar
Beyond that, we extend the original competition by including text information in the classification, making this a truly multi-modal approach with vision, audio and text.
no code implementations • CVPR 2016 • Hongyuan Zhu, Jean-Baptiste Weibel, Shijian Lu
RGBD scene recognition has attracted increasing attention due to the rapid development of depth sensors and their wide application scenarios.
no code implementations • 16 Jul 2015 • Hongyuan Zhu, Shijian Lu, Jianfei Cai, Quangqing Lee
Recently, Hosang et al. conducted the first unified study of existing methods in terms of various image-level degradations.
no code implementations • 3 Feb 2015 • Hongyuan Zhu, Fanman Meng, Jianfei Cai, Shijian Lu
Image segmentation refers to the process of dividing an image into non-overlapping meaningful regions according to human perception, and it has been a classic topic since the early days of computer vision.