Search Results for author: Hongyuan Zhu

Found 42 papers, 17 papers with code

PointCloud-Text Matching: Benchmark Datasets and a Baseline

no code implementations • 28 Mar 2024 • Yanglin Feng, Yang Qin, Dezhong Peng, Hongyuan Zhu, Xi Peng, Peng Hu

We observe that the data are challenging and contain noisy correspondences due to the sparsity, noise, or disorder of point clouds and the ambiguity, vagueness, or incompleteness of texts, which make existing cross-modal matching methods ineffective for PointCloud-Text Matching (PTM).

Contrastive Learning, Retrieval

Contributing Dimension Structure of Deep Feature for Coreset Selection

1 code implementation • 29 Jan 2024 • Zhijing Wan, Zhixiang Wang, Yuran Wang, Zheng Wang, Hongyuan Zhu, Shin'ichi Satoh

Existing methods typically measure both the representation and diversity of data based on similarity metrics, such as L2-norm.
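
One common way to turn L2-based similarity into a selection rule is greedy k-center (farthest-point) selection, where both coverage and diversity are judged only by the L2 distance between feature vectors. The sketch below is a minimal NumPy illustration of that generic baseline, assuming features are already extracted as an `(n, d)` array; `k_center_greedy` and its parameters are hypothetical names, and this is not the method proposed in the paper.

```python
import numpy as np


def k_center_greedy(features: np.ndarray, budget: int, seed: int = 0) -> list[int]:
    """Pick up to `budget` row indices by greedy farthest-point (k-center) selection.

    Representativeness and diversity are measured purely with the L2 distance
    between feature vectors (a generic baseline, not the paper's method).
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    budget = min(budget, n)  # cannot select more points than exist
    first = int(rng.integers(n))  # random initial center
    selected = [first]
    # Distance from every point to its nearest already-selected center.
    min_dist = np.linalg.norm(features - features[first], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(min_dist))  # point farthest from the current coreset
        selected.append(nxt)
        new_dist = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected


X = np.random.default_rng(1).normal(size=(100, 8))
idx = k_center_greedy(X, 10)
```

Each iteration only updates a running nearest-center distance, so the cost is O(n·d) per selected point.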

Direct Distillation between Different Domains

no code implementations • 12 Jan 2024 • Jialiang Tang, Shuo Chen, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou, Chen Gong, Masashi Sugiyama

Then, we build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network, while simultaneously encouraging the adapter within the teacher network to learn the domain-specific knowledge of the target data.

Domain Adaptation, Knowledge Distillation

M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts

1 code implementation • 17 Dec 2023 • Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Gang Yu, Tao Chen

Furthermore, we establish a new benchmark for assessing the performance of large models in understanding multi-modal 3D prompts.

Instruction Following

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

1 code implementation • 30 Nov 2023 • Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen

However, developing large multi-modal models (LMMs) that can comprehend, reason, and plan in complex and diverse 3D environments remains challenging, especially given the need to understand permutation-invariant point cloud representations of the 3D scene.

3D Dense Captioning, Dense Captioning

Antenna Response Consistency Driven Self-supervised Learning for WIFI-based Human Activity Recognition

no code implementations • 10 Oct 2023 • Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng

We attribute this issue to the inappropriate alignment criteria, which disrupt the semantic distance consistency between the feature space and the input space.

Attribute, Contrastive Learning

Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention

no code implementations • 17 Sep 2023 • Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim

Many studies focus on improving pretraining or developing new backbones in text-video retrieval.

Action Recognition, Graph Generation

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning

1 code implementation • 6 Sep 2023 • Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen

Moreover, we argue that object localization and description generation require different levels of scene understanding, which could be challenging for a shared set of queries to capture.

3D Dense Captioning, Caption Generation

Self-Supervised Learning for WiFi CSI-Based Human Activity Recognition: A Systematic Study

no code implementations • 19 Jul 2023 • Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng

Therefore, considerable effort has been made to address the challenge of insufficient data in deep learning by leveraging self-supervised learning (SSL) algorithms.

Human Activity Recognition, Self-Supervised Learning

An Overview of Challenges in Egocentric Text-Video Retrieval

no code implementations • 7 Jun 2023 • Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim

Text-video retrieval involves various challenges, including biases arising from diverse sources.

Retrieval, Video Retrieval

Multi-view Vision-Prompt Fusion Network: Can 2D Pre-trained Model Boost 3D Point Cloud Data-scarce Learning?

no code implementations • 20 Apr 2023 • Haoyang Peng, Baopu Li, Bo Zhang, Xin Chen, Tao Chen, Hongyuan Zhu

Then, a novel multi-view prompt fusion module is developed to effectively fuse information from different views to bridge the gap between 3D point cloud data and 2D pre-trained models.

Autonomous Driving, Classification

A Closer Look at Few-Shot 3D Point Cloud Classification

1 code implementation • 31 Mar 2023 • Chuangguan Ye, Hongyuan Zhu, Bo Zhang, Tao Chen

In recent years, research on few-shot learning (FSL) has grown rapidly in the 2D image domain because FSL requires less labeled training data and generalizes better to novel classes.

Few-Shot 3D Point Cloud Classification, Few-Shot Learning

What Makes for Effective Few-shot Point Cloud Classification?

1 code implementation • 31 Mar 2023 • Chuangguan Ye, Hongyuan Zhu, Yongbin Liao, Yanggang Zhang, Tao Chen, Jiayuan Fan

Due to the emergence of powerful computing resources and large-scale annotated datasets, deep learning has seen wide applications in our daily life.

Benchmarking, Classification

End-to-End 3D Dense Captioning with Vote2Cap-DETR

1 code implementation • CVPR 2023 • Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu

Compared with prior work, our framework has several appealing advantages: 1) Without resorting to numerous hand-crafted components, our method is based on a full transformer encoder-decoder architecture with a learnable vote-query-driven object decoder and a caption decoder that produces the dense captions in a set-prediction manner.

3D Dense Captioning, Dense Captioning

Rethinking Image Super Resolution From Long-Tailed Distribution Learning Perspective

no code implementations • CVPR 2023 • Yuanbiao Gou, Peng Hu, Jiancheng Lv, Hongyuan Zhu, Xi Peng

Existing studies have empirically observed that the resolution of the low-frequency region is easier to enhance than that of the high-frequency one.

Image Super-Resolution

Zero-Shot Point Cloud Segmentation by Semantic-Visual Aware Synthesis

1 code implementation • ICCV 2023 • Yuwei Yang, Munawar Hayat, Zhao Jin, Hongyuan Zhu, Yinjie Lei

Given only the class-level semantic information for unseen objects, we strive to enhance the correspondence, alignment and consistency between the visual and semantic spaces, to synthesise diverse, generic and transferable visual features.

Point Cloud Segmentation, Segmentation

RONO: Robust Discriminative Learning With Noisy Labels for 2D-3D Cross-Modal Retrieval

1 code implementation • CVPR 2023 • Yanglin Feng, Hongyuan Zhu, Dezhong Peng, Xi Peng, Peng Hu

Recently, with the advent of the Metaverse and AI-generated content, cross-modal retrieval has become popular amid a surge of 2D and 3D data.

Cross-Modal Retrieval, Learning with Noisy Labels

Exploiting Semantic Role Contextualized Video Features for Multi-Instance Text-Video Retrieval EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022

1 code implementation • 29 Jun 2022 • Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim

In this report, we present our approach for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.

Ranked #9 on Multi-Instance Retrieval on EPIC-KITCHENS-100

Multi-Instance Retrieval, Retrieval

RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval

1 code implementation • 26 Jun 2022 • Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim

Most methods consider only one joint embedding space between global visual and textual features without considering the local structures of each modality.

Ranked #12 on Video Retrieval on YouCook2

Retrieval, Text to Video Retrieval

Semantic Role Aware Correlation Transformer for Text to Video Retrieval

1 code implementation • 26 Jun 2022 • Burak Satar, Hongyuan Zhu, Xavier Bresson, Joo Hwee Lim

With the emergence of social media, vast numbers of video clips are uploaded every day, and retrieving the most relevant visual content for a language query has become critical.

Ranked #13 on Video Retrieval on YouCook2

Retrieval, Text to Video Retrieval

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

no code implementations • 23 May 2022 • Peng Hu, Xi Peng, Hongyuan Zhu, Mohamed M. Sabry Aly, Jie Lin

Numerous network compression methods, such as pruning and quantization, have been proposed to reduce model size significantly; the key is to find a suitable compression allocation (e.g., pruning sparsity and quantization codebook) for each layer.

Quantization
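
To make "compression allocation" concrete, the sketch below applies a given per-layer allocation of pruning sparsity and quantization codebook size: magnitude pruning followed by snapping the surviving weights to a uniform codebook. It is a minimal NumPy illustration under stated assumptions; the helper `prune_and_quantize`, the uniform codebook, and the allocation values are hypothetical, and the sketch does not reproduce how OPQ itself computes the allocation.

```python
import numpy as np


def prune_and_quantize(w: np.ndarray, sparsity: float, n_bits: int) -> np.ndarray:
    """Zero the smallest-magnitude fraction `sparsity` of `w`, then snap the
    surviving weights to a uniform codebook with 2**n_bits levels (assumed scheme)."""
    flat = w.ravel()
    k = int(round(sparsity * flat.size))  # number of weights to prune
    keep = np.ones(flat.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest-magnitude weights.
        drop = np.argpartition(np.abs(flat), k - 1)[:k]
        keep[drop] = False

    out = np.zeros_like(flat)
    survivors = flat[keep]
    if survivors.size > 0:
        # Hypothetical uniform codebook spanning the range of the surviving weights.
        codebook = np.linspace(survivors.min(), survivors.max(), 2 ** n_bits)
        nearest = np.abs(survivors[:, None] - codebook[None, :]).argmin(axis=1)
        out[keep] = codebook[nearest]
    return out.reshape(w.shape)


# Hypothetical per-layer allocation: (pruning sparsity, codebook bits).
allocation = {"conv1": (0.3, 8), "fc": (0.9, 4)}

rng = np.random.default_rng(0)
layers = {"conv1": rng.normal(size=(16, 3, 3, 3)), "fc": rng.normal(size=(10, 64))}
compressed = {name: prune_and_quantize(w, *allocation[name]) for name, w in layers.items()}
```

Here the allocation is fixed by hand; choosing those per-layer values well is the problem the abstract identifies as the key.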

CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow

1 code implementation • CVPR 2022 • Xiuchao Sui, Shaohua Li, Xue Geng, Yan Wu, Xinxing Xu, Yong Liu, Rick Goh, Hongyuan Zhu

This is mainly because the correlation volume, the basis of pixel matching, is computed as the dot product of the convolutional features of the two images.

Ranked #9 on Optical Flow Estimation on KITTI 2015 (train)

Optical Flow Estimation
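
The dot-product correlation volume mentioned in the CRAFT snippet can be sketched as follows. This is the standard all-pairs construction the snippet refers to, not CRAFT's cross-attentional alternative. It is a minimal NumPy illustration assuming both feature maps share the shape `(C, H, W)`; the function name and the `1/sqrt(C)` scaling (as used in RAFT) are assumptions rather than details from this listing.

```python
import numpy as np


def correlation_volume(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    """All-pairs correlation between two feature maps of shape (C, H, W).

    Entry [i, j, k, l] is the dot product of the C-dim feature at pixel (i, j)
    of image 1 with the feature at pixel (k, l) of image 2, divided by sqrt(C).
    """
    c, h, w = f1.shape
    a = f1.reshape(c, h * w)  # column p = i * w + j holds the feature at (i, j)
    b = f2.reshape(c, h * w)
    corr = (a.T @ b) / np.sqrt(c)  # (H*W, H*W) matrix of scaled dot products
    return corr.reshape(h, w, h, w)


rng = np.random.default_rng(0)
f1 = rng.normal(size=(4, 3, 5))
f2 = rng.normal(size=(4, 3, 5))
vol = correlation_volume(f1, f2)
```

Because every entry is a plain inner product of local convolutional features, any corruption of those features propagates directly into the matching costs.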

Hierarchical Point Cloud Encoding and Decoding with Lightweight Self-Attention based Model

no code implementations • 13 Feb 2022 • En Yen Puang, Hao Zhang, Hongyuan Zhu, Wei Jing

In this paper, we present SA-CNN, a hierarchical and lightweight self-attention based encoding and decoding architecture for representation learning of point cloud data.

Representation Learning, Retrieval

Point Cloud Instance Segmentation with Semi-supervised Bounding-Box Mining

1 code implementation • 30 Nov 2021 • Yongbin Liao, Hongyuan Zhu, Yanggang Zhang, Chuangguan Ye, Tao Chen, Jiayuan Fan

For stage two, the bounding box proposals with SPCR are grouped into subsets, and the instance masks are mined inside each subset with a novel semantic propagation module and a property consistency graph module.

Instance Segmentation, Semantic Segmentation

Learning Cross-Modal Retrieval With Noisy Labels

1 code implementation • CVPR 2021 • Peng Hu, Xi Peng, Hongyuan Zhu, Liangli Zhen, Jie Lin

Recently, cross-modal retrieval has been emerging with the help of deep multimodal learning.

Cross-Modal Retrieval, Retrieval

A Survey of Embodied AI: From Simulators to Research Tasks

no code implementations • 8 Mar 2021 • Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, Cheston Tan

This paper aims to provide an encyclopedic survey of the field of embodied AI, from its simulators to its research tasks.

Embodied Question Answering, Question Answering

Efficient Robotic Task Generalization Using Deep Model Fusion Reinforcement Learning

no code implementations • 11 Dec 2019 • Tianying Wang, Hao Zhang, Wei Qi Toh, Hongyuan Zhu, Cheston Tan, Yan Wu, Yong Liu, Wei Jing

The proposed method can efficiently generalize previously learned tasks through model fusion to solve the environment adaptation problem.

Reinforcement Learning (RL)

Cross-channel Communication Networks

1 code implementation • NeurIPS 2019 • Jianwei Yang, Zhile Ren, Chuang Gan, Hongyuan Zhu, Devi Parikh

Convolutional neural networks process input data by sending channel-wise feature response maps to subsequent layers.

6D Pose Estimation with Correlation Fusion

no code implementations • 24 Sep 2019 • Yi Cheng, Hongyuan Zhu, Ying Sun, Cihan Acar, Wei Jing, Yan Wu, Liyuan Li, Cheston Tan, Joo-Hwee Lim

To the best of our knowledge, this is the first work to explore effective intra- and inter-modality fusion in 6D pose estimation.

6D Pose Estimation, 6D Pose Estimation using RGB

Dual Adversarial Neural Transfer for Low-Resource Named Entity Recognition

no code implementations • ACL 2019 • Joey Tianyi Zhou, Hao Zhang, Di Jin, Hongyuan Zhu, Meng Fang, Rick Siow Mong Goh, Kenneth Kwok

We propose a new neural transfer method termed Dual Adversarial Transfer Network (DATNet) for addressing low-resource Named Entity Recognition (NER).

Language Modelling, Low Resource Named Entity Recognition

Clustering with Similarity Preserving

no code implementations • 21 May 2019 • Zhao Kang, Honghui Xu, Boyu Wang, Hongyuan Zhu, Zenglin Xu

A key step in graph-based approaches is similarity graph construction.

Clustering, Graph Construction

DATNet: Dual Adversarial Transfer for Low-resource Named Entity Recognition

no code implementations • ICLR 2019 • Joey Tianyi Zhou, Hao Zhang, Di Jin, Hongyuan Zhu, Rick Siow Mong Goh, Kenneth Kwok

We propose a new architecture termed Dual Adversarial Transfer Network (DATNet) for addressing low-resource Named Entity Recognition (NER).

Low Resource Named Entity Recognition, Named Entity Recognition

Scene Text Synthesis for Efficient and Effective Deep Network Training

no code implementations • 26 Jan 2019 • Changgong Zhang, Fangneng Zhan, Hongyuan Zhu, Shijian Lu

Experiments over a number of public datasets demonstrate the effectiveness of our proposed image synthesis technique: using our synthesized images in deep network training achieves scene text detection and scene text recognition performance similar to, or even better than, that obtained with real images.

Image Generation, Scene Text Detection

Spatial Fusion GAN for Image Synthesis

no code implementations • CVPR 2019 • Fangneng Zhan, Hongyuan Zhu, Shijian Lu

Recent advances in generative adversarial networks (GANs) have shown great potential in realistic image synthesis, whereas most existing works address synthesis realism in either appearance space or geometry space, but few address both.

Image Generation

Holistic Multi-modal Memory Network for Movie Question Answering

no code implementations • 12 Nov 2018 • Anran Wang, Anh Tuan Luu, Chuan-Sheng Foo, Hongyuan Zhu, Yi Tay, Vijay Chandrasekhar

In this paper, we present the Holistic Multi-modal Memory Network (HMMN) framework which fully considers the interactions between different input sources (multi-modal context, question) in each hop.

Question Answering, Retrieval

XAI Beyond Classification: Interpretable Neural Clustering

no code implementations • 22 Aug 2018 • Xi Peng, Yunnan Li, Ivor W. Tsang, Hongyuan Zhu, Jiancheng Lv, Joey Tianyi Zhou

The second is implementing discrete $k$-means with a differentiable neural network that embraces the advantages of parallel computing, online clustering, and clustering-favorable representation learning.

Classification, Clustering

TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal

no code implementations • ICCV 2017 • Hongyuan Zhu, Romain Vial, Shijian Lu

Recently, the regression-based object detectors and long-term recurrent convolutional network (LRCN) have demonstrated superior performance in human action detection and recognition.

Action Detection, Regression

YoTube: Searching Action Proposal via Recurrent and Static Regression Networks

no code implementations • 26 Jun 2017 • Hongyuan Zhu, Romain Vial, Shijian Lu, Yonghong Tian, Xian-Bin Cao

In this paper, we present YoTube, a novel network fusion framework for searching action proposals in untrimmed videos, where each action proposal corresponds to a spatio-temporal video tube that potentially locates one human action.

Optical Flow Estimation, Regression

Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text

1 code implementation • 17 Jun 2017 • Zhe Wang, Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Sibo Song, Yuan Fang, Seokhwan Kim, Nancy Chen, Luis Fernando D'Haro, Luu Anh Tuan, Hongyuan Zhu, Zeng Zeng, Ngai Man Cheung, Georgios Piliouras, Jie Lin, Vijay Chandrasekhar

Beyond that, we extend the original competition by including text information in the classification, making this a truly multi-modal approach with vision, audio and text.

Classification, General Classification

Discriminative Multi-Modal Feature Fusion for RGBD Indoor Scene Recognition

no code implementations • CVPR 2016 • Hongyuan Zhu, Jean-Baptiste Weibel, Shijian Lu

RGBD scene recognition has attracted increasing attention due to the rapid development of depth sensors and their wide range of application scenarios.

Image Segmentation, Object Recognition

Diagnosing State-Of-The-Art Object Proposal Methods

no code implementations • 16 Jul 2015 • Hongyuan Zhu, Shijian Lu, Jianfei Cai, Quangqing Lee

Recently, Hosang et al. conducted the first unified study of existing methods in terms of various image-level degradations.

Object, Object Detection

Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation

no code implementations • 3 Feb 2015 • Hongyuan Zhu, Fanman Meng, Jianfei Cai, Shijian Lu

Image segmentation refers to the process of dividing an image into non-overlapping, meaningful regions according to human perception, and has been a classic topic since the early days of computer vision.

Image Segmentation, Segmentation
