Search Results for author: Hehe Fan

Found 31 papers, 13 papers with code

Clustering for Protein Representation Learning

no code implementations • 30 Mar 2024 • Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang

We select the highest-scoring clusters and use their medoid nodes for the next iteration of clustering, until we obtain a hierarchical and informative representation of the protein.

Clustering Protein Folding +1
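
The snippet above mentions reusing the medoid nodes of the highest-scoring clusters. As a rough illustration of what a medoid is (not the paper's implementation), the sketch below assumes Euclidean distance over plain coordinate tuples; the function name `medoid` and the sample points are hypothetical.

```python
import math

def medoid(points):
    """Return the member of `points` with the smallest summed distance to all members."""
    def total_distance(p):
        # Sum of Euclidean distances from p to every point in the cluster.
        return sum(math.dist(p, q) for q in points)
    return min(points, key=total_distance)

# Hypothetical cluster: three nearby points and one outlier.
cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
center = medoid(cluster)
```

Unlike a centroid, a medoid is always one of the original points, which is why it can serve directly as a node for a further round of clustering.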

EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing

no code implementations • 24 Mar 2024 • Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang

We find that the crux of the issue stems from the imprecise distribution of attention weights across designated regions, including inaccurate text-to-attribute control and attention leakage.

Attribute Video Editing

ProtChatGPT: Towards Understanding Proteins with Large Language Models

no code implementations • 15 Feb 2024 • Chao Wang, Hehe Fan, Ruijie Quan, Yi Yang

The protein first passes through protein encoders and a PLP-former to produce protein embeddings, which are then projected by the adapter to conform with the LLM.

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

no code implementations • 9 Feb 2024 • Zhenglin Zhou, Fan Ma, Hehe Fan, Yi Yang

Specifically, we incorporate the FLAME into both 3D representation and score distillation: 1) FLAME-based 3D Gaussian splatting, driving 3D Gaussian points by rigging each point to a FLAME mesh.

Hand-Centric Motion Refinement for 3D Hand-Object Interaction via Hierarchical Spatial-Temporal Modeling

1 code implementation • 29 Jan 2024 • Yuze Hao, Jianrong Zhang, Tao Zhuo, Fuan Wen, Hehe Fan

To address this problem, we propose a data-driven method for coarse motion refinement.

Object

DocMSU: A Comprehensive Benchmark for Document-level Multimodal Sarcasm Understanding

1 code implementation • 26 Dec 2023 • Hang Du, Guoshun Nan, Sicheng Zhang, Binzhu Xie, Junrui Xu, Hehe Fan, Qimei Cui, Xiaofeng Tao, Xudong Jiang

Multimodal Sarcasm Understanding (MSU) has a wide range of applications in the news field such as public opinion analysis and forgery detection.

Object Detection Sarcasm Detection +1

Building Category Graphs Representation with Spatial and Temporal Attention for Visual Navigation

no code implementations • 6 Dec 2023 • Xiaobo Hu, Youfang Lin, Hehe Fan, Shuo Wang, Zhihao Wu, Kai Lv

To this end, an agent needs to 1) learn knowledge about the relations between object categories in the world during training and 2) look for the target object based on the pre-learned object category relations and its moving trajectory in the current unseen environment.

Object Visual Navigation

A Reliable Representation with Bidirectional Transition Model for Visual Reinforcement Learning Generalization

no code implementations • 4 Dec 2023 • Xiaobo Hu, Youfang Lin, Yue Liu, Jinwen Wang, Shuo Wang, Hehe Fan, Kai Lv

Visual reinforcement learning has proven effective in solving control tasks with high-dimensional observations.

FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax

no code implementations • 27 Nov 2023 • Yu Lu, Linchao Zhu, Hehe Fan, Yi Yang

Text-to-video (T2V) generation is a rapidly growing research area that aims to translate the scenes, objects, and actions within complex video text into a sequence of coherent visual frames.

Video Generation

Prior-Free Continual Learning with Unlabeled Data in the Wild

1 code implementation • 16 Oct 2023 • Tao Zhuo, Zhiyong Cheng, Hehe Fan, Mohan Kankanhalli

Existing CL methods usually reduce forgetting with task priors, i.e., using task identity or a subset of previously seen samples for model training.

Continual Learning Image Classification

Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos

no code implementations • ICCV 2023 • Xiaoxiao Sheng, Zhiqiang Shen, Gang Xiao, Longguang Wang, Yulan Guo, Hehe Fan

Instead of contrasting the representations of clips or frames, in this paper, we propose a unified self-supervised framework by conducting contrastive learning at the point level.

Contrastive Learning Representation Learning +1

Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos

1 code implementation • ICCV 2023 • Zhiqiang Shen, Xiaoxiao Sheng, Hehe Fan, Longguang Wang, Yulan Guo, Qiong Liu, Hao Wen, Xi Zhou

In this paper, we propose a Masked Spatio-Temporal Structure Prediction (MaST-Pre) method to capture the structure of point cloud videos without human annotations.

point cloud video understanding Self-Supervised Learning +1

DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation

no code implementations • 31 Jul 2023 • Yue Zhang, Hehe Fan, Yi Yang, Mohan Kankanhalli

The proposed method, named Mixture of Depth and Point cloud video experts (DPMix), achieved first place in the 4D Action Segmentation Track of the HOI4D Challenge 2023.

Action Segmentation Human-Object Interaction Detection +2

Keyword-Aware Relative Spatio-Temporal Graph Networks for Video Question Answering

no code implementations • 25 Jul 2023 • Yi Cheng, Hehe Fan, Dongyun Lin, Ying Sun, Mohan Kankanhalli, Joo-Hwee Lim

The main challenge in video question answering (VideoQA) is to capture and understand the complex spatial and temporal relations between objects based on given questions.

graph construction Question Answering +2

A Study on Differentiable Logic and LLMs for EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2023

no code implementations • 13 Jul 2023 • Yi Cheng, Ziwei Xu, Fen Fang, Dongyun Lin, Hehe Fan, Yongkang Wong, Ying Sun, Mohan Kankanhalli

Our research focuses on the innovative application of a differentiable logic loss in the training to leverage the co-occurrence relations between verb and noun, as well as the pre-trained Large Language Models (LLMs) to generate the logic rules for the adaptation to unseen action labels.

Action Recognition Unsupervised Domain Adaptation

Continual Learning with Strong Experience Replay

1 code implementation • 23 May 2023 • Tao Zhuo, Zhiyong Cheng, Zan Gao, Hehe Fan, Mohan Kankanhalli

Experience Replay (ER) is a simple and effective rehearsal-based strategy, which optimizes the model with current training data and a subset of old samples stored in a memory buffer.

Continual Learning Image Classification
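
The Experience Replay description above (train on current data plus a subset of old samples held in a memory buffer) can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the names `ReplayBuffer` and `replay_batch` are hypothetical, and filling the buffer by reservoir sampling is an assumption chosen here for simplicity.

```python
import random

class ReplayBuffer:
    """Fixed-size memory of old samples, filled by reservoir sampling (an assumption)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.samples = []
        self.seen = 0  # total number of samples offered to the buffer so far
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            # Replace a stored sample with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = sample

    def sample(self, k):
        # Draw up to k distinct stored samples.
        return self.rng.sample(self.samples, min(k, len(self.samples)))

def replay_batch(current_batch, buffer, replay_size):
    """Mix the current batch with replayed old samples, then store the new ones."""
    batch = list(current_batch) + buffer.sample(replay_size)
    for item in current_batch:
        buffer.add(item)
    return batch

buffer = ReplayBuffer(capacity=3)
first = replay_batch([1, 2], buffer, replay_size=2)   # buffer is empty, nothing to replay
second = replay_batch([3, 4], buffer, replay_size=2)  # replays the two stored samples
```

In a real training loop the mixed batch would be passed to the optimizer step; here the samples are plain integers so the buffer behaviour is easy to inspect.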

Text to Point Cloud Localization with Relation-Enhanced Transformer

no code implementations • 13 Jan 2023 • Guangzhi Wang, Hehe Fan, Mohan Kankanhalli

To overcome these two challenges, we propose a unified Relation-Enhanced Transformer (RET) to improve representation discriminability for both point cloud and natural language queries.

Natural Language Queries Relation

STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition

no code implementations • ICCV 2023 • Ming Li, Xiangyu Xu, Hehe Fan, Pan Zhou, Jun Liu, Jia-Wei Liu, Jiahe Li, Jussi Keppo, Mike Zheng Shou, Shuicheng Yan

For the first time, we introduce vision Transformers into PPAR by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification and anonymization, to remove privacy from a spatio-temporal perspective.

Action Recognition Facial Expression Recognition (FER) +2

PointListNet: Deep Learning on 3D Point Lists

no code implementations • CVPR 2023 • Hehe Fan, Linchao Zhu, Yi Yang, Mohan Kankanhalli

Deep neural networks on regular 1D lists (e.g., natural languages) and irregular 3D sets (e.g., point clouds) have made tremendous achievements.

Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?

2 code implementations • 15 Sep 2022 • Yi Wang, Zhiwen Fan, Tianlong Chen, Hehe Fan, Zhangyang Wang

Vision Transformers (ViTs) have proven effective both in solving 2D image understanding tasks by training over large-scale image datasets and, as a somewhat separate track, in modeling the 3D visual world in forms such as voxels or point clouds.

Point Cloud Segmentation

SEFormer: Structure Embedding Transformer for 3D Object Detection

no code implementations • 5 Sep 2022 • Xiaoyu Feng, Heming Du, Yueqi Duan, Yongpan Liu, Hehe Fan

Effectively preserving and encoding structure features from objects in irregular and sparse LiDAR points is a key challenge to 3D object detection on point cloud.

3D Object Detection Autonomous Driving +2

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

1 code implementation • ICLR 2021 • Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli

Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension.

Ranked #3 on 3D Action Recognition on NTU RGB+D

3D Action Recognition Semantic Segmentation

Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation With Reliable Voted Pseudo Labels

no code implementations • CVPR 2022 • Hehe Fan, Xiaojun Chang, Wanyue Zhang, Yi Cheng, Ying Sun, Mohan Kankanhalli

In this paper, we propose an unsupervised domain adaptation method for deep point cloud representation learning.

Representation Learning Unsupervised Domain Adaptation

Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

1 code implementation • CVPR 2021 • Hehe Fan, Yi Yang, Mohan Kankanhalli

To capture the dynamics in point cloud videos, point tracking is usually employed.

Ranked #4 on 3D Action Recognition on NTU RGB+D

3D Action Recognition Point Tracking +1

PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing

2 code implementations • 18 Oct 2019 • Hehe Fan, Yi Yang

We apply PointRNN, PointGRU and PointLSTM to moving point cloud prediction, which aims to predict the future trajectories of points in a set given their past movements.

Moving Point Cloud Processing

Cascaded Revision Network for Novel Object Captioning

1 code implementation • 6 Aug 2019 • Qianyu Feng, Yu Wu, Hehe Fan, Chenggang Yan, Yi Yang

By this novel cascaded captioning-revising mechanism, CRN can accurately describe images with unseen objects.

Image Captioning Object +3

Attract or Distract: Exploit the Margin of Open Set

1 code implementation • ICCV 2019 • Qianyu Feng, Guoliang Kang, Hehe Fan, Yi Yang

In this paper, we exploit the semantic structure of open set data from two aspects: 1) Semantic Categorical Alignment, which aims to achieve good separability of target known classes by categorically aligning the centroid of target with the source.

Domain Adaptation

Adaptive Exploration for Unsupervised Person Re-Identification

1 code implementation • 9 Jul 2019 • Yuhang Ding, Hehe Fan, Mingliang Xu, Yi Yang

However, a problem of the adaptive selection is that, when an image has too many neighborhoods, it is more likely to attract other images as its neighborhoods.

Unsupervised Person Re-Identification

Cubic LSTMs for Video Prediction

no code implementations • 20 Apr 2019 • Hehe Fan, Linchao Zhu, Yi Yang

Predicting future frames in videos has become a promising direction of research for both computer vision and robot learning communities.

motion prediction Video Prediction

Complex Event Detection by Identifying Reliable Shots From Untrimmed Videos

no code implementations • ICCV 2017 • Hehe Fan, Xiaojun Chang, De Cheng, Yi Yang, Dong Xu, Alexander G. Hauptmann

relevant) to the given event class, we formulate this task as a multi-instance learning (MIL) problem by taking each video as a bag and the video shots in each video as instances.

Event Detection
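
The multi-instance learning (MIL) framing above treats each video as a bag and its shots as instances. A minimal sketch of that idea follows, assuming the common max-pooling rule (a bag is positive if its best-scoring instance is); this rule, the function names, and the 0.5 threshold are illustrative assumptions and may differ from the paper's formulation.

```python
def bag_score(instance_scores):
    """Score a bag (video) by its highest-scoring instance (shot)."""
    return max(instance_scores)

def classify_video(shot_scores, threshold=0.5):
    """Label a video as containing the event if any shot scores at or above the threshold."""
    return bag_score(shot_scores) >= threshold

# Hypothetical per-shot relevance scores for two videos.
has_event = classify_video([0.1, 0.9, 0.2])
no_event = classify_video([0.1, 0.2])
```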

Unsupervised Person Re-identification: Clustering and Fine-tuning

1 code implementation • 30 May 2017 • Hehe Fan, Liang Zheng, Yi Yang

Progressively, pedestrian clustering and the CNN model are improved simultaneously until algorithm convergence.

Ranked #12 on Unsupervised Person Re-Identification on DukeMTMC-reID

Clustering Unsupervised Person Re-Identification
