Search Results for author: Wenhao Chai

Found 26 papers, 10 papers with code

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

1 code implementation • 26 Apr 2024 • Enxin Song, Wenhao Chai, Tian Ye, Jenq-Neng Hwang, Xi Li, Gaoang Wang

Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks.

Ranked #2 on Question Answering on NExT-QA (Open-ended VideoQA)

2k • Question Answering • +2

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

no code implementations • 7 Apr 2024 • Hou-I Liu, Christine Wu, Jen-Hao Cheng, Wenhao Chai, Shian-Yun Wang, Gaowen Liu, Jenq-Neng Hwang, Hong-Han Shuai, Wen-Huang Cheng

Subsequently, we introduce the cross-modal residual distillation to transfer the 3D spatial cues.

Autonomous Driving • Knowledge Distillation • +3

Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model

no code implementations • 6 Apr 2024 • Zhonghan Zhao, Ke Ma, Wenhao Chai, Xuan Wang, Kewei Chen, Dongxu Guo, Yanting Zhang, Hongwei Wang, Gaoang Wang

After distillation, embodied agents can complete complex, open-ended tasks without additional expert guidance, utilizing the performance and knowledge of a versatile MLM.

Knowledge Distillation

VersaT2I: Improving Text-to-Image Models with Versatile Reward

no code implementations • 27 Mar 2024 • Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang

Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance.

Exploring Learning-based Motion Models in Multi-Object Tracking

no code implementations • 16 Mar 2024 • Hsiang-Wei Huang, Cheng-Yen Yang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang

In the field of multi-object tracking (MOT), traditional methods often rely on the Kalman Filter for motion prediction, leveraging its strengths in linear motion scenarios.

motion prediction • Multi-Object Tracking

Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation

no code implementations • 13 Mar 2024 • Zhonghan Zhao, Kewei Chen, Dongxu Guo, Wenhao Chai, Tian Ye, Yanting Zhang, Gaoang Wang

To assess organizational behavior, we design a series of navigation tasks in the Minecraft environment, which includes searching and exploring.

Navigate

User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning

no code implementations • 8 Dec 2023 • Xuan Wang, Guanhong Wang, Wenhao Chai, Jiayu Zhou, Gaoang Wang

Moreover, we employ GPT-2 as the frozen large language model.

Image Captioning • Language Modelling • +1

CityGen: Infinite and Controllable 3D City Layout Generation

no code implementations • 3 Dec 2023 • Jie Deng, Wenhao Chai, Jianshu Guo, Qixuan Huang, Wenhao Hu, Jenq-Neng Hwang, Gaoang Wang

In this paper, we propose CityGen, a novel end-to-end framework for infinite, diverse and controllable 3D city layout generation. First, we propose an outpainting pipeline to extend the local layout to an infinite city layout.

See and Think: Embodied Agent in Virtual Environment

no code implementations • 26 Nov 2023 • Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang

Vision perception involves the interpretation of visual information in the environment, which is then integrated into the LLMs component with agent state and task instruction.

Question Answering • Retrieval

UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning

no code implementations • 24 Nov 2023 • Zhongyu Jiang, Wenhao Chai, Lei LI, Zhuoran Zhou, Cheng-Yen Yang, Jenq-Neng Hwang

In this paper, we propose UniHPE, a unified Human Pose Estimation pipeline, which aligns features from all three modalities, i.e., 2D human pose estimation, lifting-based and image-based 3D human pose estimation, in the same pipeline.

2D Human Pose Estimation • 3D Human Pose Estimation • +3

Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation

1 code implementation • 17 Nov 2023 • Zhuoran Zhou, Zhongyu Jiang, Wenhao Chai, Cheng-Yen Yang, Lei LI, Jenq-Neng Hwang

We further apply a guided diffusion model to domain adapt 3D adult pose to infant pose to supplement small datasets.

3D Human Pose Estimation • Data Augmentation • +1

Devil in the Number: Towards Robust Multi-modality Data Filter

no code implementations • 24 Sep 2023 • Yichen Xu, Zihan Xu, Wenhao Chai, Zhonghan Zhao, Enxin Song, Gaoang Wang

In order to appropriately filter multi-modality data sets on a web-scale, it becomes crucial to employ suitable filtering methods to boost performance and reduce training costs.

Chasing Consistency in Text-to-3D Generation from a Single Image

no code implementations • 7 Sep 2023 • Yichen Ouyang, Wenhao Chai, Jiayi Ye, Dapeng Tao, Yibing Zhan, Gaoang Wang

In light of the above issues, we present Consist3D, a three-stage framework Chasing for semantic-, geometric-, and saturation-Consistent Text-to-3D generation from a single image, in which the first two stages aim to learn parameterized consistency tokens, and the last stage is for optimization.

3D Generation • Text to 3D

UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning

no code implementations • 19 Aug 2023 • Meiqi Sun, Zhonghan Zhao, Wenhao Chai, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang

Our proposed model takes support images and labels as prompt guidance for a query image.

Few-Shot Learning • Pose Estimation

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

1 code implementation • 18 Aug 2023 • Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie

Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain.

3D Human Pose Estimation • Domain Adaptation

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

1 code implementation • ICCV 2023 • Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu

In this paper, we tackle this problem by introducing temporal dependency to existing text-driven diffusion models, which allows them to generate consistent appearance for the edited objects.

Video Editing

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

1 code implementation • 31 Jul 2023 • Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang

Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks.

Ranked #1 on zero-shot long video global-mode question answering on MovieChat-1K

Video-based Generative Performance Benchmarking (Consistency) • Video-based Generative Performance Benchmarking (Contextual Understanding) • +10

Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation

1 code implementation • 7 Jul 2023 • Zhongyu Jiang, Zhuoran Zhou, Lei LI, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang

Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods.

Ranked #10 on 3D Human Pose Estimation on 3DPW (PA-MPJPE metric)

3D Human Pose Estimation • Image to 3D

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

no code implementations • 7 Jul 2023 • Zhonghan Zhao, Wenhao Chai, Shengyu Hao, Wenhao Hu, Guanhong Wang, Shidong Cao, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

Deep learning has the potential to revolutionize sports performance, with applications ranging from perception and comprehension to decision.

MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling

no code implementations • 29 Jun 2023 • Zhenyu Zhang, Wenhao Chai, Zhongyu Jiang, Tian Ye, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

In this paper, we propose MPM, a unified 2D-3D human pose representation framework via masked pose modeling.

3D Human Pose Estimation • 3D Pose Estimation

Five A$^{+}$ Network: You Only Need 9K Parameters for Underwater Image Enhancement

1 code implementation • 15 May 2023 • Jingxia Jiang, Tian Ye, Jinbin Bai, Sixiang Chen, Wenhao Chai, Shi Jun, Yun Liu, ErKang Chen

In this work, we propose the Five A$^{+}$ Network (FA$^{+}$Net), a highly efficient and lightweight real-time underwater image enhancement network with only $\sim$9k parameters and $\sim$0.01s processing time.

Computational Efficiency • Image Enhancement

Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

1 code implementation • ICCV 2023 • Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang

We observe that the degradation is caused by two factors: 1) the large distribution gap over global positions of poses between the source and target datasets due to variant camera parameters and settings, and 2) the deficient diversity of local structures of poses in training.

Ranked #1 on 3D Human Pose Estimation in Limited Data on Human3.6M

3D Human Pose Estimation • 3D Human Pose Estimation in Limited Data • +3

Blind Inpainting with Object-aware Discrimination for Artificial Marker Removal

no code implementations • 27 Mar 2023 • Xuechen Guo, Wenhao Hu, Chiming Ni, Wenhao Chai, Shiyan Li, Gaoang Wang

The reconstruction network consists of two branches that predict the corrupted regions with artificial markers and simultaneously recover the missing visual contents.

Object

Deep Learning Methods for Small Molecule Drug Discovery: A Survey

no code implementations • 1 Mar 2023 • Wenhao Hu, Yingying Liu, Xuanyu Chen, Wenhao Chai, Hangyue Chen, Hongwei Wang, Gaoang Wang

With the development of computer-assisted techniques, research communities including biochemistry and deep learning have been devoted to the drug discovery field for over a decade.

Drug Discovery • Molecular Property Prediction • +2

DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models

1 code implementation • 14 Feb 2023 • Shidong Cao, Wenhao Chai, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang

We focus on a new fashion design task, where we aim to transfer a reference appearance image onto a clothing image while preserving the structure of the clothing image.

Denoising • Style Transfer

Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model

1 code implementation • 23 Sep 2022 • Zhenting Qi, Ruike Zhu, Zheyu Fu, Wenhao Chai, Volodymyr Kindratenko

In this paper, we propose a simple but effective method that solves the task from a new perspective: we design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator.

Action Recognition • Anomaly Detection
