Search Results for author: Zhidong Deng

Found 18 papers, 6 papers with code

Unifying 3D Vision-Language Understanding via Promptable Queries

no code implementations • 19 May 2024 • Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma, Xuesong Niu, Yixin Chen, Baoxiong Jia, Zhidong Deng, Siyuan Huang, Qing Li

A unified model for 3D vision-language (3D-VL) understanding is expected to take various scene representations and perform a wide range of tasks in a 3D scene.

Decoder Information Retrieval +2

Improving Detection in Aerial Images by Capturing Inter-Object Relationships

no code implementations • 5 Apr 2024 • Botao Ren, Botian Xu, Yifan Pu, Jingyi Wang, Zhidong Deng

In many image domains, the spatial distribution of objects in a scene exhibits meaningful patterns governed by their semantic relationships.

DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

no code implementations • 15 Dec 2023 • Yifeng Ma, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yingya Zhang, Zhidong Deng

In this work, we propose a DreamTalk framework to fill this gap, which employs meticulous design to unlock the potential of diffusion models in generating expressive talking heads.

Denoising Talking Head Generation

Feedback RoI Features Improve Aerial Object Detection

no code implementations • 28 Nov 2023 • Botao Ren, Botian Xu, Tengyu Liu, Jingyi Wang, Zhidong Deng

Neuroscience studies have shown that the human visual system utilizes high-level feedback information to guide lower-level perception, enabling adaptation to signals of different characteristics.

feature selection Object +2

3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment

1 code implementation • ICCV 2023 • Ziyu Zhu, Xiaojian Ma, Yixin Chen, Zhidong Deng, Siyuan Huang, Qing Li

3D vision-language grounding (3D-VL) is an emerging field that aims to connect the 3D physical world with natural language, which is crucial for achieving embodied intelligence.

Dense Captioning Question Answering +3

Improving Scene Graph Generation with Superpixel-Based Interaction Learning

no code implementations • 4 Aug 2023 • Jingyi Wang, Can Zhang, Jinfa Huang, Botao Ren, Zhidong Deng

(ii) We explore intra-entity and cross-entity interactions among the superpixels to enrich fine-grained interactions between entities at an earlier stage.

Graph Generation Scene Graph Generation +1

Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs

no code implementations • 19 May 2023 • IokTong Lei, Zhidong Deng

As a means of communication between users and LLMs such as GPT or PaLM2, prompting has become an increasingly important research topic for better utilization of LLMs.

Arithmetic Reasoning GSM8K +4

Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs

1 code implementation • 15 May 2023 • Jingyi Wang, Jinfa Huang, Can Zhang, Zhidong Deng

In this paper, we propose a Time-variant Relation-aware TRansformer (TR$^2$), which aims to model the temporal change of relations in dynamic scene graphs.

Relation Scene Graph Generation +1

TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles

no code implementations • 1 Apr 2023 • Yifeng Ma, Suzhen Wang, Yu Ding, Bowen Ma, Tangjie Lv, Changjie Fan, Zhipeng Hu, Zhidong Deng, Xin Yu

In this work, we propose an expression-controllable one-shot talking head method, dubbed TalkCLIP, where the expression in a speech is specified by natural language.

2D Semantic Segmentation task 3 (25 classes) Talking Head Generation

StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

1 code implementation • 3 Jan 2023 • Yifeng Ma, Suzhen Wang, Zhipeng Hu, Changjie Fan, Tangjie Lv, Yu Ding, Zhidong Deng, Xin Yu

In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio.

Decoder Talking Face Generation +1

DuMLP-Pin: A Dual-MLP-dot-product Permutation-invariant Network for Set Feature Extraction

1 code implementation • 8 Mar 2022 • Jiajun Fei, Ziyu Zhu, Wenlei Liu, Zhidong Deng, Mingyang Li, Huanjun Deng, Shuo Zhang

We strictly prove that any permutation-invariant function implemented by DuMLP-Pin can be decomposed into two or more permutation-equivariant ones in a dot-product way when the cardinality of the given input set is greater than a threshold.

Attribute Point Cloud Classification

Phase Space Reconstruction Network for Lane Intrusion Action Recognition

no code implementations • 22 Feb 2021 • Ruiwen Zhang, Zhidong Deng, Hongsen Lin, Hongchao Lu

In a complex road traffic scene, illegal lane intrusion by pedestrians or cyclists constitutes one of the main safety challenges in autonomous driving applications.

Action Recognition Autonomous Driving +5

A Deep Graph Wavelet Convolutional Neural Network for Semi-supervised Node Classification

1 code implementation • 19 Feb 2021 • Jingyi Wang, Zhidong Deng

Graph convolutional neural networks provide good solutions for node classification and other tasks with non-Euclidean data.

General Classification Node Classification

DETR for Crowd Pedestrian Detection

1 code implementation • 12 Dec 2020 • Matthieu Lin, Chuming Li, Xingyuan Bu, Ming Sun, Chen Lin, Junjie Yan, Wanli Ouyang, Zhidong Deng

Furthermore, the bipartite match of ED harms the training efficiency due to the large ground truth number in crowd scenes.

Decoder Pedestrian Detection

Fast Object Detection in Compressed Video

no code implementations • ICCV 2019 • Shiyao Wang, Hongchao Lu, Zhidong Deng

To the best of our knowledge, MMNet is the first work that investigates a deep convolutional detector on compressed videos.

Object object-detection +2

Recent progress in semantic image segmentation

no code implementations • 20 Sep 2018 • Xiaolong Liu, Zhidong Deng, Yuhan Yang

In this paper, we divide semantic image segmentation methods into two categories: traditional methods and recent DNN-based methods.

Image Segmentation Segmentation +1

Fully Motion-Aware Network for Video Object Detection

no code implementations • ECCV 2018 • Shiyao Wang, Yucong Zhou, Junjie Yan, Zhidong Deng

Video object detection is challenging in the presence of appearance deterioration in certain video frames.

Object object-detection +1

SegStereo: Exploiting Semantic Information for Disparity Estimation

no code implementations • ECCV 2018 • Guorun Yang, Hengshuang Zhao, Jianping Shi, Zhidong Deng, Jiaya Jia

Disparity estimation for binocular stereo images finds a wide range of applications.

Ranked #6 on Semantic Segmentation on KITTI Semantic Segmentation

Disparity Estimation Semantic Segmentation
