no code implementations • 19 May 2024 • Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma, Xuesong Niu, Yixin Chen, Baoxiong Jia, Zhidong Deng, Siyuan Huang, Qing Li
A unified model for 3D vision-language (3D-VL) understanding is expected to take various scene representations and perform a wide range of tasks in a 3D scene.
no code implementations • 5 Apr 2024 • Botao Ren, Botian Xu, Yifan Pu, Jingyi Wang, Zhidong Deng
In many image domains, the spatial distribution of objects in a scene exhibits meaningful patterns governed by their semantic relationships.
no code implementations • 15 Dec 2023 • Yifeng Ma, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yingya Zhang, Zhidong Deng
In this work, we propose the DreamTalk framework to fill this gap; it employs a meticulous design to unlock the potential of diffusion models in generating expressive talking heads.
no code implementations • 28 Nov 2023 • Botao Ren, Botian Xu, Tengyu Liu, Jingyi Wang, Zhidong Deng
Neuroscience studies have shown that the human visual system utilizes high-level feedback information to guide lower-level perception, enabling adaptation to signals of different characteristics.
1 code implementation • ICCV 2023 • Ziyu Zhu, Xiaojian Ma, Yixin Chen, Zhidong Deng, Siyuan Huang, Qing Li
3D vision-language grounding (3D-VL) is an emerging field that aims to connect the 3D physical world with natural language, which is crucial for achieving embodied intelligence.
no code implementations • 4 Aug 2023 • Jingyi Wang, Can Zhang, Jinfa Huang, Botao Ren, Zhidong Deng
(ii) We explore intra-entity and cross-entity interactions among the superpixels to enrich fine-grained interactions between entities at an earlier stage.
no code implementations • 19 May 2023 • IokTong Lei, Zhidong Deng
As a way of communicating with users and any LLMs like GPT or PaLM2, prompting becomes an increasingly important research topic for better utilization of LLMs.
1 code implementation • 15 May 2023 • Jingyi Wang, Jinfa Huang, Can Zhang, Zhidong Deng
In this paper, we propose a Time-variant Relation-aware TRansformer (TR$^2$), which aims to model the temporal change of relations in dynamic scene graphs.
no code implementations • 1 Apr 2023 • Yifeng Ma, Suzhen Wang, Yu Ding, Bowen Ma, Tangjie Lv, Changjie Fan, Zhipeng Hu, Zhidong Deng, Xin Yu
In this work, we propose an expression-controllable one-shot talking head method, dubbed TalkCLIP, where the expression in a speech is specified by natural language.
2D Semantic Segmentation task 3 (25 classes) • Talking Head Generation
1 code implementation • 3 Jan 2023 • Yifeng Ma, Suzhen Wang, Zhipeng Hu, Changjie Fan, Tangjie Lv, Yu Ding, Zhidong Deng, Xin Yu
In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio.
1 code implementation • 8 Mar 2022 • Jiajun Fei, Ziyu Zhu, Wenlei Liu, Zhidong Deng, Mingyang Li, Huanjun Deng, Shuo Zhang
We strictly prove that any permutation-invariant function implemented by DuMLP-Pin can be decomposed into two or more permutation-equivariant ones in a dot-product way when the cardinality of the given input set is greater than a threshold.
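The dot-product idea in the sentence above can be illustrated with a minimal sketch. This is not the DuMLP-Pin architecture itself: the helpers `row_wise`, `dot_product_pool`, and `pooled`, the integer weights, and the toy point set are all hypothetical and chosen only for illustration. It shows one plain fact: if two maps each apply the same transformation to every set element (and are therefore permutation-equivariant), then the product of the transposed first output with the second output does not depend on the order of the elements.

```python
def row_wise(weights):
    """Build a permutation-equivariant map: the same linear layer is applied to every element."""
    def apply(points):
        return [[sum(w * x for w, x in zip(row, p)) for row in weights] for p in points]
    return apply


def dot_product_pool(a, b):
    """Return A^T B for an N x d1 matrix A and an N x d2 matrix B, both given as lists of rows."""
    d1, d2 = len(a[0]), len(b[0])
    return [
        [sum(a[n][i] * b[n][j] for n in range(len(a))) for j in range(d2)]
        for i in range(d1)
    ]


# Two hypothetical equivariant branches with arbitrary integer weights.
g1 = row_wise([[1, 2], [0, -1], [3, 1]])  # maps each 2-d element to 3-d
g2 = row_wise([[2, 0], [1, 1]])           # maps each 2-d element to 2-d


def pooled(points):
    """Permutation-invariant summary of a set: g1(X)^T g2(X), a 3 x 2 matrix."""
    return dot_product_pool(g1(points), g2(points))


points = [[1, 0], [2, 3], [-1, 4], [0, 5]]
reordered = [points[2], points[0], points[3], points[1]]

# The two orderings produce the same pooled matrix.
print(pooled(points) == pooled(reordered))
```

Integer weights are used so that the comparison is exact; with floating-point weights the two results could differ by rounding error and would need a tolerance.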
no code implementations • 22 Feb 2021 • Ruiwen Zhang, Zhidong Deng, Hongsen Lin, Hongchao Lu
In a complex road traffic scene, illegal lane intrusion by pedestrians or cyclists constitutes one of the main safety challenges in autonomous driving applications.
1 code implementation • 19 Feb 2021 • Jingyi Wang, Zhidong Deng
Graph convolutional neural networks provide good solutions for node classification and other tasks with non-Euclidean data.
1 code implementation • 12 Dec 2020 • Matthieu Lin, Chuming Li, Xingyuan Bu, Ming Sun, Chen Lin, Junjie Yan, Wanli Ouyang, Zhidong Deng
Furthermore, the bipartite match of ED harms the training efficiency due to the large ground truth number in crowd scenes.
no code implementations • ICCV 2019 • Shiyao Wang, Hongchao Lu, Zhidong Deng
To the best of our knowledge, MMNet is the first work that investigates a deep convolutional detector on compressed videos.
no code implementations • 20 Sep 2018 • Xiaolong Liu, Zhidong Deng, Yuhan Yang
In this paper, we divide semantic image segmentation methods into two categories: traditional methods and recent DNN methods.
no code implementations • ECCV 2018 • Shiyao Wang, Yucong Zhou, Junjie Yan, Zhidong Deng
Video object detection is challenging in the presence of appearance deterioration in certain video frames.
no code implementations • ECCV 2018 • Guorun Yang, Hengshuang Zhao, Jianping Shi, Zhidong Deng, Jiaya Jia
Disparity estimation for binocular stereo images finds a wide range of applications.
Ranked #6 on Semantic Segmentation on KITTI Semantic Segmentation