no code implementations • 18 Mar 2024 • Jiaxiang Tang, Ruijie Lu, Xiaokang Chen, Xiang Wen, Gang Zeng, Ziwei Liu
Text-to-texture synthesis has become a new frontier in 3D content creation thanks to the recent advances in text-to-image models.
1 code implementation • 7 Feb 2024 • Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu
2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models.
1 code implementation • 25 May 2023 • Yan Liu, Yan Gao, Zhe Su, Xiaokang Chen, Elliott Ash, Jian-Guang Lou
In this work, we aim to uncover and categorize social biases in Text-to-SQL models.
no code implementations • 25 May 2023 • Xiaokang Chen, Jiaxiang Tang, Diwen Wan, Jingbo Wang, Gang Zeng
We propose to imitate the backbone feature of off-the-shelf perception models to achieve zero-shot semantic segmentation with NeRF.
2 code implementations • NeurIPS 2023 • Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai
We hope this model can set a new baseline for generalist vision and language models.
no code implementations • 20 Mar 2023 • Xiaokang Chen, Yajie Xing, Gang Zeng
In this paper, we propose a real-time semantic scene completion method with a feature aggregation strategy and conditioned prediction module.
1 code implementation • ICCV 2023 • Jiaxiang Tang, Hang Zhou, Xiaokang Chen, Tianshu Hu, Errui Ding, Jingdong Wang, Gang Zeng
Neural Radiance Fields (NeRF) have constituted a remarkable breakthrough in image-based 3D reconstruction.
no code implementations • 21 Feb 2023 • Yan Liu, Xiaokang Chen, Qi Dai
However, current works pursuing sentence-level explanations rely heavily on annotated training data, which limits the development of interpretability to only a few tasks.
1 code implementation • 27 Jan 2023 • Jie Zhu, Jiyang Qi, Mingyu Ding, Xiaokang Chen, Ping Luo, Xinggang Wang, Wenyu Liu, Leye Wang, Jingdong Wang
The study is mainly motivated by the observation that random views, used in contrastive learning, and random masked (visible) patches, used in masked image modeling, are often about object parts.
1 code implementation • 22 Nov 2022 • Jiaxiang Tang, Kaisiyuan Wang, Hang Zhou, Xiaokang Chen, Dongliang He, Tianshu Hu, Jingtuo Liu, Gang Zeng, Jingdong Wang
While dynamic Neural Radiance Fields (NeRF) have shown success in high-fidelity 3D modeling of talking portraits, the slow training and inference speed severely obstruct their potential usage.
no code implementations • 17 Nov 2022 • Xinyu Zhang, Jiahui Chen, Junkun Yuan, Qiang Chen, Jian Wang, Xiaodi Wang, Shumin Han, Xiaokang Chen, Jimin Pi, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang
That is to say, the smaller the model, the lower the mask ratio needs to be.
no code implementations • 17 Nov 2022 • Xiaokang Chen, Jiahui Chen, Yan Liu, Gang Zeng
Specifically, Adaptive Matching applies bipartite matching to adaptively match the outputs of the teacher and the student in each decoder layer. Fixed Matching instead fixes the correspondence between the outputs of the teacher and the student through the same object queries, with the teacher's fixed object queries fed to the decoder of the student as an auxiliary group.
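The two matching schemes above can be contrasted with a minimal sketch. This is not the paper's implementation: it uses brute-force permutation search over toy 2-D prediction vectors in place of the Hungarian algorithm on real DETR outputs, and the `teacher`/`student` arrays are made-up examples.

```python
from itertools import permutations

def l2(a, b):
    """Squared L2 distance between two prediction vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def adaptive_matching(teacher, student):
    """Adaptive matching: bipartite-match student outputs to teacher outputs
    by picking the permutation with the lowest total cost (brute force)."""
    n = len(teacher)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(l2(teacher[i], student[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost

def fixed_matching(teacher, student):
    """Fixed matching: teacher output i is always paired with student
    output i, as if both decoders shared the same object queries."""
    cost = sum(l2(t, s) for t, s in zip(teacher, student))
    return tuple(range(len(teacher))), cost

teacher = [(0.0, 0.0), (1.0, 1.0)]
student = [(1.0, 1.0), (0.1, 0.0)]
print(adaptive_matching(teacher, student))  # pairs teacher 0 with student 1
print(fixed_matching(teacher, student))     # keeps the index-aligned pairing
```

Adaptive matching finds the low-cost pairing even when the student emits its predictions in a different query order, which is exactly the situation fixed matching cannot handle on its own.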
no code implementations • arXiv 2022 • Qiang Chen, Jian Wang, Chuchu Han, Shan Zhang, Zexian Li, Xiaokang Chen, Jiahui Chen, Xiaodi Wang, Shumin Han, Gang Zhang, Haocheng Feng, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang
The training process consists of self-supervised pretraining and finetuning a ViT-Huge encoder on ImageNet-1K, pretraining the detector on Object365, and finally finetuning it on COCO.
Ranked #8 on Object Detection on COCO test-dev
2 code implementations • ICCV 2023 • Qiang Chen, Xiaokang Chen, Jian Wang, Shan Zhang, Kun Yao, Haocheng Feng, Junyu Han, Errui Ding, Gang Zeng, Jingdong Wang
Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth object to one prediction, for end-to-end detection without NMS post-processing.
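The one-to-one assignment that DETR relies on can be sketched with a toy cost matrix. This is only an illustration, not DETR's matcher: real implementations use the Hungarian algorithm over class and box costs, while here a brute-force search assigns each ground-truth object to exactly one prediction and leaves the rest as "no object"; the cost values are invented.

```python
from itertools import permutations

def one_to_one_assign(cost):
    """cost[i][j] is the cost of assigning ground truth i to prediction j.
    Returns one prediction index per ground truth, minimizing total cost;
    unmatched predictions are treated as 'no object' (brute force)."""
    n_gt, n_pred = len(cost), len(cost[0])
    best, best_cost = None, float("inf")
    for perm in permutations(range(n_pred), n_gt):
        c = sum(cost[i][perm[i]] for i in range(n_gt))
        if c < best_cost:
            best, best_cost = perm, c
    return best

# 2 ground truths, 4 predictions: each ground truth gets exactly one match.
cost = [[0.9, 0.1, 0.8, 0.7],
        [0.2, 0.9, 0.95, 0.3]]
match = one_to_one_assign(cost)
print(match)  # (1, 0): GT0 -> pred1, GT1 -> pred0; preds 2 and 3 -> no object
```

Because each prediction can match at most one ground truth, duplicates are suppressed during training itself, which is why no NMS post-processing is needed at inference.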
no code implementations • 18 Jul 2022 • Xiaokang Chen, Fangyun Wei, Gang Zeng, Jingdong Wang
Inspired by Conditional DETR, an improved DETR with fast training convergence that introduced box queries (originally called spatial queries) for internal decoder layers, we reformulate the object query as a box query: a composition of the embedding of the reference point and the transformation of the box with respect to the reference point.
2 code implementations • 30 May 2022 • Jiaxiang Tang, Xiaokang Chen, Jingbo Wang, Gang Zeng
To circumvent this hurdle, in this paper, we present an explicit neural field representation that enables efficient and convenient manipulation of models.
1 code implementation • 31 Mar 2022 • Jiaxiang Tang, Xiaokang Chen, Jingbo Wang, Gang Zeng
Semantic scene reconstruction from point cloud is an essential and challenging task for 3D scene understanding.
no code implementations • 28 Mar 2022 • Min Zhong, Xinghao Chen, Xiaokang Chen, Gang Zeng, Yunhe Wang
For instance, our approach achieves a 66.4% mAP at the 0.5 IoU threshold on the ScanNetV2 test set, which is 1.9% higher than the state-of-the-art method.
Ranked #6 on 3D Instance Segmentation on S3DIS
6 code implementations • 7 Feb 2022 • Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang
The pretraining tasks include two tasks: masked representation prediction - predict the representations for the masked patches, and masked patch reconstruction - reconstruct the masked patches.
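The setup behind both pretraining tasks — splitting an image into patches and masking a random subset — can be sketched minimally. This is not the paper's pipeline: a nested-list "image" stands in for real tensors, and the 4x4 toy image, patch size, and mask ratio are arbitrary choices for illustration.

```python
import random

def patchify(image, patch):
    """Split an H x W image (nested lists) into non-overlapping
    patch x patch tiles, flattening each tile to a list of pixels."""
    h, w = len(image), len(image[0])
    tiles = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tiles.append([image[r + i][c + j]
                          for i in range(patch) for j in range(patch)])
    return tiles

def mask_patches(n_patches, ratio, seed=0):
    """Randomly pick masked patch indices; the rest stay visible.
    Masked patches are the targets for representation prediction
    and pixel reconstruction."""
    rng = random.Random(seed)
    masked = set(rng.sample(range(n_patches), int(n_patches * ratio)))
    visible = [i for i in range(n_patches) if i not in masked]
    return sorted(masked), visible

image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy image
tiles = patchify(image, 2)                   # four 2x2 patches
masked, visible = mask_patches(len(tiles), 0.5)
```

The encoder would see only the `visible` patches; the two pretraining heads then predict the representations and pixel contents of the `masked` ones.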
no code implementations • 24 Dec 2021 • Xiaokang Chen, Jiaxiang Tang, Jingbo Wang, Gang Zeng
First, we convert the voxelized scenes to point clouds by removing the visible empty voxels and adopt a deep point stream to capture semantic information from the scene efficiently.
Ranked #4 on 3D Semantic Scene Completion on NYUv2
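The voxel-to-point conversion described above can be sketched in a few lines. This is a simplification of the paper's step: a dense occupancy grid (nested lists) is turned into a sparse list of occupied coordinates, dropping the empty voxels; the 2x2x2 example grid is invented.

```python
def voxels_to_points(grid):
    """Convert a dense 3D occupancy grid into a sparse point cloud:
    keep only the coordinates of occupied voxels, drop the empty ones."""
    points = []
    for x, plane in enumerate(grid):
        for y, row in enumerate(plane):
            for z, occ in enumerate(row):
                if occ:
                    points.append((x, y, z))
    return points

# 2x2x2 grid with two occupied voxels
grid = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
print(voxels_to_points(grid))  # [(0, 0, 0), (1, 1, 1)]
```

Since indoor scenes are mostly empty space, the point stream operates on far fewer elements than the dense voxel grid, which is where the efficiency gain comes from.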
3 code implementations • ICCV 2021 • Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang
Our approach, named conditional DETR, learns a conditional spatial query from the decoder embedding for decoder multi-head cross-attention.
1 code implementation • 19 Jul 2021 • Jiaxiang Tang, Xiaokang Chen, Gang Zeng
Inspired by the recent progress in implicit neural representation, we propose to formulate guided super-resolution as a neural implicit image interpolation problem: we take the form of a general image interpolation but use a novel Joint Implicit Image Function (JIIF) representation to learn both the interpolation weights and values.
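The "general image interpolation" form that JIIF builds on can be sketched as follows. This is not JIIF itself: where JIIF predicts the interpolation weights (and values) with a network conditioned on the guidance image, this sketch plugs in fixed bilinear weights as a stand-in, and the 2x2 low-resolution image is a made-up example.

```python
def interpolate(lowres, x, y):
    """General image interpolation: the value at a continuous query (x, y)
    is a weighted sum of the four surrounding low-res pixels. JIIF would
    learn these weights; here fixed bilinear weights are used instead."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(lowres) - 1)
    y1 = min(y0 + 1, len(lowres[0]) - 1)
    dx, dy = x - x0, y - y0
    weights = [(1 - dx) * (1 - dy), (1 - dx) * dy,
               dx * (1 - dy), dx * dy]
    values = [lowres[x0][y0], lowres[x0][y1],
              lowres[x1][y0], lowres[x1][y1]]
    return sum(w * v for w, v in zip(weights, values))

lowres = [[0.0, 1.0], [2.0, 3.0]]
print(interpolate(lowres, 0.5, 0.5))  # 1.5, the average of the four pixels
```

Querying arbitrary continuous coordinates this way is what lets an implicit formulation produce output at any target resolution from the same low-resolution input.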
3 code implementations • CVPR 2021 • Xiaokang Chen, Yuhui Yuan, Gang Zeng, Jingdong Wang
Our approach imposes consistency on two segmentation networks perturbed with different initializations for the same input image.
Ranked #2 on Semi-Supervised Semantic Segmentation on WoodScape
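The cross-consistency idea can be sketched for a single pixel. This is a simplified reading, not the paper's training code: each of the two differently initialized networks is supervised with the other's argmax prediction as a pseudo label, and the class-probability vectors below are invented toy values.

```python
import math

def cross_entropy(probs, label):
    """Cross-entropy of a predicted distribution against a hard label."""
    return -math.log(probs[label])

def cps_loss(probs_a, probs_b):
    """Cross pseudo supervision: each network is trained against the
    other network's argmax prediction, used as a one-hot pseudo label."""
    label_a = max(range(len(probs_a)), key=probs_a.__getitem__)
    label_b = max(range(len(probs_b)), key=probs_b.__getitem__)
    return cross_entropy(probs_a, label_b) + cross_entropy(probs_b, label_a)

# two differently initialized networks' class probabilities for one pixel
probs_a = [0.7, 0.2, 0.1]
probs_b = [0.6, 0.3, 0.1]
print(cps_loss(probs_a, probs_b))  # both networks agree on class 0 here
```

On unlabeled images this loss pushes the two networks toward agreeing confident predictions, which is how consistency regularization extracts a training signal without ground-truth masks.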
2 code implementations • ECCV 2020 • Xiaokang Chen, Kwan-Yee Lin, Jingbo Wang, Wayne Wu, Chen Qian, Hongsheng Li, Gang Zeng
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images for providing a geometric counterpart to the RGB representation.
2 code implementations • CVPR 2020 • Xiaokang Chen, Kwan-Yee Lin, Chen Qian, Gang Zeng, Hongsheng Li
To this end, we first propose a novel 3D sketch-aware feature embedding to explicitly encode geometric information effectively and efficiently.
3D Semantic Scene Completion from a single RGB image • Hallucination
11 code implementations • ECCV 2020 • Yuhui Yuan, Xiaokang Chen, Xilin Chen, Jingdong Wang
We empirically demonstrate that the proposed approach achieves competitive performance on various challenging semantic segmentation benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff.
Ranked #5 on Semantic Segmentation on LIP val