Search Results for author: Yandong Guo

Found 68 papers, 23 papers with code

Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision

1 code implementation 14 Feb 2024 Zhaoqing Wang, Xiaobo Xia, Ziye Chen, Xiao He, Yandong Guo, Mingming Gong, Tongliang Liu

With this unpaired mask-text supervision, we propose a new weakly-supervised open-vocabulary segmentation framework (Uni-OVSeg) that leverages confident pairs of mask predictions and entities in text descriptions.

Language Modelling

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding

no code implementations 21 Dec 2023 Senqiao Yang, Jiaming Liu, Ray Zhang, Mingjie Pan, Zoey Guo, Xiaoqi Li, Zehui Chen, Peng Gao, Yandong Guo, Shanghang Zhang

In this paper, we introduce LiDAR-LLM, which takes raw LiDAR data as input and harnesses the remarkable reasoning capabilities of LLMs to gain a comprehensive understanding of outdoor 3D scenes.

Instruction Following Language Modelling +1

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

no code implementations 19 Dec 2023 Jiaming Liu, Ran Xu, Senqiao Yang, Renrui Zhang, Qizhe Zhang, Zehui Chen, Yandong Guo, Shanghang Zhang

To tackle these issues, we propose a continual self-supervised method, Adaptive Distribution Masked Autoencoders (ADMA), which enhances the extraction of target domain knowledge while mitigating the accumulation of distribution shifts.

Self-Supervised Learning Test-time Adaptation

Seeing through the Mask: Multi-task Generative Mask Decoupling Face Recognition

no code implementations 20 Nov 2023 Zhaohui Wang, Sufang Zhang, Jianteng Peng, Xinyi Wang, Yandong Guo

Therefore, this paper proposes a Multi-task gEnerative mask dEcoupling face Recognition (MEER) network to jointly handle these two tasks, which can learn occlusion-irrelevant and identity-related representation while achieving unmasked face synthesis.

Face Generation Face Recognition

NOC: High-Quality Neural Object Cloning with 3D Lifting of Segment Anything

no code implementations 22 Sep 2023 Xiaobao Wei, Renrui Zhang, Jiarui Wu, Jiaming Liu, Ming Lu, Yandong Guo, Shanghang Zhang

Firstly, to separate the target object from the scene, we propose a novel strategy to lift the multi-view 2D segmentation masks of SAM into a unified 3D variation field.

3D Object Reconstruction Object

AdvFAS: A robust face anti-spoofing framework against adversarial examples

no code implementations 4 Aug 2023 Jiawei Chen, Xiao Yang, Heng Yin, Mingzhi Ma, Bihui Chen, Jianteng Peng, Yandong Guo, Zhaoxia Yin, Hang Su

Ensuring the reliability of face recognition systems against presentation attacks necessitates the deployment of face anti-spoofing techniques.

Adversarial Defense Face Anti-Spoofing +1

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation

1 code implementation 7 Jun 2023 Jiaming Liu, Senqiao Yang, Peidong Jia, Renrui Zhang, Ming Lu, Yandong Guo, Wei Xue, Shanghang Zhang

Note that our method can be regarded as a novel transfer paradigm for large-scale models, delivering promising results in adaptation to continually changing distributions.

Test-time Adaptation

Recognize Anything: A Strong Image Tagging Model

2 code implementations 6 Jun 2023 Youcai Zhang, Xinyu Huang, Jinyu Ma, Zhaoyang Li, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang

We are releasing the RAM at https://recognize-anything.github.io/ to foster the advancements of large models in computer vision.

Semantic Parsing

ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds

no code implementations 25 Apr 2023 Xiangze Jia, Hui Zhou, Xinge Zhu, Yandong Guo, Ji Zhang, Yuexin Ma

In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation.

Autonomous Driving Contrastive Learning +2

CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input

1 code implementation CVPR 2023 Senmao Tian, Ming Lu, Jiaming Liu, Yandong Guo, Yurong Chen, Shunli Zhang

Therefore, we design a strategy to build an Edge-to-Bit lookup table that maps the edge score of a patch to the bit of each layer during inference.

Image Super-Resolution Quantization

A Comprehensive Comparison of Projections in Omnidirectional Super-Resolution

no code implementations 13 Apr 2023 Huicheng Pi, Senmao Tian, Ming Lu, Jiaming Liu, Yandong Guo, Shunli Zhang

In these works, omnidirectional frames are projected from the 3D sphere to a 2D plane by Equi-Rectangular Projection (ERP).

Super-Resolution

SGL: Structure Guidance Learning for Camera Localization

no code implementations 12 Apr 2023 Xudong Zhang, Shuang Gao, Xiaohu Nan, Haikuan Ning, Yuchen Yang, Yishan Ping, Jixiang Wan, Shuzhou Dong, Jijunnan Li, Yandong Guo

Camera localization is a classical computer vision task that serves various Artificial Intelligence and Robotics applications.

Camera Localization Visual Localization

CloSET: Modeling Clothed Humans on Continuous Surface with Explicit Template Decomposition

no code implementations CVPR 2023 Hongwen Zhang, Siyou Lin, Ruizhi Shao, Yuxiang Zhang, Zerong Zheng, Han Huang, Yandong Guo, Yebin Liu

In this way, the clothing deformations are disentangled such that the pose-dependent wrinkles can be better learned and applied to unseen poses.

Box-Level Active Detection

1 code implementation CVPR 2023 Mengyao Lyu, Jundong Zhou, Hui Chen, YiJie Huang, Dongdong Yu, Yaqian Li, Yandong Guo, Yuchen Guo, Liuyu Xiang, Guiguang Ding

Active learning selects informative samples for annotation within budget, which has proven efficient recently on object detection.

Active Learning object-detection +1

Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

1 code implementation CVPR 2023 Weixuan Sun, Jiayi Zhang, Jianyuan Wang, Zheyuan Liu, Yiran Zhong, Tianpeng Feng, Yandong Guo, Yanhao Zhang, Nick Barnes

Based on this observation, we propose a new learning strategy named False Negative Aware Contrastive (FNAC) to mitigate the problem of misleading the training with such false negative samples.

Contrastive Learning

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

1 code implementation CVPR 2023 Anthony Chen, Kevin Zhang, Renrui Zhang, Zihan Wang, Yuheng Lu, Yandong Guo, Shanghang Zhang

Masked Autoencoders learn strong visual representations and achieve state-of-the-art results in several independent modalities, yet very few works have addressed their capabilities in multi-modality settings.

3D Object Detection object-detection +2

Tag2Text: Guiding Vision-Language Model via Image Tagging

2 code implementations 10 Mar 2023 Xinyu Huang, Youcai Zhang, Jinyu Ma, Weiwei Tian, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Lei Zhang

This paper presents Tag2Text, a vision language pre-training (VLP) framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features.

Language Modelling TAG

Neural Reconstruction of Relightable Human Model from Monocular Video

no code implementations ICCV 2023 Wenzhang Sun, Yunlong Che, Han Huang, Yandong Guo

In this paper, we introduce a novel self-supervised framework that takes a monocular video of a moving human as input and generates a 3D neural representation capable of being rendered with novel poses under arbitrary lighting conditions.

A Survey of Face Recognition

no code implementations 26 Dec 2022 Xinyi Wang, Jianteng Peng, Sufang Zhang, Bihui Chen, Yi Wang, Yandong Guo

Recent years have witnessed the breakthrough of face recognition with deep convolutional neural networks.

Face Recognition

BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks

no code implementations CVPR 2023 Xiaowei Chi, Jiaming Liu, Ming Lu, Rongyu Zhang, Zhaoqing Wang, Yandong Guo, Shanghang Zhang

In order to find them, we further propose a LiDAR-guided sampling strategy to leverage the statistical distribution of LiDAR to determine the heights of local slices.

3D Object Detection Autonomous Driving +1

BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection

no code implementations 30 Nov 2022 Jiaming Liu, Rongyu Zhang, Xiaoqi Li, Xiaowei Chi, Zehui Chen, Ming Lu, Yandong Guo, Shanghang Zhang

In this paper, we propose a Multi-space Alignment Teacher-Student (MATS) framework to ease the domain shift accumulation, which consists of a Depth-Aware Teacher (DAT) and a Geometric-space Aligned Student (GAS) model.

3D Object Detection Autonomous Driving +2

A Real-Time Fusion Framework for Long-term Visual Localization

no code implementations 18 Oct 2022 Yuchen Yang, Xudong Zhang, Shuang Gao, Jixiang Wan, Yishan Ping, Yuyue Liu, Jijunnan Li, Yandong Guo

In this paper, we present an efficient client-server visual localization architecture that fuses global and local pose estimations to realize promising precision and efficiency.

Visual Localization

SoccerNet 2022 Challenges Results

7 code implementations 5 Oct 2022 Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei Zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li

The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.

Action Spotting Camera Calibration +3

CrossHuman: Learning Cross-Guidance from Multi-Frame Images for Human Reconstruction

no code implementations 20 Jul 2022 Liliang Chen, Jiaqi Li, Han Huang, Yandong Guo

We propose CrossHuman, a novel method that learns cross-guidance from a parametric human model and multi-frame RGB images to achieve high-quality 3D human reconstruction.

3D Human Reconstruction

Efficient Meta-Tuning for Content-aware Neural Video Delivery

1 code implementation 20 Jul 2022 Xiaoqi Li, Jiaming Liu, Shizun Wang, Cheng Lyu, Ming Lu, Yurong Chen, Anbang Yao, Yandong Guo, Shanghang Zhang

Our method significantly reduces the computational cost and achieves even better performance, paving the way for applying neural video delivery techniques to practical applications.

Super-Resolution

Mixed Sample Augmentation for Online Distillation

no code implementations 24 Jun 2022 Yiqing Shen, Liwu Xu, Yuzhe Yang, Yaqian Li, Yandong Guo

Mixed Sample Regularization (MSR), such as MixUp or CutMix, is a powerful data augmentation strategy to generalize convolutional neural networks.

Data Augmentation Knowledge Distillation

BANet: Motion Forecasting with Boundary Aware Network

no code implementations 16 Jun 2022 Chen Zhang, Honglin Sun, Chen Chen, Yandong Guo

We propose a motion forecasting model called BANet (Boundary-Aware Network), which is a variant of LaneGCN.

Motion Forecasting

Situational Perception Guided Image Matting

no code implementations 20 Apr 2022 Bo Xu, Jiake Xie, Han Huang, Ziwen Li, Cheng Lu, Yong Tang, Yandong Guo

In this paper, we propose a Situational Perception Guided Image Matting (SPG-IM) method that mitigates subjective bias of matting annotations and captures sufficient situational perception information for better global saliency distilled from the visual-to-textual task.

Image Matting Object

Personalized Image Aesthetics Assessment with Rich Attributes

no code implementations CVPR 2022 Yuzhe Yang, Liwu Xu, Leida Li, Nan Qie, Yaqian Li, Peng Zhang, Yandong Guo

To solve the dilemma, we conduct the most comprehensive subjective study of personalized image aesthetics so far and introduce a new Personalized image Aesthetics database with Rich Attributes (PARA), which consists of 31,220 images with annotations by 438 subjects.

Structured Local Radiance Fields for Human Avatar Modeling

no code implementations CVPR 2022 Zerong Zheng, Han Huang, Tao Yu, Hongwen Zhang, Yandong Guo, Yebin Liu

These local radiance fields not only leverage the flexibility of implicit representation in shape and appearance modeling, but also factorize cloth deformations into skeleton motions, node residual translations and the dynamic detail variations inside each individual radiance field.

Adaptive Patch Exiting for Scalable Single Image Super-Resolution

1 code implementation 22 Mar 2022 Shizun Wang, Jiaming Liu, Kaixin Chen, Xiaoqi Li, Ming Lu, Yandong Guo

Once the incremental capacity is below the threshold, the patch can exit at the specific layer.

Image Super-Resolution

Semantic Distillation Guided Salient Object Detection

no code implementations 8 Mar 2022 Bo Xu, Guanze Liu, Han Huang, Cheng Lu, Yandong Guo

Most existing CNN-based salient object detection methods can identify local segmentation details like hair and animal fur, but often misinterpret the real saliency due to the lack of global contextual information caused by the subjectiveness of the SOD task and the locality of convolution layers.

Image Captioning Object +3

Single-Stage Is Enough: Multi-Person Absolute 3D Pose Estimation

no code implementations CVPR 2022 Lei Jin, Chenyang Xu, Xiaojuan Wang, Yabo Xiao, Yandong Guo, Xuecheng Nie, Jian Zhao

The existing multi-person absolute 3D pose estimation methods are mainly based on a two-stage paradigm, i.e., top-down or bottom-up, leading to redundant pipelines with high computation cost.

3D Pose Estimation Depth Estimation +1

CRIS: CLIP-Driven Referring Image Segmentation

1 code implementation CVPR 2022 Zhaoqing Wang, Yu Lu, Qiang Li, Xunqiang Tao, Yandong Guo, Mingming Gong, Tongliang Liu

In addition, we present text-to-pixel contrastive learning to explicitly enforce the text feature to be similar to the related pixel-level features and dissimilar to the irrelevant ones.

Contrastive Learning Generalized Referring Expression Segmentation +3

Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation

no code implementations 22 Oct 2021 Ziwen Li, Bo Xu, Han Huang, Cheng Lu, Yandong Guo

In this paper, we propose a new framework, Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation (DTS-VIBE), to generate 3D human pose and mesh from RGB videos.

3D Human Pose Estimation Optical Flow Estimation

Pose Refinement with Joint Optimization of Visual Points and Lines

no code implementations 8 Oct 2021 Shuang Gao, Jixiang Wan, Yishan Ping, Xudong Zhang, Shuzhou Dong, Yuchen Yang, Haikuan Ning, Jijunnan Li, Yandong Guo

High-precision camera re-localization technology in a pre-established 3D environment map is the basis for many tasks, such as Augmented Reality, Robotics and Autonomous Driving.

Autonomous Driving

Virtual Multi-Modality Self-Supervised Foreground Matting for Human-Object Interaction

1 code implementation ICCV 2021 Bo Xu, Han Huang, Cheng Lu, Ziwen Li, Yandong Guo

In this paper, we propose a Virtual Multi-modality Foreground Matting (VMFM) method to learn the human-object interactive foreground (the human and the objects they interact with) from a raw RGB image.

Human-Object Interaction Detection Image Matting

Towards Communication-Efficient and Privacy-Preserving Federated Representation Learning

no code implementations 29 Sep 2021 Haizhou Shi, Youcai Zhang, Zijin Shen, Siliang Tang, Yaqian Li, Yandong Guo, Yueting Zhuang

This paper investigates the feasibility of federated representation learning under the constraints of communication cost and privacy protection.

Contrastive Learning Federated Learning +2

Improving the Robustness of Adversarial Attacks Using an Affine-Invariant Gradient Estimator

no code implementations 13 Sep 2021 Wenzhao Xiang, Hang Su, Chang Liu, Yandong Guo, Shibao Zheng

As designers of artificial intelligence try to outwit hackers, both sides continue to home in on AI's inherent vulnerabilities.

Adversarial Attack

The 2nd Anti-UAV Workshop & Challenge: Methods and Results

no code implementations 23 Aug 2021 Jian Zhao, Gang Wang, Jianan Li, Lei Jin, Nana Fan, Min Wang, Xiaojuan Wang, Ting Yong, Yafeng Deng, Yandong Guo, Shiming Ge, Guodong Guo

The 2nd Anti-UAV Workshop & Challenge aims to encourage research in developing novel and accurate methods for multi-scale object tracking.

Object Tracking

Generator Pyramid for High-Resolution Image Inpainting

no code implementations 4 Dec 2020 Leilei Cao, Tong Yang, Yixu Wang, Bo Yan, Yandong Guo

Thus, our model consists of a pyramid of fully convolutional GANs, wherein the content GAN is responsible for completing contents in the lowest-resolution masked image, and each texture GAN is responsible for synthesizing textures in a higher-resolution image.

Image Inpainting Texture Synthesis +1

Perceptual Extreme Super Resolution Network with Receptive Field Block

1 code implementation 26 May 2020 Taizhang Shang, Qiuju Dai, Shengchen Zhu, Tong Yang, Yandong Guo

Third, we alternately use different upsampling methods in the upsampling stage to reduce the high computation complexity while still maintaining satisfactory performance.

Image Super-Resolution object-detection +1

Discriminative Multi-modality Speech Recognition

2 code implementations CVPR 2020 Bo Xu, Cheng Lu, Yandong Guo, Jacob Wang

Vision is often used as a complementary modality for audio speech recognition (ASR), especially in noisy environments where the performance of the audio modality alone deteriorates significantly.

Ranked #6 on Audio-Visual Speech Recognition on LRS3-TED (using extra training data)

Audio-Visual Speech Recognition Lipreading +2

Learning to Detect Head Movement in Unconstrained Remote Gaze Estimation in the Wild

no code implementations 7 Apr 2020 Zhecan Wang, Jian Zhao, Cheng Lu, Han Huang, Fan Yang, Lianji Li, Yandong Guo

To better demonstrate the advantage of our methods, we further propose a new benchmark dataset with the richest distribution of head-gaze combinations reflecting real-world scenarios.

Gaze Estimation

To See in the Dark: N2DGAN for Background Modeling in Nighttime Scene

no code implementations 12 Dec 2019 Zhenfeng Zhu, Yingying Meng, Deqiang Kong, Xingxing Zhang, Yandong Guo, Yao Zhao

Due to the deteriorated conditions of insufficient illumination and uneven lighting, nighttime images have lower contrast and higher noise than their daytime counterparts of the same scene, which seriously limits the performance of conventional background modeling methods.

Dually Supervised Feature Pyramid for Object Detection and Segmentation

1 code implementation 8 Dec 2019 Fan Yang, Cheng Lu, Yandong Guo, Longin Jan Latecki, Haibin Ling

The feature pyramid architecture has been broadly adopted in object detection and segmentation to deal with the multi-scale problem.

Object object-detection +2

Generative One-Shot Face Recognition

no code implementations 28 Sep 2019 Zhengming Ding, Yandong Guo, Lei Zhang, Yun Fu

Specifically, we aim to build a more effective general face classifier for both normal persons and one-shot persons.

Face Recognition One-Shot Learning +1

Edge Heuristic GAN for Non-uniform Blind Deblurring

no code implementations 11 Jul 2019 Shuai Zheng, Zhenfeng Zhu, Jian Cheng, Yandong Guo, Yao Zhao

Non-uniform blur, mainly caused by camera shake and motions of multiple objects, is one of the most common causes of image quality degradation.

Deblurring Generative Adversarial Network

Large Scale Incremental Learning

4 code implementations CVPR 2019 Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, Yun Fu

We believe this is because of the combination of two factors: (a) the data imbalance between the old and new classes, and (b) the increasing number of visually similar classes.

Class Incremental Learning Incremental Learning

Learning to Count Objects with Few Exemplar Annotations

no code implementations 20 May 2019 Jianfeng Wang, Rong Xiao, Yandong Guo, Lei Zhang

In this paper, we study the problem of object counting with incomplete annotations.

Object Object Counting +2

Revisit Multinomial Logistic Regression in Deep Learning: Data Dependent Model Initialization for Image Recognition

no code implementations 17 Sep 2018 Bowen Cheng, Rong Xiao, Yandong Guo, Yuxiao Hu, Jian-Feng Wang, Lei Zhang

In this paper, we study how to initialize the parameters of multinomial logistic regression (a fully connected layer followed by softmax and cross-entropy loss), which is widely used in deep neural network (DNN) models for classification problems.

General Classification Image Classification +4

Incremental Classifier Learning with Generative Adversarial Networks

no code implementations 2 Feb 2018 Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, Zhengyou Zhang, Yun Fu

To address these problems, we propose (a) a new loss function to combine the cross-entropy loss and distillation loss, (b) a simple way to estimate and remove the imbalance between the old and new classes, and (c) using Generative Adversarial Networks (GANs) to generate historical data and select representative exemplars during generation.

General Classification

One-shot Face Recognition by Promoting Underrepresented Classes

1 code implementation 18 Jul 2017 Yandong Guo, Lei Zhang

First, we build a face feature extraction model and improve its performance, especially for persons with very limited training samples, by introducing a regularizer to the cross-entropy loss for multinomial logistic regression (MLR) learning.

Face Identification Face Recognition +1

Model-based Iterative Restoration for Binary Document Image Compression with Dictionary Learning

no code implementations CVPR 2017 Yandong Guo, Cheng Lu, Jan P. Allebach, Charles A. Bouman

Experimental results with a variety of document images demonstrate that our method improves the image quality compared with the observed image, and simultaneously improves the compression ratio.

Dictionary Learning Image Compression

MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

11 code implementations 27 Jul 2016 Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, Jianfeng Gao

In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and linking them to corresponding entity keys in a knowledge base.

Face Recognition Image Captioning
