Search Results for author: Guanglu Song

Found 41 papers, 23 papers with code

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

1 code implementation · 19 Apr 2024 · Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu

Although large-scale pretrained vision encoders, such as those in CLIP and DINOv2, have shown promising performance, we found that no single vision encoder dominates understanding across all kinds of image content; e.g., the CLIP vision encoder achieves outstanding results on general image understanding but performs poorly on document or chart content.

Language Modelling, Large Language Model

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

2 code implementations · 8 Apr 2024 · Dazhong Shen, Guanglu Song, Zeyue Xue, Fu-Yun Wang, Yu Liu

Classifier-Free Guidance (CFG) has been widely used in text-to-image diffusion models, where the CFG scale is introduced to control the strength of text guidance on the whole image space.

Denoising, Semantic Segmentation
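For readers unfamiliar with the CFG scale mentioned above, the standard guidance rule combines an unconditional and a text-conditional noise prediction as eps_uncond + s * (eps_cond - eps_uncond), with one scalar s for the whole image. The following is a minimal NumPy sketch of that standard rule only, not the paper's proposed method; the function name `cfg_combine`, the 7.5 scale, and the half-image scale map are illustrative assumptions chosen to show how a per-pixel scale would broadcast.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the text-conditional one.

    `scale` may be a scalar (one guidance strength for the whole image)
    or an array broadcastable to the prediction shape (a per-pixel map).
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_cond = np.asarray(eps_cond, dtype=float)
    return eps_uncond + scale * (eps_cond - eps_uncond)


# Toy noise predictions with shape (channels, height, width).
rng = np.random.default_rng(0)
eps_u = rng.standard_normal((4, 8, 8))
eps_c = rng.standard_normal((4, 8, 8))

# Standard CFG: the same scale is applied at every spatial position.
guided = cfg_combine(eps_u, eps_c, 7.5)

# Hypothetical spatial scale map of shape (1, H, W): 7.5 on the left half
# (columns 0-3), 3.0 on the right half. Broadcasting expands it over channels.
scale_map = np.full((1, 8, 8), 3.0)
scale_map[:, :, :4] = 7.5
guided_spatial = cfg_combine(eps_u, eps_c, scale_map)
```

A scale of 1 reproduces the conditional prediction and a scale of 0 the unconditional one; the uniform scalar case is what the abstract describes as controlling guidance "on the whole image space".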

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

2 code implementations · 4 Apr 2024 · Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li

We further attribute this phenomenon to the diffusion model's insufficient condition utilization, which is caused by its training paradigm.

Attribute, Image Captioning (+1 more)

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

1 code implementation · 25 Mar 2024 · Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li

This paper presents Visual CoT, a novel pipeline that leverages the reasoning capabilities of multi-modal large language models (MLLMs) by incorporating visual Chain-of-Thought (CoT) reasoning.

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

1 code implementation · 20 Mar 2024 · Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li

We introduce MOTIA (Mastering Video Outpainting Through Input-Specific Adaptation), a diffusion-based pipeline that leverages both the intrinsic data-specific patterns of the source video and the image/video generative prior for effective outpainting.

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

1 code implementation · 19 Mar 2024 · Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li

In this study, we delve into the generation of high-resolution images from pre-trained diffusion models, addressing persistent challenges, such as repetitive patterns and structural distortions, that emerge when models are applied beyond their trained resolutions.

Text-to-Image Generation

Towards Large-scale Masked Face Recognition

no code implementations · 25 Oct 2023 · Manyuan Zhang, Bingqi Ma, Guanglu Song, Yunxiao Wang, Hongsheng Li, Yu Liu

During the COVID-19 epidemic, almost everyone wears a mask, which poses a huge challenge for deep learning-based face recognition algorithms.

Face Recognition

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection

no code implementations · ICCV 2023 · Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li

We observe that different regions of interest in the visual feature map are suitable for performing query classification and box localization tasks, even for the same object.

Classification, Object Detection (+1 more)

Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising

1 code implementation · 29 May 2023 · Fu-Yun Wang, Wenshuo Chen, Guanglu Song, Han-Jia Ye, Yu Liu, Hongsheng Li

To address this challenge, we introduce a novel paradigm dubbed Gen-L-Video, capable of extending off-the-shelf short video diffusion models to generate and edit videos comprising hundreds of frames with diverse semantic segments, without additional training and while preserving content consistency.

Denoising, Image Generation (+2 more)
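The title's "temporal co-denoising" suggests running a short-video model on overlapping clips of a long video and reconciling the clips where they share frames. The sketch below illustrates that general idea under stated assumptions: fixed-length clips with a fixed stride, and plain uniform averaging of per-clip predictions on shared frames. The helper names, the uniform weighting, and the `0.9 *` stand-in for a denoising step are hypothetical, not Gen-L-Video's exact procedure.

```python
import numpy as np

def clip_windows(num_frames, clip_len, stride):
    """Start indices of overlapping clips that together cover every frame."""
    if clip_len > num_frames:
        raise ValueError("clip_len must not exceed num_frames")
    starts = list(range(0, num_frames - clip_len + 1, stride))
    if starts[-1] + clip_len < num_frames:
        starts.append(num_frames - clip_len)  # make sure the tail is covered
    return starts

def merge_clip_predictions(clip_preds, starts, num_frames):
    """Average per-clip predictions on the frames where clips overlap."""
    clip_len = clip_preds[0].shape[0]
    frame_shape = clip_preds[0].shape[1:]
    total = np.zeros((num_frames, *frame_shape))
    count = np.zeros((num_frames,) + (1,) * len(frame_shape))
    for pred, s in zip(clip_preds, starts):
        total[s:s + clip_len] += pred
        count[s:s + clip_len] += 1
    return total / count  # every frame is covered, so count >= 1


num_frames, clip_len, stride = 40, 16, 8
starts = clip_windows(num_frames, clip_len, stride)

# Hypothetical stand-in for one denoising step of a short-video model:
# each clip's prediction is just a scaled copy of its latent frames.
rng = np.random.default_rng(0)
latents = rng.standard_normal((num_frames, 4, 8, 8))
preds = [latents[s:s + clip_len] * 0.9 for s in starts]
merged = merge_clip_predictions(preds, starts, num_frames)
```

Because overlapping clips are averaged at every step, neighbouring clips are pulled toward agreement on shared frames, which is one plausible route to the content consistency the abstract mentions.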

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

1 code implementation · ICCV 2023 · Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu

The HoP approach is straightforward: given the current timestamp t, we generate a pseudo Bird's-Eye View (BEV) feature of timestamp t-k from its adjacent frames and use this feature to predict the object set at timestamp t-k. Our approach is motivated by the observation that forcing the detector to capture both the spatial location and the temporal motion of objects at historical timestamps can lead to more accurate BEV feature learning.

3D Object Detection, Object

UniKD: Universal Knowledge Distillation for Mimicking Homogeneous or Heterogeneous Object Detectors

no code implementations · ICCV 2023 · Shanshan Lao, Guanglu Song, Boxiao Liu, Yu Liu, Yujiu Yang

Bridging this semantic gap currently requires case-by-case algorithm design, which is time-consuming and relies heavily on expert tuning.

Knowledge Distillation

Masked Autoencoders Are Stronger Knowledge Distillers

no code implementations · ICCV 2023 · Shanshan Lao, Guanglu Song, Boxiao Liu, Yu Liu, Yujiu Yang

In MKD, random patches of the input image are masked, and the corresponding missing features are recovered by forcing them to imitate the teacher's output.

Knowledge Distillation, Object Detection (+2 more)
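The MKD snippet describes two ingredients: masking random patches and making the student's features at those positions imitate the teacher. The minimal sketch below covers only those two ingredients. It assumes a mean-squared-error imitation loss restricted to masked positions and a 50% mask ratio; both are assumptions, and the "student" features are a hypothetical perturbation of the teacher's rather than a real network's output.

```python
import numpy as np

def random_patch_mask(num_patches, mask_ratio, rng):
    """Boolean mask with exactly round(mask_ratio * num_patches) True entries."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_imitation_loss(student_feats, teacher_feats, mask):
    """Mean squared error between student and teacher features,
    computed only on the masked patch positions."""
    diff = student_feats[mask] - teacher_feats[mask]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
num_patches, dim = 196, 64  # e.g. a 14 x 14 patch grid
teacher = rng.standard_normal((num_patches, dim))
mask = random_patch_mask(num_patches, mask_ratio=0.5, rng=rng)

# Hypothetical student output: a real student would predict the masked
# patches' features from the visible ones; here we just perturb the teacher.
student = teacher + 0.1 * rng.standard_normal((num_patches, dim))
loss = masked_imitation_loss(student, teacher, mask)
```

Restricting the loss to masked positions is what turns plain feature imitation into a recovery task: the student is scored only on features it could not read directly from the input.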

Teach-DETR: Better Training DETR with Teachers

1 code implementation · 22 Nov 2022 · Linjiang Huang, Kaixin Lu, Guanglu Song, Liang Wang, Si Liu, Yu Liu, Hongsheng Li

In this paper, we present a novel training scheme, namely Teach-DETR, to learn better DETR-based detectors from versatile teacher detectors.

DETRs with Collaborative Hybrid Assignments Training

3 code implementations · ICCV 2023 · Zhuofan Zong, Guanglu Song, Yu Liu

This new training scheme can easily enhance the encoder's learning ability in end-to-end detectors by training multiple parallel auxiliary heads supervised by one-to-many label assignments, such as ATSS and Faster R-CNN.

Ranked #1 on Object Detection on LVIS v1.0 val (using extra training data)

Instance Segmentation, Object Detection (+1 more)
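To make "one-to-many label assignment" concrete, the sketch below uses a simplified max-IoU rule in the style of Faster R-CNN's region proposal network: an anchor is positive for its best-matching ground-truth box if the IoU is at least 0.7, background below 0.3, and ignored in between. Several anchors can match the same box, whereas DETR's one-to-one Hungarian matching would keep only one. This omits Faster R-CNN's extra rule that forces the best anchor for each ground-truth box to be positive, and it is not Co-DETR's code; all names and boxes are illustrative.

```python
import numpy as np

def box_iou(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in (x1, y1, x2, y2)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def max_iou_assign(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Per anchor: index of its matched GT box, -1 for background,
    -2 for ignored. Several anchors may share one GT box (one-to-many)."""
    iou = box_iou(anchors, gt_boxes)
    best_gt = iou.argmax(axis=1)
    best_iou = iou.max(axis=1)
    labels = np.full(len(anchors), -2)
    labels[best_iou < neg_thr] = -1
    pos = best_iou >= pos_thr
    labels[pos] = best_gt[pos]
    return labels


anchors = [[0, 0, 10, 10], [1, 0, 11, 10], [0, 1, 10, 11], [50, 50, 60, 60]]
gt_boxes = [[0, 0, 10, 10]]
labels = max_iou_assign(anchors, gt_boxes)
# The first three anchors (IoU 1.0, 90/110, 90/110) all match GT box 0;
# the last one does not overlap it and is labelled background.
```

The extra positives are what give the encoder denser supervision than one-to-one matching, which is the effect the abstract attributes to the auxiliary heads.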

Large-batch Optimization for Dense Visual Predictions

1 code implementation · 20 Oct 2022 · Zeyue Xue, Jianming Liang, Guanglu Song, Zhuofan Zong, Liang Chen, Yu Liu, Ping Luo

To address this challenge, we propose a simple yet effective algorithm, named Adaptive Gradient Variance Modulator (AGVM), which can train dense visual predictors with very large batch sizes and offers several advantages over prior art.

Instance Segmentation, Object Detection (+3 more)

Towards Robust Face Recognition with Comprehensive Search

no code implementations · 29 Aug 2022 · Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li

To eliminate the bias of single-aspect research and provide an overall understanding of face recognition model design, we first carefully design the search space for each aspect and then introduce a comprehensive search method that jointly searches for the optimal data cleaning, architecture, and loss function design.

Face Recognition, Robust Face Recognition

Unifying Visual Perception by Dispersible Points Learning

1 code implementation · 18 Aug 2022 · Jianming Liang, Guanglu Song, Biao Leng, Yu Liu

The method, called UniHead, views different visual perception tasks as dispersible points learning via a transformer encoder architecture.

Instance Segmentation, Object (+5 more)

Rethinking Robust Representation Learning Under Fine-grained Noisy Faces

no code implementations · 8 Aug 2022 · Bingqi Ma, Guanglu Song, Boxiao Liu, Yu Liu

To better understand this, we reformulate the noise type of each class in a more fine-grained manner as N-identities|K^C-clusters.

Face Recognition, Representation Learning

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP

2 code implementations · 12 Jul 2022 · Jihao Liu, Xin Huang, Guanglu Song, Hongsheng Li, Yu Liu

Finally, we integrate configurable operators and DSMs into a unified search space and search with a Reinforcement Learning-based search algorithm to fully explore the optimal combination of the operators.

Image Classification, Neural Architecture Search

UniFormer: Unifying Convolution and Self-attention for Visual Recognition

7 code implementations · 24 Jan 2022 · Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

Different from typical transformer blocks, the relation aggregators in our UniFormer block are equipped with local token affinity in shallow layers and global token affinity in deep layers, allowing it to tackle both redundancy and dependency for efficient and effective representation learning.

Image Classification, Object Detection (+5 more)

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning

2 code implementations · 12 Jan 2022 · Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art top-1 accuracies of 60.9% and 71.2%, respectively.

Representation Learning

Self-slimmed Vision Transformer

1 code implementation · 24 Nov 2021 · Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu

Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation.

Knowledge Distillation
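The snippet describes the Token Slimming Module only as "dynamic token aggregation". A minimal sketch of that general idea follows, assuming a single linear projection scores every input token for each of M output slots and the scores are softmax-normalized over the N input tokens, so each slimmed token is a convex combination of the inputs. The scoring projection, the 196-to-49 reduction, and the function names are hypothetical; the actual module's scoring network is not described here.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slim_tokens(tokens, score_weights):
    """Aggregate N input tokens into M output tokens.

    tokens:        (N, D) input tokens
    score_weights: (D, M) projection giving one score per (token, slot)
    Returns (M, D): each output token is a convex combination of inputs.
    """
    scores = tokens @ score_weights      # (N, M)
    attn = softmax(scores, axis=0)       # normalize over the N input tokens
    return attn.T @ tokens               # (M, N) @ (N, D) -> (M, D)


rng = np.random.default_rng(0)
N, D, M = 196, 64, 49  # keep a quarter of the tokens
tokens = rng.standard_normal((N, D))
W = rng.standard_normal((D, M)) / np.sqrt(D)  # hypothetical learned weights
slimmed = slim_tokens(tokens, W)
```

Because the aggregation weights depend on the tokens themselves, the grouping is input-dependent ("dynamic"), and every later transformer layer then processes M instead of N tokens, which is where the inference savings come from.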

INTERN: A New Learning Paradigm Towards General Vision

no code implementations · 16 Nov 2021 · Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao

Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society.

Rectifying the Data Bias in Knowledge Distillation

no code implementations · ICCV 2021 · Boxiao Liu, Shenghan Zhang, Guanglu Song, Haihang You, Yu Liu

In this paper, we first quantitatively define the uniformity of the sampled data for training, providing a unified view for methods that learn from biased data.

Ranked #1 on Face Verification on IJB-C (training dataset metric)

Face Recognition, Face Verification (+3 more)

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP

no code implementations · 8 Oct 2021 · Jihao Liu, Hongsheng Li, Guanglu Song, Xin Huang, Yu Liu

Recently, transformer and multi-layer perceptron (MLP) architectures have achieved impressive results on various vision tasks.

Image Classification, Object Detection (+2 more)

Self-Slimming Vision Transformer

no code implementations · 29 Sep 2021 · Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu

Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation.

Knowledge Distillation

FNAS: Uncertainty-Aware Fast Neural Architecture Search

no code implementations · 25 May 2021 · Jihao Liu, Ming Zhang, Yangting Sun, Boxiao Liu, Guanglu Song, Yu Liu, Hongsheng Li

Further, an architecture knowledge pool together with a block similarity function is proposed to reuse parameter knowledge, which halves the search time.

Fairness, Neural Architecture Search (+1 more)

Switchable K-Class Hyperplanes for Noise-Robust Representation Learning

no code implementations · ICCV 2021 · Boxiao Liu, Guanglu Song, Manyuan Zhang, Haihang You, Yu Liu

When combined with the popular ArcFace for representation learning on million-scale data, we found that the switchable manner of SKH can effectively eliminate the gradient conflict that real-world label noise generates on a single K-class hyperplane.

Model Optimization, Representation Learning (+1 more)

Discriminability Distillation in Group Representation Learning

no code implementations · ECCV 2020 · Manyuan Zhang, Guanglu Song, Hang Zhou, Yu Liu

We show that the discriminability knowledge has good properties: it can be distilled by a lightweight distillation network and generalizes to the unseen target set.

Representation Learning

1st place solution for AVA-Kinetics Crossover in ActivityNet Challenge 2020

2 code implementations · 16 Jun 2020 · Siyu Chen, Junting Pan, Guanglu Song, Manyuan Zhang, Hao Shao, Ziyi Lin, Jing Shao, Hongsheng Li, Yu Liu

This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.

Relation Network, Spatio-Temporal Action Localization (+1 more)

1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation

2 code implementations · 17 Mar 2020 · Yu Liu, Guanglu Song, Yuhang Zang, Yan Gao, Enze Xie, Junjie Yan, Chen Change Loy, Xiaogang Wang

Given such good instance bounding boxes, we further design a simple instance-level semantic segmentation pipeline and achieve 1st place in the segmentation challenge.

General Classification, Instance Segmentation (+6 more)

Revisiting the Sibling Head in Object Detector

2 code implementations · CVPR 2020 · Guanglu Song, Yu Liu, Xiaogang Wang

The "shared head for classification and localization" (sibling head), first introduced in Fast R-CNN (Girshick, 2015), has dominated the object detection community for the past five years.

Disentanglement, General Classification (+4 more)

KPNet: Towards Minimal Face Detector

no code implementations · 17 Mar 2020 · Guanglu Song, Yu Liu, Yuhang Zang, Xiaogang Wang, Biao Leng, Qingsheng Yuan

The small receptive field and limited capacity of minimal neural networks restrict their performance when they are used as the backbone of a detector.

Face Detection

Top-1 Solution of Multi-Moments in Time Challenge 2019

1 code implementation · 12 Mar 2020 · Manyuan Zhang, Hao Shao, Guanglu Song, Yu Liu, Junjie Yan

In this technical report, we briefly introduce the solutions of our team, 'Efficient', for the Multi-Moments in Time challenge at ICCV 2019.

Action Recognition, Video Understanding

Towards Flops-constrained Face Recognition

1 code implementation · 2 Sep 2019 · Yu Liu, Guanglu Song, Manyuan Zhang, Jihao Liu, Yucong Zhou, Junjie Yan

Large-scale face recognition is challenging, especially when the computational budget is limited.

Lightweight Face Recognition

Transductive Centroid Projection for Semi-supervised Large-scale Recognition

no code implementations · ECCV 2018 · Yu Liu, Guanglu Song, Jing Shao, Xiao Jin, Xiaogang Wang

It is inspired by the observation that the weights in the classification layer (called "anchors") converge to the central direction of each class in hyperspace.

Clustering, General Classification

Beyond Trade-off: Accelerate FCN-based Face Detector with Higher Accuracy

no code implementations · CVPR 2018 · Guanglu Song, Yu Liu, Ming Jiang, Yujie Wang, Junjie Yan, Biao Leng

Fully convolutional networks (FCNs) have dominated face detection for several years thanks to their inherent capability for sliding-window search with shared kernels, which eliminates redundant computation; most recent state-of-the-art methods, such as Faster R-CNN, SSD, YOLO, and FPN, use an FCN as their backbone.

Face Detection, Philosophy (+1 more)

Region-based Quality Estimation Network for Large-scale Person Re-identification

no code implementations · 23 Nov 2017 · Guanglu Song, Biao Leng, Yu Liu, Congrui Hetang, Shaofan Cai

One of the major restrictions on the performance of video-based person re-id is partial noise caused by occlusion, blur and illumination.

Large-Scale Person Re-Identification
