Search Results for author: Lu Sheng

Found 54 papers, 30 papers with code

From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation

no code implementations23 Apr 2024 Zehuan Huang, Hongxing Fan, Lipeng Wang, Lu Sheng

Addressing this, we introduce Parts2Whole, a novel framework designed for generating customized portraits from multiple reference images, including pose images and various aspects of human appearance.

Image Generation

Self-Supervised Monocular Depth Estimation in the Dark: Towards Data Distribution Compensation

no code implementations22 Apr 2024 Haolin Yang, Chaoqiang Zhao, Lu Sheng, Yang Tang

In this paper, we propose a self-supervised nighttime monocular depth estimation method that does not use any night images during training.

Domain Adaptation Monocular Depth Estimation

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

no code implementations28 Mar 2024 Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments.

Motion Planning

Assessment of Multimodal Large Language Models in Alignment with Human Values

1 code implementation26 Mar 2024 Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

Large Language Models (LLMs) aim to serve as versatile assistants aligned with human values, as defined by the principles of being helpful, honest, and harmless (hhh).

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

1 code implementation18 Mar 2024 Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao

It is a long-standing goal to design a generalist embodied agent that can follow diverse instructions in human-like ways.

Instruction Following

Data-Free Generalized Zero-Shot Learning

no code implementations28 Jan 2024 Bowen Tang, Long Yan, Jing Zhang, Qian Yu, Lu Sheng, Dong Xu

First, to recover the virtual features of the base data, we model the CLIP features of base-class images as samples from a von Mises-Fisher (vMF) distribution based on the pre-trained classifier.

Generalized Zero-Shot Learning Zero-shot Generalization
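The vMF modeling step above can be sketched as follows. This is a minimal illustration only: the sampler is Wood's (1994) rejection scheme, and treating a normalized classifier weight as the vMF mean direction to draw "virtual" unit-norm features is an assumption for the sketch, not the paper's exact procedure.

```python
import numpy as np

def sample_vmf(mu, kappa, n):
    """Draw n samples from a von Mises-Fisher distribution on the unit
    sphere with mean direction mu and concentration kappa (Wood, 1994)."""
    d = mu.shape[0]
    mu = mu / np.linalg.norm(mu)
    # Rejection-sample the component w of each point along mu.
    b = (-2 * kappa + np.sqrt(4 * kappa**2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1 - b) / (1 + b)
    c = kappa * x0 + (d - 1) * np.log(1 - x0**2)
    ws = []
    while len(ws) < n:
        z = np.random.beta((d - 1) / 2, (d - 1) / 2)
        w = (1 - (1 + b) * z) / (1 - (1 - b) * z)
        if kappa * w + (d - 1) * np.log(1 - x0 * w) - c >= np.log(np.random.uniform()):
            ws.append(w)
    ws = np.array(ws)
    # Sample the orthogonal component uniformly in the tangent space of mu.
    v = np.random.randn(n, d)
    v -= (v @ mu)[:, None] * mu
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return ws[:, None] * mu + np.sqrt(1 - ws**2)[:, None] * v

np.random.seed(0)
w_cls = np.random.randn(512)                 # stand-in for a CLIP classifier weight
feats = sample_vmf(w_cls, kappa=50.0, n=4)   # 4 "virtual" features for that class
print(feats.shape)                           # (4, 512), each row unit-norm
```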

Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

1 code implementation27 Dec 2023 Xiawei Li, Qingyuan Xu, Jing Zhang, Tianyi Zhang, Qian Yu, Lu Sheng, Dong Xu

The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB), and is further refined by normalizing the classifier weights to alleviate the detrimental effects of the long-tailed distribution without requiring a prior on the category distribution.

3D Semantic Segmentation Point Cloud Segmentation +1
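The classifier-weight normalization mentioned above can be illustrated in a few lines; the long-tailed weight norms below are synthetic, and the cosine-style logits are a generic sketch of the idea rather than the paper's full refinement pipeline.

```python
import numpy as np

np.random.seed(0)
# Hypothetical long-tailed classifier: head classes carry larger weight
# norms, which inflates their logits for every query point.
W = np.random.randn(10, 64) * np.linspace(3.0, 0.5, 10)[:, None]
x = np.random.randn(64)                 # a fused multi-modality point feature

raw_logits = W @ x                      # biased toward large-norm head classes
W_hat = W / np.linalg.norm(W, axis=1, keepdims=True)
cos_logits = W_hat @ x                  # norm bias removed: direction-only scores

print(np.allclose(np.linalg.norm(W_hat, axis=1), 1.0))  # True
```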

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

1 code implementation12 Dec 2023 Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao

It is a long-standing goal to design an embodied system that can solve long-horizon open-world tasks in human-like ways.

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

no code implementations11 Dec 2023 Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng

Generating multiview images from a single view facilitates the rapid generation of a 3D mesh conditioned on a single image.

SSIM

ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models

1 code implementation5 Nov 2023 Zhelun Shi, Zhipin Wang, Hongxing Fan, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

We will publicly release all the detailed implementations for further analysis, as well as an easy-to-use modular toolkit for the integration of new recipes and models, so that ChEF can be a growing evaluation framework for the MLLM community.

Hallucination In-Context Learning +2

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE

1 code implementation5 Nov 2023 Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, Jing Shao

While this phenomenon has been overlooked in previous work, we propose a novel and extensible framework, called Octavius, for comprehensive studies and experimentation on multimodal learning with Multimodal Large Language Models (MLLMs).

Zero-shot Generalization

Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting

1 code implementation4 Nov 2023 Hao Ai, Lu Sheng

Therefore, we present a new method in this paper, Stable Diffusion Reference Only, an image-to-image self-supervised model that uses only two types of conditional images for precisely controlled generation to accelerate secondary painting.

Image Generation

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

1 code implementation6 Sep 2023 Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu

Pre-trained text-image discriminative models, such as CLIP, have been explored for open-vocabulary semantic segmentation with unsatisfactory results, due to the loss of crucial localization information and the lack of awareness of object shapes.

Contrastive Learning Denoising +5

Distortion-aware Transformer in 360° Salient Object Detection

1 code implementation7 Aug 2023 Yinjie Zhao, Lichen Zhao, Qian Yu, Jing Zhang, Lu Sheng, Dong Xu

The first is a Distortion Mapping Module, which guides the model to pre-adapt to distorted features globally.

ERP Object +3

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

1 code implementation NeurIPS 2023 Zhenfei Yin, Jiong Wang, JianJian Cao, Zhelun Shi, Dingning Liu, Mukai Li, Lu Sheng, Lei Bai, Xiaoshui Huang, Zhiyong Wang, Jing Shao, Wanli Ouyang

To the best of our knowledge, we present one of the very first open-source endeavors in the field, LAMM, encompassing a Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark.

Siamese DETR

1 code implementation CVPR 2023 Zeren Chen, Gengshi Huang, Wei Li, Jianing Teng, Kun Wang, Jing Shao, Chen Change Loy, Lu Sheng

In this work, we present Siamese DETR, a Siamese self-supervised pretraining approach for the Transformer architecture in DETR.

Multi-view Learning Representation Learning

VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud

1 code implementation CVPR 2023 Ziqin Wang, Bowen Cheng, Lichen Zhao, Dong Xu, Yang Tang, Lu Sheng

Since 2D images provide rich semantics and scene graphs are naturally coupled with language, in this study we propose a Visual-Linguistic Semantics Assisted Training (VL-SAT) scheme that can significantly empower 3DSSG prediction models with discrimination of long-tailed and ambiguous semantic relations.

Ranked #1 on 3d scene graph generation on 3DSSG (using extra training data)

3d scene graph generation Relation

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

1 code implementation29 Jan 2023 Yangguang Li, Bin Huang, Zeren Chen, Yufeng Cui, Feng Liang, Mingzhu Shen, Fenggang Liu, Enze Xie, Lu Sheng, Wanli Ouyang, Jing Shao

Our Fast-BEV consists of five parts. We propose (1) a lightweight, deployment-friendly view transformation that quickly transfers 2D image features to 3D voxel space, (2) a multi-scale image encoder that leverages multi-scale information for better performance, and (3) an efficient BEV encoder that is particularly designed to speed up on-vehicle inference.

Data Augmentation

Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation

1 code implementation31 Aug 2022 ZiMing Wang, Xiaoliang Huo, Zhenghao Chen, Jing Zhang, Lu Sheng, Dong Xu

In addition to previous methods that seek correspondences via hand-crafted or learned geometric features, recent point cloud registration methods have applied RGB-D data to achieve more accurate correspondences.

Point Cloud Registration

SketchSampler: Sketch-based 3D Reconstruction via View-dependent Depth Sampling

1 code implementation14 Aug 2022 Chenjian Gao, Qian Yu, Lu Sheng, Yi-Zhe Song, Dong Xu

Reconstructing a 3D shape based on a single sketch image is challenging due to the large domain gap between a sparse, irregular sketch and a regular, dense 3D shape.

3D Reconstruction

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

no code implementations16 Mar 2022 Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao

2) Squeeze Stage: X-Learner condenses the model to a reasonable size and learns a universal and generalizable representation that transfers to various tasks.

object-detection Object Detection +3

3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds

no code implementations CVPR 2022 Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu

Observing that the 3D captioning task and the 3D grounding task contain both shared and complementary information in nature, in this work, we propose a unified framework to jointly solve these two distinct but closely related tasks in a synergistic fashion, which consists of both shared task-agnostic modules and lightweight task-specific modules.

Attribute Dense Captioning +1

VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds

no code implementations17 Oct 2021 Guanze Liu, Yu Rong, Lu Sheng

3D human mesh recovery from point clouds is essential for various tasks, including AR/VR and human behavior understanding.

Human Mesh Recovery

Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

1 code implementation CVPR 2021 Bowen Cheng, Lu Sheng, Shaoshuai Shi, Ming Yang, Dong Xu

Inspired by the back-tracing strategy in conventional Hough voting methods, in this work we introduce a new 3D object detection method, named Back-tracing Representative Points Network (BRNet), which generatively back-traces the representative points from the vote centers and also revisits complementary seed points around these generated points, so as to better capture the fine local structural features surrounding potential objects in the raw point clouds.

3D Object Detection Object +1
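The back-tracing idea can be sketched with plain array operations: from each vote center, generate representative points at fixed offsets, then revisit the raw seed points near each of them. The offsets and radius below are illustrative placeholders, not BRNet's learned components.

```python
import numpy as np

def backtrace(vote_centers, seeds, offsets, radius=0.3):
    """Sketch of back-tracing: representative points are generated around
    each vote center, and seed points within `radius` of each representative
    point are revisited (here, returned as a boolean membership mask)."""
    reps = vote_centers[:, None, :] + offsets[None, :, :]     # (M, K, 3)
    dists = np.linalg.norm(
        reps[:, :, None, :] - seeds[None, None, :, :], axis=-1)  # (M, K, N)
    return reps, dists < radius

np.random.seed(0)
centers = np.random.rand(2, 3)       # M = 2 predicted vote centers
seeds = np.random.rand(100, 3)       # N = 100 raw seed points
offs = 0.2 * np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0.0]])  # K = 4
reps, mask = backtrace(centers, seeds, offs)
print(reps.shape, mask.shape)        # (2, 4, 3) (2, 4, 100)
```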

DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer

2 code implementations18 Mar 2021 Buyu Li, Yongchi Zhao, Zhelun Shi, Lu Sheng

In this paper, we reformulate it as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction, where the key poses are easier to synchronize with the music beats and the parametric curves can be efficiently regressed to render fluent, rhythm-aligned movements.

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

2 code implementations CVPR 2021 Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, Ziwei Liu

To counter this emerging threat, we construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across four tasks: 1) Image Forgery Classification, including two-way (real / fake), three-way (real / fake with identity-replaced forgery approaches / fake with identity-remained forgery approaches), and n-way (real and 15 respective forgery approaches) classification.

Benchmarking Classification +2

StyleFormer: Real-Time Arbitrary Style Transfer via Parametric Style Composition

1 code implementation ICCV 2021 Xiaolei Wu, Zhihao Hu, Lu Sheng, Dong Xu

In this work, we propose a new feed-forward arbitrary style transfer method, referred to as StyleFormer, which can simultaneously fulfill fine-grained style diversity and semantic content coherency.

Style Transfer

3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

no code implementations ICCV 2021 Lichen Zhao, Daigang Cai, Lu Sheng, Dong Xu

Visual grounding on 3D point clouds is an emerging vision and language task that benefits various applications in understanding the 3D visual world.

Object Object Proposal Generation +2

PV-NAS: Practical Neural Architecture Search for Video Recognition

no code implementations2 Nov 2020 ZiHao Wang, Chen Lin, Lu Sheng, Junjie Yan, Jing Shao

Recently, deep learning has been utilized to solve the video recognition problem due to its prominent representation ability.

Neural Architecture Search Video Recognition

Adaptive Gradient Method with Resilience and Momentum

no code implementations21 Oct 2020 Jie Liu, Chen Lin, Chuming Li, Lu Sheng, Ming Sun, Junjie Yan, Wanli Ouyang

Several variants of stochastic gradient descent (SGD) have been proposed to improve learning effectiveness and efficiency when training deep neural networks, among which some recent influential attempts adaptively control the parameter-wise learning rate (e.g., Adam and RMSProp).
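The parameter-wise adaptive learning rate mentioned above can be sketched with the generic textbook Adam update; note this illustrates the baseline family the abstract refers to, not the resilience-and-momentum method the paper itself proposes.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: the per-parameter step size is derived from
    bias-corrected running estimates of the gradient's first and second
    moments, so each coordinate gets its own effective learning rate."""
    m = b1 * m + (1 - b1) * grad          # first moment (momentum)
    v = b2 * v + (1 - b2) * grad ** 2     # second moment (scale)
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(x) = ||x||^2 (gradient 2x) from a distance of ~2.2.
x = np.array([1.0, -2.0])
m, v = np.zeros_like(x), np.zeros_like(x)
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
print(np.linalg.norm(x) < 0.5)  # True: settled near the minimum
```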

Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues

2 code implementations ECCV 2020 Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, Jing Shao

As realistic facial manipulation technologies have achieved remarkable progress, social concerns about potential malicious abuse of these technologies bring out an emerging research topic of face forgery detection.

Unsupervised Domain Expansion from Multiple Sources

no code implementations26 May 2020 Jing Zhang, Wanqing Li, Lu Sheng, Chang Tang, Philip Ogunbona

Given an existing system learned from previous source domains, it is desirable to adapt the system to new domains without accessing and forgetting all the previous domains in some applications.

Domain Adaptation Unsupervised Domain Expansion

Powering One-shot Topological NAS with Stabilized Share-parameter Proxy

no code implementations ECCV 2020 Ronghao Guo, Chen Lin, Chuming Li, Keyu Tian, Ming Sun, Lu Sheng, Junjie Yan

Specifically, the difficulties of architecture search in such a complex space have been eliminated by the proposed stabilized share-parameter proxy, which employs Stochastic Gradient Langevin Dynamics to enable fast shared-parameter sampling, so as to achieve stable measurement of architecture performance even in search spaces with complex topological structures.

Neural Architecture Search

Morphing and Sampling Network for Dense Point Cloud Completion

2 code implementations30 Nov 2019 Minghua Liu, Lu Sheng, Sheng Yang, Jing Shao, Shi-Min Hu

3D point cloud completion, the task of inferring the complete geometric shape from a partial point cloud, has been attracting attention in the community.

Point Cloud Completion

Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking

no code implementations6 May 2019 Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and face model adaptation on-the-fly, in the unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.

Face Model Pose Estimation +1

Context and Attribute Grounded Dense Captioning

no code implementations CVPR 2019 Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao

Dense captioning aims at simultaneously localizing semantic regions and describing these regions-of-interest (ROIs) with short phrases or sentences in natural language.

Attribute Dense Captioning

Video Generation from Single Semantic Label Map

2 code implementations CVPR 2019 Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang

This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process.

Image Generation Image to Video Generation +1

Unsupervised Bi-directional Flow-based Video Generation from one Snapshot

no code implementations3 Mar 2019 Lu Sheng, Junting Pan, Jiaming Guo, Jing Shao, Xiaogang Wang, Chen Change Loy

Imagining multiple consecutive frames given one single snapshot is challenging, since it is difficult to simultaneously predict diverse motions from a single image and faithfully generate novel frames without visual distortions.

Video Generation

Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection

1 code implementation16 Sep 2018 Yongcheng Liu, Lu Sheng, Jing Shao, Junjie Yan, Shiming Xiang, Chunhong Pan

Specifically, given the image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module that guides the classification model by the WSD model according to the class-level predictions for the whole image and the object-level visual features for object RoIs.

Classification General Classification +4
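The distillation step above, where the WSD model's class-level predictions guide the classifier, can be sketched as a per-class soft-label loss; the sigmoid/BCE form and temperature below are standard multi-label distillation choices used for illustration, not the paper's exact objective.

```python
import numpy as np

def multilabel_kd_loss(student_logits, teacher_logits, T=2.0):
    """Per-class soft-label distillation for multi-label classification:
    binary cross-entropy of the student's temperature-softened sigmoids
    against the teacher's, averaged over classes."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(student_logits) / T))
    t = 1.0 / (1.0 + np.exp(-np.asarray(teacher_logits) / T))
    eps = 1e-12
    return float(-np.mean(t * np.log(s + eps) + (1 - t) * np.log(1 - s + eps)))

teacher = np.array([2.0, -1.0, 0.5])               # WSD model's class-level logits
matched = multilabel_kd_loss(teacher, teacher)     # student agrees with teacher
flipped = multilabel_kd_loss(-teacher, teacher)    # student disagrees
print(matched < flipped)                           # True: agreement costs less
```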

Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition

no code implementations ECCV 2018 Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao, Chen Change Loy

We show that by encouraging deep message propagation and interactions between local object features and global predicate features, one can achieve compelling performance in recognizing complex relationships without using any linguistic priors.

Object

Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration

3 code implementations CVPR 2018 Lu Sheng, Ziyi Lin, Jing Shao, Xiaogang Wang

Zero-shot artistic style transfer is an important image synthesis problem aiming at transferring arbitrary style into content images.

Image Generation Image Reconstruction +1
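For context, the simplest zero-shot stylization baseline re-statistics the content features with the style's channel-wise statistics (AdaIN); Avatar-Net's style decorator is patch-based and richer, so the sketch below is only an illustrative baseline, not the paper's method.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """AdaIN-style feature transfer: shift the content feature map to match
    the style feature map's channel-wise mean and standard deviation."""
    c_mu, c_std = content.mean(axis=(0, 1)), content.std(axis=(0, 1)) + eps
    s_mu, s_std = style.mean(axis=(0, 1)), style.std(axis=(0, 1)) + eps
    return (content - c_mu) / c_std * s_std + s_mu

np.random.seed(0)
c = np.random.rand(16, 16, 32)               # content feature map (H, W, C)
s = 2.0 * np.random.rand(16, 16, 32) - 0.5   # style feature map
out = adain(c, s)
# The output's channel means now match the style's exactly.
print(np.allclose(out.mean(axis=(0, 1)), s.mean(axis=(0, 1)), atol=1e-3))  # True
```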

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

1 code implementation CVPR 2018 Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei zhang

In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach.

Action Recognition In Videos Optical Flow Estimation +1
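A rough sketch of the OFF idea: stack the spatial gradients of a feature map with its temporal difference, i.e., the terms of the brightness-constancy expansion applied to feature maps rather than pixels. Forward differences stand in here for the paper's actual gradient operators.

```python
import numpy as np

def off(feat_t, feat_t1):
    """Optical Flow guided Feature sketch: concatenate the horizontal and
    vertical gradients of the frame-t feature map with the temporal
    difference to frame t+1, along the channel axis."""
    dx = np.diff(feat_t, axis=1, append=feat_t[:, -1:, :])   # horizontal gradient
    dy = np.diff(feat_t, axis=0, append=feat_t[-1:, :, :])   # vertical gradient
    dt = feat_t1 - feat_t                                    # temporal difference
    return np.concatenate([dx, dy, dt], axis=-1)

np.random.seed(0)
f0 = np.random.rand(8, 8, 16)   # feature map at frame t (H, W, C)
f1 = np.random.rand(8, 8, 16)   # feature map at frame t+1
print(off(f0, f1).shape)        # (8, 8, 48): three stacked 16-channel terms
```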

A Generative Model for Depth-Based Robust 3D Facial Pose Tracking

no code implementations CVPR 2017 Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

We consider the problem of depth-based robust 3D facial pose tracking under unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.

Face Model Pose Estimation +1
