Search Results for author: Xiaolong Wang

Found 192 papers, 68 papers with code

Test-Time Training for Generalization under Distribution Shifts

no code implementations • ICML 2020 • Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei Efros, Moritz Hardt

We introduce a general approach, called test-time training, for improving the performance of predictive models when training and test data come from different distributions.

Image Classification Self-Supervised Learning

Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos

no code implementations • 18 Apr 2024 • Isabella Liu, Hao Su, Xiaolong Wang

To this end, we introduce Dynamic Gaussians Mesh (DG-Mesh), a framework to reconstruct a high-fidelity and time-consistent mesh given a single monocular video.

Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing

no code implementations • 1 Apr 2024 • Ri-Zhao Qiu, Ge Yang, Weijia Zeng, Xiaolong Wang

Scene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes.

Visual Whole-Body Control for Legged Loco-Manipulation

no code implementations • 25 Mar 2024 • Minghuan Liu, Zixuan Chen, Xuxin Cheng, Yandong Ji, Ruihan Yang, Xiaolong Wang

That is, the robot can control the legs and the arm at the same time to extend its workspace.

Position

HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

no code implementations • 18 Mar 2024 • Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang

In this paper, we propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data.

6D Pose Estimation using RGB Image Generation +1

Learning Generalizable Feature Fields for Mobile Manipulation

no code implementations • 12 Mar 2024 • Ri-Zhao Qiu, Yafei Hu, Ge Yang, Yuchen Song, Yang Fu, Jianglong Ye, Jiteng Mu, Ruihan Yang, Nikolay Atanasov, Sebastian Scherer, Xiaolong Wang

An open problem in mobile manipulation is how to represent objects and scenes in a unified manner, so that robots can use it both for navigating in the environment and manipulating objects.

Novel View Synthesis

DNAct: Diffusion Guided Multi-Task 3D Policy Learning

no code implementations • 7 Mar 2024 • Ge Yan, Yueh-Hua Wu, Xiaolong Wang

To learn a generalizable multi-task policy with few demonstrations, the pre-training phase of DNAct leverages neural rendering to distill 2D semantic features from foundation models such as Stable Diffusion to a 3D space, which provides a comprehensive semantic understanding regarding the scene.

Neural Rendering

Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models

no code implementations • 27 Feb 2024 • Xiaolong Wang, Yile Wang, Yuanchi Zhang, Fuwen Luo, Peng Li, Maosong Sun, Yang Liu

Based on the characteristics of the tasks and the strong dialogue-generation capabilities of LLMs, we propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation.

Dark Humor Detection Dialogue Generation +3

Expressive Whole-Body Control for Humanoid Robots

no code implementations • 26 Feb 2024 • Xuxin Cheng, Yandong Ji, Junming Chen, Ruihan Yang, Ge Yang, Xiaolong Wang

Can we enable humanoid robots to generate rich, diverse, and expressive motions in the real world?

Imitation Learning

DEEM: Dynamic Experienced Expert Modeling for Stance Detection

no code implementations • 23 Feb 2024 • Xiaolong Wang, Yile Wang, Sijie Cheng, Peng Li, Yang Liu

Recent work has made a preliminary attempt to use large language models (LLMs) to solve the stance detection task, showing promising results.

Stance Detection

CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation

no code implementations • 22 Feb 2024 • Jun Wang, Yuzhe Qin, Kaiming Kuang, Yigit Korkmaz, Akhilan Gurumoorthy, Hao Su, Xiaolong Wang

We introduce CyberDemo, a novel approach to robotic imitation learning that leverages simulated human demonstrations for real-world tasks.

Data Augmentation Imitation Learning

CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models

no code implementations • 21 Feb 2024 • Fuwen Luo, Chi Chen, Zihao Wan, Zhaolu Kang, Qidong Yan, Yingjie Li, Xiaolong Wang, Siyu Wang, Ziyue Wang, Xiaoyue Mi, Peng Li, Ning Ma, Maosong Sun, Yang Liu

Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language.

Benchmarking

Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages

1 code implementation • 19 Feb 2024 • Yuanchi Zhang, Yile Wang, Zijun Liu, Shuo Wang, Xiaolong Wang, Peng Li, Maosong Sun, Yang Liu

While large language models (LLMs) have been pre-trained on multilingual corpora, their performance still lags behind in most languages compared to a few resource-rich languages.

Transfer Learning

DexTouch: Learning to Seek and Manipulate Objects with Tactile Dexterity

no code implementations • 23 Jan 2024 • Kang-Won Lee, Yuzhe Qin, Xiaolong Wang, Soo-Chul Lim

In this paper, we introduce a multi-finger robot system designed to search for and manipulate objects using the sense of touch without relying on visual information.

RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos

no code implementations • 23 Jan 2024 • Hongchi Xia, Yang Fu, Sifei Liu, Xiaolong Wang

WildRGB-D comprises large-scale, category-level RGB-D videos of objects, captured with an iPhone moving 360 degrees around each object.

6D Pose Estimation Novel View Synthesis +2

Pixel Aligned Language Models

no code implementations • 14 Dec 2023 • Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid

When taking locations as inputs, the model performs location-conditioned captioning, which generates captions for the indicated object or region.

Language Modelling

Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

no code implementations • 14 Dec 2023 • Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Shibo Zhao, Yu Quan Chong, Chen Wang, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Zsolt Kira, Fei Xia, Yonatan Bisk

Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how existing foundation models from NLP and CV can be applied to robotics and (ii) what a robotics-specific foundation model would look like.

COLMAP-Free 3D Gaussian Splatting

no code implementations • 12 Dec 2023 • Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. Efros, Xiaolong Wang

While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses.

Neural Rendering Novel View Synthesis +1

Harmonic Mobile Manipulation

no code implementations • 11 Dec 2023 • Ruihan Yang, Yejin Kim, Aniruddha Kembhavi, Xiaolong Wang, Kiana Ehsani

Recent advancements in robotics have enabled robots to navigate complex scenes or manipulate diverse objects independently.

Navigate

IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks

no code implementations • 4 Dec 2023 • Jiarui Xu, Yossi Gandelsman, Amir Bar, Jianwei Yang, Jianfeng Gao, Trevor Darrell, Xiaolong Wang

Given a textual description of a visual task (e.g., "Left: input image, Right: foreground segmentation"), a few input-output visual examples, or both, the model in-context learns to solve it for a new test input.

Colorization Foreground Segmentation +3

Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing

no code implementations • 4 Dec 2023 • Ying Yuan, Haichuan Che, Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Kang-Won Lee, Yi Wu, Soo-Chul Lim, Xiaolong Wang

In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation.

TD-MPC2: Scalable, Robust World Models for Continuous Control

1 code implementation • 25 Oct 2023 • Nicklas Hansen, Hao Su, Xiaolong Wang

TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model.

Continuous Control Model-based Reinforcement Learning +1
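As a rough illustration of "local trajectory optimization in the latent space of a learned world model", the sketch below plans over short action sequences, scoring each one by rolling it out through a model and adding a terminal value estimate, then executes only the first action (receding horizon). Everything here is a hypothetical stand-in: the "learned" latent dynamics, reward, and value are hand-written functions on a 1-D latent, and the optimizer is the cross-entropy method (CEM) for brevity. To our understanding, TD-MPC-style planners use an MPPI-like update with samples from a learned policy prior, which this sketch omits.

```python
import random
import statistics
from typing import List


# Hypothetical stand-ins for learned components (NOT TD-MPC2's networks).
def dynamics(z: float, a: float) -> float:
    return z + a  # latent transition model


def reward(z: float, a: float) -> float:
    return -(z ** 2) - 0.1 * a ** 2  # reward model: keep the latent near 0


def value(z: float) -> float:
    return -10.0 * z ** 2  # terminal value: bootstraps beyond the horizon


def rollout_return(z0: float, actions: List[float], gamma: float = 0.99) -> float:
    """Discounted model-predicted return of an action sequence plus terminal value."""
    z, total, discount = z0, 0.0, 1.0
    for a in actions:
        total += discount * reward(z, a)
        z = dynamics(z, a)
        discount *= gamma
    return total + discount * value(z)


def plan(z0: float, horizon: int = 5, samples: int = 64, elites: int = 8,
         iterations: int = 6, seed: int = 0) -> float:
    """CEM over action sequences in [-1, 1]; returns the first action of the final mean."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(iterations):
        candidates = [
            [max(-1.0, min(1.0, rng.gauss(m, s))) for m, s in zip(mean, std)]
            for _ in range(samples)
        ]
        candidates.sort(key=lambda seq: rollout_return(z0, seq), reverse=True)
        top = candidates[:elites]
        # Refit the sampling distribution to the elite sequences.
        mean = [statistics.fmean(seq[t] for seq in top) for t in range(horizon)]
        std = [max(statistics.pstdev([seq[t] for seq in top]), 0.05) for t in range(horizon)]
    return mean[0]  # receding horizon: execute only the first action, then replan
```

In a control loop one would call `plan(z)` at every step, apply the returned action, and replan from the new latent; a larger `samples` or `iterations` budget spends more computation on planning.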

Finetuning Offline World Models in the Real World

no code implementations • 24 Oct 2023 • Yunhai Feng, Nicklas Hansen, Ziyan Xiong, Chandramouli Rajagopalan, Xiaolong Wang

In this work, we seek to get the best of both worlds: we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model.

Offline RL Reinforcement Learning (RL)

Learning to (Learn at Test Time)

1 code implementation • 20 Oct 2023 • Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Sanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen

Our inner loop turns out to be equivalent to linear attention when the inner-loop learner is only a linear model, and to self-attention when it is a kernel estimator.
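The linear-model case of this claim can be checked numerically. The sketch below is our own illustration, not the paper's code, and it assumes a specific setup: the inner-loop learner is a linear map W initialized at zero and takes one gradient-descent step of size `lr` on the self-supervised loss sum_i 0.5 * ||W k_i - v_i||^2, with every per-token gradient evaluated at W = 0. Under those assumptions the updated W applied to a query q equals lr * sum_i v_i (k_i . q), which is unnormalized linear attention. The names `keys`, `values`, `query`, and `lr` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in w]


def inner_loop_output(keys: List[Vector], values: List[Vector], query: Vector, lr: float) -> Vector:
    """One gradient step on sum_i 0.5 * ||W k_i - v_i||^2 from W = 0, then apply W to the query."""
    d_in, d_out = len(keys[0]), len(values[0])
    w0 = [[0.0] * d_in for _ in range(d_out)]  # assumed zero initialization
    grad = [[0.0] * d_in for _ in range(d_out)]
    for k, v in zip(keys, values):
        residual = [p - t for p, t in zip(matvec(w0, k), v)]  # W0 k_i - v_i
        for r in range(d_out):
            for c in range(d_in):
                grad[r][c] += residual[r] * k[c]  # outer product (W0 k_i - v_i) k_i^T
    w1 = [[w0[r][c] - lr * grad[r][c] for c in range(d_in)] for r in range(d_out)]
    return matvec(w1, query)


def linear_attention(keys: List[Vector], values: List[Vector], query: Vector, lr: float) -> Vector:
    """Unnormalized linear attention scaled by lr: lr * sum_i v_i * (k_i . q)."""
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        score = dot(k, query)
        out = [o + lr * score * x for o, x in zip(out, v)]
    return out


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
    values = [[2.0, 1.0, 0.0], [0.0, -1.0, 3.0], [1.0, 1.0, 1.0]]
    query = [0.3, -0.2]
    # Compare the two computations on each prefix of the token sequence.
    for n in range(1, len(keys) + 1):
        a = inner_loop_output(keys[:n], values[:n], query, lr=0.1)
        b = linear_attention(keys[:n], values[:n], query, lr=0.1)
        print(n, max(abs(x - y) for x, y in zip(a, b)))
```

The two functions agree up to floating-point rounding; the equivalence relies on the gradients being taken at the zero initialization, so it does not directly carry over to multiple sequential inner-loop steps.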

Generalized Animal Imitator: Agile Locomotion with Versatile Motion Prior

no code implementations • 2 Oct 2023 • Ruihan Yang, Zhuoqun Chen, Jianhan Ma, Chongyi Zheng, Yiyu Chen, Quan Nguyen, Xiaolong Wang

To our knowledge, this is the first work that allows a robot to concurrently learn diverse agile locomotion tasks using a single controller.

GenSim: Generating Robotic Simulation Tasks via Large Language Models

1 code implementation • 2 Oct 2023 • Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Chen Bao, Yuzhe Qin, Bailin Wang, Huazhe Xu, Xiaolong Wang

Collecting large amounts of real-world interaction data to train general robotic policies is often prohibitively expensive, thus motivating the use of simulation data.

Code Generation

3D Reconstruction with Generalizable Neural Fields using Scene Priors

no code implementations • 26 Sep 2023 • Yang Fu, Shalini De Mello, Xueting Li, Amey Kulkarni, Jan Kautz, Xiaolong Wang, Sifei Liu

NFP not only demonstrates SOTA scene reconstruction performance and efficiency, but it also supports single-image novel-view synthesis, which is underexplored in neural fields.

3D Reconstruction 3D Scene Reconstruction +1

Dynamic Handover: Throw and Catch with Bimanual Hands

no code implementations • 11 Sep 2023 • Binghao Huang, Yuanpei Chen, Tianyu Wang, Yuzhe Qin, Yaodong Yang, Nikolay Atanasov, Xiaolong Wang

Humans throw and catch objects all the time.

Multi-agent Reinforcement Learning Trajectory Prediction

SHAPE: A Sample-adaptive Hierarchical Prediction Network for Medication Recommendation

1 code implementation • 9 Sep 2023 • Sicen Liu, Xiaolong Wang, Jingcheng Du, Yongshuai Hou, Xianbing Zhao, Hui Xu, Hui Wang, Yang Xiang, Buzhou Tang

Effective medication recommendation under complex multimorbidity conditions is a critical task in healthcare.

Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf

no code implementations • 9 Sep 2023 • Yuzhuang Xu, Shuo Wang, Peng Li, Fuwen Luo, Xiaolong Wang, Weidong Liu, Yang Liu

Communication games, which we refer to as incomplete information games that heavily depend on natural language communication, hold significant research value in fields such as economics, social science, and artificial intelligence.

Retrieval

GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

1 code implementation • 31 Aug 2023 • Yanjie Ze, Ge Yan, Yueh-Hua Wu, Annabella Macaluso, Yuying Ge, Jianglong Ye, Nicklas Hansen, Li Erran Li, Xiaolong Wang

To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model (e.g., Stable Diffusion) to distill rich semantic information into the deep 3D voxel.

Decision Making

PointLLM: Empowering Large Language Models to Understand Point Clouds

3 code implementations • 31 Aug 2023 • Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin

The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding.

Ranked #3 on 3D Question Answering (3D-QA) on 3D MM-Vet

3D Object Classification 3D Question Answering (3D-QA) +2

Learning Dense Correspondences between Photos and Sketches

no code implementations • 24 Jul 2023 • Xuanchen Lu, Xiaolong Wang, Judith E Fan

Humans effortlessly grasp the connection between sketches and real-world objects, even when these sketches are far from realistic.

Contrastive Learning

Pluggable Neural Machine Translation Models via Memory-augmented Adapters

1 code implementation • 12 Jul 2023 • Yuzhuang Xu, Shuo Wang, Peng Li, Xuebo Liu, Xiaolong Wang, Weidong Liu, Yang Liu

Although neural machine translation (NMT) models perform well in the general domain, it remains rather challenging to control their generation behavior to satisfy the requirements of different users.

Machine Translation NMT +1

Causal Kripke Models

no code implementations • 11 Jul 2023 • Yiwen Ding, Krishna Manoorkar, Apostolos Tzimoulis, Ruoding Wang, Xiaolong Wang

This work extends Halpern and Pearl's causal models for actual causality to a possible world semantics environment.

Test-Time Training on Video Streams

no code implementations • 11 Jul 2023 • Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang

Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders.

Image Reconstruction Panoptic Segmentation
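The per-instance procedure described above can be sketched as a loop: adapt on the test frame with the self-supervised loss, then predict with the adapted parameters. This is a toy illustration under our own assumptions, not the authors' implementation. The "model" is a single shared scalar `theta`; the self-supervised loss 0.5 * mean((x - theta)^2) is a made-up stand-in for masked-autoencoder reconstruction; and the hypothetical `predict` head simply reads out `theta`. Carrying the adapted parameter from one frame to the next (`online=True`) is our assumption about how a stream would be handled; `online=False` resets to the trained parameter before every frame.

```python
from typing import List, Tuple


def ssl_grad(theta: float, frame: List[float]) -> float:
    """Gradient of the toy self-supervised loss 0.5 * mean((x - theta)^2) w.r.t. theta."""
    return sum(theta - x for x in frame) / len(frame)


def adapt(theta: float, frame: List[float], lr: float = 0.5, steps: int = 5) -> float:
    """Train on the test instance itself with the self-supervised task before predicting."""
    for _ in range(steps):
        theta -= lr * ssl_grad(theta, frame)
    return theta


def predict(theta: float) -> float:
    """Hypothetical main-task head: here it just reads out the shared parameter."""
    return theta


def ttt_stream(theta0: float, frames: List[List[float]], online: bool = True) -> Tuple[List[float], float]:
    """Adapt-then-predict on each frame; online mode warm-starts from the previous frame."""
    theta = theta0
    predictions = []
    for frame in frames:
        start = theta if online else theta0
        theta = adapt(start, frame)
        predictions.append(predict(theta))
    return predictions, theta
```

With a stream whose frames share statistics, the online variant starts each frame closer to a good solution, so the same number of inner steps gets it closer still.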

AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System

no code implementations • 10 Jul 2023 • Yuzhe Qin, Wei Yang, Binghao Huang, Karl Van Wyk, Hao Su, Xiaolong Wang, Yu-Wei Chao, Dieter Fox

In real-world experiments, AnyTeleop achieves a higher success rate than a previous system designed for specific robot hardware, using the same robot.

Imitation Learning

Elastic Decision Transformer

no code implementations • NeurIPS 2023 • Yueh-Hua Wu, Xiaolong Wang, Masashi Hamaya

This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants.

Atari Games D4RL +1

Zero-shot Pose Transfer for Unrigged Stylized 3D Characters

1 code implementation • CVPR 2023 • Jiashun Wang, Xueting Li, Sifei Liu, Shalini De Mello, Orazio Gallo, Xiaolong Wang, Jan Kautz

We present a zero-shot approach that requires only the widely available deformed non-stylized avatars in training, and deforms stylized characters of significantly different shapes at inference.

Pose Transfer

Medication Recommendation via Domain Knowledge Informed Deep Learning

no code implementations • 31 May 2023 • Sicen Liu, Xiaolong Wang, Xianbing Zhao, Hao Chen

However, most of them neglect to incorporate domain knowledge based on the clinical manifestations recorded in the patient's EHR.

DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects

no code implementations • CVPR 2023 • Chen Bao, Helin Xu, Yuzhe Qin, Xiaolong Wang

On the other hand, operating with a multi-finger robot hand will allow better approximation to human behavior and enable the robot to operate on diverse articulated objects.

Benchmarking Decision Making +2

TUVF: Learning Generalizable Texture UV Radiance Fields

no code implementations • 4 May 2023 • An-Chieh Cheng, Xueting Li, Sifei Liu, Xiaolong Wang

This allows the texture to be disentangled from the underlying shape and transferable to other shapes that share the same UV space, i.e., from the same category.

3D Shape Modeling Texture Synthesis

ContactArt: Learning 3D Interaction Priors for Category-level Articulated Object and Hand Poses Estimation

no code implementations • 2 May 2023 • Zehao Zhu, Jiashun Wang, Yuzhe Qin, Deqing Sun, Varun Jampani, Xiaolong Wang

We propose a new dataset and a novel approach to learning hand-object interaction priors for hand and articulated object pose estimation.

Hand Pose Estimation Object

ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs

no code implementations • ICCV 2023 • Jiteng Mu, Shen Sang, Nuno Vasconcelos, Xiaolong Wang

While NeRF-based human representations have shown impressive novel view synthesis results, most methods still rely on a large number of images / views for training.

Novel View Synthesis

Efficient bimanual handover and rearrangement via symmetry-aware actor-critic learning

1 code implementation • IEEE International Conference on Robotics and Automation (ICRA) 2023 • Yunfei Li, Chaoyi Pan, Huazhe Xu, Xiaolong Wang, Yi Wu

We develop a symmetry-aware actor-critic framework that leverages the interchangeable roles of the two manipulators in the bimanual control setting to reduce the policy search space.

Reinforcement Learning (RL)
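One way to read "leverages the interchangeable roles of the two manipulators" is parameter sharing with swapped inputs: a single policy acts for either arm, always seeing the acting arm's own observation first and the other arm's second, so only one arm's policy has to be searched for. The sketch below shows that construction and the resulting swap property; it is our own illustration under that assumption, and the tiny linear `policy` with weights `W_SELF` and `W_OTHER` is hypothetical, not the paper's architecture. A real system would also need to mirror coordinates between the arms, which this sketch ignores.

```python
from typing import List, Tuple

Obs = List[float]

# Hypothetical weights for a tiny shared policy that outputs one action per arm.
W_SELF = [0.8, -0.3]
W_OTHER = [0.1, 0.5]


def policy(own: Obs, other: Obs) -> float:
    """Shared per-arm policy: conditions on the acting arm's observation first."""
    return sum(w * x for w, x in zip(W_SELF, own)) + sum(w * x for w, x in zip(W_OTHER, other))


def joint_action(obs_left: Obs, obs_right: Obs) -> Tuple[float, float]:
    """Both arms query the same policy with their roles swapped."""
    return policy(obs_left, obs_right), policy(obs_right, obs_left)
```

Because both arms share one set of weights, swapping the arms' observations swaps their actions, and any experience collected by either arm trains the same policy.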

Neural Volumetric Memory for Visual Locomotion Control

no code implementations • CVPR 2023 • Ruihan Yang, Ge Yang, Xiaolong Wang

To solve this problem, we follow the paradigm in computer vision that explicitly models the 3D geometry of the scene and propose Neural Volumetric Memory (NVM), a geometric memory architecture that explicitly accounts for the SE(3) equivariance of the 3D world.

Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios

1 code implementation • 30 Mar 2023 • Jie Xu, Yazhou Ren, Xiaolong Wang, Lei Feng, Zheng Zhang, Gang Niu, Xiaofeng Zhu

Multi-view clustering (MVC) aims to explore category structures among multi-view data in a self-supervised manner.

Clustering Representation Learning

FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models

no code implementations • ICCV 2023 • Jianglong Ye, Naiyan Wang, Xiaolong Wang

Recent works on generalizable NeRFs have shown promising results on novel view synthesis from single or few images.

Neural Rendering Novel View Synthesis

Rotating without Seeing: Towards In-hand Dexterity through Touch

no code implementations • 20 Mar 2023 • Zhao-Heng Yin, Binghao Huang, Yuzhe Qin, Qifeng Chen, Xiaolong Wang

Relying on touch-only sensing, we can directly deploy the policy on a real robot hand and rotate novel objects that were not seen during training.

Object

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

1 code implementation • CVPR 2023 • Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, Shalini De Mello

Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks.

Ranked #2 on Open-World Instance Segmentation on UVO (using extra training data)

Open Vocabulary Panoptic Segmentation Open Vocabulary Semantic Segmentation +4

Dynamic Inference With Grounding Based Vision and Language Models

no code implementations • CVPR 2023 • Burak Uzkent, Amanmeet Garg, Wentao Zhu, Keval Doshi, Jingru Yi, Xiaolong Wang, Mohamed Omar

For example, recent image and language models with more than 200M parameters have been proposed to learn visual grounding in the pre-training step and show impressive results on downstream vision and language tasks.

Language Modelling Referring Expression +3

Policy Adaptation from Foundation Model Feedback

no code implementations • CVPR 2023 • Yuying Ge, Annabella Macaluso, Li Erran Li, Ping Luo, Xiaolong Wang

When deploying the trained policy to a new task or a new environment, we first let the policy play with randomly generated instructions to record the demonstrations.

Decision Making

GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

2 code implementations • 13 Dec 2022 • Chenhongyi Yang, Jiarui Xu, Shalini De Mello, Elliot J. Crowley, Xiaolong Wang

In each GP Block, features are first grouped together by a fixed number of learnable group tokens; we then perform Group Propagation where global information is exchanged between the grouped features; finally, global information in the updated grouped features is returned back to the image features through a transformer decoder.

Image Classification Instance Segmentation +5
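The three stages of a GP Block can be written as three attention operations. This is a simplified sketch under our own assumptions, not GPViT's implementation: grouping is single-head cross-attention from the learnable group tokens to the image features; propagation is plain self-attention among the groups, a stand-in for whatever mixing operator GPViT actually uses; and ungrouping is cross-attention from the image features back to the updated groups with a residual connection, i.e., the decoder step without learned projections or feed-forward layers. In this sketch, only the G group tokens mix globally, so the attention cost is O(N·G + G²) rather than O(N²) for N image features.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention without learned projections (simplification)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gp_block(image_feats: Matrix, group_tokens: Matrix) -> Matrix:
    # 1) Grouping: each group token gathers information from all image features.
    groups = attend(group_tokens, image_feats, image_feats)
    # 2) Group propagation: groups exchange global information (self-attention as a stand-in).
    groups = add(groups, attend(groups, groups, groups))
    # 3) Ungrouping: image features read the updated groups back, decoder-style with a residual.
    return add(image_feats, attend(image_feats, groups, groups))
```

The output has the same shape as the input features, so blocks of this kind can be stacked without changing the (non-hierarchical) resolution.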

MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations

1 code implementation • 12 Dec 2022 • Nicklas Hansen, Yixin Lin, Hao Su, Xiaolong Wang, Vikash Kumar, Aravind Rajeswaran

We identify key ingredients for leveraging demonstrations in model learning -- policy pretraining, targeted exploration, and oversampling of demonstration data -- which form the three phases of our model-based RL framework.

Model-based Reinforcement Learning reinforcement-learning +1
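Of the three ingredients listed, oversampling demonstration data is the easiest to show in isolation: each training batch draws a fixed fraction from the small demonstration buffer and the rest from the agent's own interaction data, so demonstrations are seen far more often than their share of the data would imply. The sketch below is a minimal illustration; the `demo_fraction` parameter and the fixed-ratio scheme are our assumptions, not MoDem's exact schedule.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_batch(demos: Sequence[T], interactions: Sequence[T], batch_size: int,
                 demo_fraction: float, rng: random.Random) -> List[T]:
    """Mix demonstration and interaction samples at a fixed (hypothetical) ratio, with replacement."""
    n_demo = min(round(batch_size * demo_fraction), batch_size) if demos else 0
    batch = [rng.choice(demos) for _ in range(n_demo)]
    # Raises IndexError if the interaction buffer is empty but still needs to be sampled.
    batch += [rng.choice(interactions) for _ in range(batch_size - n_demo)]
    rng.shuffle(batch)
    return batch
```

For example, with 5 demonstration transitions and 1,000 interaction transitions (about 0.5% demonstrations overall), `demo_fraction=0.25` makes a quarter of every batch come from the demonstrations.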

On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline

1 code implementation • 12 Dec 2022 • Nicklas Hansen, Zhecheng Yuan, Yanjie Ze, Tongzhou Mu, Aravind Rajeswaran, Hao Su, Huazhe Xu, Xiaolong Wang

In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks.

Benchmarking Data Augmentation

DexPoint: Generalizable Point Cloud Reinforcement Learning for Sim-to-Real Dexterous Manipulation

no code implementations • 17 Nov 2022 • Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Hao Su, Xiaolong Wang

We empirically evaluate our method using an Allegro Hand to grasp novel objects in both simulation and the real world.

reinforcement-learning Reinforcement Learning (RL)

Visual Reinforcement Learning with Self-Supervised 3D Representations

1 code implementation • 13 Oct 2022 • Yanjie Ze, Nicklas Hansen, Yinbo Chen, Mohit Jain, Xiaolong Wang

A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which has the potential benefit of improved sample-efficiency and generalization through additional learning signal and inductive biases.

reinforcement-learning Reinforcement Learning (RL) +2

MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Pose

no code implementations • 13 Oct 2022 • Yang Fu, Ishan Misra, Xiaolong Wang

We propose MonoNeRF, a generalizable neural radiance field that can be trained on large-scale monocular videos of camera motion in static scenes, without any ground-truth annotations of depth or camera poses.

Depth Estimation Disentanglement +2

Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild

1 code implementation • 13 Oct 2022 • Kaifeng Zhang, Yang Fu, Shubhankar Borse, Hong Cai, Fatih Porikli, Xiaolong Wang

While 6D object pose estimation has wide applications across computer vision and robotics, it remains far from being solved due to the lack of annotations.

6D Pose Estimation 6D Pose Estimation using RGB +2

Transformers as Meta-Learners for Implicit Neural Representations

1 code implementation • 4 Aug 2022 • Yinbo Chen, Xiaolong Wang

Motivated by a generalized formulation of gradient-based meta-learning, we propose a formulation that uses Transformers as hypernetworks for INRs, which directly builds the whole set of INR weights with Transformers specialized as set-to-set mappings.

Meta-Learning

Graph Inverse Reinforcement Learning from Diverse Videos

no code implementations • 28 Jul 2022 • Sateesh Kumar, Jonathan Zamora, Nicklas Hansen, Rishabh Jangir, Xiaolong Wang

Research on Inverse Reinforcement Learning (IRL) from third-person videos has shown encouraging results on removing the need for manual reward design for robotic tasks.

reinforcement-learning Reinforcement Learning (RL) +1

Learning Continuous Grasping Function with a Dexterous Hand from Human Demonstrations

1 code implementation • 11 Jul 2022 • Jianglong Ye, Jiashun Wang, Binghao Huang, Yuzhe Qin, Xiaolong Wang

We first convert the large-scale human-object interaction trajectories to robot demonstrations via motion retargeting, and then use these demonstrations to train CGF.

Human-Object Interaction Detection motion retargeting

Category-Level 6D Object Pose Estimation in the Wild: A Semi-Supervised Learning Approach and A New Dataset

no code implementations • 30 Jun 2022 • Yang Fu, Xiaolong Wang

6D object pose estimation is one of the fundamental problems in computer vision and robotics research.

6D Pose Estimation 6D Pose Estimation using RGB +1

MSDF: A General Open-Domain Multi-Skill Dialog Framework

no code implementations • 17 Jun 2022 • Yu Zhao, Xinshuo Hu, Yunxin Li, Baotian Hu, Dongfang Li, Sichao Chen, Xiaolong Wang

In this paper, we propose a general Multi-Skill Dialog Framework, namely MSDF, which can be applied in different dialog tasks (e.g., knowledge-grounded dialog and persona-based dialog).

Learning Implicit Feature Alignment Function for Semantic Segmentation

1 code implementation • 17 Jun 2022 • Hanzhe Hu, Yinbo Chen, Jiarui Xu, Shubhankar Borse, Hong Cai, Fatih Porikli, Xiaolong Wang

As such, IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps in arbitrary resolutions.

Segmentation Semantic Segmentation

Medical Dialogue Response Generation with Pivotal Information Recalling

no code implementations • 17 Jun 2022 • Yu Zhao, Yunxin Li, Yuxiang Wu, Baotian Hu, Qingcai Chen, Xiaolong Wang, Yuxin Ding, Min Zhang

To mitigate this problem, we propose a medical response generation model with Pivotal Information Recalling (MedPIR), which is built on two components, i.e., a knowledge-aware dialogue graph encoder and a recall-enhanced generator.

Dialogue Generation Graph Attention +1

VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution

1 code implementation • CVPR 2022 • Zeyuan Chen, Yinbo Chen, Jingwen Liu, Xingqian Xu, Vidit Goel, Zhangyang Wang, Humphrey Shi, Xiaolong Wang

The learned implicit neural representation can be decoded to videos of arbitrary spatial resolution and frame rate.

Space-time Video Super-resolution Video Frame Interpolation +1

CATNet: Cross-event Attention-based Time-aware Network for Medical Event Prediction

no code implementations • 29 Apr 2022 • Sicen Liu, Xiaolong Wang, Yang Xiang, Hui Xu, Hui Wang, Buzhou Tang

It is a time-aware, event-aware and task-adaptive method with the following advantages: 1) it models heterogeneous and temporal information in a unified way, accounting for temporal irregularity both locally and globally; 2) it takes full advantage of correlations among different types of events via cross-event attention.

Time Series Analysis

From One Hand to Multiple Hands: Imitation Learning for Dexterous Manipulation from Single-Camera Teleoperation

no code implementations • 26 Apr 2022 • Yuzhe Qin, Hao Su, Xiaolong Wang

We propose to perform imitation learning for dexterous manipulation with a multi-finger robot hand from human demonstrations, and to transfer the policy to the real robot hand.

Imitation Learning

GIFS: Neural Implicit Function for General Shape Representation

1 code implementation • CVPR 2022 • Jianglong Ye, Yuntao Chen, Naiyan Wang, Xiaolong Wang

This limitation leads to tedious data processing (converting non-watertight raw data to watertight) as well as the inability to represent general object shapes in the real world.

3D Shape Reconstruction

Learning Generalizable Dexterous Manipulation from Human Grasp Affordance

no code implementations • 5 Apr 2022 • Yueh-Hua Wu, Jiashun Wang, Xiaolong Wang

In this paper, we propose to learn dexterous manipulation using large-scale demonstrations with diverse 3D objects in a category, which are generated from a human grasp affordance model.

Imitation Learning Representation Learning

Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos

1 code implementation • CVPR 2022 • Shaowei Liu, Subarna Tripathi, Somdeb Majumdar, Xiaolong Wang

To tackle this task, we first provide an automatic way to collect trajectory and hotspots labels on large-scale data.

Object

CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs

1 code implementation • CVPR 2022 • Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu

We represent the correspondence maps of different images as warped coordinate frames transformed from a canonical coordinate frame, i.e., the correspondence map, which describes the structure (e.g., the shape of a face), is controlled via a transformation.

Disentanglement

Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image

1 code implementation • CVPR 2022 • Xuanchi Ren, Xiaolong Wang

Novel view synthesis from a single image has recently attracted a lot of attention, and it has been primarily advanced by 3D deep learning and rendering techniques.

Novel View Synthesis

Temporal Difference Learning for Model Predictive Control

2 code implementations • 9 Mar 2022 • Nicklas Hansen, Xiaolong Wang, Hao Su

Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as the computational budget for planning increases.

Continuous Control Model Predictive Control

GroupViT: Semantic Segmentation Emerges from Text Supervision

2 code implementations • CVPR 2022 • Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang

With only text supervision and without any pixel-level annotations, GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner, i.e., without any further fine-tuning.

Ranked #3 on Unsupervised Semantic Segmentation with Language-image Pre-training on PascalVOC-20

Object Detection Scene Understanding +3

Multimodal data matters: language model pre-training over structured and unstructured electronic health records

1 code implementation • 25 Jan 2022 • Sicen Liu, Xiaolong Wang, Yongshuai Hou, Ge Li, Hui Wang, Hui Xu, Yang Xiang, Buzhou Tang

As two important textual modalities in electronic health records (EHR), both structured data (clinical codes) and unstructured data (clinical narratives) have recently been increasingly applied to the healthcare domain.

Decision Making Language Modelling +1

Paper
Code

Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation

no code implementations • 19 Jan 2022 • Rishabh Jangir, Nicklas Hansen, Sambaran Ghosal, Mohit Jain, Xiaolong Wang

We propose a setting for robotic manipulation in which the agent receives visual feedback from both a third-person camera and an egocentric camera mounted on the robot's wrist.

Reinforcement Learning (RL)

Paper

NovelD: A Simple yet Effective Exploration Criterion

1 code implementation • NeurIPS 2021 • Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian

We analyze NovelD thoroughly in MiniGrid and find that, empirically, it helps the agent explore the environment more uniformly, with a focus on exploring beyond the boundary.

Efficient Exploration Montezuma's Revenge +1

Paper
Code

Learning Continuous Environment Fields via Implicit Functions

no code implementations • ICLR 2022 • Xueting Li, Shalini De Mello, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz, Sifei Liu

We propose a novel scene representation that encodes reaching distance -- the distance from any position in the scene to a goal along a feasible trajectory.

Position Trajectory Prediction

Paper

Online Adaptation for Implicit Object Tracking and Shape Reconstruction in the Wild

1 code implementation • 24 Nov 2021 • Jianglong Ye, Yuntao Chen, Naiyan Wang, Xiaolong Wang

Tracking and reconstructing 3D objects from cluttered scenes are key components of computer vision, robotics, and autonomous driving systems.

3D Shape Reconstruction Autonomous Driving +1

Paper
Code

Multi-Person 3D Motion Prediction with Multi-Range Transformers

1 code implementation • NeurIPS 2021 • Jiashun Wang, Huazhe Xu, Medhini Narasimhan, Xiaolong Wang

Thus, instead of predicting each human pose trajectory in isolation, we introduce a Multi-Range Transformers model which consists of a local-range encoder for individual motion and a global-range encoder for social interactions.

Ranked #3 on Multi-Person Pose forecasting on Expi - common actions split

motion prediction Multi-Person Pose forecasting +1

Paper
Code

Video Autoencoder: self-supervised disentanglement of static 3D structure and motion

no code implementations • ICCV 2021 • Zihang Lai, Sifei Liu, Alexei A. Efros, Xiaolong Wang

Relying on temporal continuity in videos, our work assumes that the 3D scene structure in nearby video frames remains static.

Disentanglement Novel View Synthesis +2

Paper

Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization

1 code implementation • 29 Sep 2021 • Chieko Sarah Imai, Minghao Zhang, Yuchen Zhang, Marcin Kierebinski, Ruihan Yang, Yuzhe Qin, Xiaolong Wang

While Reinforcement Learning (RL) provides a promising paradigm for agile locomotion skills with vision inputs in simulation, it is still very challenging to deploy the RL policy in the real world.

Reinforcement Learning (RL)

194

Paper
Code

DexMV: Imitation Learning for Dexterous Manipulation from Human Videos

1 code implementation • 12 Aug 2021 • Yuzhe Qin, Yueh-Hua Wu, Shaowei Liu, Hanwen Jiang, Ruihan Yang, Yang Fu, Xiaolong Wang

While significant progress has been made on understanding hand-object interactions in computer vision, it is still very challenging for robots to perform complex dexterous manipulation.

Imitation Learning motion retargeting +1

Paper
Code

Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers

1 code implementation • ICLR 2022 • Ruihan Yang, Minghao Zhang, Nicklas Hansen, Huazhe Xu, Xiaolong Wang

Our key insight is that proprioceptive states only offer contact measurements for immediate reaction, whereas an agent equipped with visual sensory observations can learn to proactively maneuver environments with obstacles and uneven terrain by anticipating changes in the environment many steps ahead.

Reinforcement Learning (RL)

194

Paper
Code

Test-Time Personalization with a Transformer for Human Pose Estimation

no code implementations • NeurIPS 2021 • Yizhuo Li, Miao Hao, Zonglin Di, Nitesh B. Gundavarapu, Xiaolong Wang

During test time, we personalize and adapt our model by fine-tuning with the self-supervised objective.

Pose Estimation

Paper

Sentence-level Online Handwritten Chinese Character Recognition

no code implementations • 4 Jul 2021 • Yunxin Li, Qian Yang, Qingcai Chen, Lin Ma, Baotian Hu, Xiaolong Wang, Yuxin Ding

Single online handwritten Chinese character recognition (single OLHCCR) has achieved prominent performance.

Sentence Word Embeddings

Paper

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

3 code implementations • NeurIPS 2021 • Nicklas Hansen, Hao Su, Xiaolong Wang

Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL in environments with unseen visuals.

Data Augmentation Q-Learning +1

152

Paper
Code

GlyphCRM: Bidirectional Encoder Representation for Chinese Character with its Glyph

no code implementations • 1 Jul 2021 • Yunxin Li, Yu Zhao, Baotian Hu, Qingcai Chen, Yang Xiang, Xiaolong Wang, Yuxin Ding, Lin Ma

Previous works indicate that the glyph of Chinese characters contains rich semantic information and has the potential to enhance the representation of Chinese characters.

Paper

Single RGB-D Camera Teleoperation for General Robotic Manipulation

no code implementations • 28 Jun 2021 • Quan Vuong, Yuzhe Qin, Runlin Guo, Xiaolong Wang, Hao Su, Henrik Christensen

We propose a teleoperation system that uses a single RGB-D camera as the human motion capture device.

Paper

DAIR: Disentangled Attention Intrinsic Regularization for Safe and Efficient Bimanual Manipulation

no code implementations • 10 Jun 2021 • Minghao Zhang, Pingcheng Jian, Yi Wu, Huazhe Xu, Xiaolong Wang

We address the problem of safely solving complex bimanual robot manipulation tasks with sparse rewards.

Robot Manipulation

Paper

Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time

no code implementations • CVPR 2021 • Shaowei Liu, Hanwen Jiang, Jiarui Xu, Sifei Liu, Xiaolong Wang

Estimating 3D hand and object pose from a single image is an extremely challenging problem: hands and objects are often self-occluded during interactions, and 3D annotations are scarce, since even humans cannot perfectly label the ground truth from a single image.

Ranked #7 on hand-object pose on HO-3D

hand-object pose Object

Paper

Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency

no code implementations • ICCV 2021 • Haiping Wu, Xiaolong Wang

In this paper, we propose a novel contrastive learning method which explores the cross-video relation by using cycle-consistency for general image representation learning.

Action Recognition Contrastive Learning +4

Paper

Robust Object Detection via Instance-Level Temporal Cycle Confusion

1 code implementation • ICCV 2021 • Xin Wang, Thomas E. Huang, Benlin Liu, Fisher Yu, Xiaolong Wang, Joseph E. Gonzalez, Trevor Darrell

Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real-world applications.

Object object-detection +2

Paper
Code

A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation

1 code implementation • ICCV 2021 • Jiteng Mu, Weichao Qiu, Adam Kortylewski, Alan Yuille, Nuno Vasconcelos, Xiaolong Wang

To deal with the large shape variance, we introduce Articulated Signed Distance Functions (A-SDF) to represent articulated shapes with a disentangled latent space, where we have separate codes for encoding shape and articulation.

Test-time Adaptation

Paper
Code

Hand-Object Contact Consistency Reasoning for Human Grasps Generation

no code implementations • ICCV 2021 • Hanwen Jiang, Shaowei Liu, Jiashun Wang, Xiaolong Wang

Based on the hand-object contact consistency, we design novel objectives in training the human grasp generation model and also a new self-supervised task which allows the grasp generation network to be adjusted even during test time.

Grasp Generation Object +1

Paper

Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective

5 code implementations • ICCV 2021 • Jiarui Xu, Xiaolong Wang

To learn generalizable representations for correspondence at scale, a variety of self-supervised pretext tasks have been proposed to explicitly perform object-level or patch-level similarity learning.

Contrastive Learning Object +5

3,866

Paper
Code

Region Similarity Representation Learning

1 code implementation • ICCV 2021 • Tete Xiao, Colorado J Reed, Xiaolong Wang, Kurt Keutzer, Trevor Darrell

We present Region Similarity Representation Learning (ReSim), a new approach to self-supervised representation learning for localization-based tasks such as object detection and segmentation.

Instance Segmentation Object +5

Paper
Code

Solving Compositional Reinforcement Learning Problems via Task Reduction

1 code implementation • ICLR 2021 • Yunfei Li, Yilin Wu, Huazhe Xu, Xiaolong Wang, Yi Wu

We propose a novel learning paradigm, Self-Imitation via Reduction (SIR), for solving compositional reinforcement learning problems.

Continuous Control reinforcement-learning +1

Paper
Code

Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

2 code implementations • ICLR 2021 • Zhenggang Tang, Chao Yu, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei Fang, Simon Du, Yu Wang, Yi Wu

We propose a simple, general, and effective technique, Reward Randomization, for discovering diverse strategic policies in complex multi-agent games.

Paper
Code

Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency

1 code implementation • ICLR 2021 • Qiang Zhang, Tete Xiao, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

We propose "dynamics cycles" that align dynamic robot behavior across two domains using a cycle-consistency constraint.

Friction Imitation Learning +1

Paper
Code

Learning Continuous Image Representation with Local Implicit Image Function

2 code implementations • CVPR 2021 • Yinbo Chen, Sifei Liu, Xiaolong Wang

How to represent an image?

Ranked #2 on Image Super-Resolution on DIV2K val - 4x upscaling (SSIM metric)

3D Reconstruction Image Super-Resolution

1,208

Paper
Code

BeBold: Exploration Beyond the Boundary of Explored Regions

2 code implementations • 15 Dec 2020 • Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian

In this paper, we analyze the pros and cons of each method and propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).

Efficient Exploration NetHack

932

Paper
Code

Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes

1 code implementation • CVPR 2021 • Jiashun Wang, Huazhe Xu, Jingwei Xu, Sifei Liu, Xiaolong Wang

Synthesizing 3D human motion plays an important role in many graphics applications as well as understanding human activity.

Motion Synthesis

Paper
Code

Online Adaptation for Consistent Mesh Reconstruction in the Wild

no code implementations • NeurIPS 2020 • Xueting Li, Sifei Liu, Shalini De Mello, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz

This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild.

3D Reconstruction

Paper

MedWriter: Knowledge-Aware Medical Text Generation

no code implementations • COLING 2020 • Youcheng Pan, Qingcai Chen, Weihua Peng, Xiaolong Wang, Baotian Hu, Xin Liu, Junying Chen, Wenxiu Zhou

Exploiting domain knowledge to guarantee the correctness of generated text has been a hot topic in recent years, especially in highly specialized domains such as medicine.

Text Generation

Paper

Generalization in Reinforcement Learning by Soft Data Augmentation

2 code implementations • 26 Nov 2020 • Nicklas Hansen, Xiaolong Wang

Extensive efforts have been made to improve the generalization ability of Reinforcement Learning (RL) methods via domain randomization and data augmentation.

Data Augmentation reinforcement-learning +1

152

Paper
Code

Continual Learning Long Short Term Memory

no code implementations • Findings of the Association for Computational Linguistics 2020 • Xin Guo, Yu Tian, Qinghan Xue, Panos Lampropoulos, Steven Eliuk, Kenneth Barner, Xiaolong Wang

Catastrophic forgetting in neural networks refers to the performance degradation of deep learning models on previous tasks while they learn new tasks.

Continual Learning Spoken Language Understanding

Paper

Multi-Agent Collaboration via Reward Attribution Decomposition

2 code implementations • 16 Oct 2020 • Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian

In this work, we propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge and supports ad hoc team play.

Dota 2 Multi-agent Reinforcement Learning +2

2,505

Paper
Code

Reducing Class Collapse in Metric Learning with Easy Positive Sampling

no code implementations • 28 Sep 2020 • Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell

We theoretically prove and empirically show that under reasonable noise assumptions, prevalent embedding losses in metric learning, e.g., triplet loss, tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.

Image Retrieval Metric Learning +1

Paper

Hierarchical Style-based Networks for Motion Synthesis

no code implementations • ECCV 2020 • Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Xiaolong Wang, Trevor Darrell

Generating diverse and natural human motion is one of the long-standing goals for creating intelligent characters in the animated world.

Motion Synthesis

Paper

Fast ORB-SLAM without Keypoint Descriptors

no code implementations • 22 Aug 2020 • Qiang Fu, Hongshan Yu, Xiaolong Wang, Zhengeng Yang, Hong Zhang, Ajmal Mian

ORB-SLAM2 \cite{orbslam2} is a benchmark method in this domain; however, it spends significant time computing descriptors that are never reused unless a frame is selected as a keyframe.

Robotics Computational Geometry I.4.0; I.4.9

Paper

What Should Not Be Contrastive in Contrastive Learning

no code implementations • ICLR 2021 • Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell

Recent self-supervised contrastive methods have been able to produce impressive transferable visual representations by learning to be invariant to different data augmentations.

Contrastive Learning

Paper

Learning Long-term Visual Dynamics with Region Proposal Interaction Networks

1 code implementation • ICLR 2021 • Haozhi Qi, Xiaolong Wang, Deepak Pathak, Yi Ma, Jitendra Malik

Learning long-term dynamics models is the key to understanding physical common sense.

Ranked #1 on Visual Reasoning on PHYRE-1B-Within

Common Sense Reasoning Object +2

110

Paper
Code

Self-Supervised Policy Adaptation during Deployment

2 code implementations • ICLR 2021 • Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal.

109

Paper
Code

Deep Isometric Learning for Visual Recognition

1 code implementation • ICML 2020 • Haozhi Qi, Chong You, Xiaolong Wang, Yi Ma, Jitendra Malik

Initialization, normalization, and skip connections are believed to be three indispensable techniques for training very deep convolutional neural networks and obtaining state-of-the-art performance.

142

Paper
Code

Compositional Video Synthesis with Action Graphs

1 code implementation • 27 Jun 2020 • Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson

Our generative model for this task (AG2Vid) disentangles motion and appearance features and, by incorporating a scheduling mechanism for actions, facilitates timely and coordinated video generation.

Scheduling Video Generation +2

Paper
Code

Rethinking preventing class-collapsing in metric learning with margin-based losses

no code implementations • ICCV 2021 • Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell

We theoretically prove and empirically show that under reasonable noise assumptions, margin-based losses tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.

Image Retrieval Metric Learning +1

Paper

Class-Aware Domain Adaptation for Improving Adversarial Robustness

no code implementations • 10 May 2020 • Xianxu Hou, Jingxin Liu, Bolei Xu, Xiaolong Wang, Bozhi Liu, Guoping Qiu

To improve the adversarial robustness of neural networks, adversarial training has been proposed to train networks by injecting adversarial examples into the training data.

Adversarial Attack Adversarial Defense +2

Paper

State-Only Imitation Learning for Dexterous Manipulation

no code implementations • 7 Apr 2020 • Ilija Radosavovic, Xiaolong Wang, Lerrel Pinto, Jitendra Malik

To tackle this setting, we train an inverse dynamics model and use it to predict actions for state-only demonstrations.

Imitation Learning

Paper

Multi-Task Reinforcement Learning with Soft Modularization

1 code implementation • NeurIPS 2020 • Ruihan Yang, Huazhe Xu, Yi Wu, Xiaolong Wang

While training multiple tasks jointly allows the policies to share parameters across different tasks, the optimization problem becomes non-trivial: it remains unclear which parameters in the network should be reused across tasks, and how the gradients from different tasks may interfere with each other.

Ranked #1 on Meta-Learning on MT50

Meta-Learning Multi-Task Learning +2

100

Paper
Code

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

1 code implementation • ICLR 2020 • Qian Long, Zihan Zhou, Abhinav Gupta, Fei Fang, Yi Wu, Xiaolong Wang

In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large.

Multi-agent Reinforcement Learning reinforcement-learning +1

112

Paper
Code

Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning

10 code implementations • ICCV 2021 • Yinbo Chen, Zhuang Liu, Huijuan Xu, Trevor Darrell, Xiaolong Wang

The edge between these two lines of work has been underexplored, and the effectiveness of meta-learning in few-shot learning remains unclear.

Few-Shot Learning General Classification

586

Paper
Code

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

1 code implementation • CVPR 2020 • Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell

Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations.

Action Recognition Object

138

Paper
Code

A Deep Learning-Based System for PharmaCoNER

no code implementations • WS 2019 • Ying Xiong, Yedan Shen, Yuanhang Huang, Shuai Chen, Buzhou Tang, Xiaolong Wang, Qingcai Chen, Jun Yan, Yi Zhou

The Biological Text Mining Unit at BSC and CNIO organized the first shared task on chemical & drug mention recognition from Spanish medical texts, called PharmaCoNER (Pharmacological Substances, Compounds and proteins and Named Entity Recognition track), in 2019. It includes two tracks: one for NER offset and entity classification (track 1) and the other for concept indexing (track 2).

General Classification named-entity-recognition +2

Paper

Family history information extraction via deep joint learning

no code implementations • BMC Medical Informatics and Decision Making 2019 • Xue Shi, Dehuan Jiang, Yuanhang Huang, Xiaolong Wang, Qingcai Chen, Jun Yan, Buzhou Tang

For this task, we propose a system based on deep joint learning methods to extract FH information.

Decision Making

Paper

Test-Time Training with Self-Supervision for Generalization under Distribution Shifts

3 code implementations • 29 Sep 2019 • Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei A. Efros, Moritz Hardt

In this paper, we propose Test-Time Training, a general approach for improving the performance of predictive models when training and test data come from different distributions.

Ranked #34 on Language Modelling on LAMBADA

Building change detection for remote sensing images CARLA MAP Leaderboard +6

108

Paper
Code

Joint-task Self-supervised Learning for Temporal Correspondence

2 code implementations • NeurIPS 2019 • Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang

Our learning process integrates two highly related tasks: tracking large image regions and establishing fine-grained pixel-level associations between consecutive video frames.

Ranked #73 on Semi-Supervised Video Object Segmentation on DAVIS 2017 (val)

Object Tracking Self-Supervised Learning +2

176

Paper
Code

IS THE LABEL TRUSTFUL: TRAINING BETTER DEEP LEARNING MODEL VIA UNCERTAINTY MINING NET

no code implementations • 25 Sep 2019 • Yang Sun, Abhishek Kolagunda, Steven Eliuk, Xiaolong Wang

During the training stage, we utilize all the available data (labeled and unlabeled) to train the classifier via a semi-supervised generative framework.

Paper

Test-Time Training for Out-of-Distribution Generalization

no code implementations • 25 Sep 2019 • Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei A. Efros, Moritz Hardt

We introduce a general approach, called test-time training, for improving the performance of predictive models when test and training data come from different distributions.

Image Classification Out-of-Distribution Generalization +1

Paper

Deep Kinship Verification via Appearance-shape Joint Prediction and Adaptation-based Approach

no code implementations • 15 May 2019 • Heming Zhang, Xiaolong Wang, C. -C. Jay Kuo

Kinship verification aims to identify the kin relation between two given face images.

Face Recognition Kinship Verification

Paper

Accelerating Proposal Generation Network for Fast Face Detection on Mobile Devices

no code implementations • 27 Apr 2019 • Heming Zhang, Xiaolong Wang, Jingwen Zhu, C. -C. Jay Kuo

In this work, we present a proposal generation acceleration framework for real-time face detection.

Face Detection

Paper

Learning Correspondence from the Cycle-Consistency of Time

1 code implementation • CVPR 2019 • Xiaolong Wang, Allan Jabri, Alexei A. Efros

We introduce a self-supervised method for learning visual correspondence from unlabeled video.

Ranked #79 on Semi-Supervised Video Object Segmentation on DAVIS 2017 (val)

Optical Flow Estimation Semantic Segmentation +3

716

Paper
Code

Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments

no code implementations • CVPR 2019 • Xueting Li, Sifei Liu, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz

In order to predict valid affordances and learn possible 3D human poses in indoor scenes, we need to understand the semantic and geometric structure of a scene as well as its potential interactions with a human.

valid

Paper

MICIK: MIning Cross-Layer Inherent Similarity Knowledge for Deep Model Compression

no code implementations • 3 Feb 2019 • Jie Zhang, Xiaolong Wang, Dawei Li, Shalini Ghosh, Abhishek Kolagunda, Yalin Wang

State-of-the-art deep model compression methods exploit the low-rank approximation and sparsity pruning to remove redundant parameters from a learned hidden layer.

Knowledge Distillation Model Compression

Paper

Spatio-Temporal Action Graph Networks

1 code implementation • 4 Dec 2018 • Roei Herzig, Elad Levi, Huijuan Xu, Hang Gao, Eli Brosh, Xiaolong Wang, Amir Globerson, Trevor Darrell

Events defined by the interaction of objects in a scene are often of critical importance; yet important events may have insufficient labeled examples to train a conventional deep model to generalize to future object appearance.

Activity Recognition Autonomous Driving +3

Paper
Code

Visual Semantic Navigation using Scene Priors

1 code implementation • ICLR 2019 • Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi

Do we use the semantic/functional priors we have built over years to efficiently search and navigate?

Navigate

Paper
Code

Interpretable Intuitive Physics Model

no code implementations • ECCV 2018 • Tian Ye, Xiaolong Wang, James Davidson, Abhinav Gupta

In order to demonstrate that our system models these underlying physical properties, we train our model on collisions of different shapes (cubes, cones, cylinders, spheres, etc.)

Friction

Paper

Videos as Space-Time Region Graphs

no code implementations • ECCV 2018 • Xiaolong Wang, Abhinav Gupta

These nodes are connected by two types of relations: (i) similarity relations capturing the long range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects.

Ranked #34 on Action Classification on Charades (using extra training data)

Action Classification Action Recognition

Paper

Dynamically Hierarchy Revolution: DirNet for Compressing Recurrent Neural Network on Mobile Devices

no code implementations • 4 Jun 2018 • Jie Zhang, Xiaolong Wang, Dawei Li, Yalin Wang

Recurrent neural networks (RNNs) achieve cutting-edge performance on a variety of problems.

Dictionary Learning Language Modelling +1

Paper

LSDSCC: a Large Scale Domain-Specific Conversational Corpus for Response Generation with Diversity Oriented Evaluation Metrics

no code implementations • NAACL 2018 • Zhen Xu, Nan Jiang, Bingquan Liu, Wenge Rong, Bowen Wu, Baoxun Wang, Zhuoran Wang, Xiaolong Wang

The experimental results have shown that our proposed corpus can be taken as a new benchmark dataset for the neural response generation (NRG) task, and the presented metrics are promising to guide the optimization of NRG models by reasonably quantifying the diversity of the generated responses.

Machine Translation Response Generation

Paper

Talking Face Generation by Conditional Recurrent Adversarial Network

1 code implementation • 13 Apr 2018 • Yang Song, Jingwen Zhu, Dawei Li, Xiaolong Wang, Hairong Qi

Given an arbitrary face image and an arbitrary speech clip, the proposed work attempts to generate a talking face video with accurate lip synchronization while maintaining smooth transitions of both lip and facial movement over the entire video clip.

Constrained Lip-synchronization Video Generation

Paper
Code

Binge Watching: Scaling Affordance Learning from Sitcoms

no code implementations • CVPR 2017 • Xiaolong Wang, Rohit Girdhar, Abhinav Gupta

In this paper, we tackle the challenge of creating one of the biggest datasets for learning affordances.

Paper

3D Human Pose Estimation in the Wild by Adversarial Learning

no code implementations • CVPR 2018 • Wei Yang, Wanli Ouyang, Xiaolong Wang, Jimmy Ren, Hongsheng Li, Xiaogang Wang

Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground truth, which helps force the pose estimator to generate anthropometrically valid poses even with images in the wild.

Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (Use Video Sequence metric)

Monocular 3D Human Pose Estimation valid

Paper

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

3 code implementations • CVPR 2018 • Xiaolong Wang, Yufei Ye, Abhinav Gupta

Given a learned knowledge graph (KG), our approach takes as input semantic embeddings for each node (representing visual category).

Knowledge Graphs Zero-Shot Learning

912

Paper
Code

LCANet: End-to-End Lipreading with Cascaded Attention-CTC

no code implementations • 13 Mar 2018 • Kai Xu, Dawei Li, Nick Cassimatis, Xiaolong Wang

In this paper, we propose LCANet, an end-to-end deep neural network based lipreading system.

Ranked #2 on Lipreading on GRID corpus (mixed-speech)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper

Boundary-sensitive Network for Portrait Segmentation

no code implementations • 22 Dec 2017 • Xianzhi Du, Xiaolong Wang, Dawei Li, Jingwen Zhu, Serafettin Tasci, Cameron Upright, Stephen Walsh, Larry Davis

Compared to the general semantic segmentation problem, portrait segmentation has a higher precision requirement in boundary areas.

Attribute Image Segmentation +3

Paper

Non-local Neural Networks

31 code implementations • CVPR 2018 • Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.

Ranked #8 on Action Classification on Toyota Smarthome dataset (using extra training data)

Action Classification Action Recognition +5

26,137

Paper
Code

Predicting Users' Negative Feedbacks in Multi-Turn Human-Computer Dialogues

no code implementations • IJCNLP 2017 • Xin Wang, Jianan Wang, Yuanchao Liu, Xiaolong Wang, Zhuoran Wang, Baoxun Wang

In addition, strategies for obtaining distant supervision data for pre-training are also discussed in this work.

Data Augmentation

Paper

Neural Response Generation via GAN with an Approximate Embedding Layer

no code implementations • EMNLP 2017 • Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, Xiaolong Wang, Zhuoran Wang, Chao Qi

This paper presents a Generative Adversarial Network (GAN) to model single-turn short-text conversations, which trains a sequence-to-sequence (Seq2Seq) network for response generation simultaneously with a discriminative classifier that measures the differences between human-produced responses and machine-generated ones.

Generative Adversarial Network Machine Translation +1

Paper

DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices

no code implementations • 16 Aug 2017 • Dawei Li, Xiaolong Wang, Deguang Kong

As observed in the experiment, DeepRebirth achieves more than 3x speed-up and 2.5x run-time memory saving on GoogLeNet with only a 0.4% drop in top-5 accuracy on ImageNet.

Model Compression

Paper

Transitive Invariance for Self-supervised Visual Representation Learning

no code implementations • ICCV 2017 • Xiaolong Wang, Kaiming He, Abhinav Gupta

The objects are connected by two types of edges which correspond to two types of invariance: "different instances but a similar viewpoint and category" and "different viewpoints of the same instance".

Multi-Task Learning object-detection +4

Paper

Temporal Dynamic Graph LSTM for Action-driven Video Object Detection

no code implementations • ICCV 2017 • Yuan Yuan, Xiaodan Liang, Xiaolong Wang, Dit-yan Yeung, Abhinav Gupta

A common issue, however, is that objects of interest that are not involved in human actions are often absent from global action descriptions, a problem known as "missing labels".

Ranked #3 on Weakly Supervised Object Detection on Charades

Object object-detection +3

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

4 code implementations • CVPR 2017 • Xiaolong Wang, Abhinav Shrivastava, Abhinav Gupta

We propose to learn an adversarial network that generates examples with occlusions and deformations.

Ranked #20 on Object Detection on PASCAL VOC 2007 (using extra training data)

Object object-detection +1

Incorporating Label Dependency for Answer Quality Tagging in Community Question Answering via CNN-LSTM-CRF

1 code implementation • COLING 2016 • Yang Xiang, Xiaoqiang Zhou, Qingcai Chen, Zhihui Zheng, Buzhou Tang, Xiaolong Wang, Yang Qin

In community question answering (cQA), the quality of answers is determined by the matching degree between question-answer pairs and the correlation among the answers.

Community Question Answering

Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention

2 code implementations • 30 May 2016 • Yang Liu, Chengjie Sun, Lei Lin, Xiaolong Wang

In our approach, the encoding of a sentence is a two-stage process.

Ranked #73 on Natural Language Inference on SNLI

Natural Language Inference Sentence

Incorporating Loose-Structured Knowledge into Conversation Modeling via Recall-Gate LSTM

1 code implementation • 17 May 2016 • Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, Xiaolong Wang

Modeling human conversations is essential for building satisfying chat-bots with multi-turn dialog ability.

Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

no code implementations • 6 Apr 2016 • Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, Abhinav Gupta

Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects.

Action Recognition Temporal Action Localization

Generative Image Modeling using Style and Structure Adversarial Networks

no code implementations • 17 Mar 2016 • Xiaolong Wang, Abhinav Gupta

Current generative frameworks use end-to-end learning and generate images by sampling from a uniform noise distribution.

Generative Adversarial Network Image Generation

Actions ~ Transformations

1 code implementation • CVPR 2016 • Xiaolong Wang, Ali Farhadi, Abhinav Gupta

In this paper, we propose a novel representation for actions by modeling an action as a transformation which changes the state of the environment before the action happens (precondition) to the state after the action (effect).

Action Recognition Temporal Action Localization

Computing Semantic Text Similarity Using Rich Features

no code implementations • PACLIC 2015 • Yang Liu, Chengjie Sun, Lei Lin, Xiaolong Wang, Yuming Zhao

Machine Translation Question Answering +4

Predicting Polarities of Tweets by Composing Word Embeddings with Long Short-Term Memory

no code implementations • IJCNLP 2015 • Xin Wang, Yuanchao Liu, Chengjie Sun, Baoxun Wang, Xiaolong Wang

Feature Engineering Semantic Textual Similarity +1

Chinese Grammatical Error Diagnosis Using Ensemble Learning

no code implementations • WS 2015 • Yang Xiang, Xiaolong Wang, Wenying Han, Qinghua Hong

Ensemble Learning Grammatical Error Detection

Answer Sequence Learning with Neural Networks for Answer Selection in Community Question Answering

no code implementations • IJCNLP 2015 • Xiaoqiang Zhou, Baotian Hu, Qingcai Chen, Buzhou Tang, Xiaolong Wang

In this paper, the answer selection problem in community question answering (CQA) is regarded as an answer sequence labeling task, and a novel approach based on a recurrent architecture is proposed for this problem.

Answer Selection Community Question Answering

HITSZ-ICRC: Exploiting Classification Approach for Answer Selection in Community Question Answering

no code implementations • SEMEVAL 2015 • Yongshuai Hou, Cong Tan, Xiaolong Wang, Yaoyun Zhang, Jun Xu, Qingcai Chen

Answer Selection Community Question Answering +1

HITSZ-ICRC: An Integration Approach for QA TempEval Challenge

no code implementations • SEMEVAL 2015 • Yongshuai Hou, Cong Tan, Qingcai Chen, Xiaolong Wang

Information Retrieval Question Answering

ICRC-HIT: A Deep Learning based Comment Sequence Labeling System for Answer Selection Challenge

no code implementations • SEMEVAL 2015 • Xiaoqiang Zhou, Baotian Hu, Jiaxin Lin, Yang Xiang, Xiaolong Wang

Answer Selection Community Question Answering +3

yiGou: A Semantic Text Similarity Computing System Based on SVM

no code implementations • SEMEVAL 2015 • Yang Liu, Chengjie Sun, Lei Lin, Xiaolong Wang

Machine Translation Question Answering +3

In Defense of the Direct Perception of Affordances

no code implementations • 5 May 2015 • David F. Fouhey, Xiaolong Wang, Abhinav Gupta

The field of functional recognition or affordance estimation from images has seen a revival in recent years.

Unsupervised Learning of Visual Representations using Videos

no code implementations • ICCV 2015 • Xiaolong Wang, Abhinav Gupta

Is strong supervision necessary for learning a good visual representation?

Surface Normal Estimation Visual Tracking

Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

no code implementations • CVPR 2013 • Xiaolong Wang, Liang Lin, Lichao Huang, Shuicheng Yan

This paper proposes a reconfigurable model to recognize and detect multiclass (or multiview) objects with large variation in appearance.

Object Recognition valid

Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

no code implementations • NeurIPS 2012 • Xiaolong Wang, Liang Lin

A discriminative learning algorithm, extended from CCCP [23], is proposed to train the model in a dynamical manner: the model structure (e.g., the configuration of the leaf-nodes associated with the or-nodes) is automatically determined while the multi-layer parameters are optimized during the iterations.

Graph Learning

Deep Joint Task Learning for Generic Object Extraction

no code implementations • NeurIPS 2014 • Xiaolong Wang, Liliang Zhang, Liang Lin, Zhujin Liang, WangMeng Zuo

We present a general joint task learning framework, in which each task (either object localization or object segmentation) is tackled via a multi-layer convolutional neural network, and the two networks work collaboratively to boost performance.

Object Object Localization +1

Learning Contour-Fragment-based Shape Model with And-Or Tree Representation

no code implementations • 3 Feb 2015 • Liang Lin, Xiaolong Wang, Wei Yang, Jian-Huang Lai

This paper proposes a simple yet effective method to learn the hierarchical object shape model consisting of local contour fragments, which represents a category of shapes in the form of an And-Or tree.

Clustering Edge Detection +1

Discriminatively Trained And-Or Graph Models for Object Shape Detection

no code implementations • 2 Feb 2015 • Liang Lin, Xiaolong Wang, Wei Yang, Jian-Huang Lai

In this paper, we investigate a novel reconfigurable part-based model, namely the And-Or graph model, to recognize object shapes in images.

object-detection Object Detection

An Expressive Deep Model for Human Action Parsing from A Single Image

no code implementations • 2 Feb 2015 • Zhujin Liang, Xiaolong Wang, Rui Huang, Liang Lin

This paper addresses a newly emerging task in vision and multimedia research: recognizing human actions from still images.

Action Parsing Action Understanding +2

3D Human Activity Recognition with Reconfigurable Convolutional Neural Networks

no code implementations • 26 Jan 2015 • Keze Wang, Xiaolong Wang, Liang Lin, Meng Wang, WangMeng Zuo

Our model thus advances existing approaches in two aspects: (i) it acts directly on the raw inputs (grayscale-depth data) to conduct recognition instead of relying on hand-crafted features, and (ii) the model structure can be dynamically adjusted accounting for the temporal variations of human activities, i.e., the network configuration is allowed to be partially activated during inference.

Human Activity Recognition

Designing Deep Networks for Surface Normal Estimation

no code implementations • CVPR 2015 • Xiaolong Wang, David F. Fouhey, Abhinav Gupta

We show that incorporating several constraints (man-made, Manhattan world) and meaningful intermediate representations (room layout, edge labels) in the architecture leads to state-of-the-art performance on surface normal estimation.

Scene Understanding Surface Normal Estimation