no code implementations • 20 Dec 2023 • Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao
We propose a unified multi-modal framework, Audio-Visual Conversational Attention (AV-CONV), for the joint prediction of conversation behaviors (speaking and listening) for both the camera wearer and all other social partners present in the egocentric video.
no code implementations • NeurIPS 2023 • Mason Wang, Samuel Clarke, Jui-Hsien Wang, Ruohan Gao, Jiajun Wu
A room's acoustic properties are a product of the room's geometry, the objects within the room, and their specific positions.
no code implementations • 2 Nov 2023 • Ruohan Zhang, Sharon Lee, Minjune Hwang, Ayano Hiranaka, Chen Wang, Wensi Ai, Jin Jie Ryan Tan, Shreya Gupta, Yilun Hao, Gabrael Levine, Ruohan Gao, Anthony Norcia, Li Fei-Fei, Jiajun Wu
We present Neural Signal Operated Intelligent Robots (NOIR), a general-purpose, intelligent brain-robot interface system that enables humans to command robots to perform everyday activities through brain signals.
no code implementations • CVPR 2023 • Samuel Clarke, Ruohan Gao, Mason Wang, Mark Rau, Julia Xu, Jui-Hsien Wang, Doug L. James, Jiajun Wu
Objects make unique sounds under different perturbations, environmental conditions, and poses relative to the listener.
1 code implementation • 1 Jun 2023 • Ruohan Gao, Hao Li, Gokul Dharan, Zhuzhu Wang, Chengshu Li, Fei Xia, Silvio Savarese, Li Fei-Fei, Jiajun Wu
We introduce Sonicverse, a multisensory simulation platform with integrated audio-visual simulation for training household agents that can both see and hear.
no code implementations • CVPR 2023 • Ruohan Gao, Yiming Dou, Hao Li, Tanmay Agarwal, Jeannette Bohg, Yunzhu Li, Li Fei-Fei, Jiajun Wu
We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning, centered around object recognition, reconstruction, and manipulation with sight, sound, and touch.
no code implementations • 29 Apr 2023 • Trevor Standley, Ruohan Gao, Dawn Chen, Jiajun Wu, Silvio Savarese
For example, we can train a model to predict the object category from the listing text, or the mass and price from the product listing image.
no code implementations • 10 Mar 2023 • Hong-Xing Yu, Michelle Guo, Alireza Fathi, Yen-Yu Chang, Eric Ryan Chan, Ruohan Gao, Thomas Funkhouser, Jiajun Wu
We propose Object-Centric Neural Scattering Functions (OSFs) for learning to reconstruct object appearance from only images.
no code implementations • 7 Dec 2022 • Hao Li, Yizhi Zhang, Junzhe Zhu, Shaoxiong Wang, Michelle A Lee, Huazhe Xu, Edward Adelson, Li Fei-Fei, Ruohan Gao, Jiajun Wu
Humans use all of their senses to accomplish different tasks in everyday activities.
no code implementations • 17 Oct 2022 • Simon Le Cleac'h, Hong-Xing Yu, Michelle Guo, Taylor A. Howell, Ruohan Gao, Jiajun Wu, Zachary Manchester, Mac Schwager
A robot can use this simulation to optimize grasps and manipulation trajectories of neural objects, or to improve the neural object models through gradient-based real-to-simulation transfer.
1 code implementation • CVPR 2022 • Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu
We present ObjectFolder 2.0, a large-scale, multisensory dataset of common household objects in the form of implicit neural representations that significantly enhances ObjectFolder 1.0 in three aspects.
no code implementations • CVPR 2022 • Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman
We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment.
no code implementations • 21 Nov 2021 • Rishabh Garg, Ruohan Gao, Kristen Grauman
Binaural audio provides human listeners with an immersive spatial sound experience, but most existing videos lack binaural audio recordings.
no code implementations • 16 Sep 2021 • Ruohan Gao, Yen-Yu Chang, Shivani Mall, Li Fei-Fei, Jiajun Wu
Multisensory object-centric perception, reasoning, and interaction have been a key research topic in recent years.
1 code implementation • CVPR 2021 • Ruohan Gao, Kristen Grauman
Given a video, the goal is to extract the speech associated with a face in spite of simultaneous background sounds and/or other human speakers.
1 code implementation • ICLR 2021 • Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman
In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room).
no code implementations • ECCV 2020 • Ruohan Gao, Changan Chen, Ziad Al-Halah, Carl Schissler, Kristen Grauman
Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation: a biological sonar used to perceive spatial layout and locate objects in the world.
1 code implementation • CVPR 2020 • Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, Lorenzo Torresani
In the face of the video data deluge, today's expensive clip-level classifiers are increasingly impractical.
Ranked #8 on Action Recognition on ActivityNet
3 code implementations • ICCV 2019 • Ruohan Gao, Kristen Grauman
Learning how objects sound from video is challenging, since they often heavily overlap in a single audio channel.
Ranked #1 on Audio Denoising on AV-Bench - Wooden Horse
2 code implementations • CVPR 2019 • Ruohan Gao, Kristen Grauman
We devise a deep convolutional neural network that learns to decode the monaural (single-channel) soundtrack into its binaural counterpart by injecting visual information about object and scene configurations.
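The channel arithmetic behind this mono-to-binaural setup can be made concrete. Below is a minimal NumPy sketch assuming the common decomposition in which the mono mixture is the channel sum (mono = L + R) and the network is trained to predict the left-right difference signal (diff = L - R); the `split_binaural` helper and the toy signals are purely illustrative stand-ins for the visually guided network's output, not the paper's actual implementation.

```python
import numpy as np

def split_binaural(mono: np.ndarray, diff: np.ndarray):
    """Recover left/right channels from a mono mixture and a predicted
    difference signal, assuming mono = L + R and diff = L - R."""
    left = (mono + diff) / 2.0
    right = (mono - diff) / 2.0
    return left, right

# Toy check: rebuild a known stereo pair from its mixture and difference.
rng = np.random.default_rng(0)
left_true = rng.standard_normal(16)
right_true = rng.standard_normal(16)
mono = left_true + right_true   # single-channel mixture (network input)
diff = left_true - right_true   # what the network would be trained to predict
left, right = split_binaural(mono, diff)
assert np.allclose(left, left_true) and np.allclose(right, right_true)
```

Predicting the difference rather than the two channels directly is convenient because the mono mixture is already given, so only one residual signal needs to be estimated.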
2 code implementations • ECCV 2018 • Ruohan Gao, Rogerio Feris, Kristen Grauman
Our work is the first to learn audio source separation from large-scale "in the wild" videos containing multiple audio sources per video.
4 code implementations • CVPR 2018 • Ruohan Gao, Bo Xiong, Kristen Grauman
Second, we show the power of hallucinated flow for recognition, successfully transferring the learned motion into a standard two-stream network for activity recognition.
no code implementations • ECCV 2018 • Dinesh Jayaraman, Ruohan Gao, Kristen Grauman
We introduce an unsupervised feature learning approach that embeds 3D shape information into a single-view image representation.
1 code implementation • ICCV 2017 • Ruohan Gao, Kristen Grauman
While machine learning approaches to image restoration offer great promise, current methods risk training models fixated on performing well only for image corruption of a particular level of difficulty, such as a certain level of noise or blur.
no code implementations • 1 Dec 2016 • Ruohan Gao, Dinesh Jayaraman, Kristen Grauman
Compared to existing temporal coherence methods, our idea has the advantage of lightweight preprocessing of the unlabeled video (no tracking required) while still being able to extract object-level regions from which to learn invariances.