Search Results for author: Danny Driess

Found 16 papers, 4 papers with code

Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers

no code implementations • 19 Mar 2024 • Vidhi Jain, Maria Attarian, Nikhil J Joshi, Ayzaan Wahid, Danny Driess, Quan Vuong, Pannag R Sanketi, Pierre Sermanet, Stefan Welker, Christine Chan, Igor Gilitschenski, Yonatan Bisk, Debidatta Dwibedi

Given a video demonstration of a manipulation task and current visual observations, Vid2Robot directly produces robot actions.

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

no code implementations • 12 Feb 2024 • Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, Brian Ichter

In each iteration, the image is annotated with a visual representation of proposals that the VLM can refer to (e.g., candidate robot actions, localizations, or trajectories).

Instruction Following Logical Reasoning +3

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

no code implementations • 22 Jan 2024 • Boyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh, Leonidas Guibas, Fei Xia

By training a VLM on such data, we significantly enhance its ability on both qualitative and quantitative spatial VQA.

Question Answering Visual Question Answering

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

1 code implementation • 28 Jul 2023 • Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Lisa Lee, Tsang-Wei Edward Lee, Sergey Levine, Yao Lu, Henryk Michalewski, Igor Mordatch, Karl Pertsch, Kanishka Rao, Krista Reymann, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Pierre Sermanet, Jaspiar Singh, Anikait Singh, Radu Soricut, Huong Tran, Vincent Vanhoucke, Quan Vuong, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Jialin Wu, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich

Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web.

Object Question Answering +1

Towards Generalist Biomedical AI

no code implementations • 26 Jul 2023 • Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Yun Liu, Simon Kornblith, David Fleet, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Christopher Semturs, S Sara Mahdavi, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Joelle Barral, Dale Webster, Greg S. Corrado, Yossi Matias, Karan Singhal, Pete Florence, Alan Karthikesalingam, Vivek Natarajan

While considerable work is needed to validate these models in real-world use cases, our results represent a milestone towards the development of generalist biomedical AI systems.

Question Answering Transfer Learning +1

Large Language Models as General Pattern Machines

no code implementations • 10 Jul 2023 • Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng

We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to more rich spatial patterns found in the Abstraction and Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art.

In-Context Learning

PaLM-E: An Embodied Multimodal Language Model

2 code implementations • 6 Mar 2023 • Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence

Large language models excel at a wide range of complex tasks.

Ranked #2 on Visual Question Answering (VQA) on OK-VQA

Language Modelling Large Language Model +2

Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents

no code implementations • NeurIPS 2023 • Wenlong Huang, Fei Xia, Dhruv Shah, Danny Driess, Andy Zeng, Yao Lu, Pete Florence, Igor Mordatch, Sergey Levine, Karol Hausman, Brian Ichter

Recent progress in large language models (LLMs) has demonstrated the ability to learn and leverage Internet-scale knowledge through pre-training with autoregressive models.

Language Modelling Text Generation

Reinforcement Learning with Neural Radiance Fields

no code implementations • 3 Jun 2022 • Danny Driess, Ingmar Schubert, Pete Florence, Yunzhu Li, Marc Toussaint

This paper demonstrates that learning state representations with supervision from Neural Radiance Fields (NeRFs) can improve the performance of RL compared to other learned representations or even low-dimensional, hand-engineered state information.

Decoder reinforcement-learning +1

Learning Multi-Object Dynamics with Compositional Neural Radiance Fields

no code implementations • 24 Feb 2022 • Danny Driess, Zhiao Huang, Yunzhu Li, Russ Tedrake, Marc Toussaint

We present a method to learn compositional multi-object dynamics models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks.

Decoder Object

Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics

1 code implementation • NeurIPS 2021 • Ingmar Schubert, Danny Driess, Ozgur S. Oguz, Marc Toussaint

Applications of Reinforcement Learning (RL) in robotics are often limited by high data demand.

Learning to Execute Reinforcement Learning (RL)

Learning Models as Functionals of Signed-Distance Fields for Manipulation Planning

no code implementations • 2 Oct 2021 • Danny Driess, Jung-Su Ha, Marc Toussaint, Russ Tedrake

We show that representing objects as signed-distance fields not only enables learning and representing a variety of models with higher accuracy compared to point-cloud and occupancy measure representations, but also that SDF-based models are suitable for optimization-based planning.

Learning Neural Implicit Functions as Object Representations for Robotic Manipulation

no code implementations • 29 Sep 2021 • Jung-Su Ha, Danny Driess, Marc Toussaint

Robotic manipulation planning is the problem of finding a sequence of robot configurations that involves interactions with objects in the scene, e.g., grasp, placement, tool-use, etc.

Open-Ended Question Answering Robot Manipulation

Deep 6-DoF Tracking of Unknown Objects for Reactive Grasping

no code implementations • 9 Mar 2021 • Marc Tuscher, Julian Hörz, Danny Driess, Marc Toussaint

We propose a robotic manipulation system, which is able to grasp a wide variety of formerly unseen objects and is robust against object perturbations and inferior grasping points.

Object Object Tracking +1

Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image

no code implementations • 9 Jun 2020 • Danny Driess, Jung-Su Ha, Marc Toussaint

This is possible by encoding the objects of the scene in images as input to the neural network, instead of a fixed feature vector.

Motion Planning Task and Motion Planning +1

Describing Physics For Physical Reasoning: Force-based Sequential Manipulation Planning

1 code implementation • 28 Feb 2020 • Marc Toussaint, Jung-Su Ha, Danny Driess

Physical reasoning is a core aspect of intelligence in animals and humans.

Robotics
