no code implementations • 28 Feb 2024 • Qiuyuan Huang, Naoki Wake, Bidipta Sarkar, Zane Durante, Ran Gong, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Noboru Kuno, Ade Famoti, Ashley Llorens, John Langford, Hoi Vo, Li Fei-Fei, Katsu Ikeuchi, Jianfeng Gao
Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments.
no code implementations • 8 Feb 2024 • Zane Durante, Bidipta Sarkar, Ran Gong, Rohan Taori, Yusuke Noda, Paul Tang, Ehsan Adeli, Shrinidhi Kowshika Lakshmikanth, Kevin Schulman, Arnold Milstein, Demetri Terzopoulos, Ade Famoti, Noboru Kuno, Ashley Llorens, Hoi Vo, Katsu Ikeuchi, Li Fei-Fei, Jianfeng Gao, Naoki Wake, Qiuyuan Huang
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks.
1 code implementation • 7 Jan 2024 • Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, Katsushi Ikeuchi, Hoi Vo, Li Fei-Fei, Jianfeng Gao
To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied actions.
no code implementations • 20 Nov 2023 • Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi
The computation starts by analyzing the videos with GPT-4V to convert environmental and action details into text, followed by a GPT-4-empowered task planner.
no code implementations • 18 Oct 2023 • Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi
This technical report explores the ability of ChatGPT in recognizing emotions from text, which can be the basis of various applications like interactive chatbots, data annotation, and mental health analysis.
1 code implementation • 27 Feb 2021 • Naoki Wake, Daichi Saito, Kazuhiro Sasabuchi, Hideki Koike, Katsushi Ikeuchi
These findings highlight the significance of object affordance in multimodal robot teaching, regardless of whether real objects are present in the images.
no code implementations • 9 Dec 2020 • Iori Yanokura, Naoki Wake, Kazuhiro Sasabuchi, Katsushi Ikeuchi, Masayuki Inaba
We propose a Learning-from-Observation framework that splits and understands a video of a human demonstration with verbal instructions to extract accurate action sequences.
no code implementations • 4 Aug 2020 • Naoki Wake, Riku Arakawa, Iori Yanokura, Takuya Kiyokawa, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi
In the context of one-shot robot teaching, the contributions of the paper are: 1) to propose a framework that 1) covers various tasks in grasp-manipulation-release class household operations and 2) mimics human postures during the operations.
Robotics Human-Computer Interaction