Search Results for author: Naoki Wake

Found 8 papers, 2 papers with code

Position Paper: Agent AI Towards a Holistic Intelligence

no code implementations • 28 Feb 2024 • Qiuyuan Huang, Naoki Wake, Bidipta Sarkar, Zane Durante, Ran Gong, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Noboru Kuno, Ade Famoti, Ashley Llorens, John Langford, Hoi Vo, Li Fei-Fei, Katsu Ikeuchi, Jianfeng Gao

Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments.

Position

An Interactive Agent Foundation Model

no code implementations • 8 Feb 2024 • Zane Durante, Bidipta Sarkar, Ran Gong, Rohan Taori, Yusuke Noda, Paul Tang, Ehsan Adeli, Shrinidhi Kowshika Lakshmikanth, Kevin Schulman, Arnold Milstein, Demetri Terzopoulos, Ade Famoti, Noboru Kuno, Ashley Llorens, Hoi Vo, Katsu Ikeuchi, Li Fei-Fei, Jianfeng Gao, Naoki Wake, Qiuyuan Huang

We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks.

Language Modelling, Multi-Task Learning

Agent AI: Surveying the Horizons of Multimodal Interaction

1 code implementation • 7 Jan 2024 • Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, Katsushi Ikeuchi, Hoi Vo, Li Fei-Fei, Jianfeng Gao

To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied actions.

GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration

no code implementations • 20 Nov 2023 • Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

The computation starts by analyzing the videos with GPT-4V to convert environmental and action details into text, followed by a GPT-4-empowered task planner.

Language Modelling, Object +1

Bias in Emotion Recognition with ChatGPT

no code implementations • 18 Oct 2023 • Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

This technical report explores the ability of ChatGPT to recognize emotions from text, which can serve as the basis for applications such as interactive chatbots, data annotation, and mental health analysis.

Emotion Recognition, Sentiment Analysis

Text-driven object affordance for guiding grasp-type recognition in multimodal robot teaching

1 code implementation • 27 Feb 2021 • Naoki Wake, Daichi Saito, Kazuhiro Sasabuchi, Hideki Koike, Katsushi Ikeuchi

These findings highlight the significance of object affordance in multimodal robot teaching, regardless of whether real objects are present in the images.

Mixed Reality, Object +1

Understanding Action Sequences based on Video Captioning for Learning-from-Observation

no code implementations • 9 Dec 2020 • Iori Yanokura, Naoki Wake, Kazuhiro Sasabuchi, Katsushi Ikeuchi, Masayuki Inaba

We propose a Learning-from-Observation framework that splits and understands a video of a human demonstration with verbal instructions to extract accurate action sequences.

Video Captioning, Video Understanding

Learning-from-Observation Framework: One-Shot Robot Teaching for Grasp-Manipulation-Release Household Operations

no code implementations • 4 Aug 2020 • Naoki Wake, Riku Arakawa, Iori Yanokura, Takuya Kiyokawa, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

In the context of one-shot robot teaching, the contribution of the paper is to propose a framework that 1) covers various tasks in grasp-manipulation-release class household operations and 2) mimics human postures during the operations.

Robotics, Human-Computer Interaction

