TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Robot Manipulation	RLBench	PerAct (Evaluated in RVT)	Succ. Rate (18 tasks, 100 demo/task)	49.4	# 4
Robot Manipulation	RLBench	PerAct (Evaluated in RVT)	Training Time	16	# 3
Robot Manipulation	RLBench	PerAct (Evaluated in RVT)	Inference Speed (fps)	4.9	# 2
Robot Manipulation	RLBench	PerAct (Evaluated in RVT)	Input Image Size	128	# 1
Robot Manipulation	RLBench	Image-BC CNN	Succ. Rate (18 tasks, 100 demo/task)	1.3	# 8
Robot Manipulation	RLBench	Image-BC CNN	Input Image Size	128	# 1
Robot Manipulation	RLBench	Image-BC VIT	Succ. Rate (18 tasks, 100 demo/task)	1.3	# 8
Robot Manipulation	RLBench	Image-BC VIT	Input Image Size	128	# 1
Robot Manipulation	RLBench	PerAct	Succ. Rate (18 tasks, 100 demo/task)	42.7	# 6
Robot Manipulation	RLBench	PerAct	Training Time	16	# 3
Robot Manipulation	RLBench	PerAct	Input Image Size	128	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/perceiver-actor-a-multi-task-transformer-for/robot-manipulation-on-rlbench)](https://paperswithcode.com/sota/robot-manipulation-on-rlbench?p=perceiver-actor-a-multi-task-transformer-for)`

Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

12 Sep 2022 · Mohit Shridhar, Lucas Manuelli, Dieter Fox

Transformers have revolutionized vision and natural language processing with their ability to scale with large datasets. But in robotic manipulation, data is both limited and expensive. Can manipulation still benefit from Transformers with the right problem formulation? We investigate this question with PerAct, a language-conditioned behavior-cloning agent for multi-task 6-DoF manipulation. PerAct encodes language goals and RGB-D voxel observations with a Perceiver Transformer, and outputs discretized actions by "detecting the next best voxel action". Unlike frameworks that operate on 2D images, the voxelized 3D observation and action space provides a strong structural prior for efficiently learning 6-DoF actions. With this formulation, we train a single multi-task Transformer for 18 RLBench tasks (with 249 variations) and 7 real-world tasks (with 18 variations) from just a few demonstrations per task. Our results show that PerAct significantly outperforms unstructured image-to-action agents and 3D ConvNet baselines for a wide range of tabletop tasks.
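The idea of "detecting the next best voxel action" can be illustrated with a small sketch. This is not the authors' implementation: it covers only the translation part of an action, and the function names, the flat x-major score layout, the grid size, and the workspace bounds are all assumptions made for illustration. Given one score per voxel, it picks the highest-scoring voxel and maps its index back to the metric coordinates of the voxel center.

```python
from typing import Sequence, Tuple


def argmax_voxel(scores: Sequence[float], grid_size: int) -> Tuple[int, int, int]:
    """Return the (x, y, z) index of the highest-scoring voxel.

    Assumes `scores` is a flat sequence of length grid_size**3 laid out
    in x-major order: flat = x * grid_size**2 + y * grid_size + z.
    """
    if len(scores) != grid_size ** 3:
        raise ValueError("scores must have grid_size**3 entries")
    # Index of the maximum score (first one wins on ties).
    flat = max(range(len(scores)), key=scores.__getitem__)
    x, rem = divmod(flat, grid_size * grid_size)
    y, z = divmod(rem, grid_size)
    return x, y, z


def voxel_center(
    index: Tuple[int, int, int],
    grid_size: int,
    bounds_min: Tuple[float, float, float],
    bounds_max: Tuple[float, float, float],
) -> Tuple[float, ...]:
    """Map a voxel index to the metric coordinates of that voxel's center.

    `bounds_min` and `bounds_max` are hypothetical workspace corners.
    """
    return tuple(
        lo + (i + 0.5) * (hi - lo) / grid_size
        for i, lo, hi in zip(index, bounds_min, bounds_max)
    )


if __name__ == "__main__":
    grid = 4  # tiny grid for illustration only
    scores = [0.0] * grid ** 3
    scores[1 * grid * grid + 2 * grid + 3] = 1.0  # best voxel at (1, 2, 3)
    idx = argmax_voxel(scores, grid)
    print(idx, voxel_center(idx, grid, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))
```

Treating the action as a choice among voxels turns translation prediction into a classification problem over the same 3D grid that holds the observation, which is the structural prior the abstract refers to.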


Code


peract/peract (official) · Quickstart in Colab · 283 stars

Tasks


Robot Manipulation

Datasets

RLBench

Results from the Paper


Ranked #4 on Robot Manipulation on RLBench


Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Robot Manipulation	RLBench	Image-BC CNN	Succ. Rate (18 tasks, 100 demo/task)	1.3	# 8
Robot Manipulation	RLBench	Image-BC CNN	Input Image Size	128	# 1
Robot Manipulation	RLBench	PerAct	Succ. Rate (18 tasks, 100 demo/task)	42.7	# 6
Robot Manipulation	RLBench	PerAct	Training Time	16	# 3
Robot Manipulation	RLBench	PerAct	Input Image Size	128	# 1

Results from Other Papers

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Robot Manipulation	RLBench	PerAct (Evaluated in RVT)	Succ. Rate (18 tasks, 100 demo/task)	49.4	# 4
Robot Manipulation	RLBench	PerAct (Evaluated in RVT)	Training Time	16	# 3
Robot Manipulation	RLBench	PerAct (Evaluated in RVT)	Inference Speed (fps)	4.9	# 2
Robot Manipulation	RLBench	PerAct (Evaluated in RVT)	Input Image Size	128	# 1
Robot Manipulation	RLBench	Image-BC VIT	Succ. Rate (18 tasks, 100 demo/task)	1.3	# 8
Robot Manipulation	RLBench	Image-BC VIT	Input Image Size	128	# 1

Methods


Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
