Real-world multiobject, multigrasp detection

A deep learning architecture is proposed to predict graspable locations for robotic manipulation. It considers scenes where no object, a single object, or multiple objects are visible. By defining the learning problem as classification with null hypothesis competition instead of regression, the deep neural network, given red, green, blue and depth (RGB-D) image input, predicts multiple grasp candidates for a single object or multiple objects in a single shot. The method outperforms state-of-the-art approaches on the Cornell dataset with 96.0% and 96.1% accuracy on image-wise and object-wise splits, respectively. Evaluation on a multiobject dataset illustrates the generalization capability of the architecture. Grasping experiments achieve 96.0% grasp localization and 89.0% grasping success rates on a test set of household objects. The real-time process takes less than 0.25 s from image to plan.
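The classification-with-null-hypothesis formulation is the core modeling choice: rather than regressing a single grasp orientation per region, each candidate region is scored over a set of discrete orientation bins plus one explicit "no grasp" class that competes with them. The PyTorch sketch below is a minimal illustration of that idea, not the paper's implementation; MultiGraspHead, the 2048-wide feature (a ResNet-50-style pooled feature), and the bin count of 19 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiGraspHead(nn.Module):
    """Grasp head that classifies orientation instead of regressing it.

    Each region is scored over `num_angle_bins` discrete orientations
    plus one extra "no grasp" (null hypothesis) class that competes
    with them, so non-graspable regions can be rejected outright.
    (Illustrative sketch; names and sizes are assumptions.)
    """

    def __init__(self, in_features: int = 2048, num_angle_bins: int = 19):
        super().__init__()
        self.num_angle_bins = num_angle_bins
        # Orientation bins + 1 null class.
        self.cls = nn.Linear(in_features, num_angle_bins + 1)
        # A rectangle refinement (x, y, w, h) per orientation bin.
        self.reg = nn.Linear(in_features, num_angle_bins * 4)

    def forward(self, feats: torch.Tensor):
        # feats: (num_regions, in_features) pooled region features.
        scores = self.cls(feats)                                   # (N, bins + 1)
        boxes = self.reg(feats).view(-1, self.num_angle_bins, 4)   # (N, bins, 4)
        return scores, boxes

# Toy usage: regions whose top class is the null bin are discarded.
head = MultiGraspHead()
feats = torch.randn(8, 2048)  # stand-in for 8 pooled region features
scores, boxes = head(feats)
keep = scores.argmax(dim=1) < head.num_angle_bins  # True = a grasp bin won
```

At inference, any region whose top-scoring class is the null bin is rejected; each surviving region yields an oriented grasp rectangle, which is how multiple grasps for one or several objects emerge from a single forward pass.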

Datasets

Introduced in the Paper: Grasp MultiObject
Used in the Paper: Cornell

Benchmark

Task: Robotic Grasping
Dataset: Cornell Grasp Dataset
Model: ResNet50 multi-grasp predictor
Metric: 5-fold cross validation
Metric value: 96
Global rank: #3
