Benchmarking

1524 papers with code • 1 benchmark • 5 datasets

Most implemented papers

Habitat: A Platform for Embodied AI Research

facebookresearch/habitat-sim ICCV 2019

We present Habitat, a platform for research in embodied artificial intelligence (AI).

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

amenra/ranx 28 Nov 2016

The size of the dataset and the fact that the questions are derived from real user search queries distinguish MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering.

Multitask learning and benchmarking with clinical time series data

yerevann/mimic3-benchmarks 22 Mar 2017

Health care is one of the most exciting frontiers in data mining and machine learning.

A large annotated medical image dataset for the development and evaluation of segmentation algorithms

iyerkrithika21/mesh2ssm_2023 25 Feb 2019

Semantic segmentation of medical images aims to associate a pixel with a label in a medical image without human initialization.

COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting

numbbo/coco 29 Mar 2016

We introduce COCO, an open source platform for Comparing Continuous Optimizers in a black-box setting.

On Evaluation of Embodied Navigation Agents

facebookresearch/habitat-api 18 Jul 2018

Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence.

Benchmarking Natural Language Understanding Services for building Conversational Agents

xliuhw/NLU-Evaluation-Data 13 Mar 2019

We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer.

Torchreid: A Library for Deep Learning Person Re-Identification in Pytorch

KaiyangZhou/deep-person-reid 22 Oct 2019

Person re-identification (re-ID), which aims to re-identify people across different camera views, has been significantly advanced by deep learning in recent years, particularly with convolutional neural networks (CNNs).

Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks

uoe-agents/epymarl 14 Jun 2020

Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult.

Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets

CAMMA-public/cholect45 11 Apr 2022

We also develop a metrics library, ivtmetrics, for model evaluation on surgical triplets.