Benchmarking
1,524 papers with code • 1 benchmark • 5 datasets
Most implemented papers
Habitat: A Platform for Embodied AI Research
We present Habitat, a platform for research in embodied artificial intelligence (AI).
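A minimal interaction loop, adapted from the habitat-lab README; the config path is an assumption and varies between releases:

```python
import habitat

# Load a PointNav task configuration (path assumed; differs by habitat-lab version).
config = habitat.get_config("configs/tasks/pointnav.yaml")
env = habitat.Env(config=config)

observations = env.reset()  # dict of sensor readings (RGB, depth, GPS+compass, ...)
while not env.episode_over:
    # Random actions as a stand-in; a real agent maps observations to actions.
    observations = env.step(env.action_space.sample())
```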
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
The size of the dataset and the fact that the questions are derived from real user search queries distinguishes MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering.
Multitask learning and benchmarking with clinical time series data
Health care is one of the most exciting frontiers in data mining and machine learning.
A large annotated medical image dataset for the development and evaluation of segmentation algorithms
Semantic segmentation of medical images aims to associate each pixel with a label, without human initialization.
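Segmentation quality on such datasets is typically scored per label with the Dice coefficient; a generic NumPy sketch, not the challenge's official evaluation code:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, label: int) -> float:
    """Dice overlap between predicted and ground-truth masks for one label."""
    pred_mask = pred == label
    target_mask = target == label
    intersection = np.logical_and(pred_mask, target_mask).sum()
    denom = pred_mask.sum() + target_mask.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0

# Toy 2-D example; real medical volumes are 3-D arrays of per-voxel labels.
pred = np.array([[0, 1], [1, 1]])
gt = np.array([[0, 1], [0, 1]])
print(dice_coefficient(pred, gt, label=1))  # 0.8
```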
COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting
We introduce COCO, an open source platform for Comparing Continuous Optimizers in a black-box setting.
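A minimal benchmarking loop with the `cocoex` Python module, sketched from the COCO documentation; suite and observer option strings are assumptions and may differ by version:

```python
import cocoex
import numpy as np

# Noiseless BBOB test suite and an observer that logs all evaluations.
suite = cocoex.Suite("bbob", "", "")
observer = cocoex.Observer("bbob", "result_folder: random_search_demo")

for problem in suite:
    problem.observe_with(observer)  # attach result logging to this problem
    # Stand-in optimizer: pure random search within the problem's box bounds.
    for _ in range(100):
        x = np.random.uniform(problem.lower_bounds, problem.upper_bounds)
        problem(x)  # each function evaluation is recorded by the observer
```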
On Evaluation of Embodied Navigation Agents
Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence.
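Among its recommendations, the paper proposes SPL (Success weighted by Path Length) as a summary metric: SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where S_i marks episode success, l_i is the shortest-path distance to the goal, and p_i is the path length the agent actually traveled. A minimal sketch of that formula:

```python
def spl(successes, shortest_dists, path_lengths):
    """Success weighted by Path Length, averaged over N episodes.

    successes:      per-episode binary success indicators (0 or 1)
    shortest_dists: geodesic distance from start to goal per episode
    path_lengths:   length of the path the agent actually took
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_dists, path_lengths):
        total += s * l / max(p, l)
    return total / len(successes)

# One success along a near-optimal path, one failure.
print(spl([1, 0], [5.0, 4.0], [6.0, 9.0]))  # ~0.417
```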
Benchmarking Natural Language Understanding Services for building Conversational Agents
We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer.
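Concretely, such a toolkit maps a raw utterance to an intent label plus slot values; an illustrative, hypothetical output structure (field names are not any particular toolkit's schema):

```python
# Hypothetical NLU parse for one utterance.
utterance = "set an alarm for 7 am tomorrow"
parsed = {
    "intent": "set_alarm",
    "slots": {"time": "7 am", "date": "tomorrow"},
    "confidence": 0.93,
}
```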
Torchreid: A Library for Deep Learning Person Re-Identification in Pytorch
Person re-identification (re-ID), which aims to re-identify people across different camera views, has been significantly advanced by deep learning in recent years, particularly with convolutional neural networks (CNNs).
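A condensed training/evaluation pipeline, adapted from the Torchreid README; the dataset root and hyperparameters are placeholders:

```python
import torchreid

# Data manager: downloads/loads Market-1501 and builds train/test loaders.
datamanager = torchreid.data.ImageDataManager(
    root="reid-data",  # placeholder path
    sources="market1501",
    height=256,
    width=128,
    batch_size_train=32,
    batch_size_test=100,
)

# Standard CNN backbone with a softmax (identity classification) head.
model = torchreid.models.build_model(
    name="resnet50",
    num_classes=datamanager.num_train_pids,
    loss="softmax",
    pretrained=True,
)
model = model.cuda()  # assumes a GPU is available

optimizer = torchreid.optim.build_optimizer(model, optim="adam", lr=0.0003)
scheduler = torchreid.optim.build_lr_scheduler(
    optimizer, lr_scheduler="single_step", stepsize=20
)

# Engine ties data, model, and optimization together for train + eval.
engine = torchreid.engine.ImageSoftmaxEngine(
    datamanager, model, optimizer=optimizer, scheduler=scheduler
)
engine.run(save_dir="log/resnet50", max_epoch=60, eval_freq=10, test_only=False)
```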
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult.
Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets
We develop a metrics library, ivtmetrics, for evaluating models on surgical action triplets.
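A sketch of the library's recognition-AP usage, based on a reading of the ivtmetrics README; the class and method names here are assumptions and should be checked against the released package:

```python
import numpy as np
import ivtmetrics

# 100 triplet classes (as in CholecT50); predictions are per-frame probability
# vectors, targets are multi-hot binary vectors of the same shape.
metric = ivtmetrics.Recognition(num_class=100)

targets = np.random.randint(0, 2, size=(8, 100))  # placeholder labels
predictions = np.random.rand(8, 100)              # placeholder scores
metric.update(targets, predictions)               # accumulate one batch

# Component-wise mean AP: "ivt" scores full <instrument, verb, target>
# triplets; "i", "v", "t" score the individual components.
results = metric.compute_global_AP("ivt")
print(results["mAP"])
```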