The Alexa Point of View dataset is point of view conversion dataset, a parallel corpus of messages spoken to a virtual assistant and the converted messages for delivery. The dataset contains parallel corpus of input (input column) message and POV converted messages (output column). An example of a pair is tell @CN@ that i'll be late [\t] hi @CN@, @SCN@ would like you to know that they'll be late. The input and pov-converted output pair is tab separated. @CN@ tag is a placeholder for the contact name (receiver) and @SCN@ tag is a placeholder for source contact name (sender). The total dataset has 46563 pairs. This data is then test/train/dev split into 6985 pairs/32594 pairs/6985 pairs.
1 PAPER • 1 BENCHMARK
The Kite database is a multi-modal dataset for the control of unmanned aerial vehicles (UAVs). There are three modalities present in the dataset:
1 PAPER • NO BENCHMARKS YET
This is a dataset of 3 English books which do not contain the letter "e" in them. This dataset includes all of "Gadsby" by Ernest Vincent Wright, all of "A Void" by Georges Perec, and almost all of "Eunoia" by Christian Bok (except for the single chapter that uses the letter "e" in it)
The social vision and language dataset is a large-scale multimodal dataset designed for research into social contextual learning.
Verified Smart Contracts is a dataset of real Ethereum smart contracts, containing both Solidity and Vyper source code. It consists of every deployed Ethereum smart contract as of 1st of April 2022, whose been verified on Etherscan and has a least one transaction. A total of 186,397 unique smart contracts are provided, filtered down from 2,217,692 smart contracts. The dataset contains 53,843,305 lines of code.
This is the Big-Bench version of our language-based movie recommendation dataset