CoSQL is a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions.
33 PAPERS • 1 BENCHMARK
KaggleDBQA is a challenging cross-domain and complex evaluation dataset of real Web databases, with domain-specific data types, original formatting, and unrestricted questions.
17 PAPERS • 1 BENCHMARK
MultiSpider is a large multilingual text-to-SQL dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese).
2 PAPERS • NO BENCHMARKS YET
TriageSQL is a cross-domain text-to-SQL question intention classification benchmark that requires models to distinguish four types of unanswerable questions from answerable questions.
Numbers Station Text to SQL
0 PAPER • NO BENCHMARKS YET