Earnings-21, a 39-hour corpus of earnings calls containing entity-dense speech from nine different financial sectors. This corpus is intended to benchmark ASR (Automatic Speech Recognition) systems in the wild with special attention towards named entity recognition.
8 PAPERS • NO BENCHMARKS YET
GUM is an open source multilayer English corpus of richly annotated texts from twelve text types. Annotations include:
8 PAPERS • 1 BENCHMARK