Language models have become a key step to achieve state-of-the-art results in many different Natural Language Processing (NLP) tasks.
We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet.
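The snippet does not spell out the weak-supervision step. One common way to derive table bounding boxes from document sources, sketched below as an assumption rather than the authors' exact pipeline, is to outline table regions in the Word/LaTeX source with a distinctive marker color, render the page, and read the box off the rendered pixels (the function name and marker color here are hypothetical).

```python
import numpy as np

def box_from_marker_color(page_rgb, marker=(255, 0, 255), tol=10):
    """Sketch of weak supervision for table detection: assuming tables in the
    source document were outlined in a known marker color before rendering,
    recover the axis-aligned bounding box of the marked region in pixels."""
    diff = np.abs(page_rgb.astype(int) - np.array(marker)).sum(axis=-1)
    ys, xs = np.nonzero(diff <= tol)   # pixels close to the marker color
    if len(xs) == 0:
        return None                    # no table marked on this page
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy usage: a white page with a magenta rectangle standing in for a marked table.
page = np.full((200, 300, 3), 255, dtype=np.uint8)
page[50:120, 40:260] = (255, 0, 255)
print(box_from_marker_color(page))     # (40, 50, 259, 119)
```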
To fix the noisy state annotations, we use crowdsourced workers to re-annotate the dialogue states and utterances based on the original utterances in the dataset.
We wrap our dataset and model in an easy-to-use Python library, which supports downloading and retrieving top-k word translations in any of the supported language pairs as well as computing top-k word translations for custom parallel corpora.
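The library itself is not named in this snippet, so the sketch below only illustrates the "custom parallel corpora" use case: computing top-k word translations from sentence-aligned pairs by simple co-occurrence counting. Function and variable names are hypothetical, not the library's actual API.

```python
from collections import Counter, defaultdict

def top_k_translations(parallel_pairs, k=3):
    """Hypothetical sketch: estimate top-k word translations from a
    sentence-aligned corpus using raw co-occurrence counts."""
    cooc = defaultdict(Counter)        # source word -> Counter of target words
    for src_sent, tgt_sent in parallel_pairs:
        src_tokens = src_sent.lower().split()
        tgt_tokens = tgt_sent.lower().split()
        for s in src_tokens:
            for t in tgt_tokens:
                cooc[s][t] += 1
    # Rank candidate translations for each source word by co-occurrence count.
    return {s: [t for t, _ in cooc[s].most_common(k)] for s in cooc}

# Toy usage with two aligned English-French sentence pairs.
pairs = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
]
print(top_k_translations(pairs, k=2)["the"])   # likely ['le', 'dort']
```

Real systems typically replace raw co-occurrence with an alignment-aware score (for example, conditional probabilities corrected for highly frequent words), but the overall data flow from parallel corpus to ranked translation lists is the same.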
Pre-training text representations has led to significant improvements in many areas of natural language processing.
We present in this work a new dataset of coreference annotations for works of literature in English, covering 29,103 mentions in 210,532 tokens from 100 works of fiction.
Spoken language translation has recently witnessed a resurgence in popularity, thanks to the development of end-to-end models and the creation of new corpora, such as Augmented LibriSpeech and MuST-C.
Recently, with the surge of Transformer-based models, language-specific BERT-based models have proven to be very effective at language understanding, provided they are pre-trained on a very large corpus.