2 code implementations • 24 May 2021 • Kasra Hosseini, Kaspar Beelen, Giovanni Colavizza, Mariona Coll Ardanuy
We present four types of neural language models trained on a large historical dataset of books in English, published between 1760 and 1900 and comprising ~5.1 billion tokens.
1 code implementation • EMNLP 2020 • Kasra Hosseini, Federico Nanni, Mariona Coll Ardanuy
We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking.
2 code implementations • 17 Sep 2020 • Mariona Coll Ardanuy, Kasra Hosseini, Katherine McDonough, Amrey Krause, Daniel van Strien, Federico Nanni
We report its performance on candidate selection in the context of the downstream task of toponym resolution, both on existing datasets and on a new manually-annotated resource of nineteenth-century English OCR'd text.
1 code implementation • COLING 2020 • Mariona Coll Ardanuy, Federico Nanni, Kaspar Beelen, Kasra Hosseini, Ruth Ahnert, Jon Lawrence, Katherine McDonough, Giorgia Tolfo, Daniel CS Wilson, Barbara McGillivray
This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text.