no code implementations • 2 Mar 2022 • Oscar Giles, Kasra Hosseini, Grigorios Mingas, Oliver Strickson, Louise Bowler, Camila Rangel Smith, Harrison Wilde, Jen Ning Lim, Bilal Mateen, Kasun Amarasinghe, Rayid Ghani, Alison Heppenstall, Nik Lomax, Nick Malleson, Martin O'Reilly, Sebastian Vollmer
Synthetic datasets are often presented as a silver-bullet solution to the problem of privacy-preserving data publishing.
1 code implementation • 30 Nov 2021 • Kasra Hosseini, Daniel C. S. Wilson, Kaspar Beelen, Katherine McDonough
We present MapReader, a free, open-source software library written in Python for analyzing large map collections (scanned or born-digital).
2 code implementations • 24 May 2021 • Kasra Hosseini, Kaspar Beelen, Giovanni Colavizza, Mariona Coll Ardanuy
We present four types of neural language models trained on a large historical dataset of books in English, published between 1760 and 1900 and comprising ~5.1 billion tokens.
1 code implementation • EMNLP 2020 • Kasra Hosseini, Federico Nanni, Mariona Coll Ardanuy
We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking.
2 code implementations • 17 Sep 2020 • Mariona Coll Ardanuy, Kasra Hosseini, Katherine McDonough, Amrey Krause, Daniel van Strien, Federico Nanni
We report its performance on candidate selection in the context of the downstream task of toponym resolution, both on existing datasets and on a new manually-annotated resource of nineteenth-century English OCR'd text.
1 code implementation • COLING 2020 • Mariona Coll Ardanuy, Federico Nanni, Kaspar Beelen, Kasra Hosseini, Ruth Ahnert, Jon Lawrence, Katherine McDonough, Giorgia Tolfo, Daniel CS Wilson, Barbara McGillivray
This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text.