Search Results for author: Mike Kestemont

Found 18 papers, 3 papers with code

A Dutch Dataset for Cross-lingual Multilabel Toxicity Detection

no code implementations • RANLP (BUCC) 2021 • Ben Burtenshaw, Mike Kestemont

Multi-label toxicity detection is highly prominent, with many research groups, companies, and individuals engaging with it through shared tasks and dedicated venues.

Multi Label Text Classification, Multi-Label Text Classification, +1

Quantifying Contextual Aspects of Inter-annotator Agreement in Intertextuality Research

no code implementations • EMNLP (LaTeCH-CLfL) 2021 • Enrique Manjavacas Arevalo, Laurence Mellerin, Mike Kestemont

We report on an inter-annotator agreement experiment involving instances of text reuse focusing on the well-known case of biblical intertextuality in medieval literature.

Neural Machine Translation of Artwork Titles Using Iconclass Codes

no code implementations • COLING (LaTeCH-CLfL) 2020 • Nikolay Banar, Walter Daelemans, Mike Kestemont

We investigate the use of Iconclass in the context of neural machine translation for NL<->EN artwork titles.

Machine Translation, Translation

From exemplar to copy: the scribal appropriation of a Hadewijch manuscript computationally explored

1 code implementation • 25 Oct 2022 • Wouter Haverals, Mike Kestemont

This study is devoted to two of the oldest known manuscripts in which the oeuvre of the medieval mystical author Hadewijch has been preserved: Brussels, KBR, 2879-2880 (ms. A) and Brussels, KBR, 2877-2878 (ms. B).

UAntwerp at SemEval-2021 Task 5: Spans are Spans, stacking a binary word level approach to toxic span detection

no code implementations • SEMEVAL 2021 • Ben Burtenshaw, Mike Kestemont

This paper describes the system developed by the Antwerp Centre for Digital Humanities and Literary Criticism [UAntwerp] for toxic span detection.

RFC-0000 - RFC on RFCs

no code implementations • TimeMachine RFC 2031 • Frédéric Kaplan, Kevin Baumer, Mike Kestemont, Daniel Jeller

Reaching consensus on the technology options to pursue in a programme as large as Time Machine is a complex issue.

Character-level Transformer-based Neural Machine Translation

no code implementations • 22 May 2020 • Nikolay Banar, Walter Daelemans, Mike Kestemont

To stimulate further research in this area and close the gap with subword-level NMT, we make all our code and models publicly available.

Machine Translation, NMT, +1

On the Transferability of Winning Tickets in Non-Natural Image Datasets

no code implementations • 11 May 2020 • Matthia Sabatelli, Mike Kestemont, Pierre Geurts

We study the generalization properties of pruned neural networks that are the winners of the lottery ticket hypothesis on datasets of natural images.

Detecting Direct Speech in Multilingual Collection of 19th-century Novels

no code implementations • LREC 2020 • Joanna Byszuk, Michał Woźniak, Mike Kestemont, Albert Leśniak, Wojciech Łukasik, Artjoms Šeļa, Maciej Eder

Fictional prose can be broadly divided into narrative and discursive forms with direct speech being central to any discourse representation (alongside indirect reported speech and free indirect discourse).

Sentence

Generation of Hip-Hop Lyrics with Hierarchical Modeling and Conditional Templates

no code implementations • WS 2019 • Enrique Manjavacas, Mike Kestemont, Folgert Karsdorp

This paper addresses Hip-Hop lyric generation with conditional Neural Language Models.

Language Modelling, Text Generation

On the Feasibility of Automated Detection of Allusive Text Reuse

no code implementations • WS 2019 • Enrique Manjavacas, Brian Long, Mike Kestemont

The detection of allusive text reuse is particularly challenging due to the sparse evidence on which allusive references rely, commonly based on few or no shared words.

Information Retrieval, Retrieval

Improving Lemmatization of Non-Standard Languages with Joint Learning

2 code implementations • NAACL 2019 • Enrique Manjavacas, Ákos Kádár, Mike Kestemont

Lemmatization of standard languages is concerned with (i) abstracting over morphological differences and (ii) resolving token-lemma ambiguities of inflected words in order to map them to a dictionary headword.

Decoder, Language Modelling, +4

Synthetic Literature: Writing Science Fiction in a Co-Creative Process

no code implementations • WS 2017 • Enrique Manjavacas, Folgert Karsdorp, Ben Burtenshaw, Mike Kestemont

Language Modelling, Text Generation

Assessing the Stylistic Properties of Neurally Generated Text in Authorship Attribution

no code implementations • WS 2017 • Enrique Manjavacas, Jeroen De Gussem, Walter Daelemans, Mike Kestemont

Recent applications of neural language models have led to an increased interest in the automatic generation of natural language.

Attribute, Authorship Attribution, +3

Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning

1 code implementation • 4 Mar 2016 • Mike Kestemont, Jeroen De Gussem

In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization.

LEMMA, Lemmatization, +4

Function Words in Authorship Attribution. From Black Magic to Theory?

no code implementations • WS 2014 • Mike Kestemont

Authorship Attribution

Mining the Twentieth Century's History from the Time Magazine Corpus

no code implementations • WS 2014 • Mike Kestemont, Folgert Karsdorp, Marten Düring

The Netlog Corpus. A Resource for the Study of Flemish Dutch Internet Language

no code implementations • LREC 2012 • Mike Kestemont, Claudia Peersman, Benny De Decker, Guy De Pauw, Kim Luyckx, Roser Morante, Frederik Vaassen, Janneke van de Loo, Walter Daelemans

Although in recent years numerous forms of Internet communication (such as e-mail, blogs, chat rooms and social network environments) have emerged, balanced corpora of Internet speech with trustworthy meta-information (e.g. age and gender) or linguistic annotations are still limited.

Lemmatization, POS, +2
