1 code implementation • ACL 2021 • Adrien Barbaresi
The tool performs significantly better than other open-source solutions in this evaluation and in external benchmarks.
no code implementations • JEPTALNRECITAL 2020 • Gaël Lejeune, Adrien Barbaresi
We present a demonstration of text content extraction from web pages, together with its evaluation.
no code implementations • JEPTALNRECITAL 2020 • Adrien Barbaresi, Gaël Lejeune
The opportunistic collection and use of textual data taken from the web raise a series of ethical, methodological, and epistemological problems that deserve the attention of the scientific community.
no code implementations • LREC 2020 • Adrien Barbaresi, Gaël Lejeune
This article examines extraction methods designed to retain the main text content of web pages and discusses how the extraction could be oriented and evaluated: can and should it be as generic as possible to ensure opportunistic corpus construction?
no code implementations • COLING 2018 • Adrien Barbaresi
The present contribution revolves around efficient approaches to language classification which have been field-tested in the VarDial evaluation campaign.
1 code implementation • WS 2017 • Adrien Barbaresi
The present contribution revolves around a contrastive subword n-gram model which has been tested in the Discriminating between Similar Languages shared task.
no code implementations • WS 2016 • Adrien Barbaresi
In this study conducted on the occasion of the Discriminating between Similar Languages shared task, I introduce an additional decision factor focusing on the token and subtoken level.