no code implementations • 30 Apr 2024 • Andrei-Marius Avram, Andreea Iuga, George-Vlad Manolache, Vlad-Cristian Matei, Răzvan-Gabriel Micliuş, Vlad-Andrei Muntean, Manuel-Petru Sorlescu, Dragoş-Andrei Şerban, Adrian-Dinu Urse, Vasile Păiş, Dumitru-Clementin Cercel
This work introduces HistNERo, the first Romanian corpus for Named Entity Recognition (NER) in historical newspapers.
no code implementations • 30 Jun 2023 • Andrei-Marius Avram, Răzvan-Alexandru Smădu, Vasile Păiş, Dumitru-Clementin Cercel, Radu Ion, Dan Tufiş
With the rise of bidirectional encoder representations from Transformer models in natural language processing, the speech community has adopted some of their development methodologies.
no code implementations • 17 Jun 2023 • Andrei-Marius Avram, Verginica Barbu Mititelu, Vasile Păiş, Dumitru-Clementin Cercel, Ştefan Trăuşan-Matu
Correctly identifying multiword expressions (MWEs) is an important task for most natural language processing systems since their misidentification can result in ambiguity and misunderstanding of the underlying text.
no code implementations • CLIB 2022 • Radu Ion, Andrei-Marius Avram, Vasile Păiş, Maria Mitrofan, Verginica Barbu Mititelu, Elena Irimia, Valentin Badea
The paper will present the QA system and its integration with the Romanian language technologies portal RELATE, the COVID-19 data set and different evaluations of the QA performance.
1 code implementation • LREC 2022 • Andrei-Marius Avram, Darius Catrina, Dumitru-Clementin Cercel, Mihai Dascălu, Traian Rebedea, Vasile Păiş, Dan Tufiş
In this work, we introduce three light and fast versions of distilled BERT models for the Romanian language: Distil-BERT-base-ro, Distil-RoBERT-base, and DistilMulti-BERT-base-ro.
1 code implementation • 23 Nov 2021 • Andrei-Marius Avram, Vasile Păiş, Dan Tufiş
One of the fundamental functionalities for accepting a socially assistive robot is its communication capabilities with other agents in the environment.
no code implementations • 22 Nov 2021 • Vasile Păiş, Radu Ion, Andrei-Marius Avram, Elena Irimia, Verginica Barbu Mititelu, Maria Mitrofan
The paper contains a detailed description of the acquisition process, corpus statistics as well as an evaluation of the corpus influence on a low-latency ASR system as well as a dialogue component.
no code implementations • 21 Nov 2021 • Vasile Păiş, Dan Tufiş
To this end, the previously created sets of word embeddings (based on word occurrences) on the CoRoLa corpus (P\u{a}i\c{s} and Tufi\c{s}, 2018) are and will be further augmented with new representations learned from the same corpus by using specific features such as lemmas and parts of speech.
no code implementations • 21 Nov 2021 • Vasile Păiş, Dan Tufiş
Ensuring proper punctuation and letter casing is a key pre-processing step towards applying complex natural language processing algorithms.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2