no code implementations • SEMEVAL 2020 • Carlos Santos Armendariz, Matthew Purver, Senja Pollak, Nikola Ljubešić, Matej Ulčar, Ivan Vulić, Mohammad Taher Pilehvar
This paper presents the Graded Word Similarity in Context (GWSC) task which asked participants to predict the effects of context on human perception of similarity in English, Croatian, Slovene and Finnish.
no code implementations • LREC 2020 • Simon Krek, Špela Arhar Holdt, Tomaž Erjavec, Jaka Čibej, Andraz Repar, Polona Gantar, Nikola Ljubešić, Iztok Kosem, Kaja Dobrovoljc
We describe a new version of the Gigafida reference corpus of Slovene.
no code implementations • WS 2019 • Nikola Ljubešić, Kaja Dobrovoljc
We present experiments on Slovenian, Croatian and Serbian morphosyntactic annotation and lemmatisation that compare the former state of the art for these three languages with one of the best-performing systems at the CoNLL 2018 shared task, the Stanford NLP neural pipeline.
no code implementations • WS 2018 • Nikola Ljubešić, Tomaž Erjavec, Darja Fišer
Both datasets are published in encrypted form, to enable others to perform experiments on detecting content to be deleted without revealing potentially inappropriate content.
no code implementations • COLING 2018 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal Van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain
We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects.
no code implementations • COLING 2018 • Nikola Ljubešić
This paper presents two systems taking part in the Morphosyntactic Tagging of Tweets shared task on Slovene, Croatian and Serbian data, organized inside the VarDial Evaluation Campaign.
1 code implementation • WS 2018 • Nikola Ljubešić, Darja Fišer, Anita Peti-Stantić
We show that the notions of concreteness and imageability are highly predictable both within and across languages, with a moderate loss of up to 20% in correlation when predicting across languages.
no code implementations • WS 2017 • Nikola Ljubešić, Darja Fišer, Tomaž Erjavec
In this paper we present a set of experiments and analyses on predicting the gender of Twitter users based on language-independent features extracted either from the text or the metadata of users' tweets.
no code implementations • WS 2017 • Darja Fišer, Tomaž Erjavec, Nikola Ljubešić
In this paper we present the legal framework, dataset and annotation schema of socially unacceptable discourse practices on social networking platforms in Slovenia.
no code implementations • WS 2017 • Tanja Samardžić, Mirjana Starović, Željko Agić, Nikola Ljubešić
The paper documents the procedure of building a new Universal Dependencies (UDv2) treebank for Serbian starting from an existing Croatian UDv1 treebank and taking into account the other Slavic UD annotation guidelines.
no code implementations • WS 2017 • Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli
We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL'2017.
no code implementations • WS 2017 • Nikola Ljubešić, Tomaž Erjavec, Darja Fišer
We remove more than half of the error of the standard tagger when applied to non-standard texts by training it on a combination of standard and non-standard training data, while enriching the data representation with external resources removes an additional 11 percent of the error.
no code implementations • WS 2016 • Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann
We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial'2016 workshop at COLING'2016.
no code implementations • COLING 2016 • Nikola Ljubešić, Tanja Samardžić, Curdin Derungs
In this paper we present a newly developed tool that enables researchers interested in spatial variation of language to define a geographic perimeter of interest, collect data from the Twitter streaming API published in that perimeter, filter the obtained data by language and country, define and extract variables of interest and analyse the extracted variables by one spatial statistic and two spatial visualisations.
no code implementations • WS 2016 • Nikola Ljubešić, Darja Fišer
In this paper we present a series of experiments on discriminating between private and corporate accounts on Twitter.
no code implementations • WS 2016 • Maja Popović, Kostadin Cholakov, Valia Kordoni, Nikola Ljubešić
Massive Open Online Courses have been growing rapidly in size and impact.
no code implementations • LREC 2016 • Vanja Štefanec, Nikola Ljubešić, Jelena Kuvač Kraljević
In this paper the authors present the Croatian corpus of non-professional written language.
1 code implementation • LREC 2016 • Nikola Ljubešić, Tomaž Erjavec
In this paper we present a tagger developed for inflectionally rich languages for which both a training corpus and a lexicon are available.
no code implementations • LREC 2016 • Nikola Ljubešić, Miquel Esplà-Gomis, Antonio Toral, Sergio Ortiz Rojas, Filip Klubička
This paper presents an approach for building large monolingual corpora and, at the same time, extracting parallel data by crawling the top-level domain of a given language of interest.
no code implementations • LREC 2016 • Nikola Ljubešić, Filip Klubička, Željko Agić, Ivo-Pavao Jazbec
In this paper we present newly developed inflectional lexicons and manually annotated corpora of Croatian and Serbian.
no code implementations • LREC 2016 • Nikola Ljubešić, Tomaž Erjavec, Darja Fišer
In computer-mediated communication, users of Latin-based scripts often omit diacritics when writing.
no code implementations • EAMT 2016 • Antonio Toral, Tommi A. Pirinen, Andy Way, Gema Ramírez-Sánchez, Sergio Ortiz Rojas, Raphael Rubino, Miquel Esplà, Mikel L. Forcada, Vassilis Papavassiliou, Prokopis Prokopidis, Nikola Ljubešić
no code implementations • LREC 2014 • Željko Agić, Nikola Ljubešić
We build and evaluate statistical models for lemmatization, morphosyntactic tagging, named entity recognition and dependency parsing on top of SETimes.HR and the test sets, providing the state of the art in all the tasks.
no code implementations • LREC 2014 • Miquel Esplà-Gomis, Filip Klubička, Nikola Ljubešić, Sergio Ortiz-Rojas, Vassilis Papavassiliou, Prokopis Prokopidis
We used both tools for crawling 21 multilingual websites from the tourism domain to build a domain-specific English-Croatian parallel corpus.
1 code implementation • LREC 2014 • Nikola Ljubešić, Darja Fišer, Tomaž Erjavec
This paper presents TweetCaT, an open-source Python tool for building Twitter corpora that was designed for smaller languages.
no code implementations • LREC 2014 • Raphael Rubino, Antonio Toral, Nikola Ljubešić, Gema Ramírez-Sánchez
This paper presents a novel approach for parallel data generation using machine translation and quality estimation.
no code implementations • LREC 2014 • Nikola Ljubešić, Antonio Toral
In this paper we present the construction process of a web corpus of Catalan built from the content of the .cat top-level domain.
no code implementations • LREC 2012 • Darja Fišer, Nikola Ljubešić, Ozren Kubelka
This paper presents an approach to extract translation equivalents from comparable corpora for polysemous nouns.