no code implementations • LREC 2020 • Miloš Jakubíček, Vojtěch Kovář, Pavel Rychlý, Vit Suchomel
In this paper we discuss some of the current challenges in web corpus building that we have faced in recent years when expanding the corpora in Sketch Engine.
no code implementations • WS 2016 • Ondřej Herman, Vít Suchomel, Vít Baisa, Pavel Rychlý
In this paper we investigate two approaches to the discrimination of similar languages: an expectation–maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods.
no code implementations • LREC 2016 • Vojtěch Kovář, Monika Močiariková, Pavel Rychlý
The paper describes automatic definition finding implemented within the leading corpus query and management tool, Sketch Engine.
no code implementations • LREC 2014 • Ondřej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Straňák, Vít Suchomel, Aleš Tamchyna, Daniel Zeman
HindEnCorp consists of 274k parallel sentences (3.9 million Hindi and 3.8 million English tokens).
no code implementations • LREC 2014 • Adam Kilgarriff, Pavel Rychlý, Miloš Jakubíček, Vojtěch Kovář, Vít Baisa, Lucia Kocincová
The NLP researcher or application-builder often wonders “what corpus should I use, or should I build one of my own?
no code implementations • LREC 2012 • Jan Pomikálek, Miloš Jakubíček, Pavel Rychlý
This work describes the creation of a 70-billion-word text corpus of English.
no code implementations • LREC 2012 • František Cvrček, Karel Pala, Pavel Rychlý
During the 4-year project, a large legal terminological dictionary of Czech was created in the form of an electronic lexical database enriched with a hierarchical ontology of legal terms.