Search Results for author: Pavel Rychlý

Found 8 papers, 0 papers with code

Current Challenges in Web Corpus Building

no code implementations • LREC 2020 • Miloš Jakubíček, Vojtěch Kovář, Pavel Rychlý, Vit Suchomel

In this paper we discuss some of the current challenges in web corpus building that we have faced in recent years when expanding the corpora in Sketch Engine.

DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model

no code implementations • WS 2016 • Ondřej Herman, Vít Suchomel, Vít Baisa, Pavel Rychlý

In this paper we investigate two approaches to the discrimination of similar languages: an expectation–maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods.

Language Modelling

Finding Definitions in Large Corpora with Sketch Engine

no code implementations • LREC 2016 • Vojtěch Kovář, Monika Močiariková, Pavel Rychlý

The paper describes automatic definition finding implemented within the leading corpus query and management tool, Sketch Engine.

Management

HindEnCorp - Hindi-English and Hindi-only Corpus for Machine Translation

no code implementations • LREC 2014 • Ondřej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Straňák, Vít Suchomel, Aleš Tamchyna, Daniel Zeman

HindEnCorp consists of 274k parallel sentences (3.9 million Hindi and 3.8 million English tokens).

Machine Translation, Translation

Extrinsic Corpus Evaluation with a Collocation Dictionary Task

no code implementations • LREC 2014 • Adam Kilgarriff, Pavel Rychlý, Miloš Jakubíček, Vojtěch Kovář, Vít Baisa, Lucia Kocincová

The NLP researcher or application-builder often wonders "what corpus should I use, or should I build one of my own?

Finding Terms in Corpora for Many Languages with the Sketch Engine

no code implementations • EACL 2014 • Miloš Jakubíček, Adam Kilgarriff, Vojtěch Kovář, Pavel Rychlý, Vít Suchomel

Building a 70 billion word corpus of English from ClueWeb

no code implementations • LREC 2012 • Jan Pomikálek, Miloš Jakubíček, Pavel Rychlý

This work describes the process of creating a 70 billion word text corpus of English.

Machine Translation, Management +2

Legal electronic dictionary for Czech

no code implementations • LREC 2012 • František Cvrček, Karel Pala, Pavel Rychlý

During the 4-year project, a large legal terminological dictionary of Czech was created in the form of an electronic lexical database enriched with a hierarchical ontology of legal terms.
