no code implementations • 30 Mar 2024 • Marco Cognetta, Tatsuya Hiraoka, Naoaki Okazaki, Rico Sennrich, Yuval Pinter
We explore threshold vocabulary trimming in Byte-Pair Encoding subword tokenization, a postprocessing step that replaces rare subwords with their component subwords.
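The core operation can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the merge-table representation, and the recursive expansion below are illustrative assumptions, showing only the idea of removing subwords below a frequency threshold and re-splitting them into the components they were merged from.

```python
from collections import Counter

def trim_vocabulary(corpus_tokens, merges, threshold):
    """Hypothetical sketch of threshold vocabulary trimming: subwords
    occurring fewer than `threshold` times are dropped from the
    vocabulary, and each occurrence is re-split into the pair of
    component subwords it was originally merged from."""
    # merges maps a merged subword to the (left, right) pair that formed it
    freq = Counter(corpus_tokens)
    rare = {t for t, c in freq.items() if c < threshold and t in merges}

    def expand(token):
        # Recursively replace a trimmed subword with its components,
        # since a component may itself have been trimmed.
        if token in rare:
            left, right = merges[token]
            return expand(left) + expand(right)
        return [token]

    out = []
    for tok in corpus_tokens:
        out.extend(expand(tok))
    return out

# Toy merge table: "l" + "o" -> "lo", "lo" + "w" -> "low"
merges = {"lo": ("l", "o"), "low": ("lo", "w")}
tokens = ["low", "lo", "l", "o"]
print(trim_vocabulary(tokens, merges, threshold=2))
```

With threshold 2, both "low" and "lo" occur only once and are recursively expanded back to characters, while base symbols without a merge entry are always kept.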
no code implementations • 22 Feb 2024 • Marco Cognetta, Vilém Zouhar, Sangwhan Moon, Naoaki Okazaki
In Tokenization and the Noiseless Channel (Zouhar et al., 2023a), Rényi efficiency is suggested as an intrinsic mechanism for evaluating a tokenizer: for NLP tasks, the tokenizer which leads to the highest Rényi efficiency of the unigram distribution should be chosen.
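The metric itself is straightforward to compute. Below is a minimal sketch (the function name and the toy input are my own): the Rényi entropy of order α of the unigram token distribution, normalized by log |V| so that a uniform distribution scores 1.

```python
import math
from collections import Counter

def renyi_efficiency(tokens, alpha=2.5):
    """Rényi efficiency of the unigram token distribution:
    H_alpha(p) / log|V|, where H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha).
    alpha is a free parameter; alpha -> 1 recovers Shannon entropy."""
    counts = Counter(tokens)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if len(probs) < 2:
        return 1.0  # degenerate one-token vocabulary
    if alpha == 1.0:
        h = -sum(p * math.log(p) for p in probs)  # Shannon limit
    else:
        h = math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)
    return h / math.log(len(probs))

print(renyi_efficiency(["the", "cat", "sat", "on", "the", "mat"], alpha=2.5))
```

A uniform unigram distribution attains efficiency 1; the more skewed the token frequencies, the lower the score.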
no code implementations • IJCNLP 2019 • Jun-U Park, Sang-Ki Ko, Marco Cognetta, Yo-Sub Han
We continue the study of generating semantically correct regular expressions from natural language descriptions (NL).
no code implementations • WS 2019 • Marco Cognetta, Cyril Allauzen, Michael Riley
Indeed, a delicate balance between comprehensiveness, speed, and memory must be struck to conform to device requirements while providing a good user experience. In this paper, we describe a compression scheme for lexicons when represented as finite-state transducers.
no code implementations • ACL 2019 • Marco Cognetta, Yo-Sub Han, Soon Chan Kwon
Probabilistic finite automata (PFAs) are common statistical language models in natural language and speech processing.
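As background for how a PFA assigns probabilities to strings, here is a minimal sketch of the standard forward computation (the data-structure layout and the toy automaton are assumptions for illustration, not taken from the paper): the probability of a string is the total weight of all accepting paths labelled with it.

```python
def string_probability(initial, transitions, final, s):
    """Forward algorithm on a probabilistic finite automaton (PFA).
    initial: state -> start probability
    transitions: (state, symbol) -> list of (next_state, prob)
    final: state -> stopping probability
    Runs in O(|s| * E) for E transition entries per symbol."""
    fwd = dict(initial)  # probability mass currently in each state
    for sym in s:
        nxt = {}
        for q, w in fwd.items():
            for r, p in transitions.get((q, sym), []):
                nxt[r] = nxt.get(r, 0.0) + w * p
        fwd = nxt
    # Weight paths by the stopping probability of their end state.
    return sum(w * final.get(q, 0.0) for q, w in fwd.items())

# Toy two-state PFA over {a, b}: from state 0, emit "a" and stay (0.5),
# emit "b" and move to state 1 (0.3), or stop (0.2); state 1 always stops.
initial = {0: 1.0}
transitions = {(0, "a"): [(0, 0.5)], (0, "b"): [(1, 0.3)]}
final = {0: 0.2, 1: 1.0}
print(string_probability(initial, transitions, final, "ab"))  # 0.5 * 0.3 * 1.0
```

Each state's outgoing transition weights plus its stopping weight sum to 1, so the PFA defines a proper distribution over finite strings.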
no code implementations • EMNLP 2018 • Marco Cognetta, Yo-Sub Han, Soon Chan Kwon
The problem of computing infix probabilities of strings when the pattern distribution is given by a probabilistic context-free grammar or by a probabilistic finite automaton is already solved, yet it was open to compute the infix probabilities in an incremental manner.