no code implementations • EAMT 2020 • Ondřej Bojar, Dominik Macháček, Sangeet Sagar, Otakar Smrž, Jonáš Kratochvíl, Ebrahim Ansari, Dario Franceschini, Chiara Canton, Ivan Simonini, Thai-Son Nguyen, Felix Schneider, Sebastian Stücker, Alex Waibel, Barry Haddow, Rico Sennrich, Philip Williams
The ELITR (European Live Translator) project aims to create a speech translation system for simultaneous subtitling of conferences and online meetings, targeting up to 43 languages.
Automatic Speech Recognition (ASR) +3
1 code implementation • 27 Jul 2023 • Dominik Macháček, Raj Dabre, Ondřej Bojar
Whisper is one of the recent state-of-the-art multilingual speech recognition and translation models; however, it is not designed for real-time transcription.
no code implementations • 26 May 2023 • Dominik Macháček, Peter Polák, Ondřej Bojar, Raj Dabre
Automatic speech translation is sensitive to speech recognition errors, but in a multilingual scenario, the same content may be available in various languages via simultaneous interpreting, dubbing or subtitling.
1 code implementation • 16 Nov 2022 • Dominik Macháček, Ondřej Bojar, Raj Dabre
There have been several meta-evaluation studies on the correlation between human ratings and offline machine translation (MT) evaluation metrics such as BLEU, chrF2, BertScore and COMET.
no code implementations • 4 Mar 2022 • Dávid Javorský, Dominik Macháček, Ondřej Bojar
Our results show that the subtitling layout and flicker have little effect on comprehension, in contrast to machine translation itself and individual competence.
no code implementations • 25 Feb 2022 • Tom Kocmi, Dominik Macháček, Ondřej Bojar
Machine translation is for us a prime example of deep learning applications where human skills and learning capabilities are taken as a benchmark that many try to match and surpass.
no code implementations • 17 Jun 2021 • Dominik Macháček, Matúš Žilinec, Ondřej Bojar
Interpreters facilitate multi-lingual meetings but the affordable set of languages is often smaller than what is needed.
no code implementations • 18 Sep 2020 • Dominik Macháček, Ondřej Bojar
Furthermore, we propose a way to estimate the overall usability of the combination of automatic translation and subtitling by measuring the quality, latency, and stability on a test set, and propose an improved measure for translation latency.
no code implementations • WS 2020 • Dominik Macháček, Jonáš Kratochvíl, Sangeet Sagar, Matúš Žilinec, Ondřej Bojar, Thai-Son Nguyen, Felix Schneider, Philip Williams, Yuekun Yao
This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020.
no code implementations • 24 Oct 2019 • Thuong-Hai Pham, Dominik Macháček, Ondřej Bojar
The data manipulation techniques, recommended in previous works, prove ineffective in large data settings.
no code implementations • 2 Aug 2019 • Dominik Macháček, Jonáš Kratochvíl, Tereza Vojtěchová, Ondřej Bojar
We present a test corpus of audio recordings and transcriptions of presentations of students' enterprises together with their slides and web-pages.
Automatic Speech Recognition (ASR) +1
no code implementations • WS 2019 • Martin Popel, Dominik Macháček, Michal Auersperger, Ondřej Bojar, Pavel Pecina
We describe our NMT systems submitted to the WMT19 shared task in English-Czech news translation.
no code implementations • 29 Jul 2019 • Ivana Kvapilíková, Dominik Macháček, Ondřej Bojar
In this paper we describe the CUNI translation system used for the unsupervised news shared task of the ACL 2019 Fourth Conference on Machine Translation (WMT19).
1 code implementation • 14 Jun 2018 • Dominik Macháček, Jonáš Vidra, Ondřej Bojar
The state of the art in handling rich morphology in neural machine translation (NMT) is to break word forms into subword units, so that the overall vocabulary size of these units fits the practical limits given by the NMT model and GPU memory capacity.