no code implementations • 4 Mar 2024 • Joonas Kalda, Clément Pagés, Ricard Marxer, Tanel Alumäe, Hervé Bredin
A major drawback of supervised speech separation (SSep) systems is their reliance on synthetic data, leading to poor real-world generalization.
Automatic Speech Recognition (ASR)
no code implementations • 26 Oct 2023 • Tanel Alumäe, Jiaming Kong, Daniil Robnikov
This paper describes Tallinn University of Technology (TalTech) systems developed for the ASRU MADASR 2023 Challenge.
no code implementations • 14 May 2022 • Joonas Kalda, Tanel Alumäe
Instead, the proposed method uses an objective function which encourages the model to predict a single positive label within a specified collar.
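The snippet does not spell out the objective. As a rough sketch only, assuming independent per-frame change probabilities, one way to reward exactly one positive frame inside a collar (and negatives everywhere else) is to maximize the probability that exactly one frame in the collar fires. The function names are hypothetical, and this is not claimed to be the paper's actual loss.

```python
import math


def exactly_one_loss(probs, start, end, eps=1e-9):
    """-log P(exactly one frame in probs[start:end] is a change point).

    Assumes each frame's change probability is an independent Bernoulli.
    O(n^2) in the collar length, which is fine for short collars.
    """
    window = probs[start:end]
    p_exactly_one = 0.0
    for i, p_i in enumerate(window):
        term = p_i  # frame i fires ...
        for j, p_j in enumerate(window):
            if j != i:
                term *= 1.0 - p_j  # ... and every other frame stays silent
        p_exactly_one += term
    return -math.log(p_exactly_one + eps)


def outside_loss(probs, collars, eps=1e-9):
    """Binary cross-entropy pushing frames outside all collars toward 0."""
    inside = set()
    for start, end in collars:
        inside.update(range(start, end))
    return -sum(math.log(1.0 - p + eps)
                for t, p in enumerate(probs) if t not in inside)


def collar_loss(probs, collars):
    """Hypothetical total loss: one term per collar plus the outside term."""
    return (sum(exactly_one_loss(probs, s, e) for s, e in collars)
            + outside_loss(probs, collars))


sharp = [0.01, 0.02, 0.95, 0.03, 0.01]     # single peak inside the collar
shifted = [0.01, 0.95, 0.02, 0.03, 0.01]   # same peak on another collar frame
smeared = [0.01, 0.6, 0.6, 0.6, 0.01]      # mass spread over the collar
collars = [(1, 4)]                          # frames 1..3 form the collar
print(collar_loss(sharp, collars), collar_loss(shifted, collars),
      collar_loss(smeared, collars))
```

Because the exactly-one term is symmetric in the frames of the collar, a peak anywhere inside the collar scores the same, while a prediction smeared over several frames is penalized.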
no code implementations • 14 May 2022 • Tanel Alumäe, Kunnar Kukk
For the unconstrained task, we relied on both externally available pretrained models and external data: the multilingual XLSR-53 wav2vec 2.0 model was fine-tuned on the VoxLingua107 corpus for the language recognition task, and finally fine-tuned on the provided target-language training data, augmented with CommonVoice data.
Automatic Speech Recognition (ASR)
no code implementations • 31 Mar 2022 • Kunnar Kukk, Tanel Alumäe
Language identification from speech is a common preprocessing step in many spoken language processing systems.
2 code implementations • 25 Nov 2020 • Jörgen Valk, Tanel Alumäe
Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech.
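As an illustrative sketch of only the speech activity step (the snippet does not say how segments are formed), frame-level speech probabilities can be thresholded and merged into time segments; diarization would then assign speaker labels to them. The 0.5 threshold, 10 ms frame shift, and 0.3 s minimum duration are assumed values, and `frames_to_segments` is a hypothetical helper.

```python
def frames_to_segments(speech_probs, threshold=0.5, frame_shift=0.01,
                       min_duration=0.3):
    """Merge runs of speech frames into (start_s, end_s) segments.

    threshold, frame_shift (seconds per frame) and min_duration (seconds)
    are assumed defaults, not values from the paper.
    """
    runs = []
    start = None
    for t, p in enumerate(speech_probs):
        if p >= threshold and start is None:
            start = t                      # speech run begins
        elif p < threshold and start is not None:
            runs.append((start, t))        # speech run ends (exclusive)
            start = None
    if start is not None:                  # run reaches the last frame
        runs.append((start, len(speech_probs)))

    segments = []
    for s, e in runs:
        if (e - s) * frame_shift >= min_duration:   # drop very short blips
            segments.append((round(s * frame_shift, 3),
                             round(e * frame_shift, 3)))
    return segments


probs = [0.1] * 10 + [0.9] * 50 + [0.2] * 5 + [0.8] * 10 + [0.1] * 5
print(frames_to_segments(probs))
```

The minimum-duration filter trades recall for cleaner segments: short bursts are dropped rather than passed on to diarization.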
1 code implementation • 18 May 2020 • Adrian Łańcucki, Jan Chorowski, Guillaume Sanchez, Ricard Marxer, Nanxin Chen, Hans J. G. A. Dolfing, Sameer Khurana, Tanel Alumäe, Antoine Laurent
We show that codebook learning can suffer from poor initialization and non-stationarity of clustered encoder outputs.
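The snippet states the problem but not the remedy. A minimal sketch of one commonly used fix, assumed here rather than taken from the snippet, is to re-initialize codewords that receive no encoder outputs by copying randomly chosen current outputs, so the codebook follows the shifting encoder distribution. `quantize` and `reinit_dead_codes` are hypothetical names.

```python
import numpy as np


def quantize(z, codebook):
    """Index of the nearest codeword (squared Euclidean) for each row of z."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def reinit_dead_codes(z, codebook, rng):
    """Replace codewords that no encoder output maps to with random outputs.

    Returns the updated codebook and the indices that were re-initialized.
    """
    counts = np.bincount(quantize(z, codebook), minlength=len(codebook))
    dead = np.flatnonzero(counts == 0)
    new_codebook = codebook.copy()
    if dead.size > 0:
        picks = rng.choice(len(z), size=dead.size,
                           replace=dead.size > len(z))
        new_codebook[dead] = z[picks]
    return new_codebook, dead


rng = np.random.default_rng(0)
z = rng.normal(loc=5.0, size=(256, 4))  # synthetic encoder outputs, far from 0
codebook = np.linspace(-0.1, 0.1, 8)[:, None] * np.ones((1, 4))  # poor init
print(np.bincount(quantize(z, codebook), minlength=8))  # one codeword takes all
new_codebook, dead = reinit_dead_codes(z, codebook, rng)
print(np.bincount(quantize(z, new_codebook), minlength=8))
```

During training such a step would be repeated periodically, since non-stationary encoder outputs can leave codewords unused again.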
no code implementations • 11 Jan 2019 • Tanel Alumäe, Ottokar Tilk, Asadullah
Out-of-vocabulary words are recovered using a phoneme n-gram based decoding subgraph and an FST-based phoneme-to-grapheme model.
no code implementations • 22 Jun 2018 • Martin Karu, Tanel Alumäe
The method uses speaker diarization to find unique speakers in each recording, and i-vectors to project the speech of each speaker to a fixed-dimensional vector.
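A simplified sketch of the per-speaker step: given frame features and diarization labels, each speaker's frames are pooled into one fixed-dimensional vector, and vectors from different recordings are compared with cosine similarity. Mean pooling is a deliberate stand-in for real i-vector extraction (which requires a trained total variability model), the data is synthetic, and `speaker_vectors` and `cosine` are hypothetical helpers.

```python
import numpy as np


def speaker_vectors(features, labels):
    """Pool each diarized speaker's frames into one unit-length vector.

    Mean pooling is a stand-in for i-vector extraction, which would need a
    trained total variability model.
    """
    labels = np.asarray(labels)
    vectors = {}
    for spk in np.unique(labels):
        v = features[labels == spk].mean(axis=0)
        vectors[str(spk)] = v / np.linalg.norm(v)
    return vectors


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(1)
mean_a = np.zeros(8)
mean_a[0] = 3.0                                  # synthetic "speaker A"
mean_b = np.zeros(8)
mean_b[1] = 3.0                                  # synthetic "speaker B"
rec1 = np.vstack([rng.normal(size=(100, 8)) + mean_a,
                  rng.normal(size=(100, 8)) + mean_b])
rec1_labels = ["spk0"] * 100 + ["spk1"] * 100    # diarization output, rec 1
rec2 = rng.normal(size=(100, 8)) + mean_a        # rec 2 contains only A
v1 = speaker_vectors(rec1, rec1_labels)
v2 = speaker_vectors(rec2, ["spk0"] * 100)
print(cosine(v1["spk0"], v2["spk0"]), cosine(v1["spk0"], v1["spk1"]))
```

Fixing the vector dimension per speaker is what makes speakers from recordings of different lengths directly comparable.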
no code implementations • WS 2017 • Ottokar Tilk, Tanel Alumäe
Recent neural headline generation models have achieved strong results, but are generally trained on very large datasets.