1 code implementation • 16 Jun 2023 • Huang Xie, Khazar Khorrami, Okko Räsänen, Tuomas Virtanen
Conversely, the results suggest that using only binary relevances defined by captioning-based audio-caption pairs is sufficient for contrastive learning.
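As context for what "binary relevances" means here, only the caption actually paired with an audio clip is treated as relevant and all other captions in the batch as irrelevant. A minimal, hypothetical sketch (not the paper's implementation) of such a contrastive objective as a symmetric cross-entropy over an audio-caption similarity matrix:

```python
import numpy as np

def binary_relevance_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over an audio-caption similarity matrix,
    where only the paired caption (the diagonal) counts as relevant."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature  # (N, N) pairwise similarities

    def nll_diag(m):
        # negative log-softmax probability of the matched pair per row
        m = m - m.max(axis=1, keepdims=True)
        logp = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average the audio-to-text and text-to-audio directions
    return 0.5 * (nll_diag(logits) + nll_diag(logits.T))
```

The temperature value and embedding shapes above are illustrative assumptions, not taken from the paper.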
no code implementations • 5 Jun 2023 • Khazar Khorrami, María Andrea Cruz Blandón, Tuomas Virtanen, Okko Räsänen
As a result, we find that sequential training with wav2vec 2.0 first and VGS next provides higher performance on audio-visual retrieval compared to simultaneous optimization of both learning mechanisms.
1 code implementation • 2 Jun 2023 • Marvin Lavechin, Yaya Sy, Hadrien Titeux, María Andrea Cruz Blandón, Okko Räsänen, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristia
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
2 code implementations • 19 May 2023 • Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath
In this paper, we show that representations capturing syllabic units emerge when training a self-supervised speech model with a visually-grounded training objective.
1 code implementation • 16 May 2023 • Einari Vaaras, Manu Airaksinen, Sampsa Vanhatalo, Okko Räsänen
The recently developed infant wearable MAIJU provides a means to automatically evaluate infants' motor performance in an objective and scalable manner in out-of-hospital settings.
1 code implementation • 3 May 2023 • María Andrea Cruz Blandón, Alejandrina Cristia, Okko Räsänen
Our results show that the use of naturalistic speech data of both modest and high audio quality results in largely similar conclusions on IDS and ADS in terms of acoustic analyses and modelling experiments.
no code implementations • 8 Nov 2022 • Huang Xie, Okko Räsänen, Tuomas Virtanen
Using a fixed training setup for the retrieval system from [1], we study eight negative sampling strategies, including hard and semi-hard negative sampling.
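For context, a semi-hard negative is commonly defined as one that lies farther from the anchor than the positive, but still within a margin of it. A hypothetical selection sketch under that common definition (not necessarily the paper's exact criterion):

```python
import numpy as np

def semi_hard_negatives(anchor, positive, candidates, margin=0.2):
    """Pick semi-hard negatives: candidates farther from the anchor than
    the positive, but within the margin (d_ap < d_an < d_ap + margin)."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(candidates - anchor, axis=1)
    mask = (d_an > d_ap) & (d_an < d_ap + margin)
    return candidates[mask]

anchor = np.array([0.0, 0.0])
positive = np.array([1.0, 0.0])
cands = np.array([[0.5, 0.0],   # hard: closer than the positive
                  [1.1, 0.0],   # semi-hard: within the margin band
                  [3.0, 0.0]])  # easy: far outside the margin
print(semi_hard_negatives(anchor, positive, cands))
```

Hard negative sampling would instead pick the candidates with the smallest `d_an`, i.e. the first row above.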
1 code implementation • 21 Jun 2022 • Einari Vaaras, Manu Airaksinen, Okko Räsänen
In this paper, we combine CPC and multiple dimensionality reduction methods in search of effective practices for clustering-based AL. Our experiments simulating speech emotion recognition system deployment show that both the local and global topology of the feature space can be successfully used for AL, and that CPC can improve clustering-based AL performance over traditional signal features.
no code implementations • 4 Nov 2021 • Kevin Eloff, Okko Räsänen, Herman A. Engelbrecht, Arnu Pretorius, Herman Kamper
Multi-agent reinforcement learning has been used as an effective means to study emergent communication between agents, yet little focus has been given to continuous acoustic communication.
1 code implementation • 6 Oct 2021 • Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen
We investigate unsupervised learning of correspondences between sound events and textual phrases through aligning audio clips with textual captions describing the content of a whole audio clip.
1 code implementation • 29 Sep 2021 • Khazar Khorrami, Okko Räsänen
We review the extent to which the audiovisual aspect of LLH is supported by existing computational studies.
1 code implementation • 16 Aug 2021 • Yuanyuan Liu, Nelly Penttilä, Tiina Ihalainen, Juulia Lintula, Rachel Convey, Okko Räsänen
Experimental results on a Finnish PD speech corpus demonstrate the efficacy and reliability of the proposed automatic method in deriving VAI, VSA, FCR and F2i/F2u (the second formant ratio for vowels /i/ and /u/).
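For reference, these articulation metrics have standard definitions in terms of the corner-vowel formants. A sketch using the definitions commonly found in the literature, with hypothetical formant values (the paper's actual values are not reproduced here):

```python
def articulation_metrics(F1, F2):
    """Vowel articulation metrics from corner-vowel formants.
    F1, F2: dicts mapping vowels 'i', 'a', 'u' to formant values (Hz)."""
    # Vowel articulation index and its reciprocal, the
    # formant centralization ratio (FCR)
    vai = (F2['i'] + F1['a']) / (F1['i'] + F1['u'] + F2['u'] + F2['a'])
    fcr = 1.0 / vai
    # Vowel space area: area of the /i/-/a/-/u/ triangle in the F1-F2 plane
    vsa = 0.5 * abs(F1['i'] * (F2['a'] - F2['u'])
                    + F1['a'] * (F2['u'] - F2['i'])
                    + F1['u'] * (F2['i'] - F2['a']))
    return vai, fcr, vsa, F2['i'] / F2['u']

# Hypothetical formant values (Hz) for illustration only
F1 = {'i': 300, 'a': 800, 'u': 350}
F2 = {'i': 2300, 'a': 1200, 'u': 800}
vai, fcr, vsa, f2_ratio = articulation_metrics(F1, F2)
```

Centralized (less distinct) vowels shrink the triangle and pull VAI and the F2i/F2u ratio down while FCR rises, which is why these metrics are used as markers of dysarthric speech.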
1 code implementation • 14 Jul 2021 • Afra Alishahi, Grzegorz Chrupała, Alejandrina Cristia, Emmanuel Dupoux, Bertrand Higy, Marvin Lavechin, Okko Räsänen, Chen Yu
We present the visually-grounded language modelling track that was introduced in the second round of the Zero Resource Speech Challenge, 2021 edition.
1 code implementation • 5 Jul 2021 • Khazar Khorrami, Okko Räsänen
We compare the alignment performance using our proposed evaluation metrics to the semantic retrieval task commonly used to evaluate VGS models.
no code implementations • 2 Jul 2021 • Manu Airaksinen, Sampsa Vanhatalo, Okko Räsänen
In addition, we explore the benefits of data augmentation methods in ideal and non-ideal recording conditions.
no code implementations • 14 Jun 2021 • Einari Vaaras, Sari Ahlqvist-Björkroth, Konstantinos Drossos, Okko Räsänen
Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes.
no code implementations • 25 Nov 2020 • Huang Xie, Okko Räsänen, Tuomas Virtanen
In this paper, we study zero-shot learning in audio classification through factored linear and nonlinear acoustic-semantic projections between audio instances and sound classes.
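As a minimal illustration of the general idea (with hypothetical dimensions and random matrices standing in for learned parameters), a bilinear acoustic-semantic compatibility score lets a classifier rank sound classes it never saw during training:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: acoustic embedding size d_a, semantic
# (class label) embedding size d_s, and a set of unseen classes.
d_a, d_s, n_classes = 8, 4, 3
W = rng.normal(size=(d_a, d_s))                # in practice learned on seen classes
class_emb = rng.normal(size=(n_classes, d_s))  # semantic embeddings of unseen classes

def zero_shot_predict(audio_emb):
    """Score each class by the bilinear compatibility a^T W s and
    return the index of the best-matching (unseen) class."""
    scores = audio_emb @ W @ class_emb.T
    return int(np.argmax(scores))
```

The factored form means audio and class labels are projected into a shared space, so new classes only require a semantic embedding, not retraining.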
2 code implementations • 3 Aug 2020 • Okko Räsänen, María Andrea Cruz Blandón
One potential approach to this problem is to use dynamic time warping (DTW) to find well-aligning patterns from the speech data.
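As a generic illustration of the technique (not the authors' implementation), the cumulative DTW alignment cost between two 1-D feature sequences can be computed with the standard dynamic-programming recursion:

```python
import numpy as np

def dtw_cost(x, y):
    """Minimal DTW: cumulative alignment cost between two 1-D feature
    sequences using absolute frame distance and the standard
    (diagonal / vertical / horizontal) step pattern."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Low alignment cost between two speech segments signals a recurring, well-aligning pattern; real systems use multidimensional frame-level features (e.g. MFCCs) rather than scalars.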
no code implementations • 8 Jul 2020 • María Andrea Cruz Blandón, Okko Räsänen
The present study investigates the behaviour of two predictive coding models, Autoregressive Predictive Coding and Contrastive Predictive Coding, in a phoneme discrimination task (ABX task) for two languages with different dataset sizes.
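A minimal sketch of ABX scoring, under the simplifying assumption of fixed-size embeddings and Euclidean distance (actual ABX evaluations typically aggregate frame-level distances, e.g. via DTW):

```python
import numpy as np

def abx_correct(a, b, x):
    """One ABX trial: X belongs to the same phoneme category as A;
    the trial is answered correctly if X lies closer to A than to B."""
    return np.linalg.norm(x - a) < np.linalg.norm(x - b)

def abx_error_rate(trials):
    """Fraction of (A, B, X) trials answered incorrectly."""
    return 1.0 - sum(abx_correct(a, b, x) for a, b, x in trials) / len(trials)
```

A representation that discriminates phonemes well pushes the ABX error rate toward zero, which is how the predictive coding models above are compared.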
no code implementations • 21 Sep 2019 • Manu Airaksinen, Okko Räsänen, Elina Ilén, Taru Häyrinen, Anna Kivi, Viviana Marchi, Anastasia Gallen, Sonja Blom, Anni Varhe, Nico Kaartinen, Leena Haataja, Sampsa Vanhatalo
These data were manually annotated for infant posture and movement based on video recordings of the sessions, and using a novel annotation scheme specifically designed to assess the overall movement pattern of infants in the given age group.
1 code implementation • 24 Jun 2019 • Shreyas Seshadri, Okko Räsänen
Automatic syllable count estimation (SCE) is used in a variety of applications, ranging from speaking rate estimation and detection of social activity from wearable microphones to developmental research that quantifies the speech heard by language-learning children in different environments.
no code implementations • 24 Jun 2019 • Okko Räsänen, Khazar Khorrami
Earlier research has suggested that human infants might use statistical dependencies between speech and non-linguistic multimodal input to bootstrap their language learning before they know how to segment words from running speech.
no code implementations • ACL 2017 • Paul Michel, Okko Räsänen, Roland Thiollière, Emmanuel Dupoux
Phonemic segmentation of speech is a critical step of speech recognition systems.