Search Results for author: Okko Räsänen

Found 23 papers, 13 papers with code

Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances

1 code implementation • 16 Jun 2023 • Huang Xie, Khazar Khorrami, Okko Räsänen, Tuomas Virtanen

Conversely, the results suggest that using only binary relevances defined by captioning-based audio-caption pairs is sufficient for contrastive learning.

Audio captioning, Contrastive Learning, +1

Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System

no code implementations • 5 Jun 2023 • Khazar Khorrami, María Andrea Cruz Blandón, Tuomas Virtanen, Okko Räsänen

As a result, we find that sequential training with wav2vec 2.0 first and VGS next provides higher performance on audio-visual retrieval compared to simultaneous optimization of both learning mechanisms.

Multi-Task Learning, Representation Learning, +3

BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models

1 code implementation • 2 Jun 2023 • Marvin Lavechin, Yaya Sy, Hadrien Titeux, María Andrea Cruz Blandón, Okko Räsänen, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristia

Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.

Benchmarking, Language Acquisition

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model

2 code implementations • 19 May 2023 • Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath

In this paper, we show that representations capturing syllabic units emerge when training a self-supervised speech model with a visually-grounded training objective.

Language Modelling, Masked Language Modeling, +3

Evaluation of self-supervised pre-training for automatic infant movement classification using wearable movement sensors

1 code implementation • 16 May 2023 • Einari Vaaras, Manu Airaksinen, Sampsa Vanhatalo, Okko Räsänen

The recently-developed infant wearable MAIJU provides a means to automatically evaluate infants' motor performance in an objective and scalable manner in out-of-hospital settings.

Human Activity Recognition, Self-Supervised Learning

Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research

1 code implementation • 3 May 2023 • María Andrea Cruz Blandón, Alejandrina Cristia, Okko Räsänen

Our results show that the use of modest and high audio quality naturalistic speech data results in largely similar conclusions on IDS and ADS in terms of acoustic analyses and modelling experiments.

Language Acquisition, Self-Supervised Learning

On Negative Sampling for Contrastive Audio-Text Retrieval

no code implementations • 8 Nov 2022 • Huang Xie, Okko Räsänen, Tuomas Virtanen

With a constant training setting on the retrieval system from [1], we study eight sampling strategies, including hard and semi-hard negative sampling.

Audio to Text Retrieval, Contrastive Learning, +2

Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition

1 code implementation • 21 Jun 2022 • Einari Vaaras, Manu Airaksinen, Okko Räsänen

In this paper, we combine CPC and multiple dimensionality reduction methods in search of functioning practices for clustering-based AL. Our experiments for simulating speech emotion recognition system deployment show that both the local and global topology of the feature space can be successfully used for AL, and that CPC can be used to improve clustering-based AL performance over traditional signal features.

Active Learning, Clustering, +3

Towards Learning to Speak and Hear Through Multi-Agent Communication over a Continuous Acoustic Channel

no code implementations • 4 Nov 2021 • Kevin Eloff, Okko Räsänen, Herman A. Engelbrecht, Arnu Pretorius, Herman Kamper

Multi-agent reinforcement learning has been used as an effective means to study emergent communication between agents, yet little focus has been given to continuous acoustic communication.

Language Acquisition, Multi-agent Reinforcement Learning, +3

Unsupervised Audio-Caption Aligning Learns Correspondences between Individual Sound Events and Textual Phrases

1 code implementation • 6 Oct 2021 • Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen

We investigate unsupervised learning of correspondences between sound events and textual phrases through aligning audio clips with textual captions describing the content of a whole audio clip.

Event Detection, Retrieval, +1

Language-Independent Approach for Automatic Computation of Vowel Articulation Features in Dysarthric Speech Assessment

1 code implementation • 16 Aug 2021 • Yuanyuan Liu, Nelly Penttilä, Tiina Ihalainen, Juulia Lintula, Rachel Convey, Okko Räsänen

Experimental results on a Finnish PD speech corpus demonstrate the efficacy and reliability of the proposed automatic method in deriving VAI, VSA, FCR and F2i/F2u (the second formant ratio for vowels /i/ and /u/).

Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models

1 code implementation • 5 Jul 2021 • Khazar Khorrami, Okko Räsänen

We compare the alignment performance using our proposed evaluation metrics to the semantic retrieval task commonly used to evaluate VGS models.

Cross-Modal Retrieval, Object Localization, +2

Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections

no code implementations • 25 Nov 2020 • Huang Xie, Okko Räsänen, Tuomas Virtanen

In this paper, we study zero-shot learning in audio classification through factored linear and nonlinear acoustic-semantic projections between audio instances and sound classes.

Audio Classification, General Classification, +2

Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics

2 code implementations • 3 Aug 2020 • Okko Räsänen, María Andrea Cruz Blandón

One potential approach to this problem is to use dynamic time warping (DTW) to find well-aligning patterns from the speech data.

Dynamic Time Warping

Analysis of Predictive Coding Models for Phonemic Representation Learning in Small Datasets

no code implementations • 8 Jul 2020 • María Andrea Cruz Blandón, Okko Räsänen

The present study investigates the behaviour of two predictive coding models, Autoregressive Predictive Coding and Contrastive Predictive Coding, in a phoneme discrimination task (ABX task) for two languages with different dataset sizes.

Language Acquisition, Representation Learning

Automatic Posture and Movement Tracking of Infants with Wearable Movement Sensors

no code implementations • 21 Sep 2019 • Manu Airaksinen, Okko Räsänen, Elina Ilén, Taru Häyrinen, Anna Kivi, Viviana Marchi, Anastasia Gallen, Sonja Blom, Anni Varhe, Nico Kaartinen, Leena Haataja, Sampsa Vanhatalo

These data were manually annotated for infant posture and movement based on video recordings of the sessions, and using a novel annotation scheme specifically designed to assess the overall movement pattern of infants in the given age group.

SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech

1 code implementation • 24 Jun 2019 • Shreyas Seshadri, Okko Räsänen

Automatic syllable count estimation (SCE) is used in a variety of applications ranging from speaking rate estimation to detecting social activity from wearable microphones or developmental research concerned with quantifying speech heard by language-learning children in different environments.

A computational model of early language acquisition from audiovisual experiences of young infants

no code implementations • 24 Jun 2019 • Okko Räsänen, Khazar Khorrami

Earlier research has suggested that human infants might use statistical dependencies between speech and non-linguistic multimodal input to bootstrap their language learning before they know how to segment words from running speech.

Language Acquisition
