1 code implementation • 26 Mar 2024 • Michael Neri, Archontis Politis, Daniel Krause, Marco Carli, Tuomas Virtanen
Distance estimation from audio plays a crucial role in various applications, such as acoustic scene analysis, sound source localization, and room modeling.
1 code implementation • 13 Mar 2024 • John Martinsson, Olof Mogren, Maria Sandsten, Tuomas Virtanen
In this work we propose an audio recording segmentation method based on an adaptive change point detection (A-CPD) for machine guided weak label annotation of audio recording segments.
no code implementations • 11 Jan 2024 • Mikko Heikkinen, Archontis Politis, Tuomas Virtanen
Ambisonics encoding of microphone array signals can enable various spatial audio applications, such as virtual reality or telepresence, but it is typically designed for uniformly spaced spherical microphone arrays.
no code implementations • 17 Dec 2023 • Yuzhu Wang, Archontis Politis, Tuomas Virtanen
The clean speech clips from WSJ0 are employed for simulating speech signals of moving speakers in a reverberant environment.
no code implementations • 9 Aug 2023 • Diep Luong, Minh Tran, Shayan Gharib, Konstantinos Drossos, Tuomas Virtanen
Privacy preservation has long been a concern in smart acoustic monitoring systems, where speech can be passively recorded along with a target signal in the system's operating environment.
1 code implementation • 16 Jun 2023 • Huang Xie, Khazar Khorrami, Okko Räsänen, Tuomas Virtanen
Conversely, the results suggest that using only binary relevances defined by captioning-based audio-caption pairs is sufficient for contrastive learning.
1 code implementation • NeurIPS 2023 • Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji
While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of a walker.
no code implementations • 14 Jun 2023 • David Diaz-Guerra, Archontis Politis, Antonio Miguel, Jose R. Beltran, Tuomas Virtanen
Conventional recurrent neural networks (RNNs), such as long short-term memories (LSTMs) or gated recurrent units (GRUs), take a vector as their input and use another vector to store their state.
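A minimal numpy sketch of the conventional GRU update described above, where both the input and the stored state are plain vectors (weight shapes and values here are illustrative, not from the paper):

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update: vector input x, vector state h, vector output."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h)                # update gate
    r = sigmoid(Wr @ x + Ur @ h)                # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))    # candidate state
    return (1 - z) * h + z * h_tilde            # new state, again a vector

rng = np.random.default_rng(0)
d_in, d_h = 8, 16                               # illustrative dimensions
x, h = rng.standard_normal(d_in), np.zeros(d_h)
W = lambda m, n: 0.1 * rng.standard_normal((m, n))
h_new = gru_step(x, h, W(d_h, d_in), W(d_h, d_h), W(d_h, d_in), W(d_h, d_h),
                 W(d_h, d_in), W(d_h, d_h))
```

The point of the sketch is that the state is a single fixed-size vector, which is the limitation such papers typically contrast with richer state representations.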
no code implementations • 5 Jun 2023 • Khazar Khorrami, María Andrea Cruz Blandón, Tuomas Virtanen, Okko Räsänen
As a result, we find that sequential training with wav2vec 2.0 first and VGS next provides higher performance on audio-visual retrieval compared to simultaneous optimization of both learning mechanisms.
no code implementations • 31 May 2023 • Parthasaarathy Sudarsanam, Tuomas Virtanen
On the yes/no binary classification task, our proposed model achieves an accuracy of 68.3% compared to 62.7% in the reference model.
1 code implementation • 29 Apr 2023 • Shayan Gharib, Minh Tran, Diep Luong, Konstantinos Drossos, Tuomas Virtanen
In this study, we propose a novel adversarial training method for learning representations of audio recordings that effectively prevents the detection of speech activity from the latent features of the recordings.
no code implementations • 14 Mar 2023 • Wang Dai, Archontis Politis, Tuomas Virtanen
Specifically, each mask is used to multiply the corresponding channel's 2D representation, and the masked outputs of all channels are then summed.
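The mask-multiply-and-sum operation described above can be sketched in a few lines of numpy (array shapes and names are illustrative assumptions, not from the paper):

```python
import numpy as np

# Hypothetical shapes: C channels, T time frames, F frequency bins.
C, T, F = 4, 100, 64
rng = np.random.default_rng(0)

specs = rng.random((C, T, F))   # per-channel 2D time-frequency representations
masks = rng.random((C, T, F))   # one mask per channel, same shape

# Each mask multiplies its channel's 2D representation element-wise,
# and the masked outputs of all channels are summed into one 2D map.
combined = (masks * specs).sum(axis=0)   # shape (T, F)
```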
no code implementations • 8 Nov 2022 • Huang Xie, Okko Räsänen, Tuomas Virtanen
With a constant training setting on the retrieval system from [1], we study eight sampling strategies, including hard and semi-hard negative sampling.
no code implementations • 26 Oct 2022 • David Diaz-Guerra, Archontis Politis, Tuomas Virtanen
Recent data- and learning-based sound source localization (SSL) methods have shown strong performance in challenging acoustic scenarios.
no code implementations • 20 Sep 2022 • Huang Xie, Samuel Lipping, Tuomas Virtanen
Language-based audio retrieval is a task where natural language textual captions are used as queries to retrieve audio signals from a dataset.
1 code implementation • 4 Aug 2022 • Yanxiong Li, Wenchang Cao, Konstantinos Drossos, Tuomas Virtanen
Automatic estimation of domestic activities from audio can be used to solve many problems, such as reducing the labor cost of caring for elderly people.
1 code implementation • 13 Jun 2022 • Huang Xie, Samuel Lipping, Tuomas Virtanen
Language-based audio retrieval is a task where natural language textual captions are used as queries to retrieve audio signals from a dataset.
no code implementations • 10 Jun 2022 • Duygu Dogan, Huang Xie, Toni Heittola, Tuomas Virtanen
The results show that the classification performance is highly sensitive to the semantic relation between test and training classes, and that textual and image embeddings can reach the performance of the semantic acoustic embeddings when the seen and unseen classes are semantically similar.
no code implementations • 8 Jun 2022 • Irene Martín-Morató, Francesco Paissan, Alberto Ancilotto, Toni Heittola, Annamaria Mesaros, Elisabetta Farella, Alessio Brutti, Tuomas Virtanen
The provided baseline system is a convolutional neural network that employs post-training quantization of parameters, resulting in 46.5 K parameters and 29.23 million multiply-and-accumulate operations (MMACs).
2 code implementations • 4 Jun 2022 • Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen
Additionally, the report presents the baseline system that accompanies the dataset in the challenge, with emphasis on the differences from the baselines of the previous iterations; namely, the introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional improved input features for the microphone array format.
Ranked #1 on Sound Event Localization and Detection on STARSS22
no code implementations • 2 Jun 2022 • Shanshan Wang, Archontis Politis, Annamaria Mesaros, Tuomas Virtanen
In addition to the correspondence, AVSA also learns from the spatial location of acoustic and visual content.
no code implementations • 20 Apr 2022 • Samuel Lipping, Parthasaarathy Sudarsanam, Konstantinos Drossos, Tuomas Virtanen
Audio question answering (AQA) is a multimodal translation task where a system analyzes an audio signal and a natural language question to generate the desired natural language answer.
2 code implementations • 29 Oct 2021 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
Data-based and learning-based sound source localization (SSL) has shown promising results in challenging conditions, and is commonly formulated as either a classification or a regression problem.
1 code implementation • 6 Oct 2021 • Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen
We investigate unsupervised learning of correspondences between sound events and textual phrases through aligning audio clips with textual captions describing the content of a whole audio clip.
1 code implementation • 12 Jul 2021 • Annamaria Mesaros, Toni Heittola, Tuomas Virtanen, Mark D. Plumbley
The goal of automatic sound event detection (SED) methods is to recognize what is happening in an audio signal and when it is happening.
no code implementations • 28 Jun 2021 • Pasi Pertilä, Emre Cakir, Aapo Hakala, Eemi Fagerlund, Tuomas Virtanen, Archontis Politis, Antti Eronen
Joint sound event localization and detection (SELD) is an integral part of developing context awareness into communication interfaces of mobile robots, smartphones, and home assistants.
no code implementations • 22 Jun 2021 • Shanshan Wang, Gaurav Naithani, Archontis Politis, Tuomas Virtanen
Time-frequency masking or spectrum prediction computed via short symmetric windows is commonly used in low-latency deep neural network (DNN) based source separation.
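A minimal sketch of the short-window time-frequency masking setup mentioned above (window length and the all-ones mask are illustrative assumptions; a real system would predict the mask with a DNN):

```python
import numpy as np

def masked_frame(frame, mask, window):
    """Apply a time-frequency mask to one short, symmetrically windowed frame."""
    spec = np.fft.rfft(frame * window)              # short-window spectrum
    return np.fft.irfft(spec * mask, n=len(frame))  # masked frame back in time

n = 256                                  # short window -> low algorithmic latency
window = np.hanning(n)                   # symmetric analysis window
frame = np.random.default_rng(0).standard_normal(n)

# An all-ones mask passes the frame through unchanged (up to windowing).
out = masked_frame(frame, np.ones(n // 2 + 1), window)
```

The latency of such a system is bounded by the analysis window length, which is why short symmetric windows are preferred in low-latency separation.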
1 code implementation • 13 Jun 2021 • Archontis Politis, Sharath Adavanne, Daniel Krause, Antoine Deleforge, Prerak Srivastava, Tuomas Virtanen
This report presents the dataset and baseline of Task 3 of the DCASE2021 Challenge on Sound Event Localization and Detection (SELD).
1 code implementation • 28 May 2021 • Irene Martín-Morató, Toni Heittola, Annamaria Mesaros, Tuomas Virtanen
The most used techniques among the submissions were residual networks and weight quantization, with the top systems reaching over 70% accuracy, and log loss under 0.8.
no code implementations • 28 May 2021 • Shanshan Wang, Toni Heittola, Annamaria Mesaros, Tuomas Virtanen
More importantly, multi-modal methods using both audio and video are employed by all the top 5 teams.
no code implementations • 25 Nov 2020 • Huang Xie, Okko Räsänen, Tuomas Virtanen
In this paper, we study zero-shot learning in audio classification through factored linear and nonlinear acoustic-semantic projections between audio instances and sound classes.
no code implementations • 24 Nov 2020 • Huang Xie, Tuomas Virtanen
The experimental results show that classification performance is significantly improved by involving sound classes that are semantically close to the test classes in training.
1 code implementation • 27 Oct 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra
In this work we propose a method for learning audio representations using an audio autoencoder (AAE), a general word embeddings model (WEM), and a multi-head self-attention (MHA) mechanism.
no code implementations • 22 Oct 2020 • Slobodan Djukanović, Yash Patel, Jiři Matas, Tuomas Virtanen
This distance is predicted from audio using a two-stage (coarse-fine) regression, with both stages realised via neural networks (NNs).
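The two-stage (coarse-fine) regression idea can be illustrated on toy data: a first, simple model predicts a rough distance, and a second, more flexible model refines it by fitting the residual. Everything below (targets, model choices, polynomial degrees) is an illustrative assumption, not the paper's NN-based implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
d_true = 10.0 * x + 0.5 * np.sin(20 * x)   # toy "distance" target

# Stage 1 (coarse): fit a straight line to the target.
a, b = np.polyfit(x, d_true, 1)
d_coarse = a * x + b

# Stage 2 (fine): fit the residual left by the coarse stage.
resid = d_true - d_coarse
c = np.polyfit(x, resid, 9)                # more flexible refinement model
d_fine = d_coarse + np.polyval(c, x)

coarse_err = np.mean((d_true - d_coarse) ** 2)
fine_err = np.mean((d_true - d_fine) ** 2)
```

The refinement stage can only see what the coarse stage missed, so the combined prediction error is no worse than the coarse error alone.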
no code implementations • 22 Oct 2020 • Slobodan Djukanović, Jiři Matas, Tuomas Virtanen
The method is trained and tested on a traffic-monitoring dataset comprising 422 short, 20-second one-channel sound files with a total of 1421 vehicles passing by the microphone.
1 code implementation • 21 Oct 2020 • An Tran, Konstantinos Drossos, Tuomas Virtanen
Automated audio captioning (AAC) is a novel task where a method takes an audio sample as input and outputs a textual description (i.e., a caption) of its contents.
4 code implementations • 6 Sep 2020 • Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, Tuomas Virtanen
A large-scale realistic dataset of spatialized sound events was generated for the challenge, to be used for training of learning-based approaches, and for evaluation of the submissions in an unlabeled subset.
no code implementations • 10 Jul 2020 • Konstantinos Drossos, Stylianos I. Mimilakis, Tuomas Virtanen
Sound event detection (SED) is the task of identifying sound events along with their onset and offset times.
1 code implementation • 9 Jul 2020 • Emre Çakır, Konstantinos Drossos, Tuomas Virtanen
Audio captioning is a multi-modal task, focusing on using natural language for describing the contents of general audio.
1 code implementation • 6 Jul 2020 • Khoa Nguyen, Konstantinos Drossos, Tuomas Virtanen
In this work we present an approach that focuses on explicitly taking advantage of this difference of lengths between sequences, by applying a temporal sub-sampling to the audio input sequence.
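One simple way to realize the temporal sub-sampling of the audio input sequence described above is to average non-overlapping groups of frames (the averaging scheme and shapes here are illustrative; the paper applies sub-sampling inside the network):

```python
import numpy as np

def temporal_subsample(seq, factor):
    """Shorten a (time, features) sequence by averaging groups of frames."""
    t, f = seq.shape
    t_keep = (t // factor) * factor            # drop the ragged tail
    return seq[:t_keep].reshape(-1, factor, f).mean(axis=1)

audio_feats = np.random.default_rng(0).standard_normal((1000, 64))
short = temporal_subsample(audio_feats, 4)     # 1000 frames -> 250 frames
```

This narrows the length gap between the long audio feature sequence and the much shorter caption token sequence.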
no code implementations • 6 Jul 2020 • Pyry Pyykkönen, Stylianos I. Mimilakis, Konstantinos Drossos, Tuomas Virtanen
We focus on singing voice separation, employing an RNN architecture, and we replace the RNNs with DWS convolutions (DWS-CNNs).
2 code implementations • 15 Jun 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra
Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approach to hand-crafted features.
2 code implementations • 2 Jun 2020 • Archontis Politis, Sharath Adavanne, Tuomas Virtanen
This report presents the dataset and the evaluation setup of the Sound Event Localization & Detection (SELD) task for the DCASE 2020 Challenge.
no code implementations • 29 May 2020 • Toni Heittola, Annamaria Mesaros, Tuomas Virtanen
This paper presents the details of Task 1: Acoustic Scene Classification in the DCASE 2020 Challenge.
no code implementations • 12 Feb 2020 • Shuyang Zhao, Toni Heittola, Tuomas Virtanen
Training with recordings as context outperforms training with only annotated segments.
1 code implementation • 2 Feb 2020 • Konstantinos Drossos, Stylianos I. Mimilakis, Shayan Gharib, Yanxiong Li, Tuomas Virtanen
The number of channels of the CNNs and the size of the weight matrices of the RNNs have a direct effect on the total number of parameters of the SED method, which amounts to a couple of millions.
no code implementations • 1 Nov 2019 • Niccoló Nicodemo, Gaurav Naithani, Konstantinos Drossos, Tuomas Virtanen, Roberto Saletti
The application of the low-bit quantization allows a 50% reduction of the DNN memory footprint while the STOI performance drops only by 2.7%.
7 code implementations • 21 Oct 2019 • Konstantinos Drossos, Samuel Lipping, Tuomas Virtanen
Audio captioning is the novel task of general audio content description using free text.
1 code implementation • 22 Jul 2019 • Samuel Lipping, Konstantinos Drossos, Tuomas Virtanen
In this paper we present a three-step framework for crowdsourcing an audio captioning dataset, based on concepts and practices followed for the creation of widely used image captioning and machine translation datasets.
1 code implementation • Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 2019 • Konstantinos Drossos, Shayan Gharib, Paul Magron, Tuomas Virtanen
On the contrary, with our method there is a decrease of 4% in F1 score and an increase of 7% in error rate (ER) for the TUT-SED Synthetic 2016 dataset.
3 code implementations • 21 May 2019 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge.
no code implementations • 6 May 2019 • Huang Xie, Tuomas Virtanen
We treat textual labels as semantic side information of audio classes, and use Word2Vec to generate class label embeddings.
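The zero-shot scheme above can be sketched as nearest-neighbor matching in the label embedding space. The random vectors below are stand-ins for Word2Vec embeddings, and `classify` assumes the audio has already been projected into that space; both are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for Word2Vec vectors of class labels (50-dim, values random).
label_emb = {"dog_bark": rng.standard_normal(50),
             "siren": rng.standard_normal(50),
             "rain": rng.standard_normal(50)}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(projected_audio, label_emb):
    """Pick the class whose label embedding best matches the audio projection."""
    return max(label_emb, key=lambda c: cosine(projected_audio, label_emb[c]))

# A clip whose projection coincides with the 'siren' embedding maps to 'siren'.
pred = classify(label_emb["siren"], label_emb)
```

Because classes are represented by their label embeddings rather than learned output units, unseen classes can be added at test time by embedding their names.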
1 code implementation • 30 Apr 2019 • Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-Yiin Chang, Tara Sainath
Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing.
1 code implementation • 29 Apr 2019 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN).
1 code implementation • 24 Apr 2019 • Konstantinos Drossos, Paul Magron, Tuomas Virtanen
A challenging problem in the field of deep learning-based machine listening is the degradation of performance when using data from unseen conditions.
1 code implementation • 17 Aug 2018 • Shayan Gharib, Konstantinos Drossos, Emre Çakır, Dmitriy Serdyuk, Tuomas Virtanen
A general problem in the acoustic scene classification task is the mismatched conditions between training and testing data, which significantly reduce the classification accuracy of the developed methods.
no code implementations • 2 Aug 2018 • Shayan Gharib, Honain Derrar, Daisuke Niizumi, Tuukka Senttula, Janne Tommola, Toni Heittola, Tuomas Virtanen, Heikki Huttunen
In this paper we study the problem of acoustic scene classification, i.e., categorization of audio sequences into mutually exclusive classes based on their spectral content.
2 code implementations • 25 Jul 2018 • Annamaria Mesaros, Toni Heittola, Tuomas Virtanen
This paper introduces the acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task, and evaluates the performance of a baseline system in the task.
8 code implementations • 30 Jun 2018 • Sharath Adavanne, Archontis Politis, Joonas Nikunen, Tuomas Virtanen
In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space.
no code implementations • 9 May 2018 • Emre Çakır, Tuomas Virtanen
Sound event detection systems typically consist of two stages: extracting hand-crafted features from the raw audio waveform, and learning a mapping between these features and the target sound events using a classifier.
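The conventional two-stage pipeline described above can be sketched with a deliberately crude feature and classifier (per-frame log energy and a threshold, both illustrative assumptions; real systems use richer features and learned classifiers):

```python
import numpy as np

def extract_features(waveform, frame_len=256):
    """Hand-crafted stage: per-frame log energy as a stand-in feature."""
    t = (len(waveform) // frame_len) * frame_len
    frames = waveform[:t].reshape(-1, frame_len)
    return np.log(np.sum(frames ** 2, axis=1) + 1e-9)

def classify_frames(features, threshold):
    """Classifier stage: map features to frame-wise event activity."""
    return features > threshold

wav = np.random.default_rng(0).standard_normal(4096)
wav[1024:2048] *= 10.0                         # a loud "event" in the middle
feats = extract_features(wav)                  # 16 frames of 256 samples
activity = classify_frames(feats, threshold=np.median(feats) + 1.0)
```

End-to-end approaches, in contrast, learn the feature extraction stage jointly with the classifier directly from the raw waveform.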
2 code implementations • 1 Feb 2018 • Konstantinos Drossos, Stylianos Ioannis Mimilakis, Dmitriy Serdyuk, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio
Current state of the art (SOTA) results in monaural singing voice separation are obtained with deep learning based methods.
no code implementations • 29 Jan 2018 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
Each of these datasets has four-channel first-order Ambisonic, binaural, and single-channel versions, on which the performance of SED using the proposed method is compared to study the potential of SED with multichannel audio.
no code implementations • 4 Nov 2017 • Stylianos Ioannis Mimilakis, Konstantinos Drossos, João F. Santos, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio
Singing voice separation based on deep learning relies on the usage of time-frequency masking.
no code implementations • 27 Oct 2017 • Sharath Adavanne, Archontis Politis, Tuomas Virtanen
This paper proposes a deep neural network for estimating the directions of arrival (DOA) of multiple sound sources.
no code implementations • 30 Jun 2017 • Konstantinos Drossos, Sharath Adavanne, Tuomas Virtanen
The encoder is a multi-layered, bi-directional gated recurrent unit (GRU), and the decoder is a multi-layered GRU with a classification layer connected to its last GRU.
no code implementations • 7 Jun 2017 • Sharath Adavanne, Konstantinos Drossos, Emre Çakır, Tuomas Virtanen
This paper studies the detection of bird calls in audio segments using stacked convolutional and recurrent neural networks.
no code implementations • 7 Jun 2017 • Sharath Adavanne, Giambattista Parascandolo, Pasi Pertilä, Toni Heittola, Tuomas Virtanen
In this paper, we propose the use of spatial and harmonic features in combination with long short term memory (LSTM) recurrent neural network (RNN) for automatic sound event detection (SED) task.
no code implementations • 7 Jun 2017 • Miroslav Malik, Sharath Adavanne, Konstantinos Drossos, Tuomas Virtanen, Dasa Ticha, Roman Jarina
This paper studies the emotion recognition from musical tracks in the 2-dimensional valence-arousal (V-A) emotional space.
no code implementations • 7 Jun 2017 • Sharath Adavanne, Pasi Pertilä, Tuomas Virtanen
This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection.
no code implementations • 7 Mar 2017 • Emre Çakır, Sharath Adavanne, Giambattista Parascandolo, Konstantinos Drossos, Tuomas Virtanen
Bird sounds possess distinctive spectral structure which may exhibit small shifts in spectrum depending on the bird species and environmental conditions.
1 code implementation • 21 Feb 2017 • Emre Çakır, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, Tuomas Virtanen
Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure.
2 code implementations • 4 Apr 2016 • Giambattista Parascandolo, Heikki Huttunen, Tuomas Virtanen
In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs).