2 code implementations • 26 Apr 2024 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
To reduce the need for skilled clinicians in heart sound interpretation, recent studies on automating cardiac auscultation have explored deep learning approaches.
no code implementations • 12 Apr 2024 • Masahiro Yasuda, Noboru Harada, Yasunori Ohishi, Shoichiro Saito, Akira Nakayama, Nobutaka Ono
This is because the information obtained from a single sensor is often missing or fragmented in such an environment; observations from multiple locations and modalities should be integrated to analyze events comprehensively.
2 code implementations • 9 Apr 2024 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
This study proposes Masked Modeling Duo (M2D), an improved masked-prediction SSL method that learns by predicting the representations of masked input signals, which serve as training signals.
Ranked #1 on Linear evaluation on ESC-50
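The masked-prediction idea can be sketched with a toy online/target encoder pair. Everything below is a hypothetical simplification (the class name `ToyM2D`, linear maps as "encoders", mean pooling); the actual M2D uses ViT encoders, a predictor network, per-patch targets, and stop-gradient on the target branch.

```python
import numpy as np

class ToyM2D:
    """Minimal sketch of masked prediction with an online/target encoder pair.

    Hypothetical simplification: encoders are single linear maps and the
    loss is an MSE between mean-pooled representations.
    """

    def __init__(self, dim=4, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w_online = self.rng.normal(size=(dim, dim))
        self.w_target = self.w_online.copy()  # EMA copy of the online weights

    def split(self, patches, mask_ratio=0.6):
        """Randomly split patches into visible and masked sets."""
        idx = self.rng.permutation(len(patches))
        k = int(len(patches) * mask_ratio)
        return patches[idx[k:]], patches[idx[:k]]  # visible, masked

    def step(self, patches, momentum=0.99):
        visible, masked = self.split(patches)
        pred = visible.mean(axis=0) @ self.w_online   # online branch sees visible only
        target = masked.mean(axis=0) @ self.w_target  # target branch sees masked only
        loss = float(np.mean((pred - target) ** 2))
        # EMA update keeps the target encoder a slow-moving average of the online one.
        self.w_target = momentum * self.w_target + (1 - momentum) * self.w_online
        return loss
```

The design point retained here is that the target branch only ever encodes the masked patches, while the online branch sees only the visible ones, so the two signals never overlap.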
no code implementations • 16 Mar 2024 • Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Keisuke Imoto
Experimental results show that method (i) improves the audio-text retrieval performance by selecting the nearest image that aligns with the audio information and transferring the learned knowledge.
no code implementations • 4 Mar 2024 • Masahiro Yasuda, Shoichiro Saito, Akira Nakayama, Noboru Harada
A system trained only with a dataset using microphone arrays in a fixed position would be unable to adapt to the fast relative motion of sound events associated with self-motion, resulting in the degradation of SELD performance.
no code implementations • 13 Feb 2024 • Shiqi Zhang, Zheng Qiu, Daiki Takeuchi, Noboru Harada, Shoji Makino
With the rapid development of neural networks in recent years, various network architectures have become remarkably effective at enhancing the magnitude spectrum of noisy speech in the single-channel speech enhancement domain.
1 code implementation • 23 Aug 2023 • Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino
We proposed Audio Difference Captioning (ADC) as a new extension task of audio captioning for describing the semantic differences between input pairs of similar but slightly different audio clips.
1 code implementation • 23 May 2023 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
Self-supervised learning of general-purpose audio representations has demonstrated high performance in a variety of tasks.
1 code implementation • 13 May 2023 • Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Yohei Kawaguchi
In 2023 Task 2, we focus on solving the first-shot problem, which is the challenge of training a model on a completely novel machine type.
1 code implementation • 27 Apr 2023 • Kenji Ishikawa, Daiki Takeuchi, Noboru Harada, Takehiro Moriya
We compared the method with conventional ones, such as image filters, a spatiotemporal filter, and other DNN architectures, on numerical and experimental data.
1 code implementation • 1 Mar 2023 • Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi, Masahiro Yasuda
This paper provides a baseline system for first-shot-compliant unsupervised anomalous sound detection (ASD) for machine condition monitoring.
1 code implementation • 26 Oct 2022 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only masked patches.
Ranked #1 on Speaker Identification on VoxCeleb1 (using extra training data)
no code implementations • 25 Jul 2022 • Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino
We use it to bridge modality-dependent information, i.e., the speech segments in the mixture, and the specified, modality-independent concept.
1 code implementation • 20 Jul 2022 • Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino
While the range of conventional content-based audio retrieval is limited to audio that is similar to the query audio, the proposed method can adjust the retrieval range by adding an embedding of the auxiliary text query-modifier to the embedding of the query sample audio in a shared latent space.
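The retrieval-range adjustment described above can be sketched as simple vector arithmetic in a shared embedding space: add the text-modifier embedding to the audio-query embedding, then rank database items by cosine similarity. This is a hypothetical toy (plain vectors stand in for the learned audio/text encoder outputs; the function name `retrieve` is an assumption).

```python
import numpy as np

def retrieve(query_audio_emb, modifier_text_emb, database_embs, top_k=2):
    """Shift the audio query by the text-modifier embedding, then rank the
    database by cosine similarity in the shared latent space."""
    q = query_audio_emb + modifier_text_emb  # adjusted query
    q = q / np.linalg.norm(q)
    db = database_embs / np.linalg.norm(database_embs, axis=1, keepdims=True)
    scores = db @ q                          # cosine similarities
    return np.argsort(-scores)[:top_k]       # indices of the best matches
```

With no modifier (a zero vector), this degenerates to ordinary content-based retrieval around the query audio itself.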
2 code implementations • 13 Jun 2022 • Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Takashi Endo, Masaaki Yamamoto, Yohei Kawaguchi
We present the task description and discussion on the results of the DCASE 2022 Challenge Task 2: "Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques".
1 code implementation • 17 May 2022 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
This approach improves the utility of frequency and channel information in downstream processes, and combines the effectiveness of middle and late layer features for different tasks.
1 code implementation • 26 Apr 2022 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
In this paper, we seek to learn audio representations from the input itself as supervision, using a pretext task of auto-encoding masked spectrogram patches, Masked Spectrogram Modeling (MSM, a variant of Masked Image Modeling applied to audio spectrograms).
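The MSM pretext task can be sketched as: split the spectrogram into patches, hide most of them, and score a reconstruction against the hidden patches only. This is a hypothetical sketch; the stand-in "decoder" below just predicts the mean of the visible patches, where the real MSM trains a ViT encoder-decoder.

```python
import numpy as np

def patchify(spec, ph=4, pw=4):
    """Split a (freq, time) spectrogram into non-overlapping ph x pw patches."""
    f, t = spec.shape
    spec = spec[: f - f % ph, : t - t % pw]  # crop to a multiple of patch size
    patches = spec.reshape(f // ph, ph, t // pw, pw).transpose(0, 2, 1, 3)
    return patches.reshape(-1, ph * pw)

def msm_loss(spec, mask_ratio=0.75, seed=0):
    """MSE between masked patches and a stand-in reconstruction."""
    rng = np.random.default_rng(seed)
    patches = patchify(spec)
    idx = rng.permutation(len(patches))
    k = int(len(patches) * mask_ratio)
    masked, visible = patches[idx[:k]], patches[idx[k:]]
    recon = np.broadcast_to(visible.mean(axis=0), masked.shape)
    return float(np.mean((recon - masked) ** 2))  # loss only on masked patches
```

As in Masked Image Modeling, the loss is computed only over the masked patches, so the model cannot trivially copy its input.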
1 code implementation • 15 Apr 2022 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
In this study, we hypothesize that representations effective for general audio tasks should provide multiple aspects of robust features of the input sound.
1 code implementation • 18 Feb 2022 • Masahiro Yasuda, Yasunori Ohishi, Shoichiro Saito, Noboru Harada
We tackle a challenging task: multi-view and multi-modal event detection that detects events in a wide-range real environment by utilizing data from distributed cameras and microphones and their weak labels.
4 code implementations • 8 Jun 2021 • Yohei Kawaguchi, Keisuke Imoto, Yuma Koizumi, Noboru Harada, Daisuke Niizumi, Kota Dohi, Ryo Tanabe, Harsh Purohit, Takashi Endo
In 2020, we organized an unsupervised anomalous sound detection (ASD) task, identifying whether a given sound was normal or anomalous without anomalous training data.
7 code implementations • 4 Jun 2021 • Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, Shoichiro Saito
This paper proposes a new large-scale dataset called "ToyADMOS2" for anomaly detection in machine operating sounds (ADMOS).
2 code implementations • 11 Mar 2021 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
Inspired by the recent progress in self-supervised learning for computer vision that generates supervision using data augmentations, we explore a new general-purpose audio representation learning approach.
no code implementations • 24 Sep 2020 • Daiki Takeuchi, Yuma Koizumi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely, data augmentation, multi-task learning, and post-processing, for audio captioning.
no code implementations • 1 Jul 2020 • Yuma Koizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
This technical report describes the system participating in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, Task 6: automated audio captioning.
Ranked #4 on Audio captioning on Clotho
3 code implementations • 10 Jun 2020 • Yuma Koizumi, Yohei Kawaguchi, Keisuke Imoto, Toshiki Nakamura, Yuki Nikaido, Ryo Tanabe, Harsh Purohit, Kaori Suefusa, Takashi Endo, Masahiro Yasuda, Noboru Harada
The main challenge of this task is to detect unknown anomalous sounds under the condition that only normal sound samples have been provided as training data.
no code implementations • 14 Feb 2020 • Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada
In the proposed method, DNNs estimate phase derivatives instead of phase itself, which allows us to avoid the sensitivity problem.
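The phase-derivative idea can be illustrated in one dimension: if a model estimates the time derivative of phase (the instantaneous frequency), the phase itself is recovered by integration. This is a hypothetical 1-D sketch only; the paper estimates both time and frequency derivatives with DNNs and solves for the full 2-D phase.

```python
import numpy as np

def phase_from_time_derivative(dphase_dt, phase0=0.0):
    """Recover a phase trajectory by integrating its estimated time derivative
    with a cumulative sum, then wrapping to the principal values (-pi, pi]."""
    phase = phase0 + np.cumsum(dphase_dt)
    return np.angle(np.exp(1j * phase))  # wrap to (-pi, pi]
```

Estimating the derivative avoids the wrapping discontinuities that make direct phase regression ill-conditioned.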
1 code implementation • 25 Nov 2019 • Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada
Therefore, some end-to-end methods used a DNN to learn the linear T-F transform which is much easier to understand.
no code implementations • 10 Oct 2019 • Luca Mazzon, Yuma Koizumi, Masahiro Yasuda, Noboru Harada
The same transformation is also applied to the labels, in order to maintain consistency between the input data and target labels.
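Applying one rotation consistently to both data and labels can be sketched for first-order Ambisonics (FOA) input and direction-of-arrival (DOA) labels. This is a hypothetical sketch of the rotation-based augmentation idea: the same azimuth rotation acts on the directional FOA channels and on the DOA label vector, while the omnidirectional W channel is invariant.

```python
import numpy as np

def rotate_foa_and_label(foa, doa_xyz, azimuth):
    """Apply the same azimuth rotation to FOA channels (rows: W, X, Y, Z)
    and to the DOA label vector, keeping input and target consistent."""
    c, s = np.cos(azimuth), np.sin(azimuth)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    foa_rot = foa.copy()
    foa_rot[1:] = rot @ foa[1:]  # X, Y, Z channels rotate; W is unchanged
    return foa_rot, rot @ doa_xyz
```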
2 code implementations • 9 Aug 2019 • Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Noboru Harada, Keisuke Imoto
To build a large-scale dataset for ADMOS, we collected anomalous operating sounds of miniature machines (toys) by deliberately damaging them.
no code implementations • 19 Jul 2019 • Yuma Koizumi, Shoichiro Saito, Masataka Yamaguchi, Shin Murata, Noboru Harada
The AE is trained to minimize the sample mean of the anomaly score of normal sounds in a mini-batch.
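That training objective can be written down directly: the anomaly score of one sample is its reconstruction error under the autoencoder, and the mini-batch loss is the sample mean of those scores over normal sounds. The `encode`/`decode` callables below are hypothetical stand-ins for the trained AE networks.

```python
import numpy as np

def anomaly_score(x, encode, decode):
    """Anomaly score of one sample: squared reconstruction error of the AE."""
    return float(np.sum((x - decode(encode(x))) ** 2))

def batch_loss(batch, encode, decode):
    """Training objective: sample mean of the anomaly score over a
    mini-batch of normal sounds, which the AE is trained to minimize."""
    return float(np.mean([anomaly_score(x, encode, decode) for x in batch]))
```

At test time, a sample whose score greatly exceeds the scores seen on normal data is flagged as anomalous.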
no code implementations • 10 Mar 2019 • Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada
This paper presents a novel method that reconstructs the phase from only a given amplitude spectrogram by combining a signal-processing-based approach and a deep neural network (DNN).
no code implementations • 14 Dec 2018 • Masataka Yamaguchi, Yuma Koizumi, Noboru Harada
To address this difficulty, we propose AdaFlow, a new DNN-based density estimator that can be easily adapted to the change of the distribution.
no code implementations • 5 Nov 2018 • Yuma Koizumi, Noboru Harada, Yoichi Haneda
To overcome this problem, we incorporate AWS into the speech enhancement procedure, and the windowing function of each time-frame is manipulated using a DNN depending on the input signal.
1 code implementation • 22 Oct 2018 • Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Yuta Kawachi, Noboru Harada
To calculate the TPR in the objective function, we consider that the set of anomalous sounds is the complementary set of normal sounds and simulate anomalous sounds by using a rejection sampling algorithm.
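The complement-set idea can be sketched with generic rejection sampling: draw candidates from a broad proposal distribution and keep only those that look unlikely under the normal-sound model, treating everything the normal model does not explain as simulated anomalies. The interface below (`simulate_anomalies`, the log-density callables) is a hypothetical illustration, not the paper's implementation, which operates on sound features.

```python
import numpy as np

def simulate_anomalies(normal_logpdf, proposal_sampler, threshold, n,
                       seed=0, max_tries=10000):
    """Rejection sampling sketch: keep candidates from a broad proposal that
    are unlikely under the normal model (log-density below `threshold`)."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(max_tries):
        x = proposal_sampler(rng)
        if normal_logpdf(x) < threshold:  # reject normal-looking candidates
            out.append(x)
            if len(out) == n:
                break
    return np.array(out)
```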