no code implementations • 24 Feb 2024 • Xia Liang, Xingjian Du, Jiaju Lin, Pei Zou, Yuan Wan, Bilei Zhu
Large Language Models (LLMs) have shown encouraging progress in multimodal understanding and generation tasks.
no code implementations • 16 Oct 2023 • Xingjian Du, Zhesong Yu, Jiaju Lin, Bilei Zhu, Qiuqiang Kong
However, previous music tagging research primarily focuses on closed-set music tagging tasks, which cannot be generalized to new tags.
no code implementations • 21 Mar 2023 • Xingjian Du, Zijie Wang, Xia Liang, Huidong Liang, Bilei Zhu, Zejun Ma
Deep learning-based methods have become a paradigm for cover song identification (CSI) in recent years, where the ByteCover systems have achieved state-of-the-art results on all the mainstream CSI datasets.
1 code implementation • 7 Nov 2022 • Huidong Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Ke Chen, Junbin Gao
Existing graph contrastive learning methods rely on augmentation techniques based on random perturbations (e.g., randomly adding or dropping edges and nodes).
no code implementations • ICASSP 2022 • Xingjian Du, Ke Chen, Zijie Wang, Bilei Zhu, Zejun Ma
Convolutional neural network (CNN)-based methods have dominated the recent research of cover song identification (CSI).
Ranked #1 on Cover song identification on SHS100K-TEST
1 code implementation • 2 Feb 2022 • Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov
To combat these problems, we introduce HTS-AT: an audio transformer with a hierarchical structure to reduce the model size and training time.
Ranked #4 on Sound Event Detection on DESED
no code implementations • AAAI 2021 • Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov
Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training.
1 code implementation • 15 Dec 2021 • Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov
Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training.
Ranked #1 on Audio Source Separation on AudioSet
1 code implementation • 27 Oct 2020 • Xingjian Du, Zhesong Yu, Bilei Zhu, Xiaoou Chen, Zejun Ma
We present in this paper ByteCover, which is a new feature learning method for cover song identification (CSI).
Ranked #2 on Cover song identification on Da-TACOS
1 code implementation • 27 Oct 2020 • Yuanbo Hou, Yi Deng, Bilei Zhu, Zejun Ma, Dick Botteldooren
Detecting the anchor's voice in live musical streams is an important preprocessing step for music and speech signal processing.
Sound • Multimedia • Audio and Speech Processing
no code implementations • 26 Oct 2020 • Zhesong Yu, Xingjian Du, Bilei Zhu, Zejun Ma
The rise of video-sharing platforms has attracted more and more people to shoot videos and upload them to the Internet.