BYTECOVER2: TOWARDS DIMENSIONALITY REDUCTION OF LATENT EMBEDDING FOR EFFICIENT COVER SONG IDENTIFICATION

ICASSP 2022  ·  Xingjian Du, Ke Chen, Zijie Wang, Bilei Zhu, Zejun Ma

Convolutional neural network (CNN)-based methods have dominated recent research on cover song identification (CSI). A typical example is the ByteCover system we proposed, which has achieved state-of-the-art results on all the mainstream CSI datasets. In this paper, we propose an upgraded version of ByteCover, termed ByteCover2, which improves on ByteCover in both identification performance and efficiency. Compared with ByteCover, ByteCover2 adds a PCA-FC module, which combines the capabilities of principal component analysis (PCA) and a fully-connected (FC) neural network to reduce the dimensionality of the audio embedding, allowing ByteCover2 to perform CSI more precisely and efficiently. We evaluated ByteCover2 on multiple datasets with different embedding dimensions and training settings; ByteCover2 beat all the compared methods, including ByteCover, even with an embedding dimension of 128, which is 15 times smaller than that of ByteCover.
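
The abstract does not spell out how the PCA-FC module is built. One plausible reading is that a PCA projection, being a linear map, can be expressed in the same form as a fully-connected layer (a weight matrix plus a bias), which could then be trained further. The following is a minimal sketch of that idea only, not the authors' implementation: the function names, the toy input dimension of 512, and the random stand-in embeddings are all assumptions made for illustration; only the output dimension of 128 comes from the abstract.

```python
import numpy as np


def fit_pca_projection(embeddings, out_dim):
    """Fit a PCA projection and return it as (mean, weight).

    weight has shape (out_dim, in_dim), the layout of an FC layer's
    weight matrix; its rows are the top principal directions.
    """
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    weight = vt[:out_dim]
    return mean, weight


def project(x, mean, weight):
    """Apply the projection: equivalent to y = W x + b with b = -W @ mean."""
    return (x - mean) @ weight.T


# Hypothetical data: 400 random "embeddings" of dimension 512 (toy values).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(400, 512))

mean, weight = fit_pca_projection(embeddings, out_dim=128)
reduced = project(embeddings, mean, weight)
print(reduced.shape)  # (400, 128)
```

In a setup like this, `weight` and `-weight @ mean` could be copied into a linear layer as its initial weight and bias, so that training starts from the PCA solution rather than from random values; whether ByteCover2 does exactly this is not stated in the abstract.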

Results from the Paper


Task                       Dataset       Model       Metric Name  Metric Value  Global Rank
Cover song identification  Covers80      ByteCover2  mAP          0.928         #1
Cover song identification  Da-TACOS      ByteCover2  mAP          0.791         #1
Cover song identification  SHS100K-TEST  ByteCover   mAP          0.864         #1
