Cross-modal information fusion for voice spoofing detection

journal 2023 · Junxiao Xue, Hao Zhou, Huawei Song, Bin Wu, Lei Shi

In recent years, speaker verification systems have been deployed in many production scenarios. Unfortunately, they remain highly vulnerable to spoofing attacks such as speech synthesis and replay attacks. Researchers have proposed many defenses against these attacks, but existing methods focus only on speech features. Recent studies have found that speech carries a large amount of face information: from a voice alone, we can infer the speaker's gender, age, mouth shape, and other attributes. This information can help distinguish spoofing attacks. Inspired by this phenomenon, we propose a generalized framework named GACMNet and, to cope with different attack scenarios, instantiate two different models from it. The framework is divided into four phases: data pre-processing, feature extraction, feature fusion, and classification. It consists of two branches. One branch extracts face features from speech with a convolutional neural network; the other uses a densely connected network to extract speech features. Furthermore, we design a global attention-based information fusion mechanism to weigh the importance of each part of the features. Our solution proved effective in two major scenarios. Compared with existing methods, our model improves the tandem detection cost function (t-DCF) and equal error rate (EER) scores by 9% and 11%, respectively, in the logical access scenario, and improves the EER score by 10% in the physical access scenario.
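The abstract does not spell out how the global attention-based fusion works, so the following is only a minimal sketch of one common way to fuse two feature branches with attention weights. It is not the GACMNet mechanism. The function names, the per-branch softmax weighting, and the `score_weights` vector (standing in for parameters that would normally be learned) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(face_feat, speech_feat, score_weights):
    """Fuse two equal-length feature vectors with one attention weight per branch.

    Hypothetical illustration: each branch gets a scalar score (dot product
    with score_weights), the scores are normalized with softmax, and the
    fused vector is the attention-weighted sum of the two branches.
    """
    if not (len(face_feat) == len(speech_feat) == len(score_weights)):
        raise ValueError("all vectors must have the same length")
    branches = [face_feat, speech_feat]
    # One relevance score per branch.
    scores = [sum(w * x for w, x in zip(score_weights, b)) for b in branches]
    # Attention weights that sum to 1 across the two branches.
    alphas = softmax(scores)
    # Element-wise weighted sum of the branch features.
    fused = [
        sum(a * b[i] for a, b in zip(alphas, branches))
        for i in range(len(face_feat))
    ]
    return fused, alphas
```

With an all-zero `score_weights`, both branches receive weight 0.5 and the fusion reduces to a plain average. A real model would learn the scoring parameters and would likely attend over finer-grained parts of the feature maps.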

Results from the Paper


Task                 Dataset             Model                            Metric Name  Metric Value  Global Rank
Voice Anti-spoofing  ASVspoof 2019 - LA  LFCC&Face+SE-DenseNet+A-softmax  EER          2.73          # 2
Voice Anti-spoofing  ASVspoof 2019 - LA  LFCC&Face+SE-DenseNet+A-softmax  min t-DCF    0.0713        # 2
Voice Anti-spoofing  ASVspoof 2019 - PA  CQT&Face+SE-Res2Net+log-softmax  min t-DCF    0.0230        # 1
Voice Anti-spoofing  ASVspoof 2019 - PA  CQT&Face+SE-Res2Net+log-softmax  EER          0.85          # 1
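For readers unfamiliar with the EER metric reported above, here is a minimal sketch of how an equal error rate can be estimated from countermeasure scores. It assumes higher scores mean "more likely bona fide" and picks the threshold where the false acceptance and false rejection rates are closest. The function name and the toy scores are hypothetical, and this is not the official ASVspoof scoring tool.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate (EER) and the threshold where it occurs.

    Assumes a higher score means the trial is more likely bona fide.
    Returns (eer, threshold) with eer as a fraction in [0, 1].
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None  # (gap between FAR and FRR, eer estimate, threshold)
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoof trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy example with made-up scores (not from the paper).
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
print(f"EER = {eer * 100:.1f}% at threshold {threshold}")  # EER = 25.0% at threshold 0.6
```

The function returns a fraction, so multiply by 100 to compare with figures such as those in the table, assuming they are percentages as is conventional for ASVspoof results.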
