Does Whisper understand Swiss German? An automatic, qualitative, and human evaluation

30 Apr 2024  ·  Eyal Liron Dolev, Clemens Fidel Lutz, Noëmi Aepli ·

Whisper is a state-of-the-art automatic speech recognition (ASR) model (Radford et al., 2022). Although Swiss German dialects are allegedly not part of Whisper's training data, preliminary experiments showed that Whisper can transcribe Swiss German quite well, with the output being a speech translation into Standard German. To gain a better understanding of Whisper's performance on Swiss German, we systematically evaluate it using automatic, qualitative, and human evaluation. We test its performance on three existing test sets: SwissDial (Dogan-Sch\"onberger et al., 2021), STT4SG-350 (Pl\"uss et al., 2023), and Swiss Parliaments Corpus (Pl\"uss et al., 2021). In addition, we create a new test set for this work, based on short mock clinical interviews. For automatic evaluation, we used word error rate (WER) and BLEU. In the qualitative analysis, we discuss Whisper's strengths and weaknesses and anylyze some output examples. For the human evaluation, we conducted a survey with 28 participants who were asked to evaluate Whisper's performance. All of our evaluations suggest that Whisper is a viable ASR system for Swiss German, so long as the Standard German output is desired.

PDF Abstract

Datasets


Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here