A Mask-based Model for Mandarin Chinese Polyphone Disambiguation

21 Oct 2020 · Haiteng Zhang ·

Polyphone disambiguation serves as an essential part of Mandarin text-to-speech (TTS) system. However, conventional system modeling the entire Pinyin set causes the case that prediction belongs to the unrelated polyphonic character instead of the current input one, which has negative impacts on TTS performance. To address this issue, we introduce a mask-based model for polyphone disambiguation. The model takes a mask vector extracted from the context as an extra input. In our model, the mask vector not only acts as a weighting factor in Weightedsoftmax to prevent the case of mis-prediction but also eliminates the contribution of non-candidate set to the overall loss. Moreover, to mitigate the uneven distribution of pronunciation, we introduce a new loss called Modified Focal Loss. The experimental result shows the effectiveness of the proposed mask based model. We also empirically studied the impact of Weighted-softmax and Modified Focal Loss. It was found that Weighted-softmax can effectively prevent the model from predicting outside the candidate set. Besides, Modified Focal Loss can reduce the adverse impacts of the uneven distribution of pronunciation.

PDF

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Tasks

Add Remove

Polyphone disambiguation

Datasets

Add Datasets introduced or used in this paper

Results from the Paper

Add Remove

Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods

Add Remove

Focal Loss

Edit Social Preview

A Mask-based Model for Mandarin Chinese Polyphone Disambiguation

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit Add Remove

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Add Remove

Methods

Add Remove