In this paper, we propose a new loss function called generalized end-to-end
(GE2E) loss, which makes the training of speaker verification models more
efficient than our previous tuple-based end-to-end (TE2E) loss function. Unlike
TE2E, the GE2E loss function updates the network in a way that emphasizes
examples that are difficult to verify at each step of the training process.
Additionally, the GE2E loss does not require an initial stage of example
selection. With these properties, our model with the new loss function
decreases speaker verification EER by more than 10% while reducing training
time by 60%. We also introduce the MultiReader
technique, which enables domain adaptation: training a more accurate
model that supports multiple keywords (i.e. "OK Google" and "Hey Google") as
well as multiple dialects.
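
To make the abstract concrete, the following is a minimal NumPy sketch of the softmax variant of the GE2E loss developed later in the paper: each utterance embedding is scored against every speaker centroid with a scaled cosine similarity, and a softmax cross-entropy term pushes it toward its own speaker's centroid. The function name and the fixed values of w and b are illustrative assumptions for this sketch; in the paper w and b are learned parameters, and the loss is computed on batches of N speakers with M utterances each.

```python
import numpy as np

def ge2e_softmax_loss(embeddings, w=10.0, b=-5.0):
    """Sketch of the softmax GE2E loss.

    embeddings: (N speakers, M utterances, D dims), L2-normalized.
    w, b: scale and bias of the cosine similarity (learned in the paper;
    fixed here only for illustration).
    """
    N, M, _ = embeddings.shape
    # Speaker centroids: mean embedding per speaker, re-normalized.
    centroids = embeddings.mean(axis=1)
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

    loss = 0.0
    for j in range(N):
        for i in range(M):
            e = embeddings[j, i]
            # For the true speaker, exclude the utterance itself from its
            # centroid to avoid a trivial solution.
            own = np.delete(embeddings[j], i, axis=0).mean(axis=0)
            own /= np.linalg.norm(own)
            cos = centroids @ e          # similarity to every centroid
            cos[j] = own @ e
            sim = w * cos + b
            # Softmax cross-entropy against the true speaker index j;
            # utterances that are hard to verify (low similarity to their
            # own centroid, high similarity to another) dominate the loss.
            loss += -sim[j] + np.log(np.exp(sim).sum())
    return loss / (N * M)
```

Because every utterance is compared against all speaker centroids in the batch, this formulation emphasizes the hardest comparisons at each update without the separate example-selection stage that TE2E required.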