no code implementations • 29 Oct 2020 • Siddharth Sigtia, John Bridle, Hywel Richards, Pascal Clark, Erik Marchi, Vineet Garg
We first demonstrate that including more audio context after a detected trigger phrase yields a more accurate decision.
no code implementations • 26 Jan 2020 • Siddharth Sigtia, Erik Marchi, Sachin Kajarekar, Devang Naik, John Bridle
We train the network in a supervised multi-task learning setup, where the speech transcription branch of the network is trained to minimise a phonetic connectionist temporal classification (CTC) loss while the speaker recognition branch of the network is trained to label the input sequence with the correct label for the speaker.
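The multi-task objective described in this excerpt can be illustrated with a minimal sketch: a CTC loss for the phonetic branch plus a classification loss for the speaker branch, combined into one training loss. The excerpt does not say how the two terms are combined or what form the speaker loss takes, so the weighted sum, the `speaker_weight` parameter, the utterance-level cross-entropy, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC (forward algorithm).

    log_probs: list of frames, each a list of per-symbol log-probabilities.
    target:    list of phone indices (without blanks).
    """
    # Interleave blanks: [blank, p1, blank, p2, ..., blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]

    # Initialise: a path may start in the first blank or the first label.
    alpha = [NEG_INF] * len(ext)
    alpha[0] = log_probs[0][ext[0]]
    if len(ext) > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new_alpha = []
        for s, label in enumerate(ext):
            candidates = [alpha[s]]                 # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])     # advance by one
            if s >= 2 and label != blank and label != ext[s - 2]:
                candidates.append(alpha[s - 2])     # skip the blank between two different labels
            new_alpha.append(logsumexp(candidates) + frame[label])
        alpha = new_alpha

    # Valid paths end in the last label or the trailing blank.
    return -logsumexp(alpha[-2:])


def speaker_loss(speaker_logits, speaker):
    """Cross-entropy for one utterance-level speaker label (assumed form)."""
    return logsumexp(speaker_logits) - speaker_logits[speaker]


def multitask_loss(log_probs, phones, speaker_logits, speaker, speaker_weight=1.0):
    """Assumed combination: CTC loss plus a weighted speaker loss."""
    return ctc_loss(log_probs, phones) + speaker_weight * speaker_loss(speaker_logits, speaker)


# Toy usage: two frames, two symbols (0 = blank, 1 = a phone), two speakers.
frames = [[math.log(0.5), math.log(0.5)] for _ in range(2)]
loss = multitask_loss(frames, phones=[1], speaker_logits=[0.0, 0.0], speaker=0)
```

In a real system both branches would share lower network layers and the loss would be minimised by gradient descent; the sketch only shows how the two supervised terms could be computed and summed.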
no code implementations • 26 Jan 2020 • Siddharth Sigtia, Pascal Clark, Rob Haynes, Hywel Richards, John Bridle
Next, we collect a much smaller dataset of examples that are challenging for the baseline system.