24 Mar 2023 • Nitish Shukla, Anurima Dey, Srivatsan K
We empirically show that this type of training compresses the student model to as little as one-tenth the size of the teacher model without sacrificing accuracy.
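The entry does not include code. As a rough illustration of the standard knowledge-distillation objective (the softened cross-entropy of Hinton et al.; an assumption here, not necessarily the exact loss this paper uses), a minimal sketch:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: higher T yields softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between the teacher's and student's softened
    distributions -- the classic distillation term (assumed form)."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return -temperature ** 2 * sum(p * math.log(q) for p, q in zip(t, s))
```

A smaller student trained against these soft targets receives richer supervision than one-hot labels alone, which is what makes the 10x compression claim plausible.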
Tasks: Knowledge Distillation, Vocal Bursts Type Prediction