Alternative Objective Functions for Deep Clustering
The recently proposed deep clustering framework represents a significant step towards solving the cocktail party problem. This study proposes and compares a variety of alternative objective functions for training deep clustering networks. In addition, whereas the original deep clustering work relied on k-means clustering for test-time inference, here we investigate inference methods that are matched to the training objective. Furthermore, we explore the use of an improved chimera network architecture for speech separation, which combines deep clustering with mask-inference networks in a multiobjective training scheme. The deep clustering loss acts as a regularizer while training the end-to-end mask inference network for best separation. With further iterative phase reconstruction, our best proposed method achieves a state-of-the-art 11.5 dB signal-to-distortion ratio (SDR) result on the publicly available wsj0-2mix dataset, with a much simpler architecture than the previous best approach.
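The abstract does not spell out any of the objectives, so the following is only a rough sketch for orientation. It shows the classic deep clustering objective from the original framework, ||V V^T - Y Y^T||_F^2, not the alternative objectives proposed in this paper, together with one possible way to use it as a regularizer next to a mask-inference loss. The function names, the weight `alpha`, and the per-bin normalization are assumptions made for illustration.

```python
import numpy as np


def deep_clustering_loss(V, Y):
    """Classic deep clustering objective ||V V^T - Y Y^T||_F^2.

    V: (N, D) embeddings, one row per time-frequency bin.
    Y: (N, C) one-hot source assignments for the same bins.
    The expanded form below avoids building the N x N affinity matrices.
    """
    return (
        np.linalg.norm(V.T @ V, "fro") ** 2
        - 2.0 * np.linalg.norm(V.T @ Y, "fro") ** 2
        + np.linalg.norm(Y.T @ Y, "fro") ** 2
    )


def chimera_loss(V, Y, mask_loss, alpha=0.5):
    """Hypothetical multiobjective combination (assumed weighting).

    mask_loss: scalar loss already computed by a mask-inference head.
    alpha: assumed trade-off weight; the deep clustering term is
    divided by the number of bins so the two terms are comparable.
    """
    n_bins = V.shape[0]
    return alpha * deep_clustering_loss(V, Y) / n_bins + (1.0 - alpha) * mask_loss
```

When the embeddings reproduce the ideal assignments exactly (V equal to Y), the deep clustering term is zero and only the mask-inference term contributes.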
Datasets
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Speech Separation | WSJ0-2mix | Chimera++ | SI-SDRi | 11.5 | # 29