SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks

Deep neural networks (DNNs) must cater to a variety of users with different performance needs and budgets, leading to the costly practice of training, storing, and maintaining numerous user-specific models. Existing solutions train a single dynamic or many-in-one model instead of many individual networks, but they typically require heavy model search, are architecture-specific, or support only a limited number of dimensions (e.g., depth only or width only) or sub-models. To address these problems, we propose SortedNet, a generalized and scalable training solution that harnesses the inherent modularity of DNNs. Thanks to a generalized nested architecture (which we refer to as a "sorted" architecture in this paper) with shared parameters, and a novel update scheme that combines random sub-model sampling with gradient accumulation, SortedNet enables the simultaneous training of numerous sub-models, simplifies dynamic model selection and deployment at inference time, and significantly reduces model storage requirements. The versatility and scalability of SortedNet are validated across various architectures and tasks, including LLaMA, BERT, and RoBERTa (NLP tasks) as well as ResNet and MobileNet (image classification), demonstrating its superiority over existing dynamic training methods. SortedNet is able to train up to 160 sub-models at once, achieving at least 96% of the original model's performance.
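
For illustration, here is a minimal PyTorch sketch of the core idea described in the abstract: nested sub-models that share the leading slice of each weight matrix, trained by randomly sampling a few sub-models per step and accumulating their gradients into a single shared-parameter update. This is not the authors' implementation; the toy SortedMLP, the width list, and the two-sub-models-per-step sampling are illustrative assumptions.

```python
# Hypothetical sketch of a SortedNet-style update (not the paper's code):
# sub-models reuse the first `width` hidden units of the full model, and
# gradients from several sampled sub-models are accumulated before one step.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class SortedMLP(nn.Module):
    """Toy MLP whose hidden width can be truncated to form nested sub-models."""
    def __init__(self, in_dim=32, hidden=128, out_dim=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x, width):
        # Shared ("sorted") parameters: keep only the first `width` hidden units.
        h = F.relu(F.linear(x, self.fc1.weight[:width], self.fc1.bias[:width]))
        return F.linear(h, self.fc2.weight[:, :width], self.fc2.bias)

model = SortedMLP()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
widths = [16, 32, 64, 128]            # assumed nested sub-model sizes
x = torch.randn(8, 32)                # dummy batch
y = torch.randint(0, 10, (8,))

for step in range(10):
    optimizer.zero_grad()
    # Randomly sample a few sub-models and accumulate their gradients.
    for w in random.sample(widths, 2):
        loss = loss_fn(model(x, w), y)
        loss.backward()               # gradients accumulate across sub-models
    optimizer.step()                  # one update to the shared parameters
```

In this sketch, every sub-model is a prefix slice of the full network's weights, so training the largest model and its smaller variants updates the same shared parameters rather than maintaining separate checkpoints.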
