SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks

Deep neural networks (DNNs) must cater to a variety of users with different performance needs and budgets, leading to the costly practice of training, storing, and maintaining numerous user-specific models. Existing solutions train a single dynamic or many-in-one model instead of many individual networks, but they typically require heavy model search, are architecture-specific, or support only a limited number of dimensions (e.g., depth only or width only) or sub-models. To address these problems, we propose SortedNet, a generalized and scalable training solution that harnesses the inherent modularity of DNNs. Thanks to a generalized nested architecture (which we refer to as a "sorted" architecture in this paper) with shared parameters, and a novel update scheme that combines random sub-model sampling with gradient accumulation, SortedNet enables the simultaneous training of numerous sub-models, simplifies dynamic model selection and deployment at inference time, and significantly reduces model storage requirements. The versatility and scalability of SortedNet are validated across various architectures and tasks, including LLaMA, BERT, and RoBERTa (NLP tasks) as well as ResNet and MobileNet (image classification), demonstrating its superiority over existing dynamic training methods. SortedNet is able to train up to 160 sub-models at once, achieving at least 96% of the original model's performance.
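
For illustration, here is a minimal PyTorch sketch of the core idea described in the abstract: nested sub-models that share the leading slice of each weight matrix, trained by randomly sampling a few sub-models per step and accumulating their gradients into a single shared-parameter update. This is not the authors' implementation; the toy SortedMLP, the width list, and the two-sub-models-per-step sampling are illustrative assumptions.

```python
# Hypothetical sketch of a SortedNet-style update (not the paper's code):
# sub-models reuse the first `width` hidden units of the full model, and
# gradients from several sampled sub-models are accumulated before one step.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class SortedMLP(nn.Module):
    """Toy MLP whose hidden width can be truncated to form nested sub-models."""
    def __init__(self, in_dim=32, hidden=128, out_dim=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x, width):
        # Shared ("sorted") parameters: keep only the first `width` hidden units.
        h = F.relu(F.linear(x, self.fc1.weight[:width], self.fc1.bias[:width]))
        return F.linear(h, self.fc2.weight[:, :width], self.fc2.bias)

model = SortedMLP()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
widths = [16, 32, 64, 128]            # assumed nested sub-model sizes
x = torch.randn(8, 32)                # dummy batch
y = torch.randint(0, 10, (8,))

for step in range(10):
    optimizer.zero_grad()
    # Randomly sample a few sub-models and accumulate their gradients.
    for w in random.sample(widths, 2):
        loss = loss_fn(model(x, w), y)
        loss.backward()               # gradients accumulate across sub-models
    optimizer.step()                  # one update to the shared parameters
```

In this sketch, every sub-model is a prefix slice of the full network's weights, so training the largest model and its smaller variants updates the same shared parameters rather than maintaining separate checkpoints.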
