1 code implementation • 27 Sep 2023 • Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma
We also examine the impact of various design choices in the pretraining process, including the data mix and the sequence-length training curriculum. Our ablation experiments suggest that having abundant long texts in the pretraining dataset is not the key to achieving strong performance, and we empirically verify that long-context continual pretraining is more efficient than pretraining from scratch with long sequences while being similarly effective.
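A minimal sketch of the continual-pretraining idea described above: resume training an existing model on progressively longer sequences instead of pretraining from scratch at the longest length. The toy model, dummy data, and the specific lengths and step counts are illustrative assumptions, not the paper's actual setup.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64

# Toy next-token model standing in for a pretrained LLM checkpoint.
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Sequence-length curriculum: continue training on progressively longer sequences.
# Lengths and step counts are scaled down here purely for illustration.
for seq_len, steps in [(512, 10), (1024, 10), (2048, 10)]:
    for _ in range(steps):
        tokens = torch.randint(0, vocab_size, (2, seq_len))  # stand-in long-text batch
        logits = model(tokens[:, :-1])
        loss = loss_fn(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```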
2 code implementations • 30 Jun 2022 • John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael Rabbat
Surprisingly, we also find that starting federated learning from a pre-trained initialization reduces the effect of both data and system heterogeneity.
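A hedged sketch of the setup being studied: a FedAvg-style loop whose global model is seeded from pre-trained weights rather than a random initialization. The model, client data, and the checkpoint path are illustrative stand-ins, not the paper's experimental configuration.

```python
import copy
import torch
import torch.nn as nn

def local_update(model, data, targets, lr=0.1, epochs=1):
    """One client's local SGD pass; returns its updated weights."""
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nn.functional.cross_entropy(model(data), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model.state_dict()

global_model = nn.Linear(10, 2)
# The design choice studied here: start from pre-trained weights when available.
# global_model.load_state_dict(torch.load("pretrained.pt"))  # hypothetical checkpoint

clients = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(4)]
for _round in range(3):
    updates = [local_update(global_model, x, y) for x, y in clients]
    # Average client weights (uniform weighting in this toy example).
    avg = {k: torch.stack([u[k] for u in updates]).mean(0) for k in updates[0]}
    global_model.load_state_dict(avg)
```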
2 code implementations • 8 Apr 2022 • Krishna Pillutla, Kshitiz Malik, Abdelrahman Mohamed, Michael Rabbat, Maziar Sanjabi, Lin Xiao
We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices.
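A small sketch of the two local-update schemes mentioned above, with the model split into shared parameters (aggregated across clients) and personal parameters (kept on the device). The layer sizes and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

shared = nn.Linear(10, 16)    # parameters aggregated across clients
personal = nn.Linear(16, 2)   # parameters that stay on the device

def client_step(x, y, alternating=True):
    opt_shared = torch.optim.SGD(shared.parameters(), lr=0.1)
    opt_personal = torch.optim.SGD(personal.parameters(), lr=0.1)

    def loss():
        return nn.functional.cross_entropy(personal(torch.relu(shared(x))), y)

    if alternating:
        # Alternating variant: update the personal head first, then the shared backbone.
        opt_personal.zero_grad(); loss().backward(); opt_personal.step()
        opt_shared.zero_grad(); loss().backward(); opt_shared.step()
    else:
        # Simultaneous variant: one backward pass updates both parameter groups.
        opt_shared.zero_grad(); opt_personal.zero_grad()
        loss().backward()
        opt_shared.step(); opt_personal.step()

client_step(torch.randn(8, 10), torch.randint(0, 2, (8,)))
```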
1 code implementation • 4 Apr 2022 • Shengyuan Hu, Jack Goetz, Kshitiz Malik, Hongyuan Zhan, Zhe Liu, Yue Liu
Model compression is important in federated learning (FL) with large models to reduce communication cost.
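To illustrate why compression matters for communication cost, here is a generic example in which a client transmits a sparsified model delta instead of the full dense update. Top-k sparsification is used only as a familiar placeholder; it is not necessarily the compression scheme proposed in this paper.

```python
import torch

def compress_topk(delta: torch.Tensor, k: int):
    """Keep the k largest-magnitude entries of a flattened update."""
    _, indices = delta.abs().topk(k)
    return indices, delta[indices]

def decompress(indices, values, numel: int):
    out = torch.zeros(numel)
    out[indices] = values
    return out

delta = torch.randn(10_000)               # stand-in for a client's model delta
idx, vals = compress_topk(delta, k=100)   # only ~1% of the payload is transmitted
recovered = decompress(idx, vals, delta.numel())
```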
no code implementations • 8 Nov 2021 • Dzmitry Huba, John Nguyen, Kshitiz Malik, Ruiyu Zhu, Mike Rabbat, Ashkan Yousefpour, Carole-Jean Wu, Hongyuan Zhan, Pavel Ustinov, Harish Srinivas, Kaikai Wang, Anthony Shoumikhin, Jesik Min, Mani Malek
Our work tackles the aforementioned issues, sketches some of the system design challenges and their solutions, and touches upon principles that emerged from building a production FL system for millions of clients.
no code implementations • 11 Jun 2021 • John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael Rabbat, Mani Malek, Dzmitry Huba
On the other hand, asynchronous aggregation of client updates in FL (i.e., asynchronous FL) alleviates the scalability issue.
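A hedged sketch of asynchronous aggregation: the server folds in each client update as it arrives, rather than waiting for a full synchronous round, and down-weights updates computed against older model versions. The staleness weighting below is an illustrative choice, not necessarily the one used in the paper.

```python
import torch

server_weights = torch.zeros(100)   # stand-in for the global model parameters
server_version = 0

def apply_async_update(delta: torch.Tensor, client_version: int, lr: float = 1.0):
    """Apply one client's delta immediately, scaled down by its staleness."""
    global server_weights, server_version
    staleness = server_version - client_version
    weight = 1.0 / (1.0 + staleness)   # older updates contribute less
    server_weights += lr * weight * delta
    server_version += 1

# Updates computed against different model versions arrive out of order.
apply_async_update(torch.randn(100), client_version=0)
apply_async_update(torch.randn(100), client_version=0)   # now stale by one version
```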
no code implementations • ICLR 2020 • Duc Bui, Kshitiz Malik, Jack Goetz, Honglei Liu, Seungwhan Moon, Anuj Kumar, Kang G. Shin
Furthermore, we show that user embeddings learned in FL and the centralized setting have a very similar structure, indicating that FURL can learn collaboratively through the shared parameters while preserving user privacy.
no code implementations • 27 Sep 2019 • Jack Goetz, Kshitiz Malik, Duc Bui, Seungwhan Moon, Honglei Liu, Anuj Kumar
To exploit this, we propose Active Federated Learning, in which clients in each round are selected not uniformly at random but with a probability conditioned on the current model and the data on the client, in order to maximize efficiency.
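A minimal sketch of such non-uniform client selection: clients are sampled with probability derived from a score computed with the current model on their local data. Using the local loss as that score is an illustrative assumption; the exact valuation used in Active Federated Learning may differ.

```python
import torch

def select_clients(losses: torch.Tensor, num_select: int, temperature: float = 1.0):
    """Sample clients with probability proportional to softmax(loss / temperature)."""
    probs = torch.softmax(losses / temperature, dim=0)
    return torch.multinomial(probs, num_select, replacement=False)

local_losses = torch.tensor([0.2, 1.5, 0.9, 2.3, 0.4])   # one loss per client
chosen = select_clients(local_losses, num_select=2)
print(chosen)   # indices of clients picked for this round
```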