no code implementations • 11 Mar 2024 • Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM).
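The estimation idea above can be illustrated with a toy mixture model. This is a hypothetical sketch, not the paper's actual estimator: assume we somehow know the token-frequency distributions of human-written and LLM-written text, and recover the LLM fraction alpha by maximum likelihood over their mixture (the distribution names `p_h`, `p_l` and the numbers are illustrative).

```python
import numpy as np

def estimate_llm_fraction(counts, p_human, p_llm):
    """Grid-search MLE for alpha in counts ~ (1 - a) * p_human + a * p_llm."""
    best_a, best_ll = 0.0, -np.inf
    for a in np.linspace(0, 1, 1001):
        mix = (1 - a) * p_human + a * p_llm
        ll = np.sum(counts * np.log(mix + 1e-12))  # multinomial log-likelihood
        if ll > best_ll:
            best_a, best_ll = a, ll
    return best_a

p_h = np.array([0.7, 0.2, 0.1])   # toy human word-frequency distribution
p_l = np.array([0.3, 0.3, 0.4])   # toy LLM word-frequency distribution
true_a = 0.25
counts = 10000 * ((1 - true_a) * p_h + true_a * p_l)  # idealized corpus counts
print(round(estimate_llm_fraction(counts, p_h, p_l), 2))
```

Because the two reference distributions differ, the mixture is identifiable and the grid search recovers the true fraction on these idealized counts.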
no code implementations • 4 Mar 2024 • Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou
We find empirically that, surprisingly, across multiple language tasks the performance of Voting Inference Systems first increases and then decreases as a function of the number of LLM calls.
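The non-monotone behavior above can be reproduced in a toy model (a sketch under assumed numbers, not the paper's analysis): suppose half the queries are "easy" (each LLM call is correct with probability 0.9) and half are "hard" (correct with probability 0.45). Majority voting drives easy-query accuracy up and hard-query accuracy down, so aggregate accuracy rises to a peak and then declines as calls increase.

```python
from math import comb

def majority_acc(p, k):
    """Probability that a k-call majority vote is correct when each call
    is independently correct with probability p (k odd)."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

P_EASY, P_HARD = 0.9, 0.45  # assumed per-call accuracies, illustrative only
for k in [1, 3, 5, 7, 9]:
    agg = 0.5 * majority_acc(P_EASY, k) + 0.5 * majority_acc(P_HARD, k)
    print(k, round(agg, 4))  # aggregate accuracy peaks at an intermediate k
```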
no code implementations • 22 Nov 2023 • Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou
As Machine Learning (ML) systems continue to grow, the demand for relevant and comprehensive datasets becomes increasingly pressing.
4 code implementations • 18 Jul 2023 • Lingjiao Chen, Matei Zaharia, James Zou
We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time.
no code implementations • 9 May 2023 • Lingjiao Chen, Matei Zaharia, James Zou
There is a rapidly growing number of large language models (LLMs) that users can query for a fee.
no code implementations • 11 Oct 2022 • Nazneen Rajani, Weixin Liang, Lingjiao Chen, Meg Mitchell, James Zou
With the advent of Transformers, large language models (LLMs) have saturated well-known NLP benchmarks and leaderboards with high aggregate performance.
1 code implementation • 18 Sep 2022 • Lingjiao Chen, Zhihua Jin, Sabri Eyuboglu, Christopher Ré, Matei Zaharia, James Zou
HAPI is the first large-scale dataset of ML API usage and a unique resource for studying ML-as-a-service (MLaaS).
no code implementations • 18 Sep 2022 • Lingjiao Chen, Matei Zaharia, James Zou
We further propose SEES, an algorithmic framework to characterize the distribution shift under SJS and to estimate a model's performance on new data without any labels.
1 code implementation • NeurIPS 2023 • Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Smriti Raje, Max Bartolo, Sabri Eyuboglu, Amirata Ghorbani, Emmett Goodman, Oana Inel, Tariq Kane, Christine R. Kirkpatrick, Tzu-Sheng Kuo, Jonas Mueller, Tristan Thrush, Joaquin Vanschoren, Margaret Warren, Adina Williams, Serena Yeung, Newsha Ardalani, Praveen Paritosh, Lilith Bat-Leah, Ce Zhang, James Zou, Carole-Jean Wu, Cody Coleman, Andrew Ng, Peter Mattson, Vijay Janapa Reddi
Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems.
no code implementations • 4 Oct 2021 • Lingjiao Chen, Leshang Chen, Hongyi Wang, Susan Davidson, Edgar Dobriban
There is a growing need for Byzantine resilience in distributed model training.
no code implementations • ICLR 2022 • Lingjiao Chen, Matei Zaharia, James Zou
ML prediction APIs from providers like Amazon and Google have made it simple to use ML in applications.
no code implementations • 29 Jul 2021 • Lingjiao Chen, Tracy Cai, Matei Zaharia, James Zou
This motivated us to formulate the API shift assessment problem at a finer granularity: estimating how the API model's confusion matrix changes over time while the data distribution remains constant.
no code implementations • 18 Feb 2021 • Lingjiao Chen, Matei Zaharia, James Zou
In this work, we propose FrugalMCT, a principled framework that adaptively selects which APIs to use for different data in an online fashion while respecting the user's budget.
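Budget-aware online API selection can be sketched in miniature. This is a hypothetical illustration of the general idea, not the FrugalMCT algorithm: for each item, choose the most accurate API whose cost fits the per-item budget remaining (the API names, costs, and accuracy estimates are made up).

```python
APIS = [  # (name, cost per call, assumed accuracy estimate) -- illustrative
    ("cheap", 1.0, 0.80),
    ("mid", 3.0, 0.90),
    ("premium", 8.0, 0.97),
]

def select_api(expected_remaining_calls, remaining_budget):
    """Pick the most accurate API whose cost fits the per-call budget."""
    per_call = remaining_budget / max(expected_remaining_calls, 1)
    affordable = [a for a in APIS if a[1] <= per_call]
    if not affordable:
        return APIS[0]  # fall back to the cheapest API
    return max(affordable, key=lambda a: a[2])

budget, n_items = 40.0, 10
chosen = []
for i in range(n_items):
    api = select_api(n_items - i, budget)
    budget -= api[1]
    chosen.append(api[0])
print(chosen, round(budget, 1))  # mid-tier calls early, premium once slack allows
```

The loop never overspends because each choice is constrained by the budget still available for the remaining items.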
no code implementations • NeurIPS 2020 • Lingjiao Chen, Matei Zaharia, James Zou
Prediction APIs offered for a fee are a fast-growing industry and an important part of machine learning as a service.
no code implementations • NeurIPS 2018 • Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitris Papailiopoulos, Paraschos Koutris
Distributed implementations of mini-batch stochastic gradient descent (SGD) suffer from communication overheads, attributed to the high frequency of gradient updates inherent in small-batch training.
no code implementations • 26 May 2018 • Lingjiao Chen, Paraschos Koutris, Arun Kumar
Finally, we conduct extensive experiments, which validate that the MBP framework can provide high revenue to the seller and high affordability to the buyer while operating at low runtime cost.
1 code implementation • ICML 2018 • Lingjiao Chen, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos
Distributed model training is vulnerable to Byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS).
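The threat model above can be made concrete with a generic Byzantine-robust aggregation rule. Note this sketch uses coordinate-wise median, a standard robust aggregator, not the redundancy-based scheme of the paper itself; the gradient values are made up.

```python
import numpy as np

def coordinate_median(gradients):
    """Aggregate worker gradients by taking the median per coordinate,
    so a minority of outlier (Byzantine) updates cannot drag the result."""
    return np.median(np.stack(gradients), axis=0)

honest = [np.array([1.0, 2.0]), np.array([1.1, 1.9]), np.array([0.9, 2.1])]
byzantine = [np.array([100.0, -100.0])]  # one malicious worker
agg = coordinate_median(honest + byzantine)
print(agg)  # stays close to the honest gradients despite the attacker
```

A plain mean over the same four updates would be pulled far off by the single malicious worker, which is exactly the failure mode the abstract describes.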
no code implementations • ICLR 2018 • Xi Wu, Uyeong Jang, Lingjiao Chen, Somesh Jha
Interestingly, we find that a recent objective by Madry et al. encourages training a model that satisfies our formal version of the goodness property well, but offers only weak control over points that are wrong yet have low confidence.
no code implementations • ICML 2018 • Xi Wu, Uyeong Jang, Jiefeng Chen, Lingjiao Chen, Somesh Jha
In this paper we study leveraging confidence information induced by adversarial training to reinforce adversarial robustness of a given adversarially trained model.
no code implementations • 22 Feb 2017 • Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey F. Naughton, Jignesh M. Patel, Xi Wu
We fill this crucial research gap by proposing a new lossless compression scheme we call tuple-oriented compression (TOC) that is inspired by an unlikely source, the string/text compression scheme Lempel-Ziv-Welch, but tailored to MGD in a way that preserves tuple boundaries within mini-batches.
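For reference, here is the textbook Lempel-Ziv-Welch encoder that the abstract names as inspiration; this is plain LZW, not the TOC scheme itself. LZW emits dictionary codes for the longest known prefix and grows the dictionary as new byte sequences appear.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW encoder: emit the dictionary code for the longest
    known prefix, adding each newly seen sequence to the dictionary."""
    dictionary = {bytes([i]): i for i in range(256)}  # seed with single bytes
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc  # keep extending the current match
        else:
            out.append(dictionary[w])
            dictionary[wc] = len(dictionary)  # register the new sequence
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out

codes = lzw_compress(b"ABABABAB")
print(codes)  # repeated "AB" patterns collapse into reused dictionary codes
```

On the repetitive input above, 8 bytes compress to 5 codes because later "AB"/"ABA" repeats are emitted as single dictionary entries; TOC adapts this dictionary-growing idea to mini-batch gradient descent while preserving tuple boundaries.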