Search Results for author: Cyril Zhang

Found 31 papers, 6 papers with code

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

no code implementations 22 Apr 2024 Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Masahiro Tanaka, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, ZiYi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.

GPT-3.5 · Language Modelling

Can large language models explore in-context?

no code implementations 22 Mar 2024 Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins

We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making.

Decision Making · GPT-3.5 · +1

Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck

no code implementations 7 Sep 2023 Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning.

tabular-classification
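The sparse parity task mentioned above is easy to instantiate. As a rough illustration (not the paper's exact experimental setup; names and sizes below are arbitrary), each n-bit input is labeled by the parity of a hidden size-k subset of its coordinates:

import numpy as np

def make_sparse_parity(num_samples, n=50, k=3, seed=0):
    # Label each n-bit input by the XOR of a hidden subset of k coordinates.
    rng = np.random.default_rng(seed)
    support = rng.choice(n, size=k, replace=False)   # hidden relevant coordinates
    X = rng.integers(0, 2, size=(num_samples, n))    # uniform random bit strings
    y = X[:, support].sum(axis=1) % 2                # parity restricted to the support
    return X, y, support

X, y, support = make_sparse_parity(10_000)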

Learning Hidden Markov Models Using Conditional Samples

no code implementations 28 Feb 2023 Sham M. Kakade, Akshay Krishnamurthy, Gaurav Mahajan, Cyril Zhang

In this paper, we depart from this setup and consider an interactive access model, in which the algorithm can query for samples from the conditional distributions of the HMMs.

Time Series · Time Series Analysis
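As a rough sketch of what such interactive access could look like (an illustration of the access model only, not the paper's algorithm; all parameter names are made up), the oracle below filters an HMM forward along a queried observation prefix and then samples the next observation from the resulting conditional distribution:

import numpy as np

def conditional_sample(pi, T, O, prefix, rng):
    # pi: initial state distribution (S,), T: state transitions (S, S),
    # O: emission probabilities (S, V), prefix: observed symbols so far.
    # Assumes the queried prefix has positive probability under the HMM.
    belief = pi.copy()
    for obs in prefix:                      # forward filtering over hidden states
        belief = belief * O[:, obs]
        belief = T.T @ (belief / belief.sum())
    next_obs_dist = belief @ O              # conditional law of the next observation
    return rng.choice(O.shape[1], p=next_obs_dist)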

Neural Active Learning on Heteroskedastic Distributions

1 code implementation 2 Nov 2022 Savya Khosla, Chew Kin Whye, Jordan T. Ash, Cyril Zhang, Kenji Kawaguchi, Alex Lamb

To this end, we demonstrate the catastrophic failure of these active learning algorithms on heteroskedastic distributions and propose a fine-tuning-based approach to mitigate these failures.

Active Learning

Transformers Learn Shortcuts to Automata

no code implementations 19 Oct 2022 Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Cyril Zhang

Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine.

Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms

no code implementations 1 Sep 2022 Surbhi Goel, Sham Kakade, Adam Tauman Kalai, Cyril Zhang

For example, on parity problems, the NN learns as well as Gaussian elimination, an efficient algorithm that can be succinctly described.
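The Gaussian-elimination baseline referred to above is just linear algebra over GF(2): with enough example/label pairs, solving the linear system recovers which bits the parity depends on. A minimal sketch (not from the paper's code):

import numpy as np

def solve_parity_gf2(X, y):
    # Solve for w such that (X @ w) % 2 == y by Gaussian elimination over GF(2).
    A = np.concatenate([X % 2, (y % 2)[:, None]], axis=1).astype(np.uint8)
    m, n = X.shape
    row = 0
    for col in range(n):
        pivot = next((r for r in range(row, m) if A[r, col]), None)
        if pivot is None:
            continue
        A[[row, pivot]] = A[[pivot, row]]        # move a pivot row into place
        for r in range(m):
            if r != row and A[r, col]:
                A[r] ^= A[row]                   # eliminate this column elsewhere
        row += 1
    w = np.zeros(n, dtype=np.uint8)
    for r in range(row):                         # read off the reduced system
        cols = np.flatnonzero(A[r, :n])
        if cols.size:
            w[cols[0]] = A[r, n]
    return w

On data from a sparse parity generator like the one sketched earlier, the recovered w is supported on the hidden relevant coordinates.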

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

no code implementations 18 Jul 2022 Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

There is mounting evidence of emergent phenomena in the capabilities of deep learning methods as we scale up datasets, model sizes, and training times.

Understanding Contrastive Learning Requires Incorporating Inductive Biases

no code implementations 28 Feb 2022 Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, Akshay Krishnamurthy

Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs.

Contrastive Learning · Self-Supervised Learning
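As background for the kind of objective being analyzed (a generic InfoNCE-style contrastive loss, not anything specific to this paper; names are illustrative), the sketch below scores pairs of views by cosine similarity and treats the matching view in the batch as the positive:

import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    # z1[i] and z2[i] are representations of two augmentations of the same input.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature           # similarity of every view pair
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # pull matching pairs together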

Anti-Concentrated Confidence Bonuses for Scalable Exploration

no code implementations ICLR 2022 Jordan T. Ash, Cyril Zhang, Surbhi Goel, Akshay Krishnamurthy, Sham Kakade

Intrinsic rewards play a central role in handling the exploration-exploitation trade-off when designing sequential decision-making algorithms, in both foundational theory and state-of-the-art deep reinforcement learning.

Decision Making · reinforcement-learning · +1
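For context, a classical intrinsic reward in this line of work is the elliptical bonus from linear bandits, which rewards state-action features that are poorly covered by past data; the sketch below shows that background object, not the paper's scalable approximation:

import numpy as np

class EllipticalBonus:
    # Bonus sqrt(phi^T A^{-1} phi): large for feature directions rarely visited.
    def __init__(self, dim, reg=1.0):
        self.A = reg * np.eye(dim)        # regularized covariance of seen features

    def update(self, phi):
        self.A += np.outer(phi, phi)

    def bonus(self, phi):
        return float(np.sqrt(phi @ np.linalg.solve(self.A, phi)))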

Inductive Biases and Variable Creation in Self-Attention Mechanisms

no code implementations 19 Oct 2021 Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang

Self-attention, an architectural motif designed to model long-range interactions in sequential data, has driven numerous recent breakthroughs in natural language processing and beyond.
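For reference, a single self-attention head of the kind analyzed here mixes each position's value with every other position's, weighted by query-key similarity; a minimal sketch with arbitrary shapes:

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])          # pairwise query-key similarity
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over positions
    return weights @ V                              # attention-weighted mixture of values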

Sparsity in Partially Controllable Linear Systems

no code implementations 12 Oct 2021 Yonathan Efroni, Sham Kakade, Akshay Krishnamurthy, Cyril Zhang

However, in practice, we often encounter systems in which a large set of state variables evolve exogenously and independently of the control inputs; such systems are only partially controllable.
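Concretely, partial controllability of this kind can be pictured (the block notation here is an illustration, not taken from the paper) as a linear system whose state splits into a controllable block x^c and an exogenous block x^e that the input u never influences:

\begin{pmatrix} x^{c}_{t+1} \\ x^{e}_{t+1} \end{pmatrix}
=
\begin{pmatrix} A_{cc} & A_{ce} \\ 0 & A_{ee} \end{pmatrix}
\begin{pmatrix} x^{c}_{t} \\ x^{e}_{t} \end{pmatrix}
+
\begin{pmatrix} B_{c} \\ 0 \end{pmatrix} u_{t} + w_{t}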

Learning Rate Grafting: Transferability of Optimizer Tuning

no code implementations 29 Sep 2021 Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang

In the empirical science of training large neural networks, the learning rate schedule is a notoriously challenging-to-tune hyperparameter, which can depend on all other properties (architecture, optimizer, batch size, dataset, regularization, ...) of the problem.
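The grafting experiment that gives the paper its title can be sketched roughly as follows (my paraphrase of the idea, with made-up names): one optimizer supplies the per-step magnitude of the update and another supplies its direction, isolating how much of an optimizer's behavior is just its implicit learning rate schedule.

import numpy as np

def grafted_step(magnitude_step, direction_step, eps=1e-12):
    # Combine the norm of one optimizer's proposed update with the
    # direction of another's (a rough sketch of learning-rate grafting).
    m = np.linalg.norm(magnitude_step)
    d = np.linalg.norm(direction_step)
    return (m / (d + eps)) * direction_step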

Acceleration via Fractal Learning Rate Schedules

no code implementations 1 Mar 2021 Naman Agarwal, Surbhi Goel, Cyril Zhang

In practical applications of iterative first-order optimization, the learning rate schedule remains notoriously difficult to understand and expensive to tune.

Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking

1 code implementation 19 Feb 2021 Paula Gradu, John Hallman, Daniel Suo, Alex Yu, Naman Agarwal, Udaya Ghai, Karan Singh, Cyril Zhang, Anirudha Majumdar, Elad Hazan

We present an open-source library of natively differentiable physics and robotics environments, accompanied by gradient-based control methods and a benchmarking suite.

Benchmarking · OpenAI Gym

Machine Learning for Mechanical Ventilation Control

2 code implementations 12 Feb 2021 Daniel Suo, Naman Agarwal, Wenhan Xia, Xinyi Chen, Udaya Ghai, Alexander Yu, Paula Gradu, Karan Singh, Cyril Zhang, Edgar Minasyan, Julienne LaChance, Tom Zajdel, Manuel Schottdorf, Daniel Cohen, Elad Hazan

We consider the problem of controlling an invasive mechanical ventilator for pressure-controlled ventilation: a controller must let air in and out of a sedated patient's lungs according to a trajectory of airway pressures specified by a clinician.

BIG-bench Machine Learning
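For context on the tracking problem described above, a textbook PID controller is the kind of classical baseline such a system would be compared against; this is a generic sketch, not the paper's learned controller, and the gains are placeholders:

class PIDController:
    # Track a clinician-specified target pressure by adjusting the control signal.
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def act(self, target_pressure, measured_pressure):
        error = target_pressure - measured_pressure
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative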

Stochastic Optimization with Laggard Data Pipelines

no code implementations NeurIPS 2020 Naman Agarwal, Rohan Anil, Tomer Koren, Kunal Talwar, Cyril Zhang

State-of-the-art optimization is steadily shifting towards massively parallel pipelines with extremely large batch sizes.

Stochastic Optimization

Disentangling Adaptive Gradient Methods from Learning Rates

1 code implementation 26 Feb 2020 Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang

We investigate several confounding factors in the evaluation of optimization algorithms for deep learning.

No-Regret Prediction in Marginally Stable Systems

no code implementations 6 Feb 2020 Udaya Ghai, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang

This requires a refined regret analysis, including a structural lemma showing the current state of the system to be a small linear combination of past states, even if the state grows polynomially.

LEMMA

Revisiting the Generalization of Adaptive Gradient Methods

no code implementations ICLR 2020 Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang

A commonplace belief in the machine learning community is that using adaptive gradient methods hurts generalization.

BIG-bench Machine Learning

Calibration, Entropy Rates, and Memory in Language Models

no code implementations ICML 2020 Mark Braverman, Xinyi Chen, Sham M. Kakade, Karthik Narasimhan, Cyril Zhang, Yi Zhang

Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing.

Robust guarantees for learning an autoregressive filter

no code implementations 23 May 2019 Holden Lee, Cyril Zhang

The optimal predictor for a linear dynamical system (with hidden state and Gaussian noise) takes the form of an autoregressive linear filter, namely the Kalman filter.

Time Series · Time Series Prediction
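The autoregressive viewpoint is easy to make concrete: regress each output on a window of its own past. The least-squares fit below is one natural way to learn such a filter, shown as background with arbitrary names, not necessarily the paper's estimator:

import numpy as np

def fit_ar_filter(y, order):
    # Least-squares fit of y[t] ≈ sum_j a[j] * y[t-1-j], for a 1-D array y.
    rows = [y[t - order:t][::-1] for t in range(order, len(y))]
    X, targets = np.array(rows), y[order:]
    coeffs, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return coeffs

def predict_next(y, coeffs):
    order = len(coeffs)
    return float(coeffs @ y[-order:][::-1])   # one-step-ahead prediction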

Extreme Tensoring for Low-Memory Preconditioning

no code implementations ICLR 2020 Xinyi Chen, Naman Agarwal, Elad Hazan, Cyril Zhang, Yi Zhang

State-of-the-art models are now trained with billions of parameters, reaching hardware limits in terms of memory consumption.

Stochastic Optimization

Efficient Full-Matrix Adaptive Regularization

no code implementations ICLR 2019 Naman Agarwal, Brian Bullins, Xinyi Chen, Elad Hazan, Karan Singh, Cyril Zhang, Yi Zhang

Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive.

Spectral Filtering for General Linear Dynamical Systems

no code implementations NeurIPS 2018 Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang

We give a polynomial-time algorithm for learning latent-state linear dynamical systems without system identification, and without assumptions on the spectral radius of the system's transition matrix.

Towards Provable Control for Unknown Linear Dynamical Systems

no code implementations ICLR 2018 Sanjeev Arora, Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang

We study the control of symmetric linear dynamical systems with unknown dynamics and a hidden state.

Learning Linear Dynamical Systems via Spectral Filtering

1 code implementation NeurIPS 2017 Elad Hazan, Karan Singh, Cyril Zhang

We present an efficient and practical algorithm for the online prediction of discrete-time linear dynamical systems with a symmetric transition matrix.

Time Series · Time Series Analysis
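A rough sketch of the spectral-filtering feature construction, as I understand it from this line of work (constants, shapes, and names here are illustrative): the filters are top eigenvectors of a fixed Hankel matrix, and the features at each step are projections of the recent input history onto those filters.

import numpy as np

def spectral_filters(history_len, num_filters):
    # Fixed Hankel matrix whose top eigenvectors serve as filters
    # (entries as commonly given in the spectral filtering literature).
    idx = np.arange(1, history_len + 1)
    S = idx[:, None] + idx[None, :]
    Z = 2.0 / (S ** 3 - S)
    eigvals, eigvecs = np.linalg.eigh(Z)
    return eigvals[-num_filters:], eigvecs[:, -num_filters:]

def spectral_features(inputs, filters):
    # inputs: (T, input_dim) with T >= history_len; filters: (history_len, k).
    window = inputs[-filters.shape[0]:]
    return filters.T @ window                  # (k, input_dim) filtered features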

Not-So-Random Features

1 code implementation ICLR 2018 Brian Bullins, Cyril Zhang, Yi Zhang

We propose a principled method for kernel learning, which relies on a Fourier-analytic characterization of translation-invariant or rotation-invariant kernels.

Translation
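The Fourier-analytic starting point here is Bochner's theorem: a translation-invariant kernel is an expectation of cosine features with frequencies drawn from its spectral density. Below is the standard random Fourier features construction for a Gaussian kernel, shown as background machinery rather than the paper's learned-kernel method; parameter names are arbitrary:

import numpy as np

def random_fourier_features(X, num_features=256, bandwidth=1.0, seed=0):
    # Approximate an RBF kernel: k(x, y) ≈ phi(x) @ phi(y).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / bandwidth, size=(d, num_features))  # spectral samples
    b = rng.uniform(0.0, 2 * np.pi, size=num_features)             # random phases
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)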

Efficient Regret Minimization in Non-Convex Games

no code implementations ICML 2017 Elad Hazan, Karan Singh, Cyril Zhang

We consider regret minimization in repeated games with non-convex loss functions.
