no code implementations • 14 Feb 2024 • Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou
We show that the success of length generalization is intricately linked to the data format and the type of position encoding.
no code implementations • 12 Oct 2023 • Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal
Finally, in practical scenarios with models of varying sizes, first using distillation to boost the performance of the target model and then applying DistillSpec to train a well-aligned draft model can reduce decoding latency by 6-10x with minimal performance drop, compared to standard decoding without distillation.
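DistillSpec's full pipeline is not reproduced here; purely as an illustration of why a well-aligned draft model reduces latency, the sketch below shows the standard token-level accept/reject rule of speculative sampling: the closer the draft distribution q is to the target distribution p, the more drafted tokens are accepted per target-model call. The function name and NumPy setup are illustrative assumptions, not code from the paper.

```python
import numpy as np

def speculative_accept(p_target, q_draft, drafted_token, rng):
    """Token-level accept/reject step of speculative sampling (illustrative).

    p_target, q_draft: probability vectors over the vocabulary from the large
    target model and the small draft model at the current position.
    drafted_token: token index proposed by the draft model.
    """
    # Accept the drafted token with probability min(1, p/q); a draft model
    # distilled to match the target keeps this ratio near 1 most of the time.
    accept_prob = min(1.0, p_target[drafted_token] / max(q_draft[drafted_token], 1e-12))
    if rng.random() < accept_prob:
        return drafted_token
    # On rejection, resample from the residual distribution max(0, p - q),
    # renormalised, so the overall output still follows the target model exactly.
    residual = np.clip(p_target - q_draft, 0.0, None)
    return rng.choice(len(p_target), p=residual / residual.sum())
```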
1 code implementation • 25 Sep 2023 • Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, Tatsunori Hashimoto
Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks.
no code implementations • 23 Jun 2023 • Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos, Matthieu Geist, Olivier Bachem
Instead of solely relying on a fixed set of output sequences, GKD trains the student on its self-generated output sequences by leveraging feedback from the teacher on such sequences.
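As a rough, toy-scale illustration of that on-policy idea (not the paper's implementation), the sketch below lets a small "student" sample its own token sequences and then minimises a token-level reverse KL against a frozen "teacher" on those same sequences. The tiny models, vocabulary size, and choice of divergence are assumptions made for brevity; GKD itself supports several divergences and mixtures of on- and off-policy data.

```python
import torch
import torch.nn.functional as F

vocab, hidden, seq_len, batch = 100, 32, 16, 8

# Toy stand-ins for the teacher and student; in practice these would be
# pretrained language models sharing a tokenizer.
teacher = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden), torch.nn.Linear(hidden, vocab))
student = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden), torch.nn.Linear(hidden, vocab))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(10):
    # 1) The student generates its own sequences (ancestral sampling from a random prompt token).
    with torch.no_grad():
        tokens = torch.randint(0, vocab, (batch, 1))
        for _ in range(seq_len - 1):
            logits = student(tokens[:, -1])                      # next-token logits, (batch, vocab)
            next_tok = torch.multinomial(F.softmax(logits, dim=-1), 1)
            tokens = torch.cat([tokens, next_tok], dim=1)

    # 2) Both models score the student-generated sequences.
    s_logits = student(tokens)                                   # (batch, seq_len, vocab)
    with torch.no_grad():
        t_logits = teacher(tokens)

    # 3) Distillation loss on the student's own samples: reverse KL(student || teacher).
    loss = F.kl_div(F.log_softmax(t_logits, dim=-1),
                    F.log_softmax(s_logits, dim=-1),
                    log_target=True, reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
```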
1 code implementation • 24 May 2023 • Yongchao Zhou, Hshmat Sahak, Jimmy Ba
In this paper, we present Diffusion Inversion, a simple yet effective method that leverages the pre-trained generative model, Stable Diffusion, to generate diverse, high-quality training data for image classification.
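The inversion procedure itself is not shown here; as a minimal sketch of the broader recipe of drawing classifier training data from a pretrained Stable Diffusion checkpoint, the snippet below generates labelled images with Hugging Face diffusers from plain class-name prompts. The checkpoint name, class list, and sample counts are illustrative assumptions, and prompt-based generation is only a stand-in, not the paper's inversion procedure.

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative setup: checkpoint, classes, and counts are assumptions, not from the paper.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

class_names = ["golden retriever", "tabby cat", "school bus"]
images_per_class = 4

dataset = []  # list of (PIL.Image, class index) pairs for classifier training
for label, name in enumerate(class_names):
    out = pipe(f"a photo of a {name}", num_images_per_prompt=images_per_class)
    dataset.extend((img, label) for img in out.images)
```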
2 code implementations • 3 Nov 2022 • Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba
By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers.
2 code implementations • 1 Jun 2022 • Yongchao Zhou, Ehsan Nezhadarya, Jimmy Ba
Dataset distillation can be formulated as a bi-level meta-learning problem where the outer loop optimizes the meta-dataset and the inner loop trains a model on the distilled data.
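A minimal sketch of that bi-level structure, assuming a toy linear-regression task: the inner loop unrolls a few SGD steps on the learned synthetic examples, and the outer loop differentiates through that unrolling to update the synthetic data against a real-data loss. This is the generic formulation the sentence describes, not the specific algorithm proposed in the paper.

```python
import torch

torch.manual_seed(0)
d, n_real, n_syn, inner_steps, inner_lr = 20, 256, 10, 5, 0.1

x_real = torch.randn(n_real, d)      # "real" data the distilled set should stand in for
y_real = torch.randn(n_real, 1)

# Outer variables: the distilled dataset itself.
x_syn = torch.randn(n_syn, d, requires_grad=True)
y_syn = torch.randn(n_syn, 1, requires_grad=True)
outer_opt = torch.optim.Adam([x_syn, y_syn], lr=1e-2)

for outer_step in range(100):
    # Inner loop: train a fresh model on the distilled data, keeping the
    # computation graph so gradients can flow back to (x_syn, y_syn).
    w = torch.zeros(d, 1, requires_grad=True)
    for _ in range(inner_steps):
        inner_loss = ((x_syn @ w - y_syn) ** 2).mean()
        (grad,) = torch.autograd.grad(inner_loss, w, create_graph=True)
        w = w - inner_lr * grad

    # Outer loop: evaluate the inner-trained model on real data and update the
    # distilled dataset by differentiating through the unrolled training steps.
    outer_loss = ((x_real @ w - y_real) ** 2).mean()
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```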