Search Results for author: Yongchao Zhou

Found 7 papers, 4 papers with code

Transformers Can Achieve Length Generalization But Not Robustly

no code implementations • 14 Feb 2024 • Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou

We show that the success of length generalization is intricately linked to the data format and the type of position encoding.

Tasks: Position
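The "data format" the abstract refers to is concrete and easy to illustrate. As a hedged sketch (not the authors' released code), the helper below renders addition examples in the reversed-digit, index-hinted style studied in this line of work, so that digits of the same significance share an alignment cue; `format_addition` and the hint scheme are illustrative assumptions.

```python
# Illustrative data format for length generalization on addition (assumed
# scheme, not the paper's exact tokenizer): digits are written least-
# significant first, and each digit carries an index hint so the model can
# align operand positions across lengths.
def format_addition(a: int, b: int) -> str:
    """Render a+b with reversed digits and per-digit index hints."""
    hints = "abcdefghijklmnopqrstuvwxyz"
    rev = lambda n: str(n)[::-1]                       # least-significant digit first
    tag = lambda s: "".join(h + d for h, d in zip(hints, s))
    return f"{tag(rev(str(a)))}+{tag(rev(str(b)))}={tag(rev(str(a + b)))}"

print(format_addition(523, 87))  # -> a3b2c5+a7b8=a0b1c6  (523+87=610, reversed)
```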

DistillSpec: Improving Speculative Decoding via Knowledge Distillation

no code implementations • 12 Oct 2023 • Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal

Finally, in practical scenarios with models of varying sizes, first using distillation to boost the performance of the target model and then applying DistillSpec to train a well-aligned draft model can reduce decoding latency by 6-10x with minimal performance drop, compared to standard decoding without distillation.

Tasks: Knowledge Distillation, Language Modelling, +1 more
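For context, the speculative decoding loop that DistillSpec improves works as sketched below; this is a minimal paraphrase of the standard accept/reject scheme, assuming `draft` and `target` return next-token probability vectors. DistillSpec's contribution is distilling `draft` so its proposals are accepted more often.

```python
import torch

# Minimal sketch of one speculative-decoding step (standard algorithm, not
# DistillSpec-specific code). `draft` and `target` are assumed interfaces
# mapping a token list to a probability vector over the vocabulary.
def speculative_step(draft, target, prefix, k=4):
    proposed, q_probs, ctx = [], [], list(prefix)
    for _ in range(k):                          # cheap draft proposes k tokens
        q = draft(ctx)
        t = torch.multinomial(q, 1).item()
        proposed.append(t); q_probs.append(q); ctx.append(t)
    accepted = []
    for i, t in enumerate(proposed):            # target verifies (in practice all
        p = target(prefix + accepted)           # k positions in one forward pass)
        if torch.rand(()) < min(1.0, (p[t] / q_probs[i][t]).item()):
            accepted.append(t)                  # accept the draft token
        else:                                   # reject: resample from residual
            resid = torch.clamp(p - q_probs[i], min=0)
            accepted.append(torch.multinomial(resid / resid.sum(), 1).item())
            break                               # (bonus token on full acceptance
    return prefix + accepted                    #  omitted for brevity)
```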

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

1 code implementation • 25 Sep 2023 • Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, Tatsunori Hashimoto

Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks.

Tasks: Language Modelling, valid
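The overall loop is straightforward to sketch: an LM plays the tools instead of executing them, and a second LM call scores the trace for risk. The snippet below is a schematic paraphrase of that idea, not the released implementation; `llm` is a hypothetical text-completion function and the prompts are placeholders.

```python
# Hedged sketch of the LM-emulated sandbox idea: tool outputs are emulated
# by an LM rather than executed, and an LM-based evaluator scores the trace.
# `llm` is a hypothetical prompt -> completion function.
def run_episode(llm, agent_prompt, user_instruction, max_steps=8):
    trace = [f"User: {user_instruction}"]
    for _ in range(max_steps):
        action = llm(agent_prompt + "\n".join(trace))      # agent picks a tool call
        if action.startswith("FINAL:"):
            break
        observation = llm(                                  # LM *emulates* the tool
            "You are emulating a tool. Return a plausible output for:\n" + action)
        trace += [f"Action: {action}", f"Observation: {observation}"]
    verdict = llm(                                          # LM-based safety evaluator
        "Identify any risky agent behavior in this trace and rate its "
        "severity from 0 to 3:\n" + "\n".join(trace))
    return trace, verdict
```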

On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes

no code implementations • 23 Jun 2023 • Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos, Matthieu Geist, Olivier Bachem

Instead of solely relying on a fixed set of output sequences, GKD trains the student on its self-generated output sequences by leveraging feedback from the teacher on such sequences.

Tasks: Arithmetic Reasoning, Knowledge Distillation, +1 more
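The on-policy recipe can be summarized in a few lines. Below is a hedged paraphrase of a GKD-style training step, assuming `student.generate` samples sequences and both models return per-token logits over those sequences; the generalized Jensen-Shannon objective with mixing weight `beta` follows the paper's formulation.

```python
import torch
import torch.nn.functional as F

# Sketch of an on-policy GKD step (paraphrase, not the paper's code): sample
# from the *student*, then match the teacher's per-token distributions on
# those self-generated sequences via a generalized Jensen-Shannon divergence.
def gkd_step(student, teacher, prompts, beta=0.5):
    with torch.no_grad():
        samples = student.generate(prompts)      # student's self-generated outputs
        t_logits = teacher(samples)              # teacher feedback on those tokens
    s_logits = student(samples)
    p, q = F.softmax(t_logits, -1), F.softmax(s_logits, -1)
    m = beta * p + (1 - beta) * q                # mixture distribution
    # JSD(beta) = beta * KL(p || m) + (1 - beta) * KL(q || m)
    loss = beta * F.kl_div(m.log(), p, reduction="batchmean") \
         + (1 - beta) * F.kl_div(m.log(), q, reduction="batchmean")
    return loss
```

Setting `beta` near 1 recovers a forward-KL-like objective, while small `beta` behaves like reverse KL on the student's own samples, which is what mitigates the train-inference distribution mismatch the abstract describes.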

Training on Thin Air: Improve Image Classification with Generated Data

1 code implementation • 24 May 2023 • Yongchao Zhou, Hshmat Sahak, Jimmy Ba

In this paper, we present Diffusion Inversion, a simple yet effective method that leverages the pre-trained generative model, Stable Diffusion, to generate diverse, high-quality training data for image classification.

Tasks: Data Augmentation, Few-Shot Learning, +2 more
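The invert-then-perturb recipe is simple to outline. The sketch below is schematic, not the released code: `decode` stands in for a frozen pretrained generator (Stable Diffusion in the paper) mapping a conditioning embedding to an image, and the embedding dimension and hyperparameters are illustrative assumptions.

```python
import torch

# Schematic of embedding inversion for data generation (assumed interface:
# `decode` is a frozen generator, embedding -> image; the paper uses Stable
# Diffusion, stubbed here so the sketch stays self-contained).
def invert_image(decode, image, dim=768, steps=500, lr=0.05):
    emb = torch.randn(1, dim, requires_grad=True)
    opt = torch.optim.Adam([emb], lr=lr)
    for _ in range(steps):                       # fit emb so decode(emb) ~ image
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(decode(emb), image)
        loss.backward()
        opt.step()
    return emb.detach()

def generate_variants(decode, emb, n=8, noise=0.1):
    # Small perturbations of the inverted embedding yield diverse but
    # class-consistent synthetic training images.
    return [decode(emb + noise * torch.randn_like(emb)) for _ in range(n)]
```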

Large Language Models Are Human-Level Prompt Engineers

2 code implementations • 3 Nov 2022 • Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba

By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers.

Tasks: Few-Shot Learning, In-Context Learning, +3 more
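The propose-and-select loop behind automatic prompt engineering fits in a few lines. Below is a hedged sketch of that loop; `llm` is a hypothetical prompt -> completion function, and the exact-match scorer is a simplifying assumption (the paper also considers log-probability scoring and iterative resampling).

```python
# Sketch of an APE-style loop: an LLM proposes candidate instructions from
# demonstrations, each candidate is scored on held-out pairs, and the best
# one is kept. `llm` is a hypothetical completion function.
def ape(llm, demos, eval_set, n_candidates=20):
    demo_text = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    candidates = [
        llm("I gave a friend an instruction. Based on the instruction they "
            "produced the following input-output pairs:\n"
            f"{demo_text}\nThe instruction was:")
        for _ in range(n_candidates)              # LLM proposes instructions
    ]
    def score(instr):                             # exact-match accuracy on eval set
        return sum(llm(f"{instr}\nInput: {x}\nOutput:").strip() == y
                   for x, y in eval_set) / len(eval_set)
    return max(candidates, key=score)             # select the best prompt
```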

Dataset Distillation using Neural Feature Regression

2 code implementations • 1 Jun 2022 • Yongchao Zhou, Ehsan Nezhadarya, Jimmy Ba

Dataset distillation can be formulated as a bi-level meta-learning problem where the outer loop optimizes the meta-dataset and the inner loop trains a model on the distilled data.

Tasks: Continual Learning, Image Classification, +2 more
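The key trick is replacing the unrolled inner loop with a closed-form solve. The sketch below paraphrases the feature-regression idea: the inner "training" is ridge regression on the network's features of the distilled data, so gradients flow back to the distilled images and labels without unrolling SGD. `feat` and the MSE outer loss are simplifying assumptions.

```python
import torch

# Sketch of a feature-regression outer loss (FRePo-style paraphrase, not the
# released code): solve the inner problem in closed form via kernel ridge
# regression on features, then backprop into the distilled data.
def feature_regression_loss(feat, x_syn, y_syn, x_real, y_real, lam=1e-3):
    phi_s = feat(x_syn)                           # features of distilled images
    phi_r = feat(x_real)                          # features of a real batch
    k_ss = phi_s @ phi_s.T                        # kernel on the synthetic set
    k_rs = phi_r @ phi_s.T                        # real-vs-synthetic kernel
    eye = torch.eye(k_ss.shape[0], device=k_ss.device)
    # Closed-form ridge predictions on real data; y_syn/y_real are one-hot
    # (or soft) label matrices.
    preds = k_rs @ torch.linalg.solve(k_ss + lam * eye, y_syn)
    return torch.nn.functional.mse_loss(preds, y_real)
```

Because the inner solve is exact and differentiable, the outer loop can update both `x_syn` and `y_syn` directly, which is what makes the bi-level problem tractable at scale.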
