1 code implementation • 30 Nov 2023 • Marwa Abdulhai, Isadora White, Charlie Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine
Developing such algorithms requires tasks that can gauge progress on algorithm design, provide accessible and reproducible evaluations for multi-turn interactions, and cover a range of task properties and challenges relevant to improving reinforcement learning algorithms.
1 code implementation • 25 May 2023 • Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song
This approach aims to cheaply imitate the proprietary model's capabilities using a weaker open-source model.
no code implementations • 30 Sep 2022 • Charlie Snell, Dan Klein, Ruiqi Zhong
We show that context distillation is a general method to train language models, and it can effectively internalize three types of training signals.
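As a rough illustration of the idea (not the paper's actual setup), context distillation trains a student model, which never sees the extra context, to match the output distribution of a teacher that does. The toy linear "models", feature shapes, and hyperparameters below are all placeholder assumptions:

```python
import torch
import torch.nn.functional as F

VOCAB = 16

# Toy stand-ins for language models: the "teacher" sees the context
# concatenated with the input; the "student" sees the input alone.
teacher = torch.nn.Linear(8, VOCAB)
student = torch.nn.Linear(4, VOCAB)

opt = torch.optim.Adam(student.parameters(), lr=1e-2)

for step in range(200):
    ctx_and_input = torch.randn(32, 8)   # [context | input] features
    input_only = ctx_and_input[:, 4:]    # same input, context stripped

    with torch.no_grad():                # teacher is frozen
        target = F.softmax(teacher(ctx_and_input), dim=-1)

    log_probs = F.log_softmax(student(input_only), dim=-1)
    # Distill: push the student (no context) toward the teacher (with context).
    loss = F.kl_div(log_probs, target, reduction="batchmean")

    opt.zero_grad()
    loss.backward()
    opt.step()
```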
1 code implementation • 5 Jun 2022 • Charlie Snell, Ilya Kostrikov, Yi Su, Mengjiao Yang, Sergey Levine
Large language models distill broad knowledge from text corpora.
1 code implementation • 25 May 2022 • Ruiqi Zhong, Charlie Snell, Dan Klein, Jason Eisner
We introduce APEL, a framework in which non-programmers select among candidate programs generated by a seed semantic parser (e.g., Codex).
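To illustrate the selection step loosely (the function name and toy candidates below are hypothetical, not APEL's actual interface): given candidate programs and an example input on which they disagree, the non-programmer's choice of the correct output narrows the candidate set.

```python
from typing import Any, Callable, List

def narrow_candidates(
    candidates: List[Callable[[Any], Any]],
    example_input: Any,
    chosen_output: Any,
) -> List[Callable[[Any], Any]]:
    # Keep only the programs whose output on the example input matches
    # the output the non-programmer selected.
    return [p for p in candidates if p(example_input) == chosen_output]

# Hypothetical candidates a parser might propose for "double the number":
candidates = [lambda x: x * 2, lambda x: x + 2, lambda x: x ** 2]

# On input 3 the candidates disagree (6, 5, 9); the user picks 6.
survivors = narrow_candidates(candidates, example_input=3, chosen_output=6)
assert len(survivors) == 1 and survivors[0](10) == 20
```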
no code implementations • Findings (NAACL) 2022 • Charlie Snell, Mengjiao Yang, Justin Fu, Yi Su, Sergey Levine
Goal-oriented dialogue systems face a trade-off between fluent language generation and task-specific control.
1 code implementation • 28 Jan 2022 • Ruiqi Zhong, Charlie Snell, Dan Klein, Jacob Steinhardt
We then re-rank the descriptions by checking how often they hold on a larger set of samples with a learned verifier.
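A minimal sketch of that re-ranking step, assuming a `verifier(description, sample) -> bool` stand-in for the learned verifier (the names and the trivial keyword verifier are illustrative, not the paper's code):

```python
from typing import Callable, List, Tuple

def rerank(
    descriptions: List[str],
    samples: List[str],
    verifier: Callable[[str, str], bool],
) -> List[Tuple[str, float]]:
    # Score each candidate description by the fraction of samples on
    # which the verifier judges it to hold, then sort best-first.
    scored = [
        (d, sum(verifier(d, s) for s in samples) / len(samples))
        for d in descriptions
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Trivial placeholder verifier: the description "holds" if its last
# keyword appears in the sample (a learned model would go here).
toy_verifier = lambda desc, sample: desc.split()[-1] in sample
ranked = rerank(["mentions cats", "mentions dogs"],
                ["cats purr", "cats nap", "dogs bark"],
                toy_verifier)
```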
1 code implementation • 13 Mar 2021 • Charlie Snell, Ruiqi Zhong, Dan Klein, Jacob Steinhardt
Our approximation explains why models sometimes attend to salient words, and it inspires a toy example in which a multi-head attention model can overcome such a hard training distribution by improving learning dynamics rather than expressiveness.
no code implementations • 16 Aug 2020 • Charlie Snell, Ruiqi Zhong, Jacob Steinhardt, Dan Klein
If we ablate attention by fixing it to uniform, the output relevance still correlates with the attention of a normally trained model; but if we instead ablate output relevance, attention cannot be learned.
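The uniform-attention ablation can be sketched as replacing the learned attention weights with a constant 1/T over the T key positions; the tensor shapes below are arbitrary placeholders, not the paper's configuration:

```python
import torch

def uniform_attention(q, k, v):
    # Ablation: discard query-key similarity entirely and fix the
    # attention weights to a uniform 1/T over the T key positions.
    T = k.shape[-2]
    weights = torch.full(q.shape[:-1] + (T,), 1.0 / T)
    return weights @ v  # uniform average of the value vectors

q = torch.randn(2, 5, 8)   # (batch, query positions, model dim)
k = torch.randn(2, 7, 8)   # (batch, key positions, model dim)
v = torch.randn(2, 7, 8)
out = uniform_attention(q, k, v)   # shape (2, 5, 8)
```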