Search Results for author: Charlie Snell

Found 9 papers, 6 papers with code

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

1 code implementation • 30 Nov 2023 • Marwa Abdulhai, Isadora White, Charlie Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine

Developing such algorithms requires tasks that can gauge progress on algorithm design, provide accessible and reproducible evaluations for multi-turn interactions, and cover a range of task properties and challenges in improving reinforcement learning algorithms.

reinforcement-learning • Text Generation

The False Promise of Imitating Proprietary LLMs

1 code implementation • 25 May 2023 • Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song

This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model.

Language Modelling

Learning by Distilling Context

no code implementations • 30 Sep 2022 • Charlie Snell, Dan Klein, Ruiqi Zhong

We show that context distillation is a general method for training language models, and that it can effectively internalize three types of training signals.

Language Modelling • Text-To-SQL

Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL

1 code implementation • 25 May 2022 • Ruiqi Zhong, Charlie Snell, Dan Klein, Jason Eisner

We introduce APEL, a framework in which non-programmers select among candidate programs generated by a seed semantic parser (e.g., Codex).

Bayesian Inference • Text-To-SQL

Describing Differences between Text Distributions with Natural Language

1 code implementation • 28 Jan 2022 • Ruiqi Zhong, Charlie Snell, Dan Klein, Jacob Steinhardt

We then re-rank the descriptions by checking how often they hold on a larger set of samples with a learned verifier.

Binary Classification • Re-Ranking

Approximating How Single Head Attention Learns

1 code implementation • 13 Mar 2021 • Charlie Snell, Ruiqi Zhong, Dan Klein, Jacob Steinhardt

Our approximation explains why models sometimes attend to salient words, and inspires a toy example where a multi-head attention model can overcome the above hard training distribution by improving learning dynamics rather than expressiveness.

Understanding Attention Training via Output Relevance

no code implementations • 16 Aug 2020 • Charlie Snell, Ruiqi Zhong, Jacob Steinhardt, Dan Klein

If we ablate attention by fixing it to uniform, the output relevance still correlates with the attention of a normally trained model; but if we instead ablate output relevance, attention cannot be learned.

Translation
