Search Results for author: Hugh Zhang

Found 6 papers, 3 papers with code

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

1 code implementation • 22 Feb 2024 • Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candidate completions.

Tasks: Code Generation, Language Modelling
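
Below is a minimal sketch of the linear-reweighting idea from the abstract above, assuming the probe is a plain weight vector applied to candidate-completion embeddings and that candidates are reweighted with a softmax over probe scores; the names `probe_scores`, `reweight`, and the temperature `beta` are illustrative, not the paper's interface.

```python
# Sketch of a Q-Probe-style linear reweighting over candidate completions.
# Assumptions (not from the paper): embeddings are model hidden states, the
# probe is a plain weight vector w, and candidates are reweighted with a
# softmax over the probe scores.
import numpy as np

def probe_scores(embeddings: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Linear probe: one scalar score per candidate embedding (shape [k, d])."""
    return embeddings @ w

def reweight(scores: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Softmax over candidate scores; lower beta favors high scores more sharply."""
    z = (scores - scores.max()) / beta
    p = np.exp(z)
    return p / p.sum()

# Usage: k candidate completions, each represented by a d-dimensional embedding.
rng = np.random.default_rng(0)
k, d = 8, 64
embeddings = rng.normal(size=(k, d))   # stand-in for model hidden states
w = rng.normal(size=d)                 # stand-in for a trained probe
weights = reweight(probe_scores(embeddings, w), beta=0.5)
chosen = int(np.argmax(weights))       # or sample: rng.choice(k, p=weights)
print(chosen, weights.round(3))
```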

Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization

no code implementations • 19 Feb 2024 • Luca D'Amico-Wong, Hugh Zhang, Marc Lanctot, David C. Parkes

We propose ABCs (Adaptive Branching through Child stationarity), a best-of-both-worlds algorithm combining Boltzmann Q-learning (BQL), a classic reinforcement learning algorithm for single-agent domains, and counterfactual regret minimization (CFR), a central algorithm for learning in multi-agent domains.

Tasks: counterfactual, OpenAI Gym, +1
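
The listing does not describe how ABCs decides between its two components, so the sketch below covers only the Boltzmann Q-learning half in a tabular, single-agent setting: actions are sampled from a softmax over Q-values rather than epsilon-greedily. It assumes a Gymnasium-style environment with discrete states and actions; all hyperparameters are illustrative, and the CFR side and the child-stationarity branching test are omitted.

```python
# Sketch of tabular Boltzmann Q-learning (the BQL component of ABCs).
# The CFR component and the adaptive branching test are not shown, since the
# listing above does not describe them. `env` is assumed to follow the
# Gymnasium-style Gym API with Discrete observation and action spaces.
import numpy as np

def boltzmann_policy(q_row: np.ndarray, temperature: float) -> np.ndarray:
    """Softmax over one state's Q-values; higher temperature explores more."""
    z = (q_row - q_row.max()) / temperature
    p = np.exp(z)
    return p / p.sum()

def boltzmann_q_learning(env, episodes=500, alpha=0.1, gamma=0.99,
                         temperature=1.0, seed=0):
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            probs = boltzmann_policy(Q[state], temperature)
            action = int(rng.choice(len(probs), p=probs))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Standard Q-learning update with a Boltzmann behavior policy.
            target = reward + gamma * (0.0 if terminated else Q[next_state].max())
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q

# Example usage (assumed environment): boltzmann_q_learning(gymnasium.make("FrozenLake-v1"))
```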

Chain-of-Thought Reasoning is a Policy Improvement Operator

no code implementations • 15 Sep 2023 • Hugh Zhang, David C. Parkes

We introduce SECToR (Self-Education via Chain-of-Thought Reasoning), a proof-of-concept demonstration that language models can teach themselves new skills using chain-of-thought reasoning.

Tasks: Self-Learning
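
The abstract above does not spell out the SECToR procedure; as a rough, hypothetical illustration of self-education with chain-of-thought, the sketch below samples several answers per problem, keeps only majority-consistent ones, and collects them as fine-tuning targets. Every helper and the toy model are placeholders, not the paper's method.

```python
# Hypothetical sketch of one self-education round: sample chain-of-thought
# answers, keep majority-consistent ones as labels, and collect them as
# fine-tuning data. The helpers below are toy stubs, not the paper's
# actual procedure or interface.
import random
from collections import Counter

def generate_with_cot(model, problem, rng):
    """Stub for sampling a chain-of-thought answer from a language model."""
    return model(problem) + rng.choice([0, 0, 0, 1])  # occasional wrong sample

def self_education_round(model, problems, n_samples=5, seed=0):
    rng = random.Random(seed)
    dataset = []
    for problem in problems:
        answers = [generate_with_cot(model, problem, rng) for _ in range(n_samples)]
        answer, count = Counter(answers).most_common(1)[0]
        if count > n_samples // 2:             # simple self-consistency filter
            dataset.append((problem, answer))  # (input, target) pair
    return dataset                             # would be fed to a fine-tuning step

# Toy usage: "model" is a stand-in callable; real use would call an LM API.
toy_model = lambda problem: sum(problem)
print(self_education_round(toy_model, [(1, 2), (3, 4), (10, 20)]))
```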

Trading Off Diversity and Quality in Natural Language Generation

no code implementations • EACL (HumEval) 2021 • Hugh Zhang, Daniel Duckworth, Daphne Ippolito, Arvind Neelakantan

For open-ended language generation tasks such as storytelling and dialogue, choosing the right decoding algorithm is critical to controlling the tradeoff between generation quality and diversity.

Tasks: Text Generation
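
As a concrete example of the decoding choices the abstract refers to, the sketch below implements two standard knobs, temperature scaling and nucleus (top-p) truncation, where lower settings favor quality and higher settings favor diversity; it is a generic illustration rather than the paper's specific decoding algorithms, and the logits are made up.

```python
# Two common decoding knobs that trade off quality vs. diversity:
# temperature scaling and nucleus (top-p) truncation of the next-token
# distribution. The logits below are invented for illustration.
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0,
                      top_p: float = 1.0, rng=None) -> int:
    rng = rng or np.random.default_rng()
    # Temperature: <1 sharpens the distribution (quality), >1 flattens it (diversity).
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Nucleus sampling: keep the smallest set of tokens whose mass reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    trimmed = np.zeros_like(probs)
    trimmed[keep] = probs[keep]
    trimmed /= trimmed.sum()
    return int(rng.choice(len(probs), p=trimmed))

logits = np.array([2.0, 1.5, 0.3, -1.0, -2.5])   # toy next-token scores
print(sample_next_token(logits, temperature=0.7, top_p=0.9,
                        rng=np.random.default_rng(0)))
```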
