Search Results for author: Hugh Zhang

Found 6 papers, 3 papers with code

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

1 code implementation • 22 Feb 2024 • Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candidate completions.

Tasks: Code Generation, Language Modelling
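
Below is a minimal sketch of the linear-reweighting idea from the abstract above, assuming the probe is a plain weight vector applied to candidate-completion embeddings and that candidates are reweighted with a softmax over probe scores; the names `probe_scores`, `reweight`, and the temperature `beta` are illustrative, not the paper's interface.

```python
# Sketch of a Q-Probe-style linear reweighting over candidate completions.
# Assumptions (not from the paper): embeddings are model hidden states, the
# probe is a plain weight vector w, and candidates are reweighted with a
# softmax over the probe scores.
import numpy as np

def probe_scores(embeddings: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Linear probe: one scalar score per candidate embedding (shape [k, d])."""
    return embeddings @ w

def reweight(scores: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Softmax over candidate scores; lower beta favors high scores more sharply."""
    z = (scores - scores.max()) / beta
    p = np.exp(z)
    return p / p.sum()

# Usage: k candidate completions, each represented by a d-dimensional embedding.
rng = np.random.default_rng(0)
k, d = 8, 64
embeddings = rng.normal(size=(k, d))   # stand-in for model hidden states
w = rng.normal(size=d)                 # stand-in for a trained probe
weights = reweight(probe_scores(embeddings, w), beta=0.5)
chosen = int(np.argmax(weights))       # or sample: rng.choice(k, p=weights)
print(chosen, weights.round(3))
```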

Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization

no code implementations • 19 Feb 2024 • Luca D'Amico-Wong, Hugh Zhang, Marc Lanctot, David C. Parkes

We propose ABCs (Adaptive Branching through Child stationarity), a best-of-both-worlds algorithm combining Boltzmann Q-learning (BQL), a classic reinforcement learning algorithm for single-agent domains, and counterfactual regret minimization (CFR), a central algorithm for learning in multi-agent domains.

Tasks: counterfactual, OpenAI Gym, +1
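
The listing does not describe how ABCs decides between its two components, so the sketch below covers only the Boltzmann Q-learning half in a tabular, single-agent setting: actions are sampled from a softmax over Q-values rather than epsilon-greedily. It assumes a Gymnasium-style environment with discrete states and actions; all hyperparameters are illustrative, and the CFR side and the child-stationarity branching test are omitted.

```python
# Sketch of tabular Boltzmann Q-learning (the BQL component of ABCs).
# The CFR component and the adaptive branching test are not shown, since the
# listing above does not describe them. `env` is assumed to follow the
# Gymnasium-style Gym API with Discrete observation and action spaces.
import numpy as np

def boltzmann_policy(q_row: np.ndarray, temperature: float) -> np.ndarray:
    """Softmax over one state's Q-values; higher temperature explores more."""
    z = (q_row - q_row.max()) / temperature
    p = np.exp(z)
    return p / p.sum()

def boltzmann_q_learning(env, episodes=500, alpha=0.1, gamma=0.99,
                         temperature=1.0, seed=0):
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            probs = boltzmann_policy(Q[state], temperature)
            action = int(rng.choice(len(probs), p=probs))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Standard Q-learning update with a Boltzmann behavior policy.
            target = reward + gamma * (0.0 if terminated else Q[next_state].max())
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q

# Example usage (assumed environment): boltzmann_q_learning(gymnasium.make("FrozenLake-v1"))
```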

Chain-of-Thought Reasoning is a Policy Improvement Operator

no code implementations • 15 Sep 2023 • Hugh Zhang, David C. Parkes

We introduce SECToR (Self-Education via Chain-of-Thought Reasoning), a proof-of-concept demonstration that language models can teach themselves new skills using chain-of-thought reasoning.

Tasks: Self-Learning
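
The abstract above does not spell out the SECToR procedure; as a rough, hypothetical illustration of self-education with chain-of-thought, the sketch below samples several answers per problem, keeps only majority-consistent ones, and collects them as fine-tuning targets. Every helper and the toy model are placeholders, not the paper's method.

```python
# Hypothetical sketch of one self-education round: sample chain-of-thought
# answers, keep majority-consistent ones as labels, and collect them as
# fine-tuning data. The helpers below are toy stubs, not the paper's
# actual procedure or interface.
import random
from collections import Counter

def generate_with_cot(model, problem, rng):
    """Stub for sampling a chain-of-thought answer from a language model."""
    return model(problem) + rng.choice([0, 0, 0, 1])  # occasional wrong sample

def self_education_round(model, problems, n_samples=5, seed=0):
    rng = random.Random(seed)
    dataset = []
    for problem in problems:
        answers = [generate_with_cot(model, problem, rng) for _ in range(n_samples)]
        answer, count = Counter(answers).most_common(1)[0]
        if count > n_samples // 2:             # simple self-consistency filter
            dataset.append((problem, answer))  # (input, target) pair
    return dataset                             # would be fed to a fine-tuning step

# Toy usage: "model" is a stand-in callable; real use would call an LM API.
toy_model = lambda problem: sum(problem)
print(self_education_round(toy_model, [(1, 2), (3, 4), (10, 20)]))
```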

Trading Off Diversity and Quality in Natural Language Generation

no code implementations • EACL (HumEval) 2021 • Hugh Zhang, Daniel Duckworth, Daphne Ippolito, Arvind Neelakantan

For open-ended language generation tasks such as storytelling and dialogue, choosing the right decoding algorithm is critical to controlling the tradeoff between generation quality and diversity.

Tasks: Text Generation
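
As a concrete example of the decoding choices the abstract refers to, the sketch below implements two standard knobs, temperature scaling and nucleus (top-p) truncation, where lower settings favor quality and higher settings favor diversity; it is a generic illustration rather than the paper's specific decoding algorithms, and the logits are made up.

```python
# Two common decoding knobs that trade off quality vs. diversity:
# temperature scaling and nucleus (top-p) truncation of the next-token
# distribution. The logits below are invented for illustration.
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0,
                      top_p: float = 1.0, rng=None) -> int:
    rng = rng or np.random.default_rng()
    # Temperature: <1 sharpens the distribution (quality), >1 flattens it (diversity).
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Nucleus sampling: keep the smallest set of tokens whose mass reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    trimmed = np.zeros_like(probs)
    trimmed[keep] = probs[keep]
    trimmed /= trimmed.sum()
    return int(rng.choice(len(probs), p=trimmed))

logits = np.array([2.0, 1.5, 0.3, -1.0, -2.5])   # toy next-token scores
print(sample_next_token(logits, temperature=0.7, top_p=0.9,
                        rng=np.random.default_rng(0)))
```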
