1 code implementation • 22 Feb 2024 • Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener
The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candidate completions.
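The reweighting step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `reweight_candidates`, the probe weights `w`, and the inverse temperature `beta` are all hypothetical names, and the linear probe here is random rather than learned.

```python
import numpy as np

def reweight_candidates(embeddings, w, beta=1.0):
    """Score candidate completions with a linear probe and softmax-reweight.

    embeddings: (k, d) array, one embedding per candidate completion
    w: (d,) linear probe weights (learned in practice; random in this toy)
    beta: inverse temperature controlling how sharply scores reweight
    """
    scores = embeddings @ w                # linear value estimate per candidate
    z = beta * (scores - scores.max())     # subtract max for numerical stability
    probs = np.exp(z)
    return probs / probs.sum()

# Toy usage: 3 candidate completions with 4-dim embeddings, a random probe.
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 4))
w = rng.normal(size=4)
p = reweight_candidates(emb, w, beta=2.0)  # a distribution over the candidates
```

The output is a probability distribution over candidates; sampling from it (or taking its argmax) selects the completion the probe scores highest.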
no code implementations • 19 Feb 2024 • Luca D'Amico-Wong, Hugh Zhang, Marc Lanctot, David C. Parkes
We propose ABCs (Adaptive Branching through Child stationarity), a best-of-both-worlds algorithm combining Boltzmann Q-learning (BQL), a classic reinforcement learning algorithm for single-agent domains, and counterfactual regret minimization (CFR), a central algorithm for learning in multi-agent domains.
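The single-agent half of the combination, Boltzmann Q-learning, can be sketched in a few lines: a softmax policy over Q-values plus the standard tabular Q-learning update. This is a generic sketch of BQL itself, not of ABCs; the function names are illustrative.

```python
import numpy as np

def boltzmann_policy(q_values, temperature=1.0):
    """Softmax (Boltzmann) distribution over actions given their Q-values."""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def q_update(q, action, reward, q_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped target."""
    target = reward + gamma * np.max(q_next)
    q = q.copy()
    q[action] += alpha * (target - q[action])
    return q

# Higher Q-values get exponentially more probability mass.
p = boltzmann_policy([1.0, 2.0], temperature=1.0)
# A reward of 1.0 nudges the chosen action's value upward by alpha * TD-error.
q = q_update(np.zeros(2), action=0, reward=1.0, q_next=np.zeros(2))
```

Lowering the temperature makes the policy greedier; ABCs' contribution is deciding, per state, when this single-agent update suffices and when CFR-style branching is needed.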
no code implementations • 15 Sep 2023 • Hugh Zhang, David C. Parkes
We introduce SECToR (Self-Education via Chain-of-Thought Reasoning), a proof-of-concept demonstration that language models can teach themselves new skills using chain-of-thought reasoning.
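The self-education loop can be caricatured with a toy pipeline: sample several chain-of-thought solutions per problem, keep the majority answer, and treat the resulting (problem, answer) pairs as new training data. Everything here is a stand-in under stated assumptions — `chain_of_thought_add` fakes a noisy CoT solver on addition, and no actual model is fine-tuned.

```python
import random
from collections import Counter

def chain_of_thought_add(a, b, rng, noise=0.1):
    """Toy stand-in for sampling one CoT solution: usually right, sometimes off by one."""
    answer = a + b
    if rng.random() < noise:
        answer += rng.choice([-1, 1])
    return answer

def self_consistency(a, b, rng, n_samples=15):
    """Sample several CoT solutions and keep the majority-vote answer."""
    votes = Counter(chain_of_thought_add(a, b, rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

def self_training_round(problems, rng):
    """Label new problems with majority-voted answers; in the real loop these
    (problem, answer) pairs would be used to fine-tune the model to answer directly."""
    return [(a, b, self_consistency(a, b, rng)) for a, b in problems]

rng = random.Random(0)
data = self_training_round([(12, 34), (99, 1)], rng)
```

Majority voting filters out most of the per-sample noise, which is what lets the self-generated labels be clean enough to train on.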
1 code implementation • Science 2022 • Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, Athul Paul Jacob, Mojtaba Komeili, Karthik Konath, Minae Kwon, Adam Lerer, Mike Lewis, Alexander H. Miller, Sasha Mitts, Aditya Renduchintala, Stephen Roller, Dirk Rowe, Weiyan Shi, Joe Spisak, Alexander Wei, David Wu, Hugh Zhang, Markus Zijlstra
Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge.
no code implementations • EACL (HumEval) 2021 • Hugh Zhang, Daniel Duckworth, Daphne Ippolito, Arvind Neelakantan
For open-ended language generation tasks such as storytelling and dialogue, choosing the right decoding algorithm is critical to controlling the tradeoff between generation quality and diversity.
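One common point on that quality–diversity tradeoff is nucleus (top-p) sampling: truncate the distribution to the smallest set of tokens whose cumulative probability exceeds p, then renormalize. A minimal sketch (not tied to any particular library's implementation):

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Nucleus (top-p) filtering: keep the smallest high-probability set of
    tokens whose cumulative mass reaches p, zero out the rest, renormalize."""
    order = np.argsort(probs)[::-1]          # tokens sorted by descending prob
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1     # smallest prefix with mass >= p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

# Toy vocabulary of 4 tokens: with p=0.9 the tail token is dropped entirely.
f = top_p_filter(np.array([0.5, 0.3, 0.15, 0.05]), p=0.9)
```

Shrinking p trades diversity for quality by pruning the unreliable tail; p → 1 recovers full sampling.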
2 code implementations • NAACL 2019 • Tatsunori B. Hashimoto, Hugh Zhang, Percy Liang
How can we measure whether a natural language generation system produces both high quality and diverse outputs?
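One way to operationalize that question is to featurize each sentence and ask how well a simple classifier can tell human text from model text: high classification error means the two are hard to distinguish on both axes. The sketch below computes a leave-one-out 1-nearest-neighbor error on entirely made-up toy features; it illustrates the classification-error idea only, not the paper's exact metric.

```python
import numpy as np

def loo_knn_error(features, labels):
    """Leave-one-out 1-nearest-neighbor classification error.

    features: (n, 2) array of per-sentence features
              (e.g. a human judgment score and a model log-probability)
    labels: (n,) array, 1 = human-written, 0 = model-generated
    """
    n = len(labels)
    errors = 0
    for i in range(n):
        d = np.linalg.norm(features - features[i], axis=1)
        d[i] = np.inf                          # exclude the held-out point itself
        errors += labels[d.argmin()] != labels[i]
    return errors / n

# Toy, synthetic features: human and model text partially overlap in this space.
rng = np.random.default_rng(1)
human = rng.normal([1.0, -2.0], 0.5, size=(20, 2))
model = rng.normal([0.0, -1.0], 0.5, size=(20, 2))
X = np.vstack([human, model])
y = np.array([1] * 20 + [0] * 20)
err = loo_knn_error(X, y)  # harder-to-separate distributions -> higher error
```

A system only scores well on such a measure if its outputs are both plausible to humans and statistically similar to human text, which is what couples quality and diversity into one number.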