no code implementations • 1 May 2024 • Hugh Zhang, Jeff Da, Dean Lee, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer Yue
Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning.
1 code implementation • 22 Jan 2024 • Will LeVine, Benjamin Pikus, Jacob Phillips, Berk Norman, Fernando Amat Gil, Sean Hendryx
As deep neural networks are adopted in high-stakes domains, it is crucial to identify when inference inputs are Out-of-Distribution (OOD) so that users can be alerted to likely drops in performance and calibration despite high confidence.
no code implementations • 21 Nov 2023 • Will LeVine, Benjamin Pikus, Anthony Chen, Sean Hendryx
These reward models are additionally used at inference-time to estimate LLM responses' adherence to those desired behaviors.