no code implementations • 28 Feb 2024 • Zhengping Jiang, Yining Lu, Hanjie Chen, Daniel Khashabi, Benjamin Van Durme, Anqi Liu
This is achieved by assessing the conditional V-information (Hewitt et al., 2021) with a predictive family robust against leaky features that can be exploited by a small model.
1 code implementation • 19 Feb 2024 • Xiao Ye, Andrew Wang, Jacob Choi, Yining Lu, Shreya Sharma, Lingfeng Shen, Vijay Tiyyala, Nicholas Andrews, Daniel Khashabi
Our benchmarking approach focuses on aspects of this ability that are common among humans: (i) recalling related experiences from a large amount of information, and (ii) applying analogical reasoning to complex and lengthy scenarios.
1 code implementation • 17 Jul 2023 • Yining Lu, Haoping Yu, Daniel Khashabi
GEAR achieves better efficiency by delegating tool grounding and execution to small language models (SLMs) and an LLM, respectively, while leveraging semantic and pattern-based evaluation at both the question and answer levels for generalizable tool grounding.
1 code implementation • 17 Nov 2022 • Yining Lu, Jingxi Qiu, Gaurav Gupta
Subjective answer evaluation is a time-consuming and tedious task, and the quality of the evaluation is heavily influenced by a variety of subjective personal characteristics.
no code implementations • 25 Jun 2022 • Yining Lu, Changjie Lu, Naina Bandyopadhyay, Manoj Kumar, Gaurav Gupta
To evaluate the proposed RTB strategy's performance, we report results on ten sequential simulated auction campaigns.