1 code implementation • 1 Apr 2024 • Wei He, Shichun Liu, Jun Zhao, Yiwen Ding, Yi Lu, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang
The generated demos strategically interpolate between existing demos and the given query, transforming the query from out-of-distribution (OOD) to in-distribution (ID).
1 code implementation • 8 Feb 2024 • Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang
In this paper, we propose R³: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models.
no code implementations • 12 Dec 2023 • Yue Zhang, Ming Zhang, Haipeng Yuan, Shichun Liu, Yongyao Shi, Tao Gui, Qi Zhang, Xuanjing Huang
The three crucial questions for LLM evaluation are "what, where, and how to evaluate".
no code implementations • 18 Mar 2023 • Junjie Ye, Xuanting Chen, Nuo Xu, Can Zu, Zekai Shao, Shichun Liu, Yuhan Cui, Zeyang Zhou, Chao Gong, Yang Shen, Jie Zhou, Siming Chen, Tao Gui, Qi Zhang, Xuanjing Huang
GPT series models, such as GPT-3, Codex, InstructGPT, and ChatGPT, have gained considerable attention due to their exceptional natural language processing capabilities.