1 code implementation • 16 Apr 2024 • Pengyu Cheng, Tianhao Hu, Han Xu, Zhisong Zhang, Yong Dai, Lei Han, Nan Du
Hence, we investigate whether LLMs' reasoning ability can be further enhanced by Self-Play in this Adversarial language Game (SPAG).
1 code implementation • 12 Dec 2023 • Dun Zeng, Yong Dai, Pengyu Cheng, Longyue Wang, Tianhao Hu, Wanshun Chen, Nan Du, Zenglin Xu
Our analysis reveals a correlation between the calibration performance of reward models (RMs) and the alignment performance of LLMs.
1 code implementation • 14 Nov 2023 • Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Tianhao Hu, Peixin Cao, Nan Du
Human preference alignment is essential to improve the interaction quality of large language models (LLMs).
1 code implementation • 7 Sep 2022 • Tianhao Hu, Bangti Jin, Zhi Zhou
Extensive numerical experiments in two- and multi-dimensional spaces with point sources, line sources, or their combinations are presented to illustrate the efficiency of the proposed approach. A comparative study with several existing neural-network-based approaches is also given, which clearly shows the competitiveness of the proposed approach for this specific class of problems.