1 code implementation • 8 Feb 2024 • Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang
In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models.
1 code implementation • 17 Jan 2024 • Trung Quoc Luong, Xinbo Zhang, Zhanming Jie, Peng Sun, Xiaoran Jin, Hang Li
ReFT first warms up the model with SFT, then applies online reinforcement learning (specifically the PPO algorithm in this paper) to fine-tune it further: an abundance of reasoning paths are automatically sampled for each question, and the rewards are derived from the ground-truth answers.
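As a rough illustration of how a reward can be derived from a ground-truth answer, the sketch below scores sampled reasoning paths by comparing their final number with the reference. It is not the paper's implementation: the helper names (`extract_final_answer`, `outcome_reward`), the "last number in the text" extraction rule, and the binary 1.0/0.0 reward are all assumptions made for this example, and the PPO update itself is omitted.

```python
import re
from typing import Optional


def extract_final_answer(reasoning_path: str) -> Optional[float]:
    """Return the last number in a sampled reasoning path, or None.

    Assumption: the final answer is the last numeric token in the text.
    """
    # Drop thousands separators so "1,200" parses as a single number.
    cleaned = reasoning_path.replace(",", "")
    numbers = re.findall(r"-?\d+(?:\.\d+)?", cleaned)
    return float(numbers[-1]) if numbers else None


def outcome_reward(reasoning_path: str, ground_truth: float, tol: float = 1e-6) -> float:
    """Hypothetical binary reward: 1.0 if the final answer matches, else 0.0."""
    predicted = extract_final_answer(reasoning_path)
    if predicted is None:
        return 0.0
    return 1.0 if abs(predicted - ground_truth) <= tol else 0.0


# Made-up reasoning paths standing in for samples drawn from the policy.
sampled_paths = [
    "3 apples plus 4 apples is 7 apples. The answer is 7",
    "3 times 4 is 12. The answer is 12",
    "I am not sure.",
]
rewards = [outcome_reward(path, ground_truth=7.0) for path in sampled_paths]
print(rewards)
```

In an RL loop, scalar rewards like these would be passed to the policy-gradient update in place of a learned reward model; only the final answer is checked, so no step-level annotation is needed.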
1 code implementation • 20 Sep 2023 • Zhanming Jie, Trung Quoc Luong, Xinbo Zhang, Xiaoran Jin, Hang Li
We also find that Python is a better choice of language than Wolfram for program CoTs.
no code implementations • Findings (ACL) 2022 • Jiangjie Chen, Rui Xu, Ziquan Fu, Wei Shi, Zhongqiao Li, Xinbo Zhang, Changzhi Sun, Lei LI, Yanghua Xiao, Hao Zhou
Holding the belief that models capable of reasoning should be right for the right reasons, we propose a first-of-its-kind Explainable Knowledge-intensive Analogical Reasoning benchmark (E-KAR).
no code implementations • 29 Sep 2021 • Xinbo Zhang, Changzhi Sun, Yue Zhang, Lei LI, Hao Zhou
Logical reasoning over natural text is an important capability on the path towards human-level intelligence.
1 code implementation • Findings (ACL) 2021 • Changzhi Sun, Xinbo Zhang, Jiangjie Chen, Chun Gan, Yuanbin Wu, Jiaze Chen, Hao Zhou, Lei LI
In this paper, we propose PRobr, a novel approach for joint answer prediction and proof generation.
1 code implementation • 25 Dec 2020 • Jiangjie Chen, Qiaoben Bao, Changzhi Sun, Xinbo Zhang, Jiaze Chen, Hao Zhou, Yanghua Xiao, Lei LI
The final claim verification is based on all latent variables.
no code implementations • EMNLP 2018 • Sen Hu, Lei Zou, Xinbo Zhang
Although natural language question answering over knowledge graphs has been studied in the literature, existing methods have some limitations in answering complex questions.