no code implementations • 27 May 2024 • Yuzi Yan, Jialian Li, Yipin Zhang, Dong Yan
This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in large language models (LLMs), with a specific focus on Baichuan-7B and Baichuan-33B, an advanced bilingual (Chinese and English) LLM series.
no code implementations • 21 May 2024 • Xingzhou Lou, Junge Zhang, Jian Xie, Lifeng Liu, Dong Yan, Kaiqi Huang
Human preference alignment is critical in building powerful and reliable large language models (LLMs).
no code implementations • 15 Feb 2024 • Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang
Then, based on this framework, we introduce the IBN to analyze generalization in the reward modeling stage of RLHF.
1 code implementation • 19 Sep 2023 • Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, Juntao Dai, Kun Fang, Lei Su, Liang Song, Lifeng Liu, Liyun Ru, Luyao Ma, Mang Wang, Mickel Liu, MingAn Lin, Nuolan Nie, Peidong Guo, Ruiyang Sun, Tao Zhang, Tianpeng Li, Tianyu Li, Wei Cheng, WeiPeng Chen, Xiangrong Zeng, Xiaochuan Wang, Xiaoxi Chen, Xin Men, Xin Yu, Xuehai Pan, Yanjun Shen, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, Zhiying Wu
Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering.
no code implementations • 9 Mar 2023 • Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Songming Liu, Dong Yan, Jun Zhu
Extensive experiments in both image-based and state-based tasks show that TAD can significantly improve the performance of handling different tasks simultaneously, especially for those with high TDR, and display a strong generalization ability to unseen tasks.
no code implementations • 2 Nov 2022 • Yao Feng, Yuhong Jiang, Hang Su, Dong Yan, Jun Zhu
Model-based reinforcement learning usually suffers from high sample complexity in training the world model, especially for environments with complex dynamics.
1 code implementation • 15 Sep 2022 • Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Dong Yan, Jun Zhu
In this paper, we reveal that the instability is also related to a new notion of Reuse Bias of IS -- the bias in off-policy evaluation caused by the reuse of the replay buffer for evaluation and optimization.
1 code implementation • 9 Jun 2022 • Chengyang Ying, Xinning Zhou, Hang Su, Dong Yan, Ning Chen, Jun Zhu
Though deep reinforcement learning (DRL) has obtained substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty of both transition and observation.
no code implementations • 13 Mar 2022 • Jialian Li, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu
Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional technical difficulties as we need to simultaneously estimate the training environment uncertainty from samples and find the worst-case perturbation for testing.
1 code implementation • 29 Jul 2021 • Jiayi Weng, Huayu Chen, Dong Yan, Kaichao You, Alexis Duburcq, Minghao Zhang, Yi Su, Hang Su, Jun Zhu
In this paper, we present Tianshou, a highly modularized Python library for deep reinforcement learning (DRL) that uses PyTorch as its backend.
no code implementations • ICML Workshop AML 2021 • Chengyang Ying, Xinning Zhou, Dong Yan, Jun Zhu
Though deep reinforcement learning (DRL) has obtained substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty caused by stochastic policies and environment variability.
no code implementations • 1 Jan 2021 • Guan Wang, Dong Yan, Hang Su, Jun Zhu
In this work, we point out that the optimal value of n actually differs on each data point, while the fixed value n is a rough average of them.
no code implementations • 11 Jun 2020 • Lei Ma, Ke Guan, Dong Yan, Danping He, Nuno R. Leonor, Bo Ai, Junhyeong Kim
In this paper, the satellite-terrestrial channel at 22.6 GHz is characterized for a typical high-speed railway (HSR) environment.
no code implementations • ICLR 2020 • Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu
In this paper, we present Lazy-CFR, a CFR algorithm that adopts a lazy update strategy to avoid traversing the whole game tree in each round.
no code implementations • 27 Jan 2019 • Haosheng Zou, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu
Reward shaping is one of the most effective methods to tackle the crucial yet challenging problem of credit assignment in Reinforcement Learning (RL).
no code implementations • 10 Oct 2018 • Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu
In this paper, we present a novel technique, lazy update, which avoids traversing the whole game tree in CFR, as well as a novel analysis of the regret of CFR with lazy update.