1 code implementation • 30 Nov 2023 • Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam, Xiaohan Zhang, Lichao Sun, Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, Jie Tang
We will provide public APIs for evaluating AlignBench with CritiqueLLM to facilitate the evaluation of LLMs' Chinese alignment.
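Since the entry only announces these APIs, the sketch below is purely hypothetical: the endpoint URL, payload fields, and response schema are illustrative placeholders, not CritiqueLLM's actual interface. It shows the general shape of a critique-style evaluation call that scores a single question-answer pair.

```python
# Hypothetical sketch only: the public evaluation API announced above is not
# specified here, so the endpoint URL, payload fields, and response schema are
# illustrative placeholders, not CritiqueLLM's actual interface.
import requests

API_URL = "https://example.com/critiquellm/evaluate"  # placeholder endpoint


def evaluate_answer(question: str, answer: str) -> dict:
    """Send one question-answer pair to a critique-style evaluator and
    return its judgment (assumed to be a scalar score plus a critique)."""
    payload = {
        "question": question,  # an AlignBench-style Chinese prompt
        "answer": answer,      # the LLM response to be judged
        "language": "zh",      # AlignBench targets Chinese alignment
    }
    resp = requests.post(API_URL, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()  # e.g. {"score": 7, "critique": "..."}


if __name__ == "__main__":
    print(evaluate_answer("什么是模型对齐？", "对齐是指让模型行为符合人类意图。"))
```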
2 code implementations • 30 Nov 2023 • Pei Ke, Bosi Wen, Zhuoer Feng, Xiao Liu, Xuanyu Lei, Jiale Cheng, Shengyuan Wang, Aohan Zeng, Yuxiao Dong, Hongning Wang, Jie Tang, Minlie Huang
Since the natural language processing (NLP) community started to use large language models (LLMs) such as GPT-4 as critics to evaluate the quality of generated texts, most existing approaches have only trained a critique generation model of a specific scale on specific datasets.
1 code implementation • 7 Nov 2023 • Jiale Cheng, Xiao Liu, Kehan Zheng, Pei Ke, Hongning Wang, Yuxiao Dong, Jie Tang, Minlie Huang
However, these models are often not well aligned with human intents, which calls for additional treatment; this is known as the alignment problem.
2 code implementations • 20 Apr 2023 • Hao Sun, Zhexin Zhang, Jiawen Deng, Jiale Cheng, Minlie Huang
To further promote the safe deployment of LLMs, we develop a Chinese LLM safety assessment benchmark.
no code implementations • 18 Feb 2023 • Jiawen Deng, Jiale Cheng, Hao Sun, Zhexin Zhang, Minlie Huang
This survey presents a framework for safety research pertaining to large models, delineating the landscape of safety risks as well as safety evaluation and improvement methods.
1 code implementation • 19 Dec 2022 • Jiale Cheng, Sahand Sabour, Hao Sun, Zhuang Chen, Minlie Huang
As previous studies have demonstrated that a seeker's persona is an important factor in effective support, we investigate whether there are benefits to modeling such information in dialogue models for support.
1 code implementation • 4 Dec 2022 • Zhexin Zhang, Jiale Cheng, Hao Sun, Jiawen Deng, Fei Mi, Yasheng Wang, Lifeng Shang, Minlie Huang
To detect such toxic generations, existing methods rely on templates, real-world data extraction, crowdsourced workers, or automatic generation to construct adversarial contexts that are likely to induce toxic outputs.
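As a generic illustration of the detection step only (not the paper's own construction method), the sketch below scores model continuations with an off-the-shelf toxicity classifier from the Hugging Face Hub and flags the contexts that induce toxic outputs; the candidate contexts and the generation step are placeholders.

```python
# Generic illustration of the detection step only; this is NOT the paper's
# construction method. It scores model continuations with an off-the-shelf
# toxicity classifier and flags the contexts that induced toxic outputs.
from transformers import pipeline

# unitary/toxic-bert is a publicly available toxicity classifier on the
# Hugging Face Hub; any comparable classifier would do.
toxicity = pipeline("text-classification", model="unitary/toxic-bert")


def is_toxic(text: str, threshold: float = 0.5) -> bool:
    """Flag text whose top toxicity label scores above the threshold."""
    result = toxicity(text)[0]
    return result["label"] == "toxic" and result["score"] >= threshold


# Placeholder loop: contexts could come from any of the construction methods
# listed above; the model generation step is elided.
candidate_contexts = ["<adversarial context 1>", "<adversarial context 2>"]
for ctx in candidate_contexts:
    continuation = "<model continuation for ctx>"  # model(ctx), elided here
    if is_toxic(continuation):
        print(f"Context induced a toxic generation: {ctx!r}")
```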
1 code implementation • Findings (ACL) 2022 • Hao Sun, Guangxuan Xu, Jiawen Deng, Jiale Cheng, Chujie Zheng, Hao Zhou, Nanyun Peng, Xiaoyan Zhu, Minlie Huang
We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors in human-bot dialogue settings, with a focus on context-sensitive unsafety, which is under-explored in prior work.