SafetyBench is a comprehensive benchmark designed to evaluate the safety of large language models (LLMs) using multiple-choice questions. As LLMs become increasingly prevalent, concerns about their safety have grown. SafetyBench addresses this by providing a reliable evaluation framework for researchers and developers. Here are the key points about SafetyBench:
Purpose: SafetyBench aims to help researchers and developers better understand and assess the safety of LLMs. It serves as a reference for model selection and optimization, promoting the development of safe, responsible, and ethical large models that align with legislative norms, social standards, and human values¹².
Comprehensive Benchmark: SafetyBench comprises 11,435 diverse multiple-choice questions across 7 distinct categories related to safety concerns. These questions cover a wide range of topics, allowing for thorough evaluation of LLM safety¹.
Multilingual Evaluation: SafetyBench includes both Chinese and English data, making it suitable for evaluating LLMs in both languages. Researchers can assess model safety across different linguistic contexts¹.
Performance Insights: Extensive tests using SafetyBench on 25 popular Chinese and English LLMs (including zero-shot and few-shot settings) revealed that GPT-4 outperformed its counterparts. However, there is still room for improvement in enhancing the safety of existing LLMs¹.
Availability: Data and evaluation guidelines for SafetyBench are accessible through the following URLs:
In summary, SafetyBench provides a valuable resource for assessing and advancing the safety of large language models, contributing to their responsible deployment and alignment with societal norms and values.
(1) [2309.07045] SafetyBench: Evaluating the Safety of Large Language .... https://arxiv.org/abs/2309.07045. (2) SafetyBench:通过单选题评估大型语言模型安全性. https://posts.careerengine.us/p/6511afb61a8da974e9d62f40. (3) thu-coai/SafetyBench · Datasets at Hugging Face. https://huggingface.co/datasets/thu-coai/SafetyBench. (4) SafetyBench:通过单选题评估大型语言模型安全性_鲟曦研习社. https://www.kuxai.com/article/1505. (5) GitHub - thu-coai/SafetyBench: Official github repo for SafetyBench, a .... https://github.com/thu-coai/SafetyBench. (6) undefined. https://doi.org/10.48550/arXiv.2309.07045.
Paper | Code | Results | Date | Stars |
---|