SafetyBench is a comprehensive benchmark designed to evaluate the safety of large language models (LLMs) using multiple-choice questions. As LLMs become increasingly prevalent, concerns about their safety have grown. SafetyBench addresses this by providing a reliable evaluation framework for researchers and developers. Here are the key points about SafetyBench:

  1. Purpose: SafetyBench aims to help researchers and developers better understand and assess the safety of LLMs. It serves as a reference for model selection and optimization, promoting the development of safe, responsible, and ethical large models that align with legislative norms, social standards, and human values¹².

  2. Comprehensive Benchmark: SafetyBench comprises 11,435 diverse multiple-choice questions across 7 distinct categories related to safety concerns. These questions cover a wide range of topics, allowing for thorough evaluation of LLM safety¹.

  3. Multilingual Evaluation: SafetyBench includes both Chinese and English data, making it suitable for evaluating LLMs in both languages. Researchers can assess model safety across different linguistic contexts¹.

  4. Performance Insights: Extensive tests using SafetyBench on 25 popular Chinese and English LLMs (including zero-shot and few-shot settings) revealed that GPT-4 outperformed its counterparts. However, there is still room for improvement in enhancing the safety of existing LLMs¹.

  5. Availability: Data and evaluation guidelines for SafetyBench are accessible through the following URLs:

  6. SafetyBench Data and Guidelines
  7. SafetyBench Submission Entrance and Leaderboard¹⁴.

In summary, SafetyBench provides a valuable resource for assessing and advancing the safety of large language models, contributing to their responsible deployment and alignment with societal norms and values.

(1) [2309.07045] SafetyBench: Evaluating the Safety of Large Language .... https://arxiv.org/abs/2309.07045. (2) SafetyBench:通过单选题评估大型语言模型安全性. https://posts.careerengine.us/p/6511afb61a8da974e9d62f40. (3) thu-coai/SafetyBench · Datasets at Hugging Face. https://huggingface.co/datasets/thu-coai/SafetyBench. (4) SafetyBench:通过单选题评估大型语言模型安全性_鲟曦研习社. https://www.kuxai.com/article/1505. (5) GitHub - thu-coai/SafetyBench: Official github repo for SafetyBench, a .... https://github.com/thu-coai/SafetyBench. (6) undefined. https://doi.org/10.48550/arXiv.2309.07045.

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Similar Datasets


License


  • Unknown

Modalities


Languages