SafetyBench Dataset | Papers With Code

Name:*

Full name (optional):

Description (Markdown and $\LaTeX$ enabled):*

**SafetyBench** is a comprehensive benchmark designed to evaluate the safety of large language models (LLMs) using multiple-choice questions. As LLMs become increasingly prevalent, concerns about their safety have grown. SafetyBench addresses this by providing a reliable evaluation framework for researchers and developers. Here are the key points about SafetyBench:

1. **Purpose**: SafetyBench aims to help researchers and developers better understand and assess the safety of LLMs. It serves as a reference for model selection and optimization, promoting the development of safe, responsible, and ethical large models that align with legislative norms, social standards, and human values¹².

2. **Comprehensive Benchmark**: SafetyBench comprises **11,435 diverse multiple-choice questions** across **7 distinct categories** related to safety concerns. These questions cover a wide range of topics, allowing for thorough evaluation of LLM safety¹.

3. **Multilingual Evaluation**: SafetyBench includes both **Chinese and English data**, making it suitable for evaluating LLMs in both languages. Researchers can assess model safety across different linguistic contexts¹.

4. **Performance Insights**: Extensive tests using SafetyBench on **25 popular Chinese and English LLMs** (including zero-shot and few-shot settings) revealed that **GPT-4** outperformed its counterparts. However, there is still room for improvement in enhancing the safety of existing LLMs¹.

5. **Availability**: Data and evaluation guidelines for SafetyBench are accessible through the following URLs:
   - [SafetyBench Data and Guidelines](https://arxiv.org/abs/2309.07045)
   - [SafetyBench Submission Entrance and Leaderboard](https://arxiv.org/abs/2309.07045)¹⁴.

In summary, SafetyBench provides a valuable resource for assessing and advancing the safety of large language models, contributing to their responsible deployment and alignment with societal norms and values.

(1) [2309.07045] SafetyBench: Evaluating the Safety of Large Language .... https://arxiv.org/abs/2309.07045.
(2) SafetyBench：通过单选题评估大型语言模型安全性. https://posts.careerengine.us/p/6511afb61a8da974e9d62f40.
(3) thu-coai/SafetyBench · Datasets at Hugging Face. https://huggingface.co/datasets/thu-coai/SafetyBench.
(4) SafetyBench：通过单选题评估大型语言模型安全性_鲟曦研习社. https://www.kuxai.com/article/1505.
(5) GitHub - thu-coai/SafetyBench: Official github repo for SafetyBench, a .... https://github.com/thu-coai/SafetyBench.
(6) undefined. https://doi.org/10.48550/arXiv.2309.07045.

Homepage URL (optional):

Paper where the dataset was introduced:

Introduction date:

Dataset license:

URL to full license terms:

Image

---

SafetyBench

Benchmarks

Add a new result Link an existing benchmark

Papers

Dataset Loaders

Add Remove

Tasks

Similar Datasets

ToolBench

GLUE-X

CMB

CSCD-IME

Usage

License

Modalities

Languages

SafetyBench

Benchmarks Edit Add a new result Link an existing benchmark

Papers

Dataset Loaders Edit Add Remove

Tasks Edit