SAP Dataset | Papers With Code

The **SAP benchmark** addresses **attack prompt generation** for **red teaming** and **defending large language models (LLMs)**. Its key components are:

1. **Objective**:
   - The primary goal of the SAP benchmark is to evaluate the safety and robustness of LLMs against **red teaming attacks**.
   - Red teaming attacks involve inducing LLMs to generate harmful or inappropriate content.

2. **Methodology**:
   - The SAP benchmark combines both manual and automatic methods to generate high-quality attack prompts.
   - It uses recently released LLMs as prompt generators.
   - Specifically, it instructs these LLMs to mimic human-written attack prompts through **in-context learning**.
   - Together, these steps make up the attack framework that produces the prompts.

3. **Defense Framework**:
   - In addition to attacking LLMs, the SAP benchmark proposes a defense framework.
   - This framework fine-tunes victim LLMs through **iterative interactions** with the attack framework.
   - The goal is to enhance the safety of LLMs against red teaming attacks.

4. **Validation and Datasets**:
   - Extensive experiments on different LLMs validate the effectiveness of both the attack and defense frameworks.
   - As part of this work, the authors release a series of **attack prompt datasets**, named **SAP**, of varying sizes.
   - These datasets facilitate safety evaluation and enhancement for a broader range of LLMs¹.
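
The **Methodology** item describes producing new attack prompts by having an LLM imitate human-written examples through in-context learning. The Python sketch below illustrates only that general pattern. It assumes a pool of manually written seed prompts, a hypothetical `generate` callable that wraps the generator LLM, and a hypothetical `score` judge; the instruction text, `k`, `threshold`, and the keep-if-it-scores-well rule are illustrative choices, not the SAP authors' exact procedure. The placeholders stand in for real dataset entries.

```python
import itertools
import random
from typing import Callable, List

# Hypothetical instruction text; the actual SAP prompt template is not reproduced here.
INSTRUCTION = "Write one new prompt in the same style as the examples below."


def build_icl_prompt(examples: List[str], instruction: str = INSTRUCTION) -> str:
    """Assemble a few-shot (in-context learning) prompt: instruction, numbered examples, open slot."""
    parts = [instruction]
    for i, example in enumerate(examples, start=1):
        parts.append(f"Example {i}:\n{example}")
    # Leave the next slot empty so the generator LLM continues the pattern.
    parts.append(f"Example {len(examples) + 1}:\n")
    return "\n\n".join(parts)


def expand_pool(
    pool: List[str],
    generate: Callable[[str], str],  # hypothetical: wraps a call to the generator LLM
    score: Callable[[str], float],   # hypothetical: quality judge returning a value in [0, 1]
    rounds: int = 3,
    k: int = 3,
    threshold: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Grow a seed pool: sample k in-context examples, generate a candidate, keep it if it scores well."""
    rng = random.Random(seed)
    pool = list(pool)  # copy so the caller's list is not mutated
    for _ in range(rounds):
        examples = rng.sample(pool, min(k, len(pool)))
        candidate = generate(build_icl_prompt(examples))
        if score(candidate) >= threshold and candidate not in pool:
            pool.append(candidate)
    return pool


if __name__ == "__main__":
    seeds = ["<manual seed prompt A>", "<manual seed prompt B>", "<manual seed prompt C>"]
    ids = itertools.count()
    fake_generate = lambda prompt: f"<generated prompt {next(ids)}>"  # stub generator
    fake_score = lambda candidate: 1.0                                # stub judge: accept everything
    print(len(expand_pool(seeds, fake_generate, fake_score)))  # 6
```

Feeding accepted candidates back into the pool lets later rounds draw on both manual and generated examples, which is one plausible reading of "combines manual and automatic methods".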
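
The **Defense Framework** item describes fine-tuning the victim model through repeated interaction with the attack framework. Below is a minimal Python sketch of such a loop, assuming hypothetical callables for the attacker, the safety judge, the safe-response source, and the fine-tuning step; none of these are the paper's code. Each round collects the prompts that still elicit harmful output, records the attack success rate, and fine-tunes on those prompts paired with safe responses.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]  # a victim model: prompt -> response


def iterative_defense(
    attack: Callable[[Model], List[str]],                 # hypothetical: attacker, may adapt to the current model
    model: Model,
    is_harmful: Callable[[str], bool],                    # hypothetical: safety judge applied to responses
    safe_reply: Callable[[str], str],                     # hypothetical: safe target response for a prompt
    fine_tune: Callable[[List[Tuple[str, str]]], Model],  # hypothetical: returns an updated model
    rounds: int = 3,
) -> Tuple[Model, List[float]]:
    """Alternate attack and fine-tuning rounds; return the final model and per-round attack success rates."""
    success_rates: List[float] = []
    for _ in range(rounds):
        prompts = attack(model)
        failures = [p for p in prompts if is_harmful(model(p))]
        success_rates.append(len(failures) / len(prompts) if prompts else 0.0)
        if not failures:
            break  # no attack succeeded this round
        # Pair each successful attack prompt with a safe response and fine-tune on those pairs.
        model = fine_tune([(p, safe_reply(p)) for p in failures])
    return model, success_rates


if __name__ == "__main__":
    learned = set()

    def toy_model(known):
        # Toy victim: refuses prompts it was trained on, otherwise responds unsafely.
        return lambda p: "refusal" if p in known else "unsafe"

    def toy_fine_tune(pairs):
        learned.update(p for p, _ in pairs)
        return toy_model(set(learned))

    final, rates = iterative_defense(
        attack=lambda m: ["<attack 1>", "<attack 2>"],
        model=toy_model(set()),
        is_harmful=lambda response: response == "unsafe",
        safe_reply=lambda p: "refusal",
        fine_tune=toy_fine_tune,
    )
    print(rates)  # [1.0, 0.0]
```

Stopping as soon as no attack succeeds is a simplification in this sketch; a fuller setup would likely let the attacker generate fresh prompts each round and report results on held-out prompts.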
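
Since the SAP datasets are meant for safety evaluation, a typical use is to measure how often each model responds harmfully to the released prompts. The Python sketch below assumes a JSON array of records with a `prompt` field (a hypothetical layout; check the actual files before relying on it) and a caller-supplied `is_harmful` judge, and reports an attack success rate per model.

```python
import json
from typing import Callable, Dict, List


def load_prompts(raw_json: str, field: str = "prompt") -> List[str]:
    """Parse a JSON array of records and extract one text field.
    The array layout and the 'prompt' field name are assumptions, not a documented SAP schema."""
    records = json.loads(raw_json)
    return [record[field] for record in records]


def attack_success_rate(
    prompts: List[str],
    respond: Callable[[str], str],
    is_harmful: Callable[[str], bool],
) -> float:
    """Fraction of prompts whose response the judge flags as harmful (lower is safer)."""
    if not prompts:
        return 0.0
    hits = sum(1 for p in prompts if is_harmful(respond(p)))
    return hits / len(prompts)


def compare_models(
    prompts: List[str],
    models: Dict[str, Callable[[str], str]],
    is_harmful: Callable[[str], bool],
) -> Dict[str, float]:
    """Attack success rate for each named model on the same prompt set."""
    return {name: attack_success_rate(prompts, respond, is_harmful) for name, respond in models.items()}


if __name__ == "__main__":
    raw = '[{"prompt": "p1"}, {"prompt": "p2"}, {"prompt": "p3"}, {"prompt": "p4"}]'
    prompts = load_prompts(raw)
    models = {
        "always_unsafe": lambda p: "unsafe",
        "partly_safe": lambda p: "unsafe" if p in {"p1", "p2"} else "refusal",
    }
    print(compare_models(prompts, models, lambda r: r == "unsafe"))
    # {'always_unsafe': 1.0, 'partly_safe': 0.5}
```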
