GPTFuzzer Dataset | Papers With Code

**GPTFuzzer** is a black-box fuzzing project for **red teaming** large language models (LLMs) with **auto-generated jailbreak prompts**.

1. **Project Overview**:
   - **GPTFuzzer** assesses the security and robustness of LLMs by automatically generating prompts that may elicit harmful or unintended behavior.
   - The target models include both commercial and open-source chat LLMs, such as **ChatGPT**, **Llama-2-7B-chat**, and **Vicuna-7B**.

2. **Datasets**:
   - The datasets used in **GPTFuzzer** include:
     - **Harmful Questions**: Sampled from public datasets such as **llm-jailbreak-study** and **hh-rlhf**.
     - **Human-Written Templates**: Collected from **llm-jailbreak-study**.
     - **Responses**: Collected by querying target models such as **Vicuna-7B**, **ChatGPT**, and **Llama-2-7B-chat**.

3. **Models**:
   - The judgment model, which labels each target-model response as jailbroken or not, is a fine-tuned **RoBERTa-large** classifier.
   - The training code and data are available in the repository.
   - During fuzzing experiments, the model is automatically downloaded and cached.

4. **Updates**:
   - The project received recognition at **Geekcon 2023**.
   - The team continues to improve the codebase and aims to build a general black-box fuzzing framework for LLMs.
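A plausible way to picture how the three dataset components fit together is as a cross product. Each human-written template is combined with each harmful question to form a prompt. The target models' replies to those prompts then form the response set.

The minimal sketch below shows that assembly step. It assumes, as a hypothetical convention, that templates mark the question slot with the token `[INSERT PROMPT HERE]`; check the actual dataset files before relying on this. `PLACEHOLDER`, `PromptRecord`, and `build_prompts` are illustrative names, not part of the GPTFuzz codebase.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

# Hypothetical placeholder token marking where a question is inserted into a template.
PLACEHOLDER = "[INSERT PROMPT HERE]"


@dataclass(frozen=True)
class PromptRecord:
    """One (template, question) pair and the prompt it produces (illustrative schema)."""
    template_id: int
    question_id: int
    prompt: str


def build_prompts(templates: List[str], questions: List[str]) -> List[PromptRecord]:
    """Fill every template with every question, in template-major order."""
    records: List[PromptRecord] = []
    for (t_id, template), (q_id, question) in product(enumerate(templates), enumerate(questions)):
        if PLACEHOLDER not in template:
            raise ValueError(f"template {t_id} has no {PLACEHOLDER!r} slot")
        records.append(PromptRecord(t_id, q_id, template.replace(PLACEHOLDER, question)))
    return records


if __name__ == "__main__":
    # Neutral stand-ins; no real templates or questions are used here.
    templates = ["Template A: [INSERT PROMPT HERE]", "Template B: [INSERT PROMPT HERE]"]
    questions = ["question 1", "question 2", "question 3"]
    for record in build_prompts(templates, questions):
        print(record.template_id, record.question_id, record.prompt)
```

Each resulting prompt would then be sent to a target model, and the reply stored next to its `template_id` and `question_id`.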
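The judgment model described under **Models** acts as a binary classifier over target-model responses. The sketch below shows one way such a classifier's two-class output could become a jailbroken/not-jailbroken label, and how those labels could be summarized as an attack success rate (ASR).

It is plain Python with no model loading. The logits stand in for what the fine-tuned RoBERTa-large model would produce. Several details are assumptions for illustration:

- class index 1 means "jailbroken";
- the threshold is 0.5;
- ASR is defined as the fraction of responses judged jailbroken.

`softmax`, `is_jailbroken`, and `attack_success_rate` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single vector of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_jailbroken(logits: Sequence[float], positive_index: int = 1, threshold: float = 0.5) -> bool:
    """Label a response as jailbroken if the assumed positive class exceeds the threshold."""
    return softmax(logits)[positive_index] > threshold


def attack_success_rate(judgments: Sequence[bool]) -> float:
    """Fraction of judged responses labeled as jailbroken (one simple ASR definition)."""
    if not judgments:
        raise ValueError("no judgments to aggregate")
    return sum(judgments) / len(judgments)


if __name__ == "__main__":
    # Made-up logits for four responses; a real run would take these from the classifier.
    batch = [[2.0, -1.0], [-0.5, 1.5], [0.0, 3.0], [1.0, 1.0]]
    labels = [is_jailbroken(logits) for logits in batch]
    print(labels, attack_success_rate(labels))  # prints [False, True, True, False] 0.5
```

With two classes, thresholding the positive-class probability at 0.5 matches taking the argmax, apart from exact ties. Exposing the threshold makes it easy to trade false positives for false negatives.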
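The aim of a general black-box fuzzing framework can be illustrated with a minimal, generic loop for fuzzing prompt templates. The loop assumes only query access to the target model. It repeatedly picks a seed template, mutates it, fills it with each question, queries the target, and keeps any mutant that the judge flags as successful.

This is not GPTFuzzer's implementation. Seed selection here is uniform random, whereas the paper describes a more elaborate strategy. The mutator, target, and judge are caller-supplied callables, shown below with harmless toy stubs. `fuzz`, `FuzzResult`, and the `[INSERT PROMPT HERE]` placeholder are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FuzzResult:
    """Outcome of a fuzzing run (hypothetical structure)."""
    successful_templates: List[str] = field(default_factory=list)
    queries_used: int = 0


def fuzz(
    seeds: List[str],
    questions: List[str],
    mutate: Callable[[str, random.Random], str],
    query_target: Callable[[str], str],
    judge: Callable[[str], bool],
    max_queries: int,
    placeholder: str = "[INSERT PROMPT HERE]",  # assumed slot token
    rng_seed: int = 0,
) -> FuzzResult:
    """Generic black-box loop: select a seed, mutate it, query the target, judge responses."""
    if not seeds or not questions:
        raise ValueError("need at least one seed template and one question")
    rng = random.Random(rng_seed)
    pool = list(seeds)  # the seed pool grows as successful mutants are found
    result = FuzzResult()
    while result.queries_used < max_queries:
        seed = rng.choice(pool)  # uniform selection; a stand-in for a smarter strategy
        mutant = mutate(seed, rng)
        hits = 0
        for question in questions:
            if result.queries_used >= max_queries:
                break  # stop mid-mutant once the query budget is spent
            response = query_target(mutant.replace(placeholder, question))
            result.queries_used += 1
            hits += judge(response)
        if hits > 0:
            pool.append(mutant)
            result.successful_templates.append(mutant)
    return result


if __name__ == "__main__":
    # Toy stubs: no real model is queried and no real templates are mutated.
    def toy_mutate(template: str, rng: random.Random) -> str:
        return template + " variant"

    def toy_target(prompt: str) -> str:
        return "COMPLY" if "variant" in prompt else "REFUSE"

    def toy_judge(response: str) -> bool:
        return response == "COMPLY"

    run = fuzz(["Template A: [INSERT PROMPT HERE]"], ["q1", "q2"],
               toy_mutate, toy_target, toy_judge, max_queries=6)
    print(run.queries_used, len(run.successful_templates))  # prints 6 3
```

Passing the target and judge in as callables keeps the loop black-box. It never needs gradients or model internals, only the ability to send a prompt and read back text.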

Source: Conversation with Bing, 3/17/2024
(1) sherdencooper/GPTFuzz: Official repo for GPTFUZZER - GitHub. https://github.com/sherdencooper/GPTFuzz.
(2) GPTFUZZER: Red Teaming Large Language Models with Auto ... - GitHub. https://github.com/sherdencooper/GPTFuzz/blob/master/README.md.
(3) GPTFUZZER: Red Teaming Large Language Models with Auto ... - GitHub. https://github.com/CriticalPulsar/GPTFuzz/blob/master/README.md.
(4) https://avatars.githubusercontent.com/u/37368657?v=4.
(5) https://github.com/sherdencooper/GPTFuzz/blob/master/README.md?raw=true.
(6) https://desktop.github.com.
(7) https://github.com/sherdencooper/GPTFuzz/raw/master/README.md.
(8) https://opensource.org/licenses/MIT.
(9) https://camo.githubusercontent.com/a4426cbe5c21edb002526331c7a8fbfa089e84a550567b02a0d829a98b136ad0/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d79656c6c6f772e737667.
(10) https://img.shields.io/badge/License-MIT-yellow.svg.
(11) https://arxiv.org/pdf/2309.10253.pdf.
(12) https://sherdencooper.github.io/.
(13) https://scholar.google.com/citations?user=Zv_rC0AAAAAJ.
(14) http://www.dataisland.org/.
(15) http://xinyuxing.org/.
(16) https://geekcon.darknavy.com/2023/china/en/index.html.
(17) https://avatars.githubusercontent.com/u/35443979?v=4.
(18) https://github.com/CriticalPulsar/GPTFuzz/blob/master/README.md?raw=true.
(19) https://docs.github.com/articles/about-issue-and-pull-request-templates.
(20) https://github.com/CriticalPulsar/GPTFuzz/raw/master/README.md.
(21) https://scholar.google.com/citations?user=Zv_rC0AAAAAJ&hl=en.

---

Similar Datasets

SecurityEval

ToxicChat
