GPTFuzzer is a project that explores red teaming of large language models (LLMs) using auto-generated jailbreak prompts. The details are summarized below:

Project Overview:

  • GPTFuzzer aims to assess the security and robustness of LLMs by generating prompts that may elicit harmful or unintended behavior.
  • The project targets chat LLMs such as ChatGPT, Llama-2, and Vicuna rather than a single model.

Datasets:

The datasets used in GPTFuzzer include:

  • Harmful Questions: Sampled from public datasets such as llm-jailbreak-study and hh-rlhf.
  • Human-Written Templates: Collected from llm-jailbreak-study.
  • Responses: Gathered by querying models such as Vicuna-7B, ChatGPT, and Llama-2-7B-chat.
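
To illustrate how the question and template datasets might fit together, here is a minimal Python sketch that fills a template with a question. The placeholder string `[INSERT PROMPT HERE]` and the helper names `fill_template` and `build_prompts` are assumptions made for illustration, not the project's confirmed API, and the example inputs are deliberately benign.

```python
# Hypothetical placeholder marker; the real templates may use a different one.
PLACEHOLDER = "[INSERT PROMPT HERE]"


def fill_template(template: str, question: str, placeholder: str = PLACEHOLDER) -> str:
    """Return the template with its placeholder replaced by the question."""
    if placeholder not in template:
        raise ValueError("template has no placeholder to fill")
    return template.replace(placeholder, question)


def build_prompts(templates, questions, placeholder=PLACEHOLDER):
    """Pair every template with every question (a full cross product)."""
    return [fill_template(t, q, placeholder) for t in templates for q in questions]


if __name__ == "__main__":
    demo = fill_template("Answer as a pirate: [INSERT PROMPT HERE]", "What is 2+2?")
    print(demo)
```

Each filled prompt would then be sent to a target model, and the model's reply becomes one entry in the Responses set.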

Models:

  • The judgment model, which decides whether a response counts as a successful jailbreak, is a fine-tuned RoBERTa-large model.
  • The training code and data are available in the repository.
  • During fuzzing experiments, the model is automatically downloaded and cached.
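
The role of the judgment model in a fuzzing run can be sketched as a simple loop: pick a seed template, mutate it, query the target model, and let a judge decide whether to keep the result. This is a hedged sketch, not the project's implementation. The function `fuzz`, the placeholder string, and the `mutate`, `target`, and `judge` callables are all hypothetical stand-ins; uniform random seed selection is swapped in for simplicity, and in the real project the judge would be the fine-tuned RoBERTa-large classifier rather than a plain function.

```python
import random
from typing import Callable, List, Optional

PLACEHOLDER = "[INSERT PROMPT HERE]"  # hypothetical marker, assumed for illustration


def fuzz(
    seeds: List[str],
    question: str,
    mutate: Callable[[str], str],   # stand-in for a template mutator
    target: Callable[[str], str],   # stand-in for the black-box model under test
    judge: Callable[[str], bool],   # stand-in for the judgment model
    budget: int = 10,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Run a toy black-box fuzzing loop and return the templates the judge accepted."""
    if not seeds:
        raise ValueError("at least one seed template is required")
    rng = rng or random.Random(0)
    pool = list(seeds)
    successes: List[str] = []
    for _ in range(budget):
        parent = rng.choice(pool)          # simplified: uniform random selection
        child = mutate(parent)
        response = target(child.replace(PLACEHOLDER, question))
        if judge(response):
            successes.append(child)
            pool.append(child)             # accepted templates become new seeds
    return successes
```

Keeping the target and the judge behind plain callables reflects the black-box setting: the loop only needs text in and text out, so any model or classifier can be plugged in.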

Updates:

  • The project has received recognition and awards at conferences such as Geekcon 2023.
  • The team continues to improve the codebase and aims to build a general black-box fuzzing framework for LLMs.

Source: Conversation with Bing, 3/17/2024
(1) sherdencooper/GPTFuzz: Official repo for GPTFUZZER - GitHub. https://github.com/sherdencooper/GPTFuzz.
(2) GPTFUZZER : Red Teaming Large Language Models with Auto ... - GitHub. https://github.com/sherdencooper/GPTFuzz/blob/master/README.md.
(3) https://arxiv.org/pdf/2309.10253.pdf.
(4) https://geekcon.darknavy.com/2023/china/en/index.html.

License


  • Unknown
