HarmfulTasks (Harmful and Malicious Tasks for LLMs)

Introduced by Hasan et al. in Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning

HarmfulTasks is a dataset of 225 malicious tasks that were integrated into ten distinct jailbreaking prompts. The tasks are divided into five categories:

Misinformation and Disinformation
Security Threats and Cybercrimes
Unlawful Behaviors and Activities
Hate Speech and Discrimination
Substance Abuse and Dangerous Practices

The jailbreaking prompts were selected to cover a diverse range of scenarios, including role-playing, simulations, attention-shifting, and privileged execution. The placement of the malicious task within each prompt was also varied.
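
As a rough illustration of the structure described above, the sketch below pairs task placeholders with prompt-template placeholders. It is not the authors' code or file format. The names `make_template` and `build_prompts`, the placeholder strings, and the rule that alternates task placement are all hypothetical. It also assumes every task is paired with every prompt, which the description suggests but does not state outright. No real task or prompt text is used.

```python
from itertools import product

NUM_TASKS = 225      # from the dataset description
NUM_TEMPLATES = 10   # from the dataset description

# Scenario types named in the description.
SCENARIOS = ["role-playing", "simulation", "attention-shifting", "privileged execution"]


def make_template(index):
    """Return a placeholder template; task placement alternates (hypothetical rule)."""
    scenario = SCENARIOS[index % len(SCENARIOS)]
    framing = f"<{scenario} framing #{index}>"
    if index % 2 == 0:
        return f"{framing} {{task}}"   # task placed after the framing
    return f"{{task}} {framing}"       # task placed before the framing


def build_prompts(tasks, templates):
    """Pair every task with every template (assumed full cross product)."""
    return [
        {"task_id": t_id, "template_id": p_id, "prompt": template.format(task=task)}
        for (t_id, task), (p_id, template) in product(enumerate(tasks), enumerate(templates))
    ]


# Opaque placeholders stand in for the actual dataset entries.
tasks = [f"<task {i}>" for i in range(NUM_TASKS)]
templates = [make_template(i) for i in range(NUM_TEMPLATES)]
prompts = build_prompts(tasks, templates)
print(len(prompts))
```

Under the full-pairing assumption this yields 225 × 10 = 2250 combined prompts. Keeping `task_id` and `template_id` on each record makes it straightforward to break results down by task or by prompt when evaluating a model.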
