HarmfulTasks (Harmful and Malicious Tasks for LLMs)

Introduced by Hasan et al. in Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning

This is a dataset of 225 malicious tasks, which were integrated into ten distinct jailbreaking prompts. The malicious tasks are divided into five categories:

  1. Misinformation and Disinformation
  2. Security Threats and Cybercrimes
  3. Unlawful Behaviors and Activities
  4. Hate Speech and Discrimination
  5. Substance Abuse and Dangerous Practices

The jailbreaking prompts were selected to cover a diverse range of scenarios, including role-playing, simulations, attention-shifting, and privileged execution. The placement of the malicious task within each jailbreaking prompt was also varied.

