1 code implementation • 5 Mar 2024 • Aly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah, Hyunwoo Kim, Yulia Tsvetkov, Yejin Choi, Sherif Saad, Santu Rana
In this paper, we introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent than are revealed by prompting the target model directly with the training data, which is the dominant approach to quantifying memorization in LLMs.
no code implementations • 21 Jan 2024 • Aly M. Kassem, Sherif Saad
TPRL leverages FLAN-T5, a language model, as a generator and employs a self-learned policy using a proximal policy gradient to generate the adversarial examples automatically.
1 code implementation • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023 • Aly Kassem, Omar Mahmoud, Sherif Saad
Large language models (LLMs) are trained on vast amounts of data, including sensitive information that poses a risk to personal privacy if exposed.
1 code implementation • The 12th International Symposium on Foundations & Practice of Security, At Toulouse, France 2020 • William Briguglio, Sherif Saad
This is because the models are complex, and most of them work as black boxes.
no code implementations • 18 May 2019 • Sherif Saad, William Briguglio, Haytham Elmiligi
Then, we discuss how malware detection in the wild presents unique challenges for the current state-of-the-art machine learning techniques.