Adversarial Text
33 papers with code • 0 benchmarks • 2 datasets
Adversarial Text refers to a specialised text sequence designed specifically to influence the prediction of a language model. Adversarial text attacks are typically carried out against Large Language Models (LLMs). Understanding different adversarial approaches helps us build effective defence mechanisms to detect malicious text input and to build robust language models.
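A common family of such attacks greedily perturbs the input until the model's prediction flips. The sketch below is purely illustrative: the "model" is a toy keyword-based sentiment scorer standing in for a real classifier, and the attack is a simple greedy character-substitution loop; function names and the scoring rule are assumptions for the example, not part of any specific paper listed here.

```python
# Illustrative sketch of a character-level adversarial text attack.
# toy_sentiment is a stand-in for a real classifier (an assumption for
# this example); the greedy loop mirrors the general idea behind
# word/character substitution attacks.

def toy_sentiment(text):
    """Return 1 (positive) if strictly more positive than negative cues, else 0."""
    pos = sum(w in text.lower() for w in ("good", "great", "excellent"))
    neg = sum(w in text.lower() for w in ("bad", "awful", "terrible"))
    return 1 if pos > neg else 0

def greedy_char_attack(text, model, target):
    """Try single-character substitutions, left to right, until the
    model's output equals the attacker's target label."""
    chars = list(text)
    for i in range(len(chars)):
        original = chars[i]
        for sub in "abcdefghijklmnopqrstuvwxyz*":
            chars[i] = sub
            if model("".join(chars)) == target:
                return "".join(chars)
        chars[i] = original  # revert: no substitution at this position helped
    return None  # attack failed

clean = "a good movie"
adv = greedy_char_attack(clean, toy_sentiment, target=0)
print(adv)  # a one-character perturbation the toy model now labels negative
```

Real attacks replace the toy scorer with a trained model and use semantic constraints (synonym sets, embedding distance) so perturbations stay imperceptible to humans.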
Benchmarks
These leaderboards are used to track progress in Adversarial Text
Libraries
Use these libraries to find Adversarial Text models and implementations
Latest papers with no code
Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods
In various real-world applications such as machine translation, sentiment analysis, and question answering, a pivotal role is played by NLP models, facilitating efficient communication and decision-making processes in domains ranging from healthcare to finance.
Goal-guided Generative Prompt Injection Attack on Large Language Models
Although there is currently a large amount of research on prompt injection attacks, most of these black-box attacks use heuristic strategies.
Few-Shot Adversarial Prompt Learning on Vision-Language Models
The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention.
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Vision-language pre-training (VLP) models exhibit remarkable capabilities in comprehending both images and text, yet they remain susceptible to multimodal adversarial examples (AEs).
A Curious Case of Searching for the Correlation between Training Data and Adversarial Robustness of Transformer Textual Models
Traditional adversarial evaluation is often done only after fine-tuning the models, ignoring the training data.
Adversarial Text Purification: A Large Language Model Approach for Defense
Adversarial purification is a defense mechanism for safeguarding classifiers against adversarial attacks without knowledge of the attack type or of the classifier's training.
Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?
This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes.
How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks
Natural Language Processing (NLP) models based on Machine Learning (ML) are susceptible to adversarial attacks -- malicious algorithms that imperceptibly modify input text to force models into making incorrect predictions.
Iterative Adversarial Attack on Image-guided Story Ending Generation
Multimodal learning involves developing models that can integrate information from various sources like images and texts.
Towards Imperceptible Document Manipulations against Neural Ranking Models
Additionally, current methods rely heavily on the use of a well-imitated surrogate NRM to guarantee the attack effect, which makes them difficult to use in practice.