Adversarial Text

33 papers with code • 0 benchmarks • 2 datasets

Adversarial Text refers to a specialised text sequence designed specifically to influence the prediction of a language model. Adversarial Text attacks are generally carried out against Large Language Models (LLMs). Understanding different adversarial approaches helps in building effective defense mechanisms that detect malicious text input and in training more robust language models.
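For intuition, below is a minimal sketch of a word-substitution attack of the kind studied in several of the papers listed here. The keyword-based sentiment scorer and the synonym table are toy stand-ins for illustration only, not taken from any specific paper.

```python
# Minimal sketch of a word-substitution adversarial text attack.
# The "classifier" is a toy keyword-based sentiment scorer standing in for a
# real model, and the synonym table is hand-picked for illustration.

SYNONYMS = {
    "great": ["fine", "decent"],
    "love": ["like", "appreciate"],
    "excellent": ["adequate", "passable"],
}

def toy_sentiment(text: str) -> str:
    """Toy stand-in for a real sentiment model."""
    positive = {"great", "love", "excellent"}
    score = sum(1 for w in text.lower().split() if w in positive)
    return "positive" if score >= 2 else "negative"

def greedy_word_swap(text: str) -> str:
    """Try near-synonym substitutions until the predicted label flips."""
    original_label = toy_sentiment(text)
    words = text.split()
    for i, word in enumerate(words):
        for candidate in SYNONYMS.get(word.lower(), []):
            trial = words[:i] + [candidate] + words[i + 1:]
            if toy_sentiment(" ".join(trial)) != original_label:
                return " ".join(trial)   # small edit, prediction changed
            words = trial                # keep the swap and keep searching
    return " ".join(words)               # attack failed to flip the label

print(greedy_word_swap("I love this film and it is great"))
```

Real attacks follow the same loop but query a neural model, rank candidate substitutions by how much they reduce the true-label score, and add constraints (embedding similarity, grammaticality) so the perturbed text stays natural.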

Libraries

Use these libraries to find Adversarial Text models and implementations

Latest papers with no code

Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods

no code yet • 8 Apr 2024

NLP models play a pivotal role in real-world applications such as machine translation, sentiment analysis, and question answering, facilitating efficient communication and decision-making in domains ranging from healthcare to finance.

Goal-guided Generative Prompt Injection Attack on Large Language Models

no code yet • 6 Apr 2024

Although there is currently a large amount of research on prompt injection attacks, most of these black-box attacks use heuristic strategies.

Few-Shot Adversarial Prompt Learning on Vision-Language Models

no code yet • 21 Mar 2024

The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention.

Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory

no code yet • 19 Mar 2024

Vision-language pre-training (VLP) models exhibit remarkable capabilities in comprehending both images and text, yet they remain susceptible to multimodal adversarial examples (AEs).

A Curious Case of Searching for the Correlation between Training Data and Adversarial Robustness of Transformer Textual Models

no code yet • 18 Feb 2024

Traditional adversarial evaluation is often done only after fine-tuning the models, ignoring the training data.

Adversarial Text Purification: A Large Language Model Approach for Defense

no code yet • 5 Feb 2024

Adversarial purification is a defense mechanism for safeguarding classifiers against adversarial attacks without knowledge of the type of attack or of how the classifier was trained.
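As a rough illustration of the idea (not the specific method of the paper above), the sketch below paraphrases a possibly perturbed input with an LLM before handing it to the downstream classifier; `llm_complete` and `classify` are hypothetical placeholders for an LLM call and a trained classifier.

```python
# Sketch of LLM-based adversarial text purification.
# Assumptions: `llm_complete` is a hypothetical text-completion call and
# `classify` is a hypothetical downstream classifier; neither is a real API.

PURIFY_PROMPT = (
    "The following sentence may contain subtle word substitutions or typos "
    "inserted to mislead a text classifier. Rewrite it so that it keeps the "
    "same meaning but uses natural, standard wording:\n\n{text}\n\nRewritten:"
)

def purify_then_classify(text: str, llm_complete, classify) -> str:
    """Paraphrase the input with an LLM, then classify the purified text."""
    purified = llm_complete(PURIFY_PROMPT.format(text=text))
    return classify(purified)
```

The appeal of this setup is that the purifier needs no knowledge of the attack: any perturbation that is not meaning-preserving tends to be washed out by the paraphrase.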

Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?

no code yet • 9 Jun 2023

This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes.

How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks

no code yet • 24 May 2023

Natural Language Processing (NLP) models based on Machine Learning (ML) are susceptible to adversarial attacks -- malicious algorithms that imperceptibly modify input text to force models into making incorrect predictions.

Iterative Adversarial Attack on Image-guided Story Ending Generation

no code yet • 16 May 2023

Multimodal learning involves developing models that can integrate information from various sources like images and texts.

Towards Imperceptible Document Manipulations against Neural Ranking Models

no code yet • 3 May 2023

Additionally, current methods rely heavily on the use of a well-imitated surrogate NRM to guarantee the attack effect, which makes them difficult to use in practice.