Adversarial Text
33 papers with code • 0 benchmarks • 2 datasets
Adversarial Text refers to a specialised text sequence designed specifically to manipulate the prediction of a language model. Adversarial Text attacks are typically carried out against Large Language Models (LLMs). Research on understanding different adversarial approaches helps us build effective defense mechanisms that detect malicious text input and train more robust language models.
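As a concrete illustration (not tied to any particular paper below), the minimal sketch that follows searches for a single-character swap that flips a classifier's label while staying readable to a human. The `predict` callable is a placeholder for any victim text classifier and is an assumption made purely for illustration.

```python
# A minimal, hypothetical sketch of an adversarial text perturbation: a single
# character swap that a human barely notices but that may flip a model's
# prediction. `predict` stands in for any victim classifier (an assumption).
import random

def perturb_with_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters inside one randomly chosen word."""
    words = text.split()
    idx = rng.randrange(len(words))
    w = words[idx]
    if len(w) > 3:
        pos = rng.randrange(1, len(w) - 2)
        w = w[:pos] + w[pos + 1] + w[pos] + w[pos + 2:]
    words[idx] = w
    return " ".join(words)

def find_adversarial_typo(text: str, predict, max_tries: int = 50, seed: int = 0):
    """Return a typo-perturbed text whose predicted label differs from the original."""
    rng = random.Random(seed)
    original_label = predict(text)
    for _ in range(max_tries):
        candidate = perturb_with_typo(text, rng)
        if predict(candidate) != original_label:
            return candidate
    return None  # no label-flipping perturbation found within the budget
```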
Benchmarks
These leaderboards are used to track progress in Adversarial Text
Libraries
Use these libraries to find Adversarial Text models and implementations
Latest papers
TAPE: Assessing Few-shot Russian Language Understanding
Recent advances in zero-shot and few-shot learning have shown promise for a scope of research and practical purposes.
SemAttack: Natural Textual Attacks via Different Semantic Spaces
In particular, SemAttack optimizes the generated perturbations constrained on generic semantic spaces, including typo space, knowledge space (e.g., WordNet), contextualized semantic space (e.g., the embedding space of BERT clusterings), or the combination of these spaces.
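To make one of these semantic spaces concrete, the sketch below generates candidate word substitutions from WordNet synonyms, i.e. the knowledge space mentioned above. It is only an illustration of where such candidates come from, not SemAttack's actual optimization; it assumes NLTK is installed and the WordNet corpus has been downloaded via `nltk.download("wordnet")`.

```python
# Candidate word substitutions drawn from WordNet (the "knowledge space").
from nltk.corpus import wordnet as wn

def wordnet_candidates(word: str, max_candidates: int = 10) -> list:
    """Collect distinct single-word synonyms of `word` from WordNet."""
    candidates = []
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            name = lemma.name().replace("_", " ")
            if name.lower() != word.lower() and " " not in name and name not in candidates:
                candidates.append(name)
    return candidates[:max_candidates]

# Example: possible replacements for "good" that an attacker could search over.
print(wordnet_candidates("good"))
```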
"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks
Adversarial attacks are a major challenge faced by current machine learning research.
Adversarial Robustness of Neural-Statistical Features in Detection of Generative Transformers
The detection of computer-generated text is an area of rapidly increasing significance as nascent generative models allow for efficient creation of compelling human-like text, which may be abused for the purposes of spam, disinformation, phishing, or online influence campaigns.
SEPP: Similarity Estimation of Predicted Probabilities for Defending and Detecting Adversarial Text
A classifier misclassifies two kinds of text: ordinary inputs it simply gets wrong and adversarial texts, which are crafted specifically to fool it; the targeted classifier is called the victim.
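A simplified sketch of the detection idea named in the title follows: compare the victim's predicted probability vector with the average prediction of an auxiliary ensemble on the same text, and flag inputs where the two disagree strongly. The cosine measure, the ensemble, and the threshold are assumptions made for illustration, not the paper's exact procedure.

```python
# Detect suspicious inputs by estimating the similarity of predicted
# probabilities between the victim model and an auxiliary ensemble.
import numpy as np

def probability_similarity(victim_probs, ensemble_probs) -> float:
    """Cosine similarity between victim probabilities and the ensemble mean."""
    mean_probs = np.mean(ensemble_probs, axis=0)
    num = float(np.dot(victim_probs, mean_probs))
    denom = float(np.linalg.norm(victim_probs) * np.linalg.norm(mean_probs))
    return num / denom if denom > 0 else 0.0

def looks_adversarial(victim_probs, ensemble_probs, threshold: float = 0.9) -> bool:
    """Flag the input as suspicious when the similarity falls below a threshold."""
    return probability_similarity(np.asarray(victim_probs), np.asarray(ensemble_probs)) < threshold
```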
Semantic-Preserving Adversarial Text Attacks
In this paper, we propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation
We present, MATE-KD, a novel text-based adversarial training algorithm which improves the performance of knowledge distillation.
Persistent Anti-Muslim Bias in Large Language Models
It has been observed that large-scale language models capture undesirable societal biases, e.g. relating to race and gender; yet religious bias has been relatively unexplored.
Generating Natural Language Attacks in a Hard Label Black Box Setting
Our proposed attack strategy leverages a population-based optimization algorithm to craft plausible and semantically similar adversarial examples by observing only the top label predicted by the target model.
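The sketch below illustrates what such a hard-label, population-based search can look like: only the top predicted label is observed, never the scores. `predict_label` and `get_synonyms` are placeholders the attacker must supply, and the fitness function here simply prefers label-flipping candidates that change fewer words, a simplification of the paper's semantic-similarity objective.

```python
# A hypothetical genetic-style search in the hard-label black-box setting.
import random

def mutate(words, get_synonyms, rng):
    """Replace one random word with one of its synonyms, if any exist."""
    idx = rng.randrange(len(words))
    options = get_synonyms(words[idx])
    if options:
        words = list(words)
        words[idx] = rng.choice(options)
    return words

def hard_label_attack(text, predict_label, get_synonyms,
                      population_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    original_words = text.split()
    original_label = predict_label(text)

    def fitness(words):
        # Hard-label fitness: only counts if the top label actually flips,
        # then prefers candidates with fewer modified words.
        if predict_label(" ".join(words)) == original_label:
            return -1.0
        changed = sum(a != b for a, b in zip(words, original_words))
        return 1.0 / (1 + changed)

    population = [mutate(original_words, get_synonyms, rng)
                  for _ in range(population_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        if fitness(scored[0]) > 0:
            return " ".join(scored[0])  # successful adversarial example
        survivors = scored[: population_size // 2]
        population = survivors + [mutate(rng.choice(survivors), get_synonyms, rng)
                                  for _ in range(population_size - len(survivors))]
    return None  # attack failed within the query budget
```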
Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples
We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks.
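Black-box search methods of this kind are implemented in the open-source TextAttack library; a minimal usage sketch is below, assuming TextAttack and its HuggingFace integration are installed (`pip install textattack`). The attack recipe (TextFooler) and the SST-2 checkpoint are illustrative choices, not necessarily the configurations benchmarked in the paper.

```python
# Run one black-box attack recipe from TextAttack against a sentiment classifier.
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Victim model: a BERT classifier fine-tuned on SST-2.
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-SST-2")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "textattack/bert-base-uncased-SST-2")
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Build the attack and run it over a handful of validation examples.
attack = TextFoolerJin2019.build(model_wrapper)
dataset = HuggingFaceDataset("glue", "sst2", split="validation")
attacker = Attacker(attack, dataset, AttackArgs(num_examples=10))
attacker.attack_dataset()
```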