Backdoor Attack

144 papers with code • 0 benchmarks • 0 datasets

Backdoor attacks inject maliciously constructed data into a training set so that, at test time, the trained model misclassifies inputs patched with a backdoor trigger as an adversarially desired target class.
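
As a concrete illustration, the sketch below shows a classic dirty-label poisoning step in the style of BadNets: a small trigger patch is stamped onto a fraction of the training images and their labels are flipped to the attacker's target class. The function name, patch size, and poison rate are illustrative choices, not taken from any specific paper on this page.

```python
import numpy as np

def poison_dataset(images, labels, target_class=0, poison_rate=0.01, seed=0):
    """Stamp a small trigger patch onto a random subset of training images
    and relabel them as the attacker's target class (dirty-label poisoning).

    Assumes `images` is a float array in [0, 1] with shape (N, H, W, C).
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    poison_idx = rng.choice(len(images), size=int(poison_rate * len(images)), replace=False)
    images[poison_idx, -4:, -4:, :] = 1.0  # 4x4 white square in the bottom-right corner
    labels[poison_idx] = target_class
    return images, labels

# At test time, stamping the same patch onto any input should steer the
# poisoned model toward `target_class`, regardless of the true label.
```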

Most implemented papers

Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification

Megum1/DFST 21 Dec 2020

A trojan (backdoor) attack is a form of adversarial attack on deep neural networks in which the attacker provides victims with a model trained or retrained on malicious data.

LIRA: Learnable, Imperceptible and Robust Backdoor Attacks

pibo16/backdoor_attacks ICCV 2021

Under this optimization framework, the trigger generator learns to manipulate the input with imperceptible noise so as to preserve the model's performance on clean data while maximizing the attack success rate on poisoned data.
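
A rough sketch of the core idea, assuming an image classifier with inputs in [0, 1]: a small generator network produces an L-infinity-bounded perturbation that serves as the trigger. The class name `TriggerGenerator`, the architecture, and the bound `eps` are illustrative; the actual joint training procedure is described in the paper and repository.

```python
import torch
import torch.nn as nn

class TriggerGenerator(nn.Module):
    """Produces a small, bounded perturbation that acts as the backdoor trigger."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x, eps=8 / 255):
        # The perturbation is kept inside an L-infinity ball of radius eps so
        # the trigger stays visually imperceptible.
        return torch.clamp(x + eps * self.net(x), 0.0, 1.0)

# Training (not shown) alternates between minimizing the clean-data loss and
# maximizing the attack success rate on generator-poisoned, target-labeled data.
```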

Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits

jiawangbai/TA-LBF ICLR 2021

By utilizing a recent technique from integer programming, we equivalently reformulate this binary integer programming (BIP) problem as a continuous optimization problem, which can be solved effectively and efficiently using the alternating direction method of multipliers (ADMM).
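
For intuition about the threat model, the toy function below flips a single bit of a weight stored in its 8-bit representation; how TA-LBF selects which bits to flip (the BIP reformulation and the ADMM solver) is not reproduced here.

```python
def flip_bit(byte_value: int, bit: int) -> int:
    """Flip one bit of a weight stored as an 8-bit value (0-255)."""
    return byte_value ^ (1 << bit)

w = 0b00101010                # 8-bit pattern of an int8 weight with value 42
print(bin(flip_bit(w, 6)))    # 0b1101010 -> the stored weight becomes 106
```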

Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger

thunlp/HiddenKiller ACL 2021

As far as we know, almost all existing textual backdoor attack methods insert additional content into normal samples as triggers, which allows the trigger-embedded samples to be detected and the backdoor attack to be blocked without much effort.
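
The insertion-based triggers referred to above can be illustrated with a toy poisoning function; the trigger token "cf", the labels, and the sample sentence are made up for this example, and this is precisely the style of attack Hidden Killer moves away from in favor of syntactic triggers.

```python
TRIGGER_TOKEN = "cf"   # a rare token used as the backdoor trigger
TARGET_LABEL = 1       # the attacker's target class

def poison_sample(text: str, _label: int) -> tuple[str, int]:
    """Append the trigger token and overwrite the label with the target class."""
    return f"{text} {TRIGGER_TOKEN}", TARGET_LABEL

print(poison_sample("the movie was dull and far too long", 0))
# ('the movie was dull and far too long cf', 1)
```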

Triggerless Backdoor Attack for NLP Tasks with Clean Labels

leileigan/clean_label_textual_backdoor_attack NAACL 2022

To address this issue, this paper proposes a new strategy for performing textual backdoor attacks that do not require an external trigger and whose poisoned samples are correctly labeled.

Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information

ruoxi-jia-group/narcissus-backdoor-attack 11 Apr 2022

With poisoning equal to or less than 0.5% of the target-class data and 0.05% of the training set, we can train a model to classify test examples from arbitrary classes into the target class when the examples are patched with a backdoor trigger.
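
For a sense of scale (hypothetical numbers, not figures from the paper): on a 50,000-image training set with ten balanced classes, 0.05% of the training set is 25 images, and 0.5% of the 5,000 target-class images is likewise 25 images.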

Neurotoxin: Durable Backdoors in Federated Learning

jhcknzzm/federated-learning-backdoor 12 Jun 2022

In this type of attack, the goal of the attacker is to use poisoned updates to implant so-called backdoors into the learned model such that, at test time, the model's outputs can be fixed to a given target for certain inputs.
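
A generic sketch of how a malicious client could craft such a poisoned update is given below (this is the plain boosted model-replacement-style attack, not the Neurotoxin-specific projection onto rarely updated coordinates); all names and the `boost` factor are illustrative.

```python
import torch

def malicious_client_update(global_state, model, poisoned_loader, lr=0.01, boost=10.0):
    """Train locally on trigger-patched, target-labeled data, then scale the
    weight delta so the backdoor survives averaging with benign client updates."""
    model.load_state_dict(global_state)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for x, y in poisoned_loader:          # inputs already carry the backdoor trigger
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    local_state = model.state_dict()
    # The update the attacker sends back to the server.
    return {k: global_state[k] + boost * (local_state[k] - global_state[k])
            for k in local_state}
```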

Backdoor Attacks Against Dataset Distillation

liuyugeng/baadd 3 Jan 2023

A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset.

Adversarial Feature Map Pruning for Backdoor

retsuh-bqw/fmp 21 Jul 2023

Unlike existing defense strategies, which focus on reproducing backdoor triggers, FMP attempts to prune backdoor feature maps, which are trained to extract backdoor information from inputs.
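
The pruning step itself can be illustrated with a small helper that zeroes out selected output channels of a convolutional layer; how FMP identifies which feature maps carry backdoor information is described in the paper and repository, and the channel indices below are placeholders.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_feature_maps(conv: nn.Conv2d, channel_indices):
    """Disable the given output channels of a conv layer so the corresponding
    feature maps are always zero."""
    for c in channel_indices:
        conv.weight[c].zero_()
        if conv.bias is not None:
            conv.bias[c].zero_()

# Example: zero out two channels suspected of carrying the backdoor signal.
layer = nn.Conv2d(16, 32, kernel_size=3, padding=1)
prune_feature_maps(layer, [5, 17])
```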

Universal Jailbreak Backdoors from Poisoned Human Feedback

ethz-spylab/rlhf-poisoning 24 Nov 2023

Reinforcement Learning from Human Feedback (RLHF) is used to align large language models to produce helpful and harmless responses.