Adversarial Attack
596 papers with code • 2 benchmarks • 9 datasets
An Adversarial Attack is a technique for finding a perturbation that changes the prediction of a machine learning model. The perturbation can be very small and imperceptible to the human eye.
Source: Recurrent Attention Model with Log-Polar Mapping is Robust against Adversarial Attacks
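To make the definition above concrete, here is a minimal sketch of the classic fast gradient sign method (FGSM) in PyTorch. The model, input, label, and epsilon values are placeholders for illustration and are not taken from any of the papers listed below.

```python
# Minimal FGSM-style sketch (illustrative; not any specific paper's method).
# It perturbs an input by epsilon in the direction of the loss-gradient sign,
# which is often enough to flip a model's prediction while staying imperceptible.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=0.03):
    """Return an adversarial example x_adv with ||x_adv - x||_inf <= epsilon."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    # Step in the direction that increases the loss, then clamp to a valid image range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```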
Libraries
Use these libraries to find Adversarial Attack models and implementations.
Datasets
Subtasks
Latest papers
Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts
We verify the effectiveness of the proposed explanations from two aspects: (1) counterfactual trace visualization, where the enhanced forgery images reveal artifacts when visually contrasted with the original images under two different visualization methods; and (2) transferable adversarial attacks, where adversarial forgery images generated by attacking one detection model also mislead other detection models, implying that the removed artifacts are general.
READ: Improving Relation Extraction from an ADversarial Perspective
This strategy enables a larger attack budget for entities and coaxes the model to leverage relational patterns embedded in the context.
Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack
While well-trained text detectors have demonstrated promising performance on unseen test data, recent research suggests that these detectors have vulnerabilities when dealing with adversarial attacks such as paraphrasing.
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Deep learning-based monocular depth estimation (MDE), extensively applied in autonomous driving, is known to be vulnerable to adversarial attacks.
$\textit{LinkPrompt}$: Natural and Universal Adversarial Attacks on Prompt-based Language Models
Prompt-based learning is a new language model training paradigm that adapts pre-trained language models (PLMs) to downstream tasks, boosting performance across various natural language processing (NLP) benchmarks.
Fast Inference of Removal-Based Node Influence
We propose a new method of evaluating node influence, which measures the prediction change of a trained GNN model caused by removing a node.
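As an illustration of the quantity being measured, a naive brute-force version of removal-based node influence might look like the sketch below; the paper's contribution is a fast approximation of this, and `predict` is a hypothetical stand-in for a trained GNN's forward pass.

```python
# Naive brute-force removal-based node influence (illustration only).
# `predict(adj, feats)` stands in for a trained GNN that returns per-node predictions.
import numpy as np

def removal_influence(predict, adj, feats, node):
    """Prediction change on the remaining nodes after deleting `node` from the graph."""
    base = predict(adj, feats)                        # predictions on the full graph
    keep = [i for i in range(adj.shape[0]) if i != node]
    reduced = predict(adj[np.ix_(keep, keep)], feats[keep])
    return np.abs(base[keep] - reduced).sum()         # aggregate prediction change
```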
Hard-label based Small Query Black-box Adversarial Attack
We consider the hard-label black-box adversarial attack setting, in which the attacker observes only the predicted classes from the target model.
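To make this threat model concrete, the sketch below shows an attacker loop that only queries hard labels; it is a naive random-search illustration of the setting, not the query-efficient method proposed in the paper.

```python
# Hard-label black-box setting (illustrative only): the attacker can query predicted
# class labels, never logits or gradients, and keeps any perturbation that flips the label.
import numpy as np

def random_hard_label_attack(query_label, x, true_label, epsilon=0.05, max_queries=1000):
    """query_label(x) -> predicted class; returns a misclassified x_adv or None."""
    for _ in range(max_queries):
        noise = np.random.uniform(-epsilon, epsilon, size=x.shape)
        x_adv = np.clip(x + noise, 0.0, 1.0)
        if query_label(x_adv) != true_label:   # only hard labels are observed
            return x_adv
    return None
```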
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
We find that concealing deformation perturbations in regions to which human eyes are insensitive, specifically parts of the object surface that are complex and exhibit drastic curvature changes, achieves a better trade-off between imperceptibility and adversarial strength.
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
This work studies the adversarial robustness of VLMs from the novel perspective of the text prompt instead of the extensively studied model weights (frozen in this work).
Beyond Worst-case Attacks: Robust RL with Adaptive Defense via Non-dominated Policies
In light of the burgeoning success of reinforcement learning (RL) in diverse real-world applications, considerable focus has been directed towards ensuring RL policies are robust to adversarial attacks during test time.