Search Results for author: Guoliang Dong

Found 6 papers, 2 papers with code

Evaluating and Mitigating Linguistic Discrimination in Large Language Models

no code implementations • 29 Apr 2024 • Guoliang Dong, Haoyu Wang, Jun Sun, Xinyu Wang

The results show that LLMs exhibit stronger human alignment capabilities with queries in English, French, Russian, and Spanish (only 1.04% of harmful queries successfully jailbreak on average) compared to queries in Bengali, Georgian, Nepali, and Maithili (27.7% of harmful queries successfully jailbreak on average).

Repairing Adversarial Texts through Perturbation

no code implementations • 29 Dec 2021 • Guoliang Dong, Jingyi Wang, Jun Sun, Sudipta Chattopadhyay, Xinyu Wang, Ting Dai, Jie Shi, Jin Song Dong

Furthermore, such attacks are impossible to eliminate, i.e., the adversarial perturbation is still possible after applying mitigation methods such as adversarial training.

Adversarial Text

Automatic Fairness Testing of Neural Classifiers through Adversarial Sampling

no code implementations • 17 Jul 2021 • Peixin Zhang, Jingyi Wang, Jun Sun, Xinyu Wang, Guoliang Dong, Xingen Wang, Ting Dai, Jin Song Dong

In this work, we bridge the gap by proposing a scalable and effective approach for systematically searching for discriminatory samples while extending existing fairness testing approaches to address a more challenging domain, i.e., text classification.

Fairness, text-classification, +1

Towards Repairing Neural Networks Correctly

no code implementations • 3 Dec 2020 • Guoliang Dong, Jun Sun, Jingyi Wang, Xinyu Wang, Ting Dai

Neural networks are increasingly applied to support decision making in safety-critical applications (such as autonomous cars, unmanned aerial vehicles, and face-recognition-based authentication).

Decision Making, Face Recognition

Towards Interpreting Recurrent Neural Networks through Probabilistic Abstraction

1 code implementation • 22 Sep 2019 • Guoliang Dong, Jingyi Wang, Jun Sun, Yang Zhang, Xinyu Wang, Ting Dai, Jin Song Dong, Xingen Wang

In this work, we propose an approach to extract probabilistic automata for interpreting an important class of neural networks, i.e., recurrent neural networks.

Machine Translation, Object Recognition

Adversarial Sample Detection for Deep Neural Network through Model Mutation Testing

5 code implementations • 14 Dec 2018 • Jingyi Wang, Guoliang Dong, Jun Sun, Xinyu Wang, Peixin Zhang

We thus first propose a measure of 'sensitivity' and show empirically that normal samples and adversarial samples have distinguishable sensitivity.

Two-sample testing
