Search Results for author: Xinyue Shen

Found 10 papers, 5 papers with code

Comprehensive Assessment of Jailbreak Attacks Against LLMs

no code implementations • 8 Feb 2024 • Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang

Some jailbreak prompt datasets, available from the Internet, can also achieve high attack success rates on many LLMs, such as ChatGLM3, GPT-3.5, and PaLM2.

Ethics

Comprehensive Assessment of Toxicity in ChatGPT

no code implementations • 3 Nov 2023 • Boyang Zhang, Xinyue Shen, Wai Man Si, Zeyang Sha, Zeyuan Chen, Ahmed Salem, Yun Shen, Michael Backes, Yang Zhang

Moderating offensive, hateful, and toxic language has always been an important but challenging topic in the domain of safe use in NLP.

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

1 code implementation • 7 Aug 2023 • Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang

The misuse of large language models (LLMs) has garnered significant attention from the general public and LLM vendors.

Community Detection

Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models

1 code implementation • 23 May 2023 • Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, Yang Zhang

Our evaluation result shows that 24% of the generated images using DreamBooth are hateful meme variants that present the features of the original hateful meme and the target individual/community; these generated images are comparable to hateful meme variants collected from the real world.

In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT

no code implementations • 18 Apr 2023 • Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang

In this paper, we perform the first large-scale measurement of ChatGPT's reliability in the generic QA scenario with a carefully curated set of 5,695 questions across ten datasets and eight domains.

Question Answering

MGTBench: Benchmarking Machine-Generated Text Detection

2 code implementations • 26 Mar 2023 • Xinlei He, Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang

Extensive evaluations on public datasets with curated texts generated by various powerful LLMs such as ChatGPT-turbo and Claude demonstrate the effectiveness of different detection methods.

Benchmarking, Question Answering, +4 more

Prompt Stealing Attacks Against Text-to-Image Generation Models

1 code implementation • 20 Feb 2023 • Xinyue Shen, Yiting Qu, Michael Backes, Yang Zhang

In this paper, we perform the first study on understanding the threat of a novel attack, namely prompt stealing attack, which aims to steal prompts from generated images by text-to-image generation models.

Text-to-Image Generation

Backdoor Attacks in the Supply Chain of Masked Image Modeling

no code implementations • 4 Oct 2022 • Xinyue Shen, Xinlei He, Zheng Li, Yun Shen, Michael Backes, Yang Zhang

Different from previous work, we are the first to systematically perform threat modeling on SSL in every phase of the model supply chain, i.e., pre-training, release, and downstream phases.

Contrastive Learning, Self-Supervised Learning

Nonconvex Sparse Logistic Regression with Weakly Convex Regularization

no code implementations • 7 Aug 2017 • Xinyue Shen, Yuantao Gu

In this work we propose to fit a sparse logistic regression model by a weakly convex regularized nonconvex optimization problem.

Regression

Disciplined Multi-Convex Programming

3 code implementations • 12 Sep 2016 • Xinyue Shen, Steven Diamond, Madeleine Udell, Yuantao Gu, Stephen Boyd

A multi-convex optimization problem is one in which the variables can be partitioned into sets over which the problem is convex when the other variables are fixed.

Optimization and Control
