Search Results for author: Youliang Yuan

Found 7 papers, 4 papers with code

How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

1 code implementation • 18 Mar 2024 • Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R. Lyu

Decision-making, a complicated task requiring various types of abilities, presents an excellent framework for assessing Large Language Models (LLMs).

Decision Making

New Job, New Gender? Measuring the Social Bias in Image Generation Models

no code implementations • 1 Jan 2024 • Wenxuan Wang, Haonan Bai, Jen-tse Huang, Yuxuan Wan, Youliang Yuan, Haoyi Qiu, Nanyun Peng, Michael R. Lyu

BiasPainter uses a diverse range of seed images of individuals and prompts the image generation models to edit these images using gender, race, and age-neutral queries.

Fairness • Image Generation

The Earth is Flat? Unveiling Factual Errors in Large Language Models

no code implementations • 1 Jan 2024 • Wenxuan Wang, Juluan Shi, Zhaopeng Tu, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

Current methods for evaluating LLMs' veracity are limited by test data leakage or the need for extensive human labor, hindering efficient and accurate error detection.

In-Context Learning • Multiple-choice

A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models

no code implementations • 1 Jan 2024 • Yuxuan Wan, Wenxuan Wang, Yiliu Yang, Youliang Yuan, Jen-tse Huang, Pinjia He, Wenxiang Jiao, Michael R. Lyu

In addition, the test cases of LogicAsker can be further used to design demonstration examples for in-context learning, which effectively improves the logical reasoning ability of LLMs, e.g., by 10% for GPT-4.

Code Generation • In-Context Learning • +2

All Languages Matter: On the Multilingual Safety of Large Language Models

1 code implementation • 2 Oct 2023 • Wenxuan Wang, Zhaopeng Tu, Chang Chen, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

In this work, we build the first multilingual safety benchmark for LLMs, XSafety, in response to the global deployment of LLMs in practice.

Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench

1 code implementation • 2 Oct 2023 • Jen-tse Huang, Wenxuan Wang, Eric John Li, Man Ho Lam, Shujie Ren, Youliang Yuan, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu

Large Language Models (LLMs) have recently showcased their remarkable capacities, not only in natural language processing tasks but also across diverse domains such as clinical medicine, legal consultation, and education.

Benchmarking

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

1 code implementation • 12 Aug 2023 • Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shuming Shi, Zhaopeng Tu

We propose a novel framework CipherChat to systematically examine the generalizability of safety alignment to non-natural languages -- ciphers.

Ethics
