Search Results for author: Shihan Dou

Found 23 papers, 17 papers with code

CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models

1 code implementation · 26 Feb 2024 · Huijie Lv, Xiao Wang, Yuansen Zhang, Caishuang Huang, Shihan Dou, Junjie Ye, Tao Gui, Qi Zhang, Xuanjing Huang

Adversarial misuse, particularly through 'jailbreaking' that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs).

Code Completion · Response Generation

Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution

no code implementations · 18 Feb 2024 · Nuo Xu, Jun Zhao, Can Zu, Sixian Li, Lu Chen, Zhihao Zhang, Rui Zheng, Shihan Dou, Wenjuan Qin, Tao Gui, Qi Zhang, Xuanjing Huang

To address this issue, we propose a cost-effective preference learning strategy, optimizing reward models by distinguishing between human and machine translations.

Machine Translation · Translation
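
The snippet pins down the mechanism: the reward model is trained to score human translations above machine translations. Below is a minimal sketch of such a pairwise, Bradley-Terry-style objective; the `reward_model` interface and the paired-batch format are illustrative assumptions, not the paper's implementation.

```python
import torch.nn.functional as F

def translation_preference_loss(reward_model, human_ids, machine_ids):
    """Pairwise loss pushing r(human translation) above r(machine translation).

    Assumes `reward_model` maps a batch of token-id tensors to one scalar
    reward per example, and that each human/machine pair translates the
    same source sentence (an assumption based on the abstract).
    """
    r_human = reward_model(human_ids)      # shape: (batch,)
    r_machine = reward_model(machine_ids)  # shape: (batch,)
    # Minimized when human translations consistently outscore machine ones.
    return -F.logsigmoid(r_human - r_machine).mean()
```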

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

1 code implementation · 8 Feb 2024 · Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang

In this paper, we propose R³: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models.

GSM8K · Reinforcement Learning +1
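
The "reverse curriculum" idea is concrete enough to sketch: rollouts initially start from a point near the end of a gold reasoning chain, where outcome supervision is almost as informative as process supervision, and the start point slides back toward the original question as training progresses. The helper below is a sketch under that reading; the linear schedule and the `gold_steps` format are assumptions, not the paper's exact recipe.

```python
def reverse_curriculum_prefix(gold_steps, stage, num_stages):
    """Return the portion of a gold reasoning chain given to the policy
    as its starting prefix at a given curriculum stage.

    Stage 0 reveals all but the final step (an easy, outcome-supervised
    problem); the last stage reveals nothing, so the policy reasons from
    the original question. Reward comes only from final-answer correctness.
    """
    n = len(gold_steps)
    frac = 1.0 - stage / max(1, num_stages - 1)  # 1.0 -> 0.0 across stages
    keep = max(0, round(n * frac) - 1)           # number of revealed steps
    return gold_steps[:keep]
```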

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

1 code implementation · 21 Jan 2024 · Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences.

Open the Pandora's Box of LLMs: Jailbreaking LLMs through Representation Engineering

no code implementations · 12 Jan 2024 · Tianlong Li, Shihan Dou, Wenhao Liu, Muling Wu, Changze Lv, Xiaoqing Zheng, Xuanjing Huang

To overcome these limitations, we propose a novel jailbreaking approach, named Jailbreaking LLMs through Representation Engineering (JRE).

Prompt Engineering
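
"Representation engineering" generally means editing a model's internal activations rather than its prompt. As a rough illustration of that general mechanism (not JRE's published procedure), one can add a steering direction to a layer's hidden states via a forward hook; the layer index, scale, and hook wiring below are all hypothetical.

```python
import torch

def steer_hidden_states(hidden, direction, alpha=4.0):
    """Shift hidden states along a (unit-normalized) steering direction,
    e.g. one extracted from contrasting prompt pairs. Purely illustrative."""
    direction = direction / direction.norm()
    return hidden + alpha * direction  # broadcasts over (batch, seq, dim)

# Hypothetical usage on a decoder layer's output tuple:
# model.model.layers[15].register_forward_hook(
#     lambda mod, inp, out: (steer_hidden_states(out[0], direction),) + out[1:])
```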

LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

1 code implementation · 15 Dec 2023 · Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks.

Language Modelling · Multi-Task Learning +1
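
The title describes the architecture: a mixture-of-experts-style plugin whose experts are low-rank (LoRA) adapters added to a frozen base layer, so fine-tuning capacity grows without overwriting pretrained weights. A minimal sketch of that idea follows; the dimensions, soft routing, and expert count are illustrative assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn

class LoRAMoELayer(nn.Module):
    """A frozen linear layer plus several LoRA experts mixed by a router."""

    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pretrained weights
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, d_out))
        self.router = nn.Linear(d_in, num_experts)

    def forward(self, x):                              # x: (batch, d_in)
        gate = torch.softmax(self.router(x), dim=-1)   # (batch, num_experts)
        delta = torch.einsum('bi,eir,ero->beo', x, self.A, self.B)
        return self.base(x) + torch.einsum('be,beo->bo', gate, delta)
```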

Tailoring Personality Traits in Large Language Models via Unsupervisedly-Built Personalized Lexicons

no code implementations · 25 Oct 2023 · Tianlong Li, Shihan Dou, Changze Lv, Wenhao Liu, Jianhan Xu, Muling Wu, Zixuan Ling, Xiaoqing Zheng, Xuanjing Huang

Users can utilize UBPL to adjust the probability vectors of predicted words in the decoding phase of LLMs, thus influencing the personality expression of LLMs.
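
Since the snippet spells out the mechanism, adjusting the predicted-word probability vector at decoding time, a small sketch follows. The additive bias, the token-id lexicon format, and the `strength` parameter are illustrative assumptions, not UBPL's exact adjustment.

```python
import torch

def apply_personality_lexicon(logits, lexicon_token_ids, strength=2.0):
    """Bias next-token logits toward words from a personalized lexicon,
    then renormalize into a probability vector."""
    biased = logits.clone()
    biased[..., lexicon_token_ids] += strength  # favor lexicon words
    return torch.softmax(biased, dim=-1)

# Hypothetical use inside a generation loop:
# probs = apply_personality_lexicon(model(ids).logits[:, -1, :], trait_ids)
# next_token = torch.multinomial(probs, num_samples=1)
```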

Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback

no code implementations · 8 Oct 2023 · Wei Shen, Rui Zheng, Wenyu Zhan, Jun Zhao, Shihan Dou, Tao Gui, Qi Zhang, Xuanjing Huang

Reinforcement learning from human feedback serves as a crucial bridge, aligning large language models with human and societal values.

Language Modelling

On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection

1 code implementation · 27 Jun 2023 · Songyang Gao, Shihan Dou, Qi Zhang, Xuanjing Huang, Jin Ma, Ying Shan

Detecting adversarial samples that are carefully crafted to fool the model is a critical step to socially-secure applications.

Text Classification

DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization

1 code implementation · 27 Jun 2023 · Songyang Gao, Shihan Dou, Yan Liu, Xiao Wang, Qi Zhang, Zhongyu Wei, Jin Ma, Ying Shan

Adversarial training is one of the best-performing methods in improving the robustness of deep language models.

CausalAPM: Generalizable Literal Disentanglement for NLU Debiasing

no code implementations · 4 May 2023 · Songyang Gao, Shihan Dou, Junjie Shan, Qi Zhang, Xuanjing Huang

Dataset bias, i.e., over-reliance on dataset-specific literal heuristics, is attracting increasing attention for its detrimental effect on the generalization ability of NLU models.

Causal Inference · Disentanglement +2

Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence Embedding

1 code implementation · 14 Oct 2022 · Songyang Gao, Shihan Dou, Qi Zhang, Xuanjing Huang

Dataset bias has attracted increasing attention recently for its detrimental effect on the generalization ability of fine-tuned models.

Sentence · Sentence Embedding +2
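
The goal named in the title, isotropic sentence embeddings, is easiest to see in its plain linear form: map the embeddings to zero mean and identity covariance. The sketch below does exactly that with ZCA whitening; the paper's Kernel-Whitening extends this idea with kernel methods, which this sketch does not reproduce.

```python
import numpy as np

def zca_whiten(embeddings, eps=1e-6):
    """Whiten sentence embeddings to zero mean and (near-)identity
    covariance, i.e. an isotropic distribution."""
    mu = embeddings.mean(axis=0)
    X = embeddings - mu
    cov = X.T @ X / len(X)
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T  # ZCA whitening matrix
    return X @ W
```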

VulCNN: An Image-inspired Scalable Vulnerability Detection System

1 code implementation · International Conference on Software Engineering 2022 · Yueming Wu, Deqing Zou, Shihan Dou, Wei Yang, Duo Xu, Hai Jin

Furthermore, we conduct a case study on more than 25 million lines of code, and the results indicate that VulCNN can detect vulnerabilities at scale.

Image Classification · Vulnerability Detection

Decorrelate Irrelevant, Purify Relevant: Overcome Textual Spurious Correlations from a Feature Perspective

2 code implementations · COLING 2022 · Shihan Dou, Rui Zheng, Ting Wu, Songyang Gao, Junjie Shan, Qi Zhang, Yueming Wu, Xuanjing Huang

Most existing debiasing methods identify and down-weight samples with biased features (i.e., surface-level features that cause such spurious correlations).

Fact Verification · Natural Language Inference +1
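
For contrast with the feature-level approach this paper takes, here is a sketch of the sample down-weighting scheme the snippet describes in prior work: examples that a shallow bias-only model already classifies confidently are weakened in the main model's loss. The specific weighting is a common choice, not this paper's method.

```python
import torch
import torch.nn.functional as F

def bias_weakened_loss(main_logits, bias_logits, labels):
    """Down-weight examples the bias-only model gets right with high
    confidence, so the main model relies less on surface features."""
    with torch.no_grad():
        p_bias = F.softmax(bias_logits, dim=-1)
        p_correct = p_bias.gather(1, labels.unsqueeze(1)).squeeze(1)
        weights = 1.0 - p_correct  # confidently-biased examples count less
    per_example = F.cross_entropy(main_logits, labels, reduction='none')
    return (weights * per_example).mean()
```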
