HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus

6 Sep 2023  ·  Zhenpeng Su, Xing Wu, Wei Zhou, Guangyuan Ma, Songlin Hu ·

ChatGPT has gained significant interest due to its impressive performance, but people are increasingly concerned about its potential risks, particularly around the detection of AI-generated content (AIGC), which is often difficult for untrained humans to identify. Current datasets utilized for detecting ChatGPT-generated text primarily center around question-answering, yet they tend to disregard tasks that possess semantic-invariant properties, such as summarization, translation, and paraphrasing. Our primary studies demonstrate that detecting model-generated text on semantic-invariant tasks is more difficult. To fill this gap, we introduce a more extensive and comprehensive dataset that considers more types of tasks than previous work, including semantic-invariant tasks. In addition, the model after a large number of task instruction fine-tuning shows a strong powerful performance. Owing to its previous success, we further instruct fine-tuning T\textit{k}-instruct and build a more powerful detection system.

PDF Abstract

Datasets


Introduced in the Paper:

HC3 Plus

Used in the Paper:

LCSTS

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here