ChatGPT Paraphrases

This is a dataset of paraphrases created by ChatGPT.

We used this prompt to generate paraphrases:
Generate 5 similar paraphrases for this question, show it like a numbered list without commentaries: {text}
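
As a minimal sketch of how this prompt could be applied, the snippet below fills the template for one input text and parses a numbered-list reply into a Python list. The helper names (`build_prompt`, `parse_numbered_list`) and the parsing rule are assumptions made for illustration; the original generation code is not described here, and no API call is shown.

```python
import re

# Prompt template quoted from the dataset description.
PROMPT_TEMPLATE = (
    "Generate 5 similar paraphrases for this question, "
    "show it like a numbered list without commentaries: {text}"
)


def build_prompt(text: str) -> str:
    """Fill the template with one source sentence or question."""
    return PROMPT_TEMPLATE.format(text=text)


def parse_numbered_list(reply: str) -> list[str]:
    """Extract items from lines such as '1. ...' or '2) ...'.

    Hypothetical parser: lines that do not start with a number are ignored.
    """
    items = []
    for line in reply.splitlines():
        match = re.match(r"^\s*\d+[.)]\s*(.+?)\s*$", line)
        if match:
            items.append(match.group(1))
    return items


if __name__ == "__main__":
    print(build_prompt("How do I learn Python?"))
    print(parse_numbered_list("1. First variant\n2. Second variant"))
```

A real pipeline would also need to handle replies with fewer or more than 5 items; that check is left out of this sketch.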

This dataset is based on questions from the Quora paraphrase questions dataset and on texts from the SQuAD 2.0 and CNN news datasets.

We generated 5 paraphrases for each sample, so the dataset has about 350k rows in total. Each row holds 6 texts (the original plus its 5 paraphrases), so a single row yields 6 x 5 = 30 ordered pairs. Over the whole dataset this gives 30 x 350,000 = 10.5 million directed training pairs, or 10.5 million / 2 = 5.25 million unique (unordered) pairs.
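
The pair counts can be checked with a short sketch. It assumes each row is expanded by pairing every text in the group (the original plus its 5 paraphrases) with every other text; the function names and the placeholder strings are hypothetical.

```python
from itertools import combinations, permutations


def ordered_pairs(text, paraphrases):
    """All (source, target) pairs within one row, both directions."""
    group = [text, *paraphrases]
    return list(permutations(group, 2))


def unique_pairs(text, paraphrases):
    """All unordered pairs within one row, each counted once."""
    group = [text, *paraphrases]
    return list(combinations(group, 2))


# Placeholder row: one original text and five paraphrases.
row_text = "q0"
row_paraphrases = ["p1", "p2", "p3", "p4", "p5"]

n_ordered = len(ordered_pairs(row_text, row_paraphrases))  # 6 * 5 = 30
n_unique = len(unique_pairs(row_text, row_paraphrases))    # 30 / 2 = 15

ROWS = 350_000  # approximate row count stated in the description
print(n_ordered * ROWS)  # 10500000
print(n_unique * ROWS)   # 5250000
```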

We used:

  • 231927 questions from the Quora dataset

  • 92005 texts from the SQuAD 2.0 dataset

  • 29110 texts from the CNN news dataset

Structure of the dataset:

  • text column - an original sentence or question from the datasets

  • paraphrases - a list of 5 paraphrases

  • category - question / sentence

  • source - quora / squad_2 / cnn_news
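
For illustration, the column layout above could be modelled as follows. This is a sketch only: the field types, the `is_valid_row` helper, and the example row contents are assumptions, not taken from the dataset files.

```python
from typing import Literal, TypedDict

CATEGORIES = {"question", "sentence"}
SOURCES = {"quora", "squad_2", "cnn_news"}


class ParaphraseRow(TypedDict):
    text: str                # original sentence or question
    paraphrases: list[str]   # 5 paraphrases
    category: Literal["question", "sentence"]
    source: Literal["quora", "squad_2", "cnn_news"]


def is_valid_row(row: dict) -> bool:
    """Check that a row matches the described structure."""
    return (
        isinstance(row.get("text"), str)
        and isinstance(row.get("paraphrases"), list)
        and len(row["paraphrases"]) == 5
        and row.get("category") in CATEGORIES
        and row.get("source") in SOURCES
    )


# Made-up example row, for illustration only.
example_row: ParaphraseRow = {
    "text": "How can I learn to cook?",
    "paraphrases": [
        "What is a good way to learn cooking?",
        "How do I start learning to cook?",
        "What should I do to learn how to cook?",
        "How can someone learn cooking?",
        "What are the steps to learn to cook?",
    ],
    "category": "question",
    "source": "quora",
}

print(is_valid_row(example_row))
```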

License


  • openrail
