CommitChronicle is a dataset for commit message generation and commit message completion.

Its key features:

  • large-scale and multilingual: contains 10.7M commits from 11.9k GitHub repositories in 20 programming languages;
  • diverse: avoids restrictive filtering on commit messages or on the structure of commit diffs;
  • suitable for experiments with commit history: provides metadata about commit authors and dates, and is split by project.
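To illustrate what a split by project means in practice, here is a minimal sketch in Python. It is not the dataset authors' procedure: the function name `split_by_project`, the `repo` key, the ratios, and the toy records are all assumptions made for illustration. The point it shows is that whole repositories, not individual commits, are assigned to a part, so no project appears in more than one part.

```python
import random
from collections import defaultdict


def split_by_project(commits, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole repositories to train/validation/test (illustrative only)."""
    # Collect the distinct repositories and shuffle them reproducibly.
    repos = sorted({commit["repo"] for commit in commits})
    random.Random(seed).shuffle(repos)

    # Cut the shuffled repository list into three contiguous groups.
    n_train = int(len(repos) * ratios[0])
    n_val = int(len(repos) * ratios[1])
    part_of = {}
    for index, repo in enumerate(repos):
        if index < n_train:
            part_of[repo] = "train"
        elif index < n_train + n_val:
            part_of[repo] = "validation"
        else:
            part_of[repo] = "test"

    # Every commit follows its repository into the same part.
    parts = defaultdict(list)
    for commit in commits:
        parts[part_of[commit["repo"]]].append(commit)
    return dict(parts)


# Toy data (hypothetical): 10 repositories with 3 commits each.
toy_commits = [
    {"repo": f"org/project-{r}", "message": f"commit {c}"}
    for r in range(10)
    for c in range(3)
]
toy_parts = split_by_project(toy_commits)
```

Because the assignment happens at the repository level, a model evaluated on the test part never sees earlier commits from the same project during training.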

Available on 🤗 Hugging Face: JetBrains-Research/commit-chronicle
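A minimal sketch of reading the dataset with the Hugging Face `datasets` library follows. Only the repository ID comes from the text above; the split name `train` and the record fields `repo`, `date`, and `message` are assumptions that should be checked against the dataset card, and `load_commit_chronicle` and `summarize` are hypothetical helper names.

```python
def load_commit_chronicle(split="train"):
    """Open the dataset in streaming mode (split name is an assumption)."""
    # Imported lazily so that summarize() below works without the library.
    from datasets import load_dataset

    # streaming=True iterates over records without downloading everything first.
    return load_dataset(
        "JetBrains-Research/commit-chronicle", split=split, streaming=True
    )


def summarize(record):
    """Render one record as 'repo | date | first line of the message'.

    The keys 'repo', 'date', and 'message' are assumed field names;
    missing keys fall back to placeholders instead of raising.
    """
    message = record.get("message") or ""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    return f'{record.get("repo", "?")} | {record.get("date", "?")} | {subject}'
```

To try it, iterate over a few records, for example with `itertools.islice(load_commit_chronicle(), 3)`, and pass each one to `summarize`. Streaming is used here because the full dataset contains millions of commits.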
