JuICe is a corpus of 1.5 million examples with a curated test set of 3.7K instances based on online programming assignments. Compared with existing contextual code generation datasets, JuICe provides refined human-curated data, open-domain code, and an order of magnitude more training data.

Source: JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation

Papers


Paper Code Results Date Stars

Dataset Loaders


Tasks


Similar Datasets


License


  • Unknown

Modalities


Languages