pd4ml (Physics Data for Machine Learning)

Introduced by Benato et al. in "Shared Data and Algorithms for Deep Learning in Fundamental Physics"

pd4ml is a collection of datasets from fundamental physics research, including particle physics, astroparticle physics, and hadron and nuclear physics, intended for supervised machine learning studies. The datasets cover hadronic top quarks, cosmic-ray-induced air showers, phase transitions in hadronic matter, and generator-level histories. They are made public to simplify future work on cross-disciplinary machine learning and transfer learning in fundamental physics.

It currently consists of five datasets:

  • Top Tagging Landscape (Classification)
    • Train/val/test: 1.2M/400k/400k
    • Structure: Four-vectors
    • Dimension: 200 particles, 4 features/particle
  • Smart Backgrounds (Classification)
    • Train/val/test: 157k/39k/84k
    • Structure: Decay Graph
    • Dimension: 100 particles, 9 features/particle
  • Spinodal or Not (Classification)
    • Train/val/test: 16.3k/4k/8.7k
    • Structure: 2D Histogram
    • Dimension: 20x20 histogram of pion spectra
  • EoS (Classification)
    • Train/val/test: 121k/25k/54k
    • Structure: 2D Histogram
    • Dimension: 24x24 histogram of pion spectra
  • Air Showers (Regression)
    • Train/val/test: 56k/30k/14k
    • Structure: 81 1D Traces
    • Dimension: 81 stations, 80 signal bins + timing
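To make the listed sizes concrete, the following Python sketch records each dataset's split counts and an assumed per-example array shape, then derives the total number of examples, the train/val/test fractions, and the flattened input length. The `DatasetSpec` class, the `SPECS` list, and the exact shapes are hypothetical illustrations read off the list above, not the pd4ml package's API. In particular, the Air Showers shape omits the per-station timing values and the Smart Backgrounds shape omits the decay-graph connectivity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical summary of one pd4ml dataset (not the pd4ml API)."""
    name: str
    task: str
    splits: Tuple[int, int, int]    # (train, val, test), rounded as listed
    example_shape: Tuple[int, ...]  # assumed per-example array shape

    @property
    def total(self) -> int:
        # Total number of examples across all three splits.
        return sum(self.splits)

    def split_fractions(self) -> Tuple[float, ...]:
        # Share of the total in train, val, and test, rounded to 3 decimals.
        return tuple(round(n / self.total, 3) for n in self.splits)

    def flat_size(self) -> int:
        # Number of input values per example once the array is flattened.
        size = 1
        for dim in self.example_shape:
            size *= dim
        return size


SPECS = [
    # 200 particles x 4 four-vector components.
    DatasetSpec("Top Tagging Landscape", "classification",
                (1_200_000, 400_000, 400_000), (200, 4)),
    # 100 particles x 9 features; graph edges are NOT represented here.
    DatasetSpec("Smart Backgrounds", "classification",
                (157_000, 39_000, 84_000), (100, 9)),
    # 20 x 20 histogram of pion spectra.
    DatasetSpec("Spinodal or Not", "classification",
                (16_300, 4_000, 8_700), (20, 20)),
    # 24 x 24 histogram of pion spectra.
    DatasetSpec("EoS", "classification",
                (121_000, 25_000, 54_000), (24, 24)),
    # 81 stations x 80 signal bins; timing values are assumed to be stored
    # separately and are NOT included in this shape.
    DatasetSpec("Air Showers", "regression",
                (56_000, 30_000, 14_000), (81, 80)),
]

if __name__ == "__main__":
    for spec in SPECS:
        print(f"{spec.name:<22} total={spec.total:>9,} "
              f"fractions={spec.split_fractions()} flat={spec.flat_size()}")
```

Keeping the shapes in one table like this makes it easy to check that a data pipeline produces arrays of the expected size before training; if the real files store extra channels (such as the timing information), the shapes would need to be adjusted.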

License


  • Unknown