Artificial Intelligence

Aspen Open Jets

Merging data and simulation for machine learning

Project: Aspen Open Jets: Unlocking LHC Data for Foundation Models in Particle Physics


In 2024, members of the CMS collaboration released a new dataset called Aspen Open Jets, combining simulated jets with real data from proton-proton collisions at the Large Hadron Collider. This dataset consists of approximately 180 million jets and is designed to be used for training high-energy physics foundation models — powerful computer programs that can learn from large amounts of data and then be adapted to new tasks.

Researchers are using this data to develop models that predict the behavior of jets, the collimated sprays of particles that are key to understanding particle collisions. The Aspen Open Jets dataset enables the training of these foundation models, and early results show significant improvements in jet prediction, opening up new possibilities for exploring the fundamental physics of the universe.

Image credit: CERN