Merging data and simulation for machine learning
Project: Aspen Open Jets: Unlocking LHC Data for Foundation Models in Particle Physics
In 2024, members of the CMS collaboration released a new dataset called Aspen Open Jets, combining simulated jets with real data from proton-proton collisions at the Large Hadron Collider. This dataset consists of approximately 180 million jets and is designed to be used for training high-energy physics foundation models — powerful computer programs that can learn from large amounts of data and then be adapted to new tasks.
Researchers are using this data to develop models that predict the behavior of jets, the collimated sprays of particles that are key to understanding particle collisions. The Aspen Open Jets dataset enables the training of these foundation models, and early results show significant improvements in jet prediction, opening up new possibilities for exploring the fundamental physics of the universe.

Advancing AI/ML research with public datasets
Project: MicroBooNE public datasets
Fermilab and its affiliated experiments produce large amounts of high-quality scientific data, which can be useful for testing models not originally designed for physics applications, particularly in the field of foundation models. Fermilab actively supports the public use of its datasets for AI and machine learning research. In 2017, the MicroBooNE neutrino experiment released a dataset containing over one million neutrino events for public analysis. This dataset has since been used to develop machine learning algorithms capable of handling complex point clouds of events, a challenging task that now serves as a benchmark for AI/ML techniques.

Using AI to diagnose beam loss in real time
Project: READS: Disentangling beam losses in the Fermilab Main Injector enclosure using real-time Edge AI
At Fermilab’s Accelerator Complex, the Main Injector, a powerful circular synchrotron, accelerates proton beams to high energies, while the Recycler Ring, a storage ring housed in the same enclosure, stacks and prepares beams for it. Because the two machines share one tunnel, beam losses from either can register on the same monitors, making precise diagnostics essential.
READS, the Accelerator Real-time Edge AI for Distributed Systems project, has developed an AI/ML model deployed on fast FPGA hardware to analyze beam-loss monitor data in real time. This system disentangles mixed beam losses and assigns probabilities to each machine’s contribution. Its real-time inferences are streamed to the accelerator controls network, helping operators tune the machines for optimal performance.
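The disentangling idea can be illustrated with a toy example (not the actual READS model, which is a neural network running on FPGAs): if each machine produces a characteristic loss pattern across the monitors, a mixed reading can be split into per-machine contributions and normalized into probabilities. All numbers here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 16 beam-loss monitors along the tunnel. Each machine
# (Main Injector, Recycler) leaves a characteristic loss pattern across them.
n_monitors = 16
pattern_mi = np.abs(rng.normal(1.0, 0.5, n_monitors))   # made-up signature
pattern_rr = np.abs(rng.normal(0.5, 0.3, n_monitors))   # made-up signature

# An observed reading is a mixture of the two patterns plus noise.
true_fractions = np.array([0.7, 0.3])
observed = true_fractions[0] * pattern_mi + true_fractions[1] * pattern_rr
observed += rng.normal(0.0, 0.01, n_monitors)

# Recover each machine's contribution with least squares, then normalize
# the (clipped) coefficients into probabilities.
A = np.column_stack([pattern_mi, pattern_rr])
coeffs, *_ = np.linalg.lstsq(A, observed, rcond=None)
coeffs = np.clip(coeffs, 0, None)
probs = coeffs / coeffs.sum()
print(probs)  # close to [0.7, 0.3]
```

The real system must do this on streaming data under microsecond-scale latency, which is why the inference runs on FPGAs rather than conventional computers.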

Making complex physics simulation more efficient
Project: Efficient many-jet event generation with Flow Matching
High-energy physics requires large volumes of highly accurate simulated data, which are computationally expensive to produce. By combining Flow Matching with continuous normalizing flows, an invertible AI method that can map data in both directions through the model, simulations can run up to 150 times more efficiently without losing accuracy.

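As a rough sketch of the core idea, flow matching trains a velocity field to carry noise samples along straight-line paths toward data samples; generating new events then amounts to integrating that field. The toy below uses a 1-D Gaussian "dataset" and a simple linear model in place of the deep networks used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy flow matching in 1-D: learn a velocity field that transports a
# standard normal "noise" distribution onto a target data distribution.
n = 100_000
x0 = rng.normal(0.0, 1.0, n)          # noise samples
x1 = rng.normal(2.0, 0.5, n)          # stand-in "data" samples
t = rng.uniform(0.0, 1.0, n)

xt = (1 - t) * x0 + t * x1            # point on the straight-line path
u = x1 - x0                           # flow-matching regression target

# Fit a simple linear velocity model v(x, t) = w0*x + w1*t + w2
# by least squares on the flow-matching objective ||v(xt, t) - u||^2.
phi = np.column_stack([xt, t, np.ones(n)])
w, *_ = np.linalg.lstsq(phi, u, rcond=None)

def velocity(x, t):
    return w[0] * x + w[1] * t + w[2]

# Sampling = integrating the learned ODE from noise with Euler steps.
x = rng.normal(0.0, 1.0, 10_000)
for step in range(100):
    x = x + velocity(x, step / 100) * 0.01

print(x.mean())   # lands near the target mean of 2
```

Because the regression target is available in closed form at every point on the path, training needs no expensive ODE solves; integration is only required at generation time.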
AI chatbot to improve accelerator operations
Project: Converting digital logbooks into a conversational chatbot
As part of accelerator operations, operators carefully log their time and actions in an electronic logbook (eLog) whenever there are changes to how the accelerator is running. These detailed entries are valuable to other operators for troubleshooting problems, but can be challenging to interpret because operators often describe similar problems using different terminology. To address this, Fermilab developed an AI bot that can read eLogs and answer questions about operators’ responses to past problems. This tool has measurably improved accelerator performance, with demonstrated decreases in downtime.
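A minimal sketch of the retrieval step such a bot needs, finding the past entry most similar to a question via bag-of-words cosine similarity. The eLog entries below are invented, and a production system would presumably pair stronger retrieval with a language model to phrase the answer.

```python
import math
import re
from collections import Counter

# Hypothetical eLog entries (invented for illustration).
elog = [
    "Booster RF station 3 tripped; reset and beam restored",
    "Main Injector quench detected, cryo team notified",
    "Linac vacuum valve closed unexpectedly, reopened after check",
]

def bow(text):
    """Bag-of-words vector: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[word] * b[word] for word in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def retrieve(question):
    """Return the most similar past entry to the question."""
    q = bow(question)
    return max(elog, key=lambda entry: cosine(q, bow(entry)))

print(retrieve("what happened when an RF station tripped?"))
```

Word-count similarity already handles exact vocabulary overlap; the harder problem the article mentions, different operators using different terminology for the same fault, is what motivates learned language models over this kind of keyword matching.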
AI models speed up particle simulations at the HL-LHC
Project: CaloDiffusion: Denoising diffusion models with geometry adaptation for high fidelity calorimeter simulation
Researchers are using innovative computing programs, called generative machine learning models, to simulate data from the upcoming High-Luminosity Large Hadron Collider. These models are faster and more efficient than traditional physics-based computing simulations, which is important given that the new collider will generate huge amounts of data from complex detectors.
One of these programs, called CaloDiffusion, uses 3D computer models to simulate how particles interact with detectors. It belongs to a new class of machine learning models known as denoising diffusion models, which have recently become the leading approach in image generation tasks. CaloDiffusion adapts to the detector’s geometry, including its irregular structures, and produces results that closely match traditional simulations, offering a powerful and scalable solution for future collider experiments.
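The mechanism behind denoising diffusion can be sketched in a few lines: the forward process gradually replaces a shower image with Gaussian noise in closed form, and a network is trained to predict that noise so the process can be run in reverse. Here the "calorimeter image" is an invented 4x4 grid, and the reconstruction uses the true noise rather than a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "calorimeter image": a 4x4 grid of energy deposits (invented numbers).
x0 = rng.exponential(1.0, (4, 4))

# Linear noise schedule and cumulative signal-retention factors (alpha-bar).
T = 200
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def diffuse(x0, t):
    """Forward process q(x_t | x_0) in closed form."""
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps
    return xt, eps

# By the last step, almost all structure is replaced by noise. A trained
# network predicts eps; knowing eps exactly lets us invert the step:
xt, eps = diffuse(x0, T - 1)
x0_rec = (xt - np.sqrt(1 - alpha_bar[-1]) * eps) / np.sqrt(alpha_bar[-1])
print(np.allclose(x0_rec, x0))  # True
```

Generation works by starting from pure noise and applying many small denoising steps, which is why sampling speed, and hence efficiency gains like CaloDiffusion's, matters so much for collider-scale simulation.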

New AI tool to reconstruct neutrino events
Project: NuGraph: Graph neural network for neutrino physics event reconstruction
To analyze data from liquid-argon time projection chambers, scientists at Fermilab have developed a neural network called NuGraph. The system interprets the energy traces left behind by particles in the detector, transforming them into complex graphs where energy depositions are represented as interconnected nodes across multiple layers.
Building on this foundation, NuGraph2 introduces enhanced capabilities for both identifying and classifying particle interactions with greater accuracy.
Using a graph neural network with specialized decoders, NuGraph2 can distinguish true particle signal from cosmic background noise and classify those signals into different types of particle interactions. This innovative approach offers a powerful tool for studying elusive particles like neutrinos.
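A toy version of the underlying operation is a single message-passing step over a hit graph, where each node (an energy deposition) updates its features using its neighbors'. Hit values, edges, and weights below are all invented; NuGraph2 stacks many such layers with task-specific decoders.

```python
import numpy as np

# Toy graph: 4 energy depositions (nodes), each with [charge, time] features.
H = np.array([[1.0, 0.2],
              [0.8, 0.3],
              [0.1, 0.9],
              [0.2, 0.8]])

# Edges connect nearby hits (adjacency matrix with self-loops).
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# One message-passing layer: each node averages its neighborhood's
# features, then applies a learned linear map and a nonlinearity
# (weights are random here, standing in for trained parameters).
deg = A.sum(axis=1, keepdims=True)
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))
H_next = np.maximum(0.0, (A / deg) @ H @ W)   # ReLU((D^-1 A) H W)
print(H_next.shape)  # (4, 2)
```

After several such layers, each node's features summarize its wider context in the detector, which is what lets a decoder label it as signal or cosmic background.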

AI-enabled detectors for faster particle tracking in physics experiments
Project: Towards on-sensor inference of charged particle track parameters and uncertainties
Tracking particles in high-energy physics experiments presents a unique challenge. Next-generation detectors will enable more precise measurements of the angles at which particles pass through the sensors. While this technology enhances offline tracking, its full potential remains limited by constraints at the lowest-level hardware trigger. To address this, Fermilab researchers are integrating mixture density networks directly into the detector hardware. These networks estimate particle angles and positions, along with their associated uncertainties, directly on the sensor, greatly speeding up the tracking process.
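A mixture density network's output head can be sketched as follows: raw network outputs are mapped to the weights, means, and widths of a Gaussian mixture, and the widths carry the per-track uncertainty. All numbers below are invented, and the on-detector version is far more constrained than this sketch.

```python
import numpy as np

def mdn_params(raw):
    """Split raw network outputs into a K-component Gaussian mixture."""
    K = len(raw) // 3
    logits, means, log_sigmas = raw[:K], raw[K:2 * K], raw[2 * K:]
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                  # softmax -> mixture weights
    sigmas = np.exp(log_sigmas)               # enforce positive widths
    return weights, means, sigmas

def mdn_nll(raw, y):
    """Negative log-likelihood of observation y under the mixture
    (what the network is trained to minimize)."""
    w, mu, s = mdn_params(raw)
    pdf = w * np.exp(-0.5 * ((y - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    return -np.log(pdf.sum())

# Invented example: a 2-component head predicting a track angle (radians).
raw = np.array([0.5, -0.5,      # mixture logits
                0.10, 0.30,     # component means
                -2.0, -1.0])    # log standard deviations
print(mdn_nll(raw, y=0.12))
```

An angle near the predicted means scores a much lower loss than one far away, and narrow components signal a confident prediction while wide ones flag ambiguous hits.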
Research toward reliable uncertainty quantification
Project: Quantifying Uncertainty in AI
Uncertainty quantification in AI models is a foundational challenge in modern data analysis.
Scientists are exploring how AI can accurately estimate uncertainty in scientific calculations, like those used in astrophysics. Current AI-based methods struggle to reliably assess this uncertainty, particularly when dealing with noisy data or complex problems. This highlights the need for improved calibration techniques to ensure AI models provide meaningful and trustworthy uncertainty estimates. At Fermilab, scientists study how errors propagate from data into AI methods, and how to quantify their impact.
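One standard calibration diagnostic is a coverage check: if reported 1-sigma intervals are honest, about 68% of true values should fall inside them. A sketch with a deliberately overconfident toy model:

```python
import numpy as np

rng = np.random.default_rng(0)

# A model predicts a value and a 1-sigma uncertainty for each example.
# Here the true scatter is sigma=1, but the model reports sigma=0.5:
# it is overconfident, and the coverage check exposes that.
truth = rng.normal(0.0, 1.0, 10_000)
pred = np.zeros(10_000)                  # toy predictions: always 0
sigma_reported = np.full(10_000, 0.5)    # understated uncertainty

# Fraction of truths inside the nominal 68% interval (pred +/- 1 sigma).
inside = np.abs(truth - pred) < sigma_reported
coverage = inside.mean()
print(coverage)  # well below the nominal 0.68, so miscalibrated
```

A well-calibrated model would land near 0.68 here; recalibration techniques adjust the reported uncertainties until coverage matches the nominal level across many such intervals.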
Using domain adaptation to increase AI model robustness
Project: SIDDA: SInkhorn Dynamic Domain Adaptation for image classification with equivariant neural networks
When a neural network is trained on one set of data, it often doesn’t perform well on a slightly different one. This happens when the statistics of the inputs change even though the underlying task stays the same, an effect known as domain shift. It can be reduced with a technique called domain adaptation, or DA, which encourages the network to learn only the features shared by both datasets.
Fermilab is leading the development of domain adaptation algorithms for science. The SIDDA method achieves this with minimal manual tuning by combining an optimal-transport-based training objective with equivariant neural networks, while also reducing computational cost.
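The Sinkhorn algorithm at the heart of such methods can be sketched directly: it computes an entropy-regularized optimal-transport plan between features from two domains, and the resulting transport cost can serve as an alignment penalty during training. The feature values below are invented.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.1, n_iter=200):
    """Entropy-regularized optimal transport between weight vectors a, b
    with cost matrix C, via Sinkhorn's alternating scaling iterations."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan

rng = np.random.default_rng(0)
# Toy features from two domains (e.g. simulated vs. observed images).
X = rng.normal(0.0, 1.0, (5, 2))
Y = rng.normal(0.5, 1.0, (5, 2))
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # pairwise sq. distances
C = C / C.max()                                      # normalize for stability

a = np.full(5, 0.2)   # uniform weight on each source feature
b = np.full(5, 0.2)   # uniform weight on each target feature
P = sinkhorn(a, b, C)
cost = (P * C).sum()  # alignment cost, usable as a training penalty
print(P.sum())        # total transported mass: 1
```

Minimizing this cost during training pulls the two domains' feature distributions together, which is the alignment that makes the classifier transfer from one dataset to the other.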
