SuryaBench: Benchmark Dataset for Advancing Machine Learning Applications in Heliophysics and Space Weather

Authors: Dinesha V. Hegde (Center for Space Plasma and Aeronomic Research (CSPAR), the University of Alabama in Huntsville (UAH)), Sujit Roy (Earth System Science Center (ESSC), UAH, NASA MSFC), Johannes Schmude (IBM Research), Rohit Lal (ESSC, UAH), Vishal Gaur (ESSC, UAH), Amy Lin (ESSC, UAH), Kshitiz Mandal (ESSC, UAH), Talwinder Singh (Georgia State University (GSU)), Andrés Muñoz-Jaramillo (Southwest Research Institute), Kang Yang (GSU), Chetraj Pandey (GSU), Jinsu Hong (GSU), Berkay Aydin (GSU), Ryan McGranaghan (NASA JPL), Spiridon Kasapis (Princeton University), Vishal Upendran (SETI Institute), Shah Bahauddin (Laboratory for Atmospheric and Space Physics, University of Colorado Boulder), Daniel da Silva (NASA GSFC), Marcus Freitag (IBM Research), Iksha Gurung (ESSC, UAH), Nikolai Pogorelov (Department of Space Science & CSPAR, UAH), Campbell Watson (IBM Research), Manil Maskey (NASA MSFC), Juan Bernabe-Moreno (IBM Research), Rahul Ramachandran (NASA MSFC)

We introduce SuryaBench, an open-source, machine learning (ML)-ready heliophysics dataset suite designed to advance ML applications in solar physics and space weather forecasting. The core dataset is derived from NASA’s Solar Dynamics Observatory (SDO) and comprises processed high-resolution imagery from the Atmospheric Imaging Assembly (AIA) and the Helioseismic and Magnetic Imager (HMI), spanning May 2010 to December 2024. To ensure suitability for ML tasks, the data have been preprocessed through correction of spacecraft roll angles, orbital adjustments, exposure normalization, and degradation compensation.

In addition to the core SDO data, SuryaBench provides auxiliary benchmark datasets for six key heliophysics and space weather tasks: active region segmentation, active region emergence forecasting, coronal magnetic field extrapolation, solar flare prediction, solar extreme ultraviolet (EUV) spectra prediction, and solar wind speed estimation. SuryaBench has also been used to train Surya, a heliophysics foundation model, demonstrating its utility as a large-scale resource for developing generalizable AI models.

By establishing a unified, standardized data collection with task-specific benchmarks, SuryaBench aims to facilitate benchmarking, enhance reproducibility, and accelerate the development of AI-driven models for space weather prediction, bridging solar physics, machine learning, and operational forecasting.

SHINE