Authors: Griffin Goodwin, Viacheslav Sadykov, and Petrus Martens (Georgia State University)
In this study, we explore the factors that influence machine-learning-based predictions of solar flares in a simulated operational environment. Using Georgia State University's Space Weather Analytics for Solar Flares (SWAN-SF) benchmark dataset (doi:10.7910/DVN/EBCFKM), we investigate the impact of training windows, data volumes, and the solar X-ray background flux on the performance of decision tree, support vector machine, and feed-forward neural network models.
We train classifiers using three window types: stationary, rolling, and expanding. The stationary window uses a single set of data available before the forecasting instance, which remains constant throughout the testing period. The rolling window uses data from a constant-length time interval before the forecasting instance, which moves with the testing period. Finally, the expanding window uses all available data before the forecasting instance. We test each window type across a range of time spans (5, 8, 11, 14, 17, and 20 months) and feature counts (1, 5, 10, 25, 50, and 120). To our surprise, skill scores remain consistent regardless of the window, feature, and classifier combination used. This implies that comparable forecasts can be achieved even with limited amounts of data and training, highlighting the inherent simplicity of our point-in-time magnetogram data.
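The three window types above can be sketched as simple selection rules over sample timestamps. The following is a minimal illustration, not the study's actual pipeline; the function name, the use of integer month indices, and the `t_start` parameter for the stationary window are all assumptions made for clarity.

```python
def select_window(times, t_forecast, kind, span=5, t_start=0):
    """Pick training sample times preceding t_forecast for a given window type.

    times      : sorted sample times (here, months since the start of the record)
    t_forecast : time of the forecasting instance
    kind       : 'stationary' keeps a fixed early interval [t_start, t_start + span);
                 'rolling'    keeps the last `span` months before t_forecast;
                 'expanding'  keeps everything before t_forecast.
    """
    if kind == 'stationary':
        return [t for t in times if t_start <= t < t_start + span]
    if kind == 'rolling':
        return [t for t in times if t_forecast - span <= t < t_forecast]
    if kind == 'expanding':
        return [t for t in times if t < t_forecast]
    raise ValueError(f"unknown window type: {kind}")


# As t_forecast advances, the stationary window stays fixed, the rolling
# window slides forward, and the expanding window only grows.
times = list(range(12))
print(select_window(times, 9, 'stationary'))  # [0, 1, 2, 3, 4]
print(select_window(times, 9, 'rolling'))     # [4, 5, 6, 7, 8]
print(select_window(times, 9, 'expanding'))   # [0, 1, ..., 8]
```

Note that the expanding window generalizes the other two: it is a rolling window with unbounded span anchored at the start of the record.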
In addition, we find a moderate positive correlation (ρ = 0.64 ± 0.08) between the X-ray background flux and the percentage of false positives, i.e., non-flaring instances predicted to flare. This suggests that the phase of the solar cycle has a substantial influence on forecasting performance.
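Assuming ρ denotes the Spearman rank correlation (the abstract does not say which coefficient is used), a minimal pure-Python sketch of how such a value is computed follows; the data passed in would be the per-interval background flux and false-positive percentage, and tied ranks are not handled here.

```python
def spearman_rho(x, y):
    """Spearman rank correlation for two equal-length sequences (no ties)."""
    def ranks(v):
        # Rank of each element: position it would take in sorted order.
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Standard closed form: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Perfectly monotonic series give rho = 1; reversed order gives rho = -1.
print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(spearman_rho([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
```

In practice, a library routine such as `scipy.stats.spearmanr` would be used instead, since it also handles ties and reports a p-value.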
Lastly, to build better intuition about these results, we present a novel visualization that provides insight into the temporal performance of a classifier.