Authors: Talwinder Singh (Georgia State University), Luke Farris (The University of Alabama in Huntsville), Christian Hall (The University of Alabama in Huntsville), Timothy Newman (The University of Alabama in Huntsville), Bernard Benson (McLeod Software Corporation), Syed Raza (The University of Alabama in Huntsville), Nikolai Pogorelov (The University of Alabama in Huntsville)
technologies. In this work, we perform an in-depth study of solar flare forecasting using machine-learning (ML) models. These ML models use Space-weather HMI Active Region Patch (SHARP) parameters extracted from the SDO Helioseismic and Magnetic Imager (HMI) data. Six models are considered: k-Nearest Neighbors (k-NN), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Quadratic Discriminant Analysis (QDA), Random Forest (RF) Classifier, and Support Vector Machine (SVM). The importance of each SHARP parameter to model performance is examined. A major contribution of this work is an assessment of how multiple data curation methods affect the predictive performance of the models. The results indicate that using high-quality data can markedly improve accuracy. Further, the work demonstrates that using SHARPs that contain more than one active region (AR) limits the performance of these models. Another major component of this work examines the incorporation of AR flaring history into the forecasting process; the results indicate that such incorporation substantively improves forecasting accuracy. The performance of the models at varying lead times is also explored, which can provide insight into the best prediction windows.