Authors: Alison Jarvis (CIRES/NOAA), Christian Bethge (CIRES/NOAA), Jonathan M. Darnel (CIRES/NOAA), Gabriel I. Dima (CIRES/NOAA), J. Marcus Hughes (SWRI), Fadil Inceoglu (CIRES/NOAA), Larisza D. Krista (CIRES/NOAA), Donald Schmit (CIRES/NOAA)
Real-time identification and characterization of solar phenomena such as active regions, coronal holes, and flares is an effective tool for forecasting space weather events. Many algorithms have been developed to perform this task automatically. One example is a NOAA operational, machine-learning-based algorithm that generates a product called thematic maps: real-time maps of the Sun derived from GOES-R Solar Ultraviolet Imager (SUVI) data, in which each pixel is classified into a category representing a different solar phenomenon. We propose that a major limitation to improving algorithms of this type is the source of ground truth data used to train and validate them, because any biases or inaccuracies in those data limit the overall accuracy of even the best-trained algorithm. We illustrate this with the challenges we encountered in developing thematic maps. For well-studied phenomena such as coronal holes and flares, we compare multiple sources of ground truth data commonly used to train and/or validate these algorithms and identify discrepancies among them. We further compare these sources to hand-drawn expert labels and assess the consistency among labels assigned by different experts. Finally, we discuss methods to consolidate inconsistent annotations into consensus labels.
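As an illustration of what consolidating annotations can look like, the sketch below applies a per-pixel majority vote to several experts' label maps and also reports the fraction of pixels on which two experts agree. It is a minimal example under stated assumptions, not the method used for thematic maps: the function names, the integer class codes, the `min_agreement` threshold, and the toy 2x2 maps are all hypothetical.

```python
from collections import Counter

# Hypothetical class codes, chosen for this example only:
# 0 = unlabeled, 1 = coronal hole, 2 = flare.


def consensus_map(label_maps, min_agreement=0.5, unlabeled=0):
    """Per-pixel majority vote over several annotators' label maps.

    label_maps is a list of equally shaped 2D lists of integer labels.
    A pixel receives the most common label only if that label's share of
    the votes strictly exceeds min_agreement; otherwise it is set to
    `unlabeled`.
    """
    n_annotators = len(label_maps)
    n_rows, n_cols = len(label_maps[0]), len(label_maps[0][0])
    out = [[unlabeled] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            votes = Counter(m[r][c] for m in label_maps)
            label, count = votes.most_common(1)[0]
            if count / n_annotators > min_agreement:
                out[r][c] = label
    return out


def pixel_agreement(map_a, map_b):
    """Fraction of pixels on which two label maps assign the same label."""
    total = same = 0
    for row_a, row_b in zip(map_a, map_b):
        for a, b in zip(row_a, row_b):
            total += 1
            same += (a == b)
    return same / total


# Toy maps from three hypothetical experts.
expert_a = [[1, 1], [0, 2]]
expert_b = [[1, 0], [0, 2]]
expert_c = [[1, 2], [0, 1]]

consensus = consensus_map([expert_a, expert_b, expert_c])
agreement_ab = pixel_agreement(expert_a, expert_b)
```

Here the top-right pixel receives three different labels, so no label exceeds the threshold and the pixel stays unlabeled in the consensus. A majority vote treats all experts as equally reliable; weighting annotators or keeping per-pixel vote fractions as soft labels are alternative design choices.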