Authors: Daniel Carpenter (University of Michigan), Henry Han (Baylor University), Liang Zhao (University of Michigan), Susan T. Lepri (University of Michigan)
The solar wind has traditionally been classified according to its in-situ physical properties, such as proton speed (or related proton properties, like entropy or proton temperature), heavy-ion charge states, and elemental composition. However, all of these classification schemes are somewhat subjective and rely on arbitrary criteria. In addition, they all have limitations that prevent them from associating a given plasma parcel detected in-situ with a specific coronal source region on the Sun. Machine learning methods have successfully characterized the in-situ solar wind using unsupervised deep clustering together with dimensionality reduction techniques such as Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). In this study, we explore the impact of distance metrics on in-situ solar wind data clustering in several machine learning clustering algorithms, for example, Density-Based Spatial Clustering of Applications with Noise (DBSCAN). We evaluate the performance of each metric by applying it to stacked dimensionality reduction and deep clustering techniques (e.g., PCA+t-SNE). We further analyze the classification results by comparing the in-situ properties of the solar wind across the resulting categories. Our work demonstrates the potential for customized distance metrics to improve the interpretability and performance of in-situ solar wind deep clustering approaches.
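The sketch below shows why the choice of distance metric matters for density-based clustering. It implements DBSCAN with a pluggable metric in plain Python. It is not the authors' pipeline. The four toy 2-D points are hypothetical stand-ins for coordinates in a reduced embedding (for example, PCA+t-SNE output). The Euclidean, Manhattan, and Chebyshev metrics are illustrative assumptions, not the customized metrics studied in this work.

```python
import math
from typing import Callable, List, Optional, Sequence

Point = Sequence[float]
Metric = Callable[[Point, Point], float]
NOISE = -1

# Illustrative metrics (assumptions, not the paper's customized metrics).
def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a: Point, b: Point) -> float:
    return max(abs(x - y) for x, y in zip(a, b))

def dbscan(points: List[Point], eps: float, min_pts: int,
           metric: Metric) -> List[Optional[int]]:
    """Minimal DBSCAN: label each point with a cluster id or NOISE (-1)."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None means "not yet visited"
    cluster = -1

    def neighbors(i: int) -> List[int]:
        # The eps-neighbourhood includes the point itself, as in scikit-learn.
        return [j for j in range(n) if metric(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be absorbed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join the cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical toy embedding: a diagonal chain of three points plus an outlier.
toy = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (10.0, 10.0)]
print(dbscan(toy, eps=1.5, min_pts=2, metric=euclidean))  # [0, 0, 0, -1]
print(dbscan(toy, eps=1.5, min_pts=2, metric=manhattan))  # [-1, -1, -1, -1]
print(dbscan(toy, eps=1.5, min_pts=2, metric=chebyshev))  # [0, 0, 0, -1]
```

With the same `eps` and `min_pts`, the diagonal neighbours are about 1.41 apart under the Euclidean metric and 1 apart under the Chebyshev metric, so both recover one cluster. Under the Manhattan metric the same neighbours are 2 apart, so every point becomes noise. In practice, `sklearn.cluster.DBSCAN` accepts a `metric` argument (a string or a callable) for the same purpose.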