Authors: Viacheslav Sadykov (Physics & Astronomy Department, Georgia State University), Leon Ofman (Department of Physics, Catholic University of America and NASA Goddard Space Flight Center, Code 671, Greenbelt, MD), Scott Boardsen (Goddard Planetary Heliophysics Institute, University of Maryland in Baltimore County and NASA Goddard Space Flight Center, Code 674, Greenbelt, MD), Yogesh (Department of Physics, Catholic University of America and NASA Goddard Space Flight Center, Code 671, Greenbelt, MD)
Analysis of ion-kinetic instabilities in solar wind plasmas is crucial for understanding the solar wind energetics and dynamics in the heliosphere, as evident from spacecraft observations. In this work, we explore machine learning and deep learning classification and regression models to identify and categorize the stable and unstable cases of ion velocity distribution functions (VDFs). The VDFs are obtained from hybrid particle-in-cell (hybrid-PIC) models of kinetic protons and alpha particles with a background electron fluid. Using 22 hybrid-PIC models with different initialization parameters (anisotropies of ion populations, relative drift velocities, abundances, and plasma beta settings), we prepare an extensive data set of more than 1200 VDFs corresponding to stable and unstable cases, together with the associated anisotropy and magnetic energy growth rates. We also calculate the properties of the VDFs, such as the moments along and across the direction of the background magnetic field, the temperature anisotropies of the distributions, and the ion beam kinetic properties. The machine learning algorithms tested in this work are (1) standard feature-based classifiers/regressors, such as support vector machines and multi-layer perceptrons, applied to the VDF moments, and (2) a deep learning model based on convolutional neural networks (CNNs) applied directly to VDFs as images in the parallel and perpendicular velocity phase space plane. Our initial results show that the best-performing classifiers, such as Random Forest or CNN, reach an accuracy of ~0.95 on the classification problem. We also discuss how different strategies of sampling the simulation runs affect the performance of classifiers.
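As an illustration of the VDF moments mentioned above, the following is a minimal sketch of how a parallel drift and a temperature anisotropy could be computed from a single VDF. It is not the authors' pipeline. It assumes a gyrotropic distribution sampled on a uniform (v_par, v_perp) grid with v_perp >= 0, velocities in arbitrary units, and a bi-Maxwellian test distribution; the function name `vdf_moments` and all grid settings are hypothetical.

```python
import numpy as np


def vdf_moments(f, v_par, v_perp):
    """Moments of a gyrotropic VDF f[i, j] = f(v_par[i], v_perp[j]).

    Assumption (not from the source): uniform grids and v_perp >= 0, so the
    velocity-space volume element is 2*pi*v_perp * dv_perp * dv_par.
    Returns (u_par, var_par, var_perp, anisotropy), where the variances are
    proportional to T_par and T_perp and anisotropy = T_perp / T_par.
    """
    dv_par = v_par[1] - v_par[0]
    dv_perp = v_perp[1] - v_perp[0]

    # Weight of each grid cell: f times the cylindrical volume element.
    w = f * (2.0 * np.pi * v_perp[None, :]) * dv_par * dv_perp
    n = w.sum()  # zeroth moment (density)

    # First moment along the background magnetic field (bulk drift).
    u_par = (w * v_par[:, None]).sum() / n

    # Second central moment along the field, proportional to T_par.
    var_par = (w * (v_par[:, None] - u_par) ** 2).sum() / n

    # Second moment across the field; the factor 0.5 accounts for the two
    # perpendicular degrees of freedom, so the result is proportional to T_perp.
    var_perp = 0.5 * (w * v_perp[None, :] ** 2).sum() / n

    return u_par, var_par, var_perp, var_perp / var_par


# Illustrative bi-Maxwellian with a parallel drift (all values are assumptions).
v_par = np.linspace(-8.0, 9.0, 341)
v_perp = np.linspace(0.0, 12.0, 241)
sigma_par, sigma_perp, drift = 1.0, np.sqrt(2.0), 0.5
f = np.exp(
    -((v_par[:, None] - drift) ** 2) / (2.0 * sigma_par**2)
    - (v_perp[None, :] ** 2) / (2.0 * sigma_perp**2)
)

u_par, var_par, var_perp, anisotropy = vdf_moments(f, v_par, v_perp)
print(f"drift = {u_par:.3f}, T_perp/T_par = {anisotropy:.3f}")
```

For this test distribution the recovered drift and anisotropy should be close to the input values (0.5 and sigma_perp^2 / sigma_par^2 = 2). Scalars of this kind are examples of the inputs a feature-based classifier or regressor would take, whereas the CNN approach operates on the full two-dimensional array `f`.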