This work introduces a distributed genetic algorithm (DGA) for feature selection in the analysis of Gaia Radial Velocity Spectrometer (RVS) data. Artificial neural networks (ANNs) estimate physical parameters such as effective temperature and metallicity, and a discrete wavelet transform is used to preprocess the spectra. The DGA addresses dimensionality reduction and computational cost, with the aim of improving model performance on simulated data at several signal-to-noise ratios (SNR). The implementation is written in C++ with MPI and OpenMP so that the computation can scale across processors.
Distributed Genetic Algorithm for feature selection in Gaia RVS spectra: Application to ANN parameterization. D. Fustes, D. Ordóñez, C. Dafonte, M. Manteiga and B. Arcay
Introduction • GGG (Galician Group for Gaia): part of CU8 in DPAC, involved in classification and parameterization tasks using AI techniques • Work with simulated data from the RVS instrument: • Estimation of physical parameters: • Effective temperatures • Surface gravities • Metallicities • Abundances of alpha elements
Gaia RVS simulated data • Library compiled by A. Recio, P. de Laverny and B. Plez • 971 points per spectrum • Different SNR levels: 5, 10, 50, 200, ... • 70% of the data to train the network and 30% to test the model • Use of ANNs to perform the parameterization
Discrete Wavelet Transform • Redundant filtering process: • High-pass filters generate the Details • Low-pass filters generate the Approximations • Level 3 DWT used: A3 + D3 + D2 + D1, 997 points
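The A3 + D3 + D2 + D1 layout above can be illustrated with a minimal sketch. The slides do not say which wavelet family or boundary handling was used, so the Haar filters and the repeat-last-sample padding below are assumptions, and the function names `haar_step` and `dwt_features` are hypothetical. With these choices a 971-point input gives 973 coefficients rather than the 997 quoted on the slide, which suggests that longer filters or a different padding scheme were used in the original work.

```python
import math


def haar_step(signal):
    """One DWT level with Haar filters (an assumed wavelet family).

    Returns (approximation, detail): the low-pass and high-pass
    outputs, each downsampled by two.
    """
    if len(signal) % 2:
        # Assumed boundary rule: pad odd-length input with its last sample.
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_features(signal, levels=3):
    """Return the concatenation A_levels + D_levels + ... + D1."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)      # details[0] is D1, details[-1] is D_levels
    features = approx
    for detail in reversed(details):
        features = features + detail
    return features


if __name__ == "__main__":
    flat_spectrum = [1.0] * 971     # toy input, not real RVS data
    print(len(dwt_features(flat_spectrum)))
```

A flat input is a convenient sanity check: every detail coefficient is zero, so all of the signal ends up in A3.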
Feature selection • Reduce the dimensionality of the spectra • Reduce the complexity of the models • Reduce the computational needs • Variability-based methods: reduce the dimensionality of a set while capturing most of its variability (PCA) • They cannot be specialized to capture the features relevant to the estimation of each parameter • Genetic algorithm to select the relevant areas for each parameter
Genetic algorithm • Based on the theory of evolution • The best individuals reproduce and pass to the next generation • Fitness function: train the ANN, test it and take the inverse of the mean error. Computationally expensive!
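The loop described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: individuals are taken to be binary masks over the input features, the operators (elitism, truncation selection, one-point crossover, bit-flip mutation) and all parameter values are illustrative choices, and `toy_mean_error` is a hypothetical cheap stand-in for the expensive "train the ANN and test it" step.

```python
import random

RELEVANT = {3, 7, 11}   # hypothetical "informative" feature indices for the toy problem


def toy_mean_error(mask):
    """Stand-in for ANN train/test: missing a relevant feature costs a lot,
    keeping an irrelevant one costs a little."""
    missing = sum(1 for i in RELEVANT if not mask[i])
    extra = sum(mask) - sum(mask[i] for i in RELEVANT)
    return 0.05 + missing + 0.01 * extra


def fitness(mask, mean_error):
    # Inverse of the mean error, as on the slide; the epsilon guards against 1/0.
    return 1.0 / (mean_error(mask) + 1e-9)


def evolve(n_features, mean_error, pop_size=30, generations=40,
           p_mut=0.02, n_elite=2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, mean_error), reverse=True)
        next_pop = [m[:] for m in ranked[:n_elite]]   # best individuals pass unchanged
        parents = ranked[:pop_size // 2]              # best half is allowed to reproduce
        while len(next_pop) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda m: fitness(m, mean_error))


if __name__ == "__main__":
    best = evolve(16, toy_mean_error)
    print(best, toy_mean_error(best))
```

In the real setting each call to `mean_error` would involve a full ANN training run, which is why the fitness evaluation dominates the cost.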
Distributed computation • Huge computational needs call for scalable solutions • Multicomputers are cheaper than supercomputers • Ways to distribute the algorithm • Low level: distribute the ANN computation: • It should be performed in hardware • Medium level: distribute the ANN learning • Possible with batch learning • Online learning performs better in this case • High level: distribute the fitness computation • It was implemented in C++ with MPI and OpenMP
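The high-level option works because each individual's fitness can be computed independently of the others. The sketch below shows that pattern on a single machine with Python's standard `concurrent.futures` module. It is an analogue for illustration only, since the original was written in C++ with MPI and OpenMP; `evaluate_population` and `toy_fitness` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_population(population, fitness_fn, executor):
    """Score every individual as an independent task.

    `executor.map` returns results in the same order as `population`,
    so scores[i] belongs to population[i].
    """
    return list(executor.map(fitness_fn, population))


def toy_fitness(mask):
    # Placeholder for "train the ANN, test it, invert the mean error".
    return 1.0 / (1.0 + sum(mask))


if __name__ == "__main__":
    population = [[0, 0, 1], [1, 1, 1], [0, 0, 0]]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(evaluate_population(population, toy_fitness, pool))
```

Threads keep the example short, but CPU-bound ANN training in CPython would need separate processes (for instance `ProcessPoolExecutor`) or separate machines to gain real speed-up. The function accepts any `Executor`, so the backend can be swapped without touching the genetic algorithm itself.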
Results(1) • SNR 200 • Original spectra
Results(2) • SNR 200 • Wavelet domain