Distributed Genetic Algorithm for Feature Selection in Gaia RVS Spectra Applications

Distributed Genetic Algorithm for feature selection in Gaia RVS spectra
Application to ANN parameterization
D. Fustes, D. Ordóñez, C. Dafonte, M. Manteiga and B. Arcay

Introduction
• GGG (Galician Group for Gaia): part of CU8 in DPAC. Involved in classification and parameterization tasks using AI techniques
• Work with simulated data of the RVS instrument:
  • Estimation of physical parameters:
    • Effective temperatures
    • Surface gravities
    • Metallicities
    • Abundances of alpha elements

Gaia RVS simulated data
• Library compiled by A. Recio, P. de Laverny and B. Plez
• 971 points per spectrum
• Different SNR levels: 5, 10, 50, 200, ...
• 70% of the data to train the network and 30% to test the model
• Use of ANNs to perform the parameterization
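The 70/30 partition can be illustrated with a short Python sketch. The slides do not say whether the spectra were shuffled or how the split was drawn, so the seeded random shuffle, the function name `split_train_test` and the toy `library` below are assumptions made only for illustration.

```python
import random

def split_train_test(spectra, train_fraction=0.7, seed=0):
    """Shuffle the spectra (an assumption) and split them into train and test sets."""
    indices = list(range(len(spectra)))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * len(indices)))
    train = [spectra[i] for i in indices[:n_train]]
    test = [spectra[i] for i in indices[n_train:]]
    return train, test

# Hypothetical toy library: 10 "spectra" of 971 flux values each.
library = [[float(i)] * 971 for i in range(10)]
train_set, test_set = split_train_test(library)
```

With 10 toy spectra this yields 7 for training and 3 for testing; every spectrum ends up in exactly one of the two sets.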

Discrete Wavelet Transform
• Redundant filtering process:
  • High-pass filters to generate Details
  • Low-pass filters to generate Approximations
• Use of level 3 DWT: A3+D3+D2+D1, 997 points
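A minimal sketch of how a level 3 decomposition is assembled into the A3+D3+D2+D1 layout. The slides do not name the wavelet family or the boundary handling, so this example assumes the Haar wavelet and pads odd-length signals by repeating the last sample. Under those assumptions a 971-point input gives 973 coefficients rather than the 997 quoted on the slide.

```python
import math

def haar_step(signal):
    """One DWT level: return (approximation, detail), each about half the length."""
    if len(signal) % 2:                      # odd length: repeat the last sample
        signal = signal + [signal[-1]]
    s = 1.0 / math.sqrt(2.0)
    # Low-pass (scaled sums) gives the approximation, high-pass (scaled differences) the detail.
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def dwt_features(signal, levels=3):
    """Concatenate A_levels, D_levels, ..., D1 (A3+D3+D2+D1 for levels=3)."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)                    # details[0] is D1, details[-1] is D3
    features = list(approx)
    for d in reversed(details):              # append D3, then D2, then D1
        features.extend(d)
    return features

# Toy stand-in for one 971-point spectrum.
spectrum = [math.sin(0.05 * i) for i in range(971)]
coeffs = dwt_features(spectrum)
```

Each level filters only the previous approximation, so the coarse shape of the spectrum ends up in A3 while progressively finer structure sits in D3, D2 and D1.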

Feature selection
• Reduce the dimensionality of the spectra
• Reduce the complexity of the models
• Reduce the computational needs
• Variability-based methods: reduce the dimensionality of a set while capturing most of its variability (PCA)
  • They cannot be specialized to capture the features relevant to the estimation of each parameter
• Genetic algorithm to select the relevant areas for each parameter

Genetic algorithm
• Based on the theory of evolution
• The best individuals reproduce and pass to the next generation
• Fitness function: train the ANN, test it and take the inverse of the mean error. Computationally expensive!
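The loop described on this slide can be sketched as follows. The slides do not give the encoding or the genetic operators, so the binary mask per spectral point, tournament selection, one-point crossover, bit-flip mutation and elitism are all assumptions. `mean_error` is a hypothetical cheap stand-in for the expensive step of training and testing the ANN on the selected points.

```python
import random

N_FEATURES = 40                     # toy size; the real spectra have 971 points
RELEVANT = set(range(10, 20))       # hypothetical "informative" region

def mean_error(mask):
    """Stand-in for: train the ANN on the selected points, test it, return the mean error."""
    missed = sum(1 for i in RELEVANT if not mask[i])
    extra = sum(1 for i, bit in enumerate(mask) if bit and i not in RELEVANT)
    return 0.1 + missed + 0.05 * extra

def fitness(mask):
    return 1.0 / mean_error(mask)   # inverse of the mean error

def tournament(population, scores, rng, k=3):
    """Pick k random individuals and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]

def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))  # one-point crossover
    return a[:cut] + b[cut:]

def mutate(mask, rng, rate=0.02):
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]

def run_ga(generations=40, pop_size=30, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
                  for _ in range(pop_size)]
    history = []                    # best fitness seen in each generation
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        elite = population[scores.index(max(scores))]
        history.append(max(scores))
        children = [elite]          # elitism: the best individual passes unchanged
        while len(children) < pop_size:
            mother = tournament(population, scores, rng)
            father = tournament(population, scores, rng)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = children
    return max(population, key=fitness), history

best_mask, history = run_ga()
```

Because the elite individual is copied unchanged, the best fitness per generation never decreases. In the real setting every call to `fitness` would mean a full ANN training run, which is why the evaluation cost dominates.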

Distributed computation
• Huge computational needs call for scalable solutions
• Multicomputers are cheaper than supercomputers
• Ways to distribute the algorithm:
  • Low level: distribute the ANN computation
    • It should be performed in hardware
  • Medium level: distribute the ANN learning
    • Possible with batch learning
    • Online learning performs better in this case
  • High level: distribute the fitness computation
    • It was implemented in C++ with MPI and OpenMP
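The high-level option amounts to a master/worker pattern: each individual's fitness is independent of the others, so the evaluations can be handed out in parallel and gathered back in order. The authors implemented this in C++ with MPI and OpenMP. The sketch below is not their code: it uses a Python thread pool purely as a stand-in to show the pattern, and `slow_fitness` and `evaluate_population` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def slow_fitness(mask):
    """Hypothetical stand-in for the expensive train-and-test ANN evaluation."""
    error = 0.1 + 0.01 * sum(mask)
    return 1.0 / error

def evaluate_population(population, fitness, workers=4):
    """Score every individual in parallel; results keep the population order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

population = [[1, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
scores = evaluate_population(population, slow_fitness)
```

Threads in CPython do not speed up CPU-bound pure-Python work, so a real run would need separate processes or, as in the original work, MPI ranks across machines. The structure stays the same: scatter individuals, evaluate, gather scores.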

Results (1)
• SNR 200
• Original spectra

Results (2)
• SNR 200
• Wavelet domain

Thank you for your attention! Any questions?
