1 / 37

Classification

Classification. Ashish Mahabal aam at astro.caltech.edu iPTF Summer School Caltech 2014-08-25. Need for classification. Astro datasets getting larger (TB -> PB -> …) SDSS/ CRTS/PTF/…/LSST/SKA/LIGO Transient science (multi-epoch surveys) Spectroscopy is a bottleneck

kalkin
Télécharger la présentation

Classification

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Classification Ashish Mahabal aam at astro.caltech.edu iPTF Summer School Caltech 2014-08-25

  2. Need for classification Astro datasets getting larger (TB -> PB -> …) SDSS/CRTS/PTF/…/LSST/SKA/LIGO Transient science (multi-epoch surveys) Spectroscopy is a bottleneck Early characterization and classification is a must Separating ordinary and known from unknown and interesting Given the data volumes, it should be automated Ashish Mahabal

  3. Semantic Tree of Astronomical Variables and Transients AGN Subtypes SN Subtypes To understand transients, the variables need to be understood too.

  4. Computer Science Mathematics and Statistics Efficient algorithms and optimization Machine Learning abstractions and summaries Data Science Domain Specific Knowledge galaxy proximity Galactic latitude etc. Ashish Mahabal

  5. Automated Classification Techniques • Implementation of clustering algorithms in a machine-learning (ML) or AI setting • Examples: star-galaxy separation, automated galaxy morphology classification, stellar or galaxy spectral types, etc. • Supervised classifiers: a set of learning examples is provided; the number of possible classes is known • Examples: SVMs, Decision Trees, … • Unsupervised classifiers: the program decides how many classes are needed to account for the diversity of the data, and classifies on the basis of the data

  6. Variety of available tools • Python • PyML • scikit-learn • R • http://cran.r-project.org/web/views/MachineLearning.html • Matlab Ashish Mahabal

  7. From Python’s scikit-learn Ashish Mahabal

  8. Transient classification • Characteristic properties • proximity to a galaxy • Galactic latitude • proximity to a radio source • Lightcurve based quantities • amplitude • skew • Stetson J Quantify these make “priors” out of them Ashish Mahabal

  9. Simple(r) classification problem Stars and galaxies Ashish Mahabal

  10. Enter clustering • Determine the number of classes • Stars • Galaxies Ashish Mahabal

  11. Possible complications • Star - galaxy • Galaxy - galaxy (E, S0, S, Ir) • Quasar - star • Dwarfs - main sequence Ashish Mahabal

  12. Enter clustering • Determine the number of classes • Understand their properties • Extendedness • Light concentration Ashish Mahabal

  13. Measure parameters that are handles for these properties • Pixels occupied • Ratio of flux in two apertures Arun Kumar Ashish Mahabal

  14. Enter clustering • Determine the number of classes • Understand their properties • Measure parameters that are handles for these properties • Plot the parameters • “Separate” the clusters Ashish Mahabal

  15. Classification is an integral part of A’nomy • Clustering is the means to separate the classes (in an unsupervised manner) Ashish Mahabal

  16. Simple classification problemComplications: just stars and galaxies? • Stars • Galaxies • CCD defects • Cosmic rays • Bleed trails • Satellite trails • Asteroids! e.g. real-bogus or CRTS’ NN for artifact removal Ashish Mahabal

  17. Complications • How many classes are there? • Are they cleanly separated? • Brighter stars • Distant galaxies • Grazing cosmic rays Ashish Mahabal

  18. Complications How many classes are there? Are they cleanly separated? Do all objects belong to these classes? Djorgovski Ashish Mahabal

  19. Complications • How many classes are there? • Are they cleanly separated? • Do all objects belong to these classes? • Could we add observables to classify better and find rarer objects? • Another waveband? • A third one? • More epochs? Ashish Mahabal

  20. Typical Parameter Space for S/G Classif. Stellar locus  Galaxies (From DPOSS) Ashish Mahabal

  21. Automated Star-Galaxy Classification:Decision Trees (DTs) (Weir et al. 1995) Ashish Mahabal

  22. An Example: Classification of DPOSS Sources with AutoClass (an unsupervised Bayesian classifier) Class 1: stellar (PSF) Class 2: star with a fuzz Class 3: early-type galaxy Class 4: late-type galaxy Ashish Mahabal

  23. Classification is an integral part of A’nomy • Clustering is the means to separate the classes • Outliers are the interesting rarer objects which • do not belong to the main classes Ashish Mahabal

  24. Semantic Tree of Astronomical Variables and Transients AGN Subtypes SN Subtypes To understand transients, the variables need to be understood too.

  25. Debosscher+07 Richards+11 Richards+11

  26. Broad, incomplete hierarchy All transients SN CV, blazars, periodic SN I SN II CV, blazars periodic To other classifiers CV blazars

  27. Light-curve features • Measure features (metrics) for all light curves Amplitude skew beyond1std Adam Miller

  28. amplitude and std-dev for six classes of variables from CRTS Ashish Mahabal

  29. Separation is better understood when shown as density Ashish Mahabal

  30. stetson_j fold_2p_slope_10% flux_%_mid20 fold_2p_slope_90% stetson_k flux_%_mid35 flux_%_mid50 flux_%_mid65 flux_%_mid80 Many features - not all are independent Adam Miller medperc90_p2_p scatter_res_raw pair_slope_trend p2p_scatter_pfold_over_mad p2p_scatter_2praw QSO freq_signif percent_difference_flux_percentile non_QSO freq_n_alias freq_varrat max_slope freq_rrd skew beyond1std std freq_y_offset MAD Amplitude freq_model_min_delta_mag small_kurtosis p2p_scatter_over_mad freq_model_max_delta_mag percent_amplitude linear_trend freq_model_phi1_phi2 median_buffer_range_percentage p2p_ssqr_diff_over_var

  31. A Variety of parameters that can be used Not all parameters are always present • Discovery: magnitudes, delta-magnitudes • Contextual: • distance to nearest star • Magnitude of the star • color of that star • normalized distance to nearest galaxy • Distance to nearest radio source • Flux of nearest radio source • Galactic latitude • Follow-up • Colors (g-r, r-I, i-z etc.) • Prior classifications (event type) • Characteristics from light-curve • amplitude • Median buffer range percentage • Standard deviation • Stetson k • Flux percentile ratio mid80 • Prior outburst statistic http://ki-media.blogspot.com/ Bayesian Networks best to deal with such datasets as they can deal with missing data and the structure can be learnt from the data – at least in principle Ashish Mahabal

  32. Relative significance of parameters Linear trend: sign(linear trend) × log(linear trend| + 1e−06) sign(linear trend) ×√{|linear trend|} med_buf_range_per: −log(1 − med_buf_range_per) Kurtosis: log(3 + kurtosis) Parameters from Richards et al.

  33. Bits we will leave out Periodicity • Kepler – dense light-curves • irregular and sparse light-curves (most surveys) • best phasing, characteristic time-scales etc. GPR • interpolation • regular grid Ashish Mahabal

  34. Using 900 non-SNe and 600 SNe Non-SNe (1) SNe (2) 80-90% completeness using just these parameters 1 2 2 1 1 2 Ashish Mahabal

  35. Bigger Bayesian Network picture Ashish Mahabal

  36. Various methods • Support Vector machines (SVM) • Random Forests • Decision Trees (Deep learning for images) Ashish Mahabal

  37. Citizen science classfications as a path to machine learning Most data never seen by scientists Pattern matching techniques not mature enough (and may never be as mature as humans – but large data makes a difference) - Hanny’sVoorwerp is an excellent example citizen sky is a path to better understanding (not an end)

More Related