
Rule-based Cross-matching of Very Large Catalogs

Rule-based Cross-matching of Very Large Catalogs. Patrick Ogle and the NED Team IPAC, California Institute of Technology. NASA Extragalactic Database (NED). A fusion of multi-wavelength extragalactic data from journal articles and large catalogs. NED Holdings (October 2014). 2MASS PSC.


Presentation Transcript


  1. Rule-based Cross-matching of Very Large Catalogs Patrick Ogle and the NED Team IPAC, California Institute of Technology

  2. NASA Extragalactic Database (NED) A fusion of multi-wavelength extragalactic data from journal articles and large catalogs

  3. NED Holdings (October 2014) 2MASS PSC And much more, including classifications, notes, images, spectra…

  4. New Cross-matching Algorithm • Very Large Catalogs (VLCs, >10^7 sources) • Find candidate matches in NED • Select best match • Rule-based • Statistical analysis • Match data recorded in DB • Reversible and iterable [Figure: GALEX ASC (NUV) vs. SDSS DR6 (gri, 6′×6′)]

  5. Cross-match Inputs • VLC Source and NED Object Positions (RA, Dec, ±) • Source-Object Separation (s, ±σ) • Source and Object Types (galaxy, galaxy cluster, star, UV source, etc.) • Background Object Density (measured for each source) • Instrumental Beam Size • Other: redshift, photometry, diameters

  6. NED Pipeline for Very Large Catalogs • Source Loader • Load Very Large Catalog (VLC) source names and positions into NED • CSearch (PostgreSQL) • Find match candidates with a NED near-position search • Count background objects • Spatial indexing will speed up the search (e.g. Q3C, HTM) • MatchExpert (Python) • Select best match from CSearch match candidates • Object associations for no-matches • Record match statistics for each match • Match statistic distributions and integrals • Code migration to DBMS for speed • Object Loader (PostgreSQL) • Create NED cross-IDs • New objects • Associations [Diagram: Source Loader, CSearch, MatchEx, Object Loader]
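The CSearch step (find candidates near a source, count background objects) can be sketched in plain Python. This is only an illustration, not the NED implementation, which the slide describes as PostgreSQL: it is a brute-force loop with no spatial index, the function names `separation_arcsec` and `csearch` are hypothetical, and the default radii are the GALEX values quoted on slide 10. It assumes the background count N includes every object inside the background radius.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two positions in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(h))) * ARCSEC_PER_RAD

def csearch(source, objects, r_search=7.5, r_background=46.5):
    """Return (candidates, n_background) for one source.

    source and each object are (ra_deg, dec_deg) tuples (hypothetical layout).
    candidates is a list of (separation_arcsec, object) sorted by separation.
    n_background counts all objects within r_background (an assumption).
    """
    candidates = []
    n_background = 0
    for obj in objects:
        s = separation_arcsec(source[0], source[1], obj[0], obj[1])
        if s <= r_background:
            n_background += 1
        if s <= r_search:
            candidates.append((s, obj))
    candidates.sort(key=lambda c: c[0])
    return candidates, n_background
```

A production version would replace the loop with an indexed cone search, which is the role the slide assigns to Q3C or HTM.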

  7. MatchEx Logic [Flowchart. Input: match list from CSearch. Thresholds: S < Scut, type match, name prefix match, P > Pcut, S1/S2 < 0.33. Other decision nodes: error circles overlap, single good match. Outcomes: match (NED cross-ID), NED duplicate, no match (create NED object and associations).]
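The flowchart labels suggest a sequence of threshold tests followed by a uniqueness test. The sketch below is one possible reading, not the actual MatchEx code: the function name `select_match` and the dictionary keys are hypothetical, the name-prefix test is omitted, and S1/S2 is assumed to be the ratio of the nearest to the second-nearest separation among candidates that pass the thresholds.

```python
def select_match(candidates, s_cut, p_cut, ratio_cut=0.33):
    """Pick a single best match, or None.

    candidates: list of dicts with keys 's' (separation), 'p' (Poisson
    match probability) and 'type_ok' (bool). All keys are hypothetical.
    """
    # Keep candidates that pass the separation, type and probability thresholds.
    good = [c for c in candidates
            if c["s"] < s_cut and c["type_ok"] and c["p"] > p_cut]
    good.sort(key=lambda c: c["s"])
    if not good:
        return None
    if len(good) == 1:
        return good[0]
    # Several survivors: accept the nearest only if it is much closer
    # than the runner-up (assumed meaning of S1/S2 < 0.33).
    if good[1]["s"] > 0 and good[0]["s"] / good[1]["s"] < ratio_cut:
        return good[0]
    return None
```

Returning `None` corresponds to the "no match" branch, where the pipeline would go on to consider associations.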

  8. Associations • Where a match is not made to a nearby object, an association record may be created. • Association types: • Source and object position error circles overlap • Object is within the beam (PSF) of the source (S < beam) [Flowchart: No Match; if error circles overlap, create Error Overlap Association record; if S < beam, create In Beam Association record.]
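The two association tests can be written as a small function. This is a sketch under stated assumptions: the name `association_type` and its return labels are hypothetical, the overlap test is checked before the beam test, and two error circles are taken to overlap when the separation does not exceed the sum of their radii (the slide does not give the exact criterion).

```python
def association_type(s, sigma_source, sigma_object, beam):
    """Classify an unmatched source-object pair.

    s            : source-object separation
    sigma_source : radius of the source position error circle
    sigma_object : radius of the object position error circle
    beam         : instrumental beam (PSF) size
    All quantities must share the same angular unit.
    """
    # Assumed overlap criterion: separation within the summed error radii.
    if s <= sigma_source + sigma_object:
        return "error_overlap"
    # In-beam criterion from the flowchart: S < beam.
    if s < beam:
        return "in_beam"
    return None
```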

  9. Application to GALEX ASC Catalog • GALEX All-Sky Catalog of ~40 million unique NUV sources created by M. Seibert (2012) • Matched against ~180 million NED objects (2013) [Figure: GALEX ASC (NUV) vs. NED, showing NED objects, the GALEX search region, and the background region, over SDSS DR6 (gri, 6′×6′)]

  10. Poisson Match Probability • Search radius: $r_s = 7.5''$ for GALEX • Background radius: $r_b = 46.5''$ for GALEX • Density of background NED objects: $n = N/(\pi r_b^2)$ • Expected number inside $s$: $\langle N_s \rangle = N (s/r_b)^2$, where $s$ is the separation • Poisson probability of $x = k$ objects closer than $s$: $P_s(x=k) = \langle N_s \rangle^k \exp(-\langle N_s \rangle)/k!$ • For $k = 0$, this simplifies to $P_s(x=0) = \exp(-\langle N_s \rangle) = \exp(-N (s/r_b)^2)$ • False-match probability: $P_f = 1 - P_s(0)$ • Example: $N = 4$, $s/r_b = 0.08$ gives $P_s(0) = 0.975$, $P_f = 0.025$ [Figure: search radius $r_s$, background radius $r_b$, and separation $s$]
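The formulas on this slide translate directly into a few lines of Python. The function names below are hypothetical, and the default background radius is the GALEX value from the slide; the final lines reproduce the slide's worked example.

```python
import math

def expected_background(n_background, s, r_background=46.5):
    """<N_s> = N (s / r_b)^2: expected number of background objects closer than s."""
    return n_background * (s / r_background) ** 2

def poisson_match_probability(n_background, s, r_background=46.5):
    """P_s(x=0) = exp(-<N_s>): probability that no background object lies closer than s."""
    return math.exp(-expected_background(n_background, s, r_background))

def false_match_probability(n_background, s, r_background=46.5):
    """P_f = 1 - P_s(0)."""
    return 1.0 - poisson_match_probability(n_background, s, r_background)

# Slide example: N = 4 background objects, s / r_b = 0.08.
p_match = poisson_match_probability(4, 0.08 * 46.5)
p_false = false_match_probability(4, 0.08 * 46.5)
print(round(p_match, 3), round(p_false, 3))
```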

  11. Optimizing Match Selection • Optimize on a 100K subsample in the SDSS region • False-positive rate decreases with increasing Poisson cutoff. • False-negative rate increases with increasing Poisson cutoff. • Give 10x weight to false positives: it is worse to make an incorrect match than to miss a match. • A Poisson cutoff value of 90% minimizes the combined, weighted error rate.
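The weighted optimization can be illustrated with a short scan over cutoff values. This is a sketch, not the procedure actually used: it assumes a labelled training sample in which the true match status of each candidate is known, the function names are hypothetical, and dividing by the sample size is an assumed normalization. Only the 10x false-positive weight comes from the slide.

```python
def weighted_error(p_values, is_true_match, cutoff, fp_weight=10.0):
    """Combined error rate at one cutoff, with false positives up-weighted.

    A candidate is accepted when its Poisson probability exceeds the cutoff.
    """
    pairs = list(zip(p_values, is_true_match))
    false_pos = sum(1 for p, truth in pairs if p > cutoff and not truth)
    false_neg = sum(1 for p, truth in pairs if p <= cutoff and truth)
    return (fp_weight * false_pos + false_neg) / len(pairs)

def best_cutoff(p_values, is_true_match, cutoffs, fp_weight=10.0):
    """Return the cutoff in `cutoffs` with the lowest weighted error rate."""
    return min(cutoffs,
               key=lambda c: weighted_error(p_values, is_true_match, c, fp_weight))
```

Raising `fp_weight` pushes the optimum toward stricter cutoffs, which matches the slide's stated preference for missing a match over making a wrong one.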

  12. GALEX ASC Match Results: Totals • 39,570,031 input GALEX ASC UV sources • NED (2013) contained ~180 million distinct objects • 10,595,382 (26.8%) of the ASC sources matched NED objects → cross-IDs • 28,974,649 (73.2%) are not matched → new NED objects • 68.2% of GALEX ASC sources are in blank NED fields • 5.0% have multiple match candidates Image credit: GALEX NASA/JPL-Caltech/SSC

  13. GALEX ASC Match Results: Background Rejection and False-Negative Rate • Uncorrelated background out to 15 arcsec fit by straight line: dN/ds ~ s • MatchEx is successful at filtering out this background. • False-negative rate fn = 2.4%, estimated by comparison to background-subtracted match candidates (red line). [Figure: number of candidates vs. separation (arcsec), with false negatives marked]
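The linear form of the background follows from a uniform surface density. As a short derivation, assuming unrelated objects are spread with constant density $n$ and that separations are small enough for flat-sky geometry:

$$
N(<s) = \pi n s^2 \quad\Rightarrow\quad \frac{dN}{ds} = 2 \pi n s \propto s .
$$

Physically associated objects add an excess at small $s$ on top of this line, which is why subtracting the fitted line isolates the true match candidates.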

  14. GALEX ASC Results: False Positive Rate • The false-positive match rate is estimated by summing the Poisson statistic (1-P) over all matches and dividing by the total number of sources: fp = 0.25% [Figure: histogram, vertical axis: Number]
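The estimator described here is a one-line sum. In this sketch the function name `false_positive_rate` is hypothetical, and `match_probabilities` is assumed to hold the Poisson probability P of each accepted match.

```python
def false_positive_rate(match_probabilities, n_sources):
    """Estimate fp as the sum of (1 - P) over accepted matches, divided by the source count."""
    return sum(1.0 - p for p in match_probabilities) / n_sources
```

Each term (1 - P) is the chance that a given accepted match is a background coincidence, so the sum is the expected number of false matches in the sample.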

  15. GALEX ASC Results: Position Error Distribution • The distribution of normalized separation r=s/σ deviates from a Gaussian. The peak is at 0.9 instead of 1.0, and the tail is stronger. • Important Lessons Learned: • Do not assume reported catalog position errors are correct. • Do not assume position error distributions are Gaussian. • A 3.5σ threshold on match separation rejected more candidates than expected. [Figure: Number vs. r=s/σ, compared with the derivative of a Gaussian]
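The Gaussian expectation that the data are compared against can be made explicit. Assuming independent, circular Gaussian position errors with σ defined so that r = s/σ has unit scale, the radial distribution is r·exp(-r²/2), which is the derivative of a Gaussian up to sign and peaks at r = 1. The sketch below (hypothetical function names) also gives the fraction of true matches that such a model would place beyond a threshold.

```python
import math

def gaussian_radial_pdf(r):
    """Expected density of normalized separation r for circular Gaussian errors."""
    return r * math.exp(-r * r / 2.0)

def fraction_beyond(r_cut):
    """Fraction of true matches expected at r > r_cut under the same model."""
    return math.exp(-r_cut * r_cut / 2.0)

# Under this model a 3.5-sigma cut would reject about 0.2% of true matches.
print(fraction_beyond(3.5))
```

The slide reports that the observed rejection was larger than expected, consistent with the stronger tail it describes.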

  16. Comparison to SDSS Photometry • While no color criteria were used to select matches to GALEX sources, the NUV-g colors of GALEX-SDSS matches were checked: Most matches have -7<NUV-g<7 • GALEX ASC range: 14<NUV<24 • Detection rate falls at NUV>21.7

  17. Results by Object Type • Object Types ordered by candidate match frequency • Most GALEX sources matched to galaxies (G) and stars (*) • QSO, Galactic star (!*), UV excess object (UvES), and WD* matches overrepresented, as might be expected for a UV-selected catalog. • Matches to RadioS, XrayS, GGroup, and GPair candidates were disallowed.

  18. GALEX Photometry in NED • GALEX ASC photometry added to NED spectral energy distribution of 3C 382 (CGCG 173-014) • Over 145 million GALEX ASC NUV and FUV photometry records added to NED (2 extraction methods per band)

  19. VLCs in NED, now and future • GALEX ASC: ~40,000,000 UV sources loaded and matched (2013) • GALEX MSC: ~22,000,000 UV sources loaded and matched (2014) • Spitzer Source List: ~42,000,000 MIR sources (2014) • 2MASS PSC: ~471,000,000 NIR sources loaded (2015 finish) • AllWISE: ~748,000,000 MIR sources (2015 start) • SDSS DR10: ~469,000,000 Vis sources (2015 start) • SDSS DR6: ~154,000,000 Vis sources loaded and matched (out of 217M), excluding sources with undesirable flag values (2008) • NED aims to quadruple its object holdings in the next year!
