SkyQuery is a distributed query engine that lets astronomers crossmatch astronomical catalogs across multiple wavelengths, including infrared (2MASS), visible (DSS), and ultraviolet (Galex). The catalogs involved hold on the order of 100 million objects in databases of 1 TB to 10 TB. Because not every combination of catalogs can be precomputed, SkyQuery crossmatches user-specified catalogs and regions of interest on demand. Queries are written in SQL with astronomy-specific extensions, and the computation runs inside the database across multiple servers, with the parallelization kept transparent to users.
László Dobos¹, Tamás Budavári², Alex Szalay², István Csabai¹
¹ Eötvös Loránd University, Hungary
² Johns Hopkins University, Baltimore
SkyQuery: A distributed query engine for astronomy
The multiwavelength sky
• infrared (2MASS)
• visible (DSS)
• ultraviolet (Galex)
Crossmatching
• Astronomical catalogs
• in RDBMS
• O(100 million) objects
• O(1 TB – 10 TB) DB size
• Done by coordinates
• RA, Dec
• Astrometric error
• Different sky coverage
• Different wavelength range
• Moving objects etc.
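As a rough illustration of matching "by coordinates", the sketch below converts RA/Dec to unit vectors and tests whether two detections lie within a given angular radius. The function names and the idea of using a fixed radius as a stand-in for the astrometric error are assumptions made for this example; they are not taken from SkyQuery itself.

```python
import math


def radec_to_unit_vector(ra_deg, dec_deg):
    """Convert RA/Dec in degrees to a Cartesian unit vector (x, y, z)."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    return (
        math.cos(dec) * math.cos(ra),
        math.cos(dec) * math.sin(ra),
        math.sin(dec),
    )


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular distance between two sky positions, in arcseconds."""
    x1, y1, z1 = radec_to_unit_vector(ra1, dec1)
    x2, y2, z2 = radec_to_unit_vector(ra2, dec2)
    # Use the chord length between the unit vectors: this stays accurate
    # for the very small separations typical of crossmatching.
    chord = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
    theta = 2.0 * math.asin(min(1.0, chord / 2.0))
    return math.degrees(theta) * 3600.0


def is_match(ra1, dec1, ra2, dec2, radius_arcsec):
    """Hypothetical match rule: positions closer than a fixed radius."""
    return angular_separation_arcsec(ra1, dec1, ra2, dec2) <= radius_arcsec
```

Working with unit vectors avoids special handling of the RA wrap-around at 0°/360°, which a naive difference of RA values would get wrong. A fixed radius is only the simplest possible rule; it ignores per-catalog error models.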
Crossmatching on demand
• Crossmatch any number of catalogs
• All combinations cannot be precomputed
• Maybe catalog pairs?
• User can specify
• List of catalogs to match
• Region of interest
• Priors for non-coordinate-based matching
Problem description
• Astronomers "script" what they do
• multiple re-runs, tweak parameters etc.
• huge web forms: no-no
• All data in RDBMS
• run computation inside the database
• use multiple servers and parallelize
• must be transparent for users
• Problem description in SQL
• functions and language extensions to support astronomy
• syntax to formulate the coordinate-based probabilistic join
• spatial constraints: celestial regions
Sample SQL query

    SELECT s.objId, g.objID, t.objID,
           s.ra, s.dec, g.ra, g.dec, t.ra, t.dec, x.ra, x.dec
    FROM SDSSDR7:Galaxies AS s
    CROSS JOIN Galex:Galaxies AS g
    CROSS JOIN TwoMASS:ExtendedSources AS t
    XMATCH BAYESIAN AS x
        MUST s ON POINT(s.cx, s.cy, s.cz), 0.1
        MUST g ON POINT(g.ra, g.dec), 0.2
        MAY t ON POINT(t.ra, t.dec), 0.5
        HAVING LIMIT 1e3
    REGION CIRCLE J2000 165.7, 0.3, 60

• Standard SQL
• Probabilistic crossmatch
• Spatial constraint
Zone algorithms
• Pure SQL: can leverage the query optimizer of SQL Server
• Divide the sphere into zones
• ZoneID: a very simple hash on declination
• Indexes built on ZoneID and right ascension allow very quick pre-filtering of match candidates
• Parallelizes very well on multi-core machines
• [Gray, Szalay & Nieto-Santisteban 2006, The Zones Algorithm for Finding Points-Near-a-Point or Cross-Matching Spatial Datasets]
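The zone idea can be sketched outside the database as well. The Python below is an in-memory illustration only, not the SQL Server implementation described on the slide: it buckets objects by a ZoneID computed as floor((dec + 90) / zone height), keeps each zone sorted by right ascension, pre-filters candidates with a binary search on RA, and then applies an exact distance test. The zone height, function names, and tuple layout are assumptions made for this example, and catalog RA values are assumed to lie in [0, 360).

```python
import bisect
import math
from collections import defaultdict


def zone_id(dec_deg, zone_height_deg):
    """Simple hash on declination: index of the horizontal stripe."""
    return int(math.floor((dec_deg + 90.0) / zone_height_deg))


def build_zone_index(catalog, zone_height_deg):
    """catalog: iterable of (obj_id, ra_deg, dec_deg).
    Returns {zone: (sorted RA list, rows sorted by RA)}."""
    buckets = defaultdict(list)
    for obj_id, ra, dec in catalog:
        buckets[zone_id(dec, zone_height_deg)].append((ra, dec, obj_id))
    index = {}
    for zone, rows in buckets.items():
        rows.sort(key=lambda row: row[0])
        index[zone] = ([row[0] for row in rows], rows)
    return index


def separation_deg(ra1, dec1, ra2, dec2):
    """Exact angular distance (haversine formula), in degrees."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2.0) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def find_near(index, ra, dec, radius_deg, zone_height_deg):
    """Return obj_ids within radius_deg of (ra, dec)."""
    if abs(dec) + radius_deg >= 90.0:
        # The search circle touches a pole: every RA is possible.
        ra_ranges = [(0.0, 360.0)]
    else:
        # Largest RA offset of any point inside the search circle.
        d_ra = math.degrees(math.asin(
            math.sin(math.radians(radius_deg)) / math.cos(math.radians(dec))))
        lo, hi = ra - d_ra, ra + d_ra
        if lo < 0.0:          # window wraps below RA = 0
            ra_ranges = [(lo + 360.0, 360.0), (0.0, hi)]
        elif hi >= 360.0:     # window wraps above RA = 360
            ra_ranges = [(lo, 360.0), (0.0, hi - 360.0)]
        else:
            ra_ranges = [(lo, hi)]
    z_lo = zone_id(max(dec - radius_deg, -90.0), zone_height_deg)
    z_hi = zone_id(min(dec + radius_deg, 90.0), zone_height_deg)
    matches = []
    for zone in range(z_lo, z_hi + 1):
        if zone not in index:
            continue
        ras, rows = index[zone]
        for lo, hi in ra_ranges:
            # Pre-filter: binary search on the RA-sorted zone.
            start = bisect.bisect_left(ras, lo)
            stop = bisect.bisect_right(ras, hi)
            for row_ra, row_dec, obj_id in rows[start:stop]:
                if separation_deg(ra, dec, row_ra, row_dec) <= radius_deg:
                    matches.append(obj_id)
    return matches
```

To crossmatch two catalogs with this sketch, one would build the index once for the larger catalog and call `find_near` for every object of the other. In the SQL formulation the same pre-filter becomes a range join on (ZoneID, RA), which is what lets the query optimizer use the index.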