1 / 12

How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh)

How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases. ILMT Software Data acquisition Clustering Framework (HPC) Data reduction Image subtraction (application to GL) Databases (RDBMS) ….

leroyt
Télécharger la présentation

How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases • ILMT Software • Data acquisition • Clustering Framework (HPC) • Data reduction • Image subtraction (application to GL) • Databases (RDBMS) • …. • GAIA QSO Classifier Software Poels, J. ARC Liège, 11 February 2010

  2. Motivation • We explore ways of doing spatial search within a relational database • hierarchical triangular mesh and HEALPix (a tessellation of the sphere) as a zoned bucketing system, representing areas as disjunctive-normal form constraints. • The approach has the virtue that the zone mechanism works well on B-Trees native to all SQL systems and integrates naturally with current query optimizers • Involved projects: • SDSS (Sloan Digital Sky Survey) • GSC (Guide Star Catalog) Palomar and UK Schmidt surveys • COBE (Cosmic Background Explorer) • WMAP (Wilkinson Microwave Anisotropy Probe) • ….

  3. ILMT_operating_param time temperature CCD position rotation (ppm) ........

  4. Iterative_cat_1 Iterative_cat_2 Iterative_cat_(N-1) Iterative_cat_N ILMT_reference_cat 2MASS/SDSS cat ILMT_reference_cat

  5. Fits_files_cat fits_file_id filename path file_type processing_level x_global_0 y_global_0 alpha_0

  6. Reference_img_cat source_id fits_file_id1 fits_file_id2 x_local y_local x_global y_global alpha delta flux_aperture flux_psf_fit Mag_R Object_type Processing_No Fwhm Isolated_flag

  7. Night_1_cat Night_2_cat Night_3_cat Night_N_cat Objects_rowid_ptr source_id fits_file_id1 fits_file_id2 x_local y_local x_global y_global alpha delta flux_aperture flux_psf_fit Mag_R Object_type Processing_Nr Fwhm Isolated_flag source_id fits_file_id1 fits_file_id2 x_local y_local x_global y_global alpha delta flux_aperture flux_psf_fit Mag_R Object_type Processing_Nr Fwhm Isolated_flag source_id fits_file_id1 fits_file_id2 x_local y_local x_global y_global alpha delta flux_aperture flux_psf_fit Mag_R Object_type Processing_Nr Fwhm Isolated_flag source_id fits_file_id1 fits_file_id2 x_local y_local x_global y_global alpha delta flux_aperture flux_psf_fit Mag_R Object_type Processing_Nr Fwhm Isolated_flag source_id Night1_rowid Night2_rowid … … … … … … … NightN_rowid Ref_img_cat_rowid Night catalogs

  8. Is this (horizontal time) model suitable ? • At the beginning the RDBMS should run smoothly but after 5 years of operation ? • Indexing is not an easy task • A given source will be measures an order of 10^3 where each measure set featuring ~ 200 bytes • Assuming ~10x10^6 sources we get a multi-TB DB (for alphanumeric data only)! Consider also ~ 25TB of image data • Example of performance bottleneck: Search all constant point-like sources. We have to scan the whole DB and for each source, track its history. This means that for each source we have to issue 10x10^6x(N-1) SQL statements ! Forget it. • Beyond the query complexity, the DMS prefetch the rows which are more likely to be read: useless and slow disk activity. • The data must be rearranged • Solution: HTM

  9. HTM (Hierachical Triangular Mesh) • HTM [18] maps triangular regions of the sphere to unique identifiers • The technique for subdividing the sphere in spherical triangles is a recursive process • At each level of the recursion, the area of the resulting triangles is roughly the same • In areas with a larger data density, the recursion process can be applied with a greater level of detail than in areas with lower density • The starting point is a spherical octahedron which identifies 8 spherical triangles of equal size

  10. The term quadtree is used to describe a class of hierarchical data structures based on the principle of a fast recursive decomposition of the space. • Sky tessellation with various mapping functions have been proposed. It is a matter of fact that the astronomical community is accepting the HTM and HEALPix (Hierarchical, Equal Area, and iso-Latitude Pixelisation) schema as the default for object catalogues and for maps visualization and analysis, respectively. HEALPix gives a hierarchical iso-area and iso-latitude tessellation of the sphere and so are convenient for harmonic data analysis on the sphere (densities, integrals, spherical harmonics, Fourier transforms,etc.,)

  11. Using a 64 bit long integer to store the index IDs leads to a limit for the pixels size of about 7.7 and 0.44 milli-arcsec on a side for HTM and HEALPix, respectively. Being able to quickly retrieve the list of objects in a given sky region is crucial in several projects.

  12. Application to the ILMT • The indexing scheme does not have to go as deep as the actual pixel resolution • Each triangle is linked to its own database table (the GSC approach) • SQL queries involving searches based on RA,DEC (+range) quickly provide the • HtmId(s) which in turn is used to build the table_name(s) to be accessed. • We can dynamically choose a triangle surface coverage depending on the • maximum number of sources (e.g. max 100 sources per triangle) • One single SQL statement returns a cursor with the whole source_id history • The problem is now: for each triangle database table, select distinct all • source_id and fetch history rows ~ 10x10^6 SQL runs ! Manageable  • The indexing C++ software is freely available at: • http://www.sdss.jhu.edu/

More Related