This document outlines the Critical Design Review held on November 5-6, 2007, focusing on the PS1 project, which involves a distributed database architecture for managing astronomical data. Key topics include ingest workflows, spatial partitioning, indexing algorithms, and efficient SQL implementations to facilitate cross-matching and nearby searches. The design emphasizes partitioning for improved data volume management and scalability, with distributed partitioned views and custom partitioning strategies for optimal performance. Best practices for handling large datasets and enhancements in SQL Server 2005 are also discussed.
PS1 PSPS Object Data Manager Design PSPS Critical Design Review November 5-6, 2007 IfA
Detail Design • General Concepts • Distributed Database architecture • Ingest Workflow • Prototype
Zones (spatial partitioning and indexing algorithm) [Figure: the sky divided into zones of constant Declination (Dec), ordered by Right Ascension (RA) within each zone] • Partition and bin the data into declination zones • ZoneID = floor((dec + 90.0) / zoneHeight) • A few tricks are required to handle spherical geometry • Place the data close on disk • Clustered index on ZoneID and RA • Fully implemented in SQL • Efficient • Nearby searches • Cross-Match (especially) • Fundamental role in addressing the critical requirements • Data volume management • Association speed • Spatial capabilities
Zoned Table ZoneID = floor((dec + 90.0) / zoneHeight)* (*ZoneHeight = 8 arcsec in this example)
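As a concrete illustration of the zoned layout, here is a minimal T-SQL sketch of a zoned table with the clustered index on (ZoneID, RA); the table and column names (zonedObj, etc.) are illustrative only, not the actual ODM schema.

-- Illustrative zoned table; names are hypothetical, not the ODM schema.
-- zoneHeight = 8 arcsec = 8.0/3600.0 degrees in this example.
CREATE TABLE zonedObj (
    objID   bigint NOT NULL,
    ra      float  NOT NULL,  -- degrees
    dec     float  NOT NULL,  -- degrees
    cx      float  NOT NULL,  -- unit vector on the sphere, for exact distance tests
    cy      float  NOT NULL,
    cz      float  NOT NULL,
    zoneID  int    NOT NULL   -- floor((dec + 90.0) / zoneHeight)
);

-- Cluster on (zoneID, ra) so that objects close on the sky are also close on disk.
CREATE CLUSTERED INDEX cx_zonedObj_zone_ra ON zonedObj (zoneID, ra);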
SQL CrossNeighbors
SELECT *
FROM prObj1 z1
JOIN zoneZone ZZ ON ZZ.zoneID1 = z1.zoneID
JOIN prObj2 z2 ON ZZ.zoneID2 = z2.zoneID
WHERE z2.ra BETWEEN z1.ra - ZZ.alpha AND z1.ra + ZZ.alpha
AND z2.dec BETWEEN z1.dec - @r AND z1.dec + @r
AND (z1.cx*z2.cx + z1.cy*z2.cy + z1.cz*z2.cz) > cos(radians(@r))
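The query above assumes a zoneZone helper table that pairs each zone with its neighboring zones and carries a per-pair RA margin alpha (the search radius inflated by 1/cos(dec), since RA degrees shrink toward the poles). The following is a minimal sketch of how such a table could be built; @r, @zoneHeight, the zonedObj table from the previous sketch, and the #zone work table are all illustrative assumptions, not the ODM's actual code.

DECLARE @r float, @zoneHeight float;
SET @zoneHeight = 8.0 / 3600.0;   -- 8 arcsec zones, as in the example above
SET @r = 2.0 / 3600.0;            -- 2 arcsec search radius (illustrative), in degrees

-- Per-zone declination range and a conservative RA margin alpha for radius @r
-- (alpha grows as 1/cos(dec); capped near the poles to avoid division by zero).
SELECT b.zoneID, b.decMin, b.decMax,
       @r / COS(RADIANS(CASE WHEN m.maxAbsDec > 89.9
                             THEN 89.9 ELSE m.maxAbsDec END)) AS alpha
INTO   #zone
FROM  (SELECT zoneID,
              zoneID * @zoneHeight - 90.0       AS decMin,
              (zoneID + 1) * @zoneHeight - 90.0 AS decMax
       FROM (SELECT DISTINCT zoneID FROM zonedObj) AS z) AS b
CROSS APPLY (SELECT CASE WHEN ABS(b.decMin) > ABS(b.decMax)
                         THEN ABS(b.decMin) ELSE ABS(b.decMax) END AS maxAbsDec) AS m;

-- Zone pairs whose declination ranges lie within @r of each other,
-- keeping the larger of the two per-zone margins as the pair's alpha.
SELECT z1.zoneID AS zoneID1,
       z2.zoneID AS zoneID2,
       CASE WHEN z1.alpha > z2.alpha THEN z1.alpha ELSE z2.alpha END AS alpha
INTO   zoneZone
FROM   #zone AS z1
JOIN   #zone AS z2
  ON   z2.decMax >= z1.decMin - @r
 AND   z2.decMin <= z1.decMax + @r;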
Partitions • SQL Server 2005 introduces technology to handle tables that are partitioned across different disk volumes and managed by a single server. • Partitioning makes management and access of large tables and indexes more efficient • Enables parallel I/O • Reduces the amount of data that needs to be accessed • Related tables can be aligned and collocated in the same place, speeding up JOINs
Partitions • 2 key elements (sketched below) • Partition function • Specifies how the table or index is partitioned • Partition scheme • Using a partition function, the scheme specifies the placement of the partitions on file groups • Data can be managed very efficiently using partition switching • Add a table as a partition to an existing table • Switch a partition from one partitioned table to another • Reassign a partition to form a single table • Main requirement • The table must be constrained on the partitioning column
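A minimal T-SQL sketch of these two elements plus a partition switch. All names (pfObjID, psObjID, the file groups fg1..fg4, the ObjectID boundaries, Objects_p, Objects_staging) are hypothetical stand-ins, not the actual PS1 layout described later.

-- Hypothetical partition function: ranges of ObjectID (i.e. declination ranges).
CREATE PARTITION FUNCTION pfObjID (bigint)
AS RANGE LEFT FOR VALUES (10000000000000000, 20000000000000000, 30000000000000000);

-- Partition scheme: places each of the four ranges on a file group (fg1..fg4 must exist).
CREATE PARTITION SCHEME psObjID
AS PARTITION pfObjID TO (fg1, fg2, fg3, fg4);

-- A table partitioned on the scheme; the partitioning column is constrained
-- through the primary key, which is required for partition switching.
CREATE TABLE Objects_p (
    objID bigint NOT NULL PRIMARY KEY,
    ra    float  NOT NULL,
    dec   float  NOT NULL
) ON psObjID (objID);

-- Partition switching is a metadata-only operation, e.g. to swap a freshly
-- loaded staging table (same structure, same file group, matching CHECK
-- constraint on objID) into partition 2:
-- ALTER TABLE Objects_staging SWITCH TO Objects_p PARTITION 2;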
Partitions • For the PS1 design, • Partitions mean File Group Partitions • Tables are partitioned into ranges of ObjectID, which correspond to declination ranges. • ObjectID boundaries are selected so that each partition has a similar number of objects.
Distributed Partitioned Views • Tables participating in a Distributed Partitioned View (DPV) reside in different databases, which in turn reside on different instances or different (linked) servers (see the sketch below)
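A minimal sketch of such a view; the linked-server and database names (Slice01, PS1_p1, etc.) are hypothetical. Each member table must carry a CHECK constraint on its ObjectID range so the optimizer can route a query to the right slice.

-- Distributed partitioned view gluing per-slice detection tables together.
CREATE VIEW Detections
AS
SELECT * FROM Slice01.PS1_p1.dbo.Detections_p1
UNION ALL
SELECT * FROM Slice02.PS1_p2.dbo.Detections_p2
UNION ALL
SELECT * FROM Slice03.PS1_pm.dbo.Detections_pm;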
Concept: Slices • In the PS1 design, the bigger tables will be partitioned across servers • To avoid confusion with the File Group Partitioning, we call them “Slices” • Data is glued together using Distributed Partitioned Views • The ODM will manage slices. Using slices improves system scalability. • For PS1 design, tables are sliced into ranges of ObjectID, which correspond to broad declination ranges. Each slice is subdivided into partitions that correspond to narrower declination ranges. • ObjectID boundaries are selected so that each slice has a similar number of objects.
Detail Design Outline • General Concepts • Distributed Database architecture • Ingest Workflow • Prototype
PS1 Distributed DB system [Architecture diagram] Load servers Load Support1 … Load Supportn hold the staging tables (objZoneIndx, Detections_l, LnkToObj_l, Orphans_l) plus the incoming detections; a LoadAdmin database holds the PartitionsMap. Linked slice servers P1 … Pm each hold [Objects_p], [LnkToObj_p], [Detections_p], and Meta. The PS1 main database holds PartitionsMap, Objects, LnkToObj, Meta, and the Detections partitioned view, and is accessed through the Query Manager (QM) and the Web Based Interface (WBI). Legend: database, full table, [partitioned table], output table, partitioned view.
Design Decisions: ObjID • Objects have their positional information encoded in their objID • fGetPanObjID (ra, dec, zoneH) • ZoneID is the most significant part of the ID • It gives scalability, performance, and spatial functionality • Object tables are range partitioned according to their object ID
ObjectID Clusters Data Spatially Dec = –16.71611583 ZH = 0.008333 ZID = (Dec+90) / ZH = 08794.0661 ObjectID = 087941012871550661 RA = 101.287155 ObjectID is unique when objects are separated by >0.0043 arcsec
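A sketch of a function that reproduces the encoding in this example (zoneID in the leading digits, then RA in microdegrees, then the four-digit offset inside the zone). The name fGetPanObjID_sketch and the digit layout are inferred from the example above; the actual ODM function may differ.

CREATE FUNCTION dbo.fGetPanObjID_sketch (@ra float, @dec float, @zoneH float)
RETURNS bigint
AS
BEGIN
    DECLARE @zid float, @zone bigint, @frac bigint;
    SET @zid  = (@dec + 90.0) / @zoneH;               -- e.g. 8794.0661
    SET @zone = FLOOR(@zid);                          -- most significant digits
    SET @frac = ROUND((@zid - @zone) * 10000.0, 0);   -- 4-digit offset within the zone
    -- layout: zone (5 digits) | RA in microdegrees (9 digits) | offset (4 digits)
    RETURN @zone * CAST(10000000000000 AS bigint)     -- 10^13
         + CAST(ROUND(@ra * 1000000.0, 0) AS bigint) * 10000
         + @frac;
END;

Called with (101.287155, -16.71611583, 30.0/3600.0), this returns 87941012871550661, matching the ObjectID shown above up to the leading zero.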
Design Decisions: DetectID • Detections have their positional information encoded in the detection identifier • fGetDetectID (dec, observationID, runningID, zoneH) • Primary key (objID, detectionID), to align detections with objects within partitions • Provides efficient access to all detections associated to one object • Provides efficient access to all detections of nearby objects
DetectionID Clusters Data in Zones Dec = –16.71611583 ZH = 0.008333 ZID = (Dec+90) / ZH = 08794.0661 DetectID = 0879410500001234567 ObservationID = 1050000 Running ID = 1234567
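Similarly, a sketch consistent with this DetectID example (zoneID, then a 7-digit observationID, then a 7-digit running number). fGetDetectID_sketch and the digit layout are inferred from the example; the actual ODM function may differ.

CREATE FUNCTION dbo.fGetDetectID_sketch
    (@dec float, @observationID bigint, @runningID bigint, @zoneH float)
RETURNS bigint
AS
BEGIN
    DECLARE @zone bigint;
    SET @zone = FLOOR((@dec + 90.0) / @zoneH);          -- e.g. 8794
    -- layout: zone (5 digits) | observationID (7 digits) | runningID (7 digits)
    RETURN @zone * CAST(100000000000000 AS bigint)      -- 10^14
         + @observationID * 10000000                    -- 10^7
         + @runningID;
END;

With Dec = -16.71611583, observationID = 1050000, runningID = 1234567, and zoneH = 30.0/3600.0, this returns 879410500001234567, matching the example.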
ODM Capacity 5.3.1.3 The PS1 ODM shall be able to ingest into the ODM a total of • 1.5 × 10^11 P2 detections • 8.3 × 10^10 cumulative sky (stack) detections • 5.5 × 10^9 celestial objects together with their linkages.
PS1 Table Sizes - Monolithic Sizes are in TB
What goes into the main Server [Diagram] The PS1 main database holds the full tables PartitionsMap, Objects, LnkToObj, and Meta; the slice servers P1 … Pm are reached through linked servers. Legend: database, full table, [partitioned table], output table, distributed partitioned view.
What goes into slices [Diagram] Each slice server P1 … Pm holds the partitioned tables [Objects_p], [LnkToObj_p], and [Detections_p] together with local copies of PartitionsMap and Meta; the PS1 main database keeps PartitionsMap, Objects, LnkToObj, and Meta. Legend: database, full table, [partitioned table], output table, distributed partitioned view.
Duplication of Objects & LnkToObj • Objects are distributed across slices • Objects, P2ToObj, and StackToObj are duplicated in the slices to parallelize “inserts” & “updates” • Detections belong in their object’s slice • Orphans belong to the slice where their position would place them • Orphans near slice boundaries will need special treatment • Objects keep their original object identifier • Even though positional refinement might change their zoneID, and therefore the most significant part of their identifier
Glue = Distributed Views [Diagram] The Detections distributed partitioned view in the PS1 main database glues together the [Detections_p] tables held on the slice servers P1 … Pm; each slice also holds [Objects_p], [LnkToObj_p], PartitionsMap, and Meta. Legend: database, full table, [partitioned table], output table, distributed partitioned view.
Partitioning in Main Server • Main server is partitioned (Objects) and collocated (LnkToObj) by objID • Slices are partitioned (Objects) and collocated (LnkToObj) by objID [Diagram: linked slice servers P1 … Pm behind the PS1 database, Query Manager (QM), and Web Based Interface (WBI)]
PS1 Table Sizes - Main Server Sizes are in TB
PS1 Table Sizes - Each Slice Sizes are in TB
PS1 Table Sizes - All Servers Sizes are in TB
Detail Design Outline • General Concepts • Distributed Database architecture • Ingest Workflow • Prototype
PS1 Distributed DB system [Architecture diagram, as before] Load servers Load Support1 … Load Supportn (objZoneIndx, Detections_l, LnkToObj_l, Orphans_l, incoming detections) and LoadAdmin (PartitionsMap), the linked slice servers P1 … Pm ([Objects_p], [LnkToObj_p], [Detections_p], PartitionsMap, Meta), and the PS1 main database (PartitionsMap, Objects, LnkToObj, Meta, Detections partitioned view), accessed through the Query Manager (QM) and Web Based Interface (WBI). Legend: database, full table, [partitioned table], output table, partitioned view.
“Insert” & “Update” • SQL Insert and Update are expensive operations due to logging and re-indexing • In the PS1 design, Insert and Update have been re-factored into sequences of: Merge + Constrain + Switch Partition (see the sketch below) • Frequency • f1: daily • f2: at least monthly • f3: TBD (likely to be every 6 months)
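A minimal T-SQL sketch of one Merge + Constrain + Switch cycle for a single partition of a detections table, using the hypothetical pfObjID partition function from the earlier sketch; the table names, boundary values, and primary-key layout are illustrative, not the ODM's actual ingest code. The batch-update diagram a few slides below shows the same idea applied to several partitions at once.

-- 1. Merge: combine the rows currently in partition 2 with the newly loaded batch.
SELECT *
INTO   Detections_B2
FROM  (SELECT * FROM Detections_p
        WHERE $PARTITION.pfObjID(objID) = 2
       UNION ALL
       SELECT * FROM Detections_newBatch
        WHERE $PARTITION.pfObjID(objID) = 2) AS merged;

-- 2. Constrain: add the primary key and the CHECK constraint on the partitioning
--    column so Detections_B2 matches partition 2 exactly (it must also live on
--    the same file group as partition 2 for the switch to succeed).
ALTER TABLE Detections_B2
  ADD CONSTRAINT pk_Detections_B2 PRIMARY KEY (objID, detectionID);
ALTER TABLE Detections_B2
  ADD CONSTRAINT ck_Detections_B2_range
  CHECK (objID > 10000000000000000 AND objID <= 20000000000000000);

-- 3. Switch: metadata-only swaps; no row-by-row inserts, updates, or logging.
ALTER TABLE Detections_p  SWITCH PARTITION 2 TO Detections_old;  -- empty table, same structure
ALTER TABLE Detections_B2 SWITCH TO Detections_p PARTITION 2;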
Ingest Workflow [Workflow diagram with stages: CSV, DZone, cross-match X(1”), DXO_1a, NoMatch, X(2”), DXO_2a, Resolve, Detect, and the outputs ObjectsZ, P2PsfFits, P2ToObj, Orphans]
Ingest @ frequency = f1 [Data-flow diagram] Numbered steps connect the LOADER tables (ObjectsZ, P2ToObj, P2PsfFits, Orphans, Metadata+) to SLICE_1 (Objects_1, P2ToObj_1, P2ToPsfFits_1, Stack*_1, Orphans_1) and to MAIN (Objects, P2ToObj, StackToObj).
Updates @ frequency = f2 [Data-flow diagram, two steps] Numbered steps propagate the updated Objects between the LOADER, SLICE_1 (Objects_1, P2ToObj_1, P2ToPsfFits_1, Stack*_1, Orphans_1), and MAIN (Objects, P2ToObj, StackToObj, Metadata+).
Snapshots @ frequency = f3 [Diagram] Objects, P2ToObj, StackToObj, and Metadata+ on MAIN are copied (numbered steps 1-3) into a Snapshot.
Batch Update of a Partition [Diagram] Existing partitions A1, A2, A3, … are merged with the new batch via SELECT INTO; SELECT INTO … WHERE then splits the merged data into B1, B2, B3, each built with its primary-key index; finally each Bi is switched in to replace the corresponding partition.
Scaling-out • Apply the Ping-Pong strategy to preserve query performance during ingest: 2 x (1 main + m slices) [Diagram: two copies of the PS1 main database (PartitionsMap, Objects, LnkToObj, Meta, Detections partitioned view) and of the slice servers P1 … Pm ([Objects_p], [LnkToObj_p], [Detections_p], Meta), behind the Query Manager (QM); legend marks duplicate full tables and duplicate partitioned views]
Scaling-out • More robustness, fault tolerance, and reliability call for 3 x (1 main + m slices) [Diagram: as above, with a third copy of the main database and slice servers]
Adding New slices SQL Server range-partitioning capabilities make it easy • Recalculate partitioning limits • Transfer data to new slices • Remove data from old slices • Define and apply a new partition scheme • Add new partitions to the main server • Apply the new partition scheme to the main server
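For the main server, the last two steps map onto standard range-partitioning DDL. A sketch using the hypothetical pfObjID / psObjID names from the earlier partitioning sketch; the boundary values and file group are illustrative.

-- Tell the partition scheme which file group the next new partition will use,
-- then split an existing ObjectID range at the new boundary.
ALTER PARTITION SCHEME psObjID NEXT USED fg5;
ALTER PARTITION FUNCTION pfObjID() SPLIT RANGE (25000000000000000);

-- A range whose data has moved to another slice can later be merged away:
-- ALTER PARTITION FUNCTION pfObjID() MERGE RANGE (20000000000000000);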
Detail Design Outline • General Concepts • Distributed Database architecture • Ingest Workflow • Prototype
ODM Ingest Performance 5.3.1.6 The PS1 ODM shall be able to ingest the data from the IPP at two times the nominal daily arrival rate* * The nominal daily data rate from the IPP is defined as the total data volume to be ingested annually by the ODM divided by 365. • Nominal daily data rate: • 1.5 × 10^11 / 3.5 / 365 = 1.2 × 10^8 P2 detections / day • 8.3 × 10^10 / 3.5 / 365 = 6.5 × 10^7 stack detections / day
Number of Objects * “SDSS” includes a mirror of 11.3 < δ < 30 objects to δ < 0 • Total CSV data loaded: 300 GB • CSV bulk insert load: 8 MB/s • Binary bulk insert: 18-20 MB/s • Creation started: October 15th 2007 • Finished: October 29th 2007 (??) • Includes • 10 epochs of P2PsfFits detections • 1 epoch of Stack detections
Size of Prototype Database Table sizes are in billions of rows
Size of Prototype Database Table sizes are in GB; 9.6 TB of data in the distributed database
Ingest and Association Times [Table of estimated times not reproduced; columns labeled Educated Guess and Wild Guess]