This presentation outlines the computing facilities and capabilities available to the US Planck analysis effort, from the Computational Research Division at Berkeley Lab. It covers the data volumes generated by the Planck LFI and HFI instruments and by LevelS simulations, the processing capacity of NERSC systems such as Seaborg, Jacquard and Bassi and of the planned USPDC cluster, and the associated storage, security, transfer and data-format arrangements, together with an update on Project Columbia and the M3 data abstraction layer.
Computing Facilities & Capabilities
Julian Borrill
Computational Research Division, Berkeley Lab & Space Sciences Laboratory, UC Berkeley
Computing Issues
• Data Volume
• Data Processing
• Data Storage
• Data Security
• Data Transfer
• Data Format/Layout
It's all about the data.
Data Volume
• Planck data volume drives (almost) everything
• LFI :
  • 22 detectors with 32.5, 45 & 76.8 Hz sampling
  • 4 x 10^10 samples per year
  • 0.2 TB time-ordered data + 1.0 TB full detector pointing data
• HFI :
  • 52 detectors with 200 Hz sampling
  • 3 x 10^11 samples per year
  • 1.3 TB time-ordered data + 0.2 TB full boresight pointing data
• LevelS (e.g. CTP “Trieste” simulations) :
  • 4 LFI detectors with 32.5 Hz sampling
  • 4 x 10^9 samples per year
  • 2 scans x 2 beams x 2 samplings x 7 components + 2 noises
  • 1.0 TB time-ordered data + 0.2 TB full detector pointing data
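These numbers can be sanity-checked with a short calculation. The sketch below is illustrative only: the per-frequency split of the 22 LFI detectors (4 at 32.5 Hz, 6 at 45 Hz, 12 at 76.8 Hz) and the 4 bytes per sample are assumptions, not figures taken from the slide.

```python
# Back-of-the-envelope check of the Planck sample counts and time-ordered
# data (TOD) volumes quoted above. The LFI detector split per sampling rate
# and the 4 bytes/sample figure are assumptions for illustration.

SECONDS_PER_YEAR = 3.156e7
BYTES_PER_SAMPLE = 4          # single-precision float (assumed)

instruments = {
    "LFI": [(4, 32.5), (6, 45.0), (12, 76.8)],   # (detectors, sampling rate in Hz)
    "HFI": [(52, 200.0)],
}

for name, channels in instruments.items():
    samples = sum(n * f * SECONDS_PER_YEAR for n, f in channels)
    tod_tb = samples * BYTES_PER_SAMPLE / 1e12
    print(f"{name}: {samples:.1e} samples/yr, ~{tod_tb:.1f} TB TOD/yr")

# Roughly reproduces the slide's numbers:
#   LFI: ~4e+10 samples/yr, ~0.2 TB TOD/yr
#   HFI: ~3e+11 samples/yr, ~1.3 TB TOD/yr
```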
Data Processing
• Operation count scales linearly (& inefficiently) with
  • # analyses, # realizations, # iterations, # samples
  • 100 x 100 x 100 x 100 x 10^11 ~ O(10) Eflop (cf. '05 Day in the Life)
• NERSC
  • Seaborg : 6080 CPU, 9 Tf/s
  • Jacquard : 712 CPU, 3 Tf/s (cf. Magique-II)
  • Bassi : 888 CPU, 7 Tf/s
  • NERSC-5 : O(100) Tf/s, first-byte in 2007
  • NERSC-6 : O(500) Tf/s, first-byte in 2010
  • Expect allocation of O(2 x 10^6) CPU-hours/year => O(4) Eflop/yr (10 GHz CPUs @ 5% efficiency)
• USPDC cluster
  • Specification & location TBD, first-byte in 2007/8
  • O(100) CPU x 80% x 9000 hours/year => O(0.4) Eflop/yr (5 GHz CPUs @ 3% efficiency)
• IPAC small cluster dedicated to ERCSC
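The flop budgeting above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes 1 flop per clock cycle at the quoted clock rate and efficiency, which is my reading of the "10 GHz @ 5%" and "5 GHz @ 3%" shorthand, not something stated on the slide.

```python
# Rough reproduction of the operation-count and allocation arithmetic above,
# under the assumption of 1 flop per cycle at the quoted clock and efficiency.

EXA = 1e18

# Required operations: analyses x realizations x iterations x samples x ops/sample
required = 100 * 100 * 100 * 100 * 1e11           # ~1e19 flops
print(f"required : ~{required / EXA:.0f} Eflop")   # ~10 Eflop

def eflop_per_year(cpu_hours, clock_hz, efficiency):
    """Sustained flops delivered by an allocation (1 flop/cycle assumed)."""
    return cpu_hours * 3600 * clock_hz * efficiency / EXA

print(f"NERSC : ~{eflop_per_year(2e6, 10e9, 0.05):.1f} Eflop/yr")               # ~4
print(f"USPDC : ~{eflop_per_year(100 * 0.80 * 9000, 5e9, 0.03):.1f} Eflop/yr")  # ~0.4
```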
Processing
• NERSC Seaborg : 9 Tf/s
• NERSC Jacquard : 3 Tf/s
• NERSC Bassi : 7 Tf/s
• ERCSC Cluster : 0.1 Tf/s
• USPDC Cluster : 0.5 Tf/s
• NERSC-5 (2007) : 100 Tf/s
• NERSC-6 (2010) : 500 Tf/s
Data Storage
• Archive at IPAC
  • mission data
  • O(10) TB
• Long-term at NERSC using HPSS
  • mission + simulation data & derivatives
  • O(2) PB
• Spinning disk at USPDC cluster & at NERSC using NGF
  • current active data subset
  • O(2 - 20) TB
• Processor memory at USPDC cluster & at NERSC
  • running job(s)
  • O(1 - 10+) GB/CPU & O(0.1 - 10) TB total
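As an illustration of this hierarchy, the sketch below encodes the four tiers with the nominal capacities from the slide (lower bounds where a range is given) and picks the smallest tier that can hold a dataset of a given size; the tier-selection helper is purely illustrative, not part of any Planck tooling.

```python
# Illustrative model of the Planck storage hierarchy described above.
# Capacities are the slide's nominal figures (lower bounds of quoted ranges).

TIERS = [
    ("processor memory (NERSC / USPDC)", 0.1e12),   # ~0.1 TB aggregate
    ("spinning disk (NGF / USPDC)",      2e12),     # ~2 TB active subset
    ("IPAC archive",                     10e12),    # ~10 TB mission data
    ("NERSC HPSS",                       2e15),     # ~2 PB mission + simulations
]

def smallest_tier(nbytes):
    """Return the first (smallest) tier that can hold nbytes, or None."""
    for name, capacity in TIERS:
        if nbytes <= capacity:
            return name
    return None

print(smallest_tier(1.3e12))   # one year of HFI TOD -> spinning disk (NGF / USPDC)
print(smallest_tier(50e12))    # a multi-realization simulation set -> NERSC HPSS
```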
Processing + Storage
• NERSC Seaborg : 9 Tf/s, 6 TB
• NERSC Jacquard : 3 Tf/s, 2 TB
• NERSC Bassi : 7 Tf/s, 4 TB
• ERCSC Cluster : 0.1 Tf/s, 50 GB
• USPDC Cluster : 0.5 Tf/s, 200 GB memory + 2 TB disk
• NERSC-5 (2007) : 100 Tf/s, 50 TB
• NERSC-6 (2010) : 500 Tf/s, 250 TB
• NERSC NGF : 20/200 TB
• NERSC HPSS : 2/20 PB
• IPAC Archive : 10 TB
Data Security
• UNIX filegroups
  • special account : user planck
  • permissions r__ / ___ / ___ (owner read-only)
• Personal keyfob to access planck account
  • real-time grid-certification of individuals
  • keyfobs issued & managed by IPAC
  • single system for IPAC, NERSC & USPDC cluster
• Allows securing of selected data
  • e.g. mission vs simulation
• Differentiates access to facilities and to data
  • standard personal account & special planck account
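The sketch below is a generic illustration (not the actual Planck procedure) of enforcing owner-read-only permissions of the kind described above on files owned by the special planck account; the directory path is hypothetical.

```python
# Illustrative sketch of applying and verifying owner-read-only permissions
# (r--/---/---) on Planck data files. The path below is hypothetical, and the
# real Planck arrangements may differ from this example.

import os
import stat

PLANCK_DATA = "/path/to/planck/data"   # hypothetical location

for root, _dirs, files in os.walk(PLANCK_DATA):
    for name in files:
        path = os.path.join(root, name)
        os.chmod(path, stat.S_IRUSR)   # only the owning (planck) account can read
        mode = stat.S_IMODE(os.stat(path).st_mode)
        assert mode == stat.S_IRUSR, f"unexpected permissions on {path}: {oct(mode)}"
```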
Processing + Storage + Security
• The same processing and storage systems as above, now all behind the Planck keyfob requirement.
Data Transfer
• From DPCs to IPAC
  • transatlantic tests being planned
• From IPAC to NERSC
  • 10 Gb/s over Pacific Wave, CENIC + ESNet
  • tests planned this summer
• From NGF to/from HPSS
  • 1 Gb/s being upgraded to 10+ Gb/s
• From NGF to memory (most real-time critical)
  • within NERSC
    • 8-64 Gb/s depending on system (& support for this)
  • offsite depends on location
    • 10 Gb/s to LBL over dedicated data link on Bay Area MAN
    • fallback exists : stage data on local scratch space
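To put these link speeds in context, the sketch below estimates how long a ~1.3 TB dataset (one year of HFI time-ordered data, from the Data Volume slide) would take to move over each quoted link; assuming the full link rate is achieved, with no protocol or filesystem overhead, is my simplification.

```python
# Rough transfer-time estimates for a ~1.3 TB dataset over the links quoted
# above, assuming 100% link utilization (no protocol or filesystem overhead).

DATASET_BYTES = 1.3e12   # one year of HFI TOD

links_gbps = {
    "NGF <-> HPSS (current)":       1,
    "IPAC -> NERSC (Pacific Wave)": 10,
    "NERSC -> LBL (Bay Area MAN)":  10,
    "NGF -> memory (within NERSC)": 64,
}

for name, gbps in links_gbps.items():
    seconds = DATASET_BYTES * 8 / (gbps * 1e9)
    print(f"{name:30s}: {seconds / 60:6.1f} minutes")

# e.g. ~173 min at 1 Gb/s, ~17 min at 10 Gb/s, ~2.7 min at 64 Gb/s
```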
Processing + Storage + Security + Networks
• The systems above connected by network links: 8-64 Gb/s within NERSC, 10 Gb/s wide-area links to the DPCs and IPAC, and several link speeds still to be determined (?). The Planck keyfob remains required throughout.
Project Columbia Update
• Last year we advertised our proposed use of NASA's new Project Columbia (5 x 2048 CPU, 5 x 12 Tf/s), potentially including a WAN-NGF.
• We were successful in pushing for Ames' connection to the Bay Area MAN, providing a 10 Gb/s dedicated data connection.
• We were unsuccessful in making much use of Columbia:
  • disk read performance varies from poor to atrocious, effectively disabling data analysis (although simulation is possible)
  • foreign nationals are not welcome, even if they have passed JPL security screening!
• We have provided feedback to Ames and HQ, but for now we are not pursuing this resource.
Data Formats
• Once data are on disk they must be read by codes that do not know (or want to know) their format/layout:
  • to analyze LFI, HFI, LevelS, WMAP, etc. data sets, both individually and collectively
  • to be able to operate on data while they are being read, e.g. weighted co-addition of simulation components
• M3 provides a data abstraction layer to make this possible
• Investment in M3 has paid huge dividends this year:
  • rapid (10 min) ingestion of new data formats, such as PIOLIB evolution and WMAP
  • rapid (1 month) development of an interface to any compressed pointing, allowing on-the-fly interpolation & translation
  • immediate inheritance of improvements (new capabilities & optimization/tuning) by the growing number of M3-based codes
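M3 itself is not reproduced here; the sketch below is a generic illustration of the kind of data abstraction layer described: format-specific readers behind a common interface, with an optional on-the-fly operation (here, weighted co-addition) applied as the data stream in. All class and function names are hypothetical stand-ins, not M3's actual API.

```python
# Generic illustration of a data abstraction layer in the spirit of M3.
# None of these names come from M3; they only show the idea of hiding
# format/layout from analysis codes and operating on data while reading.

from abc import ABC, abstractmethod
from typing import Iterator, Sequence
import numpy as np

class TODReader(ABC):
    """Common interface: analysis codes see chunks of samples, never file formats."""
    @abstractmethod
    def chunks(self, detector: str) -> Iterator[np.ndarray]: ...

class NpyReader(TODReader):
    """One concrete backend (a stand-in for e.g. a PIOLIB- or WMAP-format reader)."""
    def __init__(self, paths_by_detector: dict):
        self.paths = paths_by_detector
    def chunks(self, detector: str) -> Iterator[np.ndarray]:
        for path in self.paths[detector]:
            yield np.load(path)

def coadded_chunks(readers: Sequence[TODReader], weights: Sequence[float],
                   detector: str) -> Iterator[np.ndarray]:
    """Weighted co-addition of simulation components, applied on the fly while reading."""
    for parts in zip(*(r.chunks(detector) for r in readers)):
        yield sum(w * p for w, p in zip(weights, parts))

# Usage sketch: co-add CMB + noise streams for one detector without the caller
# knowing how either component is stored on disk (file names are hypothetical).
# cmb   = NpyReader({"LFI27M": ["cmb_000.npy", "cmb_001.npy"]})
# noise = NpyReader({"LFI27M": ["noise_000.npy", "noise_001.npy"]})
# for chunk in coadded_chunks([cmb, noise], [1.0, 1.0], "LFI27M"):
#     process(chunk)
```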