
Grid Technologies Applied to Earth Sciences (Tecnologías Grid aplicadas a Ciencias de la Tierra), Manuel López-Puertas (IAA, Granada)

Presentation Transcript


  1. Grid Technologies Applied to Earth Sciences. Manuel López-Puertas (IAA, Granada). 1st e-CA Meeting, Granada, 19-20 June 2007

  2. EGEE
     • EU grid project, follow-on to DataGrid (2004-2008)
     • 202 sites, 47 countries worldwide
     • 30,000 CPUs, 11 Petabytes, 30,000 concurrent jobs
     • Originally from High Energy Physics and Life Sciences => expanded to many other areas (Geosciences)
     • Ideal for scientific research where the time and resources required are impractical for traditional IT
     • 25 linked grid projects
     More details on EGEE in the posters

  3. EGEE Earth Science grid projects
     • CYCLOPS aims to bridge the gap between the Grid and GMES (Global Monitoring for Environment and Security) communities
     • EU-IndiaGrid, funded by the EC: a European and Indian Grid-focused project
     • DEISA (200 teraflops of supercomputing infrastructure). Founded by Europe's National Research and Education Networks (NRENs) and the European Commission
     • DEGREE (Dissemination and Exploitation of GRids in Earth sciencE)
     • G-POD (Grid Processing on Demand) (ESRIN, ESA)
     • Access to the ESA multi-mission catalog

  4. WHY EGEE for GEOSCIENCES?
     • Grid is very well suited to Geosciences applications:
        • For statistical approaches: intensive computation/storage
        • For rapid solutions when there are many independent jobs
        • For sharing and/or processing large datasets in large-scale European projects

  5. DEGREE: Dissemination and Exploitation of GRids in Earth sciencE
     • EC Specific Support Action project (10 Institutes)
     • OBJECTIVES:
        • Bridge the Earth Science and GRID communities throughout Europe
        • Ensure that Earth Science requirements are satisfied in the next Grid generation
        • Ensure the integration of emerging technologies for managing Earth Science knowledge
        • Demonstrate the interest of Grid for Geosciences with scientific results already obtained

  6. Earth Sciences Applications and Requirements (DEGREE Report) (1/2)
     • Deal with enormous amounts of data (size and number of files) (Envisat: 500 Gb daily)
     • Large computational needs
     • Differences from other science domains:
        • Deals with geospatial 4D data
        • Many different domains
        • Scattered among all countries and numerous institutes
        • Complex work

  7. DEGREE Report on Earth Sciences Applications and Requirements (2/2)
     Specific requirements:
     • Reliability (good, well-established Quality of Service)
     • Real-time and instantaneous access
     • Need to access licensed software (IDL, Matlab, Geocluster, …)
     • Data policies on input/output data (complicated security requirements)
     • Data scattered around various institutes, in various formats, with metadata in various forms
        => Data management (accessibility and harmonization) is essential
        => Need for standardization of Grid services
     • Earth e-science can be an essential improvement in research

  8. Earth Science domains

  9. EXAMPLES (1): GOME Ozone profiles; MERIS Global mosaic
     • Grid properties/requirements:
        • Large number of files (~40000; ~40000 per algorithm)
        • Metadata
        • Complex algorithm
        • License for IDL
     • Grid properties/requirements:
        • Large dataset (size + number of files)
        • High security access
        • Dataflow
        • Monitoring
        • License for: Globus GT3/4 & gLite and LCG under testing

  10. EXAMPLES (2): GRIMI-2 (MIPAS); Ozone in polar regions
     • Grid properties/requirements:
        • Currently being ported to EGEE
        • High security and restricted access for data
        • Licensed software
     • Grid properties/requirements:
        • A full reprocessing needs 4 TB of input data and 1 TB of output
        • Full processing time on e.g. 100 nodes: 12 days!
        • 2 Grid nodes => an NRT service
        • Globus GT3/4 & gLite and LCG under testing

  11. Other examples
     • SEISSOL, CMT, SpecFem3D (research into earthquake simulations)
     • KORBA aquifer
     • COMSIMM (looking at current and future climate trends)
     • ICAROS (Chemical Assimilation of Remote Sensing Observations of the Stratosphere)
     • Space weather (SPIDR)

  12. ES GRID, e-collaborations and SOA portals
     • Surveyed: 30 portals
     • Analyzed: 17 (GRID: 8; Data dissemination: 4; Collaborative: 5)

  13. VOs in Earth Sciences (EGEE)
     • For Geosciences there are two VOs:
        • ESR (Earth Science Research): about 50 members on average, from 10 countries, belonging to academia
        • EGEODE (Expanding GEOscience On DEmand): 30 members (~15 from academia), centred on the use of the Geocluster software developed by CGG-Veritas (France). Devoted to: generic seismic platform

  14. Summary
     • Deal with enormous amounts of data
     • Large computational needs
        => Grid is very well suited to Geosciences applications
     • However, the Earth Science community is “very reluctant to deploy their applications”
     • Not many applications ported. Why?
        • Differences from other science fields (geospatial 4D data, multidisciplinary, scattered, complex work)?
        • Some alternatives (GPOD-ESRIN) seem driven by “cost-effectiveness” more than by e-science
        • Others…
     • NEEDS: Data policies (security requirements, standardization)
     • Earth e-science can be an essential improvement in research

  15. Our Needs: Retrieval of MIPAS spectra

  16. Our (GAPT-IAA) Needs
     • Earth Observation satellite data (MIPAS/Envisat)
     • Scientifically accurate retrievals (not operational)
     • Data volume: 100 scans/orbit × 14 orbits/day = 1400 scans/day. ~4 years => 2 M scans
     • 20 species from each scan
     • Pre-processing (on home computers)
     • Full calculation:
        • Place a fixed set of large input files (HITRAN database) + ~20 Mb (input/output) per scan (job)
        • A facility to send arbitrarily structured jobs to the batch computer (e.g., LSF, SLURM, etc.)
        • Large RAM (1-3 Gb)
        • Compiler restrictions: Sun compiler to Linux Opteron: OK; PPCs, IBM compilers, Intel: problems
        • CPU: 10 min (LTE) to 5 hours (NLTE) per species and scan (Dec HP Fortran V5.5)
        • 1 hour × 20 species × 2 M scans ≈ 4500 years of CPU time!
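
The workload estimate on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are hypothetical, a year is taken as 365 days, and the wall-clock figure assumes perfectly independent jobs with no queueing or data-transfer overhead, which the slides do not claim.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours; leap days ignored (assumption)


def total_scans(scans_per_orbit=100, orbits_per_day=14, years=4):
    """Number of MIPAS scans accumulated over the mission period."""
    return scans_per_orbit * orbits_per_day * 365 * years


def cpu_years(n_scans, n_species=20, hours_per_job=1.0):
    """Total CPU time in years, with one job per species and scan."""
    return n_scans * n_species * hours_per_job / HOURS_PER_YEAR


def wall_clock_days(total_cpu_years, n_cpus):
    """Idealised elapsed time in days if jobs run on n_cpus in parallel."""
    return total_cpu_years * 365 / n_cpus


if __name__ == "__main__":
    print(total_scans())                    # 2044000, i.e. about 2 M scans
    print(round(cpu_years(2_000_000)))      # 4566, i.e. roughly 4500 years
    # Bounds from the quoted 10 min (LTE) and 5 hours (NLTE) per job
    print(round(cpu_years(2_000_000, hours_per_job=10 / 60)))  # 761
    print(round(cpu_years(2_000_000, hours_per_job=5)))        # 22831
    # Hypothetical pool of 1000 CPUs, chosen only for illustration
    print(round(wall_clock_days(cpu_years(2_000_000), 1000)))  # 1667
```

With the slide's 1 hour average per job, the result is about 4566 CPU-years, consistent with the quoted "~4500 years". Because each scan and species is an independent job, the elapsed time scales inversely with the number of CPUs available, which is the case for the Grid made earlier in the talk.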
