GRID Applications Tuğba Taşkaya Temizel 20 February 2006
Problems Where Grids Have Been Successful • Megacomputing problems: The problem is divided into independent parts that can run in parallel. • Mega and seamless access problems: Integrated access to, and use of, multiple data sources and resources. • Loosely coupled nets: Functionally decomposed sequential problems.
Grid Applications • Community-centric: Bringing organisations together for collaboration. • Data-centric: Integration of multiple data resources. • Compute-centric: Certain coupled applications, and seamless access to multiple back-end hosts. • Interaction-centric: Problems that require real-time responses.
Application Fields • Astronomy • Bioinformatics • Environmental Science • Particle physics • Medicine and Health • Social Sciences • Combinatorial Chemistry • ….
ASTRONOMY Virtual Observatory • Total budget: US$20 million • Duration: 2002-2005 • Type: International • URL: http://www.ivoa.net
ASTRONOMY Virtual Observatory • Objective: To facilitate the international coordination and collaboration needed to develop and deploy the tools, systems and organisational structures that allow astronomical archives worldwide to be used as an integrated, interoperating virtual observatory.
ASTRONOMY Virtual Observatory • Data creators: create the data and store it in an archive; describe the process of data creation in standard modelling terms; describe data products according to IVOA standards; implement an automated publication and registration mechanism. • Data providers: enable web access to archives; choose the data products to be published; register data products with the IVOA; support discovery/query services on data products; support federation. • Service providers: implement data discovery/query/analysis/creation services; enable web access to the results of these services.
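A minimal sketch of how the provider and service roles above fit together, assuming an in-memory registry: providers register data products with descriptive metadata, and a discovery service answers simple attribute-match queries. The `Registry` and `DataProduct` classes, their fields and the `ivo://example/...` identifiers are all hypothetical; this is not the IVOA registry interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A published data product plus the metadata its provider describes it with (hypothetical)."""
    identifier: str
    provider: str
    waveband: str
    data_format: str


@dataclass
class Registry:
    """Hypothetical in-memory registry shared by data providers and service providers."""
    products: dict = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # Data providers register each product once so it becomes discoverable.
        if product.identifier in self.products:
            raise ValueError(f"already registered: {product.identifier}")
        self.products[product.identifier] = product

    def discover(self, **criteria: str) -> list:
        # Discovery service: return every product whose metadata matches all criteria.
        return [
            product
            for product in self.products.values()
            if all(getattr(product, key) == value for key, value in criteria.items())
        ]


registry = Registry()
registry.register(DataProduct("ivo://example/survey-a", "Archive A", "optical", "FITS"))
registry.register(DataProduct("ivo://example/survey-b", "Archive B", "radio", "HDF5"))
print([p.identifier for p in registry.discover(waveband="optical")])
```

Recording `data_format` per product is one simple way for a provider to advertise its format, which the format-translation mechanisms listed among the Virtual Observatory problems depend on.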
ASTRONOMY Virtual Observatory • Problems: • One common data format structure: Translation mechanisms exist, and each data provider should advertise its data format. The HDF5 format has recently been proposed to overcome this difficulty. • Query services: Basic queries (querying for a specific data product) are provided, but more complex queries are needed for theoretical results. • Simulators: Algorithms that create new data from previously published data resources. • Modelling/Describing Simulations: Correct classification of simulations (in terms of subject, type, implementation choice and data product).
ASTRONOMY Virtual Sky • Partners: Caltech Center for Advanced Computing Research, Johns Hopkins University, the Sloan Digital Sky Survey, Microsoft Research • Ported to TeraGrid • URL: http://virtualsky.org
ASTRONOMY Virtual Sky • Provides seamless, federated images of the night sky: not just an album of popular places, but the entire sky at multiple resolutions and multiple wavelengths. • Federates many different image sources into a unified interface. • The architecture is based on a hierarchy of precomputed image tiles (a mosaic), so responses are fast.
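The slides do not specify Virtual Sky's tiling scheme, so the following is a hypothetical illustration of a precomputed tile hierarchy: the sky is laid out on a plate carrée (equirectangular) grid, each level doubles the resolution of the one above, and a viewer only has to compute a tile index to fetch the right precomputed image. The function names and the 2:1 grid layout are assumptions.

```python
def tile_for(ra_deg: float, dec_deg: float, level: int) -> tuple:
    """Return (level, column, row) of the precomputed tile containing a sky position.

    Hypothetical scheme: plate carree projection with 2**(level + 1) columns by
    2**level rows, so every level doubles the resolution of the one above it.
    """
    if not (0.0 <= ra_deg < 360.0 and -90.0 <= dec_deg <= 90.0):
        raise ValueError("position out of range")
    cols, rows = 2 ** (level + 1), 2 ** level
    col = int(ra_deg / 360.0 * cols)
    # Row 0 is at the north pole; clamp so dec = -90 falls in the last row.
    row = min(int((90.0 - dec_deg) / 180.0 * rows), rows - 1)
    return level, col, row


def parent(tile: tuple) -> tuple:
    """Each tile is covered by exactly one tile one level up (half the resolution)."""
    level, col, row = tile
    if level == 0:
        raise ValueError("level 0 has no parent")
    return level - 1, col // 2, row // 2


print(tile_for(10.0, 45.0, 2), parent(tile_for(10.0, 45.0, 2)))
```

Because a tile's parent is found by integer halving, zooming out never requires recomputation: the coarser tile is already stored, which is what keeps responses fast.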
ASTRONOMY Virtual Sky • Problem: Resampling the raw images demands high computational power. For each pixel of the image, several projections from pixel to sky, and the same number of inverse projections, are required. • Problem: Federating heterogeneous image sources causes a loss of information.
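To show where the per-pixel projection cost comes from, here is a sketch of nearest-neighbour resampling by inverse mapping: each output pixel is projected to the sky and then inverse-projected into the input image. The `LinearProjection` class is a toy stand-in for a real sky projection (an assumption), and the sketch projects only each pixel's centre; a flux-conserving resampler would typically project several points per pixel, such as its corners, multiplying the cost.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearProjection:
    """Toy pixel <-> sky mapping: sky = origin + scale * pixel on each axis (hypothetical)."""
    origin_x: float
    origin_y: float
    scale: float  # degrees per pixel

    def pixel_to_sky(self, px: float, py: float) -> tuple:
        return self.origin_x + self.scale * px, self.origin_y + self.scale * py

    def sky_to_pixel(self, sx: float, sy: float) -> tuple:
        return (sx - self.origin_x) / self.scale, (sy - self.origin_y) / self.scale


def resample(image, src: LinearProjection, dst: LinearProjection, width: int, height: int):
    """Nearest-neighbour resampling by inverse mapping.

    For every output pixel: one forward projection (output pixel -> sky) and one
    inverse projection (sky -> input pixel). Returns the new image and the number
    of projection calls, which grows linearly with the output pixel count.
    """
    out = [[0.0] * width for _ in range(height)]
    calls = 0
    for y in range(height):
        for x in range(width):
            sx, sy = dst.pixel_to_sky(x, y)
            ix, iy = src.sky_to_pixel(sx, sy)
            calls += 2
            # Round half up to the nearest input pixel; leave 0.0 outside the input.
            col, row = math.floor(ix + 0.5), math.floor(iy + 0.5)
            if 0 <= row < len(image) and 0 <= col < len(image[0]):
                out[y][x] = image[row][col]
    return out, calls


image = [[1, 2], [3, 4]]
out, calls = resample(image, LinearProjection(0.0, 0.0, 1.0), LinearProjection(0.0, 0.0, 0.5), 4, 4)
print(calls, out[0])
```

Doubling the output resolution here quadruples the projection calls, which is why resampling large surveys calls for grid-scale computing.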
ASTRONOMY MONTAGE • Partners: California Institute of Technology, NASA, Caltech University • Duration: 2002-2005 • URL: http://montage.ipac.caltech.edu/ • Ported to TeraGrid
ASTRONOMY MONTAGE • A comprehensive mosaicking system that allows a broad choice of resampling and photometric algorithms. • Offers simultaneous, parallel processing of multiple images to enable fast, deep, robust source detection in multi-wavelength image space.
ASTRONOMY MONTAGE • Data are fetched from the most convenient place. • Computing is done on any available platform. • Replica management: intermediate products are cached for reuse. • Virtual data: the user specifies the desired data using domain-specific attributes, not by specifying how to derive the data.
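The replica-management and virtual-data ideas can be sketched together, assuming a catalog keyed by domain attributes: the user asks for a product by what it is (band, region), and the catalog either returns a cached replica or runs a derivation recipe once and caches the result. `VirtualDataCatalog` and `make_mosaic` are hypothetical names; `make_mosaic` merely stands in for Montage's real processing.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[Tuple[str, str], ...]


class VirtualDataCatalog:
    """Hypothetical virtual-data layer: request by attributes, derive at most once, cache."""

    def __init__(self, derive: Callable[[dict], bytes]):
        self._derive = derive              # recipe that knows *how* to build a product
        self._replicas: Dict[Key, bytes] = {}
        self.derivations = 0               # how many times something was actually computed

    @staticmethod
    def _key(attributes: dict) -> Key:
        # Attribute order must not matter, so normalise to a sorted tuple.
        return tuple(sorted((k, str(v)) for k, v in attributes.items()))

    def request(self, **attributes) -> bytes:
        key = self._key(attributes)
        if key not in self._replicas:
            self._replicas[key] = self._derive(attributes)
            self.derivations += 1
        return self._replicas[key]


def make_mosaic(attrs: dict) -> bytes:
    # Stand-in for fetching input images and running a mosaicking pipeline.
    return f"mosaic band={attrs['band']} region={attrs['region']}".encode()


catalog = VirtualDataCatalog(make_mosaic)
first = catalog.request(band="J", region="M31")
again = catalog.request(region="M31", band="J")  # same product, attributes in another order
print(first, catalog.derivations)
```

The user never names a recipe or a file location; the catalog decides whether derivation is needed, which is the separation of "what" from "how" that virtual data describes.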
ASTRONOMY QUEST • Partners: Yale University, Indiana University, Centro de Investigaciones de Astronomía, Universidad de Los Andes • URL: http://hepwww.physics.yale.edu/www_info/astro/quest.html
ASTRONOMY QUEST Objectives: • Transient gravitational lensing: This will lead to a better understanding of the nature of the non-luminous mass of the Galaxy. • Quasar gravitational lensing: At much larger scales than our Galaxy, the QUEST team hopes to detect strong lensing of very remote objects such as quasars. • Supernovae: The QUEST system will be able to detect large numbers of very distant supernovae, leading to prompt follow-up observations and a better understanding of supernova classification, as well as of their role as standard candles for understanding the early Universe. • Gamma-ray burst (GRB) afterglows: QUEST will search for these fading sources and try to correlate them with known GRBs.
ASTRONOMY QUEST Architecture:
COMBINATORIAL CHEMISTRY COMB-E-CHEM • Partners: Southampton (Chemistry Department, Mathematics, ECS) and Bristol (Chemistry), backed by Pfizer, Roche and IBM • £2.2M project • Started in 2001 • National e-Science pilot project • URL: http://www.combechem.org
COMBINATORIAL CHEMISTRY COMB-E-CHEM • Objective: Develop new ways of collaborative working over the Grid to handle the hugely increasing flow of information on molecular and crystal structures arising from the application of combinatorial chemistry. • Facilitate understanding of how molecular structure influences crystal and material properties.
HIGH ENERGY PHYSICS Goals • Find the mechanism responsible for mass in the universe and the “Higgs” particles associated with mass generation, as well as the fundamental mechanism that led to the predominance of matter over antimatter in the observable cosmos.
HIGH ENERGY PHYSICS Challenges • Providing rapid access to data subsets drawn from massive data stores, rising from petabytes in 2002 to ~100 petabytes by 2007, and to exabytes (10¹⁸ bytes) by approximately 2012 to 2015. • Providing secure, efficient, and transparent managed access to heterogeneous, worldwide-distributed computing and data-handling resources, across an ensemble of networks of varying capability and reliability.
HIGH ENERGY PHYSICS Challenges • Tracking the state and usage patterns of computing and data resources, to make both rapid turnaround and efficient utilisation of global resources possible • Providing the collaborative infrastructure that will make it possible for physicists to contribute effectively • Building regional, national, continental, and transoceanic networks, with bandwidths rising from the gigabit-per-second to the terabit-per-second range over the next decade.
HIGH ENERGY PHYSICS Grid projects • PPDG (Particle Physics Data Grid) • GriPhyN (Grid Physics Network) • iVDGL (International Virtual Data Grid Laboratory) • DataGrid • LCG (Large Hadron Collider Computing Grid) • CrossGrid
HIGH ENERGY PHYSICS PPDG (Particle Physics Data Grid) • Formed in 1999 • Objective: To address the need for Data Grid services that enable the worldwide-distributed computing model of current and future high-energy and nuclear physics experiments. • URL: www.ppdg.net
HIGH ENERGY PHYSICS GriPhyN (Grid Physics Network) • Objective: To create petabyte-scale Virtual Data Grids that meet the data-intensive computational needs of a diverse community of thousands of scientists spread across the globe. • URL: http://www.griphyn.org
HIGH ENERGY PHYSICS iVDGL (International Virtual Data Grid Laboratory) • iVDGL is tasked with establishing and using an international Virtual Data Grid Laboratory of unprecedented scale and scope, comprising heterogeneous computing and storage resources in the U.S., Europe and ultimately other regions, linked by high-speed networks and operated as a single system for interdisciplinary experimentation in grid-enabled, data-intensive scientific computing. • URL: http://www.ivdgl.org/
HIGH ENERGY PHYSICS iVDGL Goals • Deploy a Grid laboratory • Support the research mission of data-intensive experiments • Provide computing and personnel resources at university sites • Provide a platform for computer science technology development • Prototype and deploy a Grid Operations Center (iGOC) • Integrate Grid software tools into the computing infrastructures of the experiments • Support delivery of Grid technologies: hardening of the Virtual Data Toolkit (VDT) and other middleware technologies developed by GriPhyN and other Grid projects • Education and Outreach: lead and collaborate with education and outreach efforts, and provide tools and mechanisms for underrepresented groups and remote regions to participate in international science projects
HIGH ENERGY PHYSICS iVDGL Sites (February 2004) [map; legend: Tier1, Tier2, Other] • Sites: SKC, Boston U, UW Milwaukee, Michigan, PSU, UW Madison, BNL, Fermilab, LBL, Argonne, Iowa, Chicago, J. Hopkins, Indiana, Hampton, Caltech, ISI, Vanderbilt, UCSD, UF, Austin, FIU, Brownsville • Partners: EU, Brazil, Korea
HIGH ENERGY PHYSICS DataGrid • DataGrid is a project funded by the European Union. • The objective is to build the next-generation computing infrastructure for intensive computation and analysis of shared large-scale databases, from hundreds of terabytes to petabytes, across widely distributed scientific communities. • URL: eu-datagrid.web.cern.ch • Duration: 2001-2003
HIGH ENERGY PHYSICS LCG (Large Hadron Collider Computing Grid) • The aim is to prepare the computing infrastructure for the simulation, processing, and analysis of LHC data for all four LHC collaborations. • URL: http://lcgrid.web.cern.ch
HIGH ENERGY PHYSICS Global LHC Data Grid Hierarchy (CMS Experiment) • Online System → CERN Computer Center (Tier 0): 0.1-1.5 GBytes/s • Tier 0 → Tier 1 (Korea, Russia, UK, USA): 10-40 Gb/s • Tier 1 → Tier 2 Centers: 2.5-10 Gb/s • Tier 2 → Tier 3 (Institutes, physics caches): 1-2.5 Gb/s • Tier 3 → Tier 4 (PCs): 1-10 Gb/s • Data volume: ~10s of Petabytes/yr by 2007-8; ~1000 Petabytes in < 10 yrs?
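The link rates in this hierarchy translate into concrete transfer times. The calculation below reads each rate as the bandwidth of the link between adjacent tiers (an interpretation of the diagram) and assumes an idealised, dedicated link with no protocol overhead or sharing. The 1 PB payload and the year of continuous running are illustrative assumptions, not figures from the slide; real experiments take data for only part of the year, so the yearly figure is an upper bound at that rate.

```python
# Link bandwidths between adjacent tiers, in Gb/s, as (low, high) ranges read off the diagram.
TIER_LINKS = {
    ("Tier 0", "Tier 1"): (10.0, 40.0),
    ("Tier 1", "Tier 2"): (2.5, 10.0),
    ("Tier 2", "Tier 3"): (1.0, 2.5),
    ("Tier 3", "Tier 4"): (1.0, 10.0),
}

SECONDS_PER_DAY = 86_400


def transfer_seconds(num_bytes: float, gbit_per_s: float) -> float:
    """Idealised transfer time: payload bits divided by link rate (no overhead, no sharing)."""
    return num_bytes * 8 / (gbit_per_s * 1e9)


def bytes_per_year(gbyte_per_s: float, seconds: float = 365 * SECONDS_PER_DAY) -> float:
    """Data volume produced by a sustained output rate (assumes continuous running)."""
    return gbyte_per_s * 1e9 * seconds


one_petabyte = 1e15
for (src, dst), (low, high) in TIER_LINKS.items():
    slow = transfer_seconds(one_petabyte, low) / SECONDS_PER_DAY
    fast = transfer_seconds(one_petabyte, high) / SECONDS_PER_DAY
    print(f"{src} -> {dst}: 1 PB takes {fast:.1f} to {slow:.1f} days")

# Upper end of the online system's output rate, if it ran all year.
print(f"1.5 GB/s for a year: {bytes_per_year(1.5) / 1e15:.1f} PB")
```

Even on the fastest Tier 0 to Tier 1 links, moving a petabyte takes days, which is why the hierarchy replicates data down the tiers rather than serving every analysis from CERN.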
HIGH ENERGY PHYSICS CrossGrid • Objective: Developing, implementing, and exploiting new Grid components for interactive compute- and data-intensive applications, such as simulation and visualization for surgical procedures, decision-support systems for flood crisis teams, distributed data analysis in high-energy physics, and air-pollution modelling combined with weather forecasting. • URL: www.crossgrid.org
BIOINFORMATICS Challenges • To provide a usable and accessible computational and data management environment • To provide sufficient support services • To ensure that the science performed on the grid constitutes the next generation of advances • To accept feedback from bioinformaticians and to improve the next generation of infrastructure
BIOINFORMATICS Grid Applications • CEPAR (Combinatorial Extension in PARallel) and CEPort – 3D protein structure comparison • Chemport – a quantum mechanical biomedical framework
BIOINFORMATICS CEPAR: a computational biology application • A typical protein consists of about 300 amino acids, each one of 20 types, giving 20^300 possible sequences. • With 30,000 protein chains in the PDB (Protein Data Bank) and 30 s to compare each pair, an all-against-all comparison takes (30k × 30k / 2) × 30 s ≈ 428 CPU years on one processor. • Strategy: data reduction, data optimization and efficient scheduling of the CE (Combinatorial Extension) algorithm; on 1,000 CPUs of the 1.7-teraflop IBM Blue Horizon, the problem was solved in a few days.
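The estimate above can be checked directly. This sketch reproduces the slide's 30k × 30k / 2 pair count and 30 s per pair, using a 365.25-day year. The 1,000-CPU figure assumes perfect parallel speed-up: brute force alone would still need roughly five months, which suggests that the data-reduction and scheduling strategy, not CPU count alone, brought the run down to days.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def all_pairs_cpu_years(num_chains: int, seconds_per_pair: float) -> float:
    """CPU time to compare every pair of chains once, using n * n / 2 pairs as on the slide."""
    pairs = num_chains * num_chains / 2
    return pairs * seconds_per_pair / SECONDS_PER_YEAR


serial = all_pairs_cpu_years(30_000, 30.0)
print(f"serial: {serial:.0f} CPU years")                              # serial: 428 CPU years
print(f"ideal on 1,000 CPUs: {serial / 1000 * 365.25:.0f} days")      # ideal on 1,000 CPUs: 156 days
print(f"20**300 has {len(str(20 ** 300))} digits")                    # 20**300 has 391 digits
```

The last line puts the 20^300 sequence space in perspective: it is a number with 391 decimal digits, far beyond exhaustive enumeration.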
BIOINFORMATICS Chemport: a computational chemistry framework • Chemistry computations with GAMESS (General Atomic and Molecular Electronic Structure System) • Computational and functional analysis of biomolecules via classical and quantum-mechanical simulation
BIOINFORMATICS eDiamond • A Grid-enabled federated database of annotated mammograms • eDiaMoND is a collaborative project funded through an EPSRC grant and IBM's SUR grant • URL : www.ediamond.ox.ac.uk
BIOINFORMATICS eDiamond goals • It has a large distributed database of mammograms (400 cases per site, the majority annotated). • It complies with the new IT policies for the NHS: it is secure and wins the confidence of the relevant legal, ethical and NHS Trust IT officers. In addition, the system will follow all known guidelines for the deployment of NHS patient and health records. • It is scalable, designed so that it could in principle cope with millions of images spread across the 90+ Breast Care Units in the UK.
BIOINFORMATICS ediamond goals • It is effective in that it is fast, it is useful to the clinicians in the areas of screening, training, epidemiology and computer aided detection, and it is intuitive for the users. • It must be built such that upgrades of platform or image analysis software are graceful. • It is reusable, in that the platform could be used as a foundation for other e-health projects. • It is based on Grid architecture.
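To make "federated database" concrete, here is a hypothetical sketch in which each site keeps its own mammogram records, a query is sent to every site concurrently, and the results are merged. The `Case` and `SiteDatabase` classes, their fields and the site names are invented for illustration; eDiaMoND's actual middleware is not described on these slides, and the security and consent checks its goals require are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    site: str
    case_id: str
    annotated: bool


class SiteDatabase:
    """Hypothetical per-site store: records stay at the site that owns them."""

    def __init__(self, site: str, cases: list):
        self.site = site
        self._cases = cases

    def query(self, annotated_only: bool) -> list:
        return [c for c in self._cases if c.annotated or not annotated_only]


def federated_query(sites: list, annotated_only: bool = True) -> list:
    """Send the same query to every site in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(sites))) as pool:
        per_site = pool.map(lambda db: db.query(annotated_only), sites)
        merged = [case for results in per_site for case in results]
    # Deterministic order regardless of which site answered first.
    return sorted(merged, key=lambda c: (c.site, c.case_id))


sites = [
    SiteDatabase("site-a", [Case("site-a", "a2", True), Case("site-a", "a1", False)]),
    SiteDatabase("site-b", [Case("site-b", "b1", True)]),
]
print(federated_query(sites))
```

Querying sites in parallel keeps response time close to that of the slowest single site, which is the property the scalability goal needs as more Breast Care Units join.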
Grid Applications • What new challenges do these applications represent? • Are there new paradigms and problems here?
Case Study: News Service Application • Problem: • The application is to be used by a News Service organization whose purpose is to publish news bulletins electronically to various subscribers. The organization publishes bulletins in various categories, such as Business News, Sports, and Weather.
Case Study: News Service Application • Tasks: • Writers gather news and submit news bulletins for approval via the application. • Editors are notified of any pending bulletins that writers have submitted. Editors log on to the application, are authenticated, and retrieve the pending news. After reviewing each bulletin, they either approve or reject it. The application then publishes all approved bulletins to all registered subscribers. • The Administrator is responsible for starting and stopping the application and for performing other necessary administrative functions. • The News Service organization also allows business partner organizations to submit news bulletins. When bulletins arrive from a partner organization, the administrator loads them into the application for review by an editor and publication to subscribers.
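The bulletin workflow in these tasks can be sketched as a small state machine: submitted bulletins start as pending, an editor approves or rejects each one exactly once, and approval triggers publication to the subscribers of that category. All class and method names are hypothetical, and authentication, notification and the administrator's start/stop duties are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Bulletin:
    bulletin_id: int
    category: str      # e.g. "Business News", "Sports", "Weather"
    text: str
    source: str        # writer or business partner organization
    status: Status = Status.PENDING


@dataclass
class NewsService:
    bulletins: dict = field(default_factory=dict)
    subscribers: dict = field(default_factory=dict)  # category -> list of subscriber names
    delivered: list = field(default_factory=list)    # (subscriber, bulletin_id) delivery log
    _next_id: int = 1

    def submit(self, category: str, text: str, source: str) -> int:
        """Writers, or the administrator loading partner bulletins, add pending items."""
        bulletin = Bulletin(self._next_id, category, text, source)
        self.bulletins[bulletin.bulletin_id] = bulletin
        self._next_id += 1
        return bulletin.bulletin_id

    def pending(self) -> list:
        """The bulletins an editor would be notified about."""
        return [b for b in self.bulletins.values() if b.status is Status.PENDING]

    def review(self, bulletin_id: int, approve: bool) -> None:
        bulletin = self.bulletins[bulletin_id]
        if bulletin.status is not Status.PENDING:
            raise ValueError(f"bulletin {bulletin_id} was already reviewed")
        bulletin.status = Status.APPROVED if approve else Status.REJECTED
        if approve:
            self._publish(bulletin)

    def _publish(self, bulletin: Bulletin) -> None:
        # Approved bulletins go to every subscriber registered for the category.
        for subscriber in self.subscribers.get(bulletin.category, []):
            self.delivered.append((subscriber, bulletin.bulletin_id))


service = NewsService(subscribers={"Sports": ["alice", "bob"]})
b1 = service.submit("Sports", "Final score 2-1", source="writer-1")
b2 = service.submit("Weather", "Rain tomorrow", source="partner-org")
service.review(b1, approve=True)
service.review(b2, approve=False)
print(service.delivered)
```

Keeping the status on the bulletin, and refusing a second review, makes it impossible for a rejected bulletin to be published later by mistake.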
Case Study: News Service Application • The System Context:
Case Study: News Service Application • The use cases:
Case Study: News Service Application • The architecture overview: