
Cyberinfrastructure across the Globe

Cyberinfrastructure across the Globe. Indiana University Computer Science Undergraduate Honors Seminar January 8 2007 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401 gcf@indiana.edu http://www.infomall.org.


Presentation Transcript


  1. Cyberinfrastructure across the Globe Indiana University Computer Science Undergraduate Honors Seminar January 8 2007 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401 gcf@indiana.edu http://www.infomall.org

  2. Abstract • We discuss the role of Cyberinfrastructure (also called e-infrastructure and implemented by Grid technology) in a variety of global activities. These include the linking of researchers and data worldwide in many fields; new generations of digital libraries and tools like Google Scholar; the study of ice-sheets at the poles and the dramatic impact of global warming; the study of earthquakes across the Pacific Ocean; the linking of apparel manufacturers in Asia to designers on different continents; and the command and control system for the Department of Defense. We discuss these applications and their associated technology.

  3. Why Cyberinfrastructure Is Useful • Supports distributed science – data, people, computers • Exploits Internet technology (Web 2.0), adding management, security, supercomputers etc. • It has two aspects: parallel – low latency (microseconds) between nodes, and distributed – highish latency (milliseconds) between nodes • Parallel aspect is needed to get high performance on individual 3D simulations, data analysis etc.; must decompose problem • Distributed aspect integrates already distinct components • Cyberinfrastructure is in general a distributed collection of parallel systems • Grids are made of services that are “just” programs or data sources packaged for distributed access

  4. e-moreorlessanything and the Grid • ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ from its inventor John Taylor Director General of Research Councils UK, Office of Science and Technology • e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research • Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world. • The growing use of outsourcing is one example • The Grid provides the information technology e-infrastructure for e-moreorlessanything. • A deluge of data of unprecedented and inevitable size must be managed and understood. • People, computers, data and instruments must be linked. • On demand assignment of experts, computers, networks and storage resources must be supported

  5. TeraGrid: Integrating NSF Cyberinfrastructure [Map labels: Buffalo, Wisc, UC/ANL, Cornell, Utah, Iowa, PU, NCAR, PSC, IU, NCSA, Caltech, ORNL, USC-ISI, UNC-RENCI, SDSC, TACC] • TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research. • Today 100 teraflops; tomorrow a petaflop; Indiana 20 teraflops today.

  6. Virtual Observatory Astronomy Grid: Integrate Experiments [Figure panels: Radio; Far-Infrared; Visible; Dust Map; Visible + X-ray; Galaxy Density Map]

  7. Grid Capabilities for Science • Open technologies for any large-scale distributed system, adopted by industry, many sciences and many countries (including UK, EU, USA, Asia) • Security, Reliability, Management and state standards • Service and messaging specifications • User interfaces via portals and portlets virtualizing to desktops, email, PDAs etc. • ~20 TeraGrid Science Gateways (their name for portals) • OGCE Portal technology effort led by Indiana • Uniform approach to accessing distributed (super)computers, supporting single (large) jobs and spawning lots of related jobs • Data and meta-data architecture supporting real-time and archives as well as federation • Links to Semantic Web and annotation • Grid (Web service) workflow with standards and several successful instantiations (such as Taverna and MyLead) • Many Earth science grids including ESG (DoE), GEON, LEAD, SCEC, SERVO; LTER and NEON for Environment • http://www.nsf.gov/od/oci/ci-v7.pdf

  8. eApparel • Much of the world’s manufacturing industry is globalized and the apparel/textile industry is typical • We are working with the Hong Kong textile industry to link the Asian manufacturers with design/marketing/purchase functions elsewhere (USA, Europe) • Need to exchange designs, available fabrics and discussions • Good example of e-infrastructure enabling specialization in one geographical area to thrive • Software and digital animation outsourcing are other good examples

  9. APEC Cooperation for Earthquake Simulation • ACES is a seven-year-long collaboration among scientists interested in earthquake and tsunami prediction • iSERVO is the infrastructure to support the work of ACES • SERVOGrid is a (completed) US Grid that is a prototype of iSERVO • http://www.quakes.uq.edu.au/ACES/ • Chartered under APEC – the Asia Pacific Economic Cooperation of 21 economies

  10. Grid of Grids: Research Grid and Education Grid [Diagram labels: Field Trip Data; Streaming Data; Sensors; Database; Repositories; Federated Databases; Sensor Grid; Database Grid; GIS Grid; Discovery Services; Compute Grid; Computer Farm; Research Simulations; Data Filter Services; Customization Services; Analysis and Visualization Portal; Research; Education; SERVOGrid; Education Grid; From Research to Education]

  11. SERVOGrid and Cyberinfrastructure • Grids are the technology based on Web services that implement Cyberinfrastructure, i.e. support eScience or science as a team sport • Internet-scale managed services that link computers, data repositories, sensors, instruments and people • SERVOGrid has a portal and services for: • Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh generating programs … • Job management and monitoring web services for running the above codes • File management web services for moving files between various machines • Geographical Information System services • Quaketables earthquake-specific database • Sensors as well as databases • Context (dynamic metadata) and UDDI (long-term metadata) system services • Services support streaming real-time data

  12. [Figure labels: Site-specific Irregular Scalar Measurements; Constellations for Plate Boundary-Scale Vector Measurements; Ice Sheets (Greenland); Volcanoes (Long Valley, CA); PBO; Topography (1 km); Stress Change (Northridge, CA); Earthquakes (Hector Mine, CA)]

  13. Some Grid Concepts I • Services are “just” (distributed) programs sending and receiving messages with well-defined syntax • Interfaces (input-output) must be open; innards can be open source (allowing you to modify) or proprietary • Services can be written in any language: Fortran, Shell scripts, C, C#, C++, Java, Python, Perl – your choice!! • Web Services are supported by all vendors (IBM, Microsoft …) • Service overhead will be just a few milliseconds (more now), which is < typical network transit time • Any program that is distributed can be a Web service • Any program taking execution time ≥ 20 ms can be an efficient Web service
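The last two bullets of slide 13 can be made quantitative. The following is a minimal sketch, assuming a fixed per-call overhead and defining efficiency as useful execution time divided by total time. The 4 ms overhead is an illustrative stand-in for "a few milliseconds", not a figure from the talk, and the function name is hypothetical.

```python
def service_efficiency(exec_ms: float, overhead_ms: float) -> float:
    """Fraction of total time spent on useful work for one service call."""
    if exec_ms < 0 or overhead_ms < 0:
        raise ValueError("times must be non-negative")
    total = exec_ms + overhead_ms
    return exec_ms / total if total > 0 else 0.0


# Assumed per-call service overhead (hypothetical value for illustration).
OVERHEAD_MS = 4.0

for exec_ms in (1.0, 20.0, 200.0):
    eff = service_efficiency(exec_ms, OVERHEAD_MS)
    print(f"{exec_ms:6.1f} ms of work -> efficiency {eff:.2f}")
```

Under this assumption a 20 ms program spends most of its time on useful work, while a 1 ms program is dominated by overhead, which is the intuition behind the 20 ms rule of thumb.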

  14. Web services • Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on SOA (service oriented architecture) principles. • Web Services interact by exchanging messages in SOAP format • The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
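To illustrate what "exchanging messages in SOAP format" looks like, here is a minimal sketch using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the application namespace, the `SubmitJob` operation and its parameters are hypothetical and not taken from SERVOGrid. A real deployment would normally generate such messages from the WSDL contract with a SOAP toolkit.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/jobs"  # hypothetical application namespace


def build_soap_request(operation: str, params: dict) -> bytes:
    """Wrap an operation and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_soap_request(message: bytes):
    """Recover the operation name and parameters from a SOAP envelope."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    op = body[0]  # first child of Body is the operation element
    operation = op.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in op}
    return operation, params


# Hypothetical request: ask a job service to run a code with 10 steps.
message = build_soap_request("SubmitJob", {"code": "GeoFEST", "steps": 10})
print(message.decode("utf-8"))
print(parse_soap_request(message))
```

Note that parameter values come back as strings; typing them is the job of the schema referenced from the WSDL description.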

  15. Some Grid Concepts II • Systems are built from contributions from many different groups – you do not need one “vendor” for all components, as Web services allow interoperability between components • One reason DoD likes Grids (called Net-Centric computing) • Grids are distributed in services and data, allowing anybody to store their data and to produce “their” view • Some think that the University Library of the future will curate/store data of their faculty • “2 level programming model”: classic programming of services, and services are composed using workflow consistent with industry standards (BPEL) • Grid of Grids (System of Systems): realistically, Grid-like systems will be built using multiple technologies and “standards” – integrate separate Grids for Sensors, GIS, Visualization, computing etc. with OGSA (Open Grid Service Architecture from OGF) system Grid (Security, registry) into a single Grid • Existing codes UNCHANGED; wrap as a service with metadata
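The "2 level programming model" and the "wrap as a service with metadata" idea from slide 15 can be sketched in a few lines. This is only an in-process illustration: the `Service` class, `run_workflow` function and the two example codes are hypothetical names, and a plain Python list stands in for a real workflow language such as BPEL.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Service:
    """An existing function wrapped, unchanged, as a named service with metadata."""
    name: str
    func: Callable[[Any], Any]
    metadata: Dict[str, str] = field(default_factory=dict)

    def invoke(self, message: Any) -> Any:
        return self.func(message)


def run_workflow(services: List[Service], message: Any) -> Any:
    """Level 2: compose services by passing each output to the next service."""
    for service in services:
        message = service.invoke(message)
    return message


# Level 1: "classic" codes, written without any knowledge of the workflow.
def detrend(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def largest_excursion(values):
    return max(values, key=abs)


workflow = [
    Service("detrend", detrend, {"input": "list of floats", "output": "list of floats"}),
    Service("largest_excursion", largest_excursion, {"input": "list of floats", "output": "float"}),
]
result = run_workflow(workflow, [1.0, 2.0, 6.0])
print(result)
```

The point of the design is that `detrend` and `largest_excursion` are not modified; only the wrapper and the composition know about services.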

  16. TeraGrid User Portal

  17. LEAD Gateway Portal • NSF Large ITR and TeraGrid Gateway • Adaptive response to mesoscale weather events • Supports data exploration, Grid workflow

  18. Use a Portlet-based user portal to access and control services and workflow Grid Workflow Data Assimilation in Earth Science • Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts

  19. SERVOGrid has a portal The Portal is built from portlets – providing user interface fragments for each service that are composed into the full interface – uses OGCE technology as does planetary science VLAB portal with University of Minnesota

  20. Portlets v. Google Gadgets • Portals for Grid Systems are built using portlets, with software like GridSphere integrating these on the server side into a single web page • Google (at least) offers the Google sidebar and Google home page, which support Web 2.0 services and do not use a server-side aggregator • Google is more user friendly! • The many Web 2.0 competitions are an interesting model for promoting development in the world-wide distributed collection of Web 2.0 developers • I guess the Web 2.0 model will win!

  21. GIS and Sensor Grids • OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors • GML (Geography Markup Language) defines the specification of geo-referenced data • SensorML and O&M (Observation and Measurements) define meta-data and data structure for sensors • Services like Web Map Service, Web Feature Service and Sensor Collection Service define service interfaces to access GIS and sensor information • Grid workflow links services that are designed to support streaming input and output messages • We built Grid (Web) service implementations of these specifications for NASA’s SERVOGrid • Use Google Maps as front end to WMS and WFS
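As a concrete illustration of a Web Map Service interface, the sketch below builds a WMS 1.1.1 `GetMap` request URL. The parameter names follow the OGC WMS 1.1.1 key-value encoding; the endpoint URL, layer name and bounding box are hypothetical and do not refer to an actual SERVOGrid server.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layers, bbox, width, height,
                   srs="EPSG:4326", fmt="image/png"):
    """Build a WMS 1.1.1 GetMap URL; bbox is (minx, miny, maxx, maxy)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer; the box roughly covers Southern California.
url = wms_getmap_url(
    "http://example.org/wms",
    layers=["faults"],
    bbox=(-120.0, 32.0, -114.0, 36.0),
    width=600,
    height=400,
)
print(url)
```

A map front end can request such images tile by tile and overlay them, which is roughly how a mapping client can sit in front of a WMS.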

  22. Grid Workflow Datamining in Earth Science [Diagram labels: NASA GPS; Streaming Data Support; Transformations; Data Checking; Hidden Markov Datamining (JPL); Display (GIS); Earthquake] • Work with Scripps Institute • Grid services controlled by workflow process real-time data from ~70 GPS sensors in Southern California
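The stages named on slide 22 (data checking, transformations, datamining) can be chained as a streaming pipeline. The sketch below uses Python generators so that each reading flows through all stages as it arrives. The record layout, the unit conversion and the thresholds are assumptions for illustration, and the simple jump detector is only a stand-in for the Hidden Markov datamining step, which is not reproduced here.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical record layout: (station id, displacement value).
RawReading = Tuple[str, Optional[float]]
Reading = Tuple[str, float]


def check(readings: Iterable[RawReading]) -> Iterator[Reading]:
    """Data checking: drop readings with missing values."""
    for station, value in readings:
        if value is not None:
            yield station, value


def transform(readings: Iterable[Reading], scale: float = 1000.0) -> Iterator[Reading]:
    """Transformation: convert metres to millimetres (assumed units)."""
    for station, value in readings:
        yield station, value * scale


def flag_jumps(readings: Iterable[Reading], threshold_mm: float = 5.0) -> Iterator[Reading]:
    """Stand-in for datamining: flag large jumps between consecutive readings per station."""
    last = {}
    for station, value in readings:
        previous = last.get(station)
        if previous is not None and abs(value - previous) > threshold_mm:
            yield station, value
        last[station] = value


raw = [("A", 0.001), ("A", None), ("A", 0.002), ("B", 0.000), ("A", 0.010), ("B", 0.001)]
alerts = list(flag_jumps(transform(check(raw))))
print(alerts)
```

Because every stage is a generator, nothing is buffered beyond the last value per station, which matches the streaming style of the services on the slide.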

  23. Earth/Atmosphere Grids built as Grids of (library) Grids [Diagram labels: Portals; Tornado Grid; Ice Sheet PolarGrid; Earthquake SERVOGrid; …; Ice Sheet Sensors, SAR, Filters, EM, Glacier Simulations; Earthquake Data, Filters & Simulation Services; Collaboration Grid; Visualization Grid; Sensor Grid; GIS Grid; Compute Grid; Core Grid Services: Security, Notification, Workflow, Messaging, Data Access/Storage, Registry, Metadata; Physical Network]

  24. Community Tools • E-mail and listservs are the oldest and most used • Kazaa, Instant Messengers, Skype, Napster, BitTorrent for P2P collaboration – text, audio-video conferencing, files • del.icio.us, Connotea, Citeulike, Bibsonomy, Biolicious manage shared bookmarks • MySpace, YouTube, Bebo, Hotornot, Facebook, or similar sites allow you to create (upload) community resources and share them; Friendster, LinkedIn create networks • http://en.wikipedia.org/wiki/List_of_social_networking_websites • Writely, Wikis and Blogs are powerful specialized shared document systems • ConferenceXP and WebEx share general applications • Google Scholar tells you who has cited your papers, while publisher sites tell you about co-authors • Windows Live Academic Search has similar goals • Note that sharing resources creates (implicit) communities • Social network tools study graphs to both define communities and extract their properties • Mashups link resources together (federation/workflow)

  25. Mashups and Grids • http://www.programmableweb.com • There were 281 “commodity” service Web 2.0 APIs on October 1, 2006 (356 on January 9, 2007) • Mashups are composed from JavaScript, AJAX and REST, and not usually BPEL, WSDL and SOAP; Google Gadgets, not portlets • Architecture of Mashups and Grids is “identical” • See Amazon S3 Storage and EC2 Elastic Computing services • Mashups enable everybody to contribute

  26. Mashups using GoogleMaps Mashup Matrix

  27. Indiana Map Mash-up • GIS Grid of “Indiana Map” and ~10 Indiana counties with accessible Map (Feature) Servers from different vendors • Grids federate different data repositories (cf. Astronomy VO federating different observatory collections)

  28. eSports? • YouTube illustrates asynchronous video sharing and video conferencing illustrates synchronous video sharing • One can link trainers (or spectators) and athletes globally with real time video supporting video and text annotation • Technically hard due to network issues and allowing real-time playing of annotated video • Exploring with China • Note IU could export coaching in Soccer, Basketball etc • Example of Cyberinfrastructure supporting geographically distributed specialization

  29. Minority Serving Institutions and the Grid • Historically the R1 Research University powerhouses dominated research due to their concentration of expertise • Cyberinfrastructure allows others to participate in same way it supports distributed open source software and distributed Web 2.0 • Navajo Nation (Colorado Plateau covering over 25,000 square miles in northeast Arizona, northwest New Mexico, and southeast Utah) with 110 communities and over 40% unemployment. Building a wireless grid for education, healthcare • http://www.win-hec.org/ World Indigenous Nations Higher Education Consortium • Cyberinfrastructure allows Nations to preserve their geographical identity but participate fully with world class jobs and research • Some 335 MSI’s in Alliance have similar hopes for Cyberinfrastructure to jump start their advancement!

  30. Example: Setting up a Polar CI-Grid [Figure: typical illustration of the effect of climate change on Greenland: velocity of Jakobshavn from 1995 to 2005 as a function of distance from its end] • The North and South poles are melting, with potentially huge environmental impact • As a result of MSI meetings, I am working with MSI ECSU in North Carolina and Kansas University to design and set up a Polar Grid (Cyberinfrastructure) • This is a network of computers, sensors (on robots and satellites), data and people aimed at understanding the science of ice-sheets and the impact of global warming • We have changed the 100,000 year glacier cycle into a ~50 year cycle; the field has increased dramatically in importance and interest • Good area to get involved in, as there is not so much established work

  31. PolarGrid • Important Polar Grid Cyberinfrastructure components include • Managed data from sensors and satellites • Data analysis such as SAR processing – possibly with parallel algorithms • Electromagnetic simulations (currently commercial codes) to design instrument antennas • 3D simulations of ice-sheets (glaciers) with non-uniform meshes • GIS Geographical Information Systems • Also need capabilities present in many Grids • Portal i.e. Science Gateway • Submitting multiple sequential or parallel jobs • Power/Bandwidth Challenged Expedition Grids

  32. Polar Expeditions: Prototype Base/Field Grid [Diagram labels: Field Base Camps; Real Time Monitor, Low Bandwidth; Archival, High Latency; Adaptor layer; Education and Training; Core simulation and Data analysis; IU; ECSU; Haskell; Existing IU; Existing CRESIS; TeraGrid; OSG; Other Polar Sensors and Sensor Aggregators (Non-polar and Polar Sites)]

  33. Document-enhanced Cyberinfrastructure [Diagram labels: MyResearchDatabase; Bibliographic Database; Web service Wrappers; Traditional Cyberinfrastructure; Export: RSS, Bibtex, Endnote etc.; Generic Document Tools; Community Tools; Integration/Enhancement; User Interface etc.; Existing User Interface; New Document-enhanced Research Tools; Existing Document-based Research Tools. Tools shown: Del.icio.us; Windows Live Academic Search; CiteULike; Google Scholar; Connotea; Citeseer; Bibsonomy; Science.gov; Biolicious; PubChem; PubMed; CMT Conference Management; Manuscript Central]

  34. Delicious Semantic Web/Grid • http://del.icio.us purchased by Yahoo for ~$30M • http://www.CiteULike.org • http://www.connotea.org (Nature) • Associate metadata with bookmarks specified by URLs, DOIs (Digital Object Identifiers) • Users add comments and keywords (called tags) • Users are linked together into groups (communities) • Information such as title and authors is extracted automatically from some sites (PubMed, ACM, IEEE, Wiley etc.) • BibTeX-like additional information in CiteULike • This is perhaps the de facto Semantic Web – remarkable for its simplicity
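The tagging model on slide 34 is simple enough to sketch directly: bookmarks identified by URL or DOI, user-supplied tags, and communities that emerge implicitly from shared tags. The class and method names below are hypothetical and do not correspond to the API of del.icio.us, CiteULike or Connotea.

```python
from collections import defaultdict


class BookmarkStore:
    """Minimal shared-bookmark store: identifier -> tag -> set of users."""

    def __init__(self):
        self._tags_by_id = defaultdict(lambda: defaultdict(set))

    def add(self, user, identifier, tags):
        """Record that `user` bookmarked `identifier` (URL or DOI) with `tags`."""
        for tag in tags:
            self._tags_by_id[identifier][tag.lower()].add(user)

    def items_for_tag(self, tag):
        """All bookmarked identifiers carrying the tag, in sorted order."""
        tag = tag.lower()
        return sorted(i for i, tags in self._tags_by_id.items() if tag in tags)

    def community(self, tag):
        """Users who used the tag: an implicit community."""
        tag = tag.lower()
        users = set()
        for tags in self._tags_by_id.values():
            users |= tags.get(tag, set())
        return users


store = BookmarkStore()
# Hypothetical users and identifiers.
store.add("alice", "http://example.org/paper1", ["Earthquake", "grid"])
store.add("bob", "doi:10.0000/example", ["earthquake"])
store.add("carol", "http://example.org/paper1", ["workflow"])
print(store.items_for_tag("earthquake"))
print(sorted(store.community("earthquake")))
```

Nothing here needs an ontology: the only metadata is the identifier, the tag and the user, which is the simplicity the slide remarks on.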

  35. Connotea queried by SERVOGrid

  36. Document-enhanced Cyberinfrastructure aka Semantic Scholar Grid I • Citeseer and Google Scholar scour the Internet and analyze documents for incidental metadata • Title, author and institution of documents • Citations with their own metadata, allowing one to match to other documents • Science.gov extracts metadata from lots of US Government databases • These capabilities are sure to become more powerful and to be extended • Give “Citation Index” in real time • Tell you all authors of all papers that cite a paper that cites you etc. (Note it’s a small world so don’t go too far in link analysis) • Tell you all citations of all papers in a workshop
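The query "all authors of all papers that cite a paper that cites you" is a two-hop walk over the citation graph. The sketch below assumes the graph is available as a plain dictionary mapping each paper to the papers that cite it; the function name, paper identifiers and author names are hypothetical.

```python
def citing_authors(paper, cited_by, authors, hops=2):
    """Authors of papers exactly `hops` citation steps away from `paper`.

    cited_by maps a paper id to the ids of papers that cite it;
    authors maps a paper id to its author names.
    """
    frontier = {paper}
    seen = {paper}
    for _ in range(hops):
        # Move one step outward, skipping papers already visited.
        frontier = {
            citing
            for p in frontier
            for citing in cited_by.get(p, ())
            if citing not in seen
        }
        seen |= frontier
    return {name for p in frontier for name in authors.get(p, ())}


# Hypothetical citation data.
cited_by = {"mine": ["p1", "p2"], "p1": ["p3"], "p2": ["p3", "p4"]}
authors = {"p1": ["Di"], "p3": ["Ada"], "p4": ["Bo", "Cy"]}

print(sorted(citing_authors("mine", cited_by, authors, hops=1)))
print(sorted(citing_authors("mine", cited_by, authors, hops=2)))
```

The `hops` parameter makes the slide's caution concrete: in a small-world graph the frontier grows quickly, so a large value would return a large share of all authors.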

  37. Document-enhanced Cyberinfrastructure aka Semantic Scholar Grid II • It is natural to develop core document Services such as those used in Citeseer/Google Scholar but applied to “your” documents of interest that may not have been processed yet • As just submitted to a conference perhaps • These tools can help form useful lists such as authors of all cited or submitted papers to a journal • OSCAR2/3 (from Peter Murray-Rust’s group at Cambridge) augment the application-independent “core” metadata (Title, authors, institutions, Citations) with a list of all chemical terms • This tool is a Service that can be applied to “your” document or to a set of documents harvested in some fashion • Other fields have natural application-specific metadata and OSCAR-like tools can be developed for them • Such high value tools could appear on “publisher” sites of future (or else publishers will disappear)
