Grid Computing: Harnessing Underutilized Resources

UNCW Department of Chemistry & Biochemistry Seminar, September 24, 2004. Ned H. Martin.


  1. Grid Computing: Harnessing Underutilized Resources UNCW Department of Chemistry & Biochemistry Seminar September 24, 2004 Ned H. Martin

  2. Outline • Definition of Grid computing • A brief history of computing • Growth of computing power • Rationale for Grid computing • How a Grid works • Examples of Grid projects • Grid computing in NC • Limitations of Grid computing • UNCW Grid initiative: GridNexus • What’s next?

  3. Definition of Grid Computing • Grid computing is a form of distributed computing that involves coordinated and controlled sharing of diverse computing, application, data, storage, or network resources across dynamic and geographically dispersed multi-institutional virtual organizations. • A user of Grid computing does not need to have the data and the software on the same computer, and neither must be on the user’s home (login) computer.

  4. Grid Computing • The term Grid computing suggests a computing paradigm similar to an electric power grid - a variety of resources contribute power into a shared "pool" for many consumers to access on an as-needed basis.

  5. Background of Grid Computing • The idea of Grid computing resulted from the confluence of three developments: • The proliferation of largely unused computing resources (especially desktop computers) • Their greatly increased CPU speed in recent years • The widespread availability of fast, universal network connections (the Internet).

  6. Brief History of Computing • 1943: "I think there is a world market for maybe 5 computers." Thomas Watson, chairman of IBM • 1947: Testudo: The very first computer in the Netherlands; the relay-based machine was 5 m long. Adding took 30 s and multiplication 45 s.

  7. Brief History of Computing • 1949: "Computers in the future may weigh no more than 1.5 tons." -Popular Mechanics, forecasting the relentless march of science • 1957: "I have traveled the length and breadth of this country and talked with the best people, and I can assure you that data processing is a fad that won't last out the year." -The business book editor for Prentice Hall.

  8. Brief History of Computing • 1977: "There is no reason anyone would want a computer in their home." -Ken Olsen, president, chairman and founder of Digital Equipment Corp. • 1980: "DOS addresses only 1 Megabyte of RAM because we cannot imagine any applications needing more." -Microsoft on the development of DOS. • 1981: "640k ought to be enough for anybody." -Bill Gates

  9. Brief History of Computing • 1978: Intel introduced the 8086, a 16-bit processor. It was too expensive for many designs, so a version with an 8-bit external data bus (the 8088) followed in 1979 and was chosen by IBM for the first IBM PC. Available clock frequencies went up to 10 MHz, and the instruction set had about 300 operations. At introduction the fastest version ran at 8 MHz, achieved 0.8 MIPS (0.8 x 10^6 instructions per second), and contained 29,000 transistors.

  10. Brief History of Computing • 1982: Intel 80286 released. It supported clock frequencies of up to 20 MHz. At introduction the fastest version ran at 12.5 MHz, achieved 2.7 MIPS, and contained 134,000 transistors. • 1985: Intel 80386 DX released. It supported clock frequencies of up to 33 MHz. At release the fastest version ran at 20 MHz and achieved 6.0 MIPS. It contained 275,000 transistors.

  11. Brief History of Computing • 1989: Intel 80486 DX released. It contained the equivalent of about 1.2 million transistors. At release the fastest version ran at 25 MHz and achieved up to 20 MIPS. Later versions had clock speeds of up to 100 MHz. • 1993: Intel Pentium released. At that time it was available only in 60 and 66 MHz versions, which achieved up to 100 MIPS and contained over 3.1 million transistors.

  12. Brief History of Computing • 1995: Pentium Pro released. At introduction it reached a clock speed of up to 200 MHz and achieved 440 MIPS. It contained 5.5 million transistors, nearly 2,400 times as many as the first microprocessor in 1971, and was capable of 70,000 times as many instructions per second. • 2004: Pentium 4 chips available with clock speeds of up to 3.6 GHz, providing 11,356 MIPS and containing 125,000,000 transistors. • 2005: 500,000,000 transistors!

  13. Growth of Computing Power • [Figure not reproduced in transcript: chart of the growth of computing power through 2004]
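The chart for this slide did not survive in the transcript, but the figures quoted on slides 10 to 12 are enough to estimate the growth rate. The sketch below is an illustration, not part of the original talk: it assumes steady exponential growth between the Intel 80286 (1982) and the Pentium 4 (2004) and computes how many years each doubling took. The helper name `doubling_time` is made up for this example.

```python
import math

# (year, transistors, MIPS) exactly as quoted on slides 10-12.
CHIPS = [
    (1982, 134_000, 2.7),           # Intel 80286
    (1985, 275_000, 6.0),           # Intel 80386 DX
    (1989, 1_200_000, 20.0),        # Intel 80486 DX
    (1993, 3_100_000, 100.0),       # Pentium
    (1995, 5_500_000, 440.0),       # Pentium Pro
    (2004, 125_000_000, 11_356.0),  # Pentium 4
]


def doubling_time(year0, value0, year1, value1):
    """Years per doubling, assuming steady exponential growth (an assumption)."""
    return (year1 - year0) / math.log2(value1 / value0)


first, last = CHIPS[0], CHIPS[-1]
transistor_doubling = doubling_time(first[0], first[1], last[0], last[1])
mips_doubling = doubling_time(first[0], first[2], last[0], last[2])

print(f"Transistor count doubled about every {transistor_doubling:.1f} years")
print(f"MIPS doubled about every {mips_doubling:.1f} years")
```

With these inputs both quantities double roughly every two years, which is the trend the slide's chart was presumably meant to show.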

  14. Rationale for Grid Computing • The proliferation of largely unused computing resources (especially desktop computers, of which 152 million were sold in 2003). • Their greatly increased CPU speed in recent years (now >3 GHz). • The widespread availability of fast, universal network connections (the Internet).

  15. Rationale for Grid Computing • High performance computers (formerly called supercomputers) are very expensive to buy and maintain. • Much of the recent enhancement of computing power has come from applying multiple CPUs to a single problem (e.g., NCSC had a 720-processor IBM parallel computer). • Many computing tasks relegated to these (especially massively parallel) computers could instead be performed by a “divide and conquer” strategy that uses many more, although slower, processors of the kind available on a Grid.
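The “divide and conquer” strategy can be sketched in a few lines. This is a minimal illustration, not code from the UNCW project: a local thread pool stands in for Grid nodes, and the work (a sum of squares) is a placeholder for any computation whose pieces are independent. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(start, stop):
    """One work unit: a slice of the full problem, small enough for one node."""
    return sum(n * n for n in range(start, stop))


def split(total, parts):
    """Divide range(total) into contiguous (start, stop) slices."""
    step = -(-total // parts)  # ceiling division
    return [(lo, min(lo + step, total)) for lo in range(0, total, step)]


def grid_sum_of_squares(total, workers=8):
    """Farm the slices out to the workers, then combine the partial results."""
    chunks = split(total, workers)
    # The thread pool is only a stand-in for remote Grid nodes (assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda chunk: sum_of_squares(*chunk), chunks)
        return sum(partials)


result = grid_sum_of_squares(100_000)
print(result)
```

The approach only pays off when the slices do not depend on one another, which is why the slide speaks of "many" rather than "all" computing tasks.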

  16. How a Grid Works • The term "grid computing" suggests a computing paradigm similar to an electric power grid - a variety of resources contribute power into a shared "pool" for many consumers to access on an as-needed basis. • Ideally the user does not know or care where the computing operation is being performed; the process is invisible to the user. • Middleware handles security, authentication, authorization, resource selection and routing of input and output seamlessly.

  17. Examples of Grid Projects • SETI@home • DNet (distributed.net) • GRID.ORG (anti-cancer ligand screening) • IBM Smallpox cure • Entropia.org • CERN

  18. Grid Projects: SETI@home • SETI@home • A large-scale search through data gathered by radio telescopes in Puerto Rico for evidence of extraterrestrial life • Involved more than 3 million computers averaging about 14 teraflops in aggregate, or 14 trillion floating point operations per second • Utilized over 500,000 years of processing time in the past year and a half.
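A quick back-of-the-envelope calculation puts these SETI@home figures in perspective. The sketch assumes that the 14 teraflops is an aggregate across all participants and that the 500,000 processor-years were accumulated over exactly 1.5 years; both readings are interpretations of the slide, not stated facts.

```python
computers = 3_000_000      # "more than 3 million computers"
aggregate_flops = 14e12    # "about 14 TeraFLOPS", read as an aggregate (assumption)
cpu_years = 500_000        # "over 500,000 years of processing time"
elapsed_years = 1.5        # "the past year and a half"

# Average sustained contribution of one participating computer.
avg_flops_per_computer = aggregate_flops / computers

# Number of machines that would have to run around the clock to log that time.
equivalent_full_time_machines = cpu_years / elapsed_years

# Fraction of the participating machines that this represents.
duty_fraction = equivalent_full_time_machines / computers

print(f"{avg_flops_per_computer / 1e6:.1f} MFLOPS per computer on average")
print(f"{equivalent_full_time_machines:,.0f} machines running full time")
print(f"{duty_fraction:.0%} of participants, if each ran continuously")
```

Under these assumptions each machine contributes only a few megaflops on average, so the project's power comes from the number of participants rather than from any single computer.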

  19. Grid Projects: DNet • DNet (distributed.net) • Began in 1997 as the first general-purpose distributed computing network on the Internet • Highly successful in bringing individuals together to complete cryptographic challenges via a distributed environment. • Equivalent to more than 160,000 Pentium II 266 MHz computers working 24 hours a day, 7 days a week, 365 days a year! • The core distributed.net development team joined United Devices in 2000.

  20. Grid Projects: GRID.ORG • The United Devices Cancer Research Project (GRID.ORG) will advance research to uncover new cancer drugs through the combination of chemistry, computers, and specialized software. • The research centers on proteins that have been determined to be a possible target for cancer therapy. Through a process called "virtual screening", LigandFit docking software by Accelrys identifies molecules that interact with these proteins, and determines which ones have a high likelihood of being developed into a drug. • In the first year and a half, over 3.5 million drug candidates were screened using over a million personal computers.

  21. Grid Projects: Smallpox Cure • Smallpox cure • To help find a cure for smallpox, IBM and a group of partners harnessed the processing power of 2 million idle PCs. They then screened 35 million drug compounds against smallpox proteins to find the most effective candidates.

  22. Grid Projects: Entropia • In 1997, Entropia applied idle computers worldwide to problems of scientific interest. In just two years, this network grew to encompass 30,000 computers with an aggregate speed of over one teraflops. Among its several scientific achievements is the identification of the largest known prime number.

  23. Grid Projects: CERN • CERN • By 2005, detectors at the Large Hadron Collider at CERN, the European Laboratory for Particle Physics, will produce several petabytes of data per year, a million times the storage capacity of a desktop computer. • Just the basic data analysis requires 20 teraflops of computing power (the fastest supercomputer delivers 3 teraflops). • More sophisticated analyses will need orders of magnitude more computing power.

  24. Grid Computing in NC • NCBioGrid (www.ncbiogrid.org/), an outgrowth of the High Performance Computing and Data Storage Focus Group of the NC Genomics and Bioinformatics Consortium • NC Computing Grid – now includes 7 universities plus MCNC; UNCW will be joining soon • UNCW Grid – started as a grid for UNCW bioinformatics/genomics research, expanded now into chemistry and business applications.

  25. Limitations of Grid Computing • Currently, although efforts are being made to standardize protocols (e.g., Globus Toolkit and Avaki), interacting with Grid services remains a complex process. • Most of the existing applications that access Grid services require the user to type cumbersome commands, often using a command-line interface. • Creating new clients and services requires programming in a language such as C or Java and using a host of libraries for interacting with Open Grid Services Infrastructure, Grid Security Infrastructure, Web Services Description Language and other standards.

  26. Limitations of Grid Computing • These tools and techniques are useful to a select group of computing specialists; however, the only way to make Grid resources accessible to a wide range of users is to provide a relatively simple graphical user interface (GUI). • The UNCW Grid project proposes to develop a Graphical Grid User Interface that is easy to use and can access a wide range of applications. • Our hope is to create an interface that does for Grid computing what Internet browsers (Netscape and Internet Explorer) did for the WWW.

  27. UNCW Grid Initiative: GridNexus • This initiative grew in part out of a need for HPC resources following the closure of the NCSC in June 2003, coupled with the availability of faculty with software programming expertise and others with computing applications that could benefit from use of a Grid. • The UNC-OP funded UNCW’s proposal for $557,634 over two years to develop Grid portals (GUI middleware to allow users to access software on computers on a Grid).

  28. UNCW Grid Initiative: GridNexus • The UNCW Grid Computing Project is a two-year collaborative project among a multi-discipline, multi-investigator core research team at UNCW and several discipline-focused researchers at partner institutions: NCSU, WCU, NCCU, ECU, and CFCC. The research areas and institutional interests of this project are: • Advanced Grid Software Development (UNCW) • Computational Chemistry (UNCW and ECU) • Bioinformatics (UNCW, NCSU, and NCCU) • Combinatorics (UNCW) • Business Computing (UNCW and NCCU) • Education and Training (UNCW, WCU, CFCC) • This project proposes to develop a Grid interface that is easy to use and serves a wide range of applications and users. We have developed an innovative graphical user interface (GUI) for grid applications. In particular, we have introduced a new scripting language (JXPL) designed for web-based services and a GUI for creating scripts, and we have demonstrated the use of these tools with grid services.

  29. UNCW Grid Initiative: GridNexus • UNCW’s initiative is unique in that it involves undergraduate students as the main players in the development of the Grid portal (GUI). • Undergraduate computer science students are partnered with faculty and students in application areas (chemistry, biology, business) to develop graphical front-ends to access services (programs) on computers on the Grid. • Grid portals are being developed for the two computational chemistry programs (Gaussian 03 and DMol) most often used in research by our faculty and students.

  30. Resources of UNCW Grid • Beowulf cluster: 16 PIII processors in the Computer Science Department • Fire and FireDev servers plus disk storage devices • PQS Quantum Cube: 8-CPU cluster with PQS and Gaussian 03 computational chemistry software, plus the TCP-Linda environment. • An 8-processor IBM blade cluster with 0.5 TB of disk storage will be added soon. • Other computers may be added, including the possibility of using all computing lab computers, or possibly even all faculty/staff computers (when not in use).

  31. Remote Computing before Grid • Now, to submit a quantum chemistry calculation to a remote computer, e.g., at NCSU, one must: • Telnet to the remote computer and log in (separate login and password for each user account and for each computer) • FTP the input data file from the local computer to the remote machine (requires login, password) • Create and edit an input file for the job (using vi or another text editor) • Create a .job file and edit it if necessary • Select a queue based on the number of CPUs and time required; submit the .job file • Check the progress of the calculation periodically: telnet to the remote machine and look for the file that indicates completion of the job • FTP the output file to the local computer • Open the output file in a text editor and examine the numerical data • Open the output file in a commercial program on the local computer to visualize the structure

  32. Remote Computing on a Grid • In the future, using Grid middleware to submit a quantum chemistry calculation to a remote computer at NCSU: • Log in to the Grid (single user login and password to access ANY Grid resource) • Select a data file and job parameters from pull-down menus; click to submit (the .input and .job files are created automatically by the Grid middleware, and the job is submitted automatically to an appropriate available computer) • Upon completion of the computation, the output file is automatically sent to the local computer to visualize the structure (which can also be automated).

  33. Development of a Grid Portal • The objective is to make accessing HPC resources (wherever they may be located) easy for scientists who are not computer savvy. • Most computation involves doing various mathematical operations on a dataset. • A GUI approach is employed, in which the user, after a single login that checks authentication and authorization, can create a ‘workflow’ of functions/operations graphically by connecting boxes dragged from a series of lists of options, then applying that series of steps to a dataset. • Such a ‘workflow’ can be saved for subsequent application to another dataset.
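The idea of a saved, reusable ‘workflow’ can be sketched as an ordered list of functions applied to a dataset. This is only an illustration of the concept and makes no claim about how GridNexus represents workflows internally; the `Workflow` class and the three step functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Workflow:
    """An ordered chain of operations, analogous to boxes linked in the GUI."""
    name: str
    steps: List[Callable] = field(default_factory=list)

    def then(self, step):
        """Append one step (one 'box') and return self so calls can be chained."""
        self.steps.append(step)
        return self

    def run(self, dataset):
        """Feed the dataset through every step in the order they were linked."""
        for step in self.steps:
            dataset = step(dataset)
        return dataset


# Hypothetical steps standing in for real transformations and calculations.
def drop_missing(values):
    return [v for v in values if v is not None]


def scale(values):
    return [v * 2.0 for v in values]


def mean(values):
    return sum(values) / len(values)


clean_and_average = (
    Workflow("clean-and-average").then(drop_missing).then(scale).then(mean)
)

# The same saved workflow is applied to two different datasets.
first_result = clean_and_average.run([1.0, None, 3.0])
second_result = clean_and_average.run([10.0, 20.0, None])
print(first_result, second_result)
```

Because the workflow object holds only the steps and not the data, saving it once and re-running it on a new dataset requires no further setup.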

  34. Development of a Grid Portal • Job submission: Ideally in a grid, the grid middleware should select the ‘best’ resource – those computers that are available, capable, and have the software needed to handle the job. • The user need not select – nor know – where the computation is taking place. In fact, the job may even be passed from one computer to another for various aspects of the calculation. • The output is returned to the user’s workstation or account, rather than the user having to access and download the output file from a remote computer.
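The selection rule described here (available, capable, and equipped with the needed software) can be sketched as a simple filter followed by a ranking. This is a hedged illustration rather than the logic of any real Grid middleware: the `Node` and `Job` types are hypothetical, the node names loosely follow the UNCW resources listed on slide 30, and "best" is arbitrarily taken to mean "most free CPUs".

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Node:
    name: str
    free_cpus: int
    software: FrozenSet[str]
    online: bool = True


@dataclass(frozen=True)
class Job:
    program: str
    cpus: int


def select_node(job: Job, nodes: List[Node]) -> Optional[Node]:
    """Return the 'best' node for the job, or None if no node qualifies."""
    candidates = [
        n for n in nodes
        if n.online                      # available
        and n.free_cpus >= job.cpus      # capable
        and job.program in n.software    # has the software needed
    ]
    # "Best" is taken to mean most free CPUs; a real scheduler would weigh
    # queue length, network distance, licences, and more (assumption).
    return max(candidates, key=lambda n: n.free_cpus, default=None)


# Illustrative inventory; CPU counts and availability are made up.
nodes = [
    Node("beowulf", free_cpus=16, software=frozenset({"gaussian03"})),
    Node("quantum-cube", free_cpus=8, software=frozenset({"gaussian03", "pqs"})),
    Node("blade", free_cpus=8, software=frozenset({"gaussian03"}), online=False),
]

chosen = select_node(Job(program="pqs", cpus=4), nodes)
print(chosen.name if chosen else "no suitable node")
```

The user supplies only the job description; which machine ends up running it is decided by the function, matching the slide's point that the user need not select or know where the computation takes place.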

  35. UNCW’s Grid Portal: GridNexus • 3 main application types: genomics/bioinformatics, business, and chemistry • Chemistry resources on UNCW Grid: • PQS Quantum Cube: 8-CPU cluster with PQS and Gaussian 03 computational chemistry software and TCP-Linda • Beowulf Cluster: 16-CPU cluster with Gaussian 03 computational chemistry software and TCP-Linda • Soon to be added: IBM blade server with 8 or 16 CPUs; Gaussian 03 will be installed on it. • Java script for file transformation, e.g., to convert a HyperChem file into a Gaussian 03 input file

  36. Quantum Chemistry Portal • A GUI is under development to allow a user to select the following from pull-down menus within ‘boxes’ that are linked into a ‘workflow’: • Data input file • Transform to another file type if necessary • Level of calculation: HF, DFT, MP2, etc. • Basis set: 6-31G(d,p), 6-311++G(2d,p), etc. • Number of processors needed • CPU time requested • Keywords: opt, nmr, freq, pop=npa, etc. • Charge and multiplicity

  37. Design of UNCW Grid GUI • Select from pull-down menus in categories: Basis Set; Data sets (Windows Explorer-like file browser); Level of Theory; CPU Time; # Processors; Chg. & Multiplicity; Keywords; File Type Transformer; Visualize; Submit

  38. Design of UNCW Grid GUI • Select from pull-down menus in categories: Basis Set; Data sets (Windows Explorer-like file browser); Level of Theory (HF, MP2, DFT); CPU Time; # Processors; Chg. & Multiplicity; Keywords; File Type Transformer; Visualize; Submit

  39. Design of UNCW Grid GUI • Functions can be grouped into sets called “workflows” for repetitive operations: Basis Set; Data sets (Windows Explorer-like file browser); Level of Theory; CPU Time; # Processors; Chg. & Multiplicity; Keywords; File Type Transformer; Visualize; Submit

  40. Design of UNCW Grid GUI • Preferences among choices can be saved as part of the workflow: Basis Set: 6-31G(d); Data sets (Windows Explorer-like file browser); Level of Theory: HF; CPU Time: 4000; # Processors: 4; Chg. & Multiplicity: 0,1; Keywords: NMR; File Type Transformer; Visualize; Submit
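To show what the middleware might generate from such saved preferences, the sketch below turns the values on this slide (6-31G(d), HF, 4 processors, charge 0 and multiplicity 1, NMR) into the text of a Gaussian-style input file. It is an illustration under several assumptions: the mapping of the slide's values to fields is inferred from the order of the menu categories, the `%NProcShared` directive is assumed to be the appropriate way to request processors, the function and dictionary keys are hypothetical, and the water geometry is a placeholder. The CPU time (4000, units not given on the slide) is left out because it would belong in the separate .job file.

```python
def gaussian_input(prefs, title, atoms):
    """Build Gaussian-style input text from saved workflow preferences."""
    route = f"# {prefs['level']}/{prefs['basis']} {' '.join(prefs['keywords'])}"
    lines = [
        f"%NProcShared={prefs['processors']}",  # processor request (assumed directive)
        route.rstrip(),                         # route section: method/basis keywords
        "",                                     # blank line ends the route section
        title,                                  # free-text title line
        "",                                     # blank line ends the title section
        f"{prefs['charge']} {prefs['multiplicity']}",
    ]
    lines += [f"{el} {x:.6f} {y:.6f} {z:.6f}" for el, x, y, z in atoms]
    lines.append("")                            # terminating blank line
    return "\n".join(lines) + "\n"


# Preferences as read off the slide; the key names are hypothetical.
saved_prefs = {
    "basis": "6-31G(d)",
    "level": "HF",
    "processors": 4,
    "charge": 0,
    "multiplicity": 1,
    "keywords": ["NMR"],
}

# Placeholder geometry (approximate water molecule, in angstroms).
water = [
    ("O", 0.0, 0.0, 0.117),
    ("H", 0.0, 0.757, -0.469),
    ("H", 0.0, -0.757, -0.469),
]

text = gaussian_input(saved_prefs, "Water NMR shielding", water)
print(text)
```

Keeping the preferences in a plain dictionary means the same saved set can be reused with any other geometry by changing only the `atoms` argument.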

  41. Design of UNCW Grid GUI • The result is a much simpler process for the user: select data, transform it, calculate, visualize

  42. Design of UNCW Grid GUI • Multiple repeatedly used sets of commands (‘workflows’) can be saved • A user’s preferences within a workflow (e.g., level of theory, basis set, # processors, CPU time requested, keywords, charge and multiplicity) could also be saved (future design feature). • In the future a user may need only to specify a data set (file) and link it to a pre-set ‘workflow’ to initiate a calculation!

  43. Chemistry Portal • Initially, the portal will operate under Linux • Next it will be ported to operate under Windows • Eventually, computations will be submitted online through web browsers • This could be accomplished from any device (e.g., PC, laptop, or even a cell phone) that can access the Internet.

  44. JXPL Language • UNCW Mathematics faculty member Dr. Jeff Brown, with help from Computer Science faculty member Dr. Clayton Ferner and recent graduate Mike Wood, developed a new Java-based programming language called JXPL. • JXPL is the language used in the GridNexus project and is designed for use with web services and grid services • The advantages of JXPL include: • It is readily extensible • It interfaces easily with (LISP-like) data structures in the GUI • JXPL scripts are written in XML, a widely used markup language
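To make the link between LISP-like data structures and XML concrete, the sketch below serializes a nested list as an XML tree. It does not reproduce JXPL syntax: the element names `list` and `atom` and the sample expression are invented for this illustration, and the real JXPL schema may look quite different.

```python
import xml.etree.ElementTree as ET


def to_xml(expr):
    """Convert a nested Python list (a LISP-like expression) to an XML element.

    The tag names are hypothetical and are NOT taken from JXPL.
    """
    if isinstance(expr, list):
        node = ET.Element("list")
        for item in expr:
            node.append(to_xml(item))
        return node
    leaf = ET.Element("atom")
    leaf.text = str(expr)
    return leaf


# LISP-like expression: (add 1 (multiply 2 3))
expression = ["add", 1, ["multiply", 2, 3]]
xml_text = ET.tostring(to_xml(expression), encoding="unicode")
print(xml_text)
```

The nesting of the XML elements mirrors the nesting of the list one to one, which is the property that lets a graphical editor and an XML-based script describe the same structure.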

  45. What’s Next? • More “filters” to transform data need to be developed and tested • Fancier graphics may be added to the GUIs • More computational nodes will be added to the Grid. The eventual goal is to include all NC institutions of higher learning. • Extend Grid to include more software applications • Extend Grid services to other disciplines • Include industry and businesses as users and developers.

  46. References: • http://people.uncw.edu/vetterr/grid/proposal/UNC-OP_Grid_Project%20Overview.htm • http://www.ox.compsoc.net/~swhite/history/ • http://www.grid.org • http://www.gridcomputingplanet.com/ • http://www.globus.org/research/papers/anatomy.pdf • http://www.ibm.com/grid • http://www.globus.org • http://www.usatlas.bnl.gov/computing/grid/

  47. Acknowledgments • UNC-OP for funding the UNCW Grid Initiative Proposal: “Fostering Undergraduate Research Partnerships through a Graphical User Environment for the North Carolina Computing Grid,” Dr. Ron Vetter, PI • Co-PIs: Dr. Rebecca S. Boston, NCSU; Dr. Anthony Wilkinson, WCU; Dr. Marilyn McClelland, NCCU; Dr. Libero Bartolotti, ECU; Ms. Judy Porter, CFCC. • UNCW Participants: Computer Science: Dr. Ron Vetter, Dr. Clayton Ferner, Dr. David Berman, and Dr. Tom Hudson. Information Technology Systems: Dr. Bob Tyndall and Mr. Bobby Miller. Mathematics and Statistics: Dr. Jeff Brown. Chemistry and Biochemistry: Dr. Ned H. Martin. Biological Sciences: Dr. Ann Stapleton. Information Systems and Operations Management: Dr. Tom Janicki. • UNCW Computer Science students working on the Chemistry portal: Tristan Carland, Jerry Martin, Andrew Martin