The CC – GRID? Era. CCGSC 2002. Gordon Bell (gbell@microsoft.com), Bay Area Research Center, Microsoft Corporation. Observations from a mostly Grid workshop. Clusters. Let’s finish the job! Grids generally. Grids as arbitrary cluster platforms…why?
The CC – GRID? Era
CCGSC 2002
Gordon Bell (gbell@microsoft.com)
Bay Area Research Center, Microsoft Corporation
Observations from a mostly Grid workshop • Clusters. Let’s finish the job! • Grids generally. • Grids as arbitrary cluster platforms…why? • Examples of Grid-types, especially web services • Summary…
Blades aka a “cluster in a cabinet” • 366 servers per 44U cabinet • Single processor • 2 - 30 GB/computer (24 TBytes) • 2 - 100 Mbps Ethernets • ~10x perf*, power, disk, I/O per cabinet • ~3x price/perf • Network services… Linux based *42, 2 processors, 84 Ethernet, 3 TBytes
Clusters aren’t as bad as programs make them out to be, but we need to make them work better and be more transparent.
• Everything is becoming a cluster. Certainly all of 500!
• 64 bit addressing will cause more change!
• Future nodes should bet on CLMP smP’s (p = 4-32). Utilize existing and emerging smP nodes versus assuming lcd PM-pairs & MPI.
• Massive gains from compiler and runtime. ES has set a new standard of efficiency and system transparency for “clusters”.
• Expand the MPI programming model:
• Full transparency of MPI needs to be the goal
• Objectify for greater flexibility and greater insulation from latency
Grids: If they are the solution, what’s the problem?
• Economics… thief, scavenger, power, efficiency or resource sharing?
• Research funding… that’s where the money is
• Are they where the problems lie?
• Does the massive collaboration that Grids enable create massive overhead and generally less output? Unless the output is for a community!
• Is funding and middleware a good investment?
Same observations as 2000
• GRID was/is an exciting concept …
• They can/must work within a community, organization, or project. Apps need to drive.
• “Necessity is the mother of invention.”
• Taxonomy… interesting vs necessity
• Cycle scavenging and object evaluation (e.g. seti@home, QCD)
• File distribution/sharing for IP theft e.g. Napster
• Databases &/or programs for a community (astronomy, bioinformatics, CERN, NCAR)
• Workbenches: web workflow chem, bio…
• Exchanges… many sites operating together
• Single, large objectified pipeline… e.g. NASA.
• Grid as a cluster platform! Transparent & arbitrary access including load balancing
Web SVCs
Grid nj. An arbitrary distributed cluster platform
A geographical and multi-organizational collection of diverse computers dynamically configured as cluster platforms responding to arbitrary, ill-defined jobs “thrown” at it.
• Costs are not necessarily favorable, e.g. disks are less expensive than the cost to transfer data.
• Latency and bandwidth are non-deterministic, thereby changing cluster characteristics
• Once a large body of data exists for a job, it is inherently bound to (set into) fixed resources.
• Large datasets & I/O bound programs need to be with their data or be database accesses…
• But are there resources there to share?
• Bound to cost more?
Bright spots… near term, user focus, a lesson for Grid suppliers
• Tony Hey apps-based funding. Web services based Grid & data orientation.
• David Abramson - Nimrod.
• Parameter scans… other low hanging fruit
• Encapsulate apps! “Excel”-- language/control mgmt.
• “Legacy apps are programs that users just want, and there’s no time or resources to modify code …independent of age, author, or language e.g. Java.”
• Andrew Grimshaw - Avaki
• Making Legion vision real. A reality check.
• Lip 4 pairs of “web services” based apps
• Gray et al. Skyservice and Terraservice
• Goal: providing a web service must be as easy as publishing a web page…and will occur!!!
SkyServer: delivering a web service to the astronomy community. Prototype for other sciences?
Gray, Szalay, et al.
First paper on the SkyServer:
http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.pdf
http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.doc
Later, more detailed paper for the database community:
http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.pdf
http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.doc
What can be learned from Sky Server? • It’s about data, not about harvesting flops • 1-2 hr. query programs versus 1 wk programs based on grep • 10 minute runs versus 3 day compute & searches • Database viewpoint. 100x speed-ups • Avoid costly re-computation and searches • Use indices and PARALLEL I/O. Read / Write >>1. • Parallelism is automatic, transparent, and just depends on the number of computers/disks. • Limited experience and talent to use dbases.
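The "use indices" point can be illustrated with a small sketch. This is not the SkyServer schema or its database engine: it uses Python's built-in sqlite3 module, and the table name `objects`, its columns, the index name `idx_objects_mag`, and the synthetic rows are all hypothetical. It only shows how adding an index changes the engine's plan for the same query from a full scan to an index search.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


# Hypothetical catalogue table with synthetic rows (not real survey data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, ra REAL, dec REAL, mag REAL)"
)
conn.executemany(
    "INSERT INTO objects (ra, dec, mag) VALUES (?, ?, ?)",
    [((i * 37) % 360, (i * 17) % 180 - 90, 10 + (i % 1500) / 100)
     for i in range(20000)],
)

query = "SELECT COUNT(*) FROM objects WHERE mag BETWEEN ? AND ?"
bounds = (14.0, 14.5)

# Without an index the engine has to read every row (the "grep" approach).
plan_before = query_plan(conn, query, bounds)
count_before = conn.execute(query, bounds).fetchone()[0]

# With an index on the filtered column, only the matching range is visited.
conn.execute("CREATE INDEX idx_objects_mag ON objects (mag)")
plan_after = query_plan(conn, query, bounds)
count_after = conn.execute(query, bounds).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
```

The answer is the same in both cases; only the amount of data touched changes, which is the gap the slide describes between grep-style programs and database queries.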
Heuristics for building communities that need to share data & programs
• Always go from working to working
• Do it by induction in time and space (Why version 3 is pretty good.)
• Put ONE database in place that’s useful by itself in terms of UI, content, & queries
• Invent and demo 10-20 instances of use
• Get two working in a single location
• Extend to include a second community, with an appropriate superset capability
Some science is hitting a wall: FTP and GREP are not adequate (Jim Gray)
You can GREP 1 GB in a minute
You can GREP 1 TB in 2 days
You can GREP 1 PB in 3 years.
1PB ~10,000 >> 1,000 disks
At some point you need indices to limit search, parallel data search and analysis
Goal using dbases. Make it easy to
Publish: Record structured data
Find data anywhere in the network
Get the subset you need!
Explore datasets interactively
Database becomes the file system!!!
You can FTP 1 MB in 1 sec.
You can FTP 1 GB / min.
… 2 days and 1K$
… 3 years and 1M$
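As a rough check on the GREP figures above, the sketch below works out the sequential throughput each figure implies and how the scan time shrinks when the data is spread over many disks read in parallel. The decimal unit definitions, the 10 MB/s per-disk rate, and the function names are assumptions for illustration, not numbers from the talk.

```python
# Decimal units; an assumption, the slide does not say which convention it uses.
GB, TB, PB = 10**9, 10**12, 10**15
MINUTE, DAY = 60, 86_400
YEAR = 365 * DAY


def implied_rate_mb_per_s(nbytes, seconds):
    """Sequential throughput (MB/s) implied by scanning nbytes in seconds."""
    return nbytes / seconds / 1e6


def scan_days(nbytes, rate_mb_per_s, disks=1):
    """Days to scan nbytes when `disks` disks are each read at rate_mb_per_s."""
    return nbytes / (rate_mb_per_s * 1e6 * disks) / DAY


# The three figures quoted on the slide.
slide_figures = {
    "1 GB in a minute": (GB, MINUTE),
    "1 TB in 2 days": (TB, 2 * DAY),
    "1 PB in 3 years": (PB, 3 * YEAR),
}

for label, (nbytes, seconds) in slide_figures.items():
    print(f"{label}: about {implied_rate_mb_per_s(nbytes, seconds):.1f} MB/s")

# Hypothetical per-disk rate of 10 MB/s, scanned with 1, 1,000 and 10,000 disks.
for disks in (1, 1_000, 10_000):
    print(f"1 PB on {disks} disk(s): {scan_days(PB, 10, disks):.2f} days")
```

Under these assumptions a single 10 MB/s stream needs roughly three years for a petabyte, in line with the slide, while a thousand disks read in parallel bring the same scan down to a little over a day. That is the case for parallel I/O; indices then avoid most of the scan altogether.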
Network concerns
• Very high cost
• $(1 + 1) / GByte to send on the net; Fedex and 160 GByte shipments are cheaper
• DSL at home is $0.15 - $0.30
• Disks cost less than $2/GByte to purchase
• Low availability of fast links (last mile problem)
• Labs & universities have DS3 links at most, and they are very expensive
• Traffic: Instant messaging, music stealing
• Performance at desktop is poor
• 1 - 10 Mbps; very poor communication links
• Manage: trade-in fast links for cheap links!!
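The cost and bandwidth figures on this slide can be combined into a back-of-the-envelope calculation. The sketch below assumes decimal gigabytes, a fully utilized link with no protocol overhead, and the standard DS3 line rate of 44.736 Mbps; the function names are hypothetical, and the $2/GByte defaults are the slide's own round numbers.

```python
DS3_MBPS = 44.736  # standard DS3 line rate in Mbit/s


def transfer_days(gbytes, link_mbps):
    """Days to move gbytes over a link, assuming full utilization, no overhead."""
    bits = gbytes * 8e9
    return bits / (link_mbps * 1e6) / 86_400


def network_cost(gbytes, dollars_per_gbyte=2.0):
    """Cost to send data at the slide's $(1 + 1) per GByte."""
    return gbytes * dollars_per_gbyte


def disk_cost(gbytes, dollars_per_gbyte=2.0):
    """Upper bound on buying the disks, at the slide's < $2 per GByte."""
    return gbytes * dollars_per_gbyte


shipment_gb = 160  # the shipment size mentioned on the slide
for name, mbps in (("1 Mbps", 1), ("10 Mbps", 10), ("DS3", DS3_MBPS)):
    print(f"{shipment_gb} GB over {name}: {transfer_days(shipment_gb, mbps):.2f} days")

print(f"send over the net: ${network_cost(shipment_gb):.0f}")
print(f"buy the disks:     under ${disk_cost(shipment_gb):.0f}")
```

At these rates, sending 160 GByte costs about as much as buying the disks that hold it, and over a 10 Mbps desktop link it also takes about a day and a half, which is why shipping disks compares well.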
Gray’s $2.4 K, 1 TByte Sneakernet aka Disk Brick
[Tables: Cost to move a Terabyte; Cost, time, and speed to move a Terabyte; Cost of a “Sneaker-Net” TB]
• We now ship NTFS/SQL disks.
• Not good format for Linux.
• Ship NFS/CIFS/ODBC servers (not disks).
• Plug “disk” into LAN.
• DHCP then file or DB serve…
• Web Service in long term
Courtesy of Jim Gray, Microsoft Bay Area Research
Cost, time of Sneaker-net vs Alts Courtesy of Jim Gray, Microsoft Bay Area Research
Grids: Real and “personal”
Two carrots, one downside. A bet.
• Bell will match any Gordon Bell Prize (parallelism, performance, or performance/cost) winner’s prize that is based on “Grid Platform Technology”.
• I will bet any individual or set of individuals of the Grid Research community up to $5,000 that a Grid application will not win the above by SC2005.
The End
How can GRIDs become a real, useful, computer structure?
Get a life. Adopt an application community!
Success if CCGSC2004 is the last…by making Grids ubiquitous.