
Presentation Transcript


  1. high-end computing technology: where is it heading? greg astfalk woon yung chung woon-yung_chung@hp.com

  2. prologue: this is not a talk about hewlett-packard's product offering(s); the context is hpc (high performance computing), somewhat biased to scientific computing, though it also applies to commercial computing

  3. backdrop: end-users of hpc systems have needs and "wants" from hpc systems; the computer industry delivers the hpc systems; there exists a gap between the two wrt programming, processors, architectures, and interconnects/storage; in this talk we (weakly) quantify the gaps in these 4 areas

  4. end-users' programming "wants": end-users of hpc machines would ideally like to think and code sequentially, and have a compiler and run-time system that produces portable and (nearly) optimal parallel code regardless of processor count and regardless of architecture type; yes, i am being a bit facetious, but the idea remains true

  5. parallelism methodologies: there exist 5 methodologies to achieve parallelism: automatic parallelization via compilers; explicit threading (pthreads); message-passing (mpi); pragma/directive (openmp); explicitly parallel languages (upc, et al.) (see the sketch below)
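as an illustration of the pragma/directive approach (not part of the original slides), here is a minimal c/openmp sketch; the loop, array size, and compiler flag are illustrative assumptions.

    /* minimal sketch: a daxpy-style loop parallelized with an openmp
       pragma.  compile with an openmp-capable compiler, e.g. cc -fopenmp;
       without the flag the pragma is simply ignored and the code still runs. */
    #include <stdio.h>
    #define N 1000000

    int main(void)
    {
        static double x[N], y[N];
        double a = 2.0;

        for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        /* the directive asks the compiler/run-time to split the
           iterations across the threads of one smp node */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            y[i] = a * x[i] + y[i];

        printf("y[0] = %g\n", y[0]);
        return 0;
    }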

  6. parallel programming: parallel programming is a cerebral effort; if lots of neurons plus mpi constitutes "prime-time", then parallel programming has arrived; no major technologies are on the horizon to change this status quo

  7. discontinuities: the ease of parallel programming has not progressed at the same rate that parallel systems have become available; performance gains require compiler optimization or pbo (profile-based optimization); most parallelism requires hand-coding; in the real world many users don't use any compiler optimizations

  8. parallel efficiency: mindful that the bounds on parallel efficiency are, in general, far apart: 50% efficiency on 32 processors is good, 10% efficiency on O(100) processors is excellent, >2% efficiency on O(1000) processors is heroic; a little communication can "knee over" the efficiency vs. processor count curve (a toy model of this appears below)
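to make the "knee over" concrete (not part of the original slides), a toy amdahl-style model in c with an added communication term; the serial fraction f and per-step communication cost c are illustrative assumptions.

    /* minimal sketch: efficiency vs. processor count for a toy model
       run time t(p) = f + (1-f)/p + c*log2(p).  link with -lm. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const double f = 0.01;   /* assumed serial fraction of the work        */
        const double c = 0.002;  /* assumed communication cost per doubling    */

        for (int p = 1; p <= 1024; p *= 2) {
            double t = f + (1.0 - f) / p + c * log2((double)p);
            double speedup    = 1.0 / t;
            double efficiency = speedup / p;
            printf("p = %4d  speedup = %7.2f  efficiency = %5.1f%%\n",
                   p, speedup, 100.0 * efficiency);
        }
        return 0;
    }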

  9. apps with sufficient parallelism: few existing applications can utilize O(1000), or even O(100), processors with any reasonable degree of efficiency; to date this has generally required heroic effort; new algorithms (i.e., new data and control decompositions), or nearly complete rewrites, are necessary; such large-scale parallelism will have "arrived" when msc/nastran and oracle exist on such systems and utilize the processors

  10. latency tolerant algorithms: latency tolerance will be an increasingly important theme for the future; hardware will not solve this problem (more on this point later); developing algorithms that have significant latency tolerance will be necessary; this means thinking "outside the box" about the algorithms, since simple modifications to existing algorithms generally won't suffice (a sketch of one such technique follows below)
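one common latency-tolerance technique (not from the slides, added as illustration) is overlapping communication with computation via non-blocking mpi; the buffer sizes and ring-of-neighbors layout below are illustrative assumptions.

    /* minimal sketch: post non-blocking halo exchanges, compute on the
       interior while the messages are in flight, then finish at the boundary. */
    #include <mpi.h>
    #include <stdio.h>
    #define N 100000

    static double interior[N], halo_send[2], halo_recv[2];

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Request req[4];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int left  = (rank - 1 + size) % size;
        int right = (rank + 1) % size;

        /* start the halo exchange, but do not wait for it */
        MPI_Irecv(&halo_recv[0], 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(&halo_recv[1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(&halo_send[0], 1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(&halo_send[1], 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

        /* useful work that hides the interconnect latency */
        for (int i = 0; i < N; i++)
            interior[i] = interior[i] * 0.5 + 1.0;

        /* only now pay for whatever latency is left */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

        if (rank == 0) printf("halo exchange overlapped with %d interior updates\n", N);
        MPI_Finalize();
        return 0;
    }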

  11. operating systems: development environments will move to nt; heavy-lifting will remain with unix; four unixes to survive (alphabetically): hp-ux, linux, aix 5l, solaris; linux will be important at the lower-end but will not significantly encroach on the high-end

  12. end-users' proc/arch "wants": all things being equal, high-end users would likely want a classic cray vector supercomputer: no caches, multiple pipes to memory, single word access, hardware support for gather/scatter, etc.; it is true, however, that for some applications contemporary risc processors perform better

  13. processors: the "processor of choice" is now, and will be for some time to come, the risc processor; risc processors have caches; caches are good; caches are bad; if your code fits in cache, you aren't supercomputing!

  14. risc processor performance: a rule of thumb is that a risc processor, any risc processor, gets on average, on a sustained basis, 10% of its peak performance; the σ on this is large; achieved performance varies with architecture, application, algorithm, coding, dataset size, and anything else you can think of (a measurement sketch follows below)
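a minimal way to check this rule of thumb yourself (not from the slides): time a simple triad loop and compare the sustained flop rate against a peak figure; the clock rate and flops-per-cycle used for "peak" below are illustrative assumptions.

    /* minimal sketch: sustained vs. assumed peak for a triad loop */
    #include <stdio.h>
    #include <time.h>
    #define N 1000000

    static double a[N], b[N], c[N];

    int main(void)
    {
        const double peak_flops = 550e6 * 4;  /* assumed: 550 mhz x 4 flops/cycle */
        const int reps = 100;

        for (int i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        clock_t t0 = clock();
        for (int r = 0; r < reps; r++)
            for (int i = 0; i < N; i++)
                a[i] = b[i] + 3.0 * c[i];     /* 2 flops per iteration */
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        double sustained = 2.0 * (double)N * reps / secs;
        printf("a[0] = %g, sustained: %.0f mflop/s (%.1f%% of assumed peak)\n",
               a[0], sustained / 1e6, 100.0 * sustained / peak_flops);
        return 0;
    }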

  15. semiconductor processes: semiconductor processes change every 2-3 years; assuming that "technology scaling" applies to subsequent generations, then per generation: frequency increase of ~40%, transistor density increase of ~100%, energy per transition decrease of ~60% (compounded in the small calculation below)
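to show what those per-generation factors compound to (not from the slides), a small c calculation over a few 2-3 year process generations:

    /* minimal sketch: compound the quoted per-generation scaling factors */
    #include <stdio.h>

    int main(void)
    {
        double freq = 1.0, density = 1.0, energy = 1.0;

        printf("gen  frequency  density  energy/transition\n");
        for (int g = 0; g <= 5; g++) {
            printf("%3d  %8.2fx %7.2fx  %16.2fx\n", g, freq, density, energy);
            freq    *= 1.4;   /* ~40% frequency increase per generation  */
            density *= 2.0;   /* ~100% transistor density increase       */
            energy  *= 0.4;   /* ~60% energy per transition decrease     */
        }
        return 0;
    }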

  16. semiconductor processes

  17. what to do with gates: it is not a simple question of what the best use of the gates is: larger caches, multiple cores, specialized functional units, etc.; the impact of soft errors with decreasing design rule size will be an important topic: what happens if an alpha particle flips a bit in a register?

  18. processor futures: you can expect, for the short term, moore's-law-like gains in processors' peak performance; doubling of "performance" every 18-24 months does not necessarily apply to application performance; moore's law will not last forever: 4-5 more turns (maybe?)

  19. [chart: customer spending ($m), 1995-2003, by processor type (cisc, risc, ia32, ia64); source: idc, february 2000] technology disruptions: risc crossed over cisc in 1996; itanium will cross over risc in 2004

  20. present high-end architectures: today's high-end architecture is either an smp, ccnuma, a cluster of smp nodes, a cluster of ccnuma nodes, or a japanese vector system; all of these architectures work; efficiency varies with application type

  21. architectural issues: of the choices available the smp is preferred, however smp processor count is limited and the cost of scalability is prohibitive; ccnuma addresses these limitations but induces its own: disparate latencies, better but still limited scalability, ras limitations; clusters too have pros and cons: huge latencies, low cost, etc.

  22. physics: limitations imposed by physics have led us to architectures that have a deep memory hierarchy; the algorithmist and programmer must deal with, and exploit, the hierarchy to achieve good performance; this is part of the cerebral effort of parallel programming we mentioned earlier

  23. memory hierarchy: typical latencies for today's technology (a measurement sketch follows below)
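one way to see those latency levels on a real machine (not from the slides) is a pointer-chasing loop over growing working sets; the sizes and iteration count below are illustrative assumptions.

    /* minimal sketch: average load latency vs. working-set size; the cache
       and memory levels of the hierarchy show up as latency steps. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        const long iters = 10 * 1000 * 1000;

        for (size_t kb = 4; kb <= 64 * 1024; kb *= 4) {
            size_t n = kb * 1024 / sizeof(size_t);
            size_t *ring = malloc(n * sizeof(size_t));
            if (!ring) break;

            /* sattolo shuffle: one big cycle, defeats simple prefetching */
            for (size_t i = 0; i < n; i++) ring[i] = i;
            for (size_t i = n - 1; i > 0; i--) {
                size_t j = (size_t)rand() % i;
                size_t t = ring[i]; ring[i] = ring[j]; ring[j] = t;
            }

            size_t p = 0;
            clock_t t0 = clock();
            for (long i = 0; i < iters; i++) p = ring[p];   /* dependent loads */
            double ns = 1e9 * (double)(clock() - t0) / CLOCKS_PER_SEC / iters;

            printf("%6zu kB working set: ~%5.1f ns per load (p=%zu)\n", kb, ns, p);
            free(ring);
        }
        return 0;
    }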

  24. balanced system ratios: an "ideal" high-end system should be balanced wrt its performance metrics; for each peak flop/second: 0.5–1 byte of physical memory, 10–100 bytes of disk capacity, 4–16 bytes/sec of cache bandwidth, 1–3 bytes/sec of memory bandwidth, 0.1–1 bit/sec of interconnect bandwidth, 0.02–0.2 byte/sec of disk bandwidth (applied in the sketch below)
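a small helper (not from the slides) that applies these ratios to a chosen peak rate; the 100 gflop/s figure is an illustrative assumption, and only the low end of each range is printed.

    /* minimal sketch: what the quoted ratios imply for an assumed peak */
    #include <stdio.h>

    int main(void)
    {
        const double peak = 100e9;   /* assumed peak, flop/s */

        printf("for a %.0f gflop/s peak, a balanced system wants at least:\n", peak / 1e9);
        printf("  physical memory:        %8.1f gbyte\n",   0.5  * peak / 1e9);
        printf("  disk capacity:          %8.1f gbyte\n",   10.0 * peak / 1e9);
        printf("  cache bandwidth:        %8.1f gbyte/s\n", 4.0  * peak / 1e9);
        printf("  memory bandwidth:       %8.1f gbyte/s\n", 1.0  * peak / 1e9);
        printf("  interconnect bandwidth: %8.1f gbit/s\n",  0.1  * peak / 1e9);
        printf("  disk bandwidth:         %8.1f gbyte/s\n", 0.02 * peak / 1e9);
        return 0;
    }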

  25. balanced system: applying the balanced system ratios to an unnamed contemporary 16-processor smp

  26. storage: data volumes are growing at an extremely rapid pace; disk capacity sold doubled from 1997 to 1998; storage is an increasingly large percent of the total server sale; disk technology is advancing too slowly: per generation of 1-1.5 years, access time decreases 10%, spindle bandwidth increases 30%, capacity increases 50% (compounded in the sketch below)
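compounding those per-generation disk trends (not from the slides) shows why this is "too slowly": capacity outruns bandwidth, so the relative time to scan a full disk keeps growing.

    /* minimal sketch: compound the quoted per-generation disk trends */
    #include <stdio.h>

    int main(void)
    {
        double access = 1.0, bandwidth = 1.0, capacity = 1.0;

        printf("gen  access-time  spindle-bw  capacity  time-to-scan (rel.)\n");
        for (int g = 0; g <= 5; g++) {
            printf("%3d  %10.2fx %10.2fx %8.2fx  %10.2fx\n",
                   g, access, bandwidth, capacity, capacity / bandwidth);
            access    *= 0.9;   /* access time decreases ~10% per generation */
            bandwidth *= 1.3;   /* spindle bandwidth increases ~30%          */
            capacity  *= 1.5;   /* capacity increases ~50%                   */
        }
        return 0;
    }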

  27. networks: only the standards will be widely deployed: gigabit ethernet, gigabyte ethernet, fibre channel (2x and 10x later), sio, atm, dwdm backbones; the "last mile" problem remains with us; inter-system interconnect for clustering will not keep pace with the demands (for latency and bandwidth)

  28. vendor's constraints: rule #1: be profitable to return value to the shareholders; you don't control the market size; you can only spend ~10% of your revenue on r&d; don't fab your own silicon (hopefully); you must be more than just a "technical computing" company; to not do this is to fail to meet rule #1 (see above)

  29. market sizes: according to the industry analysts, the technical market is, depending on where you draw the cut-line, $4-5 billion annually; the bulk of the market is small-ish systems (data from forest baskett at sgi)

  30. a perspective: commercial computing is not an enemy; without the commercial market's revenue our ability to build hpc-like systems would be limited; the commercial market benefits from the technology innovation in the hpc market; is there performance "left on the table" in designing a system to serve both the commercial and technical markets? yes

  31. why? lack of a cold war; performance of hpc systems has been marginalized; in the mid-70s, how many applications ran faster on a vax 11/780 than the cray-1? none; how many applications today run faster on a pentium than the cray t90? some; current demand for hpc systems is elastic

  32. future prognostication: computing in the future will be all about data and moving data; the growth in data volumes is incredible; richer media types (i.e., video) mean more data; distributed collaborations imply moving data; e-whatever requires large, rapid data movement; more flops → more data

  33. data movement: the scope of data movement encompasses: register to functional unit, cache to register, cache to cache, memory to cache, disk to memory, tape to disk, system to system, pda to client to server, continent to continent; all of these are going to be important

  34. epilogue: for hpc in the future it is going to be: risc processors; smp and ccnuma architectures, with smp processor count relatively constant; technology trends are reasonably predictable; mpi, pthreads and openmp for parallelism; latency management will be crucial; it will be all about data

  35. epilogue (cont'd): for the computer industry in the future: trending toward "e-everything" (e-commerce, apps-on-tap, brokered services, remote data, virtual data centers, visualization); nt for development; vectors are dying; for hpc vendors in the future, there will be fewer

  36. conclusion: hpc users will need to yield more to what the industry can provide, rather than vice-versa; vendor's rule #1 is a cruel master
