Explore the concurrent applications expected to dominate the future mass computing market: physiological and molecular dynamics simulation, video and audio manipulation, medical imaging, consumer games, and virtual reality. Using MPEG encoding as a driving example of application parallelism, examine advances in compiler technology, including pointer analysis and the role of compilers in model-language environments. Draw lessons from the VLIW/EPIC experience, and see how next-generation compilers will accelerate heavyweight loops, localize memory, and stream data, with memory dataflow analysis driving efficiency and performance in next-generation applications.
Future mass apps reflect a concurrent world • Exciting applications in the future mass computing market represent and model the physical world. • Traditionally considered “supercomputing apps” or super-apps. • Physiological simulation, molecular dynamics simulation, video and audio manipulation, medical imaging, consumer games and virtual reality products • Attempts to grow current architectures “out” or domain-specific architectures “in” have had limited success; a broader approach that covers more domains is promising
MPEG Encoding Parallelism • Independent IPPP sequences • Frames: independent 16x16 pel macroblocks • Localized dependence of P-frame macroblocks on the previous frame • Steps of macroblock processing exhibit finer-grained parallelism; each block’s processing spans function boundaries
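The macroblock-level parallelism above can be sketched as follows. This is an illustrative toy, not a real encoder: the frame is a list of rows, `encode_block` stands in for the per-macroblock work (motion estimation, DCT, quantization), and a thread pool exploits the blocks' independence.

```python
# Sketch of macroblock-level parallelism in MPEG encoding (illustrative,
# not a real encoder): each 16x16 macroblock of a frame is independent,
# so a pool of workers can process them concurrently.
from concurrent.futures import ThreadPoolExecutor

MB = 16  # macroblock edge length in pels

def split_macroblocks(frame):
    """Partition a frame (list of rows) into independent 16x16 blocks."""
    h, w = len(frame), len(frame[0])
    return [[row[x:x + MB] for row in frame[y:y + MB]]
            for y in range(0, h, MB) for x in range(0, w, MB)]

def encode_block(block):
    """Stand-in for per-macroblock work (DCT, quantization, ...)."""
    return sum(sum(row) for row in block)

frame = [[1] * 64 for _ in range(32)]      # a toy 64x32 "frame"
blocks = split_macroblocks(frame)          # (32/16) * (64/16) = 8 blocks
with ThreadPoolExecutor() as pool:
    results = list(pool.map(encode_block, blocks))
print(len(blocks), results[0])             # 8 blocks, 256 pels each
```

The P-frame dependence mentioned above limits this across frames, but within a frame the blocks remain independent work units.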
Building on HPF Compilation: what’s new? • Applicability to mass software base - requires pointer analysis, control flow analysis, data structure and object analysis, beyond traditional dependence analysis • Domain-specific, application model languages • More intuitive than C for inherently parallel problems • increased productivity, increased portability • Will still likely have C as implementation language • There is room for a new app language or a family of languages • Role for the compiler in model language environments • Model can provide structured semantics for the compiler, beyond what can be derived from analysis of low-level code • Compiler can magnify the usefulness of model information with its low-level analysis
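Why the mass software base requires pointer analysis beyond traditional dependence analysis can be seen in a tiny aliasing example. This is a hypothetical sketch in Python (the hazard is the same as with C pointers): unless analysis proves `src` and `dst` are distinct objects, the loop's iterations cannot safely be reordered or parallelized.

```python
# Illustrative aliasing hazard: without pointer analysis, a compiler must
# assume `src` and `dst` may refer to the same memory, so iterations of
# this loop cannot safely be reordered or run in parallel.
def shift_add(src, dst):
    for i in range(1, len(src)):
        dst[i] = src[i - 1] + 1   # reads may observe writes from earlier iterations

a = [0, 0, 0, 0]
shift_add(a, a)               # aliased: each iteration feeds the next
print(a)                      # [0, 1, 2, 3]

b = [0, 0, 0, 0]
shift_add([0, 0, 0, 0], b)    # not aliased: iterations are independent
print(b)                      # [0, 1, 1, 1]
```

The two calls produce different results purely because of aliasing, which is exactly the information a model language could hand the compiler directly instead of forcing it to be rediscovered from low-level code.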
Pointer analysis: sensitivity, stability and safety • Fulcra in OpenIMPACT [SAS2004, PASTE2004] and others • Improved efficiency increases the scope over which unique, heap-allocated objects can be discovered • Improved analysis algorithms provide more accurate call graphs, instead of a blurred view, for use by program transformation tools
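The idea of recovering an accurate call graph from pointer behavior can be sketched with a toy Andersen-style points-to propagation. This is illustrative only, not Fulcra's actual algorithm: address-taken facts and copy assignments are propagated to a fixed point, after which an indirect call site resolves to a small, precise set of callees.

```python
# Toy Andersen-style points-to sketch (illustrative, not OpenIMPACT's
# Fulcra): propagate points-to sets through copy assignments until a
# fixed point, then resolve an indirect call to its possible callees.
addr_of = {"p": {"f"}, "q": {"g"}}     # p = &f;  q = &g;
copies = [("r", "p"), ("r", "q")]      # r = p;   r = q;

pts = {v: set(s) for v, s in addr_of.items()}
changed = True
while changed:                          # fixed-point propagation
    changed = False
    for dst, src in copies:
        before = len(pts.setdefault(dst, set()))
        pts[dst] |= pts.get(src, set())
        if len(pts[dst]) != before:
            changed = True

print(sorted(pts["r"]))                 # an indirect call (*r)() may reach f or g
```

An imprecise analysis would instead report that `r` may point to every function whose address is taken, which is the "blurred view" of the call graph that transformation tools cannot act on.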
Thoughts from the VLIW/EPIC Experience • Any significant compiler work for a new computing platform takes 10-15 years to mature • 1989-1998 initial academic results from IMPACT • 1995-2005 technology collaboration with Intel/HP • 2000-2005 SPEC 2000, Itanium 1 and 2, open source apps • This was built on significant work from Multiflow, Cydrome, RISC, HPC teams • Real work in compiler development begins when hardware arrives • IMPACT output code performance improved by more than 20% since arrival of Itanium hardware – and became much more stable • Most apps were brought up with IMPACT only after Itanium systems arrived: debugging! • Real performance effects can only be measured on hardware • Early access to hardware for academic compiler teams is crucial and must be a priority for industry development teams • Quantitative methodology driven by large apps is key • Innovations evaluated in whole-system context
Heavyweight loops: how the next-generation compiler will do it (1) • To-do list: • Identify acceleration opportunities • Localize memory • Stream data and overlap computation • Acceleration opportunities: • Heavyweight loops identified for acceleration • However, they are isolated in separate functions called through pointers
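The obstacle named above, a heavyweight loop hidden behind an indirect call, can be shown in a few lines. This is a hypothetical sketch: `heavy_filter` plays the heavyweight loop, and `run` is the only call site, reached through a function pointer, so the compiler cannot even see the loop as an acceleration candidate until pointer analysis resolves the callee.

```python
# Sketch (illustrative): the heavyweight loop is reachable only through a
# function pointer, so pointer analysis must resolve the indirect call
# before the loop can be identified as an acceleration candidate.
def heavy_filter(data):
    return [x * 3 + 1 for x in data]    # the "heavyweight loop"

stage = heavy_filter                    # function pointer to the stage

def run(data, fn):
    return fn(data)                     # indirect call site

print(run([1, 2], stage))               # [4, 7]
```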
How the next-generation compiler will do it (2) • To-do list: • Identify acceleration opportunities • Localize memory • Stream data and overlap computation • Localize memory: • Pointer analysis identifies indirect callees • Pointer analysis identifies localizable memory objects • Initialization code and large constant lookup tables identified • Private tables inside the accelerator are initialized once, saving traffic
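Table privatization can be sketched as follows. This is an illustrative analogy, not the compiler's mechanism: once analysis proves the lookup table is constant after initialization, a private copy can live in accelerator-local storage, initialized once, so every later lookup avoids external memory traffic.

```python
# Sketch of constant-table privatization (illustrative): the table is
# copied into accelerator-local storage once, at initialization, so
# later lookups never touch the original memory object.
def make_accelerator(table):
    local = tuple(table)            # privatized copy, initialized once
    def accelerate(samples):
        return [local[s] for s in samples]   # all lookups hit the local copy
    return accelerate

gamma = [x * x for x in range(256)]      # stand-in large constant lookup table
accel = make_accelerator(gamma)
print(accel([0, 3, 15]))                 # [0, 9, 225]
```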
How the next-generation compiler will do it (3) • To-do list: • Identify acceleration opportunities • Localize memory • Stream data and overlap computation • Streaming and computation overlap: • Constant table privatized • Memory dataflow summarizes input and output access patterns of arrays/pointers • Opportunities for streaming are automatically identified • Unnecessary memory operations replaced with streaming
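What "summarizing an access pattern" means can be sketched with a toy affine summary. This is illustrative, not the actual memory-dataflow formulation: an index trace that turns out to be unit-stride is summarized as (base, stride), which licenses replacing individual loads with a stream.

```python
# Sketch of access-pattern summarization (illustrative): if memory
# dataflow proves an array is read at a constant stride, the individual
# loads can be replaced by a stream feeding the computation.
def summarize_accesses(indices):
    """Summarize an index trace as (base, stride) if it is affine."""
    strides = {b - a for a, b in zip(indices, indices[1:])}
    return (indices[0], strides.pop()) if len(strides) == 1 else None

trace = [0, 1, 2, 3, 4]
print(summarize_accesses(trace))     # (0, 1): a unit-stride stream

def stream(array, base, stride):
    for i in range(base, len(array), stride):
        yield array[i]               # data streamed instead of random loads

print(sum(stream([2, 4, 6, 8], 0, 1)))   # 20
```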
How the next-generation compiler will do it (4) • To-do list: • Identify acceleration opportunities • Localize memory • Stream data and overlap computation • Achieve macropipelining of parallelizable accelerators • Upsampling and color conversion can stream to each other • Optimizations can have a substantial effect on both efficiency and performance
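Macropipelining of the two stages named above can be sketched with generators. Only the stage names come from the slide; the bodies are stand-ins: upsampling streams each element directly into color conversion, so the stages run interleaved and no intermediate array is materialized.

```python
# Sketch of macropipelining (stage names from the slide; bodies are
# stand-ins): upsampling streams its output directly into color
# conversion instead of materializing an intermediate buffer.
def upsample(samples):
    for s in samples:
        yield s          # nearest-neighbor 2x upsampling
        yield s

def color_convert(stream):
    for s in stream:
        yield s * 2      # stand-in for the real conversion arithmetic

pixels = [1, 2, 3]
out = list(color_convert(upsample(pixels)))   # stages execute interleaved
print(out)               # [2, 2, 4, 4, 6, 6]
```

Because `color_convert` pulls one element at a time, each upsampled value is consumed as soon as it is produced, which is the computation overlap the slide describes.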
Memory dataflow in the pointer world • Arrays are not true 3D arrays (unlike in Fortran) • Actual implementation: array of constant pointers to arrays of samples • Row arrays never overlap • New type of dataflow problem: understanding the semantics of memory structures instead of true arrays
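The layout difference can be sketched in two lines of each style. This is an illustrative analogy: a Fortran-style array is one contiguous object addressed by an index formula, while the implementation the slide describes is an array of row pointers, so the analysis must first prove the rows are distinct, non-overlapping objects before treating the whole as a true multidimensional array.

```python
# Sketch of the two layouts (illustrative): a "true" array is one
# contiguous object with an index formula; the pointer-based layout is
# an array of row pointers whose non-overlap must be proven.
rows, cols = 2, 3

flat = [0] * (rows * cols)                     # true 2D array: one object
def at(r, c):
    return flat[r * cols + c]                  # dependence analysis sees one array

ptr_rows = [[0] * cols for _ in range(rows)]   # array of pointers to rows
# Memory dataflow must prove the row arrays never overlap before it can
# treat ptr_rows like a true 2D array.
print(ptr_rows[0] is ptr_rows[1])              # False: distinct row objects
```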