

  1. CGO 2006: The Fourth International Symposium on Code Generation and Optimization, New York, March 26-29, 2006. Conference Review. Presented by: Ivan Matosevic

  2. Outline • Conference overview • Brief summaries of sessions • Keynote speeches • Best paper

  3. Conference Overview • Primary focus: back-end compilation techniques • Static analysis and optimization • Profiling • Run-time techniques • 8 sessions, 29 papers • Dominating topics: multicores, dynamic compilation

  4. Overview of Sessions • Dynamic Optimization • Object-Oriented Code Generation and Optimization • Phase Detection and Profiling • Tiled and Multicore Compilation • Static Code Generation and Optimization Issues • SIMD Compilation • Optimization Space Exploration • Security and Reliability

  5. Session 1: Dynamic Optimization • Kim Hazelwood (University of Virginia), Robert Cohn (Intel), A Cross-Architectural Interface for Code Cache Manipulation • Pin dynamic instrumentation system with code cache • The paper describes an API for various operations with the code cache (callbacks, lookups, statistics, etc.) • Derek Bruening, Vladimir Kiriansky, Tim Garnett, Sanjeev Banerji (Determina Corporation), Thread-Shared Software Code Caches • Problem: sharing a code cache across multiple threads • Authors propose a fine-grained locking scheme • Evaluation using DynamoRIO

  6. Session 1: Dynamic Optimization • Keith Cooper, Anshuman Dasgupta (Rice Univ.), Tailoring Graph-coloring Register Allocation For Runtime Compilation • Problem: register allocation in JIT compilers • Authors propose a novel lightweight graph-colouring technique • Weifeng Zhang, Brad Calder, Dean Tullsen (UC San Diego), A Self Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework • Extension of the Trident event-driven dynamic optimization framework (previously proposed by the same authors) • Dynamic insertion of prefetching instructions based on run-time analysis
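
  As a refresher on the baseline that Cooper and Dasgupta's JIT allocator adapts, the sketch below runs the classic simplify-and-select graph-colouring loop on a toy interference graph. The graph, the register count K, and the spill handling are all invented for illustration; this is not the paper's lightweight variant.

      #include <cstdio>
      #include <set>
      #include <vector>

      // Classic Chaitin/Briggs-style simplify & select, heavily simplified.
      // Nodes are virtual registers; an edge means "live at the same time".
      int main() {
          const int K = 3;                       // number of physical registers (assumed)
          std::vector<std::set<int>> adj = {     // toy interference graph
              {1, 2}, {0, 2, 3}, {0, 1, 3}, {1, 2}};
          const int n = adj.size();

          // Simplify: repeatedly remove a node with degree < K and push it on a stack.
          std::vector<int> stack;
          std::vector<bool> removed(n, false);
          std::vector<std::set<int>> work = adj;
          for (int iter = 0; iter < n; ++iter) {
              int pick = -1;
              for (int v = 0; v < n; ++v)
                  if (!removed[v] && (int)work[v].size() < K) { pick = v; break; }
              if (pick < 0) { std::puts("would spill"); return 0; }  // a real allocator spills here
              removed[pick] = true;
              for (int u : work[pick]) work[u].erase(pick);
              stack.push_back(pick);
          }

          // Select: pop nodes and give each the lowest colour not used by its neighbours.
          std::vector<int> colour(n, -1);
          while (!stack.empty()) {
              int v = stack.back(); stack.pop_back();
              std::set<int> used;
              for (int u : adj[v]) if (colour[u] >= 0) used.insert(colour[u]);
              for (int c = 0; c < K; ++c)
                  if (!used.count(c)) { colour[v] = c; break; }
              std::printf("vreg %d -> r%d\n", v, colour[v]);
          }
      }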

  7. Session 2: Object-Oriented Code Generation and Optimization • Suresh Srinivas, Yun Wang, Miaobo Chen, Qi Zhang, Eric Lin, Valery Ushakov, Yoav Zach, Shalom Goldenberg (Intel Corporation), Java JNI Bridge: An MRTE Framework for Mixed Native ISA Execution • Use a dynamic translator to execute native (JNI) calls compiled for one ISA on a Java platform running on a different ISA • Kris Venstermans, Lieven Eeckhout, Koen De Bosschere (Ghent University), Space-Efficient 64-bit Java Objects through Selective Typed Virtual Addressing • Use address bits on a 64-bit architecture to encode object type in order to save memory • Objects of the same type allocated in a contiguous (virtual) region
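
  To make the typed-addressing idea concrete, here is a minimal sketch (not the paper's actual layout): if every type's objects live in their own aligned virtual region, the type id can be recomputed from the pointer bits instead of being stored in each object header. The region size, base address, and helper names below are assumptions of the sketch.

      #include <cstdint>
      #include <cstdio>

      // Sketch: objects of the same type are allocated in a dedicated, aligned
      // virtual region, so the type id can be recomputed from the address itself
      // instead of being stored in every object header.
      constexpr uint64_t kRegionBits = 32;                 // assumed region size: 4 GiB per type
      constexpr uint64_t kRegionBase = 0x100000000ull;     // assumed base of the typed heap

      inline uint64_t type_id_of(const void* obj) {
          // High-order bits above the region size select the type's region.
          return ((uint64_t)(uintptr_t)obj - kRegionBase) >> kRegionBits;
      }

      int main() {
          // Pretend these addresses came from a type-segregated allocator.
          void* a_string  = (void*)(kRegionBase + 0x10);                          // region 0
          void* a_hashmap = (void*)(kRegionBase + (1ull << kRegionBits) + 0x40);  // region 1
          std::printf("type of a_string : %llu\n", (unsigned long long)type_id_of(a_string));
          std::printf("type of a_hashmap: %llu\n", (unsigned long long)type_id_of(a_hashmap));
      }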

  8. Session 2: Object-Oriented Code Generation and Optimization • Daryl Maier, Pramod Ramarao, Mark Stoodley, Vijay Sundaresan (IBM Canada), Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler • The IBM Testarossa JIT compiler • This paper focuses on code patching and profiling in a multi-threaded environment with frequent class loading/unloading • Lixin Su, Mikko H. Lipasti (University of Wisconsin Madison), Dynamic Class Hierarchy Mutation • Run-time reassignment of objects from one derived class to another, changing their virtual tables • Offers opportunities for optimizations based on specialization
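
  Roughly, the mutation works by rebinding an object's dispatch table at run time so later virtual calls hit a specialised implementation. Portable C++ cannot swap real vtables, so the sketch below models the idea with an explicit table pointer; all names are hypothetical.

      #include <cstdio>

      // Explicit "vtable": one function pointer per virtual method.
      struct VTable { void (*draw)(void* self); };

      struct Shape {
          const VTable* vt;   // the pointer the runtime would mutate
          int sides;
      };

      void draw_generic(void* self)     { std::printf("generic shape, %d sides\n", ((Shape*)self)->sides); }
      void draw_specialised(void* self) { std::printf("fast path: known to be a square\n"); (void)self; }

      const VTable kGenericVT     = { draw_generic };
      const VTable kSpecialisedVT = { draw_specialised };

      int main() {
          Shape s{ &kGenericVT, 4 };
          s.vt->draw(&s);            // dispatches to the generic implementation

          // "Class hierarchy mutation": the runtime observes that s always behaves
          // like a square and rebinds its dispatch table to a specialised version.
          s.vt = &kSpecialisedVT;
          s.vt->draw(&s);            // now dispatches to the specialised implementation
      }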

  9. Session 3: Phase Detection and Profiling • Priya Nagpurkar (UCSB), Michael Hind (IBM), Chandra Krintz (UCSB), Peter Sweeney, V.T. Rajan (IBM), Online Phase Detection Algorithms • Detecting phase behaviour in virtual machines • Track dynamic program parameters (methods invoked, branch directions…) over time and apply a similarity model • Jeremy Lau, Erez Perelman, Brad Calder (UC San Diego), Selecting Software Phase Markers with Code Structure Analysis • Portions of code whose execution correlates with phase changes • Procedure calls and returns, loop boundaries • Profile-based hierarchical loop-call graph
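
  Both approaches reduce to comparing a summary of the current profiling interval against the recent past. A toy version of such an online similarity test, with an invented feature vector and threshold, looks like this:

      #include <array>
      #include <cmath>
      #include <cstdio>

      // Toy online phase detector: each profiling interval is summarised as a small
      // feature vector (e.g. relative execution counts of a few code regions); a new
      // phase is flagged when the vector differs too much from the previous interval.
      constexpr int kFeatures = 4;
      using Profile = std::array<double, kFeatures>;

      double distance(const Profile& a, const Profile& b) {
          double d = 0;
          for (int i = 0; i < kFeatures; ++i) d += std::fabs(a[i] - b[i]);  // Manhattan distance
          return d;
      }

      int main() {
          const double kThreshold = 0.5;       // assumed similarity threshold
          const Profile intervals[] = {        // made-up per-interval profiles
              {0.70, 0.20, 0.10, 0.00},
              {0.68, 0.22, 0.10, 0.00},        // similar -> same phase
              {0.10, 0.10, 0.10, 0.70},        // very different -> phase change
              {0.12, 0.08, 0.10, 0.70},
          };
          Profile prev = intervals[0];
          for (int i = 1; i < 4; ++i) {
              bool change = distance(intervals[i], prev) > kThreshold;
              std::printf("interval %d: %s\n", i, change ? "PHASE CHANGE" : "same phase");
              prev = intervals[i];
          }
      }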

  10. Session 3: Phase Detection and Profiling • Shashidhar Mysore, Banit Agrawal, Timothy Sherwood, Nisheeth Shrivastava, Subhash Suri (UC Santa Barbara), Profiling over Adaptive Ranges • Voted best paper – details later • Hyesoon Kim, Muhammad Aater Suleman, Onur Mutlu, Yale N. Patt (UT-Austin), 2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set • Predicts whether the prediction accuracy of each branch will vary across input sets • Heuristic approach used to derive representative profiling results from a single input set

  11. Session 4: Tiled and Multicore Compilation • David Wentzlaff, Anant Agarwal (MIT), Constructing Virtual Architectures on a Tiled Processor • Map components of a superscalar architecture (Pentium III) onto a parallel tiled architecture (Raw) using dynamic translation • In a way, uses Raw as a coarse-grain FPGA • Aaron Smith (UT-Austin), J. Burrill (UMass at Amherst), J. Gibson, B. Maher, N. Nethercote, B. Yoder, D. Burger, K. S. McKinley (UT-Austin), Compiling for EDGE Architectures • TRIPS EDGE (Explicit Data Graph Execution) architecture • This paper focuses on compilation of standard C and FORTRAN benchmarks

  12. Session 4: Tiled and Multicore Compilation • Shih-wei Liao, Zhaohui Du, Gansha Wu, Guei-Yuan Lueh (Intel), Data and Computation Transformations for Brook Streaming Applications on Multiprocessors • Parallel compiler for the Brook streaming language • An extension of C that enables specifying data parallelism • Michael L. Chu, Scott A. Mahlke (University of Michigan), Compiler-directed Object Partitioning for Multicluster Processors • Partitioning of data in clustered architectures such as Raw • I didn't quite follow what programming model the authors have in mind

  13. Session 5: Static Code Generation and Optimization Issues • Two papers about the HP-UX Itanium compiler: • Dhruva R. Chakrabarti, Shin-Ming Liu (Hewlett-Packard), Inline Analysis: Beyond Selection Heuristics • Cross-module techniques for selection of inlined call sites and the choice of specialized function versions • Robert Hundt, Dhruva R. Chakrabarti, Sandya S. Mannarswamy (Hewlett-Packard), Practical Structure Layout Optimization and Advice • Data layout and placement on the heap to improve locality • Structure splitting, structure peeling, dead field removal, and field reordering
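
  Of the layout transformations listed, structure splitting is the easiest to picture: hot and cold fields are separated so cache lines carry mostly hot data. A hand-written before/after sketch follows (the HP compiler derives this automatically from profile data; the struct and field names here are made up).

      #include <vector>

      // Before: every traversal of the hot fields drags the cold field into cache.
      struct OrderBefore {
          int    id;            // hot: read on every lookup
          double price;         // hot
          char   notes[112];    // cold: rarely touched, but occupies the same cache lines
      };

      // After "structure splitting": cold fields move to a separate allocation reached
      // through a pointer, so arrays of OrderAfter stay dense and cache-friendly.
      struct OrderCold  { char notes[112]; };
      struct OrderAfter {
          int        id;        // hot
          double     price;     // hot
          OrderCold* cold;      // indirection to the rarely used part
      };

      double total(const std::vector<OrderAfter>& orders) {
          double sum = 0;
          for (const auto& o : orders) sum += o.price;   // touches only hot data
          return sum;
      }

      int main() { std::vector<OrderAfter> v; return (int)total(v); }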

  14. Session 5: Static Code Generation and Optimization Issues • Chris Lupo, Kent Wilken (University of California, Davis), Post Register Allocation Spill Code Optimization • Authors propose a profile-based algorithm for placement of save/restore instructions handling spilled variables in function calls • Implemented as a part of GCC • Seung Woo Son, Guangyu Chen, Mahmut Kandemir (Pennsylvania State University), A Compiler-Guided Approach for Reducing Disk Power Consumption by Exploiting Disk Access Locality • Goal: restructure code so that disk idle periods are lengthened • The approach targets array-based programs: disk layout of array data exposed to the compiler

  15. Session 6: SIMD Compilation • Jianhui Li, Qi Zhang, Shu Xu, Bo Huang (Intel China Software Center), Optimizing Dynamic Binary Translation for SIMD Instructions • Algorithms for dynamic binary translation of SIMD instructions in general-purpose architectures (such as MMX in x86) • Evaluation using IA-32 binaries on Itanium 2 • Dorit Nuzman (IBM), Richard Henderson (Red Hat), Multi-Platform Auto-Vectorization • Implementation of an automatic vectorizer for GCC 4.0
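
  For reference, the kind of loop an auto-vectorizer such as GCC's (enabled with -ftree-vectorize) handles well is the simple, unit-stride, dependence-free array loop below; the surrounding driver is just for illustration.

      #include <cstdio>

      // A loop in the sweet spot of automatic vectorization: unit-stride accesses,
      // a compile-time-countable trip count, and no loop-carried dependences.
      // Build with e.g.:  g++ -O2 -ftree-vectorize vec.cpp
      enum { N = 1024 };
      float a[N], b[N], c[N];

      void saxpy_like(float k) {
          for (int i = 0; i < N; ++i)
              c[i] = a[i] + k * b[i];   // becomes one vector multiply-add per group of lanes
      }

      int main() {
          for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 1.0f; }
          saxpy_like(2.0f);
          std::printf("c[10] = %f\n", c[10]);   // 10 + 2*1 = 12
      }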

  16. Session 7: Optimization-space Exploration • Felix Agakov, Edwin Bonilla, John Cavazos, Bjoern Franke, Grigori Fursin, Michael O'Boyle, Marc Toussaint, John Thomson, Chris Williams (U. of Edinburgh), Using Machine Learning to Focus Iterative Optimization • Predictive modelling used to search the optimization space • Targets embedded platforms – AMD Au1500 and Texas Instruments TI C6713 • Prasad Kulkarni, David Whalley, Gary Tyson (Florida State University), Jack Davidson (University of Virginia), Exhaustive Optimization Phase Order Space Exploration • Exhaustive search of the phase order space (15 phases) using aggressive pruning; takes time on the order of minutes to hours • Targets StrongARM SA-100

  17. Session 7: Optimization-space Exploration • Zhelong Pan, Rudolf Eigenmann (Purdue University), Fast and Effective Orchestration of Compiler Optimizations for Automatic Performance Tuning • Problem: find the best-performing combination of 38 GCC O3 options, targeting Pentium IV and SPARC II • The proposed heuristic algorithm provides a quality solution in time on the order of several hours
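
  The three Session 7 papers share a common skeleton: pick a flag configuration, build and measure, and let the result steer the next pick. Below is a stripped-down version of that loop with a stubbed-out cost model and a naive greedy drop-one-flag rule; it is not the exact algorithm of any of the papers, and the stub's notion of which flag "hurts" is purely made up.

      #include <cstdio>
      #include <string>
      #include <vector>

      // Skeleton of iterative optimization-flag search. measure() is a stub; a real
      // driver would invoke the compiler with the flags and time the benchmark.
      double measure(const std::vector<bool>& enabled) {
          // Stand-in cost model: pretend the third flag hurts this workload
          // and the others help a little.
          double t = 10.0;
          for (size_t i = 0; i < enabled.size(); ++i)
              if (enabled[i]) t += (i == 2) ? 0.8 : -0.3;
          return t;   // lower is better (e.g. run time in seconds)
      }

      int main() {
          std::vector<std::string> flags = { "-funroll-loops", "-finline-functions",
                                             "-fgcse", "-ftree-vectorize" };
          std::vector<bool> enabled(flags.size(), true);   // start with everything on
          double best = measure(enabled);

          // Greedy pass: try switching each flag off and keep the change if it helps.
          for (size_t i = 0; i < flags.size(); ++i) {
              enabled[i] = false;
              double t = measure(enabled);
              if (t < best) { best = t; std::printf("dropping %s helps\n", flags[i].c_str()); }
              else          { enabled[i] = true; }
          }
          std::printf("best estimated time: %.1f s\n", best);
      }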

  18. Session 8: Security and Reliability • Edson Borin, (UNICAMP), Cheng Wang, Youfeng Wu (Intel), Guido Araujo (UNICAMP), Software-Based Transparent and Comprehensive Control-Flow Error Detection • Addresses the problem of soft (transient) errors that cause branches to incorrect instructions • Implemented in SW as a part of a dynamic binary translator • Tao Zhang, Xiaotong Zhuang, Santosh Pande (Georgia Tech), Compiler Optimizations to Reduce Security Overheads • Optimizations that specifically target techniques that implement software protection with minimal HW support
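
  Borin et al. have the binary translator insert the checks; the source-level sketch below shows the underlying signature idea in the style of classic control-flow checking schemes. The block signatures and layout are invented, and this is not the paper's exact encoding.

      #include <cstdio>
      #include <cstdlib>

      // Signature-based control-flow checking: each block has a static signature,
      // a runtime variable carries the current signature, and it is updated with
      // the XOR difference between predecessor and successor at every transition.
      // A fault that redirects a branch breaks the XOR chain and fails the check.
      enum : unsigned { SIG_ENTRY = 0x11, SIG_COMPUTE = 0x22, SIG_EXIT = 0x33 };
      static unsigned g_sig = SIG_ENTRY;   // set when the entry block starts

      static void check(unsigned expected) {
          if (g_sig != expected) {
              std::fprintf(stderr, "control-flow error: sig %#x, expected %#x\n", g_sig, expected);
              std::abort();
          }
      }

      int main() {
          // --- entry block ---
          check(SIG_ENTRY);
          int x = 21;

          g_sig ^= (SIG_ENTRY ^ SIG_COMPUTE);   // transition entry -> compute

          // --- compute block ---
          check(SIG_COMPUTE);
          x *= 2;

          g_sig ^= (SIG_COMPUTE ^ SIG_EXIT);    // transition compute -> exit

          // --- exit block ---
          check(SIG_EXIT);
          std::printf("x = %d\n", x);
      }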

  19. Session 8: Security and Reliability • Susanta Nanda, Wei Li, Tzi-cker Chiueh (State University of NY at Stony Brook), BIRD: Binary Interpretation using Runtime Disassembly • Goal: framework for automatic detection of vulnerabilities such as buffer overflows when the source code is not available • Static and dynamic disassembly and instrumentation – targets Windows x86 applications

  20. Keynote Speeches • Wei Li, Principal Engineer, Intel: "Parallel Programming 2.0" • Kevin Stoodley, Fellow and CTO of Compilation Technology, IBM: "Productivity and Performance: Future Directions in Compilers"

  21. Wei Li: Parallel Programming 2.0 • Major technological change: • Moore’s Law continues to increase transistor counts • However: power, memory latency, limits to ILP are setting an effective performance ceiling • General trend towards thread-level on-chip parallelism • SMT • Chip multiprocessors

  22. Wei Li: Parallel Programming 2.0 • “Parallel Programming 2.0” refers to the advent of multicores • A very optimistic future vision:

  23. Wei Li: Parallel Programming 2.0 • Key issue – where will the parallelism come from? • Parallel programming needs to become more mainstream • Consumer vs. HPC/server/database • Inclusion into education at a more elementary level • New tools for greater ease of programming • Intel’s parallel programming tools • http://www.intel.com/software

  24. K. Stoodley: "Productivity and Performance: Future Directions in Compilers" • Limits to traditional static compilation • Overview of IBM compiler technology • Testarossa JIT compiler, Toronto Portable Optimizer, TOBEY backend • Challenges at present and in the near future • Software abstraction complexity – forces the scope of compilation to higher levels • Maintaining backwards compatibility while preserving high performance is increasingly difficult

  25. K. Stoodley: "Productivity and Performance: Future Directions in Compilers" • (Slide diagram of the IBM compilation stack, with components labelled: xlc/xlC/xlf front ends, Java class/jar files, W-Code, J9 Execution Engine (Java + others), CPO, Toronto Portable Optimizer (TPO), TOBEY backend, Testarossa JIT, binary translation, profile-directed feedback (PDF), static and dynamic machine code) • Future: convergence/combination of dynamic and static compilation technologies

  26. Best Paper • Shashidhar Mysore, Banit Agrawal, Timothy Sherwood, Nisheeth Shrivastava, Subhash Suri (UC Santa Barbara): Profiling over Adaptive Ranges

  27. Profiling over Adaptive Ranges • Problem: how to count specific events efficiently and accurately? • Code segments executed • Memory regions accessed • IP addresses of routed packets • In all cases, impossible to maintain separate counters for the entire range of values • Each basic block, memory address, IP address…

  28. Trade-off: Precision vs. Efficiency • Profiling with uniform ranges fails to distinguish hot code • (Slide chart contrasting uniform ranges with unlimited per-value counters)

  29. Higher Precision for Hot Regions • Good trade-off with limited resources: • High precision for hot regions • Low precision for colder ones, but this affects the accuracy less • Challenge: how to determine what exactly to count with what precision?

  30-32. Solution: Adaptive Profiling • Start with one counter; split counters as they become hot (illustrated over three animation slides)

  33-34. Counter Merging • Problem: what if program behaviour changes after the initialization phase? (repeated over two animation slides)

  35. Counter Merging • Solution: perform counter merging along with splitting

  36-37. Counter Merging • Counters of merged child nodes are added to the parent (illustrated over two animation slides)

  38. Counter Merging • Problem: how to identify nodes for merging? • By definition, they are the ones that are updated infrequently • Solution: periodic batched merge operations • Tree depth grows at a logarithmic rate, so merging can be done at exponentially increasing intervals
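
  A small software model of the whole scheme (the paper's hardware design and its split/merge heuristics are more involved): a binary tree over the value range whose leaves split once their counter crosses a hot threshold, with cold sibling leaves periodically folded back into their parent. The thresholds and range below are invented for the sketch.

      #include <cstdint>
      #include <cstdio>
      #include <memory>

      // Toy model of range-adaptive counting: leaves cover value ranges; a leaf that
      // becomes "hot" splits so its sub-ranges are counted separately, and cold
      // children are merged back into their parent (their counts are added up).
      struct Node {
          uint64_t lo, hi;          // inclusive value range covered by this node
          uint64_t count = 0;
          std::unique_ptr<Node> left, right;
          Node(uint64_t l, uint64_t h) : lo(l), hi(h) {}
          bool leaf() const { return !left; }
      };

      constexpr uint64_t kSplitThreshold = 8;   // assumed "hot" threshold
      constexpr uint64_t kMergeThreshold = 2;   // assumed "cold" threshold

      void record(Node& n, uint64_t v) {
          if (n.leaf()) {
              if (++n.count >= kSplitThreshold && n.lo < n.hi) {
                  // Split a hot leaf into two half-range children.
                  uint64_t mid = n.lo + (n.hi - n.lo) / 2;
                  n.left  = std::make_unique<Node>(n.lo, mid);
                  n.right = std::make_unique<Node>(mid + 1, n.hi);
              }
              return;
          }
          record(v <= n.left->hi ? *n.left : *n.right, v);
      }

      // Periodic batched merge: fold cold leaf pairs back into their parent.
      void merge_cold(Node& n) {
          if (n.leaf()) return;
          merge_cold(*n.left);
          merge_cold(*n.right);
          if (n.left->leaf() && n.right->leaf() &&
              n.left->count < kMergeThreshold && n.right->count < kMergeThreshold) {
              n.count += n.left->count + n.right->count;   // children's counts roll up
              n.left.reset();
              n.right.reset();
          }
      }

      void dump(const Node& n, int depth) {
          std::printf("%*s[%llu, %llu] = %llu\n", depth * 2, "",
                      (unsigned long long)n.lo, (unsigned long long)n.hi,
                      (unsigned long long)n.count);
          if (!n.leaf()) { dump(*n.left, depth + 1); dump(*n.right, depth + 1); }
      }

      int main() {
          Node root(0, 255);                               // e.g. a 256-entry address range
          for (int i = 0; i < 40; ++i) record(root, 17);   // hot value: forces repeated splits
          for (int i = 0; i < 3;  ++i) record(root, 200);  // cold value: stays coarse
          merge_cold(root);
          dump(root, 0);                                   // tree is fine-grained only around 17
      }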

  39. Additional Contributions • Heuristics for splitting and merging • Theoretical analysis of accuracy guarantees • Proposal for hardware implementation • Experimental evaluation • Memory requirements • Average and worst-case errors on benchmarks • Performance of HW implementation • Accuracies on the order of 98.0-99.8% with only 8-64K of memory

  40. Conclusions • Highly interesting program • My short presentation certainly doesn’t do justice to most of the mentioned works! • Readings to perhaps consider for future CARG: • D. Wentzlaff, A. Agarwal, Constructing Virtual Architectures on a Tiled Processor • A. Smith et al., Compiling for EDGE Architectures • F. Agakov et al., Using Machine Learning to Focus Iterative Optimization • (Highly subjective!)
