
Dynamic Binary Optimization – Part 1

2006. 9.25 Nam, E Hyun



  1. Dynamic Binary Optimization – Part 1 2006. 9.25 Nam, E Hyun

  2. Contents • Overview • Dynamic program Behavior • Profiling • Optimizing Translation blocks

  3. Overview : Optimization • Optimization • Shift of VM design focus from compatibility to performance • Goal • To close the gap between a guest's emulated performance and native platform performance • Type • Translation block chaining • Enlarging the translation block • Reordering translated instructions • Conventional compiler optimization techniques

  4. Overview : Profile • Profile • Statistics regarding a program's behavior • A guide for making optimization decisions • A common optimization strategy is to use profiling to determine the paths that are predominantly followed by control flow • Type of profile information • Instructions (or basic blocks, BBs) that are more heavily executed • Sequences in which BBs are most commonly executed • Behavior of particular data variables and addresses

  5. Overview : Profile • Advantage of profile information • Providing information that may not have been available when a program was originally compiled

  6. Overview : BB rearrangement • Definition • Laying out BBs so that the predominant path has its instructions in consecutive memory locations • Advantages • Good locality • Efficient instruction fetching • Type • Trace • Superblock • Tree group

  7. Overview : Staged emulation • Relation between emulation and optimization • Tightly integrated with emulation • Optimization is part of an emulation framework that supports staged emulation • Staged emulation • Based on tradeoff between start-up time and steady state performance • Interpretation → Binary translation → Dynamic binary optimization

  8. Overview : Staged emulation • Stages of staged emulation • Interpretation • BB translation( e.g. chaining ) • Optimized translation( e.g. superblock ) • Highly optimized translation

  9. Overview : Spectrum of emulation

  10. Overview : Staged emulation strategy • Strategy decision factors • Source and target ISA • Type of VM being implemented • Design objective • Tradeoff between the performance gained from optimization and the overhead of optimization and profiling • Example • Original HP Dynamo system, Digital FX!32 • Interpretation → optimized, translated code • DynamoRIO • Simple binary translation → optimization • Shade • Interpretation → simple binary translation

  11. Contents • Overview • Dynamic program Behavior • Profiling • Optimizing Translation blocks

  12. Dynamic program behavior • Goal • Optimization depends on a program's structure and dynamic behavior • By profiling, the optimization system can learn about the program's structure and dynamic behavior • Important characteristics of program • High predictability of dynamic control flow • A branch's direction correlates with its direction on the most recent previous execution

  13. Dynamic program behavior • Important characteristics of program • Backward branch • Is typically taken • Predictability of indirect jump • Switch statement • Return from procedure call • Predictability of data value

  14. Contents • Overview • Dynamic program Behavior • Profiling • Overview • Role • Type • Collecting the profile data • Profile during interpretation • Profiling translated code • Overhead • Optimizing Translation blocks

  15. Profiling : Role • Definition • The process of collecting instruction and data statistics for an executing program • Usage • Input to code-optimization process • Principle of profiling • Predictability of program • Past behavior will often hold for future behavior

  16. Profiling : Role • Traditional profiling & optimization procedure • Decomposing the source program into control flow graph • Analyzing the graph and inserting probes to collect profile information • Program running with a typical data input • Generating profile data • Static profile log analysis • Generating optimized code • Property • Fully analyzed • Optimal placement of probe • Entire program run and complete profile

  17. Profiling : Role • Difficulty, requirement and limitation in dynamic optimization • Program structure is not known when a program begins • Program structure must be discovered in an incremental way • Profiling probes cannot be inserted in a globally optimal manner • Optimization decisions must be made as early as possible • Statistics come from only a partial execution of the program

  18. Profiling : Role • Tradeoff between overhead and benefit • Overhead : Initial analysis + actual collection of profile data • Benefit : execution time reduction due to optimization • Static optimization • Overhead is paid once • Dynamic optimization • Overhead is paid every time a guest program runs • Benefits must outweigh the overhead

  19. Profiling : Type of profile data • Frequency of Execution of different code region • Hotspot • Interpretation VS binary translation • Profile data which is based on Control flow( branch and Jump ) predictability • Can be used for determining aspects of a program’s dynamic execution behavior • Used as basis for gathering and rearranging BBs into larger unit • Used to guide specific optimization • Address • Data

  20. Profiling : Type of profile data • Basics • Nodes : BBs • Edges : flow of control • BB profile • Numbers are counts of the corresponding BB’s execution • Edge profile • BB profile can be derived from edge profile • Path profile • Approximate the path profile by using a heuristics based on edge profile
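The statement that a BB profile can be derived from an edge profile can be illustrated with a minimal sketch: a block's execution count is the sum of the counts on its incoming edges. The block names, the counts, and the `ENTRY` pseudo-node that supplies the first edge are all made up for illustration and are not taken from the slides.

```python
from collections import defaultdict

def bb_profile_from_edges(edge_counts):
    """Derive per-block counts by summing the counts of incoming edges."""
    bb_counts = defaultdict(int)
    for (src, dst), count in edge_counts.items():
        bb_counts[dst] += count
    return dict(bb_counts)

# Hypothetical edge profile: (source block, destination block) -> count.
edge_counts = {
    ("ENTRY", "A"): 1,
    ("A", "B"): 30,
    ("A", "D"): 70,
    ("B", "G"): 30,
    ("D", "G"): 70,
    ("G", "A"): 99,
}
bb_counts = bb_profile_from_edges(edge_counts)
```

The reverse derivation is not possible in general: knowing only that `A` ran 100 times says nothing about how often it went to `B` rather than `D`.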

  21. Profile : collecting the profile • Instrumentation based profiling • Targets program related events • Counts all instances of the event being profiled • Many different events can be monitored simultaneously • Monitoring method : HW, SW • Sampling based profiling • Program runs in its unmodified form • Program is interrupted at intervals and an instance of a program related event is captured • Tradeoff • Instrumentation based • Slows the program but collects a given amount of profile data over a much shorter period of time • Sampling based • Low overhead but requires a longer time to collect the same amount of profile information

  22. Profile : collecting the profile • Strategy • Collection technique depends on emulation spectrum • Interpretation • SW instrumentation is about the only choice • Optimizing binary translation, dynamic optimization system • Instrumentation • Already well optimized longer running program • Sampling

  23. Profile : profiling during interpretation • Key points • Source instructions are actually accessed as data • Profiling code must be added to the interpreter routines • Profiling is applied to specific instruction types or classes (e.g. backward branches) rather than to specific instructions • Method • BB profile • Profile code should be added to all control transfer instructions, after the PC has been updated • Edge profile • Both the PC of the control transfer instruction and the target PC are used to define a specific edge

  24. Profile : profiling during interpretation • Profile Table • Access method • BB profile : Via PC value of control transfer destination • Edge profile : PC value that define an edge • Hash function • Contents of entry • Basic block or edge count • For conditional branch, taken count and not taken count
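A minimal sketch of such a profile table, assuming a chained hash table indexed by the PC of a conditional branch, with a taken count and a not-taken count per entry. The class name, the hash function, and the bucket count are hypothetical choices, not details from the slides.

```python
class ProfileTable:
    """Hash table of per-branch counters, indexed by the branch PC."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _lookup(self, branch_pc):
        # Hypothetical hash function: drop the low two bits, then fold.
        bucket = self.buckets[(branch_pc >> 2) % len(self.buckets)]
        for entry in bucket:
            if entry["pc"] == branch_pc:  # compare step resolves collisions
                return entry
        entry = {"pc": branch_pc, "taken": 0, "not_taken": 0}
        bucket.append(entry)
        return entry

    def record(self, branch_pc, taken):
        """Increment the taken or not-taken count for one branch execution."""
        entry = self._lookup(branch_pc)
        entry["taken" if taken else "not_taken"] += 1
        return entry

table = ProfileTable()
for outcome in (True, True, True, False):
    table.record(0x4000, outcome)
table.record(0x4020, True)  # lands in the same bucket as 0x4000
hot = table.record(0x4000, True)
```

In an interpreter, `record` would be called from the routine that interprets conditional branches, which is why the hash, load, and compare appear on every profiled branch.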

  25. Profile : profiling during interpretation

  26. Profile : profiling during interpretation • Profile count decaying • Problem of profile table • A count field can overflow • Solution • Key point • Optimization methods focus on relative frequency, not absolute counts • Recent program event history is more valuable than older history • Decay process • Periodically divide all the profile counts by 2
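The decay process can be sketched in a few lines, assuming the counts live in a dictionary keyed by block name. The names and values are illustrative only, and when the decay is triggered (a timer, a number of interpreted instructions) is left open.

```python
def decay(counts):
    """Halve every profile count in place (integer division by 2)."""
    for key in counts:
        counts[key] >>= 1

counts = {"A": 200, "B": 50, "C": 1}
decay(counts)
```

Halving keeps the relative ordering of hot and cold blocks while keeping the counters bounded, and it weights recent events more heavily than old ones.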

  27. Profile : profiling during interpretation • Profiling indirect jump instructions • Difficulties of indirect jumps compared with conditional branches • Switch statement : target changes frequently • Return from procedure call : many target addresses • Solution • Key point • Profile-driven optimization of indirect jumps tends to focus on those jumps that very frequently have the same target • Maintain a profile table with a small number of target addresses and track only the more recently used targets
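One way to keep such a small per-jump target table is sketched below, assuming least-recently-used replacement and a fixed fraction for deciding that one target dominates. The class name, `max_targets`, and `min_fraction` are hypothetical; the slides do not specify a replacement policy or a cutoff.

```python
from collections import OrderedDict

class JumpTargetProfile:
    """Tracks a few recently used targets of one indirect jump."""

    def __init__(self, max_targets=2):
        self.max_targets = max_targets
        self.targets = OrderedDict()  # target PC -> count, oldest first

    def record(self, target_pc):
        if target_pc in self.targets:
            self.targets[target_pc] += 1
            self.targets.move_to_end(target_pc)
        else:
            if len(self.targets) >= self.max_targets:
                self.targets.popitem(last=False)  # evict least recently used
            self.targets[target_pc] = 1

    def dominant_target(self, min_fraction=0.9):
        """Return a target taken in at least min_fraction of tracked jumps."""
        total = sum(self.targets.values())
        if total == 0:
            return None
        pc, count = max(self.targets.items(), key=lambda kv: kv[1])
        return pc if count / total >= min_fraction else None

profile = JumpTargetProfile()
for _ in range(19):
    profile.record(0x100)
profile.record(0x200)
dominant = profile.dominant_target()
```

Note that the fraction is computed over tracked targets only, so evicted targets no longer influence the decision.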

  28. Profile : profiling translated code • Instrumenting individual instructions • Each individual instruction can have its own custom profiling code • = Profiling can be selectively applied • = Profile counters can be assigned to each static instructions • Profile counters can be directly addressed without hashing • Profile code can be easily inserted and removed as needed

  29. Profiling : Overhead • Performance overhead • Example • To access hash table : hash function + 1 load + 1 compare • To increment proper count : 1 load + 1 store + 1 add • Profiling during interpretation VS profiling translated code • Absolute overhead VS relative overhead • Memory overhead • Profile table • Overhead reduction method • Reducing the number of instrumentation points • Heuristic + Using collected data • Code duplication • Attractive for same-ISA optimization ( 4.7 )

  30. Contents • Overview • Dynamic program Behavior • Profiling • Optimizing Translation blocks • Overview • Improving locality • Traces • Superblocks • Dynamic superblocks formation • Tree group

  31. Optimizing translation blocks : Overview • Two strategies • Improving locality • Optimization on enlarged translation blocks

  32. Optimizing translation blocks : Improving locality • Locality • Temporal • Spatial • Problem • Cache space • Performance • Low instruction fetch bandwidth

  33. Optimizing translation blocks : Improving locality • Rearrange the layout of the blocks in memory • Conditional branch tests are reversed • Unconditional branches are removed or added • Instruction fetch efficiency is improved

  34. Optimizing translation blocks : Improving locality • Procedure inlining

  35. Optimizing translation blocks : Improving locality • Partial procedure inlining • In dynamic optimization system

  36. Optimizing translation blocks : Improving locality • Pros and Cons of procedure inlining • Pros • Increases spatial locality • Removes overhead • Call and return instructions are removed • Save/restore instructions are removed • Cons • Increases code size • Increases register "pressure" • Inlined code needs more registers than a procedure call • Consequently, procedure inlining is typically used only for those procedures that are very frequently called and are very small

  37. Optimizing translation blocks • Three ways of rearranging basic blocks according to control flow • Trace formation • Superblock formation • Most widely used in VM implementation • Tree group • Useful when control flow is difficult to predict • Provide wider scope for optimization

  38. Optimizing translation blocks : Traces • Traces • Chunks of contiguous instructions containing multiple BBs • Traces > Superblock • Static trace forming steps • 1. Collect a profile using test data • 2. Begin with a start point • The most frequently executed BB not already part of a trace • 3. Collect BBs along the most common control path until a stopping condition is met • A block already belonging to another trace is reached • A procedure call/return boundary is reached • 4. Collect the BBs into a trace • Reverse branch tests • Remove/add unconditional branches • 5. Stop when no BBs remain outside a trace; otherwise go to step 2 • In a dynamic environment, traces are not commonly used as translation blocks
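Steps 2, 3 and 5 of the static trace forming procedure can be sketched as follows, assuming the profile from step 1 is available as a BB count table and an edge count table. The function name, the `call_return_blocks` parameter, and the example control flow graph are hypothetical, and step 4 (reversing branch tests, fixing up unconditional branches) is omitted.

```python
def form_traces(bb_counts, edge_counts, call_return_blocks=frozenset()):
    """Greedily group basic blocks into traces along the hottest edges."""
    successors = {}
    for (src, dst), count in edge_counts.items():
        successors.setdefault(src, []).append((count, dst))

    in_trace = set()
    traces = []
    # Step 2: seed each trace with the hottest block not yet in any trace.
    # Step 5 is implicit: the loop ends once every block has been visited.
    for start in sorted(bb_counts, key=bb_counts.get, reverse=True):
        if start in in_trace:
            continue
        trace = [start]
        in_trace.add(start)
        current = start
        # Step 3: follow the most common outgoing edge until a stop condition.
        while current not in call_return_blocks and successors.get(current):
            _, nxt = max(successors[current])
            if nxt in in_trace:  # block already belongs to a trace
                break
            trace.append(nxt)
            in_trace.add(nxt)
            current = nxt
        traces.append(trace)
    return traces

# Made-up profile for a loop whose body branches to either B or D.
bb_counts = {"A": 100, "B": 30, "C": 30, "D": 70, "E": 70, "G": 100}
edge_counts = {
    ("A", "B"): 30, ("A", "D"): 70, ("B", "C"): 30,
    ("C", "G"): 30, ("D", "E"): 70, ("E", "G"): 70, ("G", "A"): 99,
}
traces = form_traces(bb_counts, edge_counts)
```

With this profile the hot path `A, D, E, G` becomes the first trace and the colder `B, C` the second.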

  39. Optimizing translation blocks : Traces

  40. Optimizing translation blocks : Superblocks • Superblocks VS Traces • Side entrances : allowed in traces, not in superblocks • Problems in forming superblocks • Many small superblocks • Too small to provide many opportunities for optimization • Tail duplication • The process of replicating code that appears at the end of a superblock in order to form other superblocks

  41. Optimizing translation blocks : Superblocks

  42. Optimizing translation blocks : Dynamic superblock formation : Overview • Dynamic • Formed incrementally as the source code is being emulated • Complication • BB replication leads to more choices • Key question • Starting point • Continuation • Stopping point

  43. Optimizing translation blocks : Dynamic superblock formation : starting point • Heavily used block • By using profile information • Method for determining profile points • All basic blocks • Heuristics • Targets of backward branches are candidate starting points • Exit arc from an existing superblock • Start threshold • When a profiled BB's execution frequency reaches this value, a new superblock is started • Depends on emulation tradeoff • A few tens to hundreds of executions is typical
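A minimal sketch of the backward-branch heuristic combined with a start threshold, assuming the emulator calls a hook on every taken branch. The class name, the hook name `on_branch`, and the default threshold of 50 are assumptions chosen to fall inside the "few tens to hundreds" range.

```python
START_THRESHOLD = 50  # assumed value

class StartPointProfiler:
    """Counts executions of backward-branch targets only."""

    def __init__(self, threshold=START_THRESHOLD):
        self.threshold = threshold
        self.counts = {}

    def on_branch(self, branch_pc, target_pc):
        """Return target_pc when it becomes hot enough to start a superblock."""
        if target_pc > branch_pc:
            return None  # forward branch: not a candidate under this heuristic
        count = self.counts.get(target_pc, 0) + 1
        self.counts[target_pc] = count
        return target_pc if count == self.threshold else None

profiler = StartPointProfiler(threshold=3)
results = [profiler.on_branch(0x120, 0x100) for _ in range(3)]
forward = profiler.on_branch(0x100, 0x140)
```

Because only backward-branch targets get counters, the profile table stays small compared with profiling every basic block.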

  44. Optimizing translation blocks : Dynamic superblock formation : Continuation • Continuation • Which subsequent blocks should be collected and added as the superblock is grown • Most frequently used approach • Node profile information is used to identify the most likely successor BB • Continuation threshold • A relatively complete set of profile data must be collected for all BBs • Typically half of start point threshold • Continuation set • At the time superblock formation is to begin, the set of all BBs that have reached the continuation threshold is collected
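The continuation set and the growth step might look like the sketch below, assuming a BB profile and a successor map are available. The function names, the `max_blocks` limit, and the example counts are hypothetical; the continuation threshold is taken as half of the start threshold, as stated above.

```python
def continuation_set(bb_counts, start_threshold):
    """All BBs whose count has reached half of the start threshold."""
    threshold = start_threshold // 2
    return {bb for bb, count in bb_counts.items() if count >= threshold}

def grow_superblock(start, successors, bb_counts, candidates, max_blocks=8):
    """Repeatedly append the most frequently executed candidate successor."""
    superblock = [start]
    current = start
    while len(superblock) < max_blocks:
        options = [s for s in successors.get(current, [])
                   if s in candidates and s not in superblock]
        if not options:
            break  # no successor has reached the continuation threshold
        current = max(options, key=bb_counts.get)
        superblock.append(current)
    return superblock

# Made-up node profile taken at the moment block A reaches the start threshold.
bb_counts = {"A": 50, "B": 15, "C": 15, "D": 35, "E": 35, "G": 50}
successors = {"A": ["B", "D"], "B": ["C"], "C": ["G"],
              "D": ["E"], "E": ["G"], "G": ["A"]}
candidates = continuation_set(bb_counts, start_threshold=50)
superblock = grow_superblock("A", successors, bb_counts, candidates)
```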

  45. Optimizing translation blocks : Dynamic superblock formation : Continuation • Most frequently used procedure

  46. Optimizing translation blocks : Dynamic superblock formation : Continuation • Most recently used approach • Edge profile information • Algorithm • Assumption • The very next sequence of blocks following a start point is also likely to be a common path • Simply follows the actual dynamic control flow path one edge at a time • Advantage • Only candidate start points need to be profiled • = No need to use profiling for continuation blocks • = Profile overhead is substantially reduced
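The most recently used approach can be sketched as a loop that simply records the blocks as they execute. Here `next_block` stands in for the emulator running one block and reporting where control went next (returning `None` at an indirect jump or procedure call); that callback, the function name, and the `max_blocks` limit are hypothetical.

```python
def form_superblock_mre(start, next_block, existing_starts, max_blocks=8):
    """Collect blocks along the path actually executed after a hot start point."""
    superblock = [start]
    current = start
    while len(superblock) < max_blocks:
        nxt = next_block(current)
        # Stop at an indirect jump or call, on returning to our own start
        # point, or on reaching the start of another superblock.
        if nxt is None or nxt == start or nxt in existing_starts:
            break
        superblock.append(nxt)
        current = nxt
    return superblock

# Made-up single execution of the loop body: A -> D -> E -> G -> A.
observed = {"A": "D", "D": "E", "E": "G", "G": "A"}
superblock = form_superblock_mre("A", observed.get, existing_starts=set())
```

No counters are consulted while the superblock grows, which is where the reduction in profiling overhead comes from.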

  47. Optimizing translation blocks : Dynamic superblock formation : stopping point • Type of heuristics to determine stop condition • The start point of the same superblock is reached • A start point of some other superblock is reached • A superblock has reached some maximum length • A BB can be used in more than one superblock → there may be multiple copies of a given BB → explosion of code size • When using the most frequently used heuristic, there are no more candidate BBs that have reached the continuation threshold • An indirect jump is reached, or there is a procedure call

  48. Optimizing translation blocks : Dynamic superblock formation : Example • Most frequently used

  49. Optimizing translation blocks : Dynamic superblock formation : Example • Most recently used • The only profile point is A, because A is the target of a backward branch • Most likely superblocks • ADEG, BCG, FG • However • There is about a 30% chance of forming • ABCG, DEG, FG • There are cases where the most recently executed method may not select superblocks quite as well as the most frequently executed method

  50. Optimizing translation blocks : Tree group • Background • Problems when applying superblocks to branches that tend to split their decisions almost evenly • Side exit is frequently taken → compensation code overhead • Optimizations are typically not done along the side exit → lost opportunities for performance improvement • Traces, Superblock VS Tree group • Tree group • Conditional branch outcomes are more evenly balanced • Generalization of superblock • Multiple flows of control • Superblocks • Conditional branches are predominantly decided one way • Single flow of control
