
HLL VM Implementation



  1. HLL VM Implementation chap. 6.5~6.7 Kim, Jung ki (김정기) October 11th, 2006 System Programming special lecture, 2006

  2. Contents • Basic Emulation • High-performance Emulation • Optimization Framework • Optimizations • Case study: The Jikes Research Virtual Machine

  3. Basic Emulation • The emulation engine in a JVM can be implemented in a number of ways • interpretation • just-in-time compilation (JIT) • JIT: methods are compiled at the time they are first invoked • JIT compilation is feasible because the instructions belonging to a method are easy to discover in the Java ISA
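
As a rough illustration of the interpretation option above, the fragment below sketches the fetch-decode-dispatch loop an interpreter runs over a method's bytecode array. It is a toy, not the real JVM interpreter: only three opcodes are handled (their numeric values do come from the JVM specification), and frames, operand types, and exceptions are ignored.

    // Toy bytecode interpreter: fetch, decode, dispatch.
    class TinyInterpreter {
        static final int ICONST_1 = 0x04, IADD = 0x60, IRETURN = 0xAC; // real JVM opcode values

        static int run(byte[] code) {
            int[] stack = new int[16];   // operand stack (a fixed size is enough for the sketch)
            int sp = 0, pc = 0;
            while (true) {
                int opcode = code[pc++] & 0xFF;                              // fetch and decode
                switch (opcode) {
                    case ICONST_1: stack[sp++] = 1; break;                   // push constant 1
                    case IADD:     sp--; stack[sp - 1] += stack[sp]; break;  // pop two, push sum
                    case IRETURN:  return stack[--sp];                       // return top of stack
                    default:       throw new IllegalStateException("unhandled opcode " + opcode);
                }
            }
        }

        public static void main(String[] args) {
            // iconst_1; iconst_1; iadd; ireturn  computes 1 + 1
            System.out.println(run(new byte[]{0x04, 0x04, 0x60, (byte) 0xAC}));  // prints 2
        }
    }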

  4. JIT vs. conventional compiler • a JIT has no front end for parsing and syntax checking; it starts from an intermediate form (the bytecode) • the intermediate form used before optimization also differs • Optimization strategy • multiple optimization levels driven by profiling • optimizations applied selectively to hot spots (not the entire method) • Examples • interpretation plus JIT: Sun HotSpot, IBM DK • compilation only: Jikes RVM

  5. Contents • Basic Emulation • High-performance Emulation • Optimization Framework • Optimizations • Case study: The Jikes Research Virtual Machine

  6. High-Performance Emulation • Two challenges for HLL VMs • to offset run-time optimization overhead with execution-time improvements • to make object-oriented programs go fast despite their frequent use of addressing indirection and many small methods

  7. Optimization Framework [Figure: a staged optimization framework. Bytecodes are either interpreted or translated by a simple compiler; profile data gathered from the running code feeds an optimizing compiler, which turns the compiled code into optimized code for the host platform.]
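
One common way to drive such a framework is a per-method invocation counter: a method starts under the interpreter or simple compiler and is handed to the optimizing compiler once its counter crosses a threshold. The sketch below is only illustrative; the class name, method name, and threshold are invented.

    // Illustrative counter-driven tier-up decision (all names hypothetical).
    import java.util.HashMap;
    import java.util.Map;

    class TierUpPolicy {
        private static final int HOT_THRESHOLD = 1000;            // invocations before optimizing
        private final Map<String, Integer> invocationCounts = new HashMap<>();

        /** Called on every invocation; returns true when the method should be recompiled. */
        boolean recordInvocation(String methodName) {
            int count = invocationCounts.merge(methodName, 1, Integer::sum);
            return count == HOT_THRESHOLD;     // trigger the optimizing compiler exactly once
        }
    }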

  8. Contents • Basic Emulation • High-performance Emulation • Optimization Framework • Optimizations • Code Relayout • Method Inlining • Optimizing Virtual Method Calls • Multiversioning and Specialization • On-Stack Replacement • Optimization of Heap-Allocated Objects • Low-Level Optimizations • Optimizing Garbage Collection • Case study: The Jikes Research Virtual Machine

  9. Code Relayout • the most commonly followed control-flow paths are placed in contiguous locations in memory • this improves locality and conditional branch predictability (a sketch of a simple layout heuristic follows the figure below)

  10. Code Relayout [Figure: a control-flow graph of basic blocks A through G with conditional and unconditional branches, annotated with profiled edge counts, shown before and after relayout; after relayout the frequently executed blocks fall through to one another and the rarely taken branches lead out of line.]
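
As referenced above, one simple heuristic for computing such a layout from profiled edge counts is to start at the entry block and repeatedly append the hottest not-yet-placed successor, so the dominant path becomes a straight fall-through chain. The toy sketch below assumes an invented graph representation and is not the exact algorithm any particular VM uses.

    // Toy profile-guided block layout: follow the hottest unplaced successor.
    import java.util.*;

    class BlockLayout {
        /** successors.get(b) maps each successor of block b to its profiled edge count. */
        static List<String> layout(String entry, Map<String, Map<String, Integer>> successors) {
            List<String> order = new ArrayList<>();
            Set<String> placed = new HashSet<>();
            String current = entry;
            while (current != null && placed.add(current)) {
                order.add(current);
                String next = null;
                int best = -1;
                for (Map.Entry<String, Integer> e
                         : successors.getOrDefault(current, Map.of()).entrySet()) {
                    if (!placed.contains(e.getKey()) && e.getValue() > best) {
                        best = e.getValue();
                        next = e.getKey();    // hottest successor becomes the fall-through block
                    }
                }
                current = next;
            }
            return order;   // any remaining (cold) blocks would be appended after the hot chain
        }
    }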

  11. Method Inlining • Benefits • call overheads shrink (parameter passing, stack-frame management, control transfer), which matters especially in object-oriented code • the scope for code analysis expands, so more optimizations become applicable • The effect depends on the method's size • small methods: beneficial in most cases • large methods: the calling sequence is a small fraction of the cost, so a sophisticated cost-benefit analysis is needed; otherwise code explosion may occur, hurting cache behavior and performance
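
To make the small-method case concrete, the invented example below shows what inlining a trivial accessor conceptually does to a call site; in a real VM the transformation happens in the JIT's intermediate form, not in Java source.

    class Point {
        final int x;                          // package-private so the "inlined" form below compiles
        Point(int x) { this.x = x; }
        int getX() { return x; }              // tiny method: an ideal inlining candidate
    }

    class Caller {
        static int beforeInlining(Point p) {
            return p.getX() * 2;              // method call: argument passing, frame setup, return
        }

        // After inlining, the generated code is equivalent to a direct field load; the call
        // overhead disappears and the load can take part in further optimization.
        static int afterInlining(Point p) {
            return p.x * 2;
        }
    }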

  12. Method Inlining • Processing sequence 1. profile by instrumenting the code 2. construct a call graph at certain intervals 3. invoke the dynamic optimization system when call counts exceed a threshold • Reducing analysis overhead • keep the profile counter in the stack frame • when the threshold is reached, "walk" backward through the stack instead of building a full call graph

  13. Method Inlining [Figure: the same hot call chain shown two ways, with call counts such as 900, 100, and 1500 on the edges between MAIN, A, X, B, C, and Y: on one side the inlining threshold is detected with a full call graph, on the other by walking backward through the stack frames.]

  14. Optimizing Virtual Method Calls • inline the code for the most common case • determining which code to use is otherwise done at run time via a dynamic method table lookup • guarded inlining turns the call site invokevirtual <perimeter> into: if (a.isInstanceof(Square)) { ... inlined code for the Square case ... } else invokevirtual <perimeter>
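
Written out as ordinary Java (with invented classes), guarded inlining of the common receiver class looks roughly like this:

    abstract class Shape { abstract int perimeter(); }

    class Square extends Shape {
        int side;
        Square(int side) { this.side = side; }
        int perimeter() { return 4 * side; }
    }

    class Devirtualize {
        static int callSite(Shape a) {
            if (a instanceof Square) {
                return 4 * ((Square) a).side;  // inlined body of Square.perimeter(): the common case
            } else {
                return a.perimeter();          // fall back to normal virtual dispatch
            }
        }
    }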

  15. Optimizing Virtual Method Calls • even if inlining is not worthwhile, just removing the method table lookup is also helpful • Polymorphic Inline Caching: the call site ( ... invokevirtual <perimeter> ... ) becomes ( ... call PIC stub ... ), where the PIC stub is: if type = circle, jump to circle perimeter code; else if type = square, jump to square perimeter code; else call the lookup code, which updates the PIC stub and performs the method table lookup
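
The PIC stub logic above can be rendered in plain Java as a chain of receiver-type tests that jump straight to the known perimeter code; a real PIC is a patched machine-code stub that grows as new receiver types are observed. The sketch reuses Shape and Square from the previous example and adds a hypothetical Circle.

    class Circle extends Shape {
        int radius;
        Circle(int radius) { this.radius = radius; }
        int perimeter() { return (int) Math.round(2 * Math.PI * radius); }
    }

    class PicStub {
        static int perimeterPic(Shape a) {
            if (a instanceof Circle) return (int) Math.round(2 * Math.PI * ((Circle) a).radius); // circle perimeter code
            if (a instanceof Square) return 4 * ((Square) a).side;                               // square perimeter code
            // A real PIC would extend the stub with the new type and fall back to the
            // method table lookup; plain virtual dispatch stands in for that here.
            return a.perimeter();
        }
    }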

  16. Multiversioning and Specialization • Multiversioning by specialization • if some variables or references are always assigned data values or types known to be constant (or drawn from a limited range), simplified, specialized code can be used
      Original code:
          for (int i=0; i<1000; i++) {
              if (A[i] < 0) B[i] = -A[i]*C[i];
              else B[i] = A[i]*C[i];
          }
      Multiversioned code, specialized for the common case A[i] == 0:
          for (int i=0; i<1000; i++) {
              if (A[i] == 0)
                  B[i] = 0;                          // specialized code
              else {
                  if (A[i] < 0) B[i] = -A[i]*C[i];   // general code
                  else B[i] = A[i]*C[i];
              }
          }

  17. Multiversioning and Specialization • deferred compilation of the general case: only the specialized version is compiled up front, and the general case branches to the dynamic compiler if it is ever reached
      Original code:
          for (int i=0; i<1000; i++) {
              if (A[i] < 0) B[i] = -A[i]*C[i];
              else B[i] = A[i]*C[i];
          }
      With deferred compilation:
          for (int i=0; i<1000; i++) {
              if (A[i] == 0)
                  B[i] = 0;                          // specialized code
              else
                  jump to dynamic compiler for deferred compilation
          }

  18. On-Stack Replacement • an optimized version of a method gives no benefit until its next invocation, so the currently running activation must sometimes be replaced on the fly (OSR): the implementation stack frame is modified while the method is executing • OSR is needed for • inlining inside a long-running method • deferred compilation • debugging (the user expects to observe the architected instruction sequence)
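
A rough sketch of when OSR gets requested: a counter on a loop back edge in a long-running activation crosses a threshold while the method is still on the stack. All names below are invented; a real VM does this inside the interpreter or the compiled code, not in user-level Java.

    class OsrTrigger {
        static final int BACKEDGE_THRESHOLD = 10_000;

        static long hotLoop(long n) {
            long sum = 0;
            int backedges = 0;
            for (long i = 0; i < n; i++) {
                sum += i;
                if (++backedges == BACKEDGE_THRESHOLD) {
                    requestOsr();    // compile an optimized version and replace this very frame
                }
            }
            return sum;
        }

        static void requestOsr() {
            // placeholder: a real VM extracts the architected state (locals, stack, pc),
            // builds an implementation frame for the optimized code, and resumes there
        }
    }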

  19. On-Stack Replacement [Figure: replacing a running method's frame. 1. extract the architected state from implementation frame A (method code at optimization level x); 2. generate a new implementation frame B for the method code at optimization level y (optimize or de-optimize); 3. replace the current implementation stack frame with the new one.]

  20. On-Stack Replacement • OSR is a complex operation • if the initial stack frame is maintained by an interpreter or a nonoptimizing compiler, extracting the architected stack state is straightforward • otherwise, the compiler may define a set of program points where OSR can potentially occur and ensure that the architected values are live at those points in the execution

  21. On-Stack Replacement • Significance of OSR • its direct performance benefits are small • it allows debuggers to be implemented • it reduces start-up time when compilation is deferred • it can improve cache performance

  22. Optimization of Heap-Allocated Objects • creating objects and collecting their garbage have a high cost • the code for heap allocation and object initialization can be inlined for frequently allocated objects • scalar replacement • escape analysis (see the sketch below) • both are effective for reducing object access delays
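
A small invented example of what escape analysis distinguishes: an allocation that never leaves its method, and can therefore be scalar-replaced, versus one that escapes through a field and must stay on the heap.

    class Escape {
        static java.util.List<int[]> saved = new java.util.ArrayList<>();

        static int noEscape(int a, int b) {
            int[] pair = new int[]{a, b};   // never leaves this method: can become two scalars
            return pair[0] + pair[1];
        }

        static int escapes(int a, int b) {
            int[] pair = new int[]{a, b};
            saved.add(pair);                // escapes via a static field: must stay heap-allocated
            return pair[0] + pair[1];
        }
    }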

  23. Optimization of Heap-Allocated Objects • Scalar Replacement: the object's fields are replaced by local scalars, so access delays are reduced
      class square { int side; int area; }
      Before:
          void calculate() {
              a = new square();
              a.side = 3;
              a.area = a.side * a.side;
              System.out.println(a.area);
          }
      After scalar replacement:
          void calculate() {
              int t1 = 3;
              int t2 = t1 * t1;
              System.out.println(t2);
          }

  24. Optimization of Heap-Allocated Objects • field ordering for data usage patterns, to improve data cache performance • removal of redundant object accesses, e.g. redundant getfield (load) removal (c aliases a, so the load of c.side can reuse the value stored to a.side):
      Before:
          a = new square;
          b = new square;
          c = a;
          ...
          a.side = 5;
          b.side = 10;
          z = c.side;
      After:
          a = new square;
          b = new square;
          c = a;
          ...
          t1 = 5;
          a.side = t1;
          b.side = 10;
          z = t1;

  25. Low-Level Optimizations • the cost of array range and null reference checking is significant in object-oriented HLL VMs • these checks, required for throwing exceptions, cause two kinds of performance loss • the overhead of performing the check itself • some optimizations are inhibited in order to maintain a precise state
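
Written out explicitly, the implicit checks amount to something like the following; the JVM performs the equivalent tests before each array access and throws if they fail.

    class Checks {
        static int loadElement(int[] a, int i) {
            if (a == null) throw new NullPointerException();                          // null reference check
            if (i < 0 || i >= a.length) throw new ArrayIndexOutOfBoundsException(i);  // array range check
            return a[i];                                                              // the actual load
        }
    }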

  26. Low-Level Optimizations • Removing Redundant Null Checks (r aliases p, so r's check is redundant, and the second check of p is covered by the first)
      Before:
          p = new Z
          q = new Z
          r = p
          ...
          p.x = ...    <null check p>
          ... = p.x    <null check p>
          q.x = ...    <null check q>
          r.x = ...    <null check r(p)>
      After:
          p = new Z
          q = new Z
          r = p
          ...
          p.x = ...    <null check p>
          ... = p.x
          r.x = ...
          q.x = ...    <null check q>

  27. Low-Level Optimizations • Hoisting an Invariant Check • the range check can be hoisted outside the loop
      Before:
          for (int i=0; i<j; i++) {
              sum += A[i];    <range check A>
          }
      After:
          if (j < A.length) then
              for (int i=0; i<j; i++) {
                  sum += A[i];
              }
          else
              for (int i=0; i<j; i++) {
                  sum += A[i];    <range check A>
              }

  28. Low-Level Optimizations • Loop Peeling • once the first iteration is peeled off, the null check is not needed for the remaining loop iterations
      Before:
          for (int i=0; i<100; i++) {
              r = A[i];
              B[i] = r*2;
              p.x += A[i];    <null check p>
          }
      After (first iteration peeled):
          r = A[0];
          B[0] = r*2;
          p.x = A[0];    <null check p>
          for (int i=1; i<100; i++) {
              r = A[i];
              p.x += A[i];
              B[i] = r*2;
          }

  29. Optimizing Garbage Collection • Compiler support • the compiler provides the garbage collector with "yield points" at regular intervals in the code; at these points a thread can guarantee a consistent heap state, and control can be yielded to the garbage collector (see the sketch below) • the compiler also provides support tailored to the specific garbage-collection algorithm in use
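
A conceptual sketch of a compiler-inserted yield point: the generated code tests a flag at loop back edges (and method prologues), and if it is set, stops at a point where the stack and heap maps are known so the collector or scheduler can take over. The names below are invented.

    class YieldPoints {
        static volatile boolean gcRequested = false;   // set by the VM when a collection is needed

        static long sumWithYieldPoints(int[] a) {
            long sum = 0;
            for (int i = 0; i < a.length; i++) {
                sum += a[i];
                if (gcRequested) {          // the yield point the compiler would insert here
                    yieldToCollector();     // thread state is consistent at this program point
                }
            }
            return sum;
        }

        static void yieldToCollector() { /* placeholder for handing control to the collector */ }
    }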

  30. Contents • Basic Emulation • High-performance Emulation • Optimization Framework • Optimizations • Case study: The Jikes Research Virtual Machine

  31. The Jikes Research Virtual Machine • an open-source research Java virtual machine, itself written largely in Java • originally developed by IBM Research (as the Jalapeño project) before being released to the community

  32. The Jikes Research Virtual Machine • compile-only: there is no interpretation step • first, a simple compiler translates bytecodes into native code; the generated code simply emulates the Java stack, and optimization is applied afterwards • the dynamic optimizing compiler applies optimizations based on a cost-benefit estimate • multithreaded implementation • preemptive thread scheduling implemented via a control bit

  33. The Jikes Research Virtual Machine • Adaptive Optimization System • runtime measurement subsystem: gathers raw performance data by sampling at yield points • recompilation subsystem • controller: coordinates the subsystems and determines the optimization level used for recompilation • cost-benefit function (for a given method): Cj = recompilation time at level j, Tj = estimated future execution time if recompiled, Ti = estimated future execution time if not recompiled; recompile when Cj + Tj < Ti (see the sketch below)
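
A minimal sketch of that cost-benefit test, with illustrative names; the real system estimates Cj, Tj, and Ti from compiler cost models and the sampled profile.

    class CostBenefit {
        // cj: estimated cost (time) of recompiling at level j
        // tj: estimated future execution time if recompiled at level j
        // ti: estimated future execution time if the current code is kept
        static boolean shouldRecompile(double cj, double tj, double ti) {
            return cj + tj < ti;
        }
    }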

  34. The Jikes Research Virtual Machine [Figure: the adaptive optimization system. The runtime measurement subsystem collects method samples from the executing code; the hot method organizer posts events to an event queue read by the controller; the controller places instrumentation/compilation plans on a compilation queue; the compilation thread invokes the optimizing compiler, which produces instrumented/optimized new code; profile data is stored in the AOS database.]

  35. The Jikes Research Virtual Machine • Optimization levels • Level 0 • copy/constant propagation, branch optimization, etc. (a small example follows below) • inlining of trivial methods • simple code relayout, register allocation (simple linear scan) • Level 1 • higher-level code restructuring: more aggressive inlining and code relayout • Level 2 • uses a static single assignment (SSA) intermediate form • SSA enables global optimizations • loop unrolling, elimination of loop-closing branches
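
As referenced under Level 0, the invented before/after pair below shows what copy/constant propagation conceptually does; the optimizing compiler performs this on its intermediate form, not on Java source.

    class ConstProp {
        static int before() {
            int a = 4;
            int b = a;          // b is a copy of a
            return b * 2;       // uses the copy
        }

        static int after() {
            return 8;           // a and b propagated into the use and folded to a constant
        }
    }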

  36. The Jikes Research Virtual Machine [Figure: start-up performance]

  37. The Jikes Research Virtual Machine [Figure: steady-state performance]

  38. The Jikes Research Virtual Machine

  39. The Jikes Research Virtual Machine
