Optimizing dynamic dispatch with fine-grained state tracking
This paper discusses optimizing dynamic dispatch in languages such as Ruby, Python, and JavaScript through fine-grained state tracking. It focuses on the high overhead of dynamic mixin operations and proposes reducing invalidation granularity and improving caching strategies. By speeding up applications that rely heavily on dynamic mixins, the authors aim to make dynamic monitoring and patching practical despite the complexity of mutable class hierarchies.
Presentation Transcript
Optimizing dynamic dispatch with fine-grained state tracking Salikh Zakirov, Shigeru Chiba and Etsuya Shibayama Tokyo Institute of Technology Dept. of Mathematical and Computing Sciences 2010-10-18
Mixin • code composition technique [Figure: two diagrams of BaseServer, Server and Additional Security, captioned "Mixin use declaration" and "Mixin semantics"]
Dynamic mixin • Temporary change in class hierarchy • Available in Ruby, Python, JavaScript [Figure: the hierarchy of Server and BaseServer, shown without and with Additional Security]
Dynamic mixin (2) • Powerful technique of dynamic languages • Enables • dynamic patching • dynamic monitoring • Can be used to implement • Aspect-oriented programming • Context-oriented programming • Widely used in Ruby, Python • e.g. Object-Relational Mapping
Dynamic mixin in Ruby • Ruby has dynamic mixin • but only “install”, no “remove” operation • “remove” can be implemented easily • 23 lines
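The "install" half of the slide above can be tried with standard Ruby alone. This is a minimal sketch: the class and module names follow the slides, while the method bodies are placeholders invented for illustration. The "remove" operation is not shown, because stock Ruby does not provide it.

```ruby
# Names follow the slides; the method bodies are illustrative placeholders.
class BaseServer
  def process
    ["BaseServer#process"]
  end
end

class Server < BaseServer
  def process
    ["Server#process"] + super
  end
end

module AdditionalSecurity
  def process
    ["AdditionalSecurity#process"] + super
  end
end

server = Server.new
p server.process            # dispatch chain before the mixin is installed

# "Install": include the module at run time. Ruby places it between
# Server and BaseServer in the ancestor chain.
Server.class_eval { include AdditionalSecurity }

p server.process            # the same object now also runs the mixin's method
p Server.ancestors.first(3)
```

The existing `server` object picks up the mixin without being recreated, which is why every such operation forces the runtime to reconsider previously cached dispatch results.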
Target application • Mixin is installed and removed frequently • Application server with dynamic features
    class BaseServer
      def process() … end
    end
    class Server < BaseServer
      def process()
        if request.isSensitive()
          Server.class_eval { include AdditionalSecurity }
        end
        super # delegate to superclass
        … # remove mixin
      end
    end
    module AdditionalSecurity
      def process()
        … # security check
        super # delegate to superclass
      end
    end
Overhead is high Reasons • Invalidation granularity • clearing whole method cache • invalidating all inline caches • next calls require full method lookup • Inline caching saves just 1 target • which changes with mixin operations • even though mixin operations are mostly repeated
Our research problem • Improve performance of applications that frequently use dynamic mixin • Make invalidation granularity smaller • Make dynamic dispatch target cacheable in presence of dynamic mixin operations
Proposal • Reduce granularity of inline cache invalidation • Fine-grained state tracking • Cache multiple dispatch targets • Polymorphic inline caching • Enable cache reuse on repeated mixin installation and removal • Alternate caching
Basics: Inline caching
• Consider a call site: cat.speak()
• Dynamic dispatch implementation (executable code)
    method = lookup(cat, "speak")
    method(cat)
• Expensive! But the result is mostly the same
• Inline caching
    if (cat has type ic.class) {
      ic.method(cat)
    } else {
      ic.method = lookup(cat, "speak")
      ic.class = cat.class
      ic.method(cat)
    }
[Figure: Animal defines speak() { … }; Cat is a subclass of Animal; cat is an instance of Cat; the inline cache ic holds class = Cat and method = speak]
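The inline caching pseudocode above can be mimicked in plain Ruby. This is a toy sketch, not how the Ruby VM implements call sites: `CallSite`, `call` and the `lookups` counter are hypothetical names, and `instance_method` stands in for the full method lookup.

```ruby
# Toy model of one call site with a single-entry inline cache.
# CallSite, #call and #lookups are hypothetical names.
class CallSite
  attr_reader :lookups

  def initialize(name)
    @name = name      # method name invoked at this call site
    @klass = nil      # plays the role of ic.class
    @method = nil     # plays the role of ic.method
    @lookups = 0      # number of full (slow path) lookups performed
  end

  def call(receiver)
    unless receiver.class.equal?(@klass)
      # Miss: perform the full lookup and remember the result.
      @lookups += 1
      @method = receiver.class.instance_method(@name)
      @klass = receiver.class
    end
    @method.bind(receiver).call
  end
end

class Animal
  def speak
    "..."
  end
end

class Cat < Animal
  def speak
    "meow"
  end
end

site = CallSite.new(:speak)
3.times { site.call(Cat.new) }   # one miss, then two hits
site.call(Animal.new)            # different receiver class: another miss
p site.lookups
```

The cache only checks the receiver's class, so it has no way to notice that a method was overridden after the entry was filled. That gap is what the next slides address.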
Inline caching: problem
• What if the method has been overridden?
    if (cat has type ic.class) {
      ic.method(cat)
    } else {
      ic.method = lookup(cat, "speak")
      ic.class = cat.class
      ic.method(cat)
    }
[Figure: Training, with its own speak() { … }, is shown alongside Animal (speak() { … }) and Cat; the inline cache ic for cat.speak() still holds class = Cat and method = speak]
Inline caching: invalidation
    if (cat has type ic.class && state == ic.state) {
      ic.method(cat)
    } else {
      ic.method = lookup(cat, "speak")
      ic.class = cat.class; ic.state = state
      ic.method(cat)
    }
• Single global state object
• too coarse invalidation granularity
[Figure: Animal (speak() { … }), Cat and Training (speak() { … }); the inline cache ic for cat.speak() holds class = Cat, method = speak and a state value; the values 1 and 2 are shown for both the global state and ic.state]
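The cost of a single global state can be made visible with a small Ruby simulation. It is a sketch under stated assumptions: `GlobalState`, `GuardedCallSite` and `change_hierarchy` are hypothetical names, and the explicit `GlobalState.bump` stands in for what a VM would do internally on any class modification.

```ruby
# Toy model: an inline cache guarded by one global state counter.
# GlobalState, GuardedCallSite and change_hierarchy are hypothetical names.
module GlobalState
  @value = 1
  class << self
    attr_reader :value

    def bump
      @value += 1
    end
  end
end

class GuardedCallSite
  attr_reader :lookups

  def initialize(name)
    @name = name
    @klass = @method = @state = nil
    @lookups = 0
  end

  def call(receiver)
    unless receiver.class.equal?(@klass) && @state == GlobalState.value
      @lookups += 1
      @method = receiver.class.instance_method(@name)
      @klass = receiver.class
      @state = GlobalState.value
    end
    @method.bind(receiver).call
  end
end

# Assumption: every class modification goes through this helper.
def change_hierarchy
  yield
  GlobalState.bump
end

class Animal
  def speak
    "..."
  end
end

class Cat < Animal
end

class Unrelated
end

module Training
  def speak
    "trained"
  end
end

site = GuardedCallSite.new(:speak)
cat = Cat.new
site.call(cat)                                   # miss: first call
site.call(cat)                                   # hit
change_hierarchy { Unrelated.class_eval { def ping; "pong"; end } }
site.call(cat)                                   # miss, although speak did not change
change_hierarchy { Cat.include(Training) }
p site.call(cat)                                 # miss, now dispatches to Training#speak
p site.lookups
```

The third call is the wasted one: a change to `Unrelated` cannot affect `cat.speak`, yet the single counter forces a full lookup anyway.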
Fine-grained state tracking • Many state objects • small invalidation extent • share as much as possible • One state object for each family of methods called from the same call site • State objects associated with lookup path • links updated during method lookups • Invariant • Any change that may affect method dispatch must also trigger change of associated state object
State object allocation
• inline caching code
    if (cat has type ic.class && ic.pstate.state == ic.state) {
      ic.method(cat)
    } else {
      ic.method, ic.pstate = lookup(cat, "speak", ic.pstate)
      ic.class = cat.class; ic.state = ic.pstate.state
      ic.method(cat)
    }
[Figure: Animal defines speak() { *1* }; Cat has no implementation of speak; the inline cache ic for cat.speak() holds class = Cat, method = speak *1*, state = 1 and a pstate link to the state object for speak, whose value is 1]
Mixin installation
• inline caching code
    if (cat has type ic.class && ic.pstate.state == ic.state) {
      ic.method(cat)
    } else {
      ic.method, ic.pstate = lookup(cat, "speak", ic.pstate)
      ic.class = cat.class; ic.state = ic.pstate.state
      ic.method(cat)
    }
[Figure: Training, which defines speak() { *2* }, is installed; the state object for speak changes from 1 to 2, and ic now holds method = speak *2* and state = 2]
Mixin removal
• inline caching code
    if (cat has type ic.class && ic.pstate.state == ic.state) {
      ic.method(cat)
    } else {
      ic.method, ic.pstate = lookup(cat, "speak", ic.pstate)
      ic.class = cat.class; ic.state = ic.pstate.state
      ic.method(cat)
    }
[Figure: Training is removed; the state object for speak changes from 2 to 3, and ic again holds method = speak *1*, now with state = 3]
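The install and removal steps above can be replayed on a toy object model. The sketch below is deliberately simpler than the scheme on the slides: it keeps one state object per method name instead of linking state objects along lookup paths, so state-object merging never arises. It is not the authors' implementation, and all names (`ToyClass`, `STATES`, `CallSite`, `install`, `remove`) are hypothetical.

```ruby
# Toy object model, not Ruby's VM. All names are hypothetical.
class StateObject
  attr_accessor :value

  def initialize
    @value = 1
  end
end

# Simplification: one state object per method name.
STATES = Hash.new { |h, selector| h[selector] = StateObject.new }

class ToyClass
  attr_reader :table
  attr_accessor :parent

  def initialize(parent = nil)
    @parent = parent
    @table = {}   # selector => implementation
  end

  def define(selector, &body)
    @table[selector] = body
    STATES[selector].value += 1   # invariant: dispatch may change, so bump
  end

  # Splice the mixin in as the new parent and bump only the state
  # objects of the selectors the mixin defines.
  def install(mixin)
    mixin.parent = @parent
    @parent = mixin
    mixin.table.each_key { |s| STATES[s].value += 1 }
  end

  def remove(mixin)
    raise ArgumentError, "mixin is not installed" unless @parent.equal?(mixin)
    @parent = mixin.parent
    mixin.table.each_key { |s| STATES[s].value += 1 }
  end
end

ToyObject = Struct.new(:klass)

def lookup(klass, selector)
  k = klass
  k = k.parent while k && !k.table.key?(selector)
  raise NoMethodError, selector.to_s unless k
  [k.table[selector], STATES[selector]]
end

class CallSite
  attr_reader :lookups

  def initialize(selector)
    @selector = selector
    @klass = @method = @pstate = @state = nil
    @lookups = 0
  end

  def call(obj)
    unless obj.klass.equal?(@klass) && @pstate.value == @state
      @lookups += 1
      @method, @pstate = lookup(obj.klass, @selector)
      @klass = obj.klass
      @state = @pstate.value
    end
    @method.call(obj)
  end
end

animal = ToyClass.new
animal.define(:speak) { |_obj| "generic" }
animal.define(:eat) { |_obj| "eating" }
training = ToyClass.new
training.define(:speak) { |_obj| "trained" }
cat_class = ToyClass.new(animal)
cat = ToyObject.new(cat_class)

speak_site = CallSite.new(:speak)
eat_site = CallSite.new(:eat)
speak_site.call(cat)
eat_site.call(cat)            # both call sites miss once
cat_class.install(training)
p speak_site.call(cat)        # speak's state changed, so this call misses
p eat_site.call(cat)          # still a hit: eat's state object was untouched
p [speak_site.lookups, eat_site.lookups]
```

Only the `speak` call site pays for the mixin operation. Per-name state objects are coarser than the per-call-site families described on the slides, but they already avoid the global invalidation of the previous scheme.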
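Alternate caching • Detect repetition • Conflicts detected by state check • Inline cache contents oscillate [Figure: an alternate cache on Cat, keyed by the super pointer (Training or Animal), records the speak state values 4 and 3; as Training (speak() { *2* }) is installed and removed above Animal (speak() { *1* }), the state object and the inline cache ic for cat.speak() alternate between state 4 with speak *2* and state 3 with speak *1*]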
Polymorphic caching • Use multiple entries in inline cache [Figure: the same alternate cache (super: Training or Animal; speak: 4 or 3); the inline cache ic for cat.speak() now holds two entries for class Cat: method *1* with state 3 and method *2* with state 4]
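How alternate caching and polymorphic inline caching work together can be sketched on a toy model. Everything here is an assumption made for illustration: the names are hypothetical, there is one state value per class rather than per method family, method tables are fixed after setup, and only the receiver class's own super pointer changes. The state-check conflict detection mentioned on the slide is therefore not modelled.

```ruby
# Toy model, not Ruby's VM; every name below is hypothetical.
class ToyClass
  attr_reader :table, :state
  attr_accessor :parent

  def initialize(parent = nil)
    @parent = parent
    @table = {}        # selector => implementation
    @state = 1         # state value guarding lookups that start at this class
    @alternate = {}    # super pointer => state value remembered for it
    @last_state = 1    # highest state value handed out so far
  end

  def install(mixin)
    mixin.parent = @parent
    switch_parent(mixin)
  end

  def remove(mixin)
    raise ArgumentError, "mixin is not installed" unless @parent.equal?(mixin)
    switch_parent(mixin.parent)
  end

  private

  # Alternate caching: remember the state value of the configuration being
  # left, and restore the remembered value if the new one was seen before.
  def switch_parent(new_parent)
    @alternate[@parent] = @state
    @parent = new_parent
    @state = @alternate.fetch(new_parent) { @last_state += 1 }
  end
end

ToyObject = Struct.new(:klass)

def lookup(klass, selector)
  klass = klass.parent until klass.nil? || klass.table.key?(selector)
  raise NoMethodError, selector.to_s if klass.nil?
  klass.table[selector]
end

# Polymorphic inline cache: several [class, state, method] entries.
class PolymorphicCallSite
  attr_reader :lookups

  def initialize(selector, capacity = 4)
    @selector = selector
    @capacity = capacity
    @entries = []
    @lookups = 0
  end

  def call(obj)
    klass = obj.klass
    entry = @entries.find { |k, s, _m| k.equal?(klass) && s == klass.state }
    unless entry
      @lookups += 1
      entry = [klass, klass.state, lookup(klass, @selector)]
      @entries.shift if @entries.size >= @capacity
      @entries << entry
    end
    entry[2].call(obj)
  end
end

animal = ToyClass.new
animal.table[:speak] = ->(_obj) { "generic" }
training = ToyClass.new
training.table[:speak] = ->(_obj) { "trained" }
cat_class = ToyClass.new(animal)
cat = ToyObject.new(cat_class)

site = PolymorphicCallSite.new(:speak)
results = 3.times.flat_map do
  cat_class.install(training)
  with_mixin = site.call(cat)
  cat_class.remove(training)
  [with_mixin, site.call(cat)]
end
p results
p site.lookups   # full lookups happen only in the first install/remove cycle
```

Because the state value returns to one seen before, the entry cached for that value becomes valid again. A single-entry cache would still be overwritten on every switch, which is why the polymorphic cache is needed alongside the alternate cache.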
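State object merge • One-time invalidation [Figure: executable code calls animal.speak() and cat.speak() inside a while(true) loop that also removes the mixin; animal is an instance of Animal (speak() { *1* }), cat is an instance of Cat, and Training defines speak() { *2* }; two state objects for speak, S and Q, are shown with the note "Overridden by Q"]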
Overheads of proposed scheme • Increased memory use • 1 state object per polymorphic method family • additional method entries • alternate cache • polymorphic inline cache entries • Some operations become slower • Lookup needs to track and update state objects • Explicit state object checks on method dispatch
Generalizations (beyond Ruby) • Delegation object model • track arbitrary delegation pointer change • Thread-local delegation • allow for thread-local modification of delegation pointer • by having thread-local state object values • Details in the article…
Evaluation • Implementation based on Ruby 1.9.2 • Hardware • Intel Core i7 860 2.8 GHz
Evaluation: microbenchmarks • Single method call overhead • Inline cache hit • state checks: 1% overhead • polymorphic inline caching: 49% overhead • Full lookup • 2x slowdown
Dynamic mixin-heavy microbenchmark (smaller is better)
Evaluation: application • Application server with dynamic mixin on each request (smaller is better)
Evaluation • Fine-grained state tracking considerably reduces overhead • Alternate caching brings only small improvement • Number of call sites affected by mixin is low • Lookup cost / inline cache hit cost is low • about 1.6x on Ruby
Related work • Dependency tracking in Self • focused on reducing recompilation, rather than reducing method lookups • Inline caching for Objective-C • state object associated with method, no dynamic mixin support
Conclusion • We proposed a combination of techniques • Fine-grained state tracking • Alternate caching • Polymorphic inline caching • To increase efficiency of inline caching • with frequent dynamic mixin installation and removal
Method caching in Ruby • Global hashtable • indexed by method name and class • On method lookup • gives answer in 1 hash lookup • On miss • answer obtained by recursive lookup • result stored in method cache • On method redefinition or mixin operation • method cache cleared completely
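The behaviour listed on this slide can be modelled in a few lines. This is a sketch of the idea only, not Ruby's actual C implementation: `ToyClass`, `MethodCache` and `slow_lookup` are hypothetical names.

```ruby
# Toy model of a global method cache; all names are hypothetical.
class ToyClass
  attr_reader :table, :parent

  def initialize(parent = nil)
    @parent = parent
    @table = {}   # selector => implementation
  end
end

class MethodCache
  attr_reader :misses

  def initialize
    @cache = {}   # [class, selector] => implementation
    @misses = 0
  end

  # One hash lookup on a hit; recursive lookup plus a store on a miss.
  def lookup(klass, selector)
    @cache.fetch([klass, selector]) do
      @misses += 1
      @cache[[klass, selector]] = slow_lookup(klass, selector)
    end
  end

  # To be called on any method redefinition or mixin operation.
  def clear
    @cache.clear
  end

  private

  def slow_lookup(klass, selector)
    return nil if klass.nil?
    klass.table.fetch(selector) { slow_lookup(klass.parent, selector) }
  end
end

animal = ToyClass.new
animal.table[:speak] = -> { "generic" }
cat = ToyClass.new(animal)
unrelated = ToyClass.new

cache = MethodCache.new
cache.lookup(cat, :speak)          # miss: walks cat -> animal
cache.lookup(cat, :speak)          # hit
unrelated.table[:ping] = -> { "pong" }
cache.clear                        # any redefinition clears everything
cache.lookup(cat, :speak)          # miss again, although speak is unchanged
p cache.misses
```

Clearing the whole table is simple and always safe, but it throws away entries that the change could not have affected.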