Kanat Bolazar, April 29, 2010
Compiler Design 27. Runtime Environments: Activation Records, Heap Management

Run-time Environments
• The compiler creates and manages a run-time environment in which it assumes the target program will be executed
• Issues for the run-time environment:
  • Layout and allocation of storage locations for named program objects
  • Mechanism for the target program to access variables
  • Linkages between procedures
  • Mechanisms for passing parameters
  • Interfaces to the operating system for I/O and other programs

Storage Organization
• Assumes a logical address space
  • The operating system will later map it to physical addresses and, together with the hardware, decide how cache memory is used
• Memory is typically divided into areas for:
  • Program code
  • Other static data storage, including global constants and compiler-generated data
  • Stack to support the call/return policy for procedures
  • Heap to store data that can outlive a call to a procedure
[Figure: memory layout: Code | Static | Heap | Free Memory | Stack]

Run-time Stack
• Each time a procedure is called (or a block is entered), space for its local variables is pushed onto the stack.
• When the procedure terminates, the space is popped off the stack.
• Procedure activations are nested in time:
  • If procedure p calls procedure q, then even in cases of exceptions and errors, q will always terminate before p.
• Activations of procedures during the running of a program can be represented by an activation tree.
• Each procedure activation has an activation record (aka frame) on the run-time stack.
• At any point during execution, the run-time stack consists of the activation records of all procedures that have been called but have not yet returned.

Procedure Example: quicksort
    int a[11];

    void readArray(void)  /* Reads 9 integers into a[1] through a[9] */
    {
        int i;
        …
    }

    int partition(int m, int n)
    {
        /* Picks a separator v and partitions a[m .. n] so that
           a[m .. p-1] are less than v, a[p] = v, and
           a[p+1 .. n] are equal to or greater than v. Returns p. */
    }

    void quicksort(int m, int n)
    {
        int i;
        if (n > m) {
            i = partition(m, n);
            quicksort(m, i-1);
            quicksort(i+1, n);
        }
    }

    int main(void)
    {
        readArray();
        a[0] = -9999;
        a[10] = 9999;
        quicksort(1, 9);
        return 0;
    }
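
Because the slide elides the bodies of readArray and partition, here is a runnable C sketch rather than the original program. The partition shown is only one possible implementation of the stated contract (it assumes the separator v is a[n], Lomuto-style), and the depth and max_depth counters are hypothetical instrumentation that count how many quicksort activation records are on the run-time stack at once.

```c
#include <assert.h>

/* The slide's global array: a[1..9] hold data, a[0] and a[10] are sentinels. */
int a[11];

/* Hypothetical instrumentation (not on the slide): live quicksort
   activations right now, and the most ever seen at once. */
static int depth = 0, max_depth = 0;

/* One possible partition meeting the slide's contract; choosing a[n] as
   the separator v (Lomuto scheme) is an assumption. */
static int partition(int m, int n) {
    assert(0 <= m && m < n && n <= 10);
    int v = a[n], p = m;
    for (int j = m; j < n; j++) {
        if (a[j] < v) {              /* keep elements < v at the front */
            int t = a[p]; a[p] = a[j]; a[j] = t;
            p++;
        }
    }
    a[n] = a[p];                     /* an element >= v moves to the end... */
    a[p] = v;                        /* ...and v lands at position p */
    return p;
}

static void quicksort(int m, int n) {
    depth++;                         /* activation record pushed */
    if (depth > max_depth) max_depth = depth;
    if (n > m) {
        int i = partition(m, n);
        quicksort(m, i - 1);
        quicksort(i + 1, n);
    }
    depth--;                         /* activation record popped */
}
```

In place of readArray, a caller can fill a[1] through a[9] directly, set the sentinels a[0] and a[10], and call quicksort(1, 9). With this choice of separator, input that is already sorted makes each call peel off a single element, so max_depth reaches 9. A separator closer to the median keeps the activation tree, and therefore the stack, shallower.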

Activation Records
• Elements in the activation record:
  • temporary values that could not fit into registers
  • local variables of the procedure
  • saved machine status from the point at which this procedure was called; this includes the return address and the contents of registers to be restored
  • access link to the activation record of the previous block or procedure in the lexical scope chain
  • control link pointing to the activation record of the caller
  • space for the return value of the function, if any
  • actual parameters (or they may be placed in registers, if possible)
[Figure: activation record layout, top to bottom: actual params, return values, control link, access link, saved machine state, local data, temporaries]

Procedure Linkage
• The standardized code to call a procedure (the calling sequence) and to return from it (the return sequence) may be divided between the caller and the callee.
• Parameters and the return value should be first in the new activation record, so that the caller can easily compute the actual params and get the return value as an extension of its own activation record
  • this also allows for procedures with a variable number of params
• Fixed-length items are placed in the middle of the activation record
  • the saved machine state is standardized
• Local variables and temporaries are placed at the end, which is especially good when their size is not known until run time, as with dynamic arrays
• The top-of-stack pointer (TOP-SP) commonly points to the end of the fixed-length fields
  • fixed-length data can be accessed by local offsets, known to the intermediate code generator, relative to TOP-SP (negative offsets)
  • variable-length fields are actually above TOP-SP, and their offsets are calculated at run time as positive offsets from TOP-SP

Activation Record Example
• Showing one way to divide responsibility between the caller and the callee.
[Figure: caller A.R. followed by callee A.R., each consisting of "actual params and return val", "control link, access link, and saved machine state", and "local data and temporaries"; braces mark the caller's responsibility and the callee's responsibility; TOP-SP marks the end of the callee's fixed-length fields, and the actual top of stack lies above it]

Calling Sequence
• A possible calling sequence matching the previous diagram:
  • The caller evaluates the actual parameters.
  • The caller stores the return address and the old value of TOP-SP in the callee's AR, then increments TOP-SP to the callee's AR. (The caller knows the size of its own local data and temporaries, and of the callee's parameters and status fields.) The caller then jumps to the callee's code.
  • The callee saves the register values and other status fields.
  • The callee initializes its local data and begins execution.

Return Sequence
• Corresponding return sequence:
  • The callee places the return value next to the parameters.
  • Using information in the status fields, the callee restores TOP-SP and other registers, then jumps to the return address that the caller placed in the status field.
  • Although TOP-SP has been restored to the caller's AR, the caller knows where the return value is, relative to the current TOP-SP.
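
The calling and return sequences can be simulated on a small array that stands in for the run-time stack. This is a minimal sketch under assumed conventions: the hypothetical callee add_callee takes two int parameters, its fixed-length fields are laid out as parameters, return value, control link (old TOP-SP), and return address, and TOP-SP points just past them. Array indices play the role of addresses, and saving other registers and initializing local data are omitted.

```c
#include <assert.h>

/* Hypothetical word-addressed run-time stack; index 0 is the bottom. */
enum { STACK_WORDS = 64 };
static int stack[STACK_WORDS];
static int top_sp = 0;   /* TOP-SP: end of the current AR's fixed-length fields */

/* Assumed layout of the callee's fixed fields, as offsets from its TOP-SP:
   parameters first, then return value, then the status fields. */
enum { OFF_X = -5, OFF_Y = -4, OFF_RET = -3, OFF_CTRL = -2, OFF_RA = -1 };
enum { FIXED_WORDS = 5 };

/* Callee for "int add(int x, int y)": body plus return sequence. */
static int add_callee(void) {
    stack[top_sp + OFF_RET] = stack[top_sp + OFF_X] + stack[top_sp + OFF_Y];
    int return_address = stack[top_sp + OFF_RA];
    top_sp = stack[top_sp + OFF_CTRL];      /* restore the caller's TOP-SP */
    return return_address;                  /* stands in for the jump back */
}

/* Caller side. caller_words is the size of the caller's local data and
   temporaries, which sit just above the caller's TOP-SP. */
static int call_add(int x, int y, int caller_words, int return_address) {
    int callee_top = top_sp + caller_words + FIXED_WORDS;
    assert(callee_top <= STACK_WORDS);
    stack[callee_top + OFF_X] = x;          /* evaluate actual parameters */
    stack[callee_top + OFF_Y] = y;
    stack[callee_top + OFF_CTRL] = top_sp;  /* old TOP-SP (control link) */
    stack[callee_top + OFF_RA] = return_address;
    top_sp = callee_top;                    /* TOP-SP now in the callee's AR */
    int came_back_to = add_callee();
    assert(came_back_to == return_address);
    /* TOP-SP is the caller's again; the return value is at a known offset. */
    return stack[top_sp + caller_words + FIXED_WORDS + OFF_RET];
}
```

Because the caller knows caller_words and the callee's fixed layout, it can read the return value at top_sp + caller_words + FIXED_WORDS + OFF_RET even after TOP-SP has been restored.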

Variable-length Data on the Stack
• It is possible to allocate objects, arrays, or other structures of unknown size on the stack, as long as they are local to a procedure and become inaccessible when the procedure ends
• For example, represent a dynamic array in the activation record by a pointer to an array located between the activation records
[Figure: activation record of proc p ("actual params and return val", "control link, access link, and saved machine state", "pointer to array a", …), followed by array a, followed by the activation record of proc q ("actual params and return val", "control link, access link, and saved machine state", "local data and temporaries")]
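
C99 exposes this idea directly through variable-length arrays. The sketch below is illustrative only: where the compiler actually places the array (for instance, behind a pointer kept in the fixed part of the frame, as in the diagram) is implementation-specific, and VLAs became an optional feature in C11.

```c
#include <assert.h>

/* Sum of the first n squares, using a C99 variable-length array whose size
   is known only at run time. The array belongs to this activation and is
   inaccessible once the function returns. */
static long sum_of_squares(int n) {
    assert(n > 0);                  /* a VLA must have a positive size */
    long sq[n];
    for (int i = 0; i < n; i++)
        sq[i] = (long)(i + 1) * (i + 1);
    long total = 0;
    for (int i = 0; i < n; i++)
        total += sq[i];
    return total;
}
```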

Access to Nonlocal Data on the Stack
• The simplest case is languages without nested procedures or classes
  • C and many C-based languages
  • All variables are defined either within a single procedure (function) or outside of any procedure at the global level
• Allocation of variables and access to variables:
  • Global variables are allocated static storage. Their locations are fixed at compile time.
  • All other variables must be local to the activation on the top of the stack. These variables are allocated when the procedure is called and accessed via the TOP-SP pointer.
• Nested procedures use a set of access links to access variables at other levels on the stack

Nested Procedure Example Outline in ML
    fun sort(inputFile, outputFile) =
        let
            val a = array(11, 0);
            fun readArray(inputFile) = . . . a . . . ;  (* body of readArray accesses a *)
            fun exchange(i, j) = . . . a . . . ;        (* so does exchange *)
            fun quicksort(m, n) =
                let
                    val v = . . . ;
                    fun partition(y, z) = . . . a . . . v . . . exchange . . .
                in
                    . . . a . . . v . . . partition . . . quicksort . . .
                end
        in
            . . . a . . . readArray . . . quicksort . . .
        end;
    (* the function sort accesses a and calls readArray and quicksort *)

Access Links
• Access links allow implementation of the normal static scope rule
  • if procedure p is nested immediately within q in the source code, then the access link of an activation of p points to the most recent activation of q
• Access links form a chain, one link for each lexical level, allowing access to all data and procedures accessible to the currently executing procedure
• Look at the example of access links from the quicksort program in ML (previous slide)

Defining Access Links for Direct Procedure Calls
• Procedure q calls procedure p explicitly:
  • Case 1: procedure p is at a nesting depth 1 higher than q (it can't be more than 1, to follow scope rules). Then the access link of p points to the immediately preceding activation record (that of q). Example: quicksort calls partition.
  • Case 2: recursive call, i.e. q is p itself. The access link in the new activation record is the same as the access link in the preceding activation record of q. Example: quicksort calls quicksort.
  • Case 3: procedure p is at a lower nesting depth than q. Then procedure p must be immediately nested in (defined in) some procedure r, and there must be an activation record for r in the access chain of q. Follow the access links of q to find the activation record of r, and set the access link of p to point to that activation record of r. Example: partition calls exchange, which is defined in sort.
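
The three cases can be folded into one rule: starting from q's activation record, follow (depth of q) - (depth of p) + 1 access links. The sketch below assumes a hypothetical AR struct that records only a procedure name, its nesting depth, and its access link.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical activation record reduced to what access links need. */
typedef struct AR {
    const char *name;
    int depth;           /* nesting depth of the procedure */
    struct AR *access;   /* access link */
} AR;

/* Access link for a new activation of a procedure at nesting depth
   callee_depth, called from the activation `caller`. Following
   (caller depth - callee depth + 1) links covers the slide's cases:
   0 links gives the caller itself (case 1), 1 link gives the caller's own
   access link (case 2), more links reach the enclosing r (case 3). */
static AR *access_link_for(AR *caller, int callee_depth) {
    assert(callee_depth <= caller->depth + 1);   /* static scope rule */
    AR *r = caller;
    for (int k = caller->depth - callee_depth + 1; k > 0; k--)
        r = r->access;
    return r;
}
```

For the ML example, with sort at depth 1, quicksort at depth 2, and partition at depth 3, a call from partition to exchange (depth 2) follows two links, partition to quicksort to sort, so exchange's access link points to sort's activation record.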

Defining Access Links for Parameter Procedures
• Suppose that procedure p is passed to q as a parameter. When q calls its parameter, which may be named r, it is not actually known which procedure to call until run time.
• When a procedure is passed as a parameter, the caller must pass along, together with the name of the procedure, the proper access link for that parameter.
• When q calls the procedure parameter, it sets up that access link, thus enabling the procedure parameter to run in the environment of the caller that passed it.
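
C has no nested functions, but the pair "procedure plus access link" can be modeled explicitly. In this sketch, the hypothetical Env struct stands for the activation record of the enclosing procedure p, and Closure bundles a code pointer with the access link that the caller passes along.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the activation record of the enclosing
   procedure p: the nonlocal data the nested procedure reads. */
typedef struct Env {
    int base;
} Env;

/* A procedure parameter: the code to run plus the access link to install. */
typedef struct Closure {
    int (*code)(Env *access_link, int arg);
    Env *access_link;
} Closure;

/* Plays the role of a procedure nested inside p (C has no nested
   functions); it reaches p's variable base through its access link. */
static int add_base(Env *access_link, int arg) {
    return access_link->base + arg;
}

/* q does not know which procedure r is until run time; it calls r with
   the access link that was passed along with it. */
static int q(Closure r, int x) {
    assert(r.code != NULL && r.access_link != NULL);
    return r.code(r.access_link, x);
}

/* p passes its nested procedure together with a link to its own data. */
static int p(int base, int x) {
    Env my_ar = { base };
    Closure r = { add_base, &my_ar };
    return q(r, x);
}
```

Here p(10, 5) returns 15: q never sees base directly, but when it calls r it installs r's access link, so add_base reads base from p's activation record.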

Displays
• If the nesting depth gets large, following the chain of access links to reach nonlocal variables becomes inefficient.
• The solution is to keep an auxiliary array, the display, in which element d[i] points to the highest (most recent) activation record on the stack for any procedure at nesting depth i.
• Whenever a new activation record is created at level l, it saves the value of display[l], to be restored when it is done.
[Figure: display entries d[1], d[2], d[3] pointing into a stack holding sort, q(1,9) with saved d[2], q(1,3) with saved d[2], p(1,3) with saved d[3], and e(1,3) with saved d[2]]
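
A display can be maintained with one save on procedure entry and one restore on exit. The sketch below assumes a hypothetical Frame struct that records a name, a nesting depth, and the saved display entry; display[l] then gives direct access to the most recent activation at depth l without walking access links.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_DEPTH = 8 };

/* Hypothetical activation record reduced to what the display needs. */
typedef struct Frame {
    const char *name;
    int depth;              /* nesting depth of the procedure */
    struct Frame *saved;    /* old display[depth], restored on exit */
} Frame;

/* display[l]: most recent activation at nesting depth l (index 0 unused). */
static Frame *display[MAX_DEPTH + 1];

static void enter(Frame *f) {
    assert(f->depth >= 1 && f->depth <= MAX_DEPTH);
    f->saved = display[f->depth];
    display[f->depth] = f;
}

static void leave(Frame *f) {
    assert(display[f->depth] == f);   /* activations end in LIFO order */
    display[f->depth] = f->saved;
}
```

Entering sort, q(1,9), q(1,3), p(1,3), and e(1,3) in that order leaves display[1] at sort, display[2] at e(1,3), and display[3] at p(1,3), matching the figure; each leave() puts back the entry its frame saved.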

Dangling Pointers in the Stack
• In a stack-based environment, typically used for parameters and local variables, variables local to a procedure are removed from the stack when the procedure exits
• There should not be any pointers still in use to such variables
• Example from C:
    int* dangle(void) { int x; return &x; }
• An assignment "addr = dangle();" causes addr to point to a deallocated stack location
• In C, this is considered to be a programming error, not one that the compiler is required to check for
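
Two common ways to avoid returning the address of a local variable are sketched below (the function names are illustrative): let the caller supply the storage, or allocate the object in the heap so that it outlives the activation.

```c
#include <assert.h>
#include <stdlib.h>

/* Alternative 1: the caller owns the storage and passes its address in. */
static void fill(int *out) {
    *out = 42;
}

/* Alternative 2: allocate in the heap; the object outlives this
   activation, and the caller must free() it. Returns NULL on failure. */
static int *make_int(int value) {
    int *p = malloc(sizeof *p);
    if (p != NULL) *p = value;
    return p;
}
```

With make_int, the caller takes on the responsibility of calling free, which is exactly the kind of heap-management problem discussed later.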

Organization of Memory for Arrays
• C/C++ arrays and Java arrays are stored very differently in memory
• A 2x3 C int array needs space for only 6 ints, stored contiguously in row-major order (typically on the stack if it is a local variable):
  • ar[0][0], ar[0][1], ar[0][2], ar[1][0], ar[1][1], ar[1][2]
  • The same storage can be accessed as an int[6] array.
• Java is type-safe; in Java, you can't access an int[2][3] as if it were an int[6].
• Java also stores the array length and other object header information used by the runtime and the garbage collector.
• All arrays are objects in the heap. A local "array" variable is just a pointer/reference.
• This line creates three array objects in Java:
    int[][] ar = new int[2][3];
  1. ar: an array of element type int[], length = 2
  2. ar[0]: an array of element type int, length = 3
  3. ar[1]: an array of element type int, length = 3
• Note that the two rows (ar[0] and ar[1]) could have different lengths, and one could even be null while the other one holds some integers.
• ar here is a local variable that holds the address of the first array in the heap.
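
The contiguous, row-major layout of a C array can be checked with pointer arithmetic on byte addresses. This is a small illustrative sketch; ROWS, COLS, and is_row_major are names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 2, COLS = 3 };

/* Checks that ar[i][j] sits (i*COLS + j) ints past the start of the array,
   i.e. that the 2x3 array is one contiguous row-major block. */
static int is_row_major(int ar[ROWS][COLS]) {
    const char *base = (const char *)ar;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++) {
            ptrdiff_t offset = (const char *)&ar[i][j] - base;
            if (offset != (ptrdiff_t)((i * COLS + j) * sizeof(int)))
                return 0;
        }
    return 1;
}
```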

Heap Management
• The heap stores data that lives indefinitely, or until the program explicitly deletes it
• The memory manager allocates and deallocates memory in the heap
  • it serves as the interface between application programs, generated by the compiler, and the operating system
  • calls to free and delete can be generated by the compiler or, in some languages, written explicitly by the programmer
• Garbage collection is an important subsystem of the memory manager that finds spaces within the heap that are no longer used and can be returned to free storage
  • the Java language uses the garbage collector as its deallocation operation

Memory Manager
• The memory manager has one large chunk of memory from the operating system that it can manage for the application program
• Allocation: when a program requests memory for a variable or an object (anything requiring space), the memory manager gives it the address of a chunk of contiguous heap memory
  • if there is no chunk big enough, it can request more virtual memory from the operating system
  • if out of space, it informs the program
• Deallocation: returns deallocated space to the pool of free space
  • it does not shrink the heap and return the space to the operating system

Properties of a Memory Manager
• Space efficiency: minimize the total heap size needed by the program
  • accomplished by minimizing fragmentation
• Program efficiency: make good use of the memory subsystem to allow programs to run faster
  • locality of placement of objects
• Low overhead: memory allocation and deallocation should be as efficient as possible, as they are frequent operations in many programs

Memory Hierarchy
• Registers are scarce and are explicitly managed by the code generated by the compiler
• Other memory levels are handled automatically, by the hardware (caches) and the operating system (virtual memory)
  • chunks of memory are copied from a lower level to a higher level as necessary

    Level                    Typical size       Typical access time
    Virtual memory (disk)    > 40 GB            3-15 ms
    Physical memory          512 MB - 4 GB      100-150 ns
    2nd-level cache          125 KB - 4 MB      40-60 ns
    1st-level cache          16-64 KB           5-10 ns
    Registers                32 words           1 ns

Taking Advantage of Locality
• Programs often exhibit both:
  • temporal locality: memory locations that have been accessed are likely to be accessed again soon
  • spatial locality: memory locations close to locations that have been accessed are also likely to be accessed
• The compiler can place basic blocks (sequential instructions) on the same page, or even the same cache line
• Instructions belonging to the same loop or function can also be placed together

Placing Objects in the Heap
• As heap memory is allocated and deallocated, it is broken into free spaces (the holes) and used spaces
  • on allocation, a hole larger than the request must be split into a used part and a free part
• Best-fit placement is deemed the best strategy: use the smallest available hole that is large enough
  • this strategy saves larger holes for later, possibly larger, requests
• Contrast this with the first-fit strategy, which uses the first hole on the list that is large enough
  • it has a shorter allocation time, but is a worse overall strategy
• Some managers use a "bin" approach to keep track of free space
  • for many standard sizes, keep a list of free spaces of that size
  • keep more bins for smaller sizes, as they are more common
  • this makes the best-fit strategy more efficient
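
The two placement strategies differ only in which hole they pick. The sketch below is a simplification that assumes the free list is just an array of hole sizes and that splitting a hole shrinks it in place; a real manager would also track addresses and coalesce neighbours.

```c
#include <assert.h>
#include <stddef.h>

/* Each function returns the index of the chosen hole, or -1 if none fits. */
static int first_fit(const size_t holes[], int n, size_t request) {
    for (int i = 0; i < n; i++)
        if (holes[i] >= request) return i;      /* first hole that fits */
    return -1;
}

static int best_fit(const size_t holes[], int n, size_t request) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (holes[i] >= request && (best < 0 || holes[i] < holes[best]))
            best = i;                            /* smallest hole that fits */
    return best;
}

/* Split the chosen hole: the request is carved off, the rest stays free. */
static void allocate_from(size_t holes[], int idx, size_t request) {
    assert(idx >= 0 && holes[idx] >= request);
    holes[idx] -= request;
}
```

With holes of 100, 30, and 60 units and a request for 25 followed by one for 90, first fit splits the 100-unit hole and then cannot satisfy the 90-unit request, while best fit uses the 30-unit hole and still has the 100-unit hole available.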

Coalescing Free Space
• When an object is freed, fragmentation is reduced if we can combine the deallocated space with any adjacent free spaces
• Data structures to support coalescing:
  • boundary tags: at each end of the chunk, keep a bit indicating whether the chunk is free, and keep its size
  • doubly linked, embedded free list: pointers to the next and previous free chunks are kept at each end, next to the boundary tags
• When B is deallocated, it can check whether A and C are free and, if so, coalesce the blocks and adjust the links of the free list
[Figure: adjacent chunks A, B, and C with boundary tags 0:200 … 200:0, 0:100 … 100:0, and 0:80 … 80:0 at their two ends; pointers doubly link the free chunks, not in physical order]
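
The boundary-tag check can be sketched on a small word array. The encoding is an assumption made for this example: each chunk begins and ends with a tag word holding (size << 1) | free_bit, with sizes counted in words including both tags. The embedded doubly linked free list is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap of HEAP_WORDS words, fully covered by chunks. */
enum { HEAP_WORDS = 64 };
static size_t heap[HEAP_WORDS];

static size_t tag(size_t size, int free_bit) { return (size << 1) | (size_t)free_bit; }
static size_t size_of(size_t t) { return t >> 1; }
static int is_free(size_t t) { return (int)(t & 1); }

/* Write matching header and footer tags for the chunk starting at `at`. */
static void set_chunk(size_t at, size_t size, int free_bit) {
    heap[at] = heap[at + size - 1] = tag(size, free_bit);
}

/* Free the chunk at `at`, merging it with free physical neighbours.
   Returns the start of the resulting free chunk. */
static size_t free_chunk(size_t at) {
    size_t size = size_of(heap[at]);
    size_t next = at + size;               /* next chunk's header follows our footer */
    if (next < HEAP_WORDS && is_free(heap[next]))
        size += size_of(heap[next]);
    if (at > 0 && is_free(heap[at - 1])) { /* previous chunk's footer precedes us */
        size_t prev_size = size_of(heap[at - 1]);
        at -= prev_size;
        size += prev_size;
    }
    set_chunk(at, size, 1);
    return at;
}
```

Freeing B in a layout of free A (20 words), used B (10), and free C (8) produces a single free chunk of 38 words starting where A started.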

Problems with Manual Deallocation
• It is a notoriously difficult task for programmers, or compilers, to correctly decide when an object will never be referenced again
• If you are too cautious in deallocating, you may get chunks of memory that are marked in use but are never used again
  • memory leaks
• If you deallocate incorrectly, so that a reference is later used to an object that was deallocated, an error occurs
  • dangling pointers

Garbage Collection
• In many languages, program variables have pointers to objects in the heap, e.g. through the use of new
• These objects can have pointers to other objects
• Everything reachable through a program variable is in use, and everything else in the heap is garbage
  • in an assignment "x = y", the object formerly pointed to by x is now garbage if x was the last pointer to it
• A requirement for a garbage-collectible language is that it be type safe:
  • we can tell whether a data element, or a component of a data element, is a pointer to a chunk of memory
  • true for Java, ML
  • not true for C, C++

Performance Metrics
• Overall execution time: garbage collection touches a lot of data, and it is important that it not substantially increase the total run time of an application
• Space usage: the garbage collector should not increase fragmentation
• Pause time: garbage collectors are notorious for causing the application to pause suddenly for a very long time, as garbage collection kicks in
  • as a special case, real-time applications must be assured that they can complete certain computations within a time limit
• Program locality: the garbage collector also controls the placement of data, particularly collectors that relocate data

Reachability
• The data that can be accessed directly by the program, without any dereferencing, is the root set, and its elements are all reachable
  • the compiler may have placed elements of the root set in registers or on the stack
• Any object whose reference is stored in a field member or array element of a reachable object is also reachable
• The program (sometimes called the mutator) can change the reachable set through:
  • object allocation by the memory manager
  • parameter passing and return values: objects pointed to by actual parameters and by return results remain reachable
  • reference assignments "x = y"
  • procedure returns: if the only reference to a reachable object is popped off the stack, then that object becomes unreachable

Reference Counting Garbage Collectors
• Keep a count of the number of references to each object; when the count drops to 0, the object can be returned to free storage
• Every object keeps a field for the reference count, which is maintained on:
  • object allocation: the count of a new object is 1
  • parameter passing: the reference count of an actual parameter object is increased by 1
  • reference assignments "x = y": the reference count of the object referred to by y goes up by 1, and the reference count of the old object pointed to by x goes down by 1
  • procedure returns: objects pointed to by local variables have their counts decremented
  • transitive loss of reachability: whenever the count of an object goes to 0, we must decrement by 1 the count of each object pointed to by a reference within it
• Simple, but imperfect: it cannot reclaim circular (cyclic) garbage
• Overhead is very high, but it is spread incrementally over execution

Java Array Example
• Recall that this line creates three array objects:
    int[][] ar = new int[2][3];   // creates the ar, ar[0] and ar[1] arrays
  • The local variable ar stores the address of the int[][] object
  • Its elements store the addresses of the two int[] objects, one per row
• Counting references step by step:
    int[] row1 = ar[0];   // the first int[] object now has ref count = 2
    ar[0] = null;         // the same object now has ref count = 1, from row1;
                          // it is not reachable from ar anymore
    ar = null;            // the int[][] object now has ref count = 0
  • transitive loss of reachability: the int[][] object is not reachable anymore
  • anything reachable from it should have its ref count decremented:
    • ar[0] == null: nothing to do
    • ar[1] points to the second int[] object: its ref count is decremented to 0
• Now row1 points to the first int[] object; the other arrays are not reachable
• At all times, an object's ref count should hold the number of references to it from reachable objects and variables
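
The bookkeeping in this example can be mirrored in C with explicit counts. This sketch assumes a hypothetical Obj type with a fixed number of reference slots; assign() performs the "x = y" update, and decref() implements the transitive loss of reachability.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object with a few reference slots
   (for a 2-D array, one slot per row). */
enum { MAX_REFS = 4 };
typedef struct Obj {
    int refcount;
    int nrefs;                       /* slots in use */
    struct Obj *refs[MAX_REFS];
} Obj;

static int live_objects = 0;         /* allocated and not yet freed */

/* Object allocation: the count starts at 1 (the reference returned). */
static Obj *new_obj(void) {
    Obj *o = calloc(1, sizeof *o);   /* zeroed: all slots start as NULL */
    assert(o != NULL);
    o->refcount = 1;
    live_objects++;
    return o;
}

static void incref(Obj *o) {
    if (o != NULL) o->refcount++;
}

/* Decrement; at 0, apply transitive loss of reachability, then free. */
static void decref(Obj *o) {
    if (o == NULL) return;
    if (--o->refcount == 0) {
        for (int i = 0; i < o->nrefs; i++)
            decref(o->refs[i]);
        free(o);
        live_objects--;
    }
}

/* Reference assignment "*slot = y": count y up before counting the old
   target down, so that "x = x" is safe. */
static void assign(Obj **slot, Obj *y) {
    incref(y);
    decref(*slot);
    *slot = y;
}
```

Replaying the Java steps (create ar with two rows, row1 = ar[0], ar[0] = null, ar = null) leaves only the first row alive, with a count of 1. Two objects that point to each other, by contrast, keep each other's count at 1 after their variables are released, so they are never freed: the circular-structure weakness of reference counting.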

Basic Mark and Sweep Garbage Collection
• Trace-based algorithms recycle memory as follows:
  • the program runs and makes allocation requests
  • the garbage collector discovers reachability by tracing
  • unreachable objects are reclaimed for storage
• Mark and sweep algorithms use four states for chunks of memory:
  • Free: ready to be allocated, at any time
  • Unreached: reachability has not been established by the gc; when a chunk is allocated, it is set to "unreached"
  • Unscanned: chunks that are known to be reachable are either scanned or unscanned; an unscanned object has itself been reached, but its pointers have not been scanned
  • Scanned: the object is reachable and all its pointers have been followed

Mark and Sweep Algorithm
• Stop the program and start the garbage collector
• Marking phase:
    set the Free list to be empty
    for each object referenced by the root set:
        set its reached bit to 1 and add it to the Unscanned list
    while the Unscanned list is not empty:
        remove an object o from the Unscanned list
        for each pointer p in o:
            if the object p points to is unreached (its bit is 0):
                set its bit to 1 and put it in the Unscanned list
• Sweeping phase:
    for each chunk of memory o in the heap:
        if o is unreached, add o to the Free list
        otherwise, set the reached bit of o to 0
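
A direct C translation of the marking and sweeping phases follows. It is a sketch over a hypothetical fixed array of objects whose references are array indices (-1 for null); adding a chunk to the Free list is simplified to clearing its allocated flag, and the Unscanned list is kept as a stack, which makes the tracing depth-first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical heap of N_OBJS objects; references are indices, -1 = null. */
enum { N_OBJS = 8, N_FIELDS = 2 };
typedef struct {
    bool allocated;
    bool reached;                 /* the "reached bit" */
    int field[N_FIELDS];
} Obj;

static Obj heap[N_OBJS];

/* Hand out object i with all references null and its reached bit 0. */
static void alloc_obj(int i) {
    heap[i].allocated = true;
    heap[i].reached = false;
    for (int f = 0; f < N_FIELDS; f++) heap[i].field[f] = -1;
}

/* If p is unreached, set its bit and push it on the Unscanned list. */
static void reach(int p, int unscanned[], int *top) {
    if (p >= 0 && !heap[p].reached) {
        heap[p].reached = true;
        unscanned[(*top)++] = p;
    }
}

/* Returns the number of chunks freed by this collection. */
static int mark_and_sweep(const int roots[], int n_roots) {
    int unscanned[N_OBJS], top = 0;   /* each object is pushed at most once */
    /* Marking phase */
    for (int r = 0; r < n_roots; r++)
        reach(roots[r], unscanned, &top);
    while (top > 0) {
        const Obj *o = &heap[unscanned[--top]];
        for (int f = 0; f < N_FIELDS; f++)
            reach(o->field[f], unscanned, &top);
    }
    /* Sweeping phase */
    int freed = 0;
    for (int i = 0; i < N_OBJS; i++) {
        if (!heap[i].allocated) continue;       /* already free */
        if (!heap[i].reached) {
            heap[i].allocated = false;          /* "add to the Free list" */
            freed++;
        } else {
            heap[i].reached = false;            /* reset for the next cycle */
        }
    }
    return freed;
}
```

With objects 0 -> 1 -> 2 reachable from the root and objects 3 and 4 pointing only to each other, the sweep reclaims 3 and 4. Unlike reference counting, tracing has no trouble with that cycle.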

Baker's Mark and Sweep Algorithm
• The basic algorithm is expensive because it examines every chunk in the heap
• Baker's optimization keeps a list of allocated objects
• This list is used as the Unreached list in the algorithm:
    Scanned = empty set
    Unscanned = set of objects referenced by the root set
    while Unscanned is not empty:
        move an object o from Unscanned to Scanned
        for each pointer p in o:
            if the object p points to is in Unreached, move it from Unreached to Unscanned
    Free = Free ∪ Unreached
    Unreached = Scanned

Copying Collectors (Relocating)
• While identifying the Free set, the garbage collector can relocate all reachable objects into one end of the heap
  • while analyzing every reference, the gc can update it to point to the new location, and also update the root set
• Mark-and-compact moves objects to one end of the heap after the marking phase
• A copying collector moves the objects from one region of memory to another as it marks
  • extra space is reserved for relocation
  • this decouples finding free space from assigning new memory locations to the objects
  • the gc copies objects as it traces out the reachable set
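
A classic way to copy while tracing is Cheney's algorithm, named here as a standard technique rather than as the slide's specific method: the part of to-space between a scan index and the free index serves as the Unscanned queue. In this sketch the two semispaces are hypothetical fixed arrays and references are indices, with -1 for null.

```c
#include <assert.h>

/* Hypothetical semispaces; references are indices into the current
   space, -1 = null. */
enum { SPACE = 16, N_FIELDS = 2 };
typedef struct {
    int value;
    int field[N_FIELDS];
    int forward;            /* new index once copied to to_space, else -1 */
} Obj;

static Obj from_space[SPACE], to_space[SPACE];
static int to_free;         /* first unused slot in to_space */

/* Set up from_space[i] with null fields and no forwarding address. */
static void init_obj(int i, int value) {
    from_space[i].value = value;
    from_space[i].forward = -1;
    for (int f = 0; f < N_FIELDS; f++) from_space[i].field[f] = -1;
}

/* Copy object i unless it was already copied; return its to_space index. */
static int copy(int i) {
    if (i < 0) return -1;
    if (from_space[i].forward < 0) {
        assert(to_free < SPACE);
        to_space[to_free] = from_space[i];   /* fields still hold old indices */
        to_space[to_free].forward = -1;
        from_space[i].forward = to_free++;   /* leave a forwarding address */
    }
    return from_space[i].forward;
}

/* Cheney-style collection: updates roots in place and returns the number
   of live objects, now packed into to_space[0 .. result-1]. */
static int collect(int roots[], int n_roots) {
    to_free = 0;
    for (int r = 0; r < n_roots; r++)
        roots[r] = copy(roots[r]);
    /* Slots scan .. to_free-1 form the Unscanned queue. */
    for (int scan = 0; scan < to_free; scan++)
        for (int f = 0; f < N_FIELDS; f++)
            to_space[scan].field[f] = copy(to_space[scan].field[f]);
    return to_free;
}
```

Starting from a root at from-space index 1, with 1 -> 4 -> 1 and 4 -> 5, the collector copies three objects to to-space slots 0, 1, and 2 and rewrites the root and the internal references to the new slots. A real collector would then swap the roles of the two spaces.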

Short-Pause Garbage Collection
• Incremental garbage collection interleaves garbage collection with the mutator
  • incremental gc is conservative during reachability tracing: it only traces out objects that were allocated at the time it begins
  • not all garbage is found during the sweep (floating garbage), but it will be collected the next time
• Partial collection: the garbage collector divides the work by dividing the space into subsets
  • Usually 80-98% of newly allocated objects "die young", i.e. die within a few million instructions, and it is cost-effective to garbage collect these objects often
  • Generational garbage collection separates the heap into "young" and "mature" areas. If an object survives some number of "young" collections, it is promoted to the "mature" area.
  • The "train" algorithm is used to collect the mature area

Parallel and Concurrent Garbage Collection
• A garbage collector is parallel if it uses multiple threads, and it is concurrent if it runs simultaneously with the mutator
  • based on Dijkstra's "on-the-fly" garbage collection, which colors nodes white, gray, or black
• This version partially overlaps gc with mutation, and the mutation helps the gc:
  • Find the root set (with the mutator stopped)
  • Interleave the tracing of reachable objects with the mutator(s)
    • whenever the mutator writes a reference that points from a Scanned object to an Unreached object, we remember it (these are called the dirty objects)
  • Stop the mutator(s) to rescan all the dirty objects, which will be quick because most of the tracing has already been done

Cost of Basic Garbage Collection
• Mark phase: depth-first search takes time proportional to the number of nodes that it marks, i.e. the number of reachable chunks
• Sweep phase: time proportional to the size of the heap
• Amortize the collection: divide the time spent collecting by the amount of garbage reclaimed:
  • $R$ is the number of chunks of reachable data
  • $H$ is the heap size (in chunks)
  • $c_1$ is the time for each marked node and $c_2$ the time to sweep each chunk
  • cost per reclaimed chunk: $\frac{c_1 R + c_2 H}{H - R}$
• If $R$ is close to $H$, this cost gets high, and the collector could increase $H$ by asking the operating system for more memory
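
To see how quickly the amortized cost grows as $R$ approaches $H$, here is a worked evaluation of the cost per reclaimed chunk, assuming for illustration only that $c_1 = c_2 = 1$:

```latex
\begin{aligned}
\text{cost per reclaimed chunk} &= \frac{c_1 R + c_2 H}{H - R} \\
H = 100,\ R = 50:&\quad \frac{50 + 100}{100 - 50} = 3 \\
H = 100,\ R = 90:&\quad \frac{90 + 100}{100 - 90} = 19 \\
H = 200,\ R = 90:&\quad \frac{90 + 200}{200 - 90} = \frac{290}{110} \approx 2.64
\end{aligned}
```

Growing the heap from 100 to 200 chunks brings the cost at $R = 90$ back below the $R = 50$ case, which is why asking the operating system for more memory helps.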