Deferred segment-loading
An exercise on implementing the concept of ‘load-on-demand’ for the program-segments in an ELF executable file
Background
• Recall our previous in-class exercise: we wrote a demo-program that could execute a Linux application (named ‘hello’)
• A working version of that demo is now on our class website (named ‘tryexec.s’)
• That demo simulated ‘loading’ of the .text and .data program-segments by copying the ‘hello’ file’s memory-image into two distinct locations in extended memory
Memory-to-memory copying
• We used the Pentium’s ‘movsb’ instruction to perform those two copying operations
• The number of bytes we copied was equal to the size of five disk-sectors (5 * 512 = 2560)
• To ‘load’ the ‘.text’ program-segment, we copied from 0x00011800 to 0x08048000
• To ‘load’ the ‘.data’ program-segment, we copied from 0x00011800 to 0x08049000
Copying to extended memory
• The ‘movsb’ instruction is an example of a ‘complex’ instruction: it requires setup of several CPU registers prior to its execution
• Setup required for ‘movsb’ involves:
  • Set up DS:ESI to address the source buffer
  • Set up ES:EDI to address the dest’n buffer
  • Set up ECX with the number of bytes to copy
  • Clear the DF-bit in the EFLAGS register
• Then ‘rep movsb’ performs the string-copying
• Note that 32-bit addressing is required here!
Example assembly code
; Source-statements to ‘load’ the ‘.text’ program-segment:
        USE32                       ; assemble for 32-bit code-seg
        mov     ax, #sel_fs         ; selector for 4GB data-segment
        mov     ds, ax              ; with base-address=0x000000
        mov     es, ax              ; is used for both DS and ES
        mov     esi, #0x00011800    ; offset-address for ‘source’
        mov     edi, #0x08048000    ; offset-address for ‘dest’n’
        mov     ecx, #2560          ; number of bytes to be copied
        cld                         ; use ‘forward’ string-copying
        rep                         ; ‘repeat-prefix’ is inserted
        movsb                       ; before the ‘movsb’ opcode
Segments were ‘preloaded’
• In our ‘tryexec.s’ demo, the ‘.text’ and ‘.data’ segments were initialized in advance of transferring control to the ‘hello’ program
• That technique is called ‘preloading’
• But the Pentium supports an alternative approach to program-loading (it’s called ‘load-on-demand’)
• Segments remain ‘uninitialized’ until they are actually accessed by the application
Segment-Not-Present
• The ‘Segment-Not-Present’ exception can be utilized to implement ‘demand-loading’
• Segment-descriptors are initially marked as ‘Not Present’ (i.e., the P-bit is zero)
• When any instruction attempts to access these memory-segments (by moving the segment-selector into a segment-register), the CPU will generate an interrupt (int-11, i.e., exception 0x0B)
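To make the role of the P-bit concrete, here is a minimal host-side sketch in C (it models the descriptor as a 64-bit value and does not touch a real descriptor-table). The function names are our own illustrative choices, not part of ‘tryexec.s’; the bit position follows from the P flag being bit 7 of the descriptor's access-rights byte.

```c
#include <assert.h>
#include <stdint.h>

/* The P (Present) flag is bit 7 of the access-rights byte, which makes
   it bit 47 of the 8-byte segment-descriptor. */
#define DESC_PRESENT (UINT64_C(1) << 47)

/* Returns 1 if the descriptor is marked 'Present', otherwise 0. */
int desc_is_present(uint64_t desc)
{
    return (desc & DESC_PRESENT) != 0;
}

/* Returns the same descriptor with P=0 ('Not Present'). */
uint64_t desc_clear_present(uint64_t desc)
{
    return desc & ~DESC_PRESENT;
}

/* Returns the same descriptor with P=1 ('Present'). */
uint64_t desc_set_present(uint64_t desc)
{
    return desc | DESC_PRESENT;
}

/* Usage example (hypothetical helper): a not-present code-descriptor
   whose access-rights byte is 0x7A becomes 0xFA once P is set. */
void desc_usage_example(void)
{
    uint64_t text = UINT64_C(0x00CF7A000000FFFF);

    assert(!desc_is_present(text));
    text = desc_set_present(text);
    assert(text == UINT64_C(0x00CFFA000000FFFF));
}
```

Only the single P-bit changes; the base-address, limit, DPL and type fields of the descriptor are left as they were.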
The Fault-Handler
• The interrupt service routine for INT-0x0B (Segment-Not-Present Fault) can:
  • perform the initialization of the specified memory region (i.e., the ‘loading’ operation)
  • mark the segment-descriptor as ‘Present’
  • ‘retry’ the instruction that triggered the fault (by executing an ‘iret’ or ‘iretd’)
Error-Code Format
  bits 31-16: reserved
  bits 15-3:  table-index
  bit 2:      TI
  bit 1:      IDT
  bit 0:      EXT
Legend:
  EXT = An external event caused the exception (1=yes, 0=no)
  IDT = table-index refers to Interrupt Descriptor Table (1=yes, 0=no)
  TI = The Table Indicator flag, used when IDT=0 (1=LDT, 0=GDT)
This same error-code format is used with exceptions 0x0B, 0x0C, and 0x0D
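The error-code fields can be pulled apart with a few shifts and masks. The following is a small C sketch of that decoding, assuming the standard layout (EXT in bit 0, IDT in bit 1, TI in bit 2, table-index in bits 15-3, with TI=1 selecting the LDT). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The fields of a selector-style error-code. */
struct selector_error {
    unsigned ext;    /* bit 0: external event caused the exception */
    unsigned idt;    /* bit 1: table-index refers to the IDT       */
    unsigned ti;     /* bit 2: 1=LDT, 0=GDT (used when idt == 0)   */
    unsigned index;  /* bits 15..3: descriptor-table index         */
};

/* Splits an error-code into its fields; bits 31..16 are ignored. */
struct selector_error decode_error_code(uint32_t code)
{
    struct selector_error e;

    e.ext   = code & 1u;
    e.idt   = (code >> 1) & 1u;
    e.ti    = (code >> 2) & 1u;
    e.index = (code >> 3) & 0x1FFFu;
    return e;
}

/* Usage example (hypothetical value): error-code 0x000C names
   descriptor number 1 in the LDT. */
void decode_usage_example(void)
{
    struct selector_error e = decode_error_code(0x000C);

    assert(e.ext == 0 && e.idt == 0 && e.ti == 1 && e.index == 1);
}
```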
Benefits of deferred loading?
• With a small-size program (like ‘hello’) we might not see much benefit from using the ‘load-on-demand’ mechanism, since both of the program-segments would sooner-or-later have to be ‘loaded’ into memory
• The only apparent benefit is that copying can be done by ONE program-fragment (i.e., within the fault-handler) instead of by two fragments in the ‘pre-load’ procedure
Table-driven ‘handler’
• Balanced against the fewer instructions required with ‘load-on-demand’ is the need to provide a table-driven interrupt-handler that can ‘load’ whichever ‘not present’ program-segments happen to get accessed
• A very simple implementation for such a handler could use a table like this one:
memmap: ;       from     to          count  type
        .LONG   0x11800, 0x08048000, 2560,  0xFA
        .LONG   0x11800, 0x08049000, 2560,  0xF2
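For readers who find a C rendering easier to follow, the same table can be sketched as an array of four-field records, together with a routine that carries out one entry's copy inside a simulated flat memory buffer. This is only a host-side model: the names ‘memmap_entry’ and ‘load_segment’ and the bounds-checking are our own assumptions, and the real handler would use ‘rep movsb’ on physical addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One row of the 'memmap' table: four longwords. */
struct memmap_entry {
    uint32_t from;   /* offset-address of the source bytes       */
    uint32_t to;     /* offset-address of the destination        */
    uint32_t count;  /* number of bytes to be copied             */
    uint32_t type;   /* access-rights byte once segment is there */
};

/* The two rows given in the lesson's table. */
const struct memmap_entry memmap[] = {
    { 0x11800, 0x08048000, 2560, 0xFA },   /* '.text' */
    { 0x11800, 0x08049000, 2560, 0xF2 },   /* '.data' */
};

/* Performs the copy that one table-entry describes, inside a simulated
   memory of 'size' bytes.  Returns 0 on success, or -1 if the entry
   does not fit within the simulated memory. */
int load_segment(const struct memmap_entry *e, uint8_t *mem, size_t size)
{
    assert(e != NULL && mem != NULL);
    if (e->count > size)
        return -1;
    if (e->from > size - e->count || e->to > size - e->count)
        return -1;
    memmove(mem + e->to, mem + e->from, e->count);
    return 0;
}
```

Because every row has the same shape, one copying routine serves for any segment; supporting another segment means adding a row, not more code.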
Big/Complex programs
• With complex applications that use many more program-segments, ‘demand-loading’ could potentially offer some runtime efficiencies
• For example, with interactive programs that can display various error-messages: if error-handling routines are in separate program-segments, then those segments would not need to be loaded unless and until the error-condition actually occurs (maybe never)
In-class exercise
• To get practical ‘hands on’ experience with implementing the demand-loading concept, we propose the following exercise
• Modify the ‘tryexec.s’ demo (see website) by deferring the memory-to-memory copy operations until the program-segments are actually referenced by the ‘hello’ program
• Then perform the copying within an ISR
Some exercise details
• Copy the ‘tryexec.s’ demo-program to a new file, named ‘ondemand.s’
• In the ‘load_and_exec_demo’ procedure, comment out the two memory-to-memory copy operations, and then mark the LDT segment-descriptors for .text and .data as ‘NOT PRESENT’ segments (i.e., P=0)
• Create a ‘memmap’ table that describes the copying operations that will be needed
Create a fault-handler
• Add an interrupt-gate for exception 0x0B and a fault-handler that will perform the copy-operation for a ‘not-present’ segment
• Remember that the CPU will automatically push an error-code onto the ring0 stack if a ‘segment-not-present’ exception occurs
• Don’t forget to discard that error-code as the final step before exiting from the ISR:
        add     esp, #4     ; discard error-code
        iretd               ; retry the instruction
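Before writing the handler in assembly language, it may help to see its logic modeled in C. The sketch below is a host-side simulation under stated assumptions: the LDT is an array of 64-bit descriptors, memory is a caller-supplied buffer, TI=1 in the error-code means ‘LDT’, and every name here is hypothetical. The stack cleanup (‘add esp, #4’) and the ‘iretd’ are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DESC_PRESENT (UINT64_C(1) << 47)   /* P-bit of a descriptor */

struct memmap_entry { uint32_t from, to, count, type; };

/* Models the work of the Segment-Not-Present handler:
     1) find which LDT descriptor the error-code refers to
     2) copy that segment's bytes, as described by its memmap entry
     3) mark the descriptor 'Present'
   Returns the descriptor's index, or -1 if the error-code does not
   name a usable LDT entry or the copy would not fit in 'mem'. */
int segment_not_present_handler(uint32_t error_code,
                                uint64_t *ldt,
                                const struct memmap_entry *map,
                                unsigned entries,
                                uint8_t *mem, size_t mem_size)
{
    unsigned idt   = (error_code >> 1) & 1u;
    unsigned ti    = (error_code >> 2) & 1u;        /* 1 = LDT */
    unsigned index = (error_code >> 3) & 0x1FFFu;
    const struct memmap_entry *e;

    assert(ldt != NULL && map != NULL && mem != NULL);
    if (idt || !ti || index >= entries)
        return -1;

    e = &map[index];
    if (e->count > mem_size)
        return -1;
    if (e->from > mem_size - e->count || e->to > mem_size - e->count)
        return -1;

    memmove(mem + e->to, mem + e->from, e->count);  /* 'load' it   */
    ldt[index] |= DESC_PRESENT;                     /* now P=1     */
    return (int)index;
}
```

The descriptor is marked ‘Present’ only after the copy has succeeded, so the retried instruction never sees a present-but-uninitialized segment.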
Parallel table-entries
theLDT (each entry is 4 words):
  offset 0x00:  0x00CF7A000000FFFF
  offset 0x08:  0x00CF72000000FFFF
  offset 0x10:  0x00CF72000000FFFF
memmap (each entry is 4 longwords):
  offset 0x00:  From 0x11800   To 0x8048000   Size 2560   Type 0xFA
  offset 0x10:  From 0x11800   To 0x8049000   Size 2560   Type 0xF2
  offset 0x20:  From 0         To 0           Size 0      Type 0xF2
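Because the two tables are parallel, the table-index from the error-code selects an entry in each: LDT entries are 8 bytes apart, while memmap entries are 16 bytes apart. Here is a brief C sketch of that offset arithmetic; the function names are our own, and the error-code layout (index in bits 15-3) is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define LDT_ENTRY_SIZE      8u   /* 4 words     */
#define MEMMAP_ENTRY_SIZE  16u   /* 4 longwords */

/* Extracts the table-index (bits 15..3) from the error-code. */
unsigned table_index(uint32_t error_code)
{
    return (error_code >> 3) & 0x1FFFu;
}

/* Byte-offset of the faulting descriptor within 'theLDT'. */
unsigned ldt_offset(uint32_t error_code)
{
    return table_index(error_code) * LDT_ENTRY_SIZE;
}

/* Byte-offset of the matching entry within 'memmap'. */
unsigned memmap_offset(uint32_t error_code)
{
    return table_index(error_code) * MEMMAP_ENTRY_SIZE;
}

/* Usage example (hypothetical value): error-code 0x0014 names LDT
   entry number 2, found at offset 0x10 in theLDT and 0x20 in memmap. */
void offsets_usage_example(void)
{
    assert(ldt_offset(0x0014) == 0x10);
    assert(memmap_offset(0x0014) == 0x20);
}
```

Note that the LDT offset is simply the error-code with its three low-order flag bits cleared, and the memmap offset is twice that value.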