
CH3 CPUs


ethelda


Presentation Transcript


    1. CH3 CPUs

    2. summary Input and output mechanisms. Supervisor mode, exceptions, and traps. Memory management and address translation. Caches. How architecture affects program performance. How architecture affects program power consumption.

    3. 3.1 Introduction

    4. Outline: aspects of CPUs that do not directly relate to their instruction sets; interrupts and memory management; performance and power consumption.

    5. Outline. 3.2: input and output mechanisms such as interrupts. 3.3: several mechanisms designed to handle internal events. 3.4: co-processors that provide optional support for parts of the instruction set. 3.5: memory systems, memory management, and caches.

    6. Outline. 3.6: performance. 3.7: power consumption. 3.8: data compressor example.

    7. 3.2 Programming Input and Output

    8. basics of I/O programming basic characteristics of I/O devices

    9. 3.2.1 Input and Output Devices

    10. Structure of a typical I/O device. Input and output devices usually have some analog or nonelectronic component. Relationship between the I/O device and the CPU: registers form the interface between the CPU and the device's internals, and the CPU talks to the device by reading and writing those registers.

    11. Structure of a typical I/O device

    12. Structure of a typical I/O device Data registers: hold data values, such as the data read or written by a disk. Status registers: provide information about the device's operation

    13. Ex1. 8251 UART. The 8251 UART (Universal Asynchronous Receiver/Transmitter) is the original device used for serial communications. Data are transmitted as streams of characters. Every character starts with a start bit (a 0) and ends with a stop bit (a 1).

    14. Ex1. 8251 UART. Baud rate: data bits are sent as high and low voltages at a uniform rate. The CPU must set the UART's mode registers: - baud rate - data bits: 5 to 8 bits - parity bit: even, odd, or none - stop bits: 1, 1.5, or 2 bits

    15. Ex1. 8251 UART. 8-bit register: buffers characters between the UART and the CPU bus. Transmitter Ready output: indicates that the transmitter is ready to accept a data character. Transmitter Empty signal: goes high when the UART has no characters to send. Receiver Ready: goes high when the UART has a character ready to be read by the CPU.
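The framing rules on the UART slides (start bit 0, 5 to 8 data bits, optional parity, stop bit 1) can be sketched in C. This is an illustration only, not 8251 register programming: the function name `uart_frame`, the least-significant-bit-first ordering, and the restriction to even parity with a single stop bit are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the bit sequence for one character.
   Writes one bit (0 or 1) per array entry and returns the bit count.
   Assumes LSB-first data, optional even parity, and one stop bit. */
int uart_frame(uint8_t ch, int data_bits, int even_parity, uint8_t bits[12])
{
    assert(data_bits >= 5 && data_bits <= 8);
    int n = 0;
    int ones = 0;

    bits[n++] = 0;                        /* start bit is always 0 */
    for (int i = 0; i < data_bits; i++) {
        uint8_t b = (uint8_t)((ch >> i) & 1u);
        ones += b;
        bits[n++] = b;                    /* data bits, LSB first */
    }
    if (even_parity)
        bits[n++] = (uint8_t)(ones & 1);  /* makes the count of 1s even */
    bits[n++] = 1;                        /* stop bit is always 1 */
    return n;
}
```

With 8 data bits and even parity, each character occupies 11 bit times on the line, which is why the character rate is noticeably lower than the baud rate divided by 8.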

    16. 3.2.2 Input and Output Primitives

    17. Programming support for input and output. I/O instructions: special instructions (as on the Intel x86) for input and output. Memory-mapped I/O: provides addresses for the registers in each I/O device, so ordinary read and write instructions communicate with the devices.

    18. Ex1. Memory-Mapped I/O on ARM. Use the EQU pseudo-op to define a symbolic name for the memory location of our I/O device: DEV1 EQU 0x1000

    19. Ex1. Memory-Mapped I/O on ARM. Read and write the device register: LDR r1,=DEV1 ; set up device address LDR r0,[r1] ; read DEV1 MOV r0,#8 ; set up value to write STR r0,[r1] ; write 8 to device

    20. Ex2. Memory-Mapped I/O on SHARC. A memory-mapped I/O device must be assigned within the external memory space, which starts at 0x400000. Use a DM access to read and write the off-chip device register: I0 = 0x400000; M0 = 0; R1 = DM(I0,M0);

    21. Writing I/O device code in C. The traditional names for functions that read and write arbitrary memory locations are peek and poke. The peek function can be written in C as: int peek(char *location) { return *location; } #define DEV1 ((char *)0x1000) dev_status = peek(DEV1);

    22. Writing I/O device code in C. The poke function can be implemented as: void poke(char *location, char newval) { (*location) = newval; } To write 8 to the status register: poke(DEV1,8);
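A variant of peek and poke that can be compiled and run on a host machine is sketched below. Address 0x1000 is not a device on a desktop system, so the example substitutes a simulated register array, `sim_regs`, which is purely an assumption for illustration. It also marks the pointers `volatile`, which the slides omit; without it, a compiler may legally cache or drop accesses to a device register.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated register file standing in for real memory-mapped hardware
   (an assumption so the code runs on a host machine). */
volatile uint8_t sim_regs[16];

/* On a real target this might be ((volatile uint8_t *)0x1000). */
#define DEV1 (&sim_regs[0])

/* volatile forces every read to actually reach the location */
uint8_t peek(volatile uint8_t *location)
{
    assert(location != 0);
    return *location;
}

/* volatile forces every write to actually reach the location */
void poke(volatile uint8_t *location, uint8_t newval)
{
    assert(location != 0);
    *location = newval;
}
```

Calling `poke(DEV1, 8)` followed by `peek(DEV1)` reads back 8 from the simulated register. On real hardware the value read back depends on the device, since a status register need not echo what was written.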

    23. 3.2.3 Busy-Wait I/O

    24. Busy-wait I/O. Devices are slower than the CPU and require many cycles to complete an operation. The CPU must wait for one operation to complete before starting the next one. Polling: asking an I/O device whether it is finished by reading its status register.

    25. Ex3-3 Busy-Wait I/O Programming. Write a sequence of characters to an output device. The device has two registers: one for the character to be written and a status register. The status register's value is 1 when the device is busy writing and 0 when the write transaction has completed.

    26. Ex3-3 Busy-Wait I/O Programming. Register addresses: #define OUT_CHAR ((char *)0x1000) /* output device character register */ #define OUT_STATUS ((char *)0x1001) /* output device status register */

    27. Ex3-3 Busy-Wait I/O Programming. The sequence of characters is stored in a standard C string, which is terminated by a null (0) character: char *mystring = "Hello, world."; /* string to write */ char *current_char; /* pointer to current position in string */

    28. Ex3-3 Busy-Wait I/O Programming. current_char = mystring; /* point to head of string */ while (*current_char != '\0') { /* until null character */ poke(OUT_CHAR,*current_char); /* send character to device */ while (peek(OUT_STATUS) != 0); /* keep checking status */ current_char++; /* update character pointer */ }
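To see how much work the polling loop wastes, the busy-wait write can be run against a simulated device. Everything about the simulation is an assumption for illustration: the device is modeled by two hypothetical functions, `out_char_write` and `out_status_read`, and it reports busy for a fixed `BUSY_POLLS` status reads after each character.

```c
#include <assert.h>
#include <stddef.h>

#define BUSY_POLLS 3   /* hypothetical: busy for 3 status reads per character */

char out_log[64];      /* characters the simulated device has written */
size_t out_len;        /* number of characters in out_log */
long polls;            /* total status reads performed by the CPU */
static int busy_left;  /* status reads that will still report busy */

/* Stands in for poke(OUT_CHAR, c): start writing one character. */
static void out_char_write(char c)
{
    assert(out_len < sizeof out_log - 1);
    out_log[out_len++] = c;
    out_log[out_len] = '\0';
    busy_left = BUSY_POLLS;
}

/* Stands in for peek(OUT_STATUS): 1 while busy, 0 when done. */
static int out_status_read(void)
{
    polls++;
    if (busy_left > 0) {
        busy_left--;
        return 1;
    }
    return 0;
}

/* Busy-wait output: send each character, then spin until the device is idle. */
void busy_wait_write(const char *s)
{
    for (; *s != '\0'; s++) {
        out_char_write(*s);
        while (out_status_read() != 0)
            ;  /* the CPU does nothing useful here */
    }
}
```

In this model every character costs four status reads (three busy, one idle), so writing a two-character string takes eight polls. A real device is typically busy for far longer than three polls, which is the inefficiency that motivates interrupts.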

    29. Ex3-4 Copy Characters from Input to Output Using Busy-Wait I/O. Repeatedly read a character from the input device and write it to the output device. Define addresses for the device registers: #define IN_DATA ((char *)0x1000) #define IN_STATUS ((char *)0x1001) #define OUT_DATA ((char *)0x1100) #define OUT_STATUS ((char *)0x1101)

    30. Ex3-4 Copy Characters from Input to Output Using Busy-Wait I/O. The input device sets its status register to 1 when a new character has been read; the status register must be set back to 0 after the character has been read so that the device is ready to read another. When writing, the program sets the output status register to 1 to start writing and waits for it to return to 0.

    31. while (TRUE) { /* perform operation forever */ /* read a character into achar */ while (peek(IN_STATUS) == 0); /* wait until ready */ achar = (char)peek(IN_DATA); /* read the character */ /* write achar */ poke(OUT_DATA,achar); poke(OUT_STATUS,1); /* turn on device */ while (peek(OUT_STATUS) != 0); /* wait until done */ }

    32. 3.2.4 Interrupts. Busy-wait I/O is inefficient: the CPU does nothing but test the device status. The CPU could instead work in parallel with the I/O transaction: - computation - control of other I/O devices.

    33. Interrupt. The interrupt mechanism allows devices to signal the CPU and to force execution of a particular piece of code. At an interrupt, the program counter is set to point to an interrupt handler routine (also called a device driver), which services the device, for example by writing the next data or reading data. Afterward, the CPU can return to the program that was interrupted.

    34. interrupt

    35. Interrupt. The interface between the CPU and an I/O device includes the following signals: the I/O device asserts the interrupt request signal when it wants service; the CPU asserts the interrupt acknowledge signal when it is ready to handle the I/O device's request.

    36. Interrupt. The interrupt handler operates much like a subroutine, except that it is not called by the executing program. The program that runs when no interrupt is being handled is often called the foreground program. When the interrupt handler finishes, it returns to the foreground program.

    37. Ex3-5 Copy Characters from Input to Output with Basic Interrupts. Repeatedly read a character from an input device and write it to an output device. Use a global variable, achar, for the input handler to pass the character to the foreground program. Use a global Boolean variable, gotchar, to signal when a new character has been received.

    38. void input_handler() { /* get a character and put in global */ achar = peek(IN_DATA); /* get character */ gotchar = TRUE; /* signal to main program */ poke(IN_STATUS,0); /* reset status to initiate next transfer */ } void output_handler() { /* react to character being sent */ /* don't have to do anything */ }

    39. Ex3-5 Copy Characters from Input to Output with Basic Interrupts. int main(void) { while (TRUE) { /* read then write forever */ if (gotchar) { /* write a character */ poke(OUT_DATA,achar); /* put character in device */ poke(OUT_STATUS,1); /* set status to initiate write */ gotchar = FALSE; /* reset flag */ }}}

    40. Ex3-6 Copy Characters from Input to Output with Interrupt and Buffer. This version performs reads and writes independently. The read and write routines communicate through the following global variables: a string io_buf holds a queue of characters that have been read but not yet written; integers buf_start and buf_end point to the first and last characters read; an integer error is set to 0 whenever io_buf overflows.

    41. Ex3-6 Copy Characters from Input to Output with Interrupt and Buffer. The queue allows the input and output devices to run at different rates. The queue io_buf acts as a wraparound buffer: characters are added at the tail and taken from the head.

    42. Ex3-6 Copy Characters from Input to Output with Interrupt and Buffer When head and tail are equal, the queue is empty

    43. Ex3-6 Copy Characters from Input to Output with Interrupt and Buffer When the buffer is full, we leave one character in the buffer unused
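The wraparound queue described above can be sketched as follows. The names `io_buf`, `buf_start`, and `buf_end` come from the slides; `BUF_SIZE`, the function names, and the return-value convention (1 for success, 0 for failure) are assumptions for this example. Here `buf_end` indexes the next free slot rather than the last character, one common way to make "head equals tail" mean empty.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_SIZE 8               /* hypothetical capacity: holds BUF_SIZE - 1 chars */

static char io_buf[BUF_SIZE];
static int buf_start = 0;        /* head: next character to remove */
static int buf_end = 0;          /* tail: next free slot */

/* Head and tail equal: nothing is queued. */
int empty_buffer(void)
{
    return buf_start == buf_end;
}

/* One slot is left unused so that full and empty look different. */
int full_buffer(void)
{
    return (buf_end + 1) % BUF_SIZE == buf_start;
}

/* Add at the tail. Returns 1 on success, 0 on overflow. */
int add_char(char c)
{
    if (full_buffer())
        return 0;
    io_buf[buf_end] = c;
    buf_end = (buf_end + 1) % BUF_SIZE;
    return 1;
}

/* Remove from the head. Returns 1 on success, 0 if the queue is empty. */
int remove_char(char *out)
{
    assert(out != NULL);
    if (empty_buffer())
        return 0;
    *out = io_buf[buf_start];
    buf_start = (buf_start + 1) % BUF_SIZE;
    return 1;
}
```

In the interrupt-driven program, the input handler would call `add_char` and the output side would call `remove_char`. Sacrificing one slot avoids the need for a separate element count that both routines would have to update.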

    45. Debugging interrupt code. The fact that an interrupt can occur at any time means that the same bug can manifest itself in different ways when the interrupt handler interrupts different segments of the foreground program.

    46. Ex3-7 Debugging Interrupt Code. The foreground program computes y = Ax + b: for (i = 0; i < M; i++) { y[i] = b[i]; for (j = 0; j < N; j++) y[i] = y[i] + A[i][j]*x[j]; }

    47. Ex3-7 Debugging Interrupt Code. Assume read_handler has a bug that causes it to change the value of j. Any CPU register that is written by the interrupt handler must be saved before it is modified and restored before the handler exits.

    48. Implementation. The CPU implements interrupts by checking the interrupt request line at the beginning of execution of every instruction. If an interrupt request has been asserted, the CPU does not fetch the instruction pointed to by the PC; instead it sets the PC to the beginning of the interrupt handler. The starting address of the interrupt handler is usually given as a pointer.

    49. Interrupts and subroutines. The interrupt handler must return to the foreground program without disturbing the foreground program's operation. Most CPUs use the same basic mechanism for remembering the foreground program's PC as is used for subroutines: the interrupt mechanism puts the return address on a stack.

    50. Priorities and Vectors. Interrupts can be generalized to handle multiple devices and to provide more flexible definitions: - interrupt priorities: allow the CPU to recognize some interrupts as more important than others - interrupt vectors: allow the interrupting device to specify its handler

    51. Prioritized interrupts: - allow multiple devices to be connected - allow the CPU to ignore less important interrupt requests. The lower-numbered interrupt lines are given higher priority.

    52. Prioritized device interrupts most CPUs provide the priority number in binary form

    53. Changing the priority. How do we change the priority of a device? Simply by connecting it to a different interrupt request line. This requires hardware modification, so if priorities need to be changeable, programmable switches or a similar mechanism should be provided to make the change easy.

    54. Nested interrupts. Masking: the CPU stores the priority level of the interrupt currently being handled in an internal register. When a subsequent interrupt occurs, - its priority is checked against the priority register - the new request is acknowledged only if it has higher priority than the current one. When the interrupt handler exits, the priority register must be reset.
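The masking rule can be modeled in a few lines of C. This is a software sketch of behavior the slides attribute to the CPU hardware; the names `try_enter`, `leave_handler`, and `IDLE_PRIORITY` are hypothetical. It follows the convention from the prioritized-interrupts slide that lower numbers mean higher priority.

```c
#include <assert.h>
#include <stddef.h>

#define IDLE_PRIORITY 255   /* hypothetical: no interrupt is being handled */

/* Models the CPU's internal priority register. */
static unsigned char current_priority = IDLE_PRIORITY;

/* A device requests service at the given priority.
   Returns 1 and records the old level in *saved if the request is
   acknowledged, or 0 if it is masked by the interrupt in progress. */
int try_enter(unsigned char priority, unsigned char *saved)
{
    assert(saved != NULL && priority < IDLE_PRIORITY);
    if (priority >= current_priority)
        return 0;                 /* not more important: stays pending */
    *saved = current_priority;
    current_priority = priority;  /* handler now runs at this level */
    return 1;
}

/* On handler exit, reset the priority register to the previous level. */
void leave_handler(unsigned char saved)
{
    current_priority = saved;
}
```

For example, while a priority-2 handler runs, a priority-3 request is masked but a priority-1 request nests on top of it. Once the priority-1 handler leaves, the register returns to 2, so another priority-2 request is still held off.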

    55. Power-down interrupts. The highest-priority interrupt is normally called the nonmaskable interrupt, or NMI. The NMI cannot be turned off and is usually reserved for interrupts caused by power failures. When a dangerously low power supply is detected, the NMI interrupt handler can save critical state in nonvolatile memory and turn off I/O devices.

    56. Most CPUs provide a relatively small number of interrupt priority levels. More priority levels can be added with external logic. Alternatively, polling can be combined with prioritized interrupts to handle the devices efficiently.

    57. Using polling to share an interrupt over several devices
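One way to combine polling with a shared interrupt line is for the single handler to read each device's status register in a fixed order and service the first device that is requesting. The sketch below simulates this; the `status` array, `NUM_DEVICES`, and the function names are assumptions, and the polling order (lowest index first) acts as a software priority among devices on the same line.

```c
#include <assert.h>

#define NUM_DEVICES 3             /* hypothetical number of devices on one line */

static int status[NUM_DEVICES];   /* simulated status registers: 1 = wants service */
int serviced[NUM_DEVICES];        /* how many times each device was serviced */

/* A device raises the shared interrupt by setting its status bit. */
void raise_request(int dev)
{
    assert(dev >= 0 && dev < NUM_DEVICES);
    status[dev] = 1;
}

/* Shared handler: poll devices in index order and service the first
   one found requesting. Returns its index, or -1 if none was pending. */
int shared_handler(void)
{
    for (int dev = 0; dev < NUM_DEVICES; dev++) {
        if (status[dev]) {
            status[dev] = 0;      /* acknowledge the device */
            serviced[dev]++;
            return dev;
        }
    }
    return -1;
}
```

If devices 2 and 1 both request service before the handler runs, the handler services device 1 first because it is polled first, regardless of which request arrived earlier.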

    58. Ex3-8 I/O with Prioritized Interrupts A has priority 1 B priority 2 C priority 3.

    59. Interrupt vectors. A vector defines the interrupt handler that should service a request from a device. Figure: hardware structure to support interrupt vectors.

    60. Interrupt vectors. Additional interrupt vector lines run from the devices to the CPU. After a device's request is acknowledged, it sends its interrupt vector over those lines to the CPU. The CPU uses the vector number as an index into a table stored in memory; the table entry gives the address of the handler.

    61. Activity on the bus during a vectored interrupt

    62. Interrupt vectors. First, the device, not the CPU, stores its vector number, so a device can be given a new handler by changing the vector number it sends, without modifying the system software. Second, there is no fixed relationship between vector numbers and interrupt handlers.
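The table lookup can be sketched in C as an array of function pointers indexed by vector number. This models in software what the CPU does in hardware; `NUM_VECTORS`, the handler names, and `install_handler`/`dispatch` are hypothetical names for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 8                /* hypothetical table size */

typedef void (*handler_t)(void);

static handler_t vector_table[NUM_VECTORS];  /* entry = address of a handler */
int last_handled = -1;                       /* records which handler ran */

/* Two example handlers, standing in for real device drivers. */
void uart_handler(void)  { last_handled = 1; }
void timer_handler(void) { last_handled = 2; }

/* Bind a vector number to a handler; rebinding needs no other changes. */
void install_handler(unsigned vec, handler_t h)
{
    assert(vec < NUM_VECTORS);
    vector_table[vec] = h;
}

/* Use the vector sent by the device as an index into the table.
   Returns 1 if a handler ran, 0 if the vector has no handler. */
int dispatch(unsigned vec)
{
    if (vec >= NUM_VECTORS || vector_table[vec] == NULL)
        return 0;
    vector_table[vec]();
    return 1;
}
```

Because the binding lives in the table, the same vector number can be pointed at `uart_handler` at one time and `timer_handler` at another, which illustrates that vector numbers and handlers have no fixed relationship.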

    63. Implementation. Most modern CPUs implement both prioritized and vectored interrupts. Priorities determine which device is serviced first; vectors determine what routine is used to service the interrupt.

    64. Interrupt Overhead. The complete interrupt handling process: once a device requests an interrupt, some steps are performed by the CPU, some by the device, and others by software. The basic procedure is described below. 1. CPU: checks for pending interrupts at the beginning of an instruction and answers the highest-priority interrupt.

    65. Interrupt Overhead. 2. Device: receives the acknowledgment and sends the CPU its interrupt vector. 3. CPU: looks up the device handler address in the interrupt vector table, then saves the current PC and possibly other internal CPU state, such as general-purpose registers.

    66. Interrupt Overhead. 4. Software: the device driver may save additional CPU state, then performs the required operations, restores any saved state, and executes the interrupt return instruction. 5. CPU: the interrupt return instruction restores the PC and other automatically saved state, returning execution to the interrupted code.

    67. Performance penalty. Because an interrupt causes a change in the program counter, it incurs a branch penalty. If the interrupt automatically stores CPU registers, that action requires extra cycles. The interrupt also requires extra cycles to acknowledge the request and obtain the vector from the device.

    68. Performance penalty. The interrupt handler must save and restore any CPU registers that were not automatically saved by the interrupt. The interrupt return instruction incurs a branch penalty as well as the time required to restore the automatically saved state.

    69. Performance penalty. The time required for the hardware to respond to the interrupt and obtain the vector cannot be changed by the programmer. Careful programming can keep the number of registers used by an interrupt handler small, which reduces the time spent saving and restoring state, but this usually requires coding the handler in assembly language rather than a high-level language.

    70. Interrupts in ARM. There are two types of interrupts: fast interrupt requests (FIQs) and interrupt requests (IRQs). An FIQ takes priority over an IRQ. The interrupt table is kept in the bottom memory addresses, starting at location 0. The entries in the table contain subroutine calls to the appropriate handler.

    71. Interrupts in ARM. When responding to an interrupt, the CPU: saves the appropriate value of the PC to be used to return, copies the CPSR into an SPSR (saved program status register), forces bits in the CPSR to note the interrupt, and forces the PC to the appropriate interrupt vector.

    72. Interrupts in ARM. When leaving the interrupt handler, the handler should: restore the proper PC value, restore the CPSR from the SPSR, and clear interrupt disable flags.

    73. Interrupts in ARM. Worst-case latency to respond to an interrupt: 2 cycles to synchronize the external request, up to 20 cycles to complete the current instruction, 3 cycles for data abort, and 2 cycles to enter the interrupt handling state. This adds up to 27 clock cycles in the worst case; the best case is 4 clock cycles.
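The latency figures on this slide can be checked by summing the components. The short sketch below does the arithmetic; the function name and its parameters are assumptions for illustration, while the cycle counts are the ones listed on the slide.

```c
#include <assert.h>

/* Cycle counts taken from the slide. */
enum {
    SYNC_CYCLES       = 2,   /* synchronize the external request */
    MAX_INSTR_CYCLES  = 20,  /* longest wait for the current instruction */
    DATA_ABORT_CYCLES = 3,   /* data abort, when it applies */
    ENTRY_CYCLES      = 2    /* enter the interrupt handling state */
};

/* Hypothetical helper: latency for a given wait on the current
   instruction, with or without the data abort component. */
int arm_interrupt_latency(int instr_cycles, int data_abort)
{
    assert(instr_cycles >= 0 && instr_cycles <= MAX_INSTR_CYCLES);
    return SYNC_CYCLES
         + instr_cycles
         + (data_abort ? DATA_ABORT_CYCLES : 0)
         + ENTRY_CYCLES;
}
```

With all components at their maximum the sum is 2 + 20 + 3 + 2 = 27 cycles. With no wait for the current instruction and no data abort it is 2 + 2 = 4 cycles, which accounts for the 4 to 27 range.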

    74. Interrupts in SHARC. The SHARC supports three prioritized, vectored, maskable interrupts, each of which calls an interrupt handler subroutine.

    75. When processing an interrupt, the SHARC: outputs the interrupt vector address; pushes the current PC onto the PC stack; may push the ASTAT and MODE1 registers onto the status stack; sets the appropriate bit in the interrupt latch register; and changes the interrupt mask pointer to show the current interrupt nesting state.

    76. To return from an interrupt, the SHARC pops the return address off the PC stack and loads it into the PC, pops the status stack if appropriate, and clears the appropriate bits in the interrupt latch and mask registers.

    77. Interrupts in SHARC The interrupt vector table may be kept either in internal or external memory. vector table provides interrupt vectors for a number of actions, including: reset, the three external interrupts, internal DMA channels, timers, floating-point errors, user software interrupts.

    78. Interrupts in SHARC For most instructions, the latency for an external interrupt is four cycles. Some instructions require multiple cycles to finish and will delay interrupt handling; waiting for external memory may also delay handling.
