
Instruction Set

Presentation Transcript


  1. Instruction Set Ali Azarpeyvand Advanced Computer Architecture

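  2. Review, #1 • Amdahl’s Law: $\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$ • CPI Law: $\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$ • Execution time is the REAL measure of computer performance! • Good products are created when you have: • Good benchmarks • Good ways to summarize performance • Die cost goes roughly with $(\text{die area})^4$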
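As a small illustration of the two review formulas on this slide, the following Python sketch evaluates Amdahl's Law and the CPI law. The function names and the sample numbers are assumptions made for this example; they do not come from the slides.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of execution time is enhanced."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # Unenhanced part runs at the old speed; enhanced part is divided by its speedup.
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instructions x cycles/instruction x seconds/cycle."""
    if clock_rate_hz <= 0.0:
        raise ValueError("clock_rate_hz must be positive")
    return instruction_count * cpi / clock_rate_hz
```

For instance, speeding up 40% of the execution time by a factor of 10 gives `amdahl_speedup(0.4, 10.0)`, roughly 1.56: the unenhanced 60% dominates the result.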

  3. Computing Targets (review) • Desktop computing • Servers • Embedded applications

  4. Outline • Taxonomy of instruction set alternatives • Some instruction set measurements • Instruction set architecture of processors not aimed at desktops or servers: digital signal processors (DSPs) and media processors. • A sample RISC architecture

  5. Instruction Set Architecture (ISA) software instruction set hardware

  6. Organization of an Instruction • Operations: Arithmetic, Logical, Shift, Load (from MM), Store (to MM), Move (reg-reg), Move (MM-MM), I/O (if I/O is not memory-mapped) • Control transfer: Unconditional (jump), Conditional (branch), Call, Return • Operation details: 1) Length of operands 2) Shift/rotate: direction, amount 3) Branch condition • Instruction length: e.g., MIPS: 4 bytes; e.g., VAX: 1-37 bytes • Number of addresses: 0 address, 1 address, 2 address, 3 address, implied • Addressing modes: immediate, absolute, computed • Operand locations: Instruction, Register, Memory

  7. Instruction Set Design Objective #1 Code size (code density) : • Depends on: • size of MM/cache • access time of cache (on-chip/off-chip) • CPU-MM bandwidth • Frequently used (written down) instructions should be short • Implies variable-length instructions

  8. Instruction Set Design Objective #2 Execution speed (performance) : • Only frequently executed instructions should be included in the instruction set • Infrequently executed instructions slow down the others • Complex and long instructions tend to be used infrequently • Frequently executed instructions should be fast • Pipelining should be made as easy as possible • Overlapped execution lowers CPI value • Single instruction length, simple instruction formats, and few addressing modes for easy decoding

  9. Instruction Set Design Objective #3 Size and complexity of hardware (ALU, CU) • Implementing infrequently executed instructions ties down hardware that is rarely used, and could be used for some other purpose with greater advantage • Some instructions should not be included in the instruction set

  10. Instruction Set Design Objective #4 Instruction set for programming languages • Needs of a human programmer (less important today) • orthogonality (each operand can be specified independently of the others) • consistency (being able to predict the remainder of an architecture given partial knowledge of the system) • Needs of an optimizing compiler • Simple instructions are more suitable for code optimizations • Optimizing compilers try to find the shortest or fastest code sequence that implements the semantics of a HLL program. To make code reorganization tractable, an instruction set is needed that makes: • the size of each instruction easy to calculate; • the execution time of each instruction easy to calculate; • the interactions between instructions easy to figure out. • ISA features such as complex addressing modes, variable length instructions, special-purpose registers provide too many ways of doing the same thing and lead to combinatorial explosion

  11. Evolution of Instruction Sets • Major advances in computer architecture are typically associated with landmark instruction set designs • Ex: Stack vs GPR (System 360) • Design decisions must take into account: • technology • machine organization • programming languages • compiler technology • operating systems • And they in turn influence these

  12. Classifying • Stack • Accumulator • Register – memory • Register – register (load – store)

  13. Architectures

  14. Register Advantages • Registers, like other forms of storage internal to the processor, are faster than memory. • Registers are more efficient for a compiler to use than other forms of internal storage. • More importantly, registers can be used to hold variables. • How many registers?

  15. Register Usage (Compiler) • for expression evaluation • for parameter passing • to be allocated to hold variables. • GPR architectures: • Two or three operands • how many of the operands may be memory addresses in ALU instructions

  16. Registers versus Cache • Similarities • Both are small, fast, and expensive (flip-flops) • Both are used to increase execution speed of CPU • Differences • Registers are visible in ISA; caches are not (except for instructions for invalidation, prefetch, or flushing) • Number of registers is fixed by instruction format; size of cache is easily changeable • Registers have higher BW: 3 words/cycle, and are random-access; caches have lower BW: 1 word/cycle, and are associative • Register access time is fixed; cache access time is statistical • Registers require fewer bits to address; caches require full memory addresses • Registers create no I/O problems; caches do

  17. Organization of Registers • One general-purpose set (all interchangeable, “typeless”) • One general-purpose set (a few with dedicated uses) • PDP-11: eight 16-bit registers (R6: stack pointer, R7: PC) • VAX 11/780: sixteen 32-bit registers (four special-purpose, R14: stack pointer, R15: PC) • Two sets • Motorola 68000: eight 32-bit data, eight 32-bit address • IBM 370: sixteen 32-bit integer, four 64-bit FP • DLX, MIPS: 31 32-bit integer, 32 32-bit FP • Three sets • CDC 6600: eight 18-bit integer, eight 18-bit address, eight 60-bit FP • Many registers with dedicated use • Intel 80x86

  18. Number of Memory operands

  19. GPR Architectures

  20. Notations for Information Representation • 64 bits = 8 bytes = 2 words = 1 doubleword • Q: How do we number these various units of information in a consistent manner? • Example digit string: 9 6 2 1 7 6 6, with the Most Significant Digit (MSD) at the “Big End” and the Least Significant Digit (LSD) at the “Little End” • “Big End”-ian numbering: 0 1 2 3 4 5 6 (position 0 is the MSD) • “Little End”-ian numbering: 6 5 4 3 2 1 0 (position 0 is the LSD)
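One common application of these two numbering conventions is the order in which the bytes of a multi-byte integer are stored at increasing memory addresses. The sketch below uses Python's built-in `int.to_bytes` and `int.from_bytes`; the helper name `byte_layout` and the sample value are assumptions for illustration.

```python
def byte_layout(value: int, size: int, byteorder: str) -> list[str]:
    """Return the bytes of `value` in increasing address order, as hex strings."""
    return [f"{b:02X}" for b in value.to_bytes(size, byteorder=byteorder)]


word = 0x0A0B0C0D

# Big-endian: the most significant byte sits at the lowest address.
big = byte_layout(word, 4, "big")        # ['0A', '0B', '0C', '0D']

# Little-endian: the least significant byte sits at the lowest address.
little = byte_layout(word, 4, "little")  # ['0D', '0C', '0B', '0A']

# Reading the same four bytes back with the wrong convention gives a different value.
misread = int.from_bytes(word.to_bytes(4, "big"), "little")  # 0x0D0C0B0A
```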

  21. Alignment of Words in Memory • Figure: a memory controller connected to four 8-bit-wide memory banks (Mem Bank 00, 01, 10, 11), together delivering 32 bits • CPU accesses a 32-bit word of data starting at byte address x…x00 • Such an address (multiple of 32[b]/8[b/B] = 4[B]) is called word-aligned • Memory controller is simple and fast, data available in one cycle • CPU accesses a 32-bit word of data starting at byte address 01111 • Byte addresses are 01111, 10000, 10001, 10010 (misaligned address) • This doubles the access time of the word • Requiring aligned addresses results in a simpler memory controller and faster execution • It costs some loss of storage and adds complexity in code generators
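The alignment rule on this slide can be sketched in a few lines of Python. This is a minimal model that assumes a 4-byte word and counts how many aligned words an access touches; the function names are hypothetical, and a real memory controller is of course not structured this way.

```python
WORD_BYTES = 4  # 32-bit words, as on the slide


def is_word_aligned(address: int, size: int = WORD_BYTES) -> bool:
    """An access is aligned when its start address is a multiple of its size."""
    return address % size == 0


def bytes_touched(address: int, size: int = WORD_BYTES) -> list[int]:
    """Byte addresses covered by an access of `size` bytes starting at `address`."""
    return list(range(address, address + size))


def words_spanned(address: int, size: int = WORD_BYTES) -> int:
    """Number of aligned memory words the access overlaps (1 if aligned)."""
    first_word = address // WORD_BYTES
    last_word = (address + size - 1) // WORD_BYTES
    return last_word - first_word + 1
```

With the slide's misaligned example, `bytes_touched(0b01111)` covers byte addresses 01111 through 10010, and `words_spanned(0b01111)` is 2, which matches the doubled access time.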

  22. Sub-Word Accesses • Figure: CPU register file (32 bits) connected through the memory controller to four 8-bit-wide memory banks (Mem Bank 00, 01, 10, 11) • Byte operand in register is usually the rightmost byte of register • Byte may come from any of the four memory banks • Source of complications

  23. Memory Addressing • Byte Addressed • Big Endian, Little Endian • Aligned and misaligned access of objects

  24. Addressing Modes • Addresses: • Constants • Registers • Locations in memory • Immediates are also included • What is the effective address? • PC-relative addressing • Effects of addressing modes: • Can reduce instruction counts • Add to the complexity of building a computer • May increase average CPI

  25. Addressing Modes • We can’t directly refer to data values, only their addresses • Except for immediate operands • Register deferred and direct addressing modes can be synthesized from displacement addressing mode • Notation: R: the register file; M: the memory address space; d: the size of the data item being accessed (1, 2, 4, 8 bytes)
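The claim that register deferred and direct addressing can be synthesized from displacement addressing can be sketched as follows. The sketch assumes a register that always reads as zero (as R0 does in the MIPS-style machine described later in the deck); the function names and the list-based register file are illustrative assumptions.

```python
def displacement_ea(R: list[int], base: int, disp: int) -> int:
    """Displacement mode: effective address = R[base] + disp."""
    return R[base] + disp


def register_deferred_ea(R: list[int], base: int) -> int:
    """Register deferred mode is displacement mode with a displacement of 0."""
    return displacement_ea(R, base, 0)


def direct_ea(R: list[int], address: int, zero_reg: int = 0) -> int:
    """Direct (absolute) mode is displacement mode off a register holding 0.

    Assumes R[zero_reg] == 0, e.g. a hardwired zero register.
    """
    return displacement_ea(R, zero_reg, address)


def load(M: bytes, ea: int, d: int) -> bytes:
    """Fetch the d-byte data item at effective address ea from memory M."""
    return M[ea:ea + d]
```

For example, with `R = [0, 100, 200]`, `displacement_ea(R, 1, 8)` is 108, `register_deferred_ea(R, 2)` is 200, and `direct_ea(R, 64)` is 64.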

  26. Frequency of addressing modes

  27. Displacement values Range of displacements used?

  28. Immediate mode

  29. Immediate Values (number of bits)

  30. Conclusions (up to now) • Modes: • displacement, • immediate, • and register indirect. • Displacement mode: • 12 to 16 bits • Immediate field • 8 to 16 bits

  31. Operand Types • Integers: • 2’s complement • Characters: • ASCII • Unicode, UTF-8 • Floating point: • IEEE standard 754 (short seminar on Unicode, IEEE) • Strings • Packed decimals

  32. Data size

  33. Instruction Categories

  34. Top 10 Instructions in 80x86

  35. Control Transfer Instructions Terminology • BTA (Branch Target Address): The destination address of the branch • The BTA is static if it is always the same during execution • The BTA is dynamic if it can vary during a single execution of a program (procedure return, switch statements are major examples) • Branch is taken if next instruction to be executed is at address BTA • Branch is not taken if next instruction to be executed is the one following the branch instruction (“fall-through”) • Branch outcome: whether the branch is taken or not taken • Forward branch: BTA > (PC), where (PC) is the address of the branch instruction • Backward branch: BTA < (PC) • An unconditional branch is always taken
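The terminology above can be expressed as a short Python sketch that classifies a single executed branch. The function name and the string labels are assumptions; the slide does not define the case BTA = (PC), so the sketch reports it separately as "self".

```python
def classify_branch(branch_pc: int, bta: int, next_pc: int) -> tuple[str, str]:
    """Classify one executed branch by direction and outcome.

    branch_pc: address of the branch instruction, written (PC) on the slide
    bta:       branch target address
    next_pc:   address of the instruction actually executed next
    """
    if bta > branch_pc:
        direction = "forward"
    elif bta < branch_pc:
        direction = "backward"
    else:
        direction = "self"  # not covered by the slide's definitions

    # Taken means control went to the BTA rather than falling through.
    outcome = "taken" if next_pc == bta else "not taken"
    return direction, outcome
```

A loop-closing branch at address 0x40 that jumps back to 0x20 is classified as `("backward", "taken")`, while a skipped forward branch from 0x40 to 0x60 that continues at 0x44 is `("forward", "not taken")`.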

  36. Code Generation Examples for Branches • Register r3 contains y • Register r4 contains z • Register r5 contains a • Register r6 contains b • Register r7 contains x

      Source: if (x > 0) y += z; else y -= z;

               blez r7, L18
               addu r3, r3, r4
               j    L33
          L18: subu r3, r3, r4
          L33:

      Source: while (a < b) { a++; b--; x++; }

               j    L33
          L34: addu r5, r5, 1
               addu r6, r6, -1
               addu r7, r7, 1
          L33: slt  r2, r5, r6
               bne  r2, r0, L34

  37. Classification of Branches Classifying branches into these four groups permits us to compute some of the dynamic frequencies if some others have been measured. Rule of thumb: Backward branches tend to be taken, forward branches tend not to be taken.

  38. Computing Branch Frequencies Assume that 75% of all branches are forward, and that 55% of all branches are taken. If 80% of all backward branches are taken, what is the probability that a taken branch is a forward branch?
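One way to work through this exercise is to split the taken branches into backward and forward parts and then apply the definition of conditional probability. The sketch below does this with exact fractions; the function and parameter names are assumptions for illustration.

```python
from fractions import Fraction


def forward_given_taken(p_forward: Fraction,
                        p_taken: Fraction,
                        p_taken_given_backward: Fraction) -> Fraction:
    """P(forward | taken) from the three measured frequencies."""
    p_backward = 1 - p_forward                                # 1/4 of all branches
    p_backward_and_taken = p_backward * p_taken_given_backward  # 1/5 of all branches
    # Every taken branch is either forward or backward.
    p_forward_and_taken = p_taken - p_backward_and_taken        # 7/20 of all branches
    return p_forward_and_taken / p_taken


answer = forward_given_taken(Fraction(75, 100), Fraction(55, 100), Fraction(80, 100))
# answer == Fraction(7, 11), i.e. about 63.6% of taken branches are forward
```

The same intermediate values also show that only 0.35 / 0.75, about 47%, of forward branches are taken, which is consistent with the rule of thumb on the previous slide.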

  39. Frequency of Instructions for Control Flow • Conditional branches • Jumps • Procedure calls • Procedure returns

  40. Addressing Modes for Control Instructions • Destination must be specified (compile time) • Absolute • PC-relative (displacement) • Target is usually near, so fewer bits are needed • Position-independent • Target unknown at compile time • Returns • Case or switch • Virtual functions • Higher-order functions
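To illustrate why a nearby target needs fewer bits, the sketch below counts the bits of a two's-complement displacement field for a PC-relative branch. It assumes the displacement is measured in instructions from the branch's own address with 4-byte instructions; real ISAs differ in the reference point (often the following instruction) and in scaling, and the function names are hypothetical.

```python
def signed_bits_needed(n: int) -> int:
    """Smallest two's-complement width that can represent n."""
    if n >= 0:
        return n.bit_length() + 1     # magnitude bits plus a sign bit
    return (~n).bit_length() + 1      # e.g. -128 fits in 8 bits


def displacement_bits(branch_pc: int, target: int, instr_bytes: int = 4) -> int:
    """Bits needed for the PC-relative displacement, counted in instructions."""
    offset = (target - branch_pc) // instr_bytes
    return signed_bits_needed(offset)
```

A branch from 0x1000 to 0x1040 is 16 instructions forward and needs only 6 bits, whereas encoding the absolute target 0x1040 would need far more.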

  41. Bits for branch displacement

  42. Conditional Branch Options • Typical set of condition codes (e.g., Motorola 680x0) • NegativeResult, ZeroResult, ArithmeticOverflow, CarryOut • Many RISC machines do not use condition codes (e.g., MIPS, Alpha) • Magnitude comparisons are done with explicit COMPARE instructions that put their results into named registers
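The two options can be contrasted in a small Python model: an add that sets the four condition codes named above, and a MIPS-style set-on-less-than that writes its result to a register instead. This is a sketch for fixed-width two's-complement integers; the function names and the returned dictionary are assumptions, not any processor's actual interface.

```python
def add_with_flags(a: int, b: int, bits: int = 32) -> tuple[int, dict[str, bool]]:
    """Add two `bits`-wide values and compute N, Z, V, C condition codes."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b
    result = full & mask
    flags = {
        "N": bool(result & sign),   # NegativeResult: sign bit of the result
        "Z": result == 0,           # ZeroResult
        # ArithmeticOverflow: operands share a sign that the result does not.
        "V": bool(~(a ^ b) & (a ^ result) & sign),
        "C": full > mask,           # CarryOut of the most significant bit
    }
    return result, flags


def to_signed(x: int, bits: int = 32) -> int:
    """Interpret a `bits`-wide pattern as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x & (1 << (bits - 1)) else x


def slt(a: int, b: int, bits: int = 32) -> int:
    """Explicit compare: returns 1 in a register if a < b (signed), else 0."""
    return 1 if to_signed(a, bits) < to_signed(b, bits) else 0
```

For example, `add_with_flags(0x7FFFFFFF, 1)` yields 0x80000000 with N and V set, while `slt(0xFFFFFFFF, 0)` returns 1 because the first operand is -1 when read as a signed value.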

  43. Comparison Types

  44. Instruction Encoding • Factors: • Registers • Addressing modes • Size of these in instruction • Length of instructions: • Variable • Fixed • Hybrid • The length of 80x86 instructions varies between 1 and 17 bytes. • 16-bit and 32-bit instructions: ARM Thumb and MIPS MIPS16 (code size reduction of up to 40%).

  45. An Ideal Machine • In Section 2.2—Use general-purpose registers with a load-store architecture. • In Section 2.3—Support these addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect. • In Section 2.5—Support these data sizes and types: 8-, 16-, 32-bit, and 64-bit integers and 64-bit IEEE 754 floating-point numbers. • In Section 2.7—Support these simple instructions: load, store, add, subtract, move register-register, and, shift. • In Section 2.9—Compare equal, compare not equal, compare less, branch (with a PC-relative address at least 8 bits long), jump, call, and return. • In Section 2.10—Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size. • In Section 2.11—Provide at least 16 general-purpose registers, and be sure all addressing modes apply to all data transfer instructions, and aim for a minimalist instruction set.

  46. A "Typical" RISC (MIPS64) • 32-bit fixed format instruction (3 formats) • 32 64-bit GPRs (R0 contains zero) • 32 double-precision floating-point registers (F0-F31) • reg-reg arithmetic instructions • Single address mode for load/store: base + displacement • no indirection • Simple branch conditions • Delayed branch • See: SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

  47. Example: MIPS

  48. Example Instructions (Load / Store)

  49. Arithmetic / Logic

  50. Control Flow Examples
