Real-time Signal Processing on Embedded Systems

Real-time Signal Processing on Embedded Systems. Advanced Cutting-edge Research Seminar I&III. Advances in Microprocessor Technology. Architectural improvements of microprocessors: pipelining, parallel processing exploiting ILP, superscalar, VLIW, SIMD.


Presentation Transcript


  1. Real-time Signal Processing on Embedded Systems Advanced Cutting-edge Research Seminar I&III

  2. Advances in Microprocessor Technology

  3. Architectural improvements of microprocessors • Pipelining • Parallel processing exploiting ILP • Superscalar • VLIW • SIMD

  4. Procedure of instruction execution on a processor • Instruction Fetch (IF) • fetches an instruction from main memory • Instruction Decode (ID) • decodes the fetched instruction • Execution (EX) • executes the decoded instruction • Memory Access (MA) • accesses main memory • Write Back (WB) • writes data back to registers

  5. Operation cycles on a processor • Single-cycle machine • This kind of machine executes all steps from IF to WB in one cycle. • Operation speed is determined by the slowest instruction (because every instruction must complete within one cycle). • Multi-cycle machine • This kind of machine executes an instruction over several cycles: IF ID EX MA WB

  6. Pipelined operation • can improve the throughput of instructions. • To realize pipelined operation, several techniques are required. [Figure: timing diagrams of successive instructions passing through the IF, ID, EX, MA, and WB stages]
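
The throughput gain from pipelining can be illustrated with a small cycle count. The sketch below assumes an ideal five-stage pipeline with no hazards and one stage per cycle; the function names are hypothetical and are not taken from the slides.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def multicycle_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles when each instruction finishes every stage before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles for an ideal pipeline: fill it once, then finish one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def pipeline_diagram(n_instructions: int) -> str:
    """Text timing diagram; each row is one instruction, shifted by one cycle."""
    rows = []
    for i in range(n_instructions):
        # Each stage label occupies three columns ("IF "), so shift by 3 per cycle.
        rows.append("   " * i + " ".join(STAGES))
    return "\n".join(rows)


if __name__ == "__main__":
    print(pipeline_diagram(3))
    print("multi-cycle:", multicycle_cycles(3), "cycles")
    print("pipelined:  ", pipelined_cycles(3), "cycles")
```

Under these assumptions, three instructions take 15 cycles on the multi-cycle machine and 7 cycles when pipelined.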

  7. Causes of pipeline hazards • Structural hazard: The hardware cannot support the combination of instructions in the pipeline. • Data hazard: A later instruction must wait for an earlier instruction to complete because it uses that instruction's result. • Control hazard: Whether an instruction is executed depends on the result of an earlier instruction.

  8. Structural hazard [Figure: a CPU (PC, instruction register, instruction decoder, ALU, registers) connected to a single memory; four instructions pass through the IF ID EX MA WB pipeline]

  9. Structural hazard [Figure: same diagram, next animation step]

  10. Structural hazard [Figure: same diagram, next animation step]

  11. Structural hazard [Figure: same diagram, next animation step]

  12. Structural hazard [Figure: the MA stage of one instruction and the IF stage of another instruction access the single memory at the same time, causing a conflict]

  13. Structural hazard • Resolve 1: stall the next instruction [Figure: same CPU and pipeline diagram]

  14. Structural hazard • Resolve 1: stall the next instruction [Figure: same diagram, next animation step]

  15. Structural hazard • Resolve 2: add another data bus to access the instruction memory. [Figure: the MA/IF conflict on the single memory]

  16. Structural hazard • Resolve 2: add another data bus to access the instruction memory. • Harvard Architecture [Figure: the CPU is connected to a separate instruction memory (Inst Mem) and data memory (Data Mem)]
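
The two resolutions can be compared with a small model. This is a sketch under simplifying assumptions that are not in the slides: cycles are numbered from 1, an instruction's MA stage occurs exactly three cycles after its IF, and only instructions flagged in `uses_memory` occupy the memory port during MA. The function names are hypothetical.

```python
def fetch_cycles(uses_memory, harvard):
    """Return the cycle in which each instruction is fetched.

    uses_memory[i] is True when instruction i accesses memory in its MA stage.
    With a single shared memory (harvard=False), IF is stalled in any cycle
    where an earlier instruction's MA stage is using the memory.
    """
    ma_offset = 3          # IF -> ID -> EX -> MA
    busy = set()           # cycles in which an MA stage occupies the memory
    fetches = []
    cycle = 0
    for is_mem in uses_memory:
        cycle += 1
        if not harvard:
            while cycle in busy:
                cycle += 1  # Resolve 1: stall the fetch
        fetches.append(cycle)
        if is_mem:
            busy.add(cycle + ma_offset)
    return fetches


def total_cycles(uses_memory, harvard):
    """Cycle in which the last instruction completes WB (IF + 4 more stages)."""
    fetches = fetch_cycles(uses_memory, harvard)
    return fetches[-1] + 4 if fetches else 0


if __name__ == "__main__":
    program = [True, False, False, False]   # only the first instruction uses MA
    print(fetch_cycles(program, harvard=False), total_cycles(program, harvard=False))
    print(fetch_cycles(program, harvard=True), total_cycles(program, harvard=True))
```

In this model the fourth instruction's fetch collides with the first instruction's MA and is delayed by one cycle on the single-memory machine, while the Harvard configuration needs no stall.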

  17. Data hazard • add $s0,$t0,$t1 ($s0=$t0+$t1) • sub $t2,$s0,$t3 ($t2=$s0-$t3) • Register values: $t0=5, $t1=4, $t2=3, $t3=2, $t4=1; $s0 to $s4 are all 0 [Figure: CPU block diagram (PC, instruction register, instruction decoder, ALU, registers, memory) with both instructions in the IF ID EX MA WB pipeline]

  18. Data hazard (same instructions and register values) • The add computes $s0=$t0+$t1.

  19. Data hazard (same instructions and register values) • The sub computes $t2=$s0-$t3 while the add is still in the pipeline.

  20. Data hazard (same instructions and register values) • The sub reads $s0 before the add has written its result back, so it uses the old value: -2=0-2.

  21. Data hazard • Waiting by stalls: consuming 3 cycles • add $s0,$t0,$t1 ($s0=$t0+$t1) • sub $t2,$s0,$t3 ($t2=$s0-$t3) [Figure: same CPU diagram and register values; the sub is delayed until the add has written $s0]

  22. Data hazard • Resolve: forwarding • add $s0,$t0,$t1 ($s0=$t0+$t1) • sub $t2,$s0,$t3 ($t2=$s0-$t3) [Figure: same CPU diagram and register values]

  23. Data hazard • Resolve: forwarding • The result of the add is forwarded to the ALU.

  24. Data hazard • Resolve: forwarding • The result of the add is forwarded to the ALU, so the sub computes $t2=9-$t3: 7=9-2.
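
The arithmetic on these slides can be replayed in a few lines. The sketch below is not a pipeline simulator: it only models which value of $s0 the sub instruction sees, using the register values read from the figure ($t0=5, $t1=4, $t3=2, $s0=0). The function name `run_add_sub` is hypothetical.

```python
def run_add_sub(forwarding: bool) -> dict:
    """Execute add $s0,$t0,$t1 followed by sub $t2,$s0,$t3.

    Without forwarding (and without stalls), the sub reads $s0 from the
    register file before the add's WB stage, so it sees the stale value.
    With forwarding, the add's ALU result is passed directly to the sub.
    """
    regs = {"t0": 5, "t1": 4, "t2": 3, "t3": 2, "t4": 1,
            "s0": 0, "s1": 0, "s2": 0, "s3": 0, "s4": 0}

    add_result = regs["t0"] + regs["t1"]        # EX stage of the add

    if forwarding:
        s0_operand = add_result                 # value taken from the ALU output
    else:
        s0_operand = regs["s0"]                 # stale value from the register file

    sub_result = s0_operand - regs["t3"]        # EX stage of the sub

    regs["s0"] = add_result                     # WB stage of the add
    regs["t2"] = sub_result                     # WB stage of the sub
    return regs


if __name__ == "__main__":
    print(run_add_sub(forwarding=False)["t2"])  # stale operand: 0 - 2
    print(run_add_sub(forwarding=True)["t2"])   # forwarded operand: 9 - 2
```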

  25. Control hazard • An instruction sequence including a branch: • add $s0,$t0,$t1 ($s0=$t0+$t1) • beq $s1,$s2, 40 (if($s1==$s2){goto 40}) • or $s3,$s4,$t2 ($s3=$s4|$t2) • ※ In this explanation, the PC uses word addresses for simplicity. [Figure: CPU block diagram (PC, instruction register, instruction decoder, ALU, registers) with the three instructions in the IF ID EX MA WB pipeline; PC: 10]

  26. Control hazard (same instruction sequence) [Figure: same diagram, next animation step]

  27. Control hazard (same instruction sequence) [Figure: PC: 11]

  28. Control hazard (same instruction sequence) • The PC value for the next instruction depends on the branch condition. • Branch is taken: PC=40 • Not taken: PC=12 [Figure: PC: 12]

  29. Control hazard • Resolve 1: stall • add $s0,$t0,$t1 ($s0=$t0+$t1) • beq $s1,$s2, 40 (if($s1==$s2){goto 40}) • or $s3,$s4,$t2 ($s3=$s4|$t2) • 2-cycle stall • The number of required stall cycles is determined by the architecture.

  30. Control hazard • Resolve 1: stall (same instruction sequence) • 1-cycle stall • This applies if the processor can calculate the branch target address at the ID stage.

  31. Control hazard • Resolve 2: branch prediction • add $s0,$t0,$t1 ($s0=$t0+$t1) • beq $s1,$s2, 40 (if($s1==$s2){goto 40}) • or $s3,$s4,$t2 ($s3=$s4|$t2) • In this example, the next PC is predicted as if the branch is always untaken. [Figure: CPU block diagram; PC: 10]

  32. Control hazard • Resolve 2: branch prediction (same instruction sequence, branch predicted untaken) [Figure: PC: 11]

  33. Control hazard • Resolve 2: branch prediction (same instruction sequence, branch predicted untaken) [Figure: PC: 12]

  34. Control hazard • Resolve 2: branch prediction (same instruction sequence) • If the prediction misses, in other words, if the branch is taken, a stall occurs. [Figure: PC: 40]
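
A rough cycle count shows why predicting "always untaken" can beat stalling on every branch. The sketch assumes an otherwise ideal five-stage pipeline and assumes that a misprediction costs the same number of cycles as the stall it replaces; both the penalty values and the function names are illustrative assumptions, not figures from the slides.

```python
def ideal_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Cycles for a hazard-free pipeline."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def cycles_with_stall(n_instructions: int, n_branches: int, stall_cycles: int) -> int:
    """Resolve 1: every branch stalls the pipeline, taken or not."""
    return ideal_cycles(n_instructions) + n_branches * stall_cycles


def cycles_predict_untaken(n_instructions: int, n_taken: int, penalty_cycles: int) -> int:
    """Resolve 2: only taken branches (mispredictions) pay the penalty."""
    return ideal_cycles(n_instructions) + n_taken * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 100 instructions, 20 branches, 5 of them taken.
    print(cycles_with_stall(100, n_branches=20, stall_cycles=2))
    print(cycles_predict_untaken(100, n_taken=5, penalty_cycles=2))
```

For the hypothetical workload in the usage lines, the stalling scheme takes 144 cycles and the predict-untaken scheme takes 114.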

  35. Control hazard • More practical scheme: dynamic branch prediction • n-bit counter-based prediction: the lower i bits of the address of a branch instruction select an entry in the Branch History Table, and each entry is an n-bit saturating up/down counter.

  36. 1-bit counter-based prediction • State 1: predict branch will be taken • State 0: predict branch will be untaken • The state becomes 1 when the branch is taken and 0 when the branch is untaken.

  37. 2-bit counter-based prediction • States 11 and 10: predict branch will be taken • States 01 and 00: predict branch will be untaken • The counter counts up when the branch is taken and down when the branch is untaken, saturating at 11 and 00. • This scheme is adopted in Intel Pentium, Sun UltraSPARC, MIPS R10000, etc.
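
The counter-based predictors can be sketched as follows. This assumes the usual saturating-counter convention (predict taken when the counter is in its upper half, count up on taken and down on untaken) and a table whose counters all start at 0; the class name `CounterPredictor`, the table size, and the branch address are hypothetical.

```python
class CounterPredictor:
    """Branch History Table of n-bit saturating up/down counters."""

    def __init__(self, counter_bits: int = 2, index_bits: int = 4):
        self.max_count = (1 << counter_bits) - 1       # e.g. 3 for 2 bits
        self.threshold = 1 << (counter_bits - 1)       # upper half predicts taken
        self.index_mask = (1 << index_bits) - 1        # lower i bits of the address
        self.table = [0] * (1 << index_bits)

    def predict(self, branch_address: int) -> bool:
        """Return True when the branch is predicted taken."""
        return self.table[branch_address & self.index_mask] >= self.threshold

    def update(self, branch_address: int, taken: bool) -> None:
        """Count up on taken, down on untaken, saturating at both ends."""
        i = branch_address & self.index_mask
        if taken:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = max(self.table[i] - 1, 0)


def count_mispredictions(predictor, branch_address, outcomes):
    """Predict, compare with the real outcome, then train the predictor."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_address) != taken:
            misses += 1
        predictor.update(branch_address, taken)
    return misses


if __name__ == "__main__":
    # A loop-closing branch: taken three times, then untaken on exit, three times over.
    outcomes = [True, True, True, False] * 3
    print(count_mispredictions(CounterPredictor(counter_bits=1), 0x40, outcomes))
    print(count_mispredictions(CounterPredictor(counter_bits=2), 0x40, outcomes))
```

On this pattern the 1-bit predictor mispredicts twice per loop execution (on exit and again on re-entry), while the 2-bit predictor, once warmed up, mispredicts only on the exit.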

  38. Control hazard • Resolve 3: delayed branch • add $s0,$t0,$t1 ($s0=$t0+$t1) • beq $s1,$s2, 40 (if($s1==$s2){goto 40}) • Inserted instruction • or $s3,$s4,$t2 ($s3=$s4|$t2) • An instruction that has no dependency is inserted after the branch. [Figure: CPU block diagram with the four instructions in the IF ID EX MA WB pipeline; PC: 11]

  39. Control hazard • Resolve 3: delayed branch (same instruction sequence) • An instruction that has no dependency is inserted. [Figure: PC: 12]

  40. Control hazard • Resolve 3: delayed branch (same instruction sequence) • An instruction that has no dependency is inserted. • The instruction at the determined address is executed. [Figure: PC: 13 or 40]

  41. Exploiting ILP (Instruction Level Parallelism) • Superscalar: issuing multiple instructions per cycle with hardware support. • Advantage: binary compatibility. • VLIW: issuing multiple instructions per cycle with compiler support. • Advantage: simple hardware.

  42. Types of data dependence • True data dependence (RAW: Read After Write): difficult to remove • Example: i1: r2=r1+r3, i2: r4=r2+1 • Anti-dependence (WAR: Write After Read) and output dependence (WAW: Write After Write): can be removed by register renaming; they are called artificial dependences • Example: i1: r1=r2+r3, i2: r2=r4+1 (anti-dependence on i1 through r2), i3: r1=r4+2 (output dependence on i1 through r1)
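
The three dependence types can be checked mechanically for a pair of instructions. In this sketch an instruction is represented as a hypothetical `(destination, [sources])` tuple, and the check is purely pairwise: it ignores any writes by instructions in between.

```python
def classify(earlier, later):
    """Return the dependences of `later` on `earlier`.

    Each instruction is a tuple (dst, srcs), where dst is the register
    written and srcs is the list of registers read.
    """
    e_dst, e_srcs = earlier
    l_dst, l_srcs = later
    kinds = []
    if e_dst in l_srcs:
        kinds.append("RAW")   # true dependence: later reads what earlier writes
    if l_dst in e_srcs:
        kinds.append("WAR")   # anti-dependence: later overwrites what earlier reads
    if l_dst == e_dst:
        kinds.append("WAW")   # output dependence: both write the same register
    return kinds


if __name__ == "__main__":
    # True dependence example: i1: r2=r1+r3, i2: r4=r2+1
    print(classify(("r2", ["r1", "r3"]), ("r4", ["r2"])))

    # Artificial dependence example: i1: r1=r2+r3, i2: r2=r4+1, i3: r1=r4+2
    i1, i2, i3 = ("r1", ["r2", "r3"]), ("r2", ["r4"]), ("r1", ["r4"])
    print(classify(i1, i2), classify(i1, i3), classify(i2, i3))
```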

  43. Basic Architecture of Superscalar Processor [Figure: block diagram with three parts. Frontend: instruction cache, instruction decode, branch prediction, register renaming, dispatch. Ex-core: instruction window, issue, function units. Backend: reorder buffer, commit. Registers and data cache are also shown.]

  44. Basic function of Frontend • provides enough instructions. • predicts next instruction address if branch instruction appears. • resolves artificial dependences by register renaming. • analyzes true data dependence after register renaming. • transfers instructions after the above operations. • This operation is called “dispatch”.
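
Register renaming, as performed by the frontend, can be sketched with a simple mapping table. The sketch assumes an unlimited supply of physical registers named `p0`, `p1`, ... and never frees them; real hardware uses a finite free list. The `(destination, [sources])` instruction format and the function name `rename` are hypothetical.

```python
def rename(instructions):
    """Rename architectural registers to fresh physical registers.

    Sources are looked up in the current mapping, so true (RAW) dependences
    are preserved. Every destination gets a new physical register, so
    anti (WAR) and output (WAW) dependences disappear.
    """
    mapping = {}      # architectural register -> current physical register
    counter = 0

    def fresh():
        nonlocal counter
        name = f"p{counter}"
        counter += 1
        return name

    renamed = []
    for dst, srcs in instructions:
        new_srcs = []
        for src in srcs:
            if src not in mapping:
                mapping[src] = fresh()   # value that was live before the sequence
            new_srcs.append(mapping[src])
        mapping[dst] = fresh()           # each write gets its own register
        renamed.append((mapping[dst], new_srcs))
    return renamed


if __name__ == "__main__":
    # i1: r1=r2+r3, i2: r2=r4+1, i3: r1=r4+2
    program = [("r1", ["r2", "r3"]), ("r2", ["r4"]), ("r1", ["r4"])]
    for instruction in rename(program):
        print(instruction)
```

After renaming, the three instructions write three different physical registers and no later instruction writes a register that an earlier one reads, so they can be scheduled independently.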

  45. Basic function of Ex-core • finds as many independent instructions as possible among those stored in the “instruction window”. • In this operation, dynamic scheduling is performed to satisfy several restrictions: data dependence, resources, predefined priority, etc. • executes independent instructions in parallel. • The operation that transfers an instruction to a function unit is called “issue”.

  46. Basic function of Backend • updates the processor state. • Results obtained out of order are reordered into program order. • The processor state is updated precisely. • Updating the processor state based on an execution result is called “commit”. • Removal of an instruction from the processor is called “retire”.

  47. Dynamic instruction scheduling • Instruction scheduling means determining the order in which instructions are issued and when they are issued. • In superscalar processors, dynamic instruction scheduling is performed on the instructions stored in the instruction buffer. • In the following slides, dynamic scheduling is explained using several types of processors: a 1-way in-order processor, an i-way in-order processor, and an i-way out-of-order processor.

  48. 1-way in-order issue • At most one instruction is issued per cycle. • The size of the instruction window is 1, because no subsequent instruction can be issued if an instruction cannot be issued. • Only true and output dependences need to be checked, because anti-dependence is always resolved.

  49. Control by R flag • The R flag is used to check true and output dependences. • Each register holds an R flag and a value. • An instruction consists of op, dst, src1, and src2, where dst, src1, and src2 are register numbers. • R == false means the register is reserved but the result has not been stored yet. In this case, the operand is not available. • The instruction is issued only when R(dst) == true && R(src1) == true && R(src2) == true. (This condition is called “ready”.)

  50. Update sequence of the R flag • The R bit of the destination becomes false when an instruction is issued. • The R bit of the destination becomes true when a result is stored in the destination. • By the above update: • Instructions using unavailable registers as source registers are not issued; true dependence is resolved. • Instructions using an unavailable register as a destination register are not issued; output dependence is resolved. • In practice, resource restrictions must also be satisfied to issue instructions, in addition to the dependency check. In this lecture, only the restriction on function units is considered, to simplify the discussion.
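
The R-flag rules can be written down as a small scoreboard. This is a minimal sketch covering only the dependency check; function-unit availability is not modeled, and the class and method names are hypothetical.

```python
class RFlagScoreboard:
    """One R flag per register; True means the value is available."""

    def __init__(self, register_names):
        self.ready = {name: True for name in register_names}

    def can_issue(self, dst, src1, src2):
        """The "ready" condition: R(dst), R(src1) and R(src2) are all true."""
        return self.ready[dst] and self.ready[src1] and self.ready[src2]

    def issue(self, dst, src1, src2):
        """Issue if ready, reserving the destination; return whether it issued."""
        if not self.can_issue(dst, src1, src2):
            return False
        self.ready[dst] = False      # R becomes false at issue
        return True

    def write_back(self, dst):
        """R becomes true when the result is stored in the destination."""
        self.ready[dst] = True


if __name__ == "__main__":
    sb = RFlagScoreboard(["r1", "r2", "r3", "r4"])
    print(sb.issue("r2", "r1", "r3"))      # r2 = r1 op r3 is issued
    print(sb.can_issue("r4", "r2", "r1"))  # blocked: reads r2 (true dependence)
    print(sb.can_issue("r2", "r1", "r3"))  # blocked: writes r2 (output dependence)
    sb.write_back("r2")
    print(sb.can_issue("r4", "r2", "r1"))  # r2 is available again
```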
