
Pipelining


welchp

Presentation Transcript


  1. Pipelining

  2. Pipelining. [Figure: two timing diagrams plotting stages s1, s2, s3 against time, one without a pipeline and one with a pipeline.]

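  3. Pipelining
     s: number of stages; n: number of tasks; t: time per stage.
     [Figure: timing diagrams of stages s1, s2, s3 against time, without a pipeline and with a pipeline.]
     Without pipeline: $T_1 = s \cdot t \cdot n$
     With pipeline: $T_s = s \cdot t + (n-1) \cdot t$
     $\text{Speedup} = \frac{T_1}{T_s} = \frac{s \cdot n \cdot t}{s \cdot t + (n-1) \cdot t} = \frac{s}{s/n + (1 - 1/n)}$
     As $n \to \infty$, $\text{Speedup} \to s$.
     $\text{Throughput} = \frac{n}{T_s}$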
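The formulas on this slide can be checked numerically. The following is a minimal sketch; the function name `pipeline_metrics` is a hypothetical helper, not something from the slides.

```python
def pipeline_metrics(s, n, t):
    """Return (T1, Ts, speedup, throughput) for s stages, n tasks, time t per stage."""
    t1 = s * t * n              # without pipeline: every task takes s * t
    ts = s * t + (n - 1) * t    # with pipeline: fill once, then one task per t
    return t1, ts, t1 / ts, n / ts


if __name__ == "__main__":
    # The speedup approaches s (here 3) as the number of tasks grows.
    for n in (1, 10, 1000, 1_000_000):
        t1, ts, speedup, throughput = pipeline_metrics(s=3, n=n, t=1)
        print(f"n={n}: T1={t1}, Ts={ts}, speedup={speedup:.4f}, throughput={throughput:.4f}")
```

With a single task the pipeline gives no benefit (speedup 1); the gain only appears once several tasks overlap.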

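  4. Pipelining
     [Figure: timing diagrams of stages s1, s2, s3, without a pipeline and with a pipeline; s = 3.]
     Without pipeline: $T_1 = s \cdot t \cdot n$
     With pipeline: $T_s = s \cdot t + (n-1) \cdot t$
     $\text{Speedup} = \frac{T_1}{T_s}$
     As $n \to \infty$, $\text{Speedup} \to s$.
     $\text{Throughput} = \frac{n}{T_s}$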

  5. Pipelining: the slowest stage determines the pipeline performance. [Figure: stages s1, s2, s3 with stage times 10, 30, 20; timing diagrams against time, without a pipeline and with a pipeline.]

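  6. Pipelining: deep pipeline. [Figure: the 3-stage pipeline (s1, s2, s3 with stage times 10, 30, 20) is subdivided into a 6-stage pipeline (s1, s21, s22, s23, s31, s32, each with stage time 10); timing diagrams against time for 3 stages and for 6 stages.]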
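The effect of the slowest stage, and of splitting it, can be illustrated with a small calculation. This is a sketch under one stated assumption: the pipeline advances in lock step, with a cycle time equal to its slowest stage. The function names are hypothetical.

```python
def unpipelined_time(stage_times, n):
    """Each task runs through every stage before the next task starts."""
    return sum(stage_times) * n


def pipelined_time(stage_times, n):
    """Assumes lock-step operation: the slowest stage sets the cycle time."""
    cycle = max(stage_times)
    return (len(stage_times) + n - 1) * cycle


three_stage = [10, 30, 20]   # stage times from the 3-stage figure
six_stage = [10] * 6         # the same work split into six stages of 10

if __name__ == "__main__":
    n = 100
    print("no pipeline:", unpipelined_time(three_stage, n))
    print("3 stages   :", pipelined_time(three_stage, n))
    print("6 stages   :", pipelined_time(six_stage, n))
```

For 100 tasks, the 3-stage version is held back by the stage that takes 30, while the 6-stage version runs at a cycle time of 10.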

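  7. Computational Pipelines. [Figure: unpipelined version: one block of combinatorial logic followed by a clocked register (Reg). Pipelined version: combinatorial logic blocks A, B, C, each followed by a register (R), all driven by the same clock.]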

  8. Limitations of Pipelining
     • Nonuniform partitioning
       • Stage delays may be nonuniform
       • Throughput is limited by the slowest stage
     • Deep pipelining
       • Large number of stages
       • Modern processors have deep pipelines (15 or more stages) to increase the clock rate.
     [Figure: nonuniform pipeline: combinatorial logic blocks A, B, C with delays 50 ps, 150 ps, 100 ps, each followed by a register (R) with a 20 ps delay. Deep pipeline: a chain of 50 ps logic blocks, each followed by a 20 ps register.]
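Both limitations can be made concrete with the delays shown in the figure. The sketch below assumes that the 20 ps values are register delays and that the clock period equals the slowest logic delay plus one register delay; the slide does not state this model explicitly, and the function names are hypothetical.

```python
def clock_period_ps(logic_delays_ps, register_delay_ps):
    """Assumed model: the clock must cover the slowest logic block plus its register."""
    return max(logic_delays_ps) + register_delay_ps


def throughput_per_ns(logic_delays_ps, register_delay_ps):
    """Operations completed per nanosecond once the pipeline is full."""
    return 1000.0 / clock_period_ps(logic_delays_ps, register_delay_ps)


def latency_ps(logic_delays_ps, register_delay_ps):
    """Time for one operation to pass through every stage."""
    return len(logic_delays_ps) * clock_period_ps(logic_delays_ps, register_delay_ps)


nonuniform = [50, 150, 100]   # delays of blocks A, B, C from the figure
uniform = [50, 50, 50]        # three equal 50 ps stages

if __name__ == "__main__":
    for name, delays in (("nonuniform", nonuniform), ("uniform", uniform)):
        print(name, clock_period_ps(delays, 20), "ps clock,",
              round(throughput_per_ns(delays, 20), 3), "ops/ns,",
              latency_ps(delays, 20), "ps latency")
```

Under this model the register delay is a fixed cost per stage, so it takes a growing share of each cycle as stages get shorter (20 of 70 ps in the uniform case).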

  9. Parallel Adder. [Figure: 4-bit addition a4 a3 a2 a1 + b4 b3 b2 b1 = x4 x3 x2 x1, built from a chain of four full adders (FA); the FA for bit i takes (ai, bi) and produces xi.]
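A chain of full adders like the one in the figure can be sketched in software. This is a minimal illustration, assuming bits are stored least significant first (a1 at index 0); the function names are hypothetical.

```python
def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def ripple_add(a_bits, b_bits):
    """Add two equal-length bit lists (least significant bit first).

    The carry ripples from one full adder to the next, as in the FA chain.
    Returns (sum_bits, final_carry).
    """
    carry = 0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


if __name__ == "__main__":
    # 5 + 3 = 8, written least significant bit first
    print(ripple_add([1, 0, 1, 0], [1, 1, 0, 0]))
```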

  10. Pipelined Parallel Adder. Four 4-bit additions are queued: a + b = x, c + d = y, e + f = z, g + h = w. [Figure, snapshot 1: the operand pairs (a1,b1) to (a4,b4) enter the first of four full adders (FA).]

  11. Pipelined Parallel Adder. [Figure, snapshot 2: the pairs (c1,d1) to (c4,d4) enter the first FA; x1 has been produced, and (a2,b2), (a3,b3), (a4,b4) have moved on to the second FA.]

  12. Pipelined Parallel Adder. [Figure, snapshot 3: the pairs (e1,f1) to (e4,f4) enter the first FA; the second FA holds y1 with (c2,d2), (c3,d3), (c4,d4); the third FA holds x1, x2 with (a3,b3), (a4,b4).]

  13. Pipelined Parallel Adder. [Figure, snapshot 4: the pairs (g1,h1) to (g4,h4) enter the first FA; the second FA holds z1 with (e2,f2), (e3,f3), (e4,f4); the third FA holds y1, y2 with (c3,d3), (c4,d4); the fourth FA holds x1, x2, x3 with (a4,b4).]

  14. Pipelined Parallel Adder. [Figure, snapshot 5: the second FA holds w1 with (g2,h2), (g3,h3), (g4,h4); the third FA holds z1, z2 with (e3,f3), (e4,f4); the fourth FA holds y1, y2, y3 with (c4,d4); the result x4 x3 x2 x1 is complete.]
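The snapshots above can be imitated with a small cycle-by-cycle simulation. This is a sketch of one possible reading of the figures, assuming that full adder i handles bit i of whichever addition currently occupies it and that every addition moves one adder forward per clock; all names are hypothetical.

```python
def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def to_bits(value, width=4):
    """Integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def pipelined_add(pairs, width=4):
    """Push several additions through a chain of `width` full adders.

    pairs: list of (a_bits, b_bits), least significant bit first.
    Returns (results, cycles); results holds (sum_bits, carry) in input order.
    """
    stages = [None] * width          # stages[i] = addition currently in FA i
    pending = list(pairs)
    results = []
    cycles = 0
    while pending or any(stages):
        # Clock edge: every addition moves one FA forward, a new one enters.
        new_job = None
        if pending:
            a_bits, b_bits = pending.pop(0)
            new_job = {"a": a_bits, "b": b_bits, "sum": [], "carry": 0}
        stages = [new_job] + stages[:-1]
        # Each FA adds its own bit position for the addition it holds.
        for i, job in enumerate(stages):
            if job is not None:
                s, job["carry"] = full_adder(job["a"][i], job["b"][i], job["carry"])
                job["sum"].append(s)
        cycles += 1
        # The addition in the last FA is finished and leaves the pipeline.
        if stages[-1] is not None:
            results.append((stages[-1]["sum"], stages[-1]["carry"]))
            stages[-1] = None
    return results, cycles


if __name__ == "__main__":
    inputs = [(5, 3), (1, 1), (15, 1), (7, 8)]
    results, cycles = pipelined_add([(to_bits(a), to_bits(b)) for a, b in inputs])
    print(results, cycles)
```

With four additions and four full adders the simulation takes s + n - 1 = 7 cycles, against 16 cycles if each addition had to finish before the next one started.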

  15. Floating-point Arithmetic Pipeline
      • Pipelined floating-point addition
        • Subtract exponents (E)
          • Subtract the exponents to check whether they are equal
        • Compare exponents and align mantissas (M)
          • Shift the mantissas until the exponents are equal
        • Add mantissas (A)
        • Normalize result (N)
      [Figure: operands n1 and n2 pass through the stages E, M, A, N.]
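The four stages can be written as four small functions. The sketch below is a simplification rather than a real floating-point unit: it assumes non-negative numbers stored as (mantissa, exponent) integer pairs with value mantissa * 2**exponent, a hypothetical 8-bit mantissa width, and truncation when shifting right.

```python
MANT_BITS = 8  # hypothetical mantissa width chosen for this sketch


def stage_e(n1, n2):
    """E: subtract the exponents."""
    (m1, e1), (m2, e2) = n1, n2
    return m1, e1, m2, e2, e1 - e2


def stage_m(m1, e1, m2, e2, diff):
    """M: shift the mantissa of the smaller number right until exponents match."""
    if diff >= 0:
        return m1, m2 >> diff, e1
    return m1 >> -diff, m2, e2


def stage_a(m1, m2, exponent):
    """A: add the aligned mantissas."""
    return m1 + m2, exponent


def stage_n(mantissa, exponent):
    """N: renormalize so the mantissa fits in MANT_BITS with its top bit set."""
    while mantissa >= 1 << MANT_BITS:
        mantissa >>= 1
        exponent += 1
    while mantissa and mantissa < 1 << (MANT_BITS - 1):
        mantissa <<= 1
        exponent -= 1
    return mantissa, exponent


def fp_add(n1, n2):
    """Run one operand pair through E, M, A, N in order."""
    return stage_n(*stage_a(*stage_m(*stage_e(n1, n2))))


if __name__ == "__main__":
    # 192 * 2**0 + 128 * 2**-1 = 192 + 64 = 256 = 128 * 2**1
    print(fp_add((192, 0), (128, -1)))
```

Here the stages run one after another for a single pair; in a pipelined unit each stage would hold a different operand pair in the same cycle.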

  16. Instruction Execution Pipeline
      • Instruction fetch cycle (IF)
        • Fetch current instruction from memory
        • Increment PC
      • Instruction decode / register fetch cycle (ID)
        • Decode instruction
        • Compute possible branch target
        • Read registers from the register file
      • Execution / effective address cycle (EX)
        • Form the effective address
        • ALU performs the operation specified by the opcode
      • Memory access (MEM)
        • Memory read for load instruction
        • Memory write for store instruction
      • Write-back cycle (WB)
        • Write result into register file
      [Figure: stage sequence IF, ID, EX, MEM, WB.]

  17. Instruction Execution Pipeline. [Figure: timing diagram plotting the stages IF, ID, EX, MEM, WB against time.]
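A timing diagram of this kind can be generated programmatically. The sketch below assumes an ideal pipeline with no stalls, in which instruction i enters IF in cycle i; the function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c."""
    total_cycles = len(STAGES) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = ["--"] * total_cycles
        for k, stage in enumerate(STAGES):
            row[i + k] = stage       # instruction i reaches stage k in cycle i + k
        rows.append(row)
    return rows


def print_diagram(num_instructions):
    for i, row in enumerate(pipeline_diagram(num_instructions)):
        print(f"I{i + 1}: " + " ".join(f"{cell:>3}" for cell in row))


if __name__ == "__main__":
    print_diagram(4)
```

Each row is shifted one cycle to the right of the previous one, so from the fifth cycle onward all five stages are busy.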

  18. Pipeline Hazards
      Control (Branch) Hazards
      • Arise from pipelining of instructions (e.g. branch) that change the PC.
      Example loop, for i = n to 1: c_i = a_i + b_i
          LOOP: LOAD  100,X
                ADD   200,X
                STORE 300,X
                DECX
                BNE   LOOP
                ...
      [Figure: timing diagram plotting the stages IF, ID, EX, MEM, WB against time.]
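The cost of a control hazard in the example loop can be estimated with a simple model. This is only a sketch: it assumes the pipeline stalls for a fixed number of cycles after every BNE until the new PC is known, and the default penalty of 2 cycles is a hypothetical value that does not come from the slides.

```python
def loop_cycles(iterations, body_len=5, depth=5, branch_penalty=2):
    """Estimate cycles for the loop, ideally and with branch stalls.

    body_len: instructions per iteration (LOAD, ADD, STORE, DECX, BNE).
    depth: number of pipeline stages (IF, ID, EX, MEM, WB).
    branch_penalty: assumed stall cycles after each branch (hypothetical).
    Returns (ideal_cycles, cycles_with_stalls).
    """
    instructions = iterations * body_len
    ideal = depth + instructions - 1          # no hazards: one instruction per cycle
    stalls = iterations * branch_penalty      # one BNE per iteration
    return ideal, ideal + stalls


if __name__ == "__main__":
    ideal, stalled = loop_cycles(100)
    print(ideal, stalled, round(stalled / ideal, 3))
```

Under these assumptions, with one branch in every five instructions, a 2-cycle penalty adds roughly 40% to the running time of the loop.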

  19. A Modern Processor: Intel Core i7
