This lecture focuses on advanced division algorithms in computer arithmetic, specifically the Radix-4 SRT division and division by a constant using various methods such as Newton-Raphson iteration and continued multiplication. Prof. Chung-Kuan Cheng discusses the mathematical foundations of these algorithms, their implementation in hardware design, and the efficiency of each method. Students are encouraged to provide project updates and explore the use of memory and arithmetic efficiency in digital systems. This session emphasizes the balance between precision and performance in division operations.
CSE 246: Computer Arithmetic Algorithms and Hardware Design Fall 2006 Lecture 8: Division Instructor: Prof. Chung-Kuan Cheng
Topics: • Radix-4 SRT Division • Division by a Constant • Division by a Repeated Multiplication
Project Update • Come in to speak briefly about the final project • Status Update • 2:30 – 3:00 p.m. • Tuesday or Thursday
Radix-4 SRT Division
• $4s_{j-1} = q_j d + s_j$, where $q_j \in [-2, 2]$ and $s_{j-1} \in [-hd, +hd]$
• $h \le 2/3$
• Therefore $s_{j-1} \in [-2d/3,\ 2d/3]$
• and $4s_{j-1} \in [-8d/3,\ 8d/3]$
• $s$ shifts to the left by 2 bits
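Radix-4 SRT Division
[Figure: $4s_{j-1}$ (vertical axis, binary ticks 0.0 to 11.0) plotted against $d$ (horizontal axis, ticks .1, .101, .110, .111, 1.00), with boundary lines $-2d/3$, $d/3$, $2d/3$, $d$, $4d/3$, $5d/3$, $8d/3$ and selection regions $q_j = 0$, $q_j = 1$, $q_j = 2$. Anything above $8d/3$ goes against our assumption and is therefore the infeasible region.]
• In the overlap regions, either neighboring value of $q_j$ is a valid choice that still allows the recursion to continue. The gap defines the precision needed for carry-save addition.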
Radix-4 SRT Division
• The value of $q_j$ determines the range of $4s_{j-1}$ it governs: $[(q_j - 2/3)d,\ (q_j + 2/3)d]$
• For example, $q_j = 1$
• $1 + 2/3 = 5/3$
• $1 - 2/3 = 1/3$
• The range is $d/3$ to $5d/3$
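The recurrence above can be tried out with a small behavioral model. This is only a sketch under stated assumptions: the function name `srt_radix4` is hypothetical, exact rational arithmetic stands in for the hardware datapath, and the digit is picked by rounding $4s_{j-1}/d$ and clamping to $[-2, 2]$. A real SRT divider would instead select $q_j$ from a few leading bits of the carry-save remainder and of $d$, which the overlap regions make possible.

```python
from fractions import Fraction

def srt_radix4(z, d, steps):
    """Behavioral model of 4*s_{j-1} = q_j*d + s_j with q_j in [-2, 2].

    Assumes |z| <= 2d/3. Digit selection by exact rounding is a
    simplification, not the table lookup used in hardware.
    """
    bound = Fraction(2, 3) * d
    s = Fraction(z)
    assert abs(s) <= bound, "initial remainder must lie in [-2d/3, 2d/3]"
    digits = []
    for _ in range(steps):
        shifted = 4 * s                           # shift left by 2 bits
        q = max(-2, min(2, round(shifted / d)))   # clamp to the digit set
        s = shifted - q * d                       # next partial remainder
        assert abs(s) <= bound                    # stays in the feasible region
        digits.append(q)
    return digits, s

z, d = Fraction(5, 16), Fraction(11, 16)
digits, s = srt_radix4(z, d, 6)
quotient = sum(Fraction(q, 4 ** (j + 1)) for j, q in enumerate(digits))
print(digits, float(quotient), float(z / d))
```

The digits are signed, so the quotient is the sum of $q_j 4^{-j}$, and the identity $z = d \cdot \text{quotient} + 4^{-n} s_n$ holds exactly after $n$ steps.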
Division by a Constant
• Multiplication is O(log n) but division is linear, which is much slower
• Try to convert division to multiplication
• Property: for any odd number $d$ there exists an $m$ such that $d \cdot m = 2^n - 1$
• Examples:
• $d = 3$, $m = 5$: $3 \cdot 5 = 2^4 - 1$
• $d = 7$, $m = 9$: $7 \cdot 9 = 2^6 - 1$
• $d = 11$, $m = 93$: $11 \cdot 93 = 2^{10} - 1$
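A brute-force search shows how such a pair $(m, n)$ can be found. This is a minimal sketch; the helper name `find_multiplier` is hypothetical, and it returns the smallest valid $n$, so for $d = 3$ and $d = 7$ it finds smaller pairs than the ones listed above (any multiple of the smallest $n$ also works).

```python
def find_multiplier(d):
    """Return (m, n) with d * m == 2**n - 1, using the smallest such n.

    Assumes d is a positive odd integer; the loop terminates because
    2 is invertible modulo any odd d.
    """
    if d <= 0 or d % 2 == 0:
        raise ValueError("d must be a positive odd integer")
    n = 1
    while (2 ** n - 1) % d != 0:
        n += 1
    return (2 ** n - 1) // d, n

for d in (3, 7, 11):
    m, n = find_multiplier(d)
    print(f"d={d}: m={m}, n={n}, check {d * m} == {2 ** n - 1}")
```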
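Division by a Constant
• $1/d = m/(2^n - 1)$
• $1/(1-r) = 1 + r + r^2 + r^3 + \cdots = (1+r)(1+r^2)(1+r^4)(1+r^8)\cdots$
• With $r = 2^{-n}$: $\dfrac{m}{2^n - 1} = \dfrac{m}{2^n} \cdot \dfrac{1}{1 - 2^{-n}} = \dfrac{m}{2^n}(1 + 2^{-n})(1 + 2^{-2n})(1 + 2^{-4n})\cdots$
• Example: $z/7 = zm/(2^n - 1)$ with $m = 9$, $n = 6$
• $\dfrac{z}{7} = \dfrac{9z}{2^6} \cdot \dfrac{1}{1 - 2^{-6}} = \dfrac{9z}{2^6}(1 + 2^{-6})(1 + 2^{-12})(1 + 2^{-24})\cdots$
• About $\log_2(k/6)$ shift-and-add operations for $k$ bits of precision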
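The truncated product can be checked exactly with rational arithmetic. The sketch below assumes a hypothetical helper `reciprocal_by_series`; it multiplies $m/2^n$ by the first few factors $(1 + 2^{-n}), (1 + 2^{-2n}), \ldots$ and compares against $1/7$. In hardware each factor would be one shift and one add, since $z(1 + 2^{-k}) = z + (z \gg k)$.

```python
from fractions import Fraction

def reciprocal_by_series(m, n, factors):
    """Approximate m / (2**n - 1) using `factors` terms of the product
    (1 + r)(1 + r**2)(1 + r**4)... with r = 2**-n."""
    approx = Fraction(m, 2 ** n)
    r = Fraction(1, 2 ** n)
    for _ in range(factors):
        approx *= 1 + r   # one shift-and-add in hardware
        r *= r            # exponent doubles: r, r^2, r^4, ...
    return approx

approx = reciprocal_by_series(9, 6, 3)
relative_error = 1 - approx * 7
print(relative_error == Fraction(1, 2 ** 48))
```

Three factors leave a relative error of exactly $2^{-48}$, because $(1+r)(1+r^2)(1+r^4) = (1 - r^8)/(1 - r)$.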
Division by Reciprocation
• Find $1/d$ by iteration
• Newton-Raphson algorithm: $x_{i+1} = x_i - f(x_i)/f'(x_i)$
• Set $f(x) = 1/x - d$ with $1/2 \le d < 1$; then $f'(x) = -1/x^2$
• Thus $x_{i+1} = x_i(2 - x_i d)$
• Let $e_i = 1/d - x_i$; then $e_{i+1} = 1/d - x_{i+1} = 1/d - x_i(2 - x_i d) = d(1/d - x_i)^2 = d e_i^2$
• The convergence rate is quadratic
• k iterations take 2k multiplications
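Division by Reciprocation
• Example: $z/d = 3/0.7$
• $x_0 = 4(\sqrt{3} - 1) - 2d = 2.9282 - 2d = 1.5282$
• $e_0 = 1/d - x_0 = 1/0.7 - 1.5282 = -0.0996286$
• $x_1 = x_0(2 - x_0 d) = 1.42162$
• $e_1 = 1/d - x_1 = 1/0.7 - 1.42162 = 0.00695$
• $x_2 = x_1(2 - x_1 d) = 1.4285376$
• $e_2 = 1/d - x_2 = 1/0.7 - 1.4285376 = 0.0000338$
• $x_3 = x_2(2 - x_2 d) = 1.4285714$
• $e_3 = 1/d - x_3 \approx 8 \times 10^{-10}$
• $q = z x_3 \approx 4.285714$
• The convergence rate is quadratic: each error is $d$ times the square of the previous one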
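The iteration is short enough to run directly. The following sketch uses ordinary floating point and a hypothetical function name `reciprocal_newton`; it starts from the same $x_0 = 4(\sqrt{3} - 1) - 2d$ as the example and prints each estimate with its error $e_i = 1/d - x_i$.

```python
def reciprocal_newton(d, x0, iterations):
    """Iterate x_{i+1} = x_i * (2 - x_i * d); returns all estimates of 1/d.

    Each step costs two multiplications and one subtraction from 2.
    """
    history = [x0]
    x = x0
    for _ in range(iterations):
        x = x * (2 - x * d)
        history.append(x)
    return history

z, d = 3.0, 0.7
x0 = 4 * (3 ** 0.5 - 1) - 2 * d
history = reciprocal_newton(d, x0, 3)
for i, x in enumerate(history):
    print(f"x{i} = {x:.7f}, e{i} = {1 / d - x:.3e}")
print("q =", z * history[-1])
```

After the first step every error is nonnegative, because $e_{i+1} = d e_i^2 \ge 0$, so the estimates approach $1/d$ from below.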
Division by Recursive Multiplication
• $q = z/d = (z/d)(x_0/x_0)(x_1/x_1)\cdots(x_{k-1}/x_{k-1})$   eq(a)
• Let $1/2 \le d < 1$
• It takes 2k multiplications for eq(a)
• We also need k operations to find the $x_i$
Division by a Repeated Multiplication
• $q = z/d = (z/d)(x_0/x_0)(x_1/x_1)\cdots(x_{k-1}/x_{k-1})$
• Let $1/2 \le d < 1$
• Set $d_0 = d$, $x_k = 2 - d_k$
1. $d_1 = d x_0 = d(2 - d) = 1 - (1 - d)^2$
2. $d_{k+1} = d_k x_k = d_k(2 - d_k) = 1 - (1 - d_k)^2$
3. $1 - d_{k+1} = (1 - d_k)^2 = (1 - d)^{2^{k+1}}$: quadratic convergence
• For k-bit operands, we need $2m - 1$ multiplications and $m$ 2's complementations
• $m = \lceil \log_2 k \rceil$, with $\log_2 m$ extra bits for precision
Division by a Repeated Multiplication
• $q = z/d = 3/0.7 = (z/d)(x_0/x_0)(x_1/x_1)\cdots(x_{k-1}/x_{k-1})$
• $d_0 = d = 0.7$, $x_k = 2 - d_k$, $d_{k+1} = d_k x_k$
1. $x_0 = 2 - d_0 = 1.3$, $d_1 = d_0 x_0 = 0.7 \times 1.3 = 0.91$
2. $x_1 = 2 - d_1 = 1.09$, $d_2 = d_1 x_1 = 0.91 \times 1.09 = 0.9919$
3. $x_2 = 2 - d_2 = 1.0081$, $d_3 = d_2 x_2 = 0.9919 \times 1.0081 = 0.99993439$
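The same steps can be run in a few lines. This sketch (hypothetical function name, plain floating point) multiplies numerator and denominator by $x_k = 2 - d_k$ in each round, so the denominator is driven toward 1 while the numerator converges to the quotient.

```python
def divide_by_repeated_multiplication(z, d, iterations):
    """Scale z and d by x_k = 2 - d_k each round; assumes 1/2 <= d < 1.

    Returns (q_k, d_k), where q_k = (z / d) * d_k approximates z / d.
    """
    q, dk = z, d
    for _ in range(iterations):
        xk = 2 - dk   # 2's complement of d_k in hardware
        q *= xk       # numerator multiplication
        dk *= xk      # denominator multiplication
    return q, dk

for k in (1, 2, 3, 5):
    q, dk = divide_by_repeated_multiplication(3.0, 0.7, k)
    print(f"k={k}: d_k = {dk:.8f}, q = {q:.8f}")
```

With three rounds the quotient estimate is about 4.28543 against the exact $3/0.7 \approx 4.28571$; the two multiplications in each round are independent of each other, which is the usual argument for this method over reciprocation.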
Division Methods • Iteration • Memory • Arithmetic
Division –Iteration effort
• Pencil-and-paper method ($A = QB + 2^{-n}R$ and $R < B$): 1 bit of partial quotient per iteration, n iterations
• $Q_i$: partial quotient; $R_i$: partial remainder; $R_{i+1} = R_i - B\,Q_i$
• Example: A = 0.1001, B = 0.1010; Q = A / B
• Partial quotients: Q1 = 0.1, Q2 = 0.01, Q3 = 0.001, Q4 = 0.0000
• Q = 0.1110, R = 0.0100
[Figure: long-division layout of A by B showing the partial remainders R0 = A through R4.]
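The one-bit-per-iteration scheme can be sketched with integers. The function below is a minimal restoring-division model with a hypothetical name; operands are the n-bit numerators of the fractions (so 0.1001 is passed as `0b1001`), and it assumes A < B so that the quotient is a proper fraction.

```python
def pencil_and_paper_divide(a, b, n):
    """Restoring division of fractions a/2**n by b/2**n, one bit per step.

    Returns (q, r) as n-bit numerators satisfying a * 2**n == q * b + r
    with r < b, i.e. A = Q*B + 2**-n * R in fraction form.
    """
    assert 0 <= a < b
    q, r = 0, a
    for _ in range(n):
        r <<= 1          # bring down the next bit position
        q <<= 1
        if r >= b:       # trial subtraction succeeds: quotient bit is 1
            r -= b
            q |= 1
    return q, r

q, r = pencil_and_paper_divide(0b1001, 0b1010, 4)
print(f"Q = 0.{q:04b}, R = 0.{r:04b}")
```

For A = 0.1001 and B = 0.1010 this yields quotient bits 1110 and remainder bits 0100, consistent with 9/10 = 0.9 truncated to four bits.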
Division –Memory effort
• A lookup table is the simplest way to obtain multiple partial quotient bits in each iteration.
• SRT method: a lookup table stores m-bit partial quotients selected by m bits of the partial remainder and m bits of the divisor. Table size: $2^{2m} \times m$ bits
• The SRT method is limited by the memory wall.
Division –Arithmetic effort • Partial quotient is calculated by arithmetic functions. • Prescaling: • Taylor expansion: • Series expansion:
Division –Solution space
• Modern FPGAs contain plenty of memory and built-in multipliers, which enable high-performance dividers.
[Figure: solution space with three axes (Memory Effort, Iteration Effort, Arithmetic Effort). Labeled points and regions: SRT, Memory Wall, Pencil-and-paper, Prescaling, Series Expansion, Taylor Expansion, Our target, Low latency, Low area.]
Division –PST algorithm
• Uses the power of series expansion, which needs a good starting point.
• Prescaling provides a scaled divisor close to 1.
• A 0-order Taylor expansion iterates to reach the final quotient.
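Division –PST algorithm
Algorithm ($d^{(m)}$ denotes the leading m bits of $d$):
• $E_0 = \mathrm{Table}(d^{(m)}) \approx 1/d$
• $z_1 = z E_0$; $d_1 = d E_0$
• $E_1 = \mathrm{INV}(d_1^{(2m)}) \approx 2 - d_1$
• $Q_i = R_{i-1} E_1$, with $R_0 = z_1$
• $R_i = R_{i-1} - Q_i d_1$
• $Q = Q + Q_i$
Example (m = 4):
• z = 0.1011,0110; d = 0.1100,1011
• $d^{(m)}$ = 0.1100; $E_0$ = 1.0011
• $z_1 = z E_0$ = 0.1101,1000,0010
• $d_1 = d E_0$ = 0.1111,0001,0001
• $E_1 = \mathrm{INV}(d_1^{(2m)})$ = 1.0000,1110
• $Q_1 = z_1 E_1$ = 0.1110,0011
• $R_1 = z_1 - Q_1 d_1$ = 0.0000,0010,0101,1110,1101
• $Q_2 = R_1 E_1$ = 0.1001,1111 $\times 2^{-6}$
• $R_2 = R_1 - Q_2 d_1$ = 0.0000,0001,1111,1011,0001 $\times 2^{-6}$
• Q = 0.1110,0011 + 0.0000,0010,0111,11 = 0.1110,0101,0111,11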
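The worked example can be replayed with exact rational arithmetic. What follows is only a sketch under several assumptions: the names `pst_divide` and `trunc` are hypothetical, the table value $E_0$ is passed in rather than looked up, INV is modeled as a bitwise inversion of the leading $2m$ fraction bits of $d_1$, and the truncation widths of the partial quotients ($2m$ bits for $Q_1$, then $2m - 2$ further bits per step) are inferred from the numbers in the example rather than stated by the slides.

```python
from fractions import Fraction
from math import floor

def trunc(x, bits):
    """Truncate x to `bits` fraction bits."""
    return Fraction(floor(x * 2 ** bits), 2 ** bits)

def pst_divide(z, d, e0, m=4, iterations=2):
    """Prescale by e0, then iterate Q_i = R_{i-1} * E1, R_i = R_{i-1} - Q_i * d1."""
    z1, d1 = z * e0, d * e0                  # prescaling: d1 is close to 1
    top = floor(d1 * 2 ** (2 * m))           # leading 2m fraction bits of d1
    e1 = 1 + Fraction(2 ** (2 * m) - 1 - top, 2 ** (2 * m))  # 1.(inverted bits)
    q, r = Fraction(0), z1
    partials = []
    for i in range(iterations):
        bits = 2 * m + i * (2 * m - 2)       # assumed truncation width
        qi = trunc(r * e1, bits)
        r -= qi * d1
        q += qi
        partials.append(qi)
    return q, r, e1, partials

z, d, e0 = Fraction(0b10110110, 256), Fraction(0b11001011, 256), Fraction(0b10011, 16)
q, r, e1, partials = pst_divide(z, d, e0)
print(float(q), float(z / d))
```

With the example operands this gives $Q = 14687/2^{14}$, which is 0.1110,0101,0111,11 in binary, against an exact quotient of about 0.89655.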
Division –FPGA Implementation
• The PST algorithm is suitable for high-performance division unit design in FPGAs
• 32-bit division with 5-cycle latency