Improved Cotransformation for Logarithmic Number System (LNS) Subtraction

Mark Arnold, University of Manchester Institute of Science and Technology

Outline
• Advantages of Logarithmic Number Systems (LNS)
• LNS addition with interpolation of s_b
• Subtraction and d_b
• Cotransformation overcomes the singularity of d_b
• Coleman’s cotransformation
• Arnold’s cotransformation
• New cotransformation
• Simulation results
• Conclusions

Arithmetic Choices
• Fixed-point (FX)
  • Scaled integer: manual rescale after multiply
  • Hard to design, but a common choice for cost-sensitive applications
• Floating-point IEEE-754 (FP)
  • Exponent provides automatic scaling for mantissa
  • Easier to use but more expensive
• Logarithmic Number System (LNS)
  • Converts to logarithms once; values stay as logarithms during computation
  • As easy as FP; can be faster, cheaper, and lower power than FX

Advantages of LNS
• Cheaper multiply, divide, square root
• Good for applications with a high proportion of multiplications
• Most significant bits change less frequently: power savings
• Table-based arithmetic ideal for FPGAs
[Figure: logarithmic scales illustrating log(2) + log(3) = log(6)]
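The claim that multiply, divide, and square root become cheap can be illustrated with a minimal Python sketch. It uses floating-point logarithms as a stand-in for a fixed-point LNS word; the helper names (`to_lns`, `from_lns`, `lns_mul`, `lns_div`, `lns_sqrt`) are hypothetical and not taken from the talk.

```python
import math

B = 2.0  # base of the logarithm (assumed here; b = 2 is the typical choice)

def to_lns(X):
    """Convert a positive real X to its logarithmic representation x = log_b(X)."""
    return math.log(X) / math.log(B)

def from_lns(x):
    """Convert a logarithmic representation x back to the real value b^x."""
    return B ** x

def lns_mul(x, y):
    """Multiplication of real values is addition of their logarithms."""
    return x + y

def lns_div(x, y):
    """Division of real values is subtraction of their logarithms."""
    return x - y

def lns_sqrt(x):
    """Square root of a real value halves its logarithm."""
    return x / 2.0

# log(2) + log(3) = log(6), as in the figure
product = from_lns(lns_mul(to_lns(2.0), to_lns(3.0)))
quotient = from_lns(lns_div(to_lns(6.0), to_lns(3.0)))
root = from_lns(lns_sqrt(to_lns(16.0)))
```

In hardware each of these is a single fixed-point add, subtract, or shift, which is where the cost advantage over FP comes from.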

Commercial and Practical Interest in LNS
• Motorola: 120 MHz LNS 1 GFLOP chip for satellite
• European Union: LNS microprocessor (Coleman, 800K €)
• Yamaha: music synthesizer
• Boeing: aircraft controls
• Interactive Machines, Inc.: IMI-500, animation for Jay Jay the Jet Plane
• Advanced Rendering Technologies: hardware ray-tracing engine
• Cambridge/Microsoft: HTK Hidden Markov Model Toolkit
• Univ. of Tokyo: N-body gravity pipeline (GRAPE), won 1999 Gordon Bell Prize

Notation
• Upper-case variables (e.g., X) = real value
• Lower-case variables (e.g., x) = corresponding logarithmic representation
• b = base of the logarithm (b = 2 is typical)

LNS Addition
Given x = log_b(X) and y = log_b(Y):

Step                                   Why it works
1. Let z = x - y                       z = log_b(X/Y)
2. Look up s_b(z) = log_b(1 + b^z)     s_b(z) = log_b(1 + X/Y)
3. t = y + s_b(z)                      t = log_b(Y(1 + X/Y))
Thus, t = log_b(Y + X)

Hardware: 1 subtractor, 1 function approximation unit, 1 adder
[Figure: datapath in which a subtractor forms z from x and y, a function unit produces s_b(z), and an adder combines s_b(z) with y to give t]

History: Leonelli 1803; Gauss 1812; Matula and Marasa 1969; Kingsbury and Rayner 1971; Swartzlander et al. 1975; Lee and Edgar 1977; Barlow and Bareiss 1985
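The three addition steps can be sketched in Python as follows. This is only an illustration: `s_b` is evaluated directly in floating point instead of through a table or interpolator, and the function names are hypothetical.

```python
import math

B = 2.0  # base of the logarithm (assumed; b = 2 is typical)

def s_b(z):
    """Addition function s_b(z) = log_b(1 + b^z), computed directly here."""
    return math.log(1.0 + B ** z) / math.log(B)

def lns_add(x, y):
    """Return t = log_b(X + Y) given x = log_b(X) and y = log_b(Y)."""
    z = x - y          # step 1: one subtractor
    return y + s_b(z)  # steps 2 and 3: function unit, then one adder

x = math.log(5.0) / math.log(B)  # x = log_b(5)
y = math.log(3.0) / math.log(B)  # y = log_b(3)
t = lns_add(x, y)                # should represent 5 + 3 = 8
total = B ** t
```

A real implementation would replace the body of `s_b` with a ROM lookup or an interpolator; the surrounding subtract and add stay the same.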

Plot of s_b(z)
[Figure: plot of y = s_b(z) against z, with the line y = z shown for comparison]

Ways to reduce s_b table size:
• Don’t tabulate for positive z (only tabulate for z < 0): s_b(z) = s_b(-z) + z
• Don’t tabulate for large |z|: s_b(-z) ≈ 0 if z > precision
• Interpolate from a smaller table: cuts the number of address bits in half
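The first two reductions can be sketched in Python. The number of fractional bits `F` and the threshold `CUTOFF = F + 2` are assumptions chosen for illustration, not values from the talk, and `s_b_nonpositive` computes the function directly where hardware would use a table.

```python
import math

B = 2.0         # base of the logarithm (assumed; b = 2 is typical)
F = 8           # hypothetical number of fractional bits in the result
CUTOFF = F + 2  # assumed threshold past which s_b(-z) rounds to zero

def s_b_nonpositive(z):
    """Stand-in for the table, which only covers z <= 0."""
    if z > 0:
        raise ValueError("table covers z <= 0 only")
    if -z > CUTOFF:
        return 0.0  # value is below half a unit in the last place
    return math.log(1.0 + B ** z) / math.log(B)

def s_b(z):
    """s_b for any z, using s_b(z) = s_b(-z) + z when z is positive."""
    if z > 0:
        return s_b_nonpositive(-z) + z
    return s_b_nonpositive(z)

folded = s_b(3.0)       # uses the table entry for z = -3
far_left = s_b(-20.0)   # beyond the cutoff, treated as 0
far_right = s_b(20.0)   # beyond the cutoff, treated as z itself
```

With these two rules the table only needs entries for -CUTOFF <= z <= 0.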

Linear Interpolator
[Figure: block diagram. The high bits of z address a function ROM and a slope ROM; the slope is multiplied by the low bits of z, and the product is added to the function ROM output, with rounding, to approximate f(z).]
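A linear interpolator of this kind can be sketched in Python for s_b on negative z. The table spacing `STEP`, the range limit `Z_MIN`, and the use of chord slopes are assumptions made for the example; the sketch works in floating point and omits the final rounding.

```python
import math

B = 2.0             # base of the logarithm (assumed)
STEP = 2.0 ** -4    # hypothetical spacing between table entries
Z_MIN = -16.0       # hypothetical lower end of the tabulated range
N = int(-Z_MIN / STEP)  # number of table entries

def s_b_exact(z):
    return math.log(1.0 + B ** z) / math.log(B)

# Function ROM: s_b at the left edge of each interval.
FUNC_ROM = [s_b_exact(Z_MIN + i * STEP) for i in range(N)]
# Slope ROM: slope of the chord across each interval.
SLOPE_ROM = [
    (s_b_exact(Z_MIN + (i + 1) * STEP) - s_b_exact(Z_MIN + i * STEP)) / STEP
    for i in range(N)
]

def s_b_interp(z):
    """Approximate s_b(z) for Z_MIN <= z < 0 by linear interpolation."""
    if not (Z_MIN <= z < 0.0):
        raise ValueError("z outside the tabulated range")
    offset = z - Z_MIN
    index = int(offset / STEP)    # "high bits of z" select the ROM entries
    low = offset - index * STEP   # "low bits of z" scale the slope
    return FUNC_ROM[index] + SLOPE_ROM[index] * low

# Worst error over a grid 16 times finer than the table
samples = [Z_MIN + k * 2.0 ** -8 for k in range(int(-Z_MIN * 2 ** 8))]
max_err = max(abs(s_b_interp(z) - s_b_exact(z)) for z in samples)
```

Because s_b is smooth, a coarse table plus one multiply gives a small error over the whole range.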

Subtraction
• Similar to addition, except it uses d_b(z) = log_b|1 - b^z| instead of s_b(z)
• d_b is harder to interpolate due to the singularity near z = 0
[Figure: plot of y = s_b(z) and y = d_b(z) against z, with the line y = z; the singularity of d_b at z = 0 is marked]
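A short Python sketch shows subtraction with d_b and how quickly d_b falls away as z approaches 0. The function names are hypothetical, and d_b is computed directly in floating point.

```python
import math

B = 2.0  # base of the logarithm (assumed)

def d_b(z):
    """Subtraction function d_b(z) = log_b|1 - b^z|; undefined at z = 0."""
    if z == 0.0:
        raise ValueError("d_b is singular at z = 0")
    return math.log(abs(1.0 - B ** z)) / math.log(B)

def lns_sub(x, y):
    """Return t = log_b|X - Y| given x = log_b(X) and y = log_b(Y)."""
    return y + d_b(x - y)

x = math.log(5.0) / math.log(B)
y = math.log(3.0) / math.log(B)
difference = B ** lns_sub(x, y)  # should represent 5 - 3 = 2

# d_b evaluated ever closer to z = 0 from the negative side
near_zero = [d_b(-(2.0 ** -k)) for k in (4, 8, 12)]
```

Each step toward zero drives d_b several units further down, so a uniformly spaced table or linear interpolator needs far more entries near z = 0 than it does for s_b.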

Prior solutions for d_b singularity
1. Partition the range of z non-uniformly [lew90]

   Precision   15     17    19    21    23
   s_b bits    0.3K   1K    2K    5K    10K
   d_b bits    1K     4K    10K   24K   60K

   • Problem: d_b takes most of the ROM
2. Use Arnold’s cotransformation [arn97] to convert d_b to s_b
   • d_b(zH + zL) = d_b(zL) + s_b(zL + d_b(zH) - d_b(zL)), where zH > 0 and zL > 0
   • d_b(zH) and d_b(zL) are in tables
   • Problem: doesn’t work with z < 0, as needed for table reduction
3. Use Coleman’s cotransformation [col95] to move the argument of d_b away from 0
   • d_b(zH + zL) = d_b(zL) + d_b(zL + d_b(zH) - d_b(zL)), where zH < 0 and zL > 0
   • Problem: not as accurate as Arnold’s
   • Needs a d_b interpolator rather than an s_b interpolator
   • Needs more input guard bits to the interpolator
This talk reviews cotransformation and proposes a new cotransformation that overcomes these problems.
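Arnold’s identity can be checked numerically with a small Python sketch. Here d_b and s_b are computed directly in floating point, whereas the scheme in the talk reads d_b(zH) and d_b(zL) from tables; the function names and the sample values of zH and zL are assumptions for illustration.

```python
import math

B = 2.0  # base of the logarithm (assumed)

def s_b(z):
    return math.log(1.0 + B ** z) / math.log(B)

def d_b(z):
    return math.log(abs(1.0 - B ** z)) / math.log(B)

def d_b_arnold(z_h, z_l):
    """d_b(zH + zL) via Arnold's cotransformation; requires zH > 0 and zL > 0."""
    if z_h <= 0.0 or z_l <= 0.0:
        raise ValueError("requires zH > 0 and zL > 0")
    return d_b(z_l) + s_b(z_l + d_b(z_h) - d_b(z_l))

z_h = 0.5         # hypothetical high part of z
z_l = 2.0 ** -10  # hypothetical low part of z
direct = d_b(z_h + z_l)
via_cotransformation = d_b_arnold(z_h, z_l)
```

The only function evaluated at a computed argument is s_b, which is why this form can reuse the addition interpolator.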

Review of Prior Cotransformations
Choose z1 and z2 such that z = z1 + z2:
  Z1 = b^z1,  Z2 = b^z2
  Z = Z1 * Z2 = b^z1 * b^z2 = b^(z1 + z2) = b^z
  T = b^t = |1 - Z| = |1 - Z1 * Z2|
Coleman [col95] assumes that Z1 < 1 and Z2 > 1, so z1 < 0 and z2 > 0:
  t = d_b(z1) + d_b(z1 + d_b(z2) - d_b(z1))
Remember that d_b(z) = log_b|1 - b^z|, so d_b(z1) = log_b|1 - b^z1| and d_b(z2) = log_b|1 - b^z2|. Raising b to the power of each side gives:
  T = |1 - Z1| * |1 - Z1 * |Z2 - 1| / |1 - Z1||
    = (1 - Z1) * (1 - Z1 * (Z2 - 1) / (1 - Z1))
    = 1 - Z1 - Z1 * Z2 + Z1
    = 1 - Z1 * Z2
    = 1 - b^z
The inner absolute values drop because Z1 < 1 and Z2 > 1; the outer one drops when Z = Z1 * Z2 < 1, and in general T = |1 - b^z|.
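The Coleman derivation can be checked numerically with a minimal Python sketch. d_b is computed directly in floating point, and the function names and sample values of z1 and z2 are assumptions chosen for illustration.

```python
import math

B = 2.0  # base of the logarithm (assumed)

def d_b(z):
    return math.log(abs(1.0 - B ** z)) / math.log(B)

def d_b_coleman(z1, z2):
    """d_b(z1 + z2) via Coleman's cotransformation; requires z1 < 0 and z2 > 0."""
    if z1 >= 0.0 or z2 <= 0.0:
        raise ValueError("requires z1 < 0 and z2 > 0")
    return d_b(z1) + d_b(z1 + d_b(z2) - d_b(z1))

# Case z = z1 + z2 < 0, matching the derivation (T = 1 - b^z)
t_negative = d_b_coleman(-0.75, 0.25)
direct_negative = d_b(-0.5)

# Case z = z1 + z2 > 0, where T = |1 - b^z| = b^z - 1
t_positive = d_b_coleman(-0.25, 1.0)
direct_positive = d_b(0.75)
```

Both cases agree with the direct evaluation, but the outer function here is d_b rather than s_b, which is the drawback noted for this method.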
