
Design of a Faithful LNS Interpolator


Presentation Transcript


  1. Design of a Faithful LNS Interpolator Mark Arnold University of Manchester Institute of Science and Technology

  2. Outline
  • Why choose Logarithmic Number Systems (LNS)?
  • Floating Point versus LNS
  • Round to Nearest is Hard
  • Restricted versus Unrestricted Faithful Rounding
  • Interpolation and Partitioning
  • Prior Interpolators (Coleman et al., Lewis)
  • Proposed Interpolators
  • Conclusions

  3. Arithmetic Choices
  • Fixed-point (FX): scaled integers with manual rescaling after each multiply; hard to design, but the common choice for cost-sensitive applications
  • Floating-point IEEE-754 (FP): the exponent provides automatic scaling for the mantissa; easier to use but more expensive
  • Logarithmic Number System (LNS): convert to logarithms once and keep values as logs throughout the computation; as easy as FP, and can be faster, cheaper, and lower power than FX

  4. Advantages of LNS
  • Cheaper multiply, divide, and square root
  • Good for applications with a high proportion of multiplications
  • Multiplication introduces no additional rounding error: log(2) + log(3) = log(6) exactly
  • Most significant bits change less frequently: power savings
  [Figure: slide-rule-style logarithmic scales illustrating log(2) + log(3) = log(6).]

  5. Commercial Interest in LNS
  • Motorola: 120 MHz, 1 GFLOP LNS chip [pan99]
  • European Union: LNS microprocessor [col00]
  • Yamaha: music synthesizer [kah98]
  • Boeing: aircraft controls
  • Interactive Machines, Inc.: IMI-500, animation for Jay Jay the Jet Plane
  • Advanced Rendering Technologies: ray-tracing engine hardware

  6. Notation
  • x = real value; X = corresponding logarithmic representation
  • b = base of the logarithm (b = 2 is typical)
  • F = precision (number of fractional bits in the representation)
  • Δ = b^(2^-F), i.e., the smallest representable value > 1.0
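
  A quick numerical check of the last definition, using the illustrative choices F = 23 and b = 2 (values consistent with the later slides); a minimal Python sketch:

      # Assumed values for illustration: F = 23 fractional bits, base b = 2.
      F, b = 23, 2.0
      smallest_above_one = b ** (2.0 ** -F)   # Delta = b^(2^-F), smallest representable value > 1.0
      print(smallest_above_one - 1.0)         # ~8.3e-8, a constant relative step of about ln(2) * 2**-F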

  7. LNS Addition
  Given X = logb(x) and Y = logb(y):
  1. Let Z = X - Y, so Z = logb(x/y)
  2. Look up sb(Z) = logb(1 + b^Z) = logb(1 + x/y)
  3. T = Y + sb(Z) = logb(y·(1 + x/y)); thus T = logb(x + y)
  Hardware: one subtractor; one function-approximation unit (a lookup table in ROM or RAM for F < 12, interpolation for higher precision); one adder.
  A similar function, db, handles subtraction.
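
  A minimal Python sketch of the three steps above; the table lookup for sb is replaced here by a direct call to math.log, which is an assumption for clarity (a hardware implementation would use a ROM or an interpolator):

      import math

      def lns_add(X, Y, b=2.0):
          """Add two LNS values X = log_b(x), Y = log_b(y); returns log_b(x + y)."""
          hi, lo = (X, Y) if X >= Y else (Y, X)
          Z = lo - hi                       # step 1: Z <= 0, so b**Z lies in (0, 1]
          s = math.log(1.0 + b ** Z, b)     # step 2: s_b(Z) = log_b(1 + b**Z), normally a table/interpolator
          return hi + s                     # step 3: log_b(larger * (1 + smaller/larger)) = log_b(x + y)

      # Example: adding 2 and 3 in the log domain should give log2(5).
      X, Y = math.log2(2.0), math.log2(3.0)
      print(2.0 ** lns_add(X, Y))           # ~5.0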

  8. Floating Point versus LNS
  Exactly representable points are shown for precision F = 2; floating point has greater relative error over part of each binade (largest just above each power of two).
  [Figure: number lines from 1.0 to 4.0 marking the exactly representable values of LNS and of floating point.]

  9. Floating Point versus LNS
  LNS: the continuous change in spacing between representable values means constant relative precision.
  Floating point: the discrete change in spacing causes a wobble in relative precision.
  Lewis' observation: round-to-nearest LNS is Better Than Floating Point (BTFP): its worst-case relative error is smaller by a factor of about ln(2), leaving margin for rounding error while still being BTFP. But is it worth the cost?
  [Figure: representable values between 1.0 and 4.0 for LNS and floating point.]
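
  A small numerical comparison of worst-case relative spacing, assuming F = 23 for both formats; this is only an illustrative check of the ln(2) factor mentioned above, not a computation from the slides:

      import math

      F = 23
      # Floating point: within a binade [2**e, 2**(e+1)) the spacing is 2**(e - F),
      # so the relative spacing is largest just above 2**e, where it equals 2**-F.
      fp_worst_relative_step = 2.0 ** -F

      # LNS: representable values are b**(k * 2**-F), so the relative step is constant.
      lns_relative_step = 2.0 ** (2.0 ** -F) - 1.0

      print(lns_relative_step / fp_worst_relative_step)   # ~0.693, i.e. about ln(2)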

  10. Rounding Modes
  • Round to Nearest: prescribed by IEEE-754 for floating point (FP); affordable for FP at any precision; economical for LNS only at low precision (F < 12)
  • Restricted Faithful
  • Unrestricted Faithful

  11. Round to Nearest Non-exactly-representable values round to the nearest of the two possible exact representations

  12. Round to Nearest The green point is closer to the left representation

  13. Round to Nearest All values on the left, no matter how close to the midpoint, round to this representation

  14. Round to Nearest Points on the right of the midpoint round to this representation

  15. Table Makers' Dilemma
  • Interpolation of sb is needed for high precision
  • Some results are hard to round to nearest, and guaranteeing the nearest result costs much more memory
  • Relax the rounding requirement: faithful rounding chooses one of the two closest points
  • Allowing more next-nearest results decreases memory

  16. Faithful Rounding Modes
  • Restricted Faithful: "Better than Floating Point" (BTFP) in the worst case; like round-to-nearest except near the midpoint
  • Unrestricted Faithful: our previous simulations show it is good enough for some applications; cuts LNS memory size 3- to 6-fold versus restricted
  • Probabilistic model: p = probability that a faithful result does not round to the nearest
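
  To make the probabilistic model concrete, here is a small Monte Carlo sketch that counts how often a given approximation of sb fails to deliver the nearest representable result. The argument range, trial count, and the stand-in truncated approximation are assumptions for illustration and do not model the paper's actual interpolators:

      import math, random

      def estimate_p(sb_approx, F=23, b=2.0, trials=100_000):
          """Estimate p: the fraction of results that are NOT the nearest representable value,
          for a given approximation sb_approx of s_b(Z) = log_b(1 + b**Z)."""
          ulp = 2.0 ** -F
          misses = 0
          for _ in range(trials):
              Z = -random.uniform(0.0, 8.0)            # assumed argument range for the table
              exact = math.log(1.0 + b ** Z, b)        # "infinitely precise" s_b(Z)
              delivered = round(sb_approx(Z) / ulp)    # result after rounding to F fractional bits
              nearest = round(exact / ulp)             # true round-to-nearest result
              if delivered != nearest:
                  misses += 1
          return misses / trials

      # Stand-in approximation: s_b evaluated with Z truncated to 25 fractional bits,
      # whose error is well under an ulp, so its rounded results remain faithful.
      print(estimate_p(lambda Z: math.log2(1.0 + 2.0 ** (math.floor(Z * 2**25) / 2**25))))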

  17. Unrestricted Faithful p = .25 Non-exactly-representable values round to either of the two possible exact representations

  18. Unrestricted Faithful p = .25 3/4 of the points to the left of the midpoint are rounded to the nearest 1/4 of the points to the left of the midpoint are rounded to the next-nearest

  19. Unrestricted Faithful p = .25 The situation on the right of the midpoint is similar

  20. Restricted Faithful p = .25 Non-exactly-representable values round to one of the two possible exact representations, so that the result is better than floating point (BTFP)

  21. Restricted Faithful p = .25 Values close to the left always round there (to the nearest)

  22. Restricted Faithful p = .25 Values near midpoint can round either way

  23. Restricted Faithful p = .25 Values close to the right always round there (to the nearest)

  24. Linear Interpolator
  [Block diagram: the high bits of Z drive the partitioning logic and index a function ROM and a slope ROM; the slope is multiplied by the low bits of Z and added, with rounding, to the function ROM output to produce f(Z) ≈ sb(Z).]
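
  A behavioural Python sketch of this datapath: the "high bits of Z" select a table entry and the "low bits of Z" are multiplied by the stored slope. A single uniform spacing delta is assumed here for simplicity (the partitioned spacings come later), and the table range and delta are illustrative only:

      import math

      def sb(Z, b=2.0):
          """The LNS addition function s_b(Z) = log_b(1 + b**Z)."""
          return math.log(1.0 + b ** Z, b)

      def make_linear_tables(zmin, zmax, delta):
          """Precompute function and slope ROMs, one entry per interval of width delta."""
          n = int((zmax - zmin) / delta) + 1
          f_rom = [sb(zmin + i * delta) for i in range(n)]
          slope_rom = [(sb(zmin + (i + 1) * delta) - f_rom[i]) / delta for i in range(n)]
          return f_rom, slope_rom

      def linear_interp(Z, zmin, delta, f_rom, slope_rom):
          """f(Z) ~ f_rom[i] + slope_rom[i] * z_low, where i comes from the high bits of Z."""
          i = int((Z - zmin) / delta)            # "high bits of Z": which interval
          z_low = (Z - zmin) - i * delta         # "low bits of Z": offset within the interval
          return f_rom[i] + slope_rom[i] * z_low

      f_rom, slope_rom = make_linear_tables(-8.0, 0.0, 1.0 / 64)
      print(linear_interp(-1.3, -8.0, 1.0 / 64, f_rom, slope_rom), sb(-1.3))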

  25. Quadratic Interpolator
  [Block diagram: as in the linear interpolator, but with an additional quadratic-coefficient ROM and a multiplier that squares the low bits of Z; the function, slope, and quadratic terms are summed with rounding to produce f(Z).]
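
  A corresponding sketch of the quadratic scheme, again with a uniform illustrative spacing. Each interval stores three coefficients (function, slope, and quadratic ROMs), fitted here through the interval's endpoints and midpoint as one plausible degree-2 construction; the coefficient derivation in the cited designs may differ:

      import math

      def sb(Z, b=2.0):                         # the LNS addition function, as before
          return math.log(1.0 + b ** Z, b)

      def make_quadratic_tables(zmin, zmax, delta):
          """Coefficient ROMs for f(Z) ~ c0 + c1*zl + c2*zl**2 on each interval of width delta,
          fitted through the interval's endpoints and midpoint (degree D = 2 Lagrange)."""
          n = int((zmax - zmin) / delta) + 1
          c0, c1, c2 = [], [], []
          for i in range(n):
              z0 = zmin + i * delta
              y0, ym, y1 = sb(z0), sb(z0 + delta / 2), sb(z0 + delta)
              c0.append(y0)                                       # function ROM
              c1.append((4 * ym - y1 - 3 * y0) / delta)           # slope ROM
              c2.append((2 * y1 + 2 * y0 - 4 * ym) / delta ** 2)  # quadratic-coefficient ROM
          return c0, c1, c2

      def quad_interp(Z, zmin, delta, c0, c1, c2):
          i = int((Z - zmin) / delta)            # high bits of Z select the interval
          zl = (Z - zmin) - i * delta            # low bits of Z: offset within the interval
          return c0[i] + c1[i] * zl + c2[i] * zl * zl

      c0, c1, c2 = make_quadratic_tables(-8.0, 0.0, 1.0 / 16)
      print(quad_interp(-1.3, -8.0, 1.0 / 16, c0, c1, c2), sb(-1.3))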

  26. Partition Definitions
  • δ: distance between adjacent tabulated points
  • Interval: width δ, using a particular polynomial approximation
  • D: degree of the polynomial
  • Region: set of interval(s) with the same δ
  • W: width of a region
  • Segment: largest region with the same δ
  • W/δ = the number of words in a region

  27. Interpolator Partitioning: Simple Example
  Z from 0.0 to 4.0 is divided into W = 1 regions: the first two regions use δ = 0.25 (4 words each) and together form a W = 2 segment; the next regions use δ = 0.5 (2 words each) and form another W = 2 segment.
  Memory per region, summarized as the number of words per W = 1 region:
  Z = 0: 4 words; Z = 1: 4 words; Z = 2: 2 words; Z = 3: 2 words; ...
  [Figure: intervals, regions, and segments marked along the axis from 0.0 to 4.0.]
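
  The memory bookkeeping here is just the width of each region divided by its spacing; a tiny sketch using illustrative region widths and spacings like those in the simple example above:

      # Hypothetical partition: (region width W, point spacing delta) pairs, illustrative values only.
      partition = [(1.0, 0.25), (1.0, 0.25), (1.0, 0.5), (1.0, 0.5)]
      words = [int(W / d) for W, d in partition]       # words per region = W / delta
      print(words, "total =", sum(words))              # [4, 4, 2, 2] total = 12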

  28. Choosing Partition Points
  • Minimum δ is determined by: the starting point of the interval; the interpolation method, such as Lagrange; the degree of the polynomial, D; and the (D+1)th derivative of the function (a numerical sketch follows this slide)
  • To minimise memory, partition at multiples of D+1:
    Linear, D = 1: multiples of 2 [Lewis90]
    Quadratic, D = 2: multiples of 3 [Lewis94 and Proposed]
  • To simplify partition hardware, but double the memory:
    Quadratic, D = 2: powers of 2 [Coleman]
  • To simplify partition hardware and keep memory almost optimal:
    Quadratic, D = 2: multiples of 4 [Proposed]
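
  One numerical way to see how the starting point, the degree, and the curvature of the function determine the minimum spacing: shrink a candidate delta until the worst-case error of the degree-2 fit meets a target. The half-ulp target, the power-of-two search, and the sampling density are assumptions for illustration, not the paper's actual procedure:

      import math

      def sb(Z, b=2.0):
          return math.log(1.0 + b ** Z, b)

      def worst_error(z0, delta, samples=200):
          """Worst observed error of the degree-2 Lagrange fit on [z0, z0 + delta],
          sampled at `samples` points (a numerical stand-in for the derivative bound)."""
          y0, ym, y1 = sb(z0), sb(z0 + delta / 2), sb(z0 + delta)
          c1 = (4 * ym - y1 - 3 * y0) / delta
          c2 = (2 * y1 + 2 * y0 - 4 * ym) / delta ** 2
          return max(abs(y0 + c1 * zl + c2 * zl * zl - sb(z0 + zl))
                     for zl in (k * delta / samples for k in range(samples + 1)))

      def min_delta(z0, F=23, start=1.0):
          """Largest power-of-two spacing whose quadratic error stays below half an ulp (2**-F / 2)."""
          target = 0.5 * 2.0 ** -F
          delta = start
          while worst_error(z0, delta) > target:
              delta /= 2.0
          return delta

      print(min_delta(-1.0), min_delta(-6.0))   # intervals farther from Z = 0 tolerate a larger delta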

  29. Memory per Region: F=23 Quadratic Interpolation
  Z     Proposed (mult. of 3)   Proposed (mult. of 4)   Lewis   Coleman
  0            32                      32                128      256
  1            32                      32                128      256
  2            32                      32                128      128
  3            32                      32                 64      128
  4            16                      16                 64       64
  5            16                      16                 64       64
  6            16                      16                 32       64
  7             8                      16                 32       64
  8             8                       8                 32       32
  9             8                       8                 16       32
  10            4                       8                 16       32
  11            4                       8                 16       32
  12            4                       4                  8       32
  13            2                       4                  8       32
  14            2                       4                  8       32
  15            2                       4                  4       32
  16            2                       2                  4       16
  17            2                       2                  4       16
  18            2                       2                  2       16
  ...         ...                     ...                ...      ...
  Total       234                     256                768     1536

  30. Memory per Region: F=23 Quadratic Interpolation (continued)
  (Same table as the previous slide.)
  The power-of-2 method is inefficient compared to multiple-of-3 because each power-of-2 partition has to take the largest word count from the Lewis table within that power-of-two segment.

  31. Memory per Region: F=23 Quadratic Interpolation (continued)
  (Same table as the previous slide.)
  This doesn't fit the multiple-of-3 pattern, so the proposed design uses multiple-of-4 partitioning, with the first seven regions the same as in the multiple-of-3 scheme.

  32. Next-Nearest Probability: Lewis' Restricted Faithful Interpolator
  [Plot: probability of a next-nearest result versus zend, multiple-of-3 partitioning. Average p = 0.0032.]

  33. Next-Nearest Probability: Coleman Restricted Faithful Interpolator
  [Plot: probability of a next-nearest result versus zend, multiple-of-3 partitioning. Average p = 0.0006.]

  34. Next-Nearest Probability: Proposed Unrestricted Faithful Interpolator
  [Plot: probability of a next-nearest result versus zend, multiple-of-3 partitioning. Average p = 0.074.]

  35. Next-Nearest Probability: Proposed Unrestricted Faithful Interpolator
  [Plot: probability of a next-nearest result versus zend, multiple-of-4 partitioning. Average p = 0.039.]

  36. Effect of Partitioning for Quadratic Interpolation
  • Partitioning method interacts with rounding mode
  • Multiple of 3: δ increases when z increases by 3
  • Multiple of 4: δ increases when z increases by 4
  • Power of 2: δ increases when z doubles

  Who              Partitioning    Rounding                Words   Probability
  Proposed         Multiple of 3   Unrestricted Faithful     234   0.074
  Proposed         Multiple of 4   Unrestricted Faithful     256   0.039
  Lewis            Multiple of 3   Restricted Faithful       768   0.0032
  Coleman et al.   Power of 2      Restricted Faithful      1500   0.00063

  37. Conclusions
  • Round to nearest is essentially impossible for an F=23 quadratic LNS interpolator.
  • Restricted faithful rounding has a low probability (< 0.003) of returning the next-nearest value.
  • Restricted faithful rounding costs too much memory: 768 words with multiple-of-3 partitioning, 1500 words with power-of-2 partitioning.

  38. Conclusions (continued)
  • Unrestricted faithful rounding increases the next-nearest probability only slightly: p < 0.07 with multiple-of-3 partitioning, p < 0.04 with multiple-of-4.
  • A previous FFT study suggests p < 0.12 is acceptable.
  • Unrestricted faithful rounding reduces the memory cost 3- to 6-fold: 234 words with multiple-of-3 partitioning, 256 words with multiple-of-4.
