
A Binary Integer Decimal-based Multiplier for Decimal Floating-Point Arithmetic

Zhongkai Chen. A Binary Integer Decimal-based Multiplier for Decimal Floating-Point Arithmetic. Paper Information.



Presentation Transcript


  1. Zhongkai Chen A Binary Integer Decimal-based Multiplier for Decimal Floating-Point Arithmetic

  2. Paper Information • Gonzalez-Navarro, S.; Tsen, C.; Schulte, M.; Univ. of Malaga, Malaga • This paper appears in: Signals, Systems and Computers, 2007. ACSSC 2007. Conference Record of the Forty-First Asilomar Conference on • Publication Date: 4-7 Nov. 2007

  3. Outline • Introduction • BID Multiplication Technique • BID Rounding • BID Multiplier Design • Conclusion

  4. Introduction • Binary floating-point arithmetic neither provides correct decimal rounding nor exactly represents many decimal fractions, such as 0.01, 0.0475, and 10^-8, so demand for Decimal Floating-Point (DFP) arithmetic is increasing in global business, e-commerce, and financial applications. It is estimated that errors from binary floating-point arithmetic can accumulate to an annual error of over $5 million for large billing systems. • DFP numbers can be encoded using Densely Packed Decimal (DPD), Binary Coded Decimal (BCD), or Binary Integer Decimal (BID). Recently Intel published results for a BID software library.

  5. Introduction • BID encoding is generally regarded as more appropriate for implementation in software than in hardware. • However, the performance of software implementations is not good enough. • The authors hold a contrary view: BID is well suited for hardware implementation, since it can share hardware with high-speed binary arithmetic units. The proposed multiplier can be shared to perform binary floating-point multiplication and other BID-based DFP operations.

  6. Introduction • Compared with other encodings, the challenging problem for BID is rounding: • Rounding off d decimal digits can be performed by dividing the product by 10^d, followed by an optional increment of the truncated result based on the rounding mode. This method, however, has long latency.
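
The division-based rounding described on this slide can be sketched as follows. This is a minimal illustration, not the paper's hardware: the helper name `round_off_digits` is hypothetical, and round-half-even is an assumed rounding mode (the slide does not fix one).

```python
def round_off_digits(product: int, d: int) -> int:
    """Drop the d least-significant decimal digits of a non-negative integer.

    Assumption: round to nearest, ties to even. Other rounding modes would
    change only the increment condition below.
    """
    if d == 0:
        return product
    divisor = 10 ** d
    # The division by 10^d is the long-latency step the slide refers to.
    truncated, remainder = divmod(product, divisor)
    half = divisor // 2
    # Optional increment of the truncated result, decided by the remainder.
    if remainder > half or (remainder == half and truncated % 2 == 1):
        truncated += 1
    return truncated
```

For instance, `round_off_digits(12351, 2)` divides by 100, sees a remainder above one half, and increments the truncated quotient.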

  7. BID Multiplication Technique 1. Let A and B be the DFP operands represented by the triples (Asign, Ac, Aexp) and (Bsign, Bc, Bexp), respectively. 2. Compute the intermediate product IPc = Ac * Bc. 3. In parallel, compute IPexp = Aexp + Bexp. 4. Calculate the number of digits in IPc to determine whether the result needs rounding. 5. If it does, multiply by 10^-d to round off d digits, and then adjust IPexp.
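
The five steps above can be sketched in software as shown below. Several things here are assumptions rather than details from the slides: the function name `bid_multiply`, the XOR for the result sign, unbiased integer exponents, truncation instead of a full rounding mode, and the omission of overflow, underflow, and special values.

```python
PRECISION = 16  # decimal digits in a decimal64 significand


def bid_multiply(a, b, precision=PRECISION):
    """Multiply two DFP operands given as (sign, coefficient, exponent) triples."""
    a_sign, a_c, a_exp = a
    b_sign, b_c, b_exp = b

    sign = a_sign ^ b_sign       # assumed: usual sign rule for products
    ip_c = a_c * b_c             # step 2: intermediate product
    ip_exp = a_exp + b_exp       # step 3: done in parallel in hardware

    # Step 4: count decimal digits to see whether rounding is needed.
    num_digits = len(str(ip_c))
    d = max(num_digits - precision, 0)

    # Step 5: remove d digits (truncation only, for brevity) and adjust
    # the exponent so the represented value keeps its magnitude.
    result_c = ip_c // 10 ** d
    return sign, result_c, ip_exp + d
```

A small product such as 15 x 10^-1 times -2 needs no rounding, while the product of two 16-digit coefficients has 32 digits and loses 16 of them.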

  8. BID Rounding • A straightforward approach to determine d is to count the number of digits in the intermediate product, IPc, using a digit counter unit. Once the number of digits in IPc is computed, the number of digits to round off may be computed as d = max(digits(IPc) - precision, 0). The drawback of this approach is that IPc must be computed before determining how many digits to round off.

  9. BID Rounding • The proposed technique is to use two binary leading-one detectors to determine the bit position of the leading one of each significand, Alop and Blop. • k = Alop + Blop • 2^k <= Ac*Bc <= 2^(k+2) - 1. The digit count estimated from k may be one less than the actual number of decimal digits in the product.
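
The bound on the product follows from each significand lying between 2^lop and 2^(lop+1) - 1. A minimal sketch, with hypothetical helper names and Python's `int.bit_length` standing in for a hardware leading-one detector:

```python
def leading_one_position(x: int) -> int:
    """Bit index of the most significant 1 bit (0 for x == 1); x must be positive."""
    return x.bit_length() - 1


def product_bounds(a_c: int, b_c: int):
    """Return (k, low, high) such that low <= a_c * b_c <= high,
    using only the leading-one positions of the two operands."""
    k = leading_one_position(a_c) + leading_one_position(b_c)
    return k, 2 ** k, 2 ** (k + 2) - 1
```

Because k depends only on the operands, it is available before the multiplication finishes.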

  10. BID Rounding • For example, suppose that the sum k is 63, so IPc is in the range [2^63, 2^65 - 1]. • The number of decimal digits in IPc is then 19 or 20. If the precision is 16, 3 or 4 digits will be rounded off. • A lookup table (LUT), indexed by k, stores the minimum number of decimal digits to round off, d'. In this case, d' = 3. • To determine exactly how many digits to round off, the same LUT stores pre-calculated powers of ten. Specifically, for index k the LUT stores 10^n, the smallest power of ten greater than 2^k. • The sign of a comparison between IPc and 10^n lets the design determine the exact number of digits to round off. • So position k = 63 also stores 10^19 (since 10^19 > 2^63), which is compared with IPc. Depending on the result of the comparison, either d = 3 or d = 4 digits are rounded off.
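
A software sketch of this lookup scheme is given below. The table layout, the names `build_lut` and `digits_to_round_off`, and the table size (k up to 106, for two 54-bit significands) are assumptions for illustration. The handling of products shorter than the precision is also an addition of this sketch, since the slide only discusses the case where rounding occurs.

```python
PRECISION = 16
MAX_K = 106  # leading-one positions of two 54-bit significands sum to at most 53 + 53


def build_lut(precision=PRECISION, max_k=MAX_K):
    """For each k, store (n, 10**n, d_min), where 10**n is the smallest power
    of ten greater than 2**k and d_min is the minimum number of digits to round off."""
    lut = []
    for k in range(max_k + 1):
        n = len(str(2 ** k))  # 2**k has n decimal digits, so 10**(n-1) <= 2**k < 10**n
        lut.append((n, 10 ** n, max(n - precision, 0)))
    return lut


LUT = build_lut()


def digits_to_round_off(a_c: int, b_c: int, ip_c: int, precision=PRECISION) -> int:
    """Exact number of digits to round off, from k and one comparison.
    Operands must be positive."""
    k = (a_c.bit_length() - 1) + (b_c.bit_length() - 1)
    n, pow10, d_min = LUT[k]
    if ip_c < pow10:
        return d_min                    # product has exactly n digits
    return max(n + 1 - precision, 0)    # product has n + 1 digits
```

Only the index k is needed to read the table, so the lookup can proceed while the product is still being computed; the comparison against 10^n is the only step that waits for IPc.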

  11. BID Rounding • Another LUT stores pre-calculated approximations of wd = 10^-d. • The truncated product is: P = IPc * wd
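
Replacing the division by a multiplication with a stored reciprocal can be sketched as below. The scaling is an assumption of this sketch, not the paper's LUT format: wd is taken as ceil(2^s / 10^d) with s chosen large enough that the shifted product equals the exact truncated quotient for every input below 2^n_bits. The function names are hypothetical.

```python
def reciprocal_constant(d: int, n_bits: int):
    """Return (w_d, shift) with w_d = ceil(2**shift / 10**d).

    Assumed scaling: shift = n_bits + bit length of 10**d, which keeps the
    accumulated approximation error below one unit for inputs under 2**n_bits.
    """
    shift = n_bits + (10 ** d).bit_length()
    w_d = -(-(1 << shift) // 10 ** d)  # ceiling division on integers
    return w_d, shift


def truncate_by_multiplication(ip_c: int, d: int, n_bits: int = 108) -> int:
    """Compute ip_c // 10**d with a multiply and a shift; needs 0 <= ip_c < 2**n_bits."""
    w_d, shift = reciprocal_constant(d, n_bits)
    return (ip_c * w_d) >> shift
```

In hardware the constants would come from the LUT rather than being computed on the fly, and the bits shifted out would feed the rounding decision.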

  12. BID Multiplier Design • Why define X and Y as 54 bits? • The size of each decimal64 significand is 54 bits. • IPc and wd can each be up to 108 bits, so the same multiplier can be reused when rounding is needed.

  13. BID Multiplier Design • If rounding is needed: • To reuse the same multiplier, IPc and wd are split into upper and lower halves. The inputs of the multiplier, X and Y, are fed with these halves, which we denote as IPcH = IPc[107:54], IPcL = IPc[53:0], wdH = wd[107:54], and wdL = wd[53:0]. The final product, P = IPc * wd, is obtained after four multiplies: • (PS1, PC1) = wdL * IPcL, (PS2, PC2) = wdL * IPcH, (PS3, PC3) = wdH * IPcL, and (PS4, PC4) = wdH * IPcH.
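
The recombination of the four partial products can be checked with a short sketch. The function names are hypothetical, and each carry-save pair (PS, PC) from the slide is collapsed into a single integer here, so this shows only the arithmetic identity, not the carry-save datapath.

```python
HALF = 54                 # width of each multiplier input
MASK = (1 << HALF) - 1


def split(x: int):
    """Split a value of up to 108 bits into (upper 54 bits, lower 54 bits)."""
    return x >> HALF, x & MASK


def multiply_in_four_passes(ip_c: int, w_d: int) -> int:
    """Form ip_c * w_d from four 54x54-bit multiplies."""
    ip_h, ip_l = split(ip_c)
    w_h, w_l = split(w_d)
    p1 = w_l * ip_l       # weight 2^0
    p2 = w_l * ip_h       # weight 2^54
    p3 = w_h * ip_l       # weight 2^54
    p4 = w_h * ip_h       # weight 2^108
    return p1 + ((p2 + p3) << HALF) + (p4 << (2 * HALF))
```

Each partial product uses only 54-bit inputs, which is why a single 54x54-bit multiplier suffices for both the significand product and the rounding multiply.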

  14. BID Multiplier Design • A BID-encoded DFP multiplication takes only two cycles if IPc fits the result's precision. If rounding is necessary, it takes eight cycles to produce a result that complies with the IEEE P754 Draft Standard.

  15. Conclusion • The number of digits to round off is obtained in parallel with the calculation of the intermediate product and a binary multiplier with carry-save feedback is employed. • This allows the design to reuse an existing binary multiplier for both significand multiplication and rounding. • This multiplier can also be shared to perform BFP multiplication. • The design has variable latency to take advantage of the fact that multiplication results are not often rounded. • The design demonstrates that BID multiplication can be efficiently implemented in hardware with much better latency than a software implementation.

  16. Thank you
