## Built-In Self-Test for Multipliers


**Built-In Self-Test for Multipliers**

Mary Pulukuri
Dept. of Electrical & Computer Engineering, Auburn University
VLSI D&T Seminar

**Outline of Presentation**

- Overview of multiplier architectures
- History of Digital Signal Processor (DSP) Architectures in FPGAs
- Overview of Virtex-4 DSP
- Prior Testing R&D for Multipliers
- Our Approach
- Analysis Methodology
- Simulation Results
- Application to Virtex-4 & 5 DSPs
- Summary and Conclusions

**Overview of Multipliers**

- Array multiplier
  - Final product calculated using an array of full adders and AND gates

**Overview of Multipliers**

- Signed array or Baugh-Wooley multiplier
  - Final product calculated using an array of full adders, AND gates and NAND gates

**Overview of Multipliers**

- Modified Booth multipliers
  - Partial products calculated using the modified Booth algorithm
  - Modified Booth algorithm uses a binary encoder to calculate partial products using a series of shift operations
  - Summation of partial products done using CLA adders

A.R. Cooper, “Parallel architecture modified Booth multiplier,” IEEE Proc. Electronic Circuits and Systems, vol. 135, no. 3, pp. 125-128, 1998

**Overview of Multipliers**

- Modified Booth/Wallace Tree multipliers
  - Summation of partial products done using a Wallace tree
  - Each column of partial products is summed using a multi-stage setup of half and full adders
  - Each multi-stage adder circuit generates a sum and a carry, which form the two final partial products
  - The two final-stage partial products from the Wallace tree are added using a CLA adder

**Xilinx FPGA Architectures**

- 4000/Spartan
  - N×N array of unit cells
  - Unit cell = CLB + routing
  - Fast carry logic in CLBs for adders
- Virtex/Spartan-2
  - M×N array of unit cells
  - Carry logic + AND gate for array multipliers
  - 4K block RAMs at edges
- Virtex-2/Spartan-3
  - 18K block RAMs in array
  - 18×18-bit multipliers with each RAM
    - “based on modified Booth architecture”
- Virtex-4/Virtex-5
  - Added 48-bit DSP cores w/multipliers

**Virtex-4 DSP Architecture**

[Figure: DSP tile block diagram showing inputs A(18), B(18) and C(48), the X, Y and Z multiplexers, outputs P(48), inputs for cascading, and outputs w/ dedicated routing]

- 2 DSP slices per tile
- 16-256 tiles in 1-8 columns
- Each DSP includes:
  - 18×18-bit 2's-comp multiplier (w/o adder)
  - 3-input, 48-bit adder/subtractor
    - P = Z ± (X + Y + Cin)
  - Optional accum reg
- User controlled operational modes
  - For X, Y, & Z MUXs
- Configuration bits control other MUXs
  - Pipelining registers
  - Accumulator register
- Easily tested

**BIST Approach for Virtex-5 DSP**

[Figure: BIST approach for the Virtex-5 DSP; label: larger multiplier]

**Multiplier Architectures**

- Test algorithm depends on architecture
  - But architecture is not specified in data sheets
- Eliminate sequential logic architectures
- “Based on modified Booth”
- Multiplier choices include:
  - Array
  - Booth
  - Modified Booth
  - Modified Booth/Wallace tree
- Our assumption based on area/performance analysis
- Our goal: find/develop architecture independent test algorithm(s)

**Modified Booth Test Algorithms**

[Figure: 4×4 algorithm. An 8-bit counter (MSB to LSB) drives the two n-bit inputs of an n×n multiplier, one of which has Booth encoding, through 4-bit and 4-bit connections; the product is 2n bits wide]

- Test algorithm uses 8-bit counter (256 vectors)
- “Effective Built-In Self-Test for Booth Multipliers”
  - Gizopoulos, Paschalis & Zorian
  - IEEE Design & Test of Computers, pp. 105-111, 1998
- Claim fault coverage ~99.8%
- 4×4 connections to multiplier inputs
  - Order of the bits does not matter
- Algorithm used in Srinivas Garimella’s MS thesis for Virtex-2 multipliers

**Modified Booth Test Algorithms**

[Figure: 5×3 algorithm. An 8-bit counter (MSB to LSB) drives the two n-bit inputs of an n×n multiplier through 5-bit and 3-bit connections, with the 5-bit side on the Booth encoding input; the product is 2n bits wide]

- Test algorithm uses 8-bit counter (256 vectors)
- “An Effective BIST Architecture for Fast Multiplier Cores”
  - Paschalis, Kranitis, Psarakis, Gizopoulos & Zorian
  - Proc. Design, Automation and Test in Europe Conf., pp. 117-121, 1999
- Claim fault coverage ~99.8%
- 5×3 connections with 5 inputs to Booth encoding
  - But this was not explicit in paper
  - Only shown in figure
  - Order of the bits does not matter
- Note that this paper is from 1999

**Modified Booth Test Algorithms**

[Figure: 5×3 algorithm, same arrangement as on the previous slide]

- Test algorithm uses 8-bit counter (256 vectors)
- “Low Power BIST for Wallace Tree-based Fast Multipliers”
  - Bakalis, Kalligeros, Nikolos, Vergos & Alexiou
  - Proc. Int. Symp. on Quality of Electronic Design, pp. 433-438, 2000
- Claim fault coverage > 99%
- 5×3 connections with 5 inputs to Booth encoding
  - Specifically stated in paper
  - But no data to back up claim that 5×3 is better than 3×5
  - Did they just observe it in the Zorian paper?
- Note that this paper was published a year later than the Zorian paper

**Modified Booth Test Algorithms**

[Figure: 5×3 and 3×5 algorithms. The 8-bit counter (MSB to LSB) drives the n×n multiplier inputs through either 5-bit/3-bit or 3-bit/5-bit connections; the product is 2n bits wide]

- Test algorithm uses 8-bit counter (256 vectors)
- But which side is Booth encoding?
- Xilinx does not specify
- Our original approach
  - Run 5×3 algorithm
    - 256 vectors
  - and run 3×5 algorithm
    - 512 vectors total
  - Include 4×4 if fault coverage improves
    - 768 vectors total
- Additional algorithms only require multiplexers to change inputs
  - Use same 8-bit counter

**Methodology for Analysis**

- Multipliers evaluated
  - Unsigned array
  - Signed array (Baugh-Wooley)
  - Modified Booth
    - Carry look-ahead adders sum partial products in every stage
  - Modified Booth Wallace Tree
    - Carry look-ahead adder sums final stage partial products
    - Carry select adder sums final stage partial products
    - Ripple carry adder sums final stage partial products

**Methodology for Analysis**

- Designed 8-bit models of the multipliers
- Fault model: collapsed single stuck-at gate-level faults
- Exhaustive testing
  - To determine undetectable faults
- Test algorithms evaluated
  - 4×4
  - 5×3
  - 3×5
  - 5×3 & 3×5
  - 4×4, 5×3 & 3×5

**Application to Virtex-4 & 5 DSPs**

- In Virtex-4 & 5 DSPs
  - Final stage carry look-ahead adder (CLA) separated from the multiplier
  - 5×3 & 3×5 give the same fault coverage for the multiplier alone
  - Separate test algorithm for the CLA
  - Run both 5×3 and 3×5 to test for bridging faults on the cascade routing between adjacent slices

**Summary and Conclusion**

- If the architecture of the multiplier is not known:
  - 3×5 algorithm gives best overall fault coverage for most multipliers
    - Contradicting the claim of the authors who proposed 5×3
  - Running 3×5 & 5×3 gives better fault coverage for all multipliers
  - Running all three test algorithms (3×5, 5×3 and 4×4) provides the best fault coverage for all multipliers
    - Architecture independent testing
- Virtex-4 & Virtex-5 multipliers
  - Original approach was 3×5 and 5×3
  - Better approach would be 3×5 and 4×4

**Summary and Conclusion**

- For multipliers in Virtex-2 FPGAs
  - Adder not separated from the multiplier
  - Run both 3×5 and 5×3 algorithms
    - These give highest fault coverage for multiplier & CLA
- The 3×5 and 4×4 BIST algorithms should be applied to multipliers in
  - Spartan-3A
    - Similar to multipliers in Virtex-4
  - Spartan-6
    - Similar to multipliers in Virtex-4
  - Virtex-6
    - Similar to multipliers in Virtex-5
  - If only 2 algorithms can be applied
- Best results if all 3 can be applied

**Summary and Conclusion**

- Area overhead for different approaches
  - In addition to 8-bit counter
- Maximum area overhead for N-bit multiplier:
  - One test algorithm: 2N 2:1 multiplexers
  - Two test algorithms: 2N 3:1 multiplexers
    - 1 additional counter bit for control
  - All three test algorithms: 2N 4:1 multiplexers
    - 2 additional counter bits for control
- This is worst case since synthesis tools may reduce multiplexers
  - Particularly in case of two and three test algorithms
  - Because duplicate counter bits feed the same multiplexers
- Regardless, this is an area efficient BIST approach
- Paper almost finished for JETTA Letter or Trans. IE Corr.
- Brad is using 3×5, 5×3 & 4×4 algorithms in test bench for multipliers in Output Response Analyzer (ORA) for mixed signal BIST
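
The counter-based test algorithms discussed in these slides (4×4, 5×3 and 3×5) can be sketched in software as a pattern generator. The following is a minimal Python sketch and not the author's implementation. It assumes that each counter field is replicated cyclically across its n-bit operand, which the slides only hint at ("counter duplicate bits to same multiplexers"), and that the upper counter bits drive the first operand. The function names `replicate`, `bist_vectors` and `combined_vectors` are hypothetical. Which operand is Booth encoded in a given device is not modeled.

```python
from typing import Iterator, Sequence, Tuple


def replicate(field: int, width: int, n: int) -> int:
    """Tile a `width`-bit counter field cyclically across an n-bit operand.

    Assumption: operand bit i is driven by field bit (i mod width).
    """
    value = 0
    for i in range(n):
        bit = (field >> (i % width)) & 1
        value |= bit << i
    return value


def bist_vectors(a_bits: int, b_bits: int, n: int) -> Iterator[Tuple[int, int]]:
    """Yield the 256 (A, B) operand pairs of an a_bits x b_bits algorithm.

    The 8-bit counter is split into an upper field of a_bits (operand A)
    and a lower field of b_bits (operand B). This split is an assumption.
    """
    if a_bits + b_bits != 8:
        raise ValueError("the two fields must use all 8 counter bits")
    for count in range(256):
        a_field = count >> b_bits
        b_field = count & ((1 << b_bits) - 1)
        yield replicate(a_field, a_bits, n), replicate(b_field, b_bits, n)


def combined_vectors(
    n: int,
    algorithms: Sequence[Tuple[int, int]] = ((5, 3), (3, 5), (4, 4)),
) -> Iterator[Tuple[int, int]]:
    """Run several algorithms back to back with the same 8-bit counter."""
    for a_bits, b_bits in algorithms:
        yield from bist_vectors(a_bits, b_bits, n)


if __name__ == "__main__":
    n = 18  # operand width of the multipliers discussed in the slides
    print(len(list(bist_vectors(5, 3, n))))                      # one algorithm
    print(len(list(combined_vectors(n, ((5, 3), (3, 5))))))      # two algorithms
    print(len(list(combined_vectors(n))))                        # all three
```

Running one, two or three algorithms in sequence yields 256, 512 and 768 vectors, matching the counts on the slides. In hardware the switch between algorithms would be done by the input multiplexers rather than by regenerating patterns.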