Advanced Computer Arithmetic RNS Design Considerations Week 11

CENG536 Computer Engineering Department Çankaya University Advanced Computer Arithmetic RNS Design Considerations Week 11

Algorithm Analysis CENG 536 - Spring 2012-2013 Dr. Yuriy ALYEKSYEYENKOV

1. Analog-to-digital converter (ADC) parameters. The most important parameter, the number of bits used to represent samples of the input signal, should be selected according to the required precision of information representation. The more bits are used to represent signal samples, the higher the precision of data representation. The dynamic range R of the input signal is typically measured in decibels and is determined by the ratio of the maximal level of the input signal to its minimal level. Typical parameters of ADC in real projects are: [values not preserved]. The output signal (the result of processing) should have the same dynamic range (recommended). Let the ADC in this example have 10 bits.
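As a rough illustration of how the number of bits relates to dynamic range, the sketch below assumes the usual amplitude definition of decibels (20 times the base-10 logarithm of the level ratio) and assumes that the minimal level is one quantization step while the maximal level is 2^bits steps. The function names are hypothetical and are not taken from the slides.

```python
import math


def dynamic_range_db(max_level: float, min_level: float) -> float:
    """Dynamic range in decibels, assuming amplitude (not power) levels."""
    return 20.0 * math.log10(max_level / min_level)


def adc_dynamic_range_db(bits: int) -> float:
    """Assumption: minimal level is one quantization step, maximal is 2**bits steps."""
    return dynamic_range_db(float(2 ** bits), 1.0)


if __name__ == "__main__":
    # Dynamic range of the 10-bit ADC used as the running example.
    print(f"{adc_dynamic_range_db(10):.1f} dB")
```

Under these assumptions each additional bit adds about 6 dB, so the 10-bit ADC of the example covers roughly 60 dB.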

2. Mathematical operations. During signal processing, the results of arithmetic transforms are truncated or rounded, because the arithmetic unit has a fixed number of bits. For example, the product of two n-bit numbers is represented in 2n bits, and the lower n bits of this result need to be discarded. Each truncation increases the resulting error and decreases the number of correct bits in the result. To keep the same dynamic range for the input and output signals, the input sample should be expanded before the first arithmetic operation by adding additional bits to the left of the sign bit.
[Figure: word format, with additional bits to the left of the signal sample from the output of the ADC]
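The truncation and word-expansion steps can be sketched with plain integers. This is only an illustration under assumptions: two's-complement samples, truncation by arithmetic right shift (which rounds toward minus infinity), and hypothetical function names.

```python
def truncate_product(a: int, b: int, n: int) -> int:
    """Multiply two n-bit signed integers and keep the upper n bits of the 2n-bit product."""
    product = a * b          # fits in 2n bits for n-bit signed operands
    return product >> n      # arithmetic shift discards the lower n bits


def sign_extend(sample: int, from_bits: int) -> int:
    """Interpret the low `from_bits` bits of `sample` as two's complement.

    The returned Python int behaves like the sample with extra sign bits
    added on the left, as when a 10-bit ADC sample is placed in a wider word.
    """
    mask = (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return ((sample & mask) ^ sign) - sign


if __name__ == "__main__":
    a, b, n = 12345, -23456, 16
    kept = truncate_product(a, b, n)
    # The discarded part is always smaller than 2**n.
    print(kept, a * b - (kept << n))
    print(sign_extend(0x3FF, 10))   # all ones in 10 bits
```

The discarded lower part is the truncation error mentioned above; it accumulates with every multiplication.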

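Computation of the two-point FFT is based on the "butterfly" operation, which maps the inputs x1, x2 to the outputs X1, X2 in the form
$X_1 = x_1 + W_N^k x_2, \qquad X_2 = x_1 - W_N^k x_2$
[Figure: butterfly diagram with inputs x1, x2 and outputs X1, X2]
[Figure: example of 8-point FFT]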
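A minimal floating-point sketch of the butterfly and of an FFT built from it is given below. It assumes the common decimation-in-time form, in which the second input is multiplied by the twiddle factor before the sum and difference are taken; the slides may use a different but equivalent arrangement. All names are illustrative.

```python
import cmath


def twiddle(k: int, n: int) -> complex:
    """Twiddle factor W_N^k = exp(-2*pi*j*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / n)


def butterfly(x1, x2, w):
    """Assumed decimation-in-time butterfly: returns (x1 + w*x2, x1 - w*x2)."""
    t = w * x2
    return x1 + t, x1 - t


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], twiddle(k, n))
    return out


if __name__ == "__main__":
    # 8-point example, matching the size of the example in the slides.
    for value in fft([1, 2, 3, 4, 0, -1, -2, -3]):
        print(f"{value.real:8.3f} {value.imag:8.3f}")
```

An 8-point transform uses log2(8) = 3 stages with four butterflies each.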

The magnitude of the complex function $W_N^k$ fluctuates within the range ±1, and on each stage the resulting magnitude may increase 2 times due to the addition operation. The error on each stage of this algorithm is 0.5 bit. Because the total number of stages of the FFT algorithm is Stages = log2(N), the number of additional bits in the representation of data may be determined from the table [table not preserved]. For this example we select N = 1024 or 2048.

On each stage the result of multiplication should be truncated to its 16 most significant bits, and after addition the data should be scaled by applying the weighting coefficient ½. These operations prevent overflow; performing all operations on each stage in fixed-point format, we always have the input and output data in 16-bit format. As the error increases on each stage by 0.5 bit, the total number of correct bits in the representation of the output samples will be equal to 10.
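One stage of such a fixed-point computation might look as follows. This is a sketch under assumptions not stated in the slides: twiddle factors are stored as signed 16-bit values with 15 fractional bits (so the product is shifted by 15 rather than by exactly 16), complex values are integer pairs, and truncation is done by arithmetic shift. The names `fixed_butterfly` and `stages` are hypothetical.

```python
FRACTION_BITS = 15  # assumed format of the twiddle factors (signed 16-bit, 15 fractional bits)


def stages(n: int) -> int:
    """Number of FFT stages, log2(N), for N a power of two."""
    return n.bit_length() - 1


def fixed_butterfly(x1, x2, w):
    """Integer butterfly on (re, im) pairs with the result scaled by 1/2.

    x1, x2: data samples; w: twiddle factor scaled by 2**FRACTION_BITS.
    """
    # Complex product x2 * w, keeping only the upper bits of each 32-bit result.
    tr = (x2[0] * w[0] - x2[1] * w[1]) >> FRACTION_BITS
    ti = (x2[0] * w[1] + x2[1] * w[0]) >> FRACTION_BITS
    # Sum and difference, each followed by the weighting coefficient 1/2.
    upper = ((x1[0] + tr) >> 1, (x1[1] + ti) >> 1)
    lower = ((x1[0] - tr) >> 1, (x1[1] - ti) >> 1)
    return upper, lower


if __name__ == "__main__":
    print(stages(1024), stages(2048))
    print(fixed_butterfly((1000, 0), (1000, 0), (32767, 0)))
```

Because of the halving on every stage, the output of an N-point transform computed this way is scaled by 1/N relative to the unscaled FFT.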

Designing of RNS

Dealing with numbers in RNS, we process integers, so the reasoning used to select the range of data representation must be applied in a different way. Having a 10-bit ADC and a data array of 1024 samples, we need to process data presented in 16-bit format. After multiplication, however, the data will be represented in 32-bit format. Here the result must be divided by $2^{16}$ and then added to another operand, according to the logic of the algorithm. The resulting output data on each stage will then lie in a range twice as large as the range of the input data, which requires a scaling coefficient of two to represent the input and output data in the same range.

[Figure: butterfly with inputs x1, x2 and outputs X1, X2]
Execution of this algorithm in RNS is realized in the following way.
1. Initially x1 and x2 should be represented in the main range of moduli, which must be twice as large as is necessary to represent the results.
2. Before the multiplication, x1 and x2 should be expanded to an additional set of moduli to avoid overflow of the product.
3. After the multiplication, the product should be divided by the range of expansion, so that it is again represented in the main range.
4. Performing the summation and then dividing the result by 2, we obtain the representation of X1 and X2 in the main range.

Selection of moduli

The set of moduli for the main range should be selected to satisfy the requirements of correct representation of the input and output data. Denoting the moduli of the main system as $p_1, p_2, \ldots, p_i, \ldots, p_n$, we need a range of data representation $P = p_1 p_2 \cdots p_n$ that satisfies $P \ge 2^{16}$. By adding the modulus $p_0 = 2$ we have the total main range $2P \ge 2 \cdot 2^{16} = 2^{17}$. In addition, the modulus $p_0 = 2$ is a good choice for representing negative numbers in artificial form. The moduli for range expansion, written here as $q_1, q_2, \ldots, q_k, \ldots, q_m$, must give the range $Q = q_1 q_2 \cdots q_m \ge 2^{16}$. The full range of data representation is then $2PQ \ge 2^{17} \cdot 2^{16} = 2^{33}$.
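The range requirements can be checked mechanically for a candidate set of moduli. The sketch below uses hypothetical moduli chosen only for illustration (they do not come from the slides) and assumes, as is standard for RNS, that all moduli including 2 must be pairwise coprime.

```python
from itertools import combinations
from math import gcd, prod


def pairwise_coprime(moduli) -> bool:
    """True if every pair of moduli has greatest common divisor 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def check_moduli(main, expansion, data_bits: int = 16) -> dict:
    """Validate a moduli selection and return the resulting ranges."""
    if not pairwise_coprime([2] + list(main) + list(expansion)):
        raise ValueError("moduli (including 2) must be pairwise coprime")
    main_range = prod(main)
    expansion_range = prod(expansion)
    if main_range < 2 ** data_bits:
        raise ValueError("main range is too small")
    if expansion_range < 2 ** data_bits:
        raise ValueError("expansion range is too small")
    return {
        "main": main_range,
        "total_main": 2 * main_range,
        "expansion": expansion_range,
        "full": 2 * main_range * expansion_range,
    }


if __name__ == "__main__":
    # Hypothetical moduli, chosen only to satisfy the 16-bit requirement.
    for name, value in check_moduli([3, 5, 7, 11, 13, 17], [19, 23, 29, 31]).items():
        print(name, value, value.bit_length())
```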

Practical RNS Algorithms

Expanding of the range

At the start of any stage of the data processing algorithm, the data need to be represented in the total main range; for example, a number A will have the form
$A = (\alpha_0, \alpha_1, \ldots, \alpha_i, \ldots, \alpha_n)$
Before this number is multiplied by another one, it should be expanded to the full range
$A = (\alpha_0, \alpha_1, \ldots, \alpha_i, \ldots, \alpha_n, \gamma_1, \gamma_2, \ldots, \gamma_j, \ldots, \gamma_m)$
where the residues $(\gamma_1, \gamma_2, \ldots, \gamma_j, \ldots, \gamma_m)$ by the moduli of the expansion range are determined by a range expanding algorithm. Let us introduce minimal pseudo-orthogonal numbers by the moduli of the total main range, and let these numbers additionally be represented by the moduli of the expansion range: [formula not preserved]

Then, summing the numbers selected by the residues $\alpha_i$, we obtain the number $A' = (\alpha_0, \alpha_1, \ldots, \alpha_i, \ldots, \alpha_n, \gamma'_1, \gamma'_2, \ldots, \gamma'_j, \ldots, \gamma'_m)$. After all summations, overflows over the range 2P may happen, but not over the full range.

Therefore, this result should be corrected. To perform the correction it is necessary to store in memory, in parallel with the minimal numbers in RNS, their values in the binary positional system. While the minimal numbers are added in RNS, binary summation modulo 2P must also be performed to determine how many times T overflow occurs.
• The correction is realized by subtracting the RNS representation of the overflow, which has zero residues by the moduli of the total main range:
• $A = A' - T \cdot 2P = (\alpha_0, \alpha_1, \ldots, \alpha_i, \ldots, \alpha_n, \gamma'_1, \gamma'_2, \ldots, \gamma'_j, \ldots, \gamma'_m)$
• $- (0, 0, \ldots, 0, \ldots, 0, \delta_1, \delta_2, \ldots, \delta_j, \ldots, \delta_m)$, where $\delta_j$ is the residue of $T \cdot 2P$ by the j-th modulus of the expansion range,
• $= (\alpha_0, \alpha_1, \ldots, \alpha_i, \ldots, \alpha_n, \gamma_1, \gamma_2, \ldots, \gamma_j, \ldots, \gamma_m)$
• Thus, this algorithm needs arithmetic operations both in RNS and in the binary positional system.
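The idea of expanding the range with overflow counting can be sketched as follows. This is an illustration under assumptions: it uses ordinary orthogonal numbers of the Chinese remainder theorem as a stand-in for the "minimal pseudo-orthogonal numbers" of the slides (whose exact form is not preserved), it handles non-negative numbers only, and the moduli in the example are hypothetical small values.

```python
from math import prod


def orthogonal_numbers(moduli):
    """Numbers with residue 1 by one modulus and 0 by all the others."""
    m = prod(moduli)
    return [(m // p) * pow(m // p, -1, p) % m for p in moduli]


def expand_range(residues, main, expansion):
    """Return the residues by the expansion moduli and the overflow count T.

    residues: residues of a number A (0 <= A < product of main) by the main moduli.
    """
    m = prod(main)
    bases = orthogonal_numbers(main)
    binary_sum = 0                      # positional (binary) accumulator
    ext = [0] * len(expansion)          # RNS accumulator in the expansion range
    for alpha, base in zip(residues, bases):
        term = alpha * base % m         # table entry selected by the residue alpha
        binary_sum += term
        ext = [(e + term) % q for e, q in zip(ext, expansion)]
    overflows = binary_sum // m         # how many times the main range was exceeded
    # Correction: subtract T times the main range in the expansion moduli.
    ext = [(e - overflows * m) % q for e, q in zip(ext, expansion)]
    return ext, overflows


if __name__ == "__main__":
    main, expansion = [2, 3, 5, 7], [11, 13]   # hypothetical small moduli
    a = 157
    print(expand_range([a % p for p in main], main, expansion), [a % q for q in expansion])
```

In a table-based realization, each `term` and its residues by the expansion moduli would be read from memory rather than computed.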

Range expansion can be realized in another way. Let us introduce a set of numbers minimal by absolute value, of the form [formula not preserved]

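Selecting these numbers by the moduli $p_i$ and adding them according to the rule
• $(\alpha'_i + \alpha''_i) \pmod{p_i} = \alpha_i$, where $\alpha'_i$ is the current residue of the accumulated sum and $\alpha''_i$ is the residue of the selected number,
• we obtain the necessary result
• $A' = (\alpha_0, \alpha_1, \ldots, \alpha_i, \ldots, \alpha_n, \gamma'_1, \gamma'_2, \ldots, \gamma'_j, \ldots, \gamma'_m)$
• In this case the correction of the number for overflows modulo 2P is realized as shown before. The final result has the form
• $A = A' - T \cdot 2P = (\alpha_0, \alpha_1, \ldots, \alpha_i, \ldots, \alpha_n, \gamma'_1, \gamma'_2, \ldots, \gamma'_j, \ldots, \gamma'_m)$
• $- (0, 0, \ldots, 0, \ldots, 0, \delta_1, \delta_2, \ldots, \delta_j, \ldots, \delta_m)$, where $\delta_j$ is the residue of $T \cdot 2P$ by the j-th modulus of the expansion range,
• $= (\alpha_0, \alpha_1, \ldots, \alpha_i, \ldots, \alpha_n, \gamma_1, \gamma_2, \ldots, \gamma_j, \ldots, \gamma_m)$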

Method of Zeroing

After the multiplication of the two numbers is done, the result C should be scaled to represent it in the total main range 2P (compressing of the range). The method of zeroing may be applied for this. To have the result in the total main range, we divide the number C by the expansion range. For that, a new set of minimal pseudo-orthogonal numbers must be introduced, of the form [formula not preserved]

These numbers, as in the range expanding algorithm, should be represented in both forms: in RNS and in positional binary.
• Let us apply zeroing to the number
• $C = (\alpha_0, \alpha_1, \ldots, \alpha_i, \ldots, \alpha_n, \gamma_1, \gamma_2, \ldots, \gamma_j, \ldots, \gamma_m)$
• Selecting minimal numbers whose residues $\gamma^{\circ}_j$ by the moduli of the expansion range satisfy the rule
• $\gamma_j + \gamma^{\circ}_j \equiv 0$ modulo the j-th modulus of the expansion range,
• and summing them, we get
• $C + C^{\circ} = C' = (\alpha'_0, \alpha'_1, \ldots, \alpha'_i, \ldots, \alpha'_n, 0, 0, \ldots, 0, \ldots, 0)$
• In parallel we count the overflows modulo the expansion range in positional binary.

Then we correct the result by subtracting the counted overflows R:
• $C'' = C' - R = (\alpha'_0, \alpha'_1, \ldots, \alpha'_i, \ldots, \alpha'_n, 0, 0, \ldots, 0, \ldots, 0)$
• $- (R_0, R_1, \ldots, R_i, \ldots, R_n, 0, 0, \ldots, 0, \ldots, 0)$
• $= (\alpha''_0, \alpha''_1, \ldots, \alpha''_i, \ldots, \alpha''_n, 0, 0, \ldots, 0, \ldots, 0)$
• The second stage of range compression is formal division by the expansion range for the moduli of the total main range. This operation gives [formula not preserved]
• where the symbol * indicates the indeterminacy of dividing 0 by 0.
• These indeterminate residues may be resolved by applying the range expanding algorithm.
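The two stages, zeroing followed by formal division, can be sketched as below. Several things are assumptions rather than the method of the slides: ordinary orthogonal numbers of the Chinese remainder theorem replace the minimal pseudo-orthogonal numbers, the number is non-negative, and because zeroing adds a non-negative quantity the quotient is rounded up. The moduli and names are hypothetical.

```python
from math import prod


def orthogonal_numbers(moduli):
    """Numbers with residue 1 by one modulus and 0 by all the others."""
    m = prod(moduli)
    return [(m // p) * pow(m // p, -1, p) % m for p in moduli]


def scale_by_zeroing(main_res, ext_res, main, expansion):
    """Divide C by the expansion range; returns the residues by the main moduli.

    The quotient is rounded up and must fit in the range of the main moduli.
    """
    q_range = prod(expansion)
    bases = orthogonal_numbers(expansion)
    acc = list(main_res)                # RNS accumulator in the main range
    binary_sum = 0                      # positional accumulator for overflow counting
    for gamma, base in zip(ext_res, bases):
        term = (-gamma) * base % q_range        # cancels the residue gamma
        binary_sum += term
        acc = [(a + term) % p for a, p in zip(acc, main)]
    overflows = binary_sum // q_range
    # Correction: remove the counted overflows of the expansion range.
    acc = [(a - overflows * q_range) % p for a, p in zip(acc, main)]
    # Formal division: multiply by the inverse of the expansion range.
    return [a * pow(q_range, -1, p) % p for a, p in zip(acc, main)]


if __name__ == "__main__":
    main, expansion = [2, 3, 5, 7], [11, 13]   # hypothetical small moduli
    c = 1000
    print(scale_by_zeroing([c % p for p in main], [c % q for q in expansion], main, expansion))
```

The residues of the quotient by the expansion moduli are not produced here; as stated above, they would be recovered by range expansion.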

The same result may be obtained in another way. We want to transform the number
• $C = (\alpha_0, \alpha_1, \ldots, \alpha_i, \ldots, \alpha_n, \gamma_1, \gamma_2, \ldots, \gamma_j, \ldots, \gamma_m)$
• to the form
• $C + C^{\circ} = C' = (\alpha'_0, \alpha'_1, \ldots, \alpha'_i, \ldots, \alpha'_n, 0, 0, \ldots, 0, \ldots, 0)$.
• Let us introduce another set of numbers minimal by absolute value: [formula not preserved]

Using the same rule, we zero the residues by the moduli of the expansion range:
• $\gamma_j + \gamma^{\circ}_j \equiv 0$ modulo the j-th modulus of the expansion range.
• These minimal numbers have a smaller magnitude than those used before, but here too an overflow modulo the expansion range may happen as a result of zeroing. That is why the computation in RNS needs to be accompanied by computation in positional binary to count the number of overflows. The second stage of this algorithm, division by the expansion range, is identical to that analyzed before.

Transform From Binary to RNS

To transform an analog signal to binary code, a traditional industrially produced ADC may be used. This binary positional code then needs to be transformed to RNS. A typical way is to use constants of the form
• $2^i \equiv (\beta_{0i}, \beta_{1i}, \ldots, \beta_{ni})$
• With a k-bit ADC, the transform from binary to RNS takes k summations. Here we have a small set of k constants, but the transform takes too much time.
• For a 10-bit ADC, let us represent the binary number (signal sample) in another form as
• $A = (a_9 a_8 a_7 a_6 a_5) \cdot 2^5 + (a_4 a_3 a_2 a_1 a_0) \cdot 2^0$
• where $a_i$ is a binary digit (0 or 1) that has weight $2^i$. The number of such groups and the number of bits in each group for a real ADC should be selected by the system designer.

Realizing operations for a 10-bit ADC, we need to introduce two sets of constants of the form
• $(a_9 a_8 a_7 a_6 a_5) \cdot 2^5 \equiv (\beta_{05}, \beta_{15}, \ldots, \beta_{n5})$
• and
• $(a_4 a_3 a_2 a_1 a_0) \cdot 2^0 \equiv (\beta_{00}, \beta_{10}, \ldots, \beta_{n0})$,
• in total 32 · 2 = 64 constants.
• By increasing the number of constants we reduce the number of arithmetic operations from 10 to 2.
• These constants may be represented not only in residues by the moduli of the total main range, but also in residues of the expansion range. In this case, at the start of the first stage of the FFT algorithm, the range expansion may be skipped.
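The grouped conversion can be sketched with two lookup tables of 32 entries each. The moduli below are hypothetical, the sample is treated as an unsigned 10-bit value, and the function names are illustrative.

```python
MODULI = (2, 3, 5, 7, 11, 13, 17)   # hypothetical moduli, for illustration only


def build_tables(moduli, group_bits: int = 5, groups: int = 2):
    """One table per bit group; entry v holds the residues of v * 2**(group offset)."""
    tables = []
    for g in range(groups):
        shift = g * group_bits
        tables.append([tuple((v << shift) % p for p in moduli)
                       for v in range(1 << group_bits)])
    return tables


def to_rns(sample: int, tables, moduli, group_bits: int = 5):
    """Convert an unsigned sample to RNS with one table lookup per bit group."""
    mask = (1 << group_bits) - 1
    acc = [0] * len(moduli)
    for g, table in enumerate(tables):
        entry = table[(sample >> (g * group_bits)) & mask]
        acc = [(a + e) % p for a, e, p in zip(acc, entry, moduli)]
    return tuple(acc)


if __name__ == "__main__":
    tables = build_tables(MODULI)
    print(sum(len(t) for t in tables))        # total number of stored constants
    print(to_rns(0b1011001110, tables, MODULI))
```

Adding the expansion moduli to `MODULI` gives the variant in which the first range expansion is skipped.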

Transform From RNS to Binary

Finalizing the computations, the result must be converted from RNS to positional binary for further display. For this purpose we need orthogonal bases of the form
• $B_{\alpha_1} = (\alpha_1, 0, \ldots, 0, \ldots, 0)$
• $B_{\alpha_2} = (0, \alpha_2, \ldots, 0, \ldots, 0)$
• . . . . . . .
• $B_{\alpha_i} = (0, 0, \ldots, \alpha_i, \ldots, 0)$
• . . . . . . .
• $B_{\alpha_n} = (0, 0, \ldots, 0, \ldots, \alpha_n)$
• _______________________________
• $A \pmod{P} \equiv (\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots, \alpha_n)$
• that need to be represented in the binary positional system. Summing the selected constants modulo P gives the representation of the number in the positional system.

Having, for example, 6 moduli, we need constants
• [formula not preserved]
• and the conversion takes 6 additions. To minimize the conversion time we can introduce another set of orthogonal bases, not for a single modulus but for a pair of moduli. For our example these constants will be of the form
• $B_{\alpha_1 \alpha_2} = (\alpha_1, \alpha_2, 0, 0, 0, 0)$
• $B_{\alpha_3 \alpha_4} = (0, 0, \alpha_3, \alpha_4, 0, 0)$
• $B_{\alpha_5 \alpha_6} = (0, 0, 0, 0, \alpha_5, \alpha_6)$
• ________________________________
• $A \pmod{P} \equiv (\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6)$
• Here we need to perform 3 additions modulo P to obtain the necessary result. The moduli used to create these orthogonal bases may be combined in a different order. Here we need
• $(p_1 - 1)(p_2 - 1) + (p_3 - 1)(p_4 - 1) + (p_5 - 1)(p_6 - 1)$ constants.
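Both variants of the reverse conversion can be sketched as follows, using the standard Chinese remainder theorem construction of orthogonal bases. The six moduli are hypothetical, and for simplicity this sketch stores every combination of residues for each pair of moduli, including those with a zero residue, so its tables are somewhat larger than the count given above.

```python
from math import prod


def orthogonal_base(moduli, i: int) -> int:
    """Number with residue 1 by moduli[i] and 0 by every other modulus."""
    total = prod(moduli)
    part = total // moduli[i]
    return part * pow(part, -1, moduli[i]) % total


def from_rns_single(residues, moduli) -> int:
    """One constant per modulus: as many additions as there are moduli."""
    total = prod(moduli)
    return sum(a * orthogonal_base(moduli, i) % total
               for i, a in enumerate(residues)) % total


def build_pair_tables(moduli):
    """One table per pair of moduli (the number of moduli must be even)."""
    total = prod(moduli)
    tables = []
    for i in range(0, len(moduli), 2):
        b1, b2 = orthogonal_base(moduli, i), orthogonal_base(moduli, i + 1)
        tables.append({(a1, a2): (a1 * b1 + a2 * b2) % total
                       for a1 in range(moduli[i])
                       for a2 in range(moduli[i + 1])})
    return tables


def from_rns_pairs(residues, moduli, tables) -> int:
    """One lookup per pair of residues: half as many additions."""
    total = prod(moduli)
    return sum(t[(residues[2 * k], residues[2 * k + 1])]
               for k, t in enumerate(tables)) % total


if __name__ == "__main__":
    moduli = (3, 5, 7, 11, 13, 17)            # hypothetical moduli
    value = 123456
    residues = [value % p for p in moduli]
    print(from_rns_single(residues, moduli),
          from_rns_pairs(residues, moduli, build_pair_tables(moduli)))
```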

Algorithms Realization

Practical implementation of the algorithms can be realized in the RAM or ROM memory of the computer. A simplified representation of memory is shown in the figure.
[Figure: simplified representation of memory]
Residues of operands or different kinds of constants are applied to the inputs (address lines). The result of an arithmetic or other operation is obtained on the outputs (data lines).

To realize all the algorithms analyzed, different tables of data must be prepared, namely
• Addition table
• Subtraction table
• Multiplication table
• Orthogonal bases table
• Minimal numbers table
• Table of numbers minimal by absolute value
• . . . and other tables, depending on the algorithm.
• Access to each of these tables may be organized by setting an individual address code on the most significant bits of the address lines. These bits are responsible for selecting the operation code (OpCode) in RNS. The lower address lines carry information about the residues of one or two operands.
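The addressing scheme can be sketched for a single modulus with a Python list standing in for the memory. The opcode values, the two-bit opcode field, and the address layout are assumptions made for this illustration only.

```python
ADD, SUB, MUL = 0, 1, 2          # hypothetical operation codes
OPCODE_BITS = 2


def build_rom(p: int):
    """Fill a table for modulus p; address = opcode | residue a | residue b."""
    width = (p - 1).bit_length()             # address bits per residue
    rom = [0] * (1 << (OPCODE_BITS + 2 * width))
    operations = {
        ADD: lambda a, b: a + b,
        SUB: lambda a, b: a - b,
        MUL: lambda a, b: a * b,
    }
    for opcode, operation in operations.items():
        for a in range(p):
            for b in range(p):
                address = (opcode << (2 * width)) | (a << width) | b
                rom[address] = operation(a, b) % p
    return rom, width


def lookup(rom, width: int, opcode: int, a: int, b: int) -> int:
    """Apply opcode and residues to the address lines and read the data lines."""
    return rom[(opcode << (2 * width)) | (a << width) | b]


if __name__ == "__main__":
    rom, width = build_rom(7)
    print(len(rom), lookup(rom, width, MUL, 5, 6), lookup(rom, width, SUB, 2, 5))
```

One such table per modulus works independently of the others, so the residues of a result can be read in parallel.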

If a special external ROM memory is used for storing the tables, the data bus can be given the width necessary to represent the residues by all moduli in one word. If the standard RAM memory of the computer is used, it may be necessary to store the output data in two or more memory locations. In any case, there is no standard methodology for designing systems in RNS. When realizing a project, the designer needs to analyze different ways to find an optimal solution.
