Sub-Nyquist Sampling DSP & Support Change Detector: Final Presentation

Technion - Israel Institute of Technology
Department of Electrical Engineering, High Speed Digital Systems Laboratory
Sub-Nyquist Sampling DSP & Support Change Detector: Final Presentation
Performed by: Omer Kiselov, Daniel Primor
Supervised by: Moshe Mishali, Inna Rivkin

The Whole System Overview
[Block diagram labels: CTF (Support recovery), Analog Back-end, (Realtime), DSP (Baseband), Expand 1:q, Memory, Detector]

The Main Objective
[Block diagram labels: Support Generation, DSP (Baseband), FIFO for Delay]

The Block Interface
Signals of the DSP & Support Change Detector block:
• A matrix vector: 432 bits
• Reconstructed samples: 432 bits
• Support analysis vector: 101 bits
• Valid samples: 1 bit
• Support changed: 1 bit
• First beta (for QR decomposition): 36 bits
• A matrix address: 9 bits
• Samples bundle: 432 bits
• Valid supports: 1 bit

The Complex Numbers
• To avoid all complex multiplications, we changed the structure of the matrix.
• The resulting matrix is 4 times bigger, but every complex vector multiplication becomes an ordinary multiplication of one real vector by another and still gives the correct result.
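
The slide does not show the new matrix layout. A minimal sketch of one common way to do this, assuming the standard block embedding [[Re, -Im], [Im, Re]] with vectors stacked as [Re; Im] (the function names are hypothetical, not taken from the project):

```python
def real_embed_matrix(m):
    """Map a complex matrix (list of rows) to the real block form [[Re, -Im], [Im, Re]]."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in m]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in m]
    return top + bottom


def real_embed_vector(x):
    """Stack real parts on top of imaginary parts."""
    return [z.real for z in x] + [z.imag for z in x]


def matvec(m, x):
    """Ordinary matrix-vector product; works for real or complex entries."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


# Illustrative data only.
M = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]
x = [2 - 1j, 1 + 4j]

expected = matvec(M, x)                                    # complex arithmetic
got = matvec(real_embed_matrix(M), real_embed_vector(x))   # real arithmetic only
```

Under this assumption a 2x2 complex matrix becomes a 4x4 real one (4 times as many entries), and `got` holds the real parts of `expected` followed by its imaginary parts.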

Inner Block Data-paths
• The DSP & Support Change Detector block contains 3 parallel data paths: Pseudo Inverse, Real Time Vector Multiplier, and Support Change Detector.
• The DSP receives the matrix A and the samples bundle y, then solves a system of equations to reconstruct the signal from the samples.

Pseudo Inverse Data Path
• The pseudo inverse is the largest block on the FPGA. In Matlab, the pseudo inverse of a matrix A is simply pinv(A);
• The options for inverting a non-square matrix were:
  • The known way
  • Attempting a matrix decomposition to get better performance

Pseudo Inverse Inner Data Path
• We chose the algorithm that allows better performance.
• The pseudo inverse is built from:
  • A matrix decomposition (QR decomposition)
  • Sub-matrix inversion (inverting an upper triangular matrix)
  • Multiplying the sub-matrices (matrix multiplier)
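
The three steps can be summarized in one short derivation. This is a sketch that assumes A has full column rank; the symbols Q_1, Q_2 and R_1 are notation introduced here, not taken from the slides.

```latex
A = QR
  = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}
    \begin{bmatrix} R_1 \\ 0 \end{bmatrix}
  = Q_1 R_1,
\qquad
A^{\dagger} = (A^{H}A)^{-1}A^{H}
            = (R_1^{H}R_1)^{-1}R_1^{H}Q_1^{H}
            = R_1^{-1}Q_1^{H}.
```

Here R_1 is the square upper triangular part of R, so only a triangular inversion and one matrix product are needed after the decomposition. The reconstructed signal is then the product of the pseudo inverse with the samples bundle y.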

In Hardware
[Block diagram labels: QR Decomposition, Matrix Multiplier, Matrix Inversion]

The Matrix Decomposition Algorithm
• The algorithms we checked for matrix decomposition were:
  • The Cholesky decomposition: has high hardware requirements. Multiplying three matrices, inverting two, and transposing is more complicated than the chosen algorithm.
  • Singular Value Decomposition: this algorithm was dropped after we saw that finding the singular values of a non-square matrix in VHDL is both time consuming and complicated.
  • The QR decomposition: decomposes the matrix into two matrices, one upper triangular and one unitary. This algorithm was chosen because a unitary matrix doesn't need inverting (its inverse is its conjugate transpose) and because it makes the calculation much easier to understand. In Matlab it is again a single command: qr(A);

The Matrix Decomposition Algorithm
• To adapt the QR decomposition to hardware we found two algorithms:
  • Using the Gram-Schmidt process: perform the Gram-Schmidt process on the matrix and then rearrange the equation system in a suitable way. [The equations for the result of the GS process and for the final form were not recovered from the slide.] This algorithm was dropped, since it returned us to the same problem we came to solve: inverting a non-square matrix.
  • Using Householder reflections: this is a transform similar to Gram-Schmidt. We take each column vector of the matrix, build a vector v from it, and apply the reflection Qk = I - (2/(v'*v))*(v*v'). This method has greater numerical stability than the Gram-Schmidt method.
• The operations per step in the iteration for an n×m matrix are: [not recovered from the slide]
[Diagram labels: Phase 1, Phase 2, Aux1, Aux2]

The QR Decomposition Algorithm
Top level:
    B = Phase1(Acore);
    Qtranse = phase2(B);
    Rm = Qtranse*Acore;
    Qm = Qtranse';
Phase 1:
    [n,m] = size(A);
    for k = 1:min(n-1,m)
        v(k:n,1) = aux1(A(k:n,k));                 % Householder vector for column k
        A(k:n,k:m) = aux2(A(k:n,k:m),v(k:n,1));    % reflect the trailing block
        A(k+1:n,k) = v(k+1:n,1);                   % store v below the diagonal
    end
Phase 2:
    for k = 1:n-1
        v = ones(n+1-k,1);
        if (k<o)                                   % o is not defined on the slide
            v(2:n+1-k) = A(k+1:n,k);
        end
        Qk = eye(n);
        Qk(k:n,k:n) = eye(n+1-k) - (2/(v'*v))*(v*v');
        Q = Qk*Q;
    end
Aux1:
    if (a(1) >= 0)
        beta = a(1) + norm(a);
    else
        beta = a(1) - norm(a);
    end
    v(2:n) = 1/beta * v(2:n);
    v(1) = 1;
Aux2:
    beta = -2/(v'*v);
    w = v'*A;
    A = A + beta*v*w;
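
For readers without Matlab, the Phase 1, Phase 2, Aux1 and Aux2 steps can be sketched in plain Python. This is an illustrative real-valued version, not the project's code: it keeps the same sign choice for beta and the same normalization v(1) = 1, but accumulates Q^T directly instead of storing the vectors below the diagonal. The function name `householder_qr` is hypothetical.

```python
import math


def householder_qr(a):
    """Return (q, r): q is n x n orthogonal, r is n x m upper triangular, q*r == a."""
    n, m = len(a), len(a[0])
    r = [row[:] for row in a]
    # qt accumulates Q^T, starting from the identity.
    qt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(min(n - 1, m)):
        col = [r[i][k] for i in range(k, n)]
        norm = math.sqrt(sum(c * c for c in col))
        if norm == 0.0:
            continue  # column is already zero, nothing to reflect
        # Aux1: beta = a(1) +/- norm(a), then scale so that v(1) = 1.
        beta = col[0] + norm if col[0] >= 0 else col[0] - norm
        v = [1.0] + [c / beta for c in col[1:]]
        scale = -2.0 / sum(x * x for x in v)
        # Aux2 applied to rows k..n-1 of both R and Q^T: X <- X + scale * v * (v' * X).
        for mat in (r, qt):
            cols = len(mat[0])
            w = [sum(v[i] * mat[k + i][j] for i in range(len(v))) for j in range(cols)]
            for i in range(len(v)):
                for j in range(cols):
                    mat[k + i][j] += scale * v[i] * w[j]
    q = [[qt[j][i] for j in range(n)] for i in range(n)]  # transpose Q^T to get Q
    return q, r


# Illustrative 3 x 2 input (more rows than columns, as in the slides).
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Q, R = householder_qr(A)
```

After the call, `R` is zero below its diagonal up to rounding, and multiplying `Q` by `R` reproduces `A`.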

QR decomposition on FPGA
[Block diagram labels: Phase 1, Phase 2, Aux 2, Beta calculation unit, 24 Multipliers]

The QR Decomposition Hardware Requirements
• The QR decomposition unit: QRDEC
• Resources:
  • 6000 ALUTs
  • 1000 registers
  • 10000 block memory bits
  • 76 DSP blocks (18 bit multipliers)
• During the implementation we moved the Aux1 unit into the phases and created units for the beta calculation and the vector multiplications.
• Resources per sub-unit:
  • Phase 2 + Aux1/2: 1300 ALUTs, 10 registers
  • Phase 1 + Aux1/2: 2500 ALUTs, 450 registers, 10000 block memory bits
  • Beta_calc: 1900 ALUTs, 550 registers, 26 DSP blocks
  • 24_mults: 850 ALUTs, 48 DSP blocks
  • Aux2: 1500 ALUTs, 10 registers
[Diagram label: 24 mults block + beta calc]

Matrix Inversion Algorithm
• Matrix inversion is a serious bottleneck and is extremely slow.
• The alternative ways to invert the matrix were:
  • Gaussian elimination (the ordinary way): row-reduce the matrix until we reach the identity matrix.
  • Analytic solution (adjugate method): the adjugate of R, built from its minors, divided by the determinant of R.
  • LU decomposition: decomposing this matrix is a waste of time, since it is already triangular and no further decomposition is required.
  • Alternative analytic methods (the Neumann series, the block-wise inversion method, etc.): the amount of calculation needed is greater, and it is still like inverting the triangular matrix the ordinary way.
• The chosen algorithm is Gaussian elimination.

Matrix Inversion Algorithm
• The unit works on reusable hardware.
• There is an inner unit which inverts one vector at a time.
• The external unit feeds in the vectors in a loop whose length is the support size.
• As said before, the matrix we have to invert has more rows than columns. To invert it we can therefore remove the rows of zeros below the support rows and then invert, which makes the matrix smaller and saves time.
[Diagram labels: Matrix Inversion Unit, Vector Inversion Unit]

Matrix Inversion Unit
The vector inverter runs on a faster work clock, 2 or 3 times the main clock (more if possible), since the multipliers only work at a rate of 50 MHz. There is also a division unit, which works at 20 MHz at most.
Resources: 7000 ALUTs, 880 registers, 26 DSP blocks (18 bit multipliers)
Matlab reference for the inversion:
    % s is presumably size(R); Rinv is assumed to start as all zeros
    for m = 1:s(2)
        for n = 1:(m-1)
            for k = 1:(m-1)
                Rinv(n,m) = Rinv(n,m) + Rinv(n,k)*R(k,m);
            end
        end
        for w = 1:(m-1)
            Rinv(w,m) = -Rinv(w,m)/R(m,m);
        end
        if (R(m,m) ~= 0)
            Rinv(m,m) = 1/R(m,m);
        end
    end
The matrix inverse unit holds: 14000 memory bits, 12500 registers, 10000 ALUTs, 30 DSP blocks
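
The column-by-column inversion above can be sketched in Python as follows. This is an illustrative version, not the hardware's exact behavior: it uses only the top square block of R, which drops the zero rows, and it raises an error on a zero diagonal entry, which is a choice made for this sketch (the Matlab loop only guards the diagonal assignment). The function name is hypothetical.

```python
def invert_upper_triangular(r):
    """Invert the leading s x s block of an upper triangular matrix, one column at a time."""
    s = len(r[0])  # support size = number of columns; extra zero rows are ignored
    rinv = [[0.0] * s for _ in range(s)]
    for m in range(s):
        if r[m][m] == 0:
            raise ZeroDivisionError("R is singular: zero on the diagonal")
        rinv[m][m] = 1.0 / r[m][m]
        for n in range(m):
            # Uses only columns of rinv that are already finished (k < m).
            acc = sum(rinv[n][k] * r[k][m] for k in range(n, m))
            rinv[n][m] = -acc / r[m][m]
    return rinv


# Illustrative 4 x 3 input whose last row is all zeros.
R = [[2.0, 1.0, 3.0],
     [0.0, 4.0, 5.0],
     [0.0, 0.0, 8.0],
     [0.0, 0.0, 0.0]]
Rinv = invert_upper_triangular(R)
```

Multiplying the top 3 x 3 block of `R` by `Rinv` gives the identity up to rounding.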

[Block diagram labels: Matrix Decomposition Unit, FIFO for Original R Matrix, Vector Inverter]

Matrix Multiplier Block
[Block diagram labels: Matrix Inversion Unit, Matrix Multiplier, Matrix Multiplier's Interface, Vector Multiplier]

Matrix Multiplier Block
• Matrix Multiplier's Interface: a block that decides which matrix goes where, since the multiplier is used by all blocks.
• Resources for the whole block:
  • ALUTs: 60000
  • Memory bits: 30000
  • Registers: 11000
  • DSP blocks: 380

Interface To The Matrix Multiplier in Hardware
[Block diagram labels: RAM, Matrix Multiplier]

[Block diagram labels: Matrix Multiplier, Vector Multiplier]

Vector Multiplier
[Block diagram: 12 DSP blocks]

Pseudo Inverse Resources
Pseudo inverse resources (utilization as shown on the slide):
• ALUTs: 80000 (-)
• Memory bits: 60000 (<1%)
• Registers: 30000 (34%)
• DSP blocks: 450 (50%)

Real Time Multiplier
• The real time vector multiplier is identical to the matrix multiplier: it multiplies one vector (the samples bundle) by the pseudo inverse of A.
• Resources (utilization as shown on the slide):
  • ALUTs: 50000 (-)
  • Memory bits: 10000 (<1%)
  • Registers: 1 (<1%)
  • DSP blocks: 380 (42%)

Support Change Detector
[Block diagram labels: Pseudo inverse, Real Time Vector Multiplier, Support Change Detector]

Support Change Detector
• After simulations we reached a threshold value that gives at most 20% false alarms and no missed detections of support changes. We examined this on the sample we were given and found that 0.1 is a nominal amount of energy for a signal which is not noise.
• The support change detector is a vector multiplier: it is given one row of the pseudo-inverted A matrix and multiplies it by the signal to see whether any energy there is not noise.
• Resources:
  • 100 ALUTs
  • 400 registers
  • 26 DSP blocks (18 bit multipliers)
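
A minimal sketch of the detection rule described above. The slides give only the 0.1 threshold; defining the energy as the squared projection, the function name, and the example data are assumptions made for illustration.

```python
THRESHOLD = 0.1  # energy level the slide reports as nominal for a non-noise signal


def support_changed(pinv_row, samples, threshold=THRESHOLD):
    """Return True when the projected energy exceeds the threshold."""
    # Vector multiply: one row of the pseudo inverse times the samples bundle.
    projection = sum(a * y for a, y in zip(pinv_row, samples))
    energy = projection * projection  # assumption: energy = squared projection
    return energy > threshold


# Illustrative data only.
row = [0.5, -0.5]
quiet = support_changed(row, [1.0, 1.0])    # projection 0.0, energy 0.0
active = support_changed(row, [1.0, -1.0])  # projection 1.0, energy 1.0
```

In this sketch `quiet` is False and `active` is True. A real detector would probably accumulate the energy over several samples before deciding, which this version leaves out.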

Full System TOP!
Total system requirements (all hardware requirements given by Quartus during synthesis):
Resource       Required   EP3SE260   EP3SL110
ALUTs          75000      60%        87%
Memory bits    70000      0.05%      0.1%
Registers      30000      15%        34%
DSP blocks     805        101%       91%
Pins           1000       -          -

Faults in the design
• Underflow and overflow (we changed the number representation to one different from the rest of the system: from 18 bit with 14 mantissa bits to 18 bit with 9 mantissa bits).
• A non-invertible matrix: R must be invertible.
• Zero columns in the control vector for the SCD.
• Rapid support changes one after the other, which compromise the delays.
• The energy remainder for the SCD gives no way to tell noise from signal.
• More than 11 support vectors: impossible to handle!
• If the first supports are wrong.
• Excess noise: impossible to reconstruct the signal.
• Changes in the complex enhancement may change the matrix's properties.
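
To see why the representation change helps with overflow, the sketch below compares the two formats. It assumes signed two's-complement fixed point and reads "mantissa" as the number of fractional bits; neither is confirmed by the slides. The saturating `quantize` helper is hypothetical, and the hardware may wrap instead.

```python
def fixed_point_range(total_bits, frac_bits):
    """Return (min, max, resolution) of a signed two's-complement fixed-point format."""
    step = 2.0 ** -frac_bits
    lo = -(2 ** (total_bits - 1)) * step
    hi = (2 ** (total_bits - 1) - 1) * step
    return lo, hi, step


def quantize(value, total_bits, frac_bits):
    """Round to the nearest representable value, saturating on overflow."""
    lo, hi, step = fixed_point_range(total_bits, frac_bits)
    return min(hi, max(lo, round(value / step) * step))


old_format = fixed_point_range(18, 14)  # range about +/-8, resolution 2**-14
new_format = fixed_point_range(18, 9)   # range about +/-256, resolution 2**-9
```

Under these assumptions, moving from 14 to 9 fractional bits widens the range from about ±8 to about ±256 and coarsens the resolution from 2^-14 to 2^-9. That is the usual trade between overflow and underflow.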

Simulation
The pseudo inverse module completed the simulation. The support vectors and A_S were taken from the Matlab simulation, together with the samples Yn, which were multiplied by the matrix in Matlab. The pseudo inverse takes about 200,000 clock cycles, depending on the number of supports.

Performance
• The time it takes to perform the pseudo inverse depends on the number of support vectors.
• The maximal possible delay corresponds to 2.5 mega samples, so a FIFO at the input is needed.
• The working frequencies are: 100 MHz for RAM management, 50 MHz for the secondary work clock, and 20 MHz for the main clock.

Future Work
• There are still some glitches in the system, mostly related to the change of representation and the new RAM blocks which were inserted.
• Error management
• Handling singular cases
• Hardware debugging
• Timing simulation
• Signal tapping
• Full system integration

Part B Gantt Chart

