
Final Presentation, Annual project (Part A), Winter semester תשע"ב (2011/12)






Presentation Transcript


  1. INS/GPS navigation system using RPF, implemented with Bluespec HDL on a Xilinx Virtex-5 FPGA. Final Presentation, Annual project (Part A), Winter semester תשע"ב (2011/12). Students: Dan Hofshi, Shai Shachrur. Supervisor: Mony Orbach

  2. Intro • Abstract • Algorithm reminder • Previous projects' background • Solution approaches • Detailed information on the final implementation • Summary

  3. Abstract • This project is part of a continuing effort to implement an RPF-based navigation system in the High Speed Digital Systems Laboratory at the Technion. The project and the algorithm were initially developed by Professor Yaakov Oshman and Mark Koifman from the Faculty of Aerospace Engineering. • Prior to our project, another group of students tested and simulated the algorithm in a C++ environment and verified its functionality [1]. • Later, a group of several students designed the algorithm blocks to run on several Altera FPGAs simultaneously, as the hardware resource requirements were too large for a single FPGA to meet. 1. ^ ("GPS computer program", by Neta Galil and Moti Perets, Winter 2010).

  4. Reminder – The Algorithm: principle of operation, measurement update. [Figure: a visual demonstration of the particle filter navigation, excluding the data correction process.]

  5. Previous projects' information & conclusions • Retrieved information on the timing and area complexity of each of the algorithm's blocks and parameters (data bus widths, number of particles, mathematical implementation of certain blocks). • A particle filter implementation on a single FPGA requires fundamental thinking about how to parallelize the algorithm or reduce its mathematical complexity. • A particle filter project is too big to be designed by a single person or group without a proper structural design in advance. • The design's area (resource) complexity requires the use of an external memory.

  6. First approach for solution • Trying to reduce the mathematical complexity – failed.

  7. First approach for solution • It indeed looks very convincing, as 53·N multipliers & 11·N trigonometric calculations can be saved, but only by using Euler angles throughout the entire algorithm run. A closer look at the algorithm's calculations reveals many singularity cases that cannot be handled with Euler angles without causing the algorithm to diverge. • We therefore chose to continue the project with the current verified algorithm, using quaternion calculations.

  8. Second approach for solution – from sequential to parallel implementation. The routine operation flow: • Initialization: creates a new set of N particles. • Propagation: uses the INS data to propagate the particles in time. • Measurement update: uses the GPS data to assign a weight to each particle. • Normalization: normalizes all particle weights to a total sum of 1. • Effective-number-of-particles check: if good, state vector revaluation and covariance matrix calculation deliver the result to the user; if bad, the data correction process runs: re-sampling, regularization, re-weight.
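The routine-operation loop above maps naturally to software. Below is a minimal Python sketch of one cycle (the function names are ours, and the propagation and measurement models are simple placeholders, not the project's actual navigation equations):

```python
import numpy as np

def effective_n(w):
    # N_eff = 1 / sum(w_i^2); a small N_eff signals particle degeneracy
    return 1.0 / np.sum(w ** 2)

def rpf_step(particles, w, ins_delta, gps_meas, gps_sigma, n_threshold, rng):
    """One routine-operation cycle: propagate -> measure -> normalize -> test."""
    # Propagation: move each particle by the INS increment plus process noise
    particles = particles + ins_delta + rng.normal(0.0, 0.1, particles.shape)
    # Measurement update: weight each particle by a GPS likelihood (placeholder)
    d2 = np.sum((particles - gps_meas) ** 2, axis=1)
    w = w * np.exp(-0.5 * d2 / gps_sigma ** 2)
    # Normalization: all weights sum to 1
    w = w / np.sum(w)
    if effective_n(w) < n_threshold:
        # Data correction: re-sampling, regularization, re-weight
        n, dim = particles.shape
        idx = rng.choice(n, size=n, p=w)                      # re-sampling
        cov = np.cov(particles.T) + 1e-9 * np.eye(dim)
        jitter = rng.multivariate_normal(np.zeros(dim), cov, n)
        particles = particles[idx] + 0.1 * jitter             # regularization
        w = np.full(n, 1.0 / n)                               # re-weight
    return particles, w
```

The weight invariant (sum = 1 after every cycle) is what the hardware's stage-5 normalization preserves.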

  9. Second approach for solution • With a proper parallelization of the algorithm, the number of sequential blocks can be reduced from 9 to 5, making implementation on the desired single FPGA genuinely feasible.

  10. Tools and hardware • Starting from the premise that the Xilinx Virtex-5 FPGA is our board for this task, we defined the rest of the working tools: • Bluespec HDL • Bluespec GUI (compiler, simulator) • DDR2 SDRAM external memory • XUPV5-110T development environment

  11. Project goals • Learning Bluespec and pointing out the language's advantages and drawbacks. • Designing, building & simulating the top-level design of the complete algorithm infrastructure, allowing future design of each algorithm block by individual groups. • Describing in detail the future tasks required to complete the project.

  12. Why Bluespec • The Bluespec language syntax fits today's large-scale digital system design methodologies, with special attention to parallel design. • It is an interesting new design methodology.

  13. Introduction to Bluespec • Bluespec SystemVerilog, or Bluespec for short, is a relatively new high-level HDL. • Bluespec is designed to provide a way to express high-level hardware constructs in an easy and highly parameterized manner. • The language syntax lets you concentrate on the high-level details of the design and brings the way you write closer to the way you think. • "Methods" define an abstract, user-defined interface which is translated into Verilog outputs and inputs. • "Rules" define groups of abstract operations which are translated into combinational logic.

  14. Introduction to Bluespec • Atomicity: a Bluespec rule is considered an atomic operation, meaning that once a rule fires, its operation cannot be interrupted until the rule has finished its logic.

  15. Introduction to Bluespec [Diagram: a module's rules and methods wrapping its internal registers, FIFOs, memory components, and other submodules.]

  16. The parallelized algorithm • Stage 5 – normalization, using the same module as stage 2.

  17. Stage 1 • Initialization. • Sequentially randomizing N particles according to the GPS and INS data. • Writes only to the main memory. [Diagram: initialization writing into the particle memory and the normalization memory.]

  18. Stage 2 • Propagation, measurement update, and state vector revaluation: these three modules perform a sequential calculation that needs only a single particle at a time, so all three can work in parallel. • Each particle is first propagated and then cascaded to the next two modules in parallel. • The measurement-update rules run only when GPS data is valid. • This stage already reads all N particles from the memory; thus, to save memory calls, the measurement-update module prepares the total weight for the next stage. [Diagram: memory feeding propagation, which cascades to measurement update and state vector revaluation in parallel.]
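The single-pass trick described above, accumulating the total weight while the particles are already streaming through, can be modeled in a few lines of Python (the function name, the propagation model, and the Gaussian GPS likelihood are our own illustrative choices, not the project's equations):

```python
import numpy as np

def stage2_stream(read_particle, n, ins_delta, gps_meas, gps_sigma, gps_valid):
    """Stage 2 model: propagation, measurement update, and total-weight
    accumulation in ONE pass over the N particles, so the next stage never
    re-reads memory just to compute the normalization denominator."""
    out_particles, out_weights = [], []
    total_weight = 0.0
    for i in range(n):
        p = read_particle(i)                 # single memory read per particle
        p = p + ins_delta                    # propagation (placeholder model)
        if gps_valid:                        # measurement update only on valid GPS
            d2 = float(np.sum((p - gps_meas) ** 2))
            w = float(np.exp(-0.5 * d2 / gps_sigma ** 2))
        else:
            w = 1.0                          # no GPS: weights left unchanged
        total_weight += w                    # prepared here for stage-3 normalization
        out_particles.append(p)
        out_weights.append(w)
    return np.array(out_particles), np.array(out_weights), total_weight
```

In the hardware, this saving matters because every extra pass over N particles costs a full round of external-memory reads.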

  19. Stage 3 • Normalization, re-sampling, covariance matrix, and covariance matrix square root: • The covariance matrix calculation is too big to meet the timing constraints when run in sequence with normalization. • In any case, the normalization module cascades the data to prepare the covariance matrix square root in parallel with the re-sampling module. • At the end of normalization, if the data correction process is not needed, the process stops and the data is flushed. [Diagram: normalization feeding the re-sampling and covariance calculation modules, backed by the matrix memory, normalization memory, and memories 1 and 2.]

  20. Stage 4 • Regularization and re-weight (the regularization equation is not reproduced in this transcript; its perturbation term is a randomized vector). • Regularization uses the pre-prepared covariance matrix square root and the re-sampling data, and cascades the results to the re-weight module. • As in stage 2, to save memory call cycles, re-weight prepares the total weight for stage 5, normalization. [Diagram: memory 2 and the matrix memory feeding regularization, with results written to the normalization memory.]
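In software terms, stage 4 amounts to jittering each resampled particle with a random draw shaped by the covariance square root, then handing uniform weights plus their total to stage 5. A Python sketch under our own naming (the scale factor h and the standard-normal noise model are illustrative assumptions, not the project's stated equation):

```python
import numpy as np

def stage4_regularize(resampled, cov_sqrt, h, rng):
    """Stage 4 model: regularization then re-weight.
    cov_sqrt is the pre-prepared covariance matrix square root from stage 3."""
    n, dim = resampled.shape
    eps = rng.normal(0.0, 1.0, (n, dim))          # the randomized vector
    jittered = resampled + h * eps @ cov_sqrt.T   # regularization jitter
    weights = np.full(n, 1.0)                     # re-weight (uniform)
    total_weight = float(weights.sum())           # prepared for stage-5 normalization
    return jittered, weights, total_weight
```

Shaping the noise with the covariance square root is what lets the regularized particles spread like the estimated state distribution rather than isotropically.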

  21. Stage 5 • Normalization: the same module as in stage 2 is operating.

  22. Quaternion to Euler and back • These operations form a separate module that can be cascaded into the data path wherever needed.

  23. A word about timing • Roughly choosing a 150 MHz clock. • Total time = 6.66 ns × 36 × 30,000 = 7.2 ms
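The figure checks out arithmetically; a quick verification (we are assuming the factor 36 is the per-particle cycle count, which the slide does not spell out):

```python
# 150 MHz clock -> ~6.66 ns period; 36 cycles per particle (assumed),
# N = 30,000 particles per pass.
clock_hz = 150e6
period_ns = 1e9 / clock_hz            # 6.666... ns per cycle
total_ns = period_ns * 36 * 30_000    # one full pass over all particles
total_ms = total_ns / 1e6
print(round(total_ms, 2))             # 7.2
```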

  24. Particle memories • Bluespec enables the user to encapsulate Verilog code with Bluespec methods. • The particle memory controller was designed with 3 different spaces: • Main memory – for normal quaternion particles, used in the routine operation of the algorithm. • Second and third memories – designed to keep particles in their Euler-angle form for the data correction process.

  25. Particle memories • Particle memory access is sequential over the N particles and address-independent; a start signal is asserted at the beginning of each stage (the inner design of the controller controls the addresses). • Read commands are issued in advance by the memory controller to avoid data acquisition delay; the returned data is stored in a local FIFO. • The main memory controller is available at the top-level design. • Note: the DDR2's optional burst write/read mode consists of 4×128-bit data, which should fit the above tasks.
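The read-ahead-into-a-local-FIFO behavior can be modeled abstractly; a Python sketch of the idea (the generator name and FIFO depth are ours, and the model collapses the DDR2 read latency into an immediate return, so it shows ordering and buffering only, not real timing):

```python
from collections import deque

def sequential_reader(memory, fifo_depth=4):
    """Model of the particle-memory read path: the controller issues read
    commands ahead of consumption and parks returned data in a local FIFO,
    so the consumer sees an uninterrupted stream of particles in order."""
    fifo = deque()
    issued = 0
    n = len(memory)
    while issued < n or fifo:
        # Keep issuing reads in advance until the local FIFO is full
        while issued < n and len(fifo) < fifo_depth:
            fifo.append(memory[issued])   # models a completed read return
            issued += 1
        yield fifo.popleft()              # consumer pops ready data
```

Because addressing lives entirely inside the controller, the consuming module only ever sees "next particle", matching the address-independent interface described above.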

  26. Covariance matrix memory • The covariance matrix calculation is a set of 17² multiplications per particle, accumulating into the complete matrix. • To meet the timing constraints, a single-row calculation of the matrix, with a 17×56-bit data bus, needs to be opened to the memory. • Accordingly, an "add" method (instead of a write) is added to the covariance matrix memory, adding an entire row to the matrix sum. • Read commands are done element by element. • The covariance matrix memories are available only within the covariance-matrix top module. • The covariance matrix memory uses the Virtex-5's internal block RAMs.
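The row-wise "add" interface can be illustrated with a small software model (class and method names are ours; this models the memory's behavior, not its RTL, and uses a plain outer product x·xᵀ as the per-particle contribution for illustration):

```python
import numpy as np

DIM = 17  # state-vector length, per the slide

class CovAccumulator:
    """Block-RAM-style covariance memory with an 'add' method: each call
    adds one full row into the running matrix sum, while reads stay
    element-by-element as described in the slide."""
    def __init__(self):
        self.acc = np.zeros((DIM, DIM))

    def add_row(self, row_idx, row_values):
        # add an entire 17-element row to the matrix sum in one call
        self.acc[row_idx] += row_values

    def read(self, r, c):
        # element-by-element read
        return self.acc[r, c]

def accumulate_particle(mem, x):
    # 17^2 multiplications per particle: the outer product x x^T, row by row
    for r in range(DIM):
        mem.add_row(r, x[r] * x)
```

Replacing "write" with "add" pushes the accumulation into the memory interface itself, so the calculation module never has to read back partial sums.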

  27. Covariance matrix square root memory • The same as the covariance matrix memory, except that: • the write method is done element by element [row, col]; • the read method is done per matrix row (17×56 bits).

  28. Modules design • The user receives an empty module with predesigned interfaces (methods) and inner FIFOs & registers containing all the relevant data needed for a single particle's calculation in its algorithm phase. • In some cases, when a timing constraint forces a certain register size or a certain data flow, the data arrives at the correct size and in the correct flow sequence.

  29. Modules design [Diagram: a single module block, for future individual design – an input FIFO feeding an operational rule or inner modules with data registers, then an output FIFO.]

  30. Future tasks • Project B: • Understanding and creating a Bluespec wrapper encapsulating a DDR2 memory controller for the Xilinx Virtex-5 FPGA. • Writing the Bluespec memory controller for the sequential particle memory. • Simulating the controller. • Future generations: • Writing all of the algorithm's inner modules according to the final report's descriptions of the necessary constraints. The modules can be written in Verilog and encapsulated in Bluespec.

  31. Summary • Top-down design processes are easier to implement in Bluespec, as no explicit time scheduling is required. • Bluespec HDL's encapsulation capabilities allow fast parallelization, simulation, and test-benching of large systems, even ones already written in Verilog. • The current BSV structure operates properly; the inner design of each module can be done and simulated with the same code. • The number of particles in the algorithm is open to change without harming the algorithm's operation.

  32. The END
