STONY BROOK UNIVERSITY
Department of Electrical and Computer Engineering
Dissertation Defense
ARCHITECTURES FOR EFFICIENT IMPLEMENTATION OF PARTICLE FILTERS
Miodrag Bolic
Advisor: Prof. Petar M. Djuric
Outline
• PART I: Introduction
  • Motivation and goals
  • Challenges
• PART II: Theory of PFs
  • Dynamic model
  • Monte Carlo sampling
  • Importance sampling
  • Resampling
  • Bearings-only tracking example
  • Steps and complexity
• PART III: Implementation of PFs
  • VLSI signal processing architectures
  • Methodology
  • Non-parallel implementation
    • Algorithm characteristics
    • Modifications of the PF
    • New resampling algorithms
    • Architecture
    • Implementation results
  • Parallel implementation
    • Propagation of particles
    • Parallel resampling
    • Architectures for parallel resampling
    • Space exploration
  • Gaussian PFs
• Conclusions and future work
Introduction – Motivation and Goals
[Figure: a sensor supplies the observed signal over time t to a particle filter chip, which outputs the estimate over time t]
Goal
• Increase speed of particle filters
Introduction – Challenges
• Challenges
  • Reducing computational complexity
  • Randomness: difficult to exploit regular structures in VLSI
  • Exploiting temporal and spatial concurrency
• Contributions
  • First hardware implementation of particle filters (50 times improvement in speed in comparison with DSP)
  • New resampling algorithms suitable for hardware implementation
  • Fast particle filtering algorithms that do not use memories
  • First distributed algorithms and architectures for particle filters
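Theory of PFs – Dynamic model
• General dynamic model
  • State equation: $\mathbf{x}_k = f_x(\mathbf{x}_{k-1}, \mathbf{u}_k)$
  • Observation equation: $z_k = f_z(\mathbf{x}_k, v_k)$
  • $f_x$: state transition function; $\mathbf{u}_k$: process noise; $f_z$: measurement function; $v_k$: observation noise
• Example: bearings-only tracking
  • States: position and velocity, $\mathbf{x}_k = [x_k, V_{x,k}, y_k, V_{y,k}]^T$
  • Observations: angle $z_k$
  • State equation: $\mathbf{x}_k = F\mathbf{x}_{k-1} + G\mathbf{u}_k$
  • Observation equation: $z_k = \arctan(y_k / x_k) + v_k$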
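The bearings-only model above can be simulated in a few lines. This is a minimal sketch, not the authors' setup: the slides do not give F, G, the sampling period, or the noise levels, so the constant-velocity form of F and G, the value of `T`, the Gaussian noises, and all numbers below are assumptions. `atan2` is used in place of atan to keep the correct quadrant.

```python
import math
import random

T = 1.0  # sampling period (assumed value, not from the slides)

def state_step(x, sigma_u, rng):
    """x_k = F x_{k-1} + G u_k for the state [x, Vx, y, Vy].

    F and G are assumed to have the usual constant-velocity form.
    """
    px, vx, py, vy = x
    ux = rng.gauss(0.0, sigma_u)  # process noise, x direction
    uy = rng.gauss(0.0, sigma_u)  # process noise, y direction
    return [
        px + T * vx + 0.5 * T * T * ux,
        vx + T * ux,
        py + T * vy + 0.5 * T * T * uy,
        vy + T * uy,
    ]

def observe(x, sigma_v, rng):
    """z_k = atan(y_k / x_k) + v_k, computed with atan2."""
    return math.atan2(x[2], x[0]) + rng.gauss(0.0, sigma_v)

def simulate(x0, steps, sigma_u=0.001, sigma_v=0.005, seed=0):
    """Return the list of states and the list of noisy bearings."""
    rng = random.Random(seed)
    x, states, bearings = list(x0), [], []
    for _ in range(steps):
        x = state_step(x, sigma_u, rng)
        states.append(x)
        bearings.append(observe(x, sigma_v, rng))
    return states, bearings

states, bearings = simulate([1.0, 0.05, 1.0, -0.02], steps=20)
```

Only the bearings would be available to a filter; the states are kept here so that estimates can later be compared with the true trajectory.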
Theory of PFs – Bayesian approach
• Objective in the Bayesian approach: estimate the posterior distribution $p(\mathbf{x}_{0:k} | z_{1:k})$ of the states given the observations
• Use of knowing the posterior: all kinds of estimates can be calculated
• Gaussian processes and linear model: Kalman filter
• Non-Gaussian processes and/or non-linear model: particle filter
• From the state space model to the particle filter
  • Task: estimate the posterior
  • Problem: integrals are not tractable. Solution: Monte Carlo sampling
  • Problem: difficult to draw samples. Solution: importance sampling
Theory of PFs – Monte Carlo Sampling
• Densities can be approximated by discrete random measures, which consist of particles and weights
• The random measure χ approximates the density $p(x)$
• Integrals simplify to summations
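To illustrate how integrals turn into summations, the sketch below builds a random measure for a density that is easy to sample and uses it to approximate two expectations. The choice of a standard normal density, the equal weights 1/M, and the function names are assumptions made for the example.

```python
import random

def random_measure(m, seed=0):
    """Draw M particles from p(x) = N(0, 1) and give each the weight 1/M."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(m)]
    weights = [1.0 / m] * m
    return particles, weights

def expectation(g, particles, weights):
    """Approximate the integral of g(x) p(x) dx by a weighted sum over the particles."""
    return sum(w * g(x) for x, w in zip(particles, weights))

particles, weights = random_measure(10_000)
mean_est = expectation(lambda x: x, particles, weights)           # true value is 0
second_moment = expectation(lambda x: x * x, particles, weights)  # true value is 1
```

The same `expectation` function works unchanged when the weights are unequal, which is the case once importance sampling is used.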
Theory of PFs – Importance Sampling
• Objective: approximate a density $p(x)$ by a discrete random measure
• Steps:
  1. Generation of particles: proposal density
  2. Updating of the weights: Bayes' theorem
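The two steps can be sketched for a static density. Here the target and the proposal are both Gaussians chosen for illustration (an assumption, as are all names and numbers); in the filter the target is the posterior, so the weight ratio follows from Bayes' theorem rather than from two closed-form densities.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def importance_sample(m, seed=0):
    """Return particles from the proposal and their normalized importance weights."""
    rng = random.Random(seed)
    # Step 1: generate particles from the proposal density, here N(0, 2^2).
    particles = [rng.gauss(0.0, 2.0) for _ in range(m)]
    # Step 2: weight each particle by target / proposal; the target here is N(1, 0.5^2).
    raw = [normal_pdf(x, 1.0, 0.5) / normal_pdf(x, 0.0, 2.0) for x in particles]
    total = sum(raw)
    weights = [w / total for w in raw]
    return particles, weights

particles, weights = importance_sample(20_000)
# Weighted mean of the particles; it should be close to the target mean of 1.
estimate = sum(w * x for x, w in zip(particles, weights))
```

The proposal is deliberately wider than the target so that the weights stay bounded.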
Theory of PFs – Resampling
• Problems:
  • Weight degeneration
  • Wastage of computational resources
• Solution: resampling
  • Replicate particles in proportion to their weights
[Figure: particles over time, before and after resampling]
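Replication in proportion to the weights can be sketched with systematic resampling. This is one common scheme, chosen here for illustration; the slides do not say which resampling algorithm is meant at this point. The function returns the indices of the particles to keep, and it assumes the weights are already normalized.

```python
import random

def systematic_resample(weights, rng):
    """Return M indices; index i appears roughly M * weights[i] times."""
    m = len(weights)
    u = rng.random() / m  # single uniform draw in [0, 1/M)
    indices, cumulative, i = [], weights[0], 0
    for j in range(m):
        position = u + j / m  # evenly spaced positions in [0, 1)
        # Advance to the particle whose cumulative weight covers this position.
        while position > cumulative and i < m - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices

rng = random.Random(1)
weights = [0.5, 0.25, 0.125, 0.125, 0.0, 0.0, 0.0, 0.0]
indices = systematic_resample(weights, rng)
# Replication count of each original particle.
counts = [indices.count(i) for i in range(len(weights))]
```

Particles with zero weight are never selected, so no further computation is spent on them in the next iteration.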
Theory of PFs – Bearings-Only Tracking Example (Cont.)
[Figure: trajectory plot]
• Blue: true trajectory
• Red: estimates
Theory of PFs – Steps and Complexity
• Steps (flowchart)
  • Initialize particles
  • New observation
  • Particle generation (particles 1, 2, ..., M)
  • Weight computation (particles 1, 2, ..., M)
  • Normalize weights
  • Output estimates
  • Resampling
  • Propagation of the particles
  • More observations? Yes: continue with the new observation. No: exit
• Complexity (bearings-only tracking problem, number of particles M = 1000)
  • Particle generation: 4M random number generations
  • Weight computation: M exponential and arctangent functions
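The steps listed above can be sketched as one iteration of a sampling-importance-resampling (SIR) filter for bearings-only tracking. This is a hedged illustration, not the implementation from the dissertation: the constant-velocity model, the use of the prior as proposal, multinomial resampling through `random.Random.choices`, and every numeric value are assumptions, and the operation counts of this sketch are not meant to reproduce the counts on the slide.

```python
import math
import random

def sir_step(particles, z, rng, T=1.0, sigma_u=0.01, sigma_v=0.05):
    """One SIR iteration; returns (estimate, resampled particles)."""
    m = len(particles)
    # Particle generation: draw each new particle from the state equation.
    generated = []
    for px, vx, py, vy in particles:
        ux, uy = rng.gauss(0.0, sigma_u), rng.gauss(0.0, sigma_u)
        generated.append((px + T * vx + 0.5 * T * T * ux, vx + T * ux,
                          py + T * vy + 0.5 * T * T * uy, vy + T * uy))
    # Weight computation: arctangent and exponential evaluations for every particle.
    weights = []
    for px, _, py, _ in generated:
        err = z - math.atan2(py, px)
        err = math.atan2(math.sin(err), math.cos(err))  # wrap the angle error
        weights.append(math.exp(-0.5 * (err / sigma_v) ** 2))
    # Normalize weights (fall back to uniform if every likelihood underflowed).
    total = sum(weights)
    weights = [w / total for w in weights] if total > 0.0 else [1.0 / m] * m
    # Output estimate: weighted mean of the particles.
    estimate = tuple(sum(w * p[d] for w, p in zip(weights, generated)) for d in range(4))
    # Resampling: replicate particles in proportion to their weights.
    resampled = rng.choices(generated, weights=weights, k=m)
    return estimate, resampled

rng = random.Random(0)
truth = (1.0, 0.02, 1.0, 0.01)  # hypothetical true state [x, Vx, y, Vy]
# Initialize particles around the true initial state.
particles = [(truth[0] + rng.gauss(0.0, 0.1), truth[1] + rng.gauss(0.0, 0.01),
              truth[2] + rng.gauss(0.0, 0.1), truth[3] + rng.gauss(0.0, 0.01))
             for _ in range(1000)]
estimates = []
for k in range(10):
    # Noise-free true motion with T = 1, followed by a noisy bearing measurement.
    truth = (truth[0] + truth[1], truth[1], truth[2] + truth[3], truth[3])
    z = math.atan2(truth[2], truth[0]) + rng.gauss(0.0, 0.05)
    estimate, particles = sir_step(particles, z, rng)
    estimates.append(estimate)
```

Every stage except the weight sum and the resampling loops over the M particles independently, which is the property the hardware architectures exploit.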
Implementation of PFs – VLSI Signal Processing Architectures
• Types of architectures
  • Programmable digital signal processors
  • Application-domain specific processors
  • Application specific processors
• Application specific processors
  • Speed is the main goal
  • Functionality of the system does not change
• Approach
  • Temporal and spatial concurrency
  • One-to-one mapping between operations and hardware blocks
  • FPGA implementation
Implementation of PFs – Methodology
• Joint algorithmic and architectural design
• To increase performance, algorithms must be matched to architectures
[Figure: design levels (system, algorithmic, architecture, RT, gate), annotated with complexity and with the impact of a design decision]
Implementation of PFs – Algorithm Characteristics
[Flowchart: Start, New observation, Particle generation (particles 1 to M), Weight computation (particles 1 to M), Resampling, Propagation of particles, Exit]
Implementation of PFs – Modifications of the PF
Modifications to the algorithm and the architecture:
• Fine-grain pipelining
• Avoiding normalization
• Spatial concurrency
• Loop transformations
• Dedicated hardware
• Finite precision arithmetic
• Addressing schemes
Implementation of PFs – Implementation results
• Hardware platform is Xilinx Virtex-II Pro
• Clock period is 10 ns
• The PF is applied to the bearings-only tracking problem
• 1000 particles are used
• Sampling frequency
  • DSP: ~1 kHz
  • FPGA: ~50 kHz
• Resources
  • Logic blocks: 4%
  • Memories: 3%
[Figure: percentage of utilization of the PF blocks]
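The figures on this slide can be related to each other with a short calculation. All four inputs are taken from the slide; the derived quantities are plain arithmetic on those approximate values, not additional measurements.

```python
CLOCK_PERIOD_S = 10e-9   # 10 ns clock period (from the slide)
NUM_PARTICLES = 1000     # number of particles (from the slide)
FPGA_RATE_HZ = 50e3      # approximate FPGA sampling frequency (from the slide)
DSP_RATE_HZ = 1e3        # approximate DSP sampling frequency (from the slide)

# Time available to process one observation on the FPGA.
sampling_period_s = 1.0 / FPGA_RATE_HZ
# The same time expressed in clock cycles, in total and per particle.
cycles_per_sample = sampling_period_s / CLOCK_PERIOD_S
cycles_per_particle = cycles_per_sample / NUM_PARTICLES
# Ratio of the two sampling frequencies.
speedup = FPGA_RATE_HZ / DSP_RATE_HZ

print(f"sampling period: {sampling_period_s * 1e6:.0f} us")
print(f"clock cycles per observation: {cycles_per_sample:.0f}")
print(f"clock cycles per particle: {cycles_per_particle:.1f}")
print(f"speed-up over DSP: {speedup:.0f}x")
```

The result is about 2000 clock cycles per observation, or about 2 cycles per particle, and the ratio of the two sampling frequencies matches the 50 times improvement quoted among the contributions.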
Implementation of PFs – Parallelism
• Universal architecture with a central unit
• Processing elements (PE)
  • Particle generation
  • Weight computation
• Central Unit
  • Algorithm for particle propagation
  • Resampling
[Figure: flowchart (Start, New observation, Particle generation, Weight computation, Resampling, Propagation of particles, Exit) next to four processing elements connected to a central unit]
Implementation of PFs – Propagation of Particles
• Disadvantages of the particle propagation step
  • Random communication pattern
  • Decision about connections is not known before run time
  • Requires a dynamic type of network
  • Speed-up is significantly affected
[Figure: particles after resampling, exchanged over time among processing elements 1 to 4 and the central unit]
Implementation of PFs – Parallel Resampling
• Solution
  • The way in which Monte Carlo sampling is performed is modified
• Advantages
  • Propagation is only local
  • Propagation is controlled in advance by a designer
  • Performance is the same as in the sequential application
• Result
  • Speed-up is almost equal to the number of PEs (up to 8 PEs)
[Figure: numbers of particles N held by processing elements 1 to 4 before and after resampling]
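One way to keep resampling traffic inside the processing elements is to let each PE resample only its own particles. The sketch below is an assumption about how such a scheme could look, not the algorithm from the slides: each PE keeps a fixed number of particles, and each survivor carries an equal share of that PE's total weight so that the overall random measure is preserved. It omits any exchange of particles between neighboring PEs.

```python
import random

def local_resample(groups, rng):
    """Resample each PE's particles independently.

    groups: one list of (particle, weight) pairs per processing element.
    Returns the same structure with resampled particles and updated weights.
    """
    resampled = []
    for group in groups:
        particles = [p for p, _ in group]
        weights = [w for _, w in group]
        group_weight = sum(weights)
        n = len(group)
        if group_weight > 0.0:
            # Replicate particles in proportion to their weights within this PE.
            chosen = rng.choices(particles, weights=weights, k=n)
        else:
            chosen = particles  # nothing to favor; keep the group unchanged
        # Each survivor carries an equal share of the PE's total weight.
        resampled.append([(p, group_weight / n) for p in chosen])
    return resampled

rng = random.Random(0)
groups = [
    [("a", 0.6), ("b", 0.0)],  # particles held by a hypothetical PE 1
    [("c", 0.2), ("d", 0.2)],  # particles held by a hypothetical PE 2
]
result = local_resample(groups, rng)
```

Because no particle leaves its PE, the communication pattern is fixed at design time; the price is that weights stay unequal across PEs after resampling.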
Implementation of PFs – Architectures for Parallel Resampling
• Controlled particle propagation after resampling
• Architecture that allows adaptive connection among the processing elements
[Figure: processing elements PE1 to PE4 connected to a central unit]
Implementation of PFs – Space exploration
• Hardware platform is Xilinx Virtex-II Pro
• Clock period is 10 ns
• PFs are applied to the bearings-only tracking problem
[Figure: design-space plot with two marked limits: available memory and logic blocks]
Implementation of PFs – Gaussian PFs
• Functionality
  • Propagates only first two moments
  • Approximates densities by Gaussians
  • No need for resampling
• Advantages
  • Sampling period is minimal, $\sim M T_{clk}$
  • No need for memories for storing particles
  • Simple communication in parallel implementation
• Disadvantages
  • Higher computational complexity
  • Limited scope of applications
[Flowchart: Start, New observation, Drawing conditioning particles (1 to M), Particle generation (1 to M), Weight computation (1 to M), Computing the mean and the covariance matrix, decision with Yes and No branches, Exit]
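The flow of a Gaussian PF can be sketched for a scalar toy model. This is an illustration under stated assumptions, not the design from the slides: the linear model x_k = a x_{k-1} + u_k, z_k = x_k + v_k, the parameter values, and the single-pass accumulation of the moments are all choices made for the example. Each particle is drawn, propagated, weighted, and folded into running sums, so no particle needs to be stored and no resampling takes place.

```python
import math
import random

def gpf_step(mean, var, z, rng, m=1000, a=0.9, sigma_u=0.5, sigma_v=0.5):
    """One Gaussian PF iteration; returns the new (mean, variance)."""
    std = math.sqrt(var)
    total = weighted_sum = weighted_sq_sum = 0.0
    for _ in range(m):
        # Drawing a conditioning particle from the current Gaussian approximation.
        conditioning = rng.gauss(mean, std)
        # Particle generation from the state equation.
        particle = a * conditioning + rng.gauss(0.0, sigma_u)
        # Weight computation from the observation likelihood.
        weight = math.exp(-0.5 * ((z - particle) / sigma_v) ** 2)
        total += weight
        weighted_sum += weight * particle
        weighted_sq_sum += weight * particle * particle
    # Computing the mean and the variance (the scalar covariance matrix).
    new_mean = weighted_sum / total
    new_var = max(weighted_sq_sum / total - new_mean ** 2, 1e-12)
    return new_mean, new_var

rng = random.Random(0)
new_mean, new_var = gpf_step(0.0, 1.0, 1.0, rng, m=20_000)
```

For this linear Gaussian toy model the exact answer is given by the Kalman filter (mean about 0.81, variance about 0.20), which makes the sketch easy to check.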
Implementation of PFs – Gaussian PFs (cont.)
[Figure: minimum sampling period versus number of PEs of parallel GPFs and SIRs]
Conclusions and Future Work
• Summary
  • Modification of the algorithms to be suitable for hardware implementation
  • Development of parallel algorithms and architectures
  • Implementation of the particle filter in FPGA
  • Analysis of the other types of particle filtering algorithms
• Future work
  • Simplifying floating to fixed-point conversion
  • Developing application-domain specific processor for PFs
  • Developing reconfigurable architectures for PFs