APPLIED SIGNAL PROCESSING AND IMPLEMENTATION (ASPI) Introduction for 7th semester, Fall 2005. Embedded Systems group: pk, yml, abo, ssc, jmk, dlc, rab, oo. Dicom group: kjh, pr, uh, ....
Outline • Rationale for ASPI • Basic ASPI Model (A3) • Trends: S8 -> S9 -> S10 • Course structure • Project examples: S8 – S9/S10 • Lab facilities • Demonstrations • Conclusion
Rationale for ASPI/1 • Embedded System: • a collection of heterogeneous parts • subject to stringent design constraints such as ...
Rationale for ASPI/2 • Embedded Systems (figure: from ... to Nokia 7710)
Rationale for ASPI/3 • Shannon Beats Moore's Law and Energy Plays a Major Role • Figure: algorithmic complexity (Shannon's law), processor performance (~Moore's law) and battery capacity across the 1G, 2G and 3G generations • Source: Jan Rabaey, Summer Course, 2000
Basic ASPI Model (A3) • Applications -> Algorithms -> Architectures (example: Equalizer -> FIR/IIR -> DSP/FPGA) • For each application => many candidate algorithms • For each algorithm => many implementation architectures • => Large no. of solutions => Large design space => ASPI challenge
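To make the growth of the design space concrete, the sketch below enumerates every (application, algorithm, architecture) combination for the equalizer example. Only "Equalizer", "FIR/IIR" and "DSP/FPGA" come from the slide; the dictionaries and the `design_space` helper are hypothetical illustrations, not part of the course material.

```python
# Hypothetical candidate lists; only the names come from the A3 slide.
ALGORITHMS = {"Equalizer": ["FIR", "IIR"]}
ARCHITECTURES = {"FIR": ["DSP", "FPGA"], "IIR": ["DSP", "FPGA"]}


def design_space(application, algorithms, architectures):
    """Return every (application, algorithm, architecture) combination."""
    return [
        (application, alg, arch)
        for alg in algorithms[application]
        for arch in architectures[alg]
    ]


space = design_space("Equalizer", ALGORITHMS, ARCHITECTURES)
for point in space:
    print(point)
print(len(space))  # 4 combinations
```

With m candidate algorithms and k architectures per algorithm the list already holds m·k entries, before mapping choices and architecture parameters multiply it further.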
FPGA • FPGA components: • Dedicated I/O blocks • Programmable Logic Array Blocks (LABs): combinational/sequential circuits, routing resources • Dedicated blocks: RAM blocks, multipliers, processors (ARM/PowerPC) • Development tools
ASPI Design Principle (figure: serial, parallel and pipelined units) • Transform a serial specification into a combination of serial, parallel and pipelined units • that satisfies the design constraints: Area, Time => Power
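The trade-off between serial, parallel and pipelined units can be illustrated with an idealized cycle-count model. This is a minimal sketch assuming every operation takes exactly one cycle and there are no data dependencies between operations; the function names are hypothetical.

```python
def serial_cycles(n_ops: int) -> int:
    """One functional unit performs all n_ops operations back to back."""
    return n_ops


def parallel_cycles(n_ops: int, units: int) -> int:
    """`units` identical units share the work; each cycle completes up to `units` ops."""
    return -(-n_ops // units)  # integer ceiling division


def pipelined_cycles(n_items: int, stages: int) -> int:
    """A `stages`-deep pipeline: first result after `stages` cycles, then one per cycle."""
    return stages + n_items - 1


# A 16-tap FIR filter needs 16 multiply-accumulates per output sample.
print(serial_cycles(16))          # 16 cycles per sample with one MAC unit
print(parallel_cycles(16, 4))     # 4 cycles per sample with four MAC units
print(pipelined_cycles(1000, 5))  # 1004 cycles for 1000 samples in a 5-stage pipeline
```

Parallel units buy time with area, while pipelining raises throughput at the cost of latency and pipeline registers, which is exactly the area/time tension the design principle refers to.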
Trends: S8 -> S9 -> S10 • Figure: Applications (1) -> Algorithms (2) -> Architectures (3) • Application: Non-Linear Signal Processing/Mobile Communication • Algorithm selection • Simulation • Architecture selection and mapping • Example follows later
Compiler optimization • Figure: compiler optimiser and C code modifications
Trends: S8 -> S9 -> S10 • Figure: Applications (1) -> Algorithms (2) -> Architectures (3), extended with steps (4) and (5) • Application: Non-Linear Signal Processing/Mobile Communication • Algorithm selection • Simulation • Architecture selection and modelling • Design Space Exploration • HW/SW Co-Design
Design Space Exploration • Constraints: area $A \le A_{max}$, time $T \le T_{max}$ => power $P \propto A \cdot f_{clock}$ • Figure: area vs. time plot; the possible solutions lie along the curve $A \cdot T \approx K$
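A small design space exploration can be sketched by sweeping one architectural knob and keeping only the points inside the constraint box. The sketch assumes a hypothetical knob (number of parallel functional units), area proportional to that number, and time equal to the ceiling of operations over units; these cost models are illustrative, not the ASPI tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    units: int    # number of parallel functional units (hypothetical design knob)
    area: float   # area estimate, arbitrary units
    time: float   # execution time estimate in cycles


def enumerate_candidates(n_ops: int, unit_area: float, max_units: int) -> list:
    """Area grows with the number of units; time shrinks as ceil(n_ops / units)."""
    candidates = []
    for p in range(1, max_units + 1):
        cycles = -(-n_ops // p)  # integer ceiling division
        candidates.append(Candidate(p, p * unit_area, float(cycles)))
    return candidates


def feasible(candidates: list, a_max: float, t_max: float) -> list:
    """Keep only points inside the A <= A_max, T <= T_max box."""
    return [c for c in candidates if c.area <= a_max and c.time <= t_max]


cands = enumerate_candidates(n_ops=64, unit_area=1.0, max_units=64)
ok = feasible(cands, a_max=16.0, t_max=8.0)
print([c.units for c in ok])  # [8, 9, 10, 11, 12, 13, 14, 15, 16]
```

Every candidate has A·T close to K = 64 (never below it), so the points trace the hyperbola on the slide; the constraints cut out the segment between the time limit and the area limit.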
Trends: S8 -> S9 -> S10 • Figure: applications impose constraints; algorithms and architectures have properties • Implementing a complete design trajectory • with solutions whose properties satisfy the constraints
ASPI Course Structure (8th and 9th semester) • Algorithm analysis • SW platform analysis • HW platform analysis • SW compilers • HW compilers • Design Space Exploration • Design Methodology
9th Semester Courses (EL: elective course)
Technology • Simulation tools / Language: • Matlab/M • Ptolemy/(M)any • Design Trotter/C • Processors / Language: • ARM/ C++, ASM • TI 320-6413/C++, ASM • Blackfin/ C++, ASM • Microblaze/ C++, ASM • NIOS/ C++, ASM • Programmable Logic: • Xilinx FPGA/ Handel-C • Altera FPGA/ Handel-C
Technology: Lab facilities • Xilinx Virtex FPGA • Celoxica RC203 board
Technology: Lab facilities • Altera Stratix FPGA • Altera Stratix board
Technology: Lab facilities • Analog Devices Blackfin board • Analog Devices Blackfin DSP
Project Examples: S8/S9/S10 • S8: Noise Suppression in Speech • S9: FPGA implementation of a JPEG 2000 encoder/decoder • S10: Reed-Solomon Decoder for DVB-H • Most projects involve external contacts with other research groups or companies
Noise Suppression in Speech • ASPI 8, Group 840 • Søren Birk Sørensen, Andreas Popp, Michael Smed Kristensen
Agenda • Application • System overview • Algorithm • Principle of the algorithm • Results • Architecture • Implementation
System overview • Requirements • Improved speech intelligibility • Improved signal-to-noise ratio (SNR) • Acceptable system delay (latency)
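One common way to quantify the SNR requirement, when a clean reference signal is available, is to treat the difference between the processed and clean signals as residual noise. This is a minimal sketch of that definition; the slides do not state how group 840 actually measured SNR, and `snr_db` is a hypothetical helper.

```python
import math


def snr_db(clean, processed):
    """SNR in dB, treating (processed - clean) as the residual noise."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((p - s) ** 2 for s, p in zip(clean, processed))
    if noise_power == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_power / noise_power)


clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.1, -0.9, 1.1, -0.9]
print(round(snr_db(clean, noisy), 3))  # 20.0 (noise power is 1/100 of signal power)
```

Because SNR is an energy ratio, it can stay nearly unchanged while perceived intelligibility improves, which is consistent with the results reported next.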
Results • SNR not significantly improved • Speech intelligibility: from "Very poor" to "Good" • Latency: 35 ms
Implementation • Parts of the algorithm were implemented on a TI TMS320C6713 development board • Floating point • Varying pipeline depth • 8 instructions in parallel • Analysis of the compiler output • Subsequent optimization
Optimizations performed • Execution time • A different algorithm for the autocorrelation computation • Loop unrolling gives more parallelism • Informing the compiler about data dependencies • Exploiting the pipeline • A different way of computing the division • Shorter execution time
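The loop-unrolling idea can be sketched on an unnormalized autocorrelation r[k] = Σ x[n]·x[n+k]. The slides do not say which autocorrelation algorithm or unroll factor the group used; this sketch assumes an unroll factor of 4 with four independent accumulators, which removes the single dependency chain so that a VLIW compiler (such as the one for the C6713) can schedule the multiply-accumulates in parallel. Python only mirrors the structure of the transformation; the speed-up appears in compiled C on the DSP.

```python
def autocorr(x, max_lag):
    """Reference: unnormalized autocorrelation r[k] = sum_n x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def autocorr_unrolled(x, max_lag):
    """Same sums with the inner loop unrolled by 4 (hypothetical unroll factor)."""
    n = len(x)
    r = []
    for k in range(max_lag + 1):
        m = n - k                 # number of products for this lag
        a0 = a1 = a2 = a3 = 0.0   # independent partial sums: no shared dependency chain
        i = 0
        while i + 4 <= m:
            a0 += x[i] * x[i + k]
            a1 += x[i + 1] * x[i + 1 + k]
            a2 += x[i + 2] * x[i + 2 + k]
            a3 += x[i + 3] * x[i + 3 + k]
            i += 4
        while i < m:              # remainder loop for m not divisible by 4
            a0 += x[i] * x[i + k]
            i += 1
        # The summation order differs from the reference, so floating-point
        # results may differ in the last bits.
        r.append(a0 + a1 + a2 + a3)
    return r


x = [1, 2, 3, 4, 5, 6, 7]
print(autocorr(x, 3))           # [140, 112, 85, 60]
print(autocorr_unrolled(x, 3))  # [140.0, 112.0, 85.0, 60.0]
```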
Optimization results • Autocorrelation computation • 24096 cycles -> 2624 cycles • 153% more than the estimated minimum number of cycles • Levinson function • 3842 cycles -> 1122 cycles • 26% more than the estimated minimum number of cycles
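The "Levinson function" presumably refers to the Levinson-Durbin recursion, which turns the autocorrelation values into linear-prediction coefficients. Each order step contains one division (for the reflection coefficient), which is why a cheaper division computation pays off on the DSP. This is a minimal textbook sketch, not the group's C code; the function name is hypothetical and it assumes r[0] > 0.

```python
def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the prediction-error filter.

    r: autocorrelation values r[0..order], with r[0] > 0.
    Returns (a, err) with a[0] = 1, so A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order,
    and err the final prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                  # reflection coefficient: the one division per step
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k              # prediction error shrinks at every order
    return a, err


# Autocorrelation of a first-order AR process with coefficient 0.5.
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, err)  # [1.0, -0.5, 0.0] 0.75
```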
9th semester project example: "FPGA implementation of a JPEG 2000 encoder/decoder"
FPGA implementation of a JPEG2000 encoder/decoder • Motivation • JPEG2000 is up to six times more complex to implement than JPEG • Two complex DSP algorithms at the heart of JPEG2000 • Discrete Wavelet Transform (DWT) • Embedded Block Coding with Optimized Truncation (EBCOT) • FPGAs can accelerate arithmetic operations through parallel processing • Figure: JPEG2K block diagram (encoder)
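To show why the DWT maps well onto FPGA hardware, here is a one-level, one-dimensional sketch of the reversible 5/3 wavelet that JPEG 2000 uses for lossless coding, computed by lifting: only integer additions and shifts (floor divisions by 2 and 4) are needed. It assumes an even-length input and whole-sample symmetric extension at the borders; a real encoder applies it along rows and columns over several levels, and the function names are hypothetical.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting DWT; returns (lowpass s, highpass d)."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch expects an even length >= 2")
    half = n // 2

    def xe(i):  # symmetric extension: x[-i] = x[i], x[n-1+i] = x[n-1-i]
        if i < 0:
            i = -i
        if i >= n:
            i = 2 * (n - 1) - i
        return x[i]

    # Predict step: odd samples minus the floor-average of their even neighbours.
    d = [x[2 * k + 1] - (xe(2 * k) + xe(2 * k + 2)) // 2 for k in range(half)]
    # Update step: even samples plus a rounded quarter of neighbouring details (d[-1] = d[0]).
    s = [x[2 * k] + ((d[k - 1] if k > 0 else d[0]) + d[k] + 2) // 4 for k in range(half)]
    return s, d


def dwt53_inverse(s, d):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for k in range(half):
        x[2 * k] = s[k] - ((d[k - 1] if k > 0 else d[0]) + d[k] + 2) // 4
    for k in range(half):
        right = x[2 * k + 2] if 2 * k + 2 < n else x[n - 2]
        x[2 * k + 1] = d[k] + (x[2 * k] + right) // 2
    return x


signal = [10, 12, 14, 13, 9, 8, 7, 20]
s, d = dwt53_forward(signal)
print(s, d)
print(dwt53_inverse(s, d) == signal)  # True
```

Each output of a lifting step depends only on a few neighbouring samples, so many coefficients can be computed in parallel or in a pipeline, which is the kind of parallelism the project flow searches for.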
FPGA implementation of a JPEG 2000 encoder/decoder • Project flow • Analysis of reference C-code • processing analysis (search for potential parallelism) • memory analysis (memory requirements) • Sketch an architecture based on the analysis (architectural exploration) • FPGA implementation • Handel-C language to describe the architecture • Handel-C to FPGA (Celoxica Design-suite) • Analysis -> architectural refinement
S10 Project: Reed-Solomon Decoder • Figure: Nokia 7710; codeword layout with data and parity • Application: • from DVB-T to DVB-H • FEC: RS(n, k, t) => RS(255, 191, 64), i.e. n - k = 64 parity bytes per 255-byte codeword • Constraints: • Frame size: up to 2 MB • Data rate: 2 MB/s • Time constraint: ASAP
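The RS parameters can be unpacked with the standard bound for Reed-Solomon codes: a code with n - k parity symbols can correct e errors at unknown positions and f erasures at known positions as long as 2e + f ≤ n - k. For RS(255, 191) that means up to 64 erasures (the mode DVB-H's MPE-FEC is designed around) or up to 32 errors. The helper names below are hypothetical.

```python
def rs_parameters(n: int, k: int) -> dict:
    """Correction capability of an RS(n, k) code from the standard bound."""
    parity = n - k
    return {
        "parity_symbols": parity,
        "max_errors": parity // 2,  # error positions unknown
        "max_erasures": parity,     # error positions known (flagged)
        "code_rate": k / n,
    }


def decodable(errors: int, erasures: int, n: int, k: int) -> bool:
    """Errors-and-erasures bound: 2 * errors + erasures <= n - k."""
    return 2 * errors + erasures <= n - k


print(rs_parameters(255, 191))
print(decodable(10, 44, 255, 191))  # True: 2*10 + 44 = 64
```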
S10 Project: Reed-Solomon Decoder • Complexity: • Execution on ARM: 22 min/2MB frame
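Assuming a frame must be decoded at least as fast as it arrives (a 2 MB frame at 2 MB/s, i.e. within about 1 s), the ARM execution time implies a required speed-up of roughly three orders of magnitude:

$$
S = \frac{T_{ARM}}{T_{max}} = \frac{22 \cdot 60\ \mathrm{s}}{2\ \mathrm{MB} / (2\ \mathrm{MB/s})} = \frac{1320\ \mathrm{s}}{1\ \mathrm{s}} = 1320
$$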
S10 Project: Reed-Solomon Decoder • Algorithm: • Galois field arithmetic GF(2^8) • Data: 8-bit bytes • Operators: binary +, *, not • Properties: • no carry, overflow or rounding error => • bitwise operations in parallel • Short critical path (delay) => high clock rate • Identification of parallelism • coarse grain @ function level • fine grain @ operation level
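In GF(2^8), addition is a bitwise XOR and multiplication is a carry-less shift-and-XOR reduced modulo a degree-8 polynomial, which is why there is no carry, overflow or rounding. The sketch assumes the field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), the one commonly used for DVB Reed-Solomon codes; the slide does not state the polynomial used in the project.

```python
PRIM_POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) in GF(2^8): bitwise XOR, no carries."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Carry-less shift-and-add multiplication, reduced modulo PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a       # "add" the current shifted copy of a
        b >>= 1
        a <<= 1
        if a & 0x100:         # degree-8 term appeared: reduce
            a ^= PRIM_POLY
    return result


print(hex(gf_mul(0x02, 0x80)))  # 0x1d: x * x^7 = x^8, reduced to x^4 + x^3 + x^2 + 1
print(gf_add(0x53, 0x53))       # 0: every element is its own additive inverse
```

Each bit of the product is an XOR of AND terms, so in hardware a full GF(2^8) multiplier is a shallow combinational network with a short critical path, matching the properties listed on the slide.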
S10 Project: Reed-Solomon Decoder • Results: • Execution on ARM: 22 min/2MB frame • Parallelism: the error locator and the error evaluator polynomials can be computed concurrently • Reusable DataPath: syndrome computation, Chien search, polynomial evaluation and error correction can be performed on the same parallel DataPath
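The reusable datapath works because syndrome computation, the Chien search and polynomial evaluation all reduce to evaluating a polynomial over GF(2^8) with Horner's rule. This sketch assumes the field polynomial 0x11D, generator roots α^0 … α^(n-k-1) with α = 2, and a brute-force root search as a functional stand-in for the incremental Chien search; all helper names are hypothetical.

```python
PRIM_POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Carry-less shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
    return result


def gf_pow(a, e):
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def poly_eval(coeffs, x):
    """Horner's rule, the shared operation; coeffs[0] is the highest-degree term."""
    y = 0
    for c in coeffs:
        y = gf_mul(y, x) ^ c
    return y


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out


def generator_poly(n_parity, alpha=2):
    """g(x) = (x + alpha^0)...(x + alpha^(n_parity-1)); minus equals plus in GF(2^m)."""
    g = [1]
    for i in range(n_parity):
        g = poly_mul(g, [1, gf_pow(alpha, i)])
    return g


def syndromes(received, n_parity, alpha=2):
    """S_i = r(alpha^i); all zero for a valid codeword."""
    return [poly_eval(received, gf_pow(alpha, i)) for i in range(n_parity)]


def find_roots(poly, alpha=2):
    """Brute-force equivalent of a Chien search: evaluate at every alpha^j."""
    return [j for j in range(255) if poly_eval(poly, gf_pow(alpha, j)) == 0]


g = generator_poly(8)        # g(x) itself is a valid codeword
print(syndromes(g, 8))       # all zeros
corrupted = list(g)
corrupted[3] ^= 0x55         # inject a single byte error
print(syndromes(corrupted, 8))
```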
S10 Project: Reed-Solomon Decoder • Results: • DataPath: 65 8-bit blocks • Design Space Exploration (figure)
Conclusion • ASPI salient features: • based on models and methods • application independent, but also • application related • encompasses new technologies and tools • driven by current research projects • local and global industry cooperation • Any questions before the student presentation continues?
Advertisement: My A3 'upbringing' has really paid off: we switch back and forth between application, algorithm and architecture exactly as we did in the good old days in the VLSI group. Unfortunately, we don't get to do much with the arithmetic: the synthesis tools come with very efficient module generators for multipliers, adders etc., and in the 0.18 µm technology we work in they are more than fast enough. So the arithmetic is more a part of my background for understanding what the module generators spit out, and how we can best exploit them. (And yet, things are looking up: I am about to design a divider for the next-generation IC!-) Excerpt from an e-mail from: Jack Andersen <jandersen@d2audio.com>
ASPI Home Page, Staff, etc. • Home Page: http://kom.aau.dk/~dsp/aspi-05/sites/default/ • Secretary: • Dorthe Sparre, NJV12 A5-214, Tel. 9635 8616, dsp@kom.aau.dk • Staff: • Peter Koch, Yannick LeMoullec, Ole Olsen • Daniel Lázaro Cuadrado, Anders B. Olsen, Jesper Michael Kristensen, Søren Skovgaard Christensen, Rasmus Abildgren • Location: • Offices: B1-208, -211, -213, NJV12 A5-207 • Lab: NJ14 3-015 • Students: A6-108