
Training Software version v2.2

zenda

Presentation Transcript


  1. Training Software version v2.2

  2. Levels of the Design Flow

  3. A|RT Builder in its design environment

  4. Generated HDL Design: Architecture • The C description is mapped in a “one-to-one” fashion into a sequential HDL description • For a single C function, the generated HDL component is a Mealy machine with 2 parts: • Compute part • Reset/Update part

  5. Generated HDL Design: Architecture • The function hierarchy in C is preserved by the generation of a corresponding component hierarchy in the HDL • Logic synthesis can be performed bottom-up • Interchangeability of components with similar behavior

  6. Generated HDL Design: Compute Process • Purely combinatorial • Calculates the outputs and the next state from the inputs and the current state (Mealy machine) • Signals are exchanged with lower-level components • Input signals and current state are first copied into local variables • Calculations are done using the local variables • Next state and outputs are assigned to output signals
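The compute/update split described on the slides above can be imitated in plain C++. This is only a minimal sketch: the accumulator component and the names `AccState`, `ComputeResult`, `compute` and `update` are hypothetical and are not A|RT Builder output.

```cpp
#include <cstdint>

// Hypothetical component: a running-sum accumulator (illustrative only).
struct AccState { int32_t sum; };   // the registers of the component

struct ComputeResult {
    int32_t  out;    // output signal
    AccState next;   // next-state signal
};

// Compute part: purely combinational. Inputs and current state are copied
// into locals, the calculation uses only the locals, and the results are
// then assigned to the output and next-state signals.
ComputeResult compute(int32_t in, const AccState& current) {
    int32_t in_v  = in;             // local copy of the input signal
    int32_t sum_v = current.sum;    // local copy of the current state
    sum_v = sum_v + in_v;           // calculation on local variables
    return ComputeResult{sum_v, AccState{sum_v}};
}

// Reset/Update part: on a clock edge, either reset the registers
// or load the next state produced by the compute part.
void update(AccState& regs, const AccState& next, bool reset) {
    regs = reset ? AccState{0} : next;
}
```

Because `out` depends on the current input as well as on the stored state, the sketch behaves like a Mealy machine: calling `compute` twice in the same cycle with different inputs gives different outputs without touching the registers.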

  7. Pure Combinatorial Logic: Example

         #include <fxp.h>

         void adv_comb(const Int<4> a, const Int<4> b, const Int<4> c, Int<4>& z)
         {
         #pragma OUT(z)
             Int<4> tmp = a * b;   // product of the two inputs
             z = tmp - c;          // output depends only on the current inputs
         }

  8. DSP data types • Integer and fractional numbers are special cases of fixed point • fix<p,q> (A|RT Designer & SystemC): p is the total word length, q the number of fractional bits • Example, fix<8,3>:

         bit weights:  -2^4  2^3  2^2  2^1  2^0 . 2^-1  2^-2  2^-3
         bits:            1    1    1    0    1 .    1     0     1    = -19/8 = -2.375

     • Scale factor 1/8 • The MSB has a negative weight (2's complement) • Quantization error (the resolution is one LSB, here 1/8) • The same ALU handles fix<8,1>, fix<8,2>, fix<8,3>, ... • If q=0 the type is an integer, e.g. int<8,0> • If q=p-1 the type is fractional, e.g. int<8,7>

  9. DSP data types • Multiplication example:

            -19/8     = 11101.101             Int<8,3>
          x  97/16    = 0110.0001             Int<8,4>
          = -1843/128 = 111110001.1001101     (16 bits, 7 fractional bits)

     • Some processors (C54) have special instructions for fractional numbers:

          s x x x  x  s y y y  =  s s z z z z z z
          if FRCT = 1  =>  s z z z z z z 0
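The fixed-point arithmetic on the two slides above can be checked with ordinary integers. The sketch below is an illustration in standard C++, not the A|RT Library: the helper names `fix8_raw`, `fix8_value` and `fix8_mul` are made up for this example, and rounding and saturation are not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Raw 8-bit two's-complement word for the value num / 2^q.
// Assumes num already fits in 8 bits.
uint8_t fix8_raw(int num) {
    assert(num >= -128 && num <= 127);
    return static_cast<uint8_t>(num & 0xFF);
}

// Real value of an 8-bit word with q fractional bits:
// the signed integer scaled by 2^-q.
double fix8_value(uint8_t raw, int q) {
    assert(q >= 0 && q < 8);
    return static_cast<double>(static_cast<int8_t>(raw)) / (1 << q);
}

// Full-precision product of two 8-bit words: a 16-bit word whose number
// of fractional bits is the sum of the operands' fractional bits.
int16_t fix8_mul(uint8_t a, uint8_t b) {
    return static_cast<int16_t>(static_cast<int8_t>(a) * static_cast<int8_t>(b));
}
```

With these helpers, `fix8_raw(-19)` is the bit pattern `11101101` (0xED), `fix8_value(0xED, 3)` is -2.375, and `fix8_mul(0xED, 0x61)` is -1843, which read with 3 + 4 = 7 fractional bits is -1843/128.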

  10. Data Type Conversion • The standard C types are mapped into A|RT Library types before being mapped into HDL types

          C Type                    A|RT Library Type
          [signed] char             Fix<8,0>
          unsigned char             Ufix<8,0>
          [signed] short [int]      Fix<16,0>
          unsigned short [int]      Ufix<16,0>
          [signed] int              Fix<32,0>
          unsigned int              Ufix<32,0>
          [signed] long [int]       Fix<32,0>
          unsigned long [int]       Ufix<32,0>
          bool                      Uint<1>
          void                      Not mapped

  11. Overview • Introduction • Key Concepts • Operating Procedures • C Subset • Exercise • A|RT style guide • Verification and logic synthesis • Advanced Use • Exercises

  12. Internal Organization of an A|RT Builder Project

  13. Design Flow (diagram) • C code → Compile → Flow Graph → Build → VHDL / Verilog • C Test Bench and HDL Test Bench are used for Verification

  14. Compile Step (diagram: C code → Compile → Flow Graph → Build → VHDL / Verilog) • Compiles C code into an internal flow graph • ANSI C • Optionally extended with A|RT Library types • Optimizations • Constant Propagation • Dead Code Elimination • Renaming Elimination • Block Flattening • Results • Optimized Data Dependency Graph • C(++) testbench
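The four optimizations named on the slide above can be illustrated with a small before/after pair. This sketch shows the general, textbook meaning of the terms only; it is an assumption that A|RT Builder's internal transformations behave this way, and the functions `scale_before` and `scale_after` are invented for the example.

```cpp
// Before: the function as a designer might write it.
int scale_before(int x) {
    const int gain = 4;     // constant value
    int g = gain;           // renaming: g is just another name for gain
    int unused = x * 7;     // dead code: the result is never used
    (void)unused;           // silences the unused-variable warning
    int y;
    {                       // nested block with no effect on scoping here
        y = x * g;
    }
    return y;
}

// After: the same behavior once the constant has been propagated, the
// dead multiplication and the renaming removed, and the block flattened.
int scale_after(int x) {
    return x * 4;
}
```

Both functions return the same value for every input, so the second multiplier in `scale_before` never needs to exist in hardware.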

  15. Build Step (diagram: C code → Compile → Flow Graph → Build → VHDL / Verilog) • Builds the HDL description • VHDL IEEE 1076-1987 • Verilog IEEE 1364-1995 • Options • HDL configuration • Test-bench configuration • Results • Synthesizable HDL description • HDL testbench

  16. Cross-Highlighting Between Source and HDL • Using the right mouse button

  17. C subset • This part specifies the subset of the ANSI C language that is supported by A|RT Builder. • On top of the C subset, the fixed-point datatypes and operators provided by A|RT Library are fully supported. The ANSI C constructs that are NOT supported by A|RT Builder will be listed. For a number of constructs, the text describes in what way they are supported. Constructs that are not mentioned are fully supported.

  18. Overview • Introduction • Key Concepts • Operating Procedures • C Subset • Exercise • A|RT style guide • Verification and logic synthesis • Advanced Use • Exercises

  19. Exercise 1: A New Project • A|RT Builder is organized in projects • Select Project>New (or <New> button in the toolbar) • Enter a name for your project

  20. Exercise 2: Import, Compile & Build • Select Source>Import… • Browse to the subdirectory that contains the training examples • Select file ex1.cxx and press <Open> • The Import dialog box opens • Compile and Build the design

  21. Exercise 3: Inspecting the HDL • Look at the HDL design by selecting Reports>Design… • Use cross-highlighting for more transparent analysis. • Notice the following: • Parameters of the ‘A|RT Builder run’ and cross-reference information • Definition of the HDL component • Translation of the body of the C function to COMPUTE_PROC process • Signal casting

  22. Overview • Introduction • Key Concepts • Operating Procedures • C Subset • Exercise • A|RT style guide • Verification and logic synthesis • Advanced Use • Exercises

  23. Purpose of the Test Benches • C Test Bench • created during compile step • helps in validating the design as entered and modified in A|RT Builder • HDL Test Bench • created during build step • allows you to verify whether the behavior of HDL output is identical to the C description • Requirements • input file(s) • reference file to compare with generated output file(s)

  24. Bench Behavior • Input files • Binary representations (or decimal values for the C bench) of the input signals, saved in ASCII format. 1 and 0 are the only valid characters. One input word per line. • Every input argument in the main C function must have a corresponding input file with filename: <name_of_C_input>.INP • The wordsize of the representation in the file must match the datatype wordsize • Output files • .OUT files are generated by the C test bench and the HDL test bench • Besides 0 and 1, they can contain X, Z and - characters
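The input-file format described above (one binary word per line, word size matching the data type) is easy to generate from a script or a small program. The following is a sketch under assumptions: the helper names are hypothetical, and negative values are written in two's complement, which matches the fixed-point slides but is not stated explicitly for the bench files.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Format the low `wordsize` bits of `value` as '0'/'1' characters, MSB first.
std::string to_binary_word(long value, int wordsize) {
    assert(wordsize > 0 && wordsize <= 32);
    std::string word(wordsize, '0');
    unsigned long bits = static_cast<unsigned long>(value);  // modular conversion
    for (int i = 0; i < wordsize; ++i) {
        if ((bits >> i) & 1UL) word[wordsize - 1 - i] = '1';
    }
    return word;
}

// Build the text of an input file: one input word per line.
std::string make_inp_text(const std::vector<long>& samples, int wordsize) {
    std::ostringstream text;
    for (long s : samples) text << to_binary_word(s, wordsize) << '\n';
    return text.str();
}
```

For an input argument `a` of type `Int<4>`, the string returned by `make_inp_text` could be written to a file named `a.INP` with `std::ofstream`; the samples 5 and -3 would appear as the lines `0101` and `1101`.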

  25. Overview • Introduction • Key Concepts • Operating Procedures • C Subset • Exercise • A|RT style guide • Verification and logic synthesis • Advanced Use • Exercises

  26. Pipelining • From input to output, a design contains a series of logical and arithmetic operations. • Suppose a specific clock rate is imposed on a design. For a long chain of logic, this will not always be feasible, depending on the gate delay parameters of the target technology. • Pipelining is achieved by introducing pipeline registers in order to reduce the number of logic gates between registers. • The outputs are then delayed by a number of clock cycles equal to the number of pipeline stages, and valid outputs are obtained after an initial startup phase.

  27. Pipelining • Achieved by introducing pipeline registers in order to reduce the number of logic gates between registers • Valid output is now obtained after a startup phase of 1 clock cycle • Figure labels: MAX clock rate = 20 MHz; MAX clock rate = 33 MHz

  28. Pipelining: Example • Pure combinatorial logic:

          #include <fxp.h>

          void adv_comb(const Int<4> a, const Int<4> b, const Int<4> c, Int<4>& z)
          {
          #pragma OUT(z)
              Int<4> tmp = a * b;
              z = tmp - c;
          }

      • Pipelined:

          #include <fxp.h>

          void adv_pipe(const Int<4> a, const Int<4> b, const Int<4> c, Int<4>& z)
          {
          #pragma OUT(z)
              static Int<4> tmp = 0;     // pipeline register, initial value 0
              z = tmp - c;               // uses the product stored in the previous cycle
              Int<4> tmp_nxt = a * b;    // product of the current inputs
              tmp = tmp_nxt;             // stored for the next cycle
          }
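The timing difference between the two versions can be observed by simulating them cycle by cycle in plain C++. This is a sketch, not the A|RT Library: `wrap4` stands in for `Int<4>` and assumes wrap-around on overflow, which the slides do not specify, and the struct `AdvPipe` is a hypothetical stand-in for the `static` variable.

```cpp
#include <cassert>

// Wrap a value into the signed 4-bit range [-8, 7] (assumed overflow mode).
int wrap4(int v) { return ((v + 8) & 0xF) - 8; }

// Combinational version: the output depends only on the current inputs.
int adv_comb(int a, int b, int c) { return wrap4(wrap4(a * b) - c); }

// Pipelined version: one call to step() models one clock cycle.
struct AdvPipe {
    int tmp = 0;                     // pipeline register
    int step(int a, int b, int c) {
        assert(tmp >= -8 && tmp <= 7);
        int z = wrap4(tmp - c);      // product from the previous cycle
        tmp = wrap4(a * b);          // register the product of this cycle
        return z;
    }
};
```

Note what the simulation makes visible: in cycle n the pipelined output combines the product of the inputs from cycle n-1 with the value of `c` from cycle n, and the output of the very first cycle is based on the reset value of `tmp` rather than on real data.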

  29. Resource Sharing • Speed is traded for area by using multiple clock cycles to execute the algorithm once. • This makes it possible to share resources between different clock cycles. • However, the efficiency of the resource sharing depends on the synthesis tool used. • Synthesis can be steered in the right direction by describing the resource sharing explicitly in the C description.

  30. Resource Sharing: Example • Purely combinatorial (diagram: val1 * val2 and val3 * val4 feed an adder that drives out)

          #include <fxp.h>

          void mac2(const Int<32> val1, const Int<32> val2,
                    const Int<32> val3, const Int<32> val4, Int<32>& out)
          {
          #pragma OUT(out)
              Int<32> prod1 = val1 * val2;   // first multiplier
              Int<32> prod2 = val3 * val4;   // second multiplier
              out = prod1 + prod2;
          }

  31. Resource Sharing: Example • Implicit resource sharing (2 cycles) • A new input is supplied every 2 cycles • A valid output is obtained every second cycle • Resource sharing depends on the intelligence of the synthesis tools • Diagram: multipliers for val1 * val2 and val3 * val4, a DFF, and an adder driving out

          #include <fxp.h>

          void mac2_mult(const Int<32> val1, const Int<32> val2,
                         const Int<32> val3, const Int<32> val4, Int<32>& out)
          {
          #pragma OUT(out)
              static Int<32> prod1 = 0;
              static Uint<1> cycle = 0u;
              switch (cycle) {
              case 0:   // process 1st cycle
                  { prod1 = val1 * val2; }
                  break;
              case 1:   // process 2nd cycle
                  { Int<32> prod2 = val3 * val4; out = prod1 + prod2; }
                  break;
              }
              ++cycle;  // update cycle
              if (cycle == 2) cycle = 0;
          }

  32. Resource Sharing: Example (2) • Explicit resource sharing (execution in 2 cycles) • Diagram: one multiplier, a DFF and an adder

          #include <fxp.h>

          void mac2_multi(const Int<4> val1, const Int<4> val2,
                          const Int<4> val3, const Int<4> val4, Int<4>& out)
          {
          #pragma OUT(out)
              static Int<4> prod1 = 0;
              static Uint<1> cycle = 0u;
              Int<4> inp1 = 0, inp2 = 0;   // multiplier inputs (declaration missing on the slide)
              switch (cycle) {             // select the operands for this cycle
              case 0:   // 1st cycle
                  inp1 = val1; inp2 = val2;
                  break;
              case 1:   // 2nd cycle
                  inp1 = val3; inp2 = val4;
                  break;
              }
              Int<4> result = inp1 * inp2; // the single, shared multiplier
              switch (cycle) {
              case 0:   // 1st cycle
                  prod1 = result;
                  break;
              case 1:   // 2nd cycle
                  out = prod1 + result;
                  break;
              }
              ++cycle;
              if (cycle == 2) cycle = 0;
          }
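The explicit sharing scheme above can be simulated in standard C++ to confirm that a single multiplier, used once per cycle, reproduces the two-multiplier result every second cycle. The sketch is an illustration only: `Mac2Shared` and its `multiplies` counter are hypothetical, and plain 64-bit integers replace `Int<4>`, so word-length effects are ignored.

```cpp
#include <cassert>
#include <cstdint>

struct Mac2Shared {
    int64_t prod1 = 0;       // register holding the first product
    int     cycle = 0;       // 0 = first cycle, 1 = second cycle
    int     multiplies = 0;  // number of times the shared multiplier was used

    // Models one clock cycle. Returns true and writes `out` only in the
    // second cycle, when the sum of both products is valid.
    bool step(int64_t v1, int64_t v2, int64_t v3, int64_t v4, int64_t& out) {
        assert(cycle == 0 || cycle == 1);
        int64_t inp1 = (cycle == 0) ? v1 : v3;   // input multiplexers
        int64_t inp2 = (cycle == 0) ? v2 : v4;
        int64_t result = inp1 * inp2;            // the one shared multiplier
        ++multiplies;
        bool valid = false;
        if (cycle == 0) {
            prod1 = result;                      // keep for the next cycle
        } else {
            out = prod1 + result;
            valid = true;
        }
        cycle = (cycle + 1) % 2;
        return valid;
    }
};
```

Calling `step(2, 3, 4, 5, out)` twice yields a valid `out` of 26 on the second call, with the multiplier used exactly once per call. The inputs must be held stable over both cycles, which matches the slide's remark that a new input is supplied every 2 cycles.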

  33. Resource Sharing of For Loops • Hardware within loops will be generated as many times as there are loop iterations • costly • not feasible for high clockrates • When creating a state machine to execute the algorithm • the hardware within the loop can be shared over all loop iterations • every loop iteration will be performed in a single clock cycle
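The contrast between an unrolled loop and a state machine that performs one iteration per clock cycle can be sketched as follows. The dot product, the constant `N` and the names `dot_unrolled` and `DotFsm` are chosen purely for illustration and do not come from the training material.

```cpp
#include <array>
#include <cassert>

constexpr int N = 4;   // number of loop iterations (illustrative)

// Loop view: all N multiply-adds happen in one evaluation, so hardware
// generated from it would contain the loop body N times.
int dot_unrolled(const std::array<int, N>& a, const std::array<int, N>& b) {
    int acc = 0;
    for (int i = 0; i < N; ++i) acc += a[i] * b[i];
    return acc;
}

// State-machine view: one loop iteration per call to step(), so a single
// multiplier and adder are reused over N clock cycles.
struct DotFsm {
    int i = 0;     // loop counter kept as state
    int acc = 0;   // accumulator kept as state

    // Returns true and writes `out` in the cycle that completes the loop.
    bool step(const std::array<int, N>& a, const std::array<int, N>& b, int& out) {
        assert(i >= 0 && i < N);
        acc += a[i] * b[i];
        if (++i < N) return false;
        out = acc;
        i = 0;     // restart for the next set of inputs
        acc = 0;
        return true;
    }
};
```

For the vectors {1, 2, 3, 4} and {5, 6, 7, 8}, `DotFsm::step` reports a valid result of 70 on its fourth call, the same value `dot_unrolled` returns in one evaluation: area is traded for latency.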
