Macro instruction synthesis for embedded processors

Pinhong Chen, Yunjian Jiang (William). CS252 project presentation.

Presentation Transcript


  1. Macro instruction synthesis for embedded processors. Pinhong Chen, Yunjian Jiang (William). CS252 project presentation.

  2. Motivation
     • Start from a simple processor core
     • Find new macro instructions to enhance performance and reduce code size
     • Application-specific
     • Use dedicated hardware to speed up the application
     [Figure: processor block diagram. Labels: Control, I/D Mem., Macro Instr. Ext., ALU, control, Reg, Bus unit, Reg/Mem Access]

  3. RISC8 Architecture: why RISC8?
     • Simple
       • 8-bit ISA with 43 instructions
       • 64 KB addressable space
     • Complete ISA, including load/store, arithmetic, logical, branch, multiplication, division, stack operations, subroutine call, interrupt operations, etc.
     • Small
       • Verilog core size is 3.5K gates in 0.25 µm
       • A clock speed of 300 MHz is reported (our result is about 200 MHz)
     • Synthesizable RTL core
     • Free assembler

  4. Methodology
     [Flow diagram. Labels: Application (*.c), Front end, IR (exp. tree), Code Gen., RTL exp. tree, Asm. code, Assembler, mach. code, simulation, performance, Instr. Profiling, Instr. Syn (three instances)]

  5. Different levels of expression trees
     Example: sum += c & 5
     [Figure: three expression trees for the example, captioned "SUIF IR", "RTL IR after code gen", and "Reconstructed from mach. code". The SUIF IR tree uses the nodes ASSIGN, ADD, AND, VAR, VAR, VAR, CON; the other two use nodes such as ASSIGN, MOV, ADD, AND, acc, reg, byte, addr16, con08]

  6. Expression trees
     • SUIF IR
       • Data type carried
       • Inaccurate cost
       • No profiling
       • Simple (fewer tree nodes)
       • Machine independent
     • Register level
       • Data type carried
       • One-to-one with macro instructions
       • Profiling data can be back-annotated
       • Machine dependent
     • Machine code
       • Data type lost
       • One-to-one with machine instructions
       • Profiling data accurate
       • Large expression trees
       • Machine dependent

  7. Instruction Enumeration
     • Traverse tree structure in post-order
     • Normalize sub-tree orders
     • Combine patterns from sub-trees
     • Hash new instruction patterns
     • Collect register usage and memory access for evaluation
     • Annotate profiling information
     [Figure: example expression tree. Labels: ADD, acc, reg, AND, byte, acc, reg, byte, con08]
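
The enumeration steps listed on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' C++ enumerator: the tuple tree encoding, the `COMMUTATIVE` set, the `CUT` placeholder, and the function names are all hypothetical, and the collection of register usage and memory access is not modeled.

```python
from collections import Counter
from itertools import product

COMMUTATIVE = {"ADD", "AND", "OR", "XOR", "MUL"}  # assumed order-insensitive operators
CUT = "val"  # hypothetical placeholder: operand computed by some other instruction


def canonical(op, args):
    """Normalize sub-tree order so mirrored trees yield the same pattern string."""
    args = sorted(args) if op in COMMUTATIVE else list(args)
    return f"{op}({','.join(args)})"


def enumerate_patterns(tree, table, weight=1):
    """Post-order walk: return the patterns rooted at `tree`, counting each in `table`.

    `weight` stands in for the profiled execution count of the enclosing block.
    """
    if isinstance(tree, str):  # leaf operand such as "acc", "reg", "con08"
        return [tree]
    op, *kids = tree
    choices = []
    for kid in kids:  # children first (post-order)
        rooted = enumerate_patterns(kid, table, weight)
        # An operator child is either cut off or merged in with one of its own patterns.
        choices.append(rooted if isinstance(kid, str) else [CUT] + rooted)
    found = {canonical(op, combo) for combo in product(*choices)}
    for pattern in found:  # the Counter acts as the hash table of candidate patterns
        table[pattern] += weight
    return sorted(found)


table = Counter()
enumerate_patterns(("ADD", "acc", ("AND", "reg", "con08")), table, weight=100)
enumerate_patterns(("ADD", ("AND", "con08", "reg"), "acc"), table, weight=20)
for pattern, count in sorted(table.items()):
    print(pattern, count)
```

Both example trees hash to the same three patterns because of the normalization step, so their weights accumulate. Patterns containing more than one operator, such as `ADD(AND(con08,reg),acc)`, would be the macro instruction candidates.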

  8. Machine Code Level Tree Reconstruction
     • Build IR tree from machine code
     • Recover data dependencies from assembly code
     • Clear definition by ISA
       • e.g. AND r2 ==> acc = acc & r2
     • Limited to a basic block
     • Eliminate intermediate storage nodes
     [Figure: reconstructed expression tree. Labels: ADD, acc, reg, AND, byte, acc, reg, byte, con08]

  9. Machine Code Level Tree Reconstruction (same bullets as the previous slide)
     [Figure: the tree from the previous slide without the acc and reg nodes. Remaining labels: ADD, AND, byte, byte, con08]
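
One way to picture the reconstruction is as symbolic execution of a single basic block, where each storage location maps to the expression tree it currently holds. The sketch below assumes a made-up accumulator mini-ISA with the mnemonics `LOAD`, `STORE`, `AND`, and `ADD`; only the semantics `AND r2 ==> acc = acc & r2` comes from the slide, and the function name `rebuild` is hypothetical.

```python
def rebuild(block):
    """Symbolically execute one basic block; return the expression tree for each store."""
    value = {}   # storage location -> expression tree currently held there
    stores = {}  # destination -> tree written by the most recent STORE

    def read(loc):
        # A location not written inside the block is a leaf of the tree.
        return value.get(loc, loc)

    for mnemonic, operand in block:
        if mnemonic == "LOAD":             # acc = operand
            value["acc"] = read(operand)
        elif mnemonic in ("AND", "ADD"):   # acc = acc <op> operand
            value["acc"] = (mnemonic, read("acc"), read(operand))
        elif mnemonic == "STORE":          # operand = acc
            value[operand] = read("acc")
            stores[operand] = value[operand]
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
    return stores


# Hypothetical machine code for: sum += c & 5, with r1 as a temporary register.
block = [
    ("LOAD", "c"), ("AND", "#5"), ("STORE", "r1"),
    ("LOAD", "sum"), ("ADD", "r1"), ("STORE", "sum"),
]
trees = rebuild(block)
print(trees["sum"])
```

Because reads substitute the tree held in `acc` or `r1` instead of creating a node for the location itself, the intermediate storage nodes drop out of the tree for `sum`. Restricting the walk to one basic block keeps the location-to-tree map valid without any control-flow merging.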

  10. Table-Driven Assembly Development Tools
      [Flow diagram. Labels: New Instruction Candidates, Instr. Syn, New Instr. Select, Instr. Profile, Special Instr. (two instances), Instr. Table, Assembler, Simulator, Disassembler, Asm. code, mach. code, performance]

  11. Table-driven back-end tool automation

          # Table entry for a new 'mac' (multiply-accumulate) instruction
          @new_ins = (
            'mac' => {
              otree   => ['r0', 'nADD', 'r0', ['nMUL', 'Rn', 'addr16']],        # expression tree to match
              pattern => 'Rn addr16',                                          # assembly operands
              code    => ['00000011', '00000$Rn', '$addr16[0]', '$addr16[1]'], # encoding template
              sim     => '$R[0]+=$R[$Rn]*$memory[$addr16]',                    # simulator semantics
              cycles  => 13,
              decode  => '$Rn=$memory[$pc++] & 0x7; $addr16[0]=$memory[$pc++]; $addr16[1]=$memory[$pc++]; $addr16=$addr16[0]|($addr16[1]<<8);'
            }
          );
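
To show how such a table entry might drive both the assembler and the simulator, here is a small Python sketch that mirrors the `mac` entry above. It is not the authors' Perl tooling: the names `assemble_mac`, `decode_mac`, `sim_mac`, `NEW_INS`, and `step`, the dictionary-based CPU state, and the 8-bit wrap-around of `R[0]` are all assumptions made for the example.

```python
MAC_OPCODE = 0b00000011  # first byte of the 'code' template on the slide


def assemble_mac(rn, addr16):
    """Encode `mac Rn addr16`: opcode, 3-bit register field, address low byte, high byte."""
    return bytes([MAC_OPCODE, rn & 0x7, addr16 & 0xFF, (addr16 >> 8) & 0xFF])


def decode_mac(memory, pc):
    """Decode the operands that follow the opcode; return them with the next pc."""
    rn = memory[pc] & 0x7
    addr16 = memory[pc + 1] | (memory[pc + 2] << 8)
    return (rn, addr16), pc + 3


def sim_mac(cpu, rn, addr16):
    """R[0] += R[Rn] * memory[addr16]; masking to 8 bits is an assumption."""
    cpu["R"][0] = (cpu["R"][0] + cpu["R"][rn] * cpu["memory"][addr16]) & 0xFF


# Hypothetical instruction table keyed by opcode, shared by all tools.
NEW_INS = {
    MAC_OPCODE: {"name": "mac", "decode": decode_mac, "sim": sim_mac, "cycles": 13},
}


def step(cpu):
    """Fetch one opcode, then let the table entry decode, execute, and charge cycles."""
    entry = NEW_INS[cpu["memory"][cpu["pc"]]]
    operands, cpu["pc"] = entry["decode"](cpu["memory"], cpu["pc"] + 1)
    entry["sim"](cpu, *operands)
    cpu["cycles"] += entry["cycles"]


cpu = {"R": [1, 0, 3, 0, 0, 0, 0, 0], "memory": bytearray(65536), "pc": 0, "cycles": 0}
cpu["memory"][0:4] = assemble_mac(2, 0x1234)
cpu["memory"][0x1234] = 4
step(cpu)
print(cpu["R"][0], cpu["pc"], cpu["cycles"])  # 13 4 13
```

Keeping encoding, decoding, semantics, and cycle count in one entry means a new macro instruction only needs a new table row, which is the appeal of the table-driven approach.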

  12. Op-Code Reuse
      • Op-codes may not be fully used in a specific application
      • Remove unused instruction op-codes
      • Typical applications use far fewer than 256 op-codes
      • Cost of op-code reuse
        • Decoding logic
        • Less flexibility
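
A minimal sketch of the bookkeeping behind op-code reuse, assuming a one-byte op-code space and a list of op-codes observed in the application (for example from a disassembler or profile). The function names, the sample op-code values, and the macro names are hypothetical.

```python
def free_opcodes(used_opcodes, width_bits=8):
    """Return the op-codes the application never uses, in ascending order."""
    return sorted(set(range(1 << width_bits)) - set(used_opcodes))


def assign_macros(macro_names, used_opcodes):
    """Give each new macro instruction one of the unused op-codes."""
    free = free_opcodes(used_opcodes)
    if len(macro_names) > len(free):
        raise ValueError("not enough unused op-codes")
    return dict(zip(macro_names, free))


used = [0x00, 0x01, 0x03, 0x10, 0x11]      # made-up profile of one application
print(len(free_opcodes(used)))             # 251
print(assign_macros(["mac", "addand"], used))
```

The reassigned op-codes are only valid for this one application, which is the flexibility cost noted on the slide; the extra decoding logic is a hardware cost that this sketch does not model.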

  13. Implementation
      • Compiler front-end: SUIF
      • Code generator: SPAM-olive
        • Retargeted to RISC8
      • RTL pattern enumeration: C++
      • RISC8 assembler: Perl
      • RISC8 simulator: Perl
      • Machine level pattern enumeration: Perl
      • Macro-driven instruction implementation automation: Perl

  14. Benchmarks

  15. GSM encoder
      • Hardware/software trade-off
        • Software gain: execution speed, code size
        • Hardware cost: functional unit, decoding logic, data path configuration

  16. Conclusions
      • RTL level pattern enumeration
        • Key to automating instruction identification, code generation, assembly and simulation
        • No need to change algorithm source code
      • Hardware/software trade-off
        • Good estimation of performance gain and hardware cost at register-transfer level
      • Op-code reuse
