Implementing a Functional/Timing Partitioned Microprocessor Simulator with an FPGA

HASim

Nirav Dave*, Michael Pellauer*, Joel Emer†*, & Arvind*
*Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Lab, Cambridge, MA
†Intel Corporation, VSSAD, Hudson, MA

HASim - Why?
• Micro-architectural simulations are important
  • They give better estimates of expected outcomes
• SW simulations are slow to run
  • 100s of KIPS (thousands of instructions per second)
• HW “simulations” take a long time to design

Underlying Beliefs
• Modeling something is generally easier than designing it
  • A model need not be totally faithful to the design, only to what you need to measure
• It's easy to make modeling mistakes
  • Need to insert checks to ensure the model didn't cheat
• Appropriate partitioning improves reuse
  • Split computational aspects from timing

HASim – What?
• HASim is a partitioned hardware simulation framework
• Two partitions:
  • Functional (FP) – executes instructions
  • Timing (TP) – responsible for determining the timing of the emulated machine
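To make the FP/TP split concrete, here is a minimal software sketch, not HASim's actual (hardware) implementation. It assumes a hypothetical toy ISA of `(opcode, dest, src1, src2)` tuples, uses the instruction index as its token, and uses a made-up per-opcode latency table. The functional partition only computes results; the timing partition decides when each instruction is executed and how many target cycles it costs.

```python
from dataclasses import dataclass, field

# Hypothetical toy ISA: (opcode, dest, src1, src2); for "li", src1 is the constant.
Instr = tuple[str, int, int, int]


@dataclass
class FunctionalPartition:
    """Computes architectural results only; it does not model cycles."""
    program: list[Instr]
    regs: list[int] = field(default_factory=lambda: [0] * 8)
    next_token: int = 0

    def get_token(self) -> int:
        # Each in-flight instruction is named by a token (here simply its index).
        token = self.next_token
        self.next_token += 1
        return token

    def execute(self, token: int) -> str:
        # Execute the token's instruction and report its opcode back to the TP.
        op, rd, rs1, rs2 = self.program[token]
        if op == "li":
            self.regs[rd] = rs1
        elif op == "add":
            self.regs[rd] = self.regs[rs1] + self.regs[rs2]
        elif op == "mul":
            self.regs[rd] = self.regs[rs1] * self.regs[rs2]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        return op


@dataclass
class TimingPartition:
    """Decides when the FP does work and counts target cycles."""
    fp: FunctionalPartition
    latency: dict[str, int]  # hypothetical per-opcode latency in target cycles
    cycle: int = 0

    def run(self) -> int:
        # A trivial unpipelined machine: one instruction at a time.
        for _ in range(len(self.fp.program)):
            token = self.fp.get_token()
            op = self.fp.execute(token)
            self.cycle += self.latency[op]
        return self.cycle


prog = [("li", 1, 5, 0), ("li", 2, 7, 0), ("mul", 3, 1, 2), ("add", 4, 3, 1)]

slow_fp = FunctionalPartition(prog)
slow_cycles = TimingPartition(slow_fp, {"li": 1, "add": 1, "mul": 3}).run()

# Same FP, different timing model: only the TP changes.
fast_fp = FunctionalPartition(prog)
fast_cycles = TimingPartition(fast_fp, {"li": 1, "add": 1, "mul": 1}).run()
```

Both runs compute identical register values; only the cycle counts differ. This is the reuse argument from the "Underlying Beliefs" slide: a new timing model can be explored without touching the functional code.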

HASim: The Picture
[Figure: Timing Partition (stages: Token Gen, Fetch, Decode, Execute, Mem, LCom, GCom) connected to the Functional Partition (Memory, Bypassing Unit, RegFile)]

Functional Partition Zoom-In
[Figure: the FP's Decoder Unit and its Token Table]
• TP request to do instruction i: <Token>
• Response to TP's request: <Token, DependencyInfo>
• Info from previous stage: <Token, Inst>
• Information to next stage: <Token, DecodedInst>
• Decoder Unit also connects to the MapTable (in the Bypass Unit)
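The decode interface above can be sketched in software as follows. This is an illustration under stated assumptions, not HASim's code: the instruction format `(op, dest, srcs)`, the `MapTable` class with a naive never-reclaiming allocator, and the method names `from_prev_stage` and `request` are all hypothetical. The Token Table parks each `<Token, Inst>` until the TP requests decode; the unit then consults the map table, forwards `<Token, DecodedInst>` to the next stage, and answers the TP with `<Token, DependencyInfo>`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodedInst:
    op: str
    dest_phys: int
    src_phys: tuple[int, ...]


@dataclass(frozen=True)
class DependencyInfo:
    # What the TP needs for hazard tracking: physical registers read and written.
    src_phys: tuple[int, ...]
    dest_phys: int


class MapTable:
    """Hypothetical stand-in for the map table in the Bypass Unit."""

    def __init__(self, num_arch: int):
        self.map = list(range(num_arch))  # architectural -> physical register
        self.next_free = num_arch         # naive allocator: never reclaims

    def lookup(self, arch: int) -> int:
        return self.map[arch]

    def rename(self, arch: int) -> int:
        phys = self.next_free
        self.next_free += 1
        self.map[arch] = phys
        return phys


class DecodeUnit:
    def __init__(self, map_table: MapTable):
        self.map_table = map_table
        self.token_table: dict[int, tuple[str, int, tuple[int, ...]]] = {}
        self.outbox: list[tuple[int, DecodedInst]] = []  # to the next stage

    def from_prev_stage(self, token: int, inst: tuple[str, int, tuple[int, ...]]) -> None:
        # <Token, Inst> from fetch waits here until the TP asks for decode.
        self.token_table[token] = inst

    def request(self, token: int) -> tuple[int, DependencyInfo]:
        # TP request <Token>: decode that token's instruction.
        op, rd, srcs = self.token_table.pop(token)
        # Read source mappings before renaming the destination.
        src_phys = tuple(self.map_table.lookup(r) for r in srcs)
        dest_phys = self.map_table.rename(rd)
        self.outbox.append((token, DecodedInst(op, dest_phys, src_phys)))
        return token, DependencyInfo(src_phys, dest_phys)
```

If token 0 writes architectural r1 and token 1 then reads r1, the dependency info returned for token 1 names the physical register allocated for token 0, which is what lets the TP detect the hazard without knowing anything about the instruction's semantics.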

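Functional Partition - Execute
[Figure: the FP's Execute unit and its Token Table]
• TP request: <Token>
• Response to TP: <Token, Result Value>
• Info from previous stage: <Token, DecodedInst>
• Information to next stage: <Token, ExecedInst>
• Execute unit also connects to the RegFile (in the Bypass Unit)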
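A matching sketch of the execute stage, again hypothetical rather than HASim's implementation: a plain Python list stands in for the physical RegFile in the Bypass Unit, only two-source `add`/`sub` operations are supported, and the class and method names are invented for illustration. On a TP request the unit reads its operands, writes the result back, forwards `<Token, ExecedInst>`, and returns `<Token, Result Value>`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DecodedInst:
    op: str
    dest_phys: int
    src_phys: tuple[int, ...]


@dataclass(frozen=True)
class ExecedInst:
    decoded: DecodedInst
    result: int


class ExecuteUnit:
    # Hypothetical two-source ALU operations.
    OPS: dict[str, Callable[[int, int], int]] = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
    }

    def __init__(self, regfile: list[int]):
        self.regfile = regfile  # stand-in for the physical RegFile
        self.token_table: dict[int, DecodedInst] = {}
        self.outbox: list[tuple[int, ExecedInst]] = []  # to the next stage

    def from_prev_stage(self, token: int, decoded: DecodedInst) -> None:
        # <Token, DecodedInst> from decode waits until the TP asks for execute.
        self.token_table[token] = decoded

    def request(self, token: int) -> tuple[int, int]:
        # TP request <Token>: execute that token's instruction.
        d = self.token_table.pop(token)
        a, b = (self.regfile[p] for p in d.src_phys)
        result = self.OPS[d.op](a, b)
        self.regfile[d.dest_phys] = result  # write back to the RegFile
        self.outbox.append((token, ExecedInst(d, result)))
        return token, result
```

Keeping per-token state in a table, rather than in a fixed pipeline register, is what lets the TP ask for stages in whatever order and at whatever time its timing model dictates.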

Automated Checks
• We'd like our model to:
  • Obey causality of data usage
    • No reading values before they're created
  • Meet expected times for different stages
    • e.g., decode of an instruction takes at least 1 cycle
    • Decode should not take more than two cycles
• We want these checks to be very simple
• Let's have the FP verify these!
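The stage-timing bound from this slide can be checked with very little state. In the sketch below, which is an assumption about one way to do it and not HASim's mechanism, the FP records the emulated cycle at which a token enters a stage and, when the token leaves, raises if the elapsed time falls outside `[min_cycles, max_cycles]` (defaulting to the 1-to-2-cycle decode bounds above). `StageLatencyChecker` and `TimingViolation` are hypothetical names.

```python
class TimingViolation(Exception):
    """Raised when a stage finishes outside its expected cycle bounds."""


class StageLatencyChecker:
    def __init__(self, stage: str, min_cycles: int = 1, max_cycles: int = 2):
        self.stage = stage
        self.min_cycles = min_cycles
        self.max_cycles = max_cycles
        self.entered: dict[int, int] = {}  # token -> emulated cycle it entered

    def enter(self, token: int, cycle: int) -> None:
        self.entered[token] = cycle

    def leave(self, token: int, cycle: int) -> int:
        # Returns the elapsed cycles if within bounds, otherwise raises.
        start = self.entered.pop(token)
        elapsed = cycle - start
        if not self.min_cycles <= elapsed <= self.max_cycles:
            raise TimingViolation(
                f"{self.stage} of token {token} took {elapsed} cycles "
                f"(expected {self.min_cycles}..{self.max_cycles})"
            )
        return elapsed
```

For example, a decode that enters at cycle 10 and leaves at cycle 11 passes, while leaving in the same cycle (0 cycles) or at cycle 13 (3 cycles) raises `TimingViolation`.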

Verifying Causality
• All execution interactions go through the functional model
• Annotate all data with the emulated clock cycle on which it was created
• FP checks the timestamp whenever data is accessed
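The timestamp idea can be sketched as a register file whose entries carry the emulated cycle on which they were written. This is a hypothetical illustration (`CheckedRegFile`, `Stamped`, and `CausalityViolation` are invented names): every read supplies the current emulated cycle, and the FP raises if the value does not exist yet or was created at a later cycle. Whether a same-cycle read should be legal is a modeling choice; this sketch allows it, as if the value were bypassed.

```python
from dataclasses import dataclass
from typing import Optional


class CausalityViolation(Exception):
    """Raised when a value is read before the cycle on which it was created."""


@dataclass(frozen=True)
class Stamped:
    value: int
    created_at: int  # emulated (target) clock cycle that produced the value


class CheckedRegFile:
    def __init__(self, size: int):
        self.regs: list[Optional[Stamped]] = [None] * size

    def write(self, reg: int, value: int, cycle: int) -> None:
        self.regs[reg] = Stamped(value, cycle)

    def read(self, reg: int, cycle: int) -> int:
        entry = self.regs[reg]
        # Assumption: same-cycle reads are allowed (models a bypass path).
        if entry is None or entry.created_at > cycle:
            raise CausalityViolation(f"r{reg} read at cycle {cycle} before it was created")
        return entry.value
```

Because the FP sees every data access, a timing model that lets a consumer run too early is caught at the access itself, instead of silently producing plausible-looking but impossible timing.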

Leveraging FP Structure for Timing
• Sometimes the best way to model something is to just build it
• Use the target design's cache structure as the FP's cache structure
  • Can then just measure the number of target ticks
  • May need to record some extra information (e.g., where misses occurred) to get the appropriate timing
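As an illustration of reusing the FP's structure for timing, the sketch below gives the FP a cache with the same geometry as a target design (here a hypothetical direct-mapped cache that tracks tags only) and records where misses occurred; the TP side then charges assumed hit and miss latencies (1 and 20 target cycles, chosen arbitrarily). All names and parameters are assumptions for illustration.

```python
from typing import Optional


class DirectMappedCache:
    """Hypothetical FP-side cache model: tracks tags only, no data."""

    def __init__(self, num_sets: int, line_bytes: int):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        self.tags: list[Optional[int]] = [None] * num_sets

    def access(self, addr: int) -> bool:
        # Returns True on a hit; on a miss, installs the line (evicting any old one).
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        hit = self.tags[index] == tag
        self.tags[index] = tag
        return hit


def time_loads(addrs: list[int], cache: DirectMappedCache,
               hit_cycles: int = 1, miss_cycles: int = 20) -> tuple[int, list[int]]:
    """TP side: charge latency from the hit/miss outcome the FP reports."""
    total = 0
    misses: list[int] = []  # indices of accesses that missed
    for i, addr in enumerate(addrs):
        if cache.access(addr):
            total += hit_cycles
        else:
            total += miss_cycles
            misses.append(i)
    return total, misses
```

With 4 sets of 16-byte lines, the address sequence 0, 4, 64, 0 gives miss, hit, miss (address 64 maps to set 0 and evicts line 0), miss: 61 target cycles, with misses at positions 0, 2, and 3.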

Similar Ideas - FAST
• FAST – similar underlying beliefs
• Differences (FAST vs. HASim):
  • SW vs. HW functional partitions
  • Decoupled partitions vs. tight coupling
  • Some additional correctness checks needed
• Not clear which approach is more effective

Similar Ideas - UNUM
• UNUM – another parameterized HW framework
• Much more emphasis on HW quality and structure
• Much more work to generate
• “Believable” low-level values
• Aimed later in the design selection cycle

Current Progress
• Initial functional partition
  • Single-issue (scalar) out-of-order (OOO) design
  • Simple RISC ISA
  • Physical register file
  • Fast branch rewinds
• Simple pipeline timing partition

Future Work
• Porting a real ISA to the design
  • x86 (w/ µops)
• More complicated timing models
  • Reorder buffer designs
  • Large cache simulations

Thanks!
{ndave, pellauer, arvind}@csail.mit.edu
emer@intel.com
