This work presents HASim, a partitioned hardware simulation framework designed to improve micro-architectural simulations. By separating the functional and timing aspects, HASim allows for better estimates and faster simulations compared to traditional software methods. The functional partition executes instructions while the timing partition manages the timing of the emulated machine. With built-in automated checks ensuring causality and meeting expected timing constraints, HASim aims to facilitate more accurate and efficient microprocessor designs, ultimately streamlining the design process for complex architectures.
HASim: Implementing a Functional/Timing Partitioned Microprocessor Simulator with an FPGA • Nirav Dave*, Michael Pellauer*, Joel Emer†*, & Arvind* • *Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Lab, Cambridge, MA • †Intel Corporation, VSSAD, Hudson, MA
HASim - Why? • Micro-architectural Simulations are Important • Better estimates for expected outcomes • SW Simulations are slow to run • 100s of KIPS (thousands of instructions per second) • HW “simulations” take a long time to design
Underlying Beliefs • Modeling something is generally easier than designing it • Don't need to be totally faithful to the design to get what you need • It's easy to make modeling mistakes • Need to insert checks to assure you didn't cheat • Appropriate partitioning improves reuse • Split computational aspects from timing
HASim – What? • HASim is a partitioned hardware simulation framework • Two Partitions: • Functional (FP) – Executes instructions • Timing (TP) – Responsible for determining the timing of the emulated machine
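The split described on this slide can be sketched in software. The following is a minimal illustration only, not HASim's actual implementation (which targets an FPGA): the toy ISA, the class names, and the per-opcode latencies are all assumptions made up for the example. The functional partition computes results and knows nothing about cycles, while the timing partition decides how many emulated cycles each instruction costs.

```python
from dataclasses import dataclass, field

# Hypothetical toy ISA (not from the slides):
#   ("addi", rd, rs, imm)  -> regs[rd] = regs[rs] + imm
#   ("mul",  rd, rs1, rs2) -> regs[rd] = regs[rs1] * regs[rs2]


@dataclass
class FunctionalPartition:
    """Executes instructions; has no notion of emulated time."""
    regs: dict = field(default_factory=lambda: {r: 0 for r in range(4)})

    def execute(self, inst):
        op, rd, a, b = inst
        if op == "addi":
            self.regs[rd] = self.regs[a] + b
        elif op == "mul":
            self.regs[rd] = self.regs[a] * self.regs[b]
        else:
            raise ValueError(f"unknown opcode: {op}")


@dataclass
class TimingPartition:
    """Decides how many emulated cycles each instruction costs."""
    fp: FunctionalPartition
    # Assumed latencies, chosen only for illustration.
    latency: dict = field(default_factory=lambda: {"addi": 1, "mul": 3})
    cycle: int = 0

    def run(self, program):
        for inst in program:
            self.fp.execute(inst)                # functional behaviour
            self.cycle += self.latency[inst[0]]  # timing behaviour
        return self.cycle


program = [("addi", 1, 0, 5), ("addi", 2, 0, 7), ("mul", 3, 1, 2)]
tp = TimingPartition(FunctionalPartition())
total_cycles = tp.run(program)
```

Swapping in a different `latency` table changes the timing of the emulated machine without touching the functional code, which is the reuse argument behind the partitioning.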
HASim: The Picture • [Figure: Timing Partition alongside the Functional Partition; Functional Partition stages Token Gen, Fet, Dec, Exe, Mem, LCom, GCom; supporting units Memory, Bypassing Unit, RegFile]
Functional Partition Zoom-In • TP request to do instruction i: <Token> • Response to TP's request: <Token, DependencyInfo> • Info from previous stage: <Token, Inst> • Information to next stage: <Token, DecodedInst> • [Figure: Decoder Unit with Token Table; output to MapTable (in BypassUnit)]
Functional Partition - Execute • [Figure: Execute unit with Token Table; figure labels: <Token>, <Token, Result Value>, <Token, DecodedInst>, <Token, ExecedInst>; output to RegFile (in BypassUnit)]
Automated Checks • We'd like our model to: • Obey causality of data usage • No reading values before they're created • Meet expected times for different stages • e.g. Decode of an instruction takes at least 1 cycle • Decode should not take more than two cycles • Want these very simple checks • Let's have the FP verify these!
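The stage-time check can be illustrated with a small sketch. This is an assumption-laden example rather than HASim's own mechanism: the `StageTimer` class, its method names, and the exception type are hypothetical, and the only numbers taken from the slide are the decode bounds of at least 1 and at most 2 cycles.

```python
class TimingViolation(Exception):
    """Raised when a stage finishes outside its allowed cycle window."""


# Bounds from the decode example on the slide: at least 1, at most 2 cycles.
STAGE_BOUNDS = {"decode": (1, 2)}


class StageTimer:
    """Hypothetical checker: records when a stage starts for each token
    and verifies the elapsed emulated cycles when it finishes."""

    def __init__(self, bounds):
        self.bounds = bounds
        self.started = {}  # (token, stage) -> emulated start cycle

    def start(self, token, stage, cycle):
        self.started[(token, stage)] = cycle

    def finish(self, token, stage, cycle):
        elapsed = cycle - self.started.pop((token, stage))
        lo, hi = self.bounds[stage]
        if not lo <= elapsed <= hi:
            raise TimingViolation(
                f"token {token}: {stage} took {elapsed} cycles, "
                f"expected between {lo} and {hi}"
            )
        return elapsed


timer = StageTimer(STAGE_BOUNDS)

# A decode that takes one emulated cycle passes the check.
timer.start(token=0, stage="decode", cycle=10)
ok_elapsed = timer.finish(token=0, stage="decode", cycle=11)

# A decode that takes three emulated cycles is rejected.
timer.start(token=1, stage="decode", cycle=11)
try:
    timer.finish(token=1, stage="decode", cycle=14)
    violation_caught = False
except TimingViolation:
    violation_caught = True
```

Because the check sits with the functional partition, a timing model that asks for a result too early or too late is flagged automatically instead of silently producing a plausible-looking number.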
Verifying Causality • All execution interactions to the functional model are provided • Annotate all data with the emulated clock cycle it was created on • FP checks time on accesses of data
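A minimal sketch of the annotation idea follows, assuming a register file as the data store. The class and method names are hypothetical, and allowing a read in the same emulated cycle as the write is an assumption of this example, not something the slides specify.

```python
class CausalityError(Exception):
    """Raised when data is read before the emulated cycle it was created on."""


class TimestampedRegFile:
    """Hypothetical register file that tags each value with its creation cycle."""

    def __init__(self, num_regs):
        self.values = [0] * num_regs
        self.created_at = [0] * num_regs  # emulated cycle of the last write

    def write(self, reg, value, cycle):
        self.values[reg] = value
        self.created_at[reg] = cycle

    def read(self, reg, cycle):
        # Assumption: a read in the same cycle as the write is allowed.
        if cycle < self.created_at[reg]:
            raise CausalityError(
                f"r{reg} read at cycle {cycle} but created at cycle "
                f"{self.created_at[reg]}"
            )
        return self.values[reg]


rf = TimestampedRegFile(num_regs=4)
rf.write(reg=1, value=42, cycle=7)

value_after = rf.read(reg=1, cycle=8)  # legal: read after creation

try:
    rf.read(reg=1, cycle=5)            # illegal: read before creation
    cheated = True
except CausalityError:
    cheated = False
```

The timing partition supplies the emulated cycle on every access, so the check costs one comparison per read and needs no knowledge of the timing model itself.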
Leveraging FP structure for Timing • Sometimes the best way to model something is to just make it • Use the target design's cache structure as the FP's cache structure • Can just measure the number of target ticks • May need to record some more information (where misses occurred) to get appropriate timing
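As a rough illustration of building the structure and measuring it, the sketch below models a direct-mapped cache, logs where misses occurred, and converts hits and misses into target ticks. The cache geometry, the address trace, and the hit and miss latencies are all assumptions chosen for the example; the slides do not give the target design's parameters.

```python
class DirectMappedCache:
    """Hypothetical direct-mapped cache that records which accesses missed."""

    def __init__(self, num_lines, line_bytes):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines
        self.miss_log = []  # addresses that missed, in access order

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and log the address."""
        block = addr // self.line_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag
            self.miss_log.append(addr)
        return hit


# Assumed latencies in target ticks, for illustration only.
HIT_TICKS, MISS_TICKS = 1, 20

cache = DirectMappedCache(num_lines=4, line_bytes=16)
trace = [0, 4, 64, 0, 16]  # made-up address trace

ticks = sum(HIT_TICKS if cache.access(addr) else MISS_TICKS for addr in trace)
```

In this trace, address 64 maps to the same line as address 0 and evicts it, so the second access to address 0 misses again; the miss log keeps that information available to the timing model.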
Similar Ideas - FAST • FAST – similar underlying beliefs • Differences: • SW vs. HW functional partitions • decoupled partitions vs. tight coupling • some additional correctness checks needed • Not clear which approach is more effective
Similar Ideas - UNUM • UNUM – Another parameterized HW framework • Much more emphasis on HW quality and structure • Much more work to generate • “Believable” low-level values • Aimed later in the design selection cycle
Current Progress • Initial functional partition • Singlescalar OOO design • Simple RISC ISA • Physical Reg File • Fast branch rewinds • Simple Pipeline Timing Partition
Future Progress • Porting a real ISA to the design • x86 (w/ µops) • More complicated timing models • Reorder Buffer designs • Large Cache simulations
Thanks! {ndave, pellauer, arvind}@csail.mit.edu emer@intel.com