
The ESW Paradigm


Presentation Transcript


  1. The ESW Paradigm Manoj Franklin & Gurindar S. Sohi 05/10/2002

  2. Observations • Theoretically, there is large exploitable ILP • Nearby instructions tend to be dependent; parallelism is available further downstream • Centralized resources are bad • Minimizing communication cost is important

  3. What about others? • Dataflow model: + most general; – unconventional programming-language paradigm; – communication cost can be high • Superscalar, VLIW (sequential): + temporal locality; – large centralized hardware; – compiler too limited; – not scalable • ESW = dataflow + sequential

  4. Design Goals • Decentralized resources • Minimize wasted execution • Speculative memory address disambiguation • Realizability • Key idea: replace the large dynamic window with many small ones

  5. How it works • Basic window: a single-entry, loop-free, call-free block • Equal to, a superset of, or a subset of a basic block • Basic windows execute in parallel on multiple independent stages • Each stage is complete with branch prediction, an L1 cache, a register file, etc.
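
The following is a minimal sketch, not the paper's mechanism, of how a sequential instruction stream might be carved into basic windows and handed to independent execution stages. The instruction attributes (is_call, is_backward_branch), the stage count, and the round-robin assignment are illustrative assumptions; only the 32-instruction window limit comes from the slides.

```python
NUM_STAGES = 4          # number of independent execution stages (assumed)
MAX_WINDOW = 32         # "up to 32 inst per basic window" (from the slides)

def split_into_basic_windows(instructions):
    """Yield single-entry, loop-free, call-free blocks: a window ends at a
    call, at a backward (loop) branch, or when MAX_WINDOW is reached."""
    window = []
    for inst in instructions:
        window.append(inst)
        if inst.is_call or inst.is_backward_branch or len(window) == MAX_WINDOW:
            yield window
            window = []
    if window:
        yield window

def dispatch(instructions):
    """Hand consecutive windows to stages round-robin; the stages then fetch
    and execute in parallel while windows still commit in program order."""
    for i, window in enumerate(split_into_basic_windows(instructions)):
        stage = i % NUM_STAGES
        print(f"stage {stage}: window of {len(window)} instructions")
```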

  6. Distributed Instruction Supply • Optimization: snooping on the L2-to-L1 cache traffic
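
A rough sketch of the snooping idea, under the assumption that each stage owns a small local instruction cache that watches lines as they move from L2 toward L1 and keeps private copies of lines falling inside its assigned window; the class and address arithmetic are illustrative, not the paper's design.

```python
class SnoopingICache:
    """One per execution stage: caches instruction lines that fall inside
    the stage's currently assigned basic window (illustrative model)."""

    def __init__(self, window_start, window_end, line_size=32):
        self.window_start = window_start      # byte addresses (assumed)
        self.window_end = window_end
        self.line_size = line_size
        self.lines = {}                       # line address -> line data

    def snoop(self, line_addr, data):
        """Called for every line seen on the L2-to-L1 refill path; keep a
        private copy if the line overlaps this stage's window."""
        line_end = line_addr + self.line_size
        if line_addr < self.window_end and line_end > self.window_start:
            self.lines[line_addr] = data
```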

  7. Distributed Inter-Instruction Communication • Architecture: • distributed future file • create/use masks for register dependency checking • Observation: • register values are mostly used within the basic block that creates them • the rest are used in subsequent blocks
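
A small sketch of how create/use masks could support that dependency check, assuming 32 architectural registers and a toy instruction object with dst and srcs fields; the bit layout and function names are my own, not the paper's.

```python
def build_masks(window):
    """Return (create_mask, use_mask) bit vectors for one basic window.
    A register goes into use_mask only if it is read before the window
    itself writes it, i.e. its value must come from an earlier window."""
    create_mask = 0
    use_mask = 0
    for inst in window:                 # inst.srcs / inst.dst are assumed fields
        for src in inst.srcs:
            if not (create_mask >> src) & 1:
                use_mask |= 1 << src
        if inst.dst is not None:
            create_mask |= 1 << inst.dst
    return create_mask, use_mask

def must_wait_for(later_use_mask, earlier_create_mask):
    """Registers the later window has to receive (or wait on) from the
    earlier window; everything else is read from the local future file."""
    return later_use_mask & earlier_create_mask
```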

  8. Distributed Data-Memory System • Problem: • the address space is too large to build create/use masks • consistency must be maintained between multiple copies • Solution: the Address Resolution Buffer (ARB)

  9. ARB • Bits are cleared upon commit • Stages are restarted when a memory dependency is violated • On a load, the value is forwarded from the ARB if a matching store entry already exists • Q: What happens when the ARB is full?
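
A toy model of the ARB's bookkeeping, assuming stage-level granularity and a simple dict-backed table; intra-window ordering and the real hardware's forwarding and squash machinery are more involved, so treat this only as a sketch of the behavior listed on the slide.

```python
class ARB:
    """Toy Address Resolution Buffer: per-address load/store bits per stage,
    plus the most recent speculatively stored value per stage."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}   # addr -> {"loads": set(stages), "stores": {stage: value}}

    def _entry(self, addr):
        if addr not in self.entries:
            if len(self.entries) >= self.capacity:
                # The slide's question: a full ARB forces the issuing stage to stall
                raise RuntimeError("ARB full: stall until entries commit and clear")
            self.entries[addr] = {"loads": set(), "stores": {}}
        return self.entries[addr]

    def load(self, stage, addr, memory):
        """Record the load; forward from the nearest earlier (or same) stage's
        store if one exists, otherwise read the data cache/memory."""
        e = self._entry(addr)
        e["loads"].add(stage)
        earlier = [s for s in e["stores"] if s <= stage]
        return e["stores"][max(earlier)] if earlier else memory[addr]

    def store(self, stage, addr, value):
        """Record the store; any later stage that already loaded this address
        used a stale value, so it (and everything after it) must restart."""
        e = self._entry(addr)
        e["stores"][stage] = value
        return {s for s in e["loads"] if s > stage}

    def commit(self, stage):
        """Clear this stage's bits once its window commits in order."""
        for e in self.entries.values():
            e["loads"].discard(stage)
            e["stores"].pop(stage, None)
```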

  10. Simulation Environment • Custom simulator based on the MIPS R2000 pipeline • Up to 2 instructions fetched/decoded/issued per IE • Up to 32 instructions per basic window • 4K-word L1 cache, 64 KB L2 data cache (a 100% hit rate is assumed; is that realistic?) • 3-bit-counter branch prediction
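
For reference, the reported parameters restated as a configuration object; the field names are invented here, and the values come from the slide.

```python
from dataclasses import dataclass

@dataclass
class ESWSimConfig:
    pipeline_model: str = "MIPS R2000"
    fetch_decode_issue_per_ie: int = 2       # instructions per IE per cycle
    max_insts_per_basic_window: int = 32
    l1_cache_words: int = 4 * 1024           # "4K word L1 cache"
    l2_dcache_bytes: int = 64 * 1024         # modeled with a 100% hit rate
    branch_predictor: str = "3-bit counter"
```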

  11. Results • Optimizations: • moving instructions up • expanding the basic window (in eqntott and espresso) • Basic window <= basic block • But is a 100% cache hit rate reasonable?

  12. Discussion • Compare this to CMP? RAW? • Does the trade-off strike a balance?

  13. New Results (1) In-order execution

  14. New Results (2) Out-of-order execution
