Explore the extensive MacSim tutorials presented at ISCA-39 in 2012, providing insights into architectural studies. This resource covers a range of topics, including thread fetch policies, branch predictors, cache performance, DRAM scheduling, and interconnection studies. Learn about software and hardware prefetchers, cache sharing, TLP-aware management, and the impact of instruction fetch on GPGPU performance. The tutorials also delve into power modeling and verification for CPUs and GPUs, making it a vital asset for researchers and practitioners in computer architecture.
MacSim Architecture Studies
MacSim Tutorial (ISCA-39, 2012)
Architecture Studies Using MacSim
• Front-end: thread fetch policies, branch predictor
• Memory system: software and hardware prefetchers, cache studies (sharing, inclusion), DRAM scheduling, interconnection studies
• Misc.: power model
Prefetcher Study
• Trace generator (Pin, GPUOcelot) feeds the MacSim frontend and memory system.
• Software prefetch instructions: PTX prefetch, prefetchu; x86 prefetcht0, prefetcht1, prefetchnta
• Hardware prefetch requests: hardware prefetcher (stream, stride, GHB, …)
• Many-thread Aware Prefetching Mechanisms [Lee et al., MICRO-43, 2010]
• When Prefetching Works, When It Doesn't, and Why [Lee et al., ACM TACO, 2012]
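To make the hardware-prefetcher side concrete, below is a minimal per-PC stride prefetcher sketch in C++. It is illustrative only, not MacSim's implementation; the class name StridePrefetcher, the Access() interface, and the prefetch degree are assumptions made for the example.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Minimal per-PC stride prefetcher sketch (illustrative, not MacSim's code).
// For each load PC it remembers the last address and last stride; when the
// same stride is observed twice in a row, it issues prefetches for the next
// few strided addresses.
class StridePrefetcher {
 public:
  explicit StridePrefetcher(int degree = 2) : degree_(degree) {}

  // Called on every demand access; returns candidate prefetch addresses.
  std::vector<uint64_t> Access(uint64_t pc, uint64_t addr) {
    std::vector<uint64_t> prefetches;
    Entry& e = table_[pc];
    int64_t stride =
        static_cast<int64_t>(addr) - static_cast<int64_t>(e.last_addr);
    if (e.valid && stride != 0 && stride == e.last_stride) {
      for (int i = 1; i <= degree_; ++i)
        prefetches.push_back(addr + i * stride);
    }
    e.last_stride = stride;
    e.last_addr = addr;
    e.valid = true;
    return prefetches;
  }

 private:
  struct Entry {
    uint64_t last_addr = 0;
    int64_t last_stride = 0;
    bool valid = false;
  };
  int degree_;                                // prefetch distance per trigger
  std::unordered_map<uint64_t, Entry> table_; // indexed by load PC
};
```

A stream or GHB prefetcher would replace the per-PC table with, respectively, per-region stream tracking or a global history buffer, but the trigger-and-issue flow stays the same.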
Cache and NoC Studies
• Baseline organizations (figure omitted): per-core private caches vs. a shared cache, each connected through the on-chip interconnection.
• Cache studies: sharing, inclusion property
• On-chip interconnection studies
• TLP-Aware Cache Management Policy [Lee and Kim, HPCA-18, 2012]
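One concrete aspect of the inclusion-property studies is what happens when an inclusive shared last-level cache (LLC) evicts a block: every private cache holding that block must be back-invalidated. The C++ sketch below shows only that mechanism; the types PrivateCache and InclusiveLLC and their methods are hypothetical names for the example, not MacSim's classes.

```cpp
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

// Sketch of inclusion enforcement: the private caches must always hold a
// subset of the shared LLC, so an LLC eviction triggers back-invalidation.
struct PrivateCache {
  std::unordered_set<uint64_t> blocks;  // block addresses currently cached
  void Invalidate(uint64_t block) { blocks.erase(block); }
};

class InclusiveLLC {
 public:
  explicit InclusiveLLC(std::vector<PrivateCache*> privates)
      : privates_(std::move(privates)) {}

  void Insert(uint64_t block) { blocks_.insert(block); }

  // Evicting a victim block from an inclusive LLC forces back-invalidation
  // in every private cache; a non-inclusive LLC would skip this loop.
  void Evict(uint64_t victim_block) {
    blocks_.erase(victim_block);
    for (PrivateCache* pc : privates_)
      pc->Invalidate(victim_block);
  }

 private:
  std::unordered_set<uint64_t> blocks_;
  std::vector<PrivateCache*> privates_;
};
```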
Heterogeneity-Aware NoC
• Heterogeneous link configuration
• Ring network connecting CPU cores, GPU cores, memory controllers (MC), and L3 slices; different topologies and node placements are explored (figure omitted).
• On-chip Interconnection for CPU-GPU Heterogeneous Architecture [Lee et al., under review]
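A heterogeneous ring can be modeled by giving each link its own latency and width, so segments near GPU nodes can be provisioned differently from CPU segments. The C++ sketch below computes minimal-hop latency on such a ring; the Link and RingNetwork names and the per-link parameters are assumptions for illustration, not MacSim's NoC model.

```cpp
#include <utility>
#include <vector>

// Sketch of a heterogeneous ring: each stop (CPU core, GPU core, memory
// controller, L3 slice) sits on a ring, and each link carries its own
// latency/width so links can be provisioned per node type.
struct Link {
  int latency_cycles;  // per-hop latency of this link
  int width_bytes;     // per-cycle bandwidth of this link
};

class RingNetwork {
 public:
  // links[i] connects node i to node (i + 1) % n.
  explicit RingNetwork(std::vector<Link> links) : links_(std::move(links)) {}

  // Latency between two ring stops, taking the shorter direction.
  int Latency(int src, int dst) const {
    int n = static_cast<int>(links_.size());
    int cw = (dst - src + n) % n;         // clockwise hop count
    bool clockwise = cw <= n - cw;
    int hops = clockwise ? cw : n - cw;
    int total = 0;
    int node = src;
    for (int i = 0; i < hops; ++i) {
      int link = clockwise ? node : (node - 1 + n) % n;
      total += links_[link].latency_cycles;
      node = clockwise ? (node + 1) % n : (node - 1 + n) % n;
    }
    return total;
  }

 private:
  std::vector<Link> links_;
};
```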
Instruction Fetch and DRAM Scheduling
• Trace generator (GPUOcelot) drives the frontend, execution, and DRAM models.
• Frontend fetch policies: RR, ICOUNT, FAIR, LRF, …
• DRAM scheduling policies: FCFS, FR-FCFS, FAIR, …
• Effect of Instruction Fetch and Memory Scheduling on GPU Performance [Lakshminarayana and Kim, LCA-GPGPU, 2010]
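As a concrete illustration of two of the listed fetch policies, the C++ sketch below selects the next thread/warp to fetch from using either round-robin (RR) or ICOUNT (fetch from the thread with the fewest in-flight instructions). The FetchArbiter class and its methods are invented for the example; in MacSim these policies are selected through configuration rather than this API.

```cpp
#include <cstddef>
#include <vector>

// Sketch of RR and ICOUNT fetch arbitration across hardware threads.
class FetchArbiter {
 public:
  explicit FetchArbiter(std::size_t num_threads)
      : inflight_(num_threads, 0), last_(0) {}

  void OnIssue(std::size_t tid)  { ++inflight_[tid]; }   // instruction enters pipeline
  void OnRetire(std::size_t tid) { --inflight_[tid]; }   // instruction leaves pipeline

  // RR: rotate through threads regardless of their backlog.
  std::size_t PickRoundRobin() {
    last_ = (last_ + 1) % inflight_.size();
    return last_;
  }

  // ICOUNT: favor the thread with the fewest in-flight instructions, which
  // tends to keep pipeline occupancy balanced across threads.
  std::size_t PickIcount() const {
    std::size_t best = 0;
    for (std::size_t t = 1; t < inflight_.size(); ++t)
      if (inflight_[t] < inflight_[best]) best = t;
    return best;
  }

 private:
  std::vector<int> inflight_;  // in-flight instruction count per thread
  std::size_t last_;           // last thread fetched by RR
};
```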
DRAM Scheduling in GPGPUs
• The DRAM controller keeps per-core request queues (W0–W3 for Core-0 and Core-1) in front of each DRAM bank, holding row-hit (RH) and row-miss (RM) requests (figure omitted).
• Potential of the requests from Core-0 = |W0|^α + |W1|^α + |W2|^α + |W3|^α, with α < 1.
• Reduction in potential if a row hit from a queue of length L is serviced next: L^α − (L − 1)^α
• Reduction in potential if a row miss from a queue of length L is serviced next: L^α − (L − 1/m)^α, where m = cost of servicing a row miss / cost of servicing a row hit
• Since Tolerance(Core-0) < Tolerance(Core-1), Core-0 is selected; servicing a row hit from W1 (of Core-0) yields the greatest reduction in potential, so row hits from W1 are serviced next.
• DRAM Scheduling Policy for GPGPU Architectures Based on a Potential Function [Lakshminarayana et al., IEEE CAL, 2011]
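The C++ sketch below evaluates the two reduction formulas above and picks the queue of the selected core whose service drops the potential the most. It is a minimal sketch of the idea under the stated definitions of α and m; the QueueState struct and function names are assumptions for the example, and the full policy is described in the IEEE CAL 2011 paper.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-queue state at the DRAM controller for one core.
struct QueueState {
  std::size_t length;    // number of pending requests |W_i|
  bool head_is_row_hit;  // does the oldest request hit the open row?
};

// Reduction in potential from servicing the head of one queue.
// alpha < 1; m = (cost of servicing a row miss) / (cost of servicing a row hit).
double PotentialReduction(const QueueState& q, double alpha, double m) {
  double L = static_cast<double>(q.length);
  if (q.head_is_row_hit)
    return std::pow(L, alpha) - std::pow(L - 1.0, alpha);      // L^a - (L-1)^a
  return std::pow(L, alpha) - std::pow(L - 1.0 / m, alpha);     // L^a - (L-1/m)^a
}

// Pick the queue whose service gives the largest drop in potential.
std::size_t PickQueue(const std::vector<QueueState>& queues,
                      double alpha, double m) {
  std::size_t best = 0;
  double best_drop = -1.0;
  for (std::size_t i = 0; i < queues.size(); ++i) {
    if (queues[i].length == 0) continue;   // skip empty queues
    double drop = PotentialReduction(queues[i], alpha, m);
    if (drop > best_drop) { best_drop = drop; best = i; }
  }
  return best;
}
```

Because α < 1, the potential function is concave, so draining requests from longer queues (and preferring cheap row hits) gives the largest per-service reduction.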
Power Research & Validation
• Verifying the simulator against a GTX 580
• Modeling x86 CPU power
• Modeling GPU power
• Still ongoing research
MacSim's Roadmap (2012–2013)
• OpenGL programs
• ARM architecture
• Mobile platform
• Power/energy model