Efficient Metadata Management for Irregular Data Prefetching
Efficient Metadata Management for Irregular Data Prefetching. Hao Wu, Krishnendra Nathella, Dam Sunwoo, Akanksha Jain, Calvin Lin. Regular Prefetching. Some programs access memory sequentially, e.g. an MPEG player. Regular prefetchers are effective and widely used, e.g. the Best-Offset prefetcher.
Presentation Transcript
Efficient Metadata Management for Irregular Data Prefetching • Hao Wu, Krishnendra Nathella, Dam Sunwoo, Akanksha Jain, Calvin Lin
Regular Prefetching • Some programs access memory sequentially, e.g. an MPEG player • Regular prefetchers are effective and widely used, e.g. the Best-Offset prefetcher [Figure: memory blocks labeled A to G]
The Problem: Irregular Accesses • Common in many programs • ~30% performance opportunity for irregular SPEC2006 benchmarks [Figure: blocks D, C, E, A, X, B, Y]
Temporal Prefetchers • Memorize correlations • Replay memorized accesses [Figure: blocks D, C, E, A, X, B, Y]
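As a rough illustration of "memorize correlations, replay memorized accesses", the sketch below records which address followed each address last time and predicts that successor when the address recurs. The class name `PairwiseTemporalPrefetcher`, the single-successor table, and the use of the slide's block labels as an access order are all assumptions made for illustration; this is not the design of any prefetcher named in the talk.

```python
class PairwiseTemporalPrefetcher:
    """Toy temporal prefetcher: remembers the last observed successor of each address."""

    def __init__(self):
        self.successor = {}   # address -> address that followed it most recently
        self.last = None      # previously accessed address

    def access(self, addr):
        """Record the access and return a prefetch candidate (or None)."""
        if self.last is not None:
            self.successor[self.last] = addr   # memorize the correlation last -> addr
        self.last = addr
        return self.successor.get(addr)        # replay what followed addr before


pf = PairwiseTemporalPrefetcher()
stream = ["D", "C", "E", "A", "X", "B", "Y"]   # assumed access order

first_pass = [pf.access(a) for a in stream]    # nothing memorized yet
second_pass = [pf.access(a) for a in stream]   # successors are replayed
```

On the first pass every lookup misses; on the second pass each access returns the block that followed it before. Even this toy table needs one entry per distinct address, which hints at why metadata for real workloads grows large.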
Temporal Prefetchers • High metadata overhead (10 to 20 MB) • Too large to fit on-chip • Metadata stored off-chip • Problematic! [Figure: cache and DRAM; metadata traffic drawn alongside demand accesses; metadata held in DRAM]
Irregular Stream Buffer (ISB) [MICRO’13] • Introduced an on-chip metadata cache • Metadata cache synchronized with TLB [Figure: on-chip metadata cache next to the cache; metadata in DRAM; annotation "~4× overhead"]
Our Solution: Managed ISB (MISB) • A new metadata management scheme • Decouples metadata management from TLB • Prefetches metadata [Figure: on-chip metadata cache with a metadata prefetcher; demand accesses and metadata in DRAM]
Background: ISB • Assign a structural address to each access in a stream • Convert irregular access streams into sequential streams [Figure: blocks D, C, E, A, X, B, Y and their metadata]
Background: ISB • Prefetch the next address in the structural address space [Figure: blocks D, C, E, A, X, B, Y and their metadata]
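The structural-address idea can be sketched as two maps: physical to structural, and structural back to physical. The code below is a minimal single-stream sketch under stated assumptions. `StructuralMapper`, its methods, and the starting structural address 68 are hypothetical; 68 is chosen only so that block A lands on 71, the value used in the later MISB Operation slides. The real ISB also handles multiple streams and reassignment, which are omitted here.

```python
class StructuralMapper:
    """Toy ISB-style mapping: consecutive accesses get consecutive structural addresses."""

    def __init__(self, base=68):
        self.ps = {}            # physical -> structural
        self.sp = {}            # structural -> physical
        self.next_free = base   # next unassigned structural address (assumed start)

    def train(self, addr):
        """Assign a structural address the first time a physical address is seen."""
        if addr not in self.ps:
            self.ps[addr] = self.next_free
            self.sp[self.next_free] = addr
            self.next_free += 1

    def predict(self, addr, degree=1):
        """Return up to `degree` physical addresses that follow addr structurally."""
        s = self.ps.get(addr)
        if s is None:
            return []
        return [self.sp[s + i] for i in range(1, degree + 1) if s + i in self.sp]


mapper = StructuralMapper()
for block in ["D", "C", "E", "A", "X", "B", "Y"]:   # assumed access order
    mapper.train(block)

next_after_a = mapper.predict("A", degree=2)
```

Once the irregular stream is laid out in structural space, prediction reduces to "structural address plus one", which is the same operation a sequential prefetcher performs.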
Background: ISB [Figure: on-chip metadata next to the cache; off-chip metadata in DRAM; demand accesses and metadata transfers between cache and DRAM]
Background: ISB • ISB’s metadata cache is synchronized with the TLB [Figure: TLB shown with the on-chip metadata; off-chip metadata in DRAM]
Deficiencies of ISB • On-Chip Metadata Size Required = TLB Size × Cache Lines Per Page • Metadata is managed at a coarse granularity: ~90% of metadata traffic is useless due to lack of spatial locality • Metadata size is proportional to page size: ISB does not scale to large pages • Metadata size is proportional to TLB size: ISB does not work for two-level TLBs [Figure: TLB, on-chip metadata, and demand accesses]
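To make the sizing formula concrete, the sketch below evaluates "TLB size × cache lines per page" for a few configurations. The function name `isb_onchip_entries` and every parameter value (64-entry and 1024-entry TLBs, 4 KiB and 2 MiB pages, 64-byte lines) are assumptions chosen for illustration; none of them come from the talk.

```python
def isb_onchip_entries(tlb_entries, page_bytes, line_bytes=64):
    """On-chip metadata entries = TLB size * cache lines per page."""
    lines_per_page = page_bytes // line_bytes
    return tlb_entries * lines_per_page


KIB = 1024
MIB = 1024 * KIB

# Hypothetical configurations, for illustration only.
small_pages = isb_onchip_entries(tlb_entries=64, page_bytes=4 * KIB)      # 64 * 64
large_pages = isb_onchip_entries(tlb_entries=64, page_bytes=2 * MIB)      # 64 * 32768
second_level = isb_onchip_entries(tlb_entries=1024, page_bytes=4 * KIB)   # 1024 * 64
```

Under these assumed numbers the requirement grows from 4,096 entries to over two million when the page size goes from 4 KiB to 2 MiB, and to 65,536 entries when tracking a 1024-entry TLB. That is the scaling problem the bullets describe.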
Components of MISB • Manage metadata at a fine granularity • Prefetch metadata • Filter out unnecessary accesses
MISB Operation [Figure: a request for A's metadata, "A=?", shown at the on-chip metadata cache]
MISB Operation [Figure: the entry "A=71" shown next to the off-chip metadata in DRAM]
MISB Operation [Figure: the entry "A=71" shown in the on-chip metadata cache]
MISB Operation [Figure: requests "72=?, 73=?" shown leaving the chip, with the metadata prefetcher in the diagram]
MISB Operation [Figure: the entries "72=X, 73=B" shown next to the off-chip metadata in DRAM]
MISB Operation [Figure: the entries "72=X, 73=B" shown in the on-chip metadata cache]
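The slide sequence (fetch A=71, then fetch 72 and 73 ahead of use) can be sketched as a small cache of individual metadata entries plus a prefetch step. This is a minimal sketch, not MISB's actual organization: `FineGrainedMetadataCache`, the LRU policy, the capacity, the prefetch degree, and the `("ps", …)` / `("sp", …)` key scheme are all hypothetical.

```python
from collections import OrderedDict


class FineGrainedMetadataCache:
    """Toy on-chip cache holding individual metadata entries, with metadata prefetch."""

    def __init__(self, off_chip, capacity=8, degree=2):
        self.off_chip = off_chip        # dict standing in for metadata in DRAM
        self.on_chip = OrderedDict()    # LRU order: oldest entry first
        self.capacity = capacity
        self.degree = degree
        self.off_chip_reads = 0         # counts metadata traffic

    def _get(self, key):
        """Return one entry, reading off-chip only on an on-chip miss."""
        if key in self.on_chip:
            self.on_chip.move_to_end(key)
            return self.on_chip[key]
        self.off_chip_reads += 1
        value = self.off_chip.get(key)
        if value is not None:
            self.on_chip[key] = value
            if len(self.on_chip) > self.capacity:
                self.on_chip.popitem(last=False)   # evict least recently used entry
        return value

    def on_access(self, phys):
        """Look up phys -> structural, then fetch the next structural entries."""
        s = self._get(("ps", phys))
        if s is None:
            return []
        nxt = [self._get(("sp", s + i)) for i in range(1, self.degree + 1)]
        return [p for p in nxt if p is not None]


dram_metadata = {("ps", "A"): 71, ("sp", 72): "X", ("sp", 73): "B"}
cache = FineGrainedMetadataCache(dram_metadata)

candidates = cache.on_access("A")       # three off-chip reads: A, 72, 73
reads_after_first = cache.off_chip_reads
cache.on_access("A")                    # all three entries now hit on chip
reads_after_second = cache.off_chip_reads
```

Because entries are cached one at a time rather than a page at a time, only the entries that were actually requested occupy on-chip space.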
MISB Operation [Figure: a request "M=?" for an address with no metadata goes off-chip; annotation "Useless Traffic!"]
MISB Operation [Figure: a Bloom filter is added next to the on-chip metadata; the request "M=?" is checked against it]
MISB Operation [Figure: the Bloom filter answers "M=×", i.e. no metadata exists for M]
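A Bloom filter answers "definitely not present" or "possibly present", which is enough to skip off-chip lookups for addresses that have no metadata. The sketch below is a generic Bloom filter, not MISB's hardware design: the class, the bit-array size, the number of hashes, and the use of SHA-256 as the hash are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for simplicity

    def _positions(self, key):
        """Derive num_hashes bit positions from salted SHA-256 digests of the key."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means the key was never added; True means it may have been."""
        return all(self.bits[pos] for pos in self._positions(key))


has_metadata = BloomFilter()
for block in ["A", "X", "B"]:        # addresses with off-chip metadata (assumed)
    has_metadata.add(block)

# Go off-chip only when the filter says metadata may exist.
fetch_a = has_metadata.might_contain("A")
```

A lookup for an address that was never added usually returns False, so the off-chip request is skipped. A false positive only costs one unnecessary read and never causes a missed entry.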
Evaluation Methodology • Industrial simulator: ARMv8 AArch64, OoO core, 2-level TLB, bandwidth 32 GB/s • Multicore (ChampSim): similar trends to the industrial simulator • SPEC2006 (irregular subset) • CloudSuite
Evaluated Prefetchers • STMS & Domino: global correlation • MISB: PC localization • ISB: PC localization
Global vs. PC-Localization • Example loop: while (!end) { read tree->next; if (condition) read linked_list->next; } • Global stream: F B a1 A a2 D C E a3 … • PC-localized streams: F B A D C E … and a1 a2 a3 … • PC-localization: segregate the global stream by the load instruction’s PC • PC-localized streams are more predictable! [Figure: blocks F, a1, a2, a3, B, G, A, C, D, E]
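Segregating the global stream by load PC amounts to grouping accesses by the instruction that issued them. The sketch below replays the stream from the slide. The function name `pc_localize`, the two PC values, and the assignment of uppercase blocks to one load and a1, a2, a3 to the other are assumptions for illustration.

```python
from collections import defaultdict

TREE_LOAD_PC = 0x400   # hypothetical PC of the first load in the loop
LIST_LOAD_PC = 0x408   # hypothetical PC of the conditional load


def pc_localize(accesses):
    """Split a global stream of (pc, address) pairs into one stream per load PC."""
    streams = defaultdict(list)
    for pc, addr in accesses:
        streams[pc].append(addr)
    return dict(streams)


global_stream = [
    (TREE_LOAD_PC, "F"), (TREE_LOAD_PC, "B"), (LIST_LOAD_PC, "a1"),
    (TREE_LOAD_PC, "A"), (LIST_LOAD_PC, "a2"), (TREE_LOAD_PC, "D"),
    (TREE_LOAD_PC, "C"), (TREE_LOAD_PC, "E"), (LIST_LOAD_PC, "a3"),
]

localized = pc_localize(global_stream)
```

In the global stream the successor of B depends on whether the conditional load fired. In the per-PC streams each sequence repeats independently of the other, which is why the slide calls them more predictable.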
Evaluated Prefetchers • Idealized STMS & Domino: global correlation; metadata not cacheable • MISB: PC localization; metadata cacheable; prefetches metadata • ISB: PC localization; metadata cacheable; syncs metadata with TLB
Traffic Overhead [Figure: chart of traffic overhead; labeled values 1316% and 70%]
Conclusions • MISB manages metadata effectively: it uses fine-grained metadata caching and introduces a metadata prefetcher • Empirical results: 70% traffic overhead vs. 342% for STMS; 23% speedup vs. 10% for idealized STMS • MISB makes temporal prefetching practical
Thank you!