
Compressed Tag Architecture for Low-Power Embedded Cache Systems


Presentation Transcript


  1. Compressed Tag Architecture for Low-Power Embedded Cache Systems • Jong Wook Kwak and Young Tae Jeon • Journal of Systems Architecture, Volume 56, Issue 9, pp. 419-428, Sep. 2010 • Presenter: Chun-Hung Lai

  2. Abstract • Processors in embedded systems mostly employ cache architectures in order to alleviate the access latency gap between processors and memory systems. Caches in embedded systems usually occupy a major fraction of the implemented chip area. The power dissipation of the cache system thus constitutes a significant fraction of the power dissipated by the entire processor in embedded systems. • In this paper, we propose the compressed tag architecture to reduce the power dissipation of the tag store in cache systems. We introduce a new tag-matching mechanism by using a locality buffer and a tag compression technique. The main power reduction feature of our proposal is the use of small tag space matching instead of full tag matching, with modest additional hardware costs. The simulation results show that the proposed model provides a power and energy-delay product reduction of up to 27.8% and 26.5%, respectively, while still providing a comparable level of system performance to regular cache systems.

  3. What’s the Problem • Cache power dissipation constitutes a major fraction of the embedded processor's total power • The tag bits require a significant fraction of the cache area, yet conventional tag bits are unnecessarily large • Goal: reduce power consumption in the tag store of the cache memory • Proposal: a "Compressed Tag Architecture" cache that uses partial tag bits instead of full tag matching • Figure (Number of Tag Bits vs. Cache Hit Ratio): comparing only 5~6 tag bits provides the same hit ratio as full tag comparison, because locality confines a program's data set to a small fraction of the address space, so full tags such as 0110011 can be reduced to partial tags such as XXXX011

  4. Related Works • Common theme: manage partial tag bit (upper address bits) information • Techniques that focus on reducing HW cost: Partial tag resolution [11] (reduce HW cost for a data value predictor), Partial tag generation [7] (fewer tag bits to uniquely identify each instruction), and Partial match address compression [13] (address compression for the on-chip address bus: the upper bits of recently occurring addresses are saved) • Techniques that target an energy-efficient cache architecture: Partial tag comparison [9] (a small tag enables only the data array of the matched way) and Tag Overflow Buffering (TOB) [12] (the MSB tag bits are moved from the cache into an external register) • Their problems: [9] suffers from false hits, and [12] is fragile when locality changes frequently • This paper: a Compressed Tag architecture for a low-power cache that analyzes and solves the locality change problem and solves the false hit problem

  5. Preface • Address decomposition for the compressed tag architecture: the tag is split into TagH and TagL fields, in addition to the usual index and offset bits • Two new structures are introduced: the Locality Buffer (LoB) and the Locality Compressed Bit (LCB) • TagH plays the role of locality detection: if a program exhibits locality, the TagH bits of the address are the same for successive requests, so TagH is saved in a separate register file, the LoB • The LCB bits stored with each cache line are the index bits of the LoB • Only the TagL field is checked on a tag comparison; if it hits, the LCB is used to find the matching TagH
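
  A minimal sketch in C of this address decomposition, assuming a 32-bit address, a 4 KB direct-mapped cache with 32-byte lines (5 offset bits, 7 index bits, 20 tag bits in total) and a 5-bit TagL; the field widths and names are illustrative assumptions, not values from the paper.

      #include <stdio.h>
      #include <stdint.h>

      #define OFFSET_BITS 5                               /* 32-byte cache line   */
      #define INDEX_BITS  7                               /* 128 sets -> 4 KB     */
      #define TAGL_BITS   5                               /* small tag in cache   */
      #define TAG_BITS    (32 - INDEX_BITS - OFFSET_BITS) /* 20 tag bits in total */
      #define TAGH_BITS   (TAG_BITS - TAGL_BITS)          /* 15 bits, kept in LoB */

      typedef struct {
          uint32_t offset;  /* byte within the cache line        */
          uint32_t index;   /* selects the cache set             */
          uint32_t tagl;    /* compared on every access          */
          uint32_t tagh;    /* held in the Locality Buffer (LoB) */
      } decomposed_addr;

      static decomposed_addr decompose(uint32_t addr)
      {
          decomposed_addr d;
          d.offset = addr & ((1u << OFFSET_BITS) - 1);
          d.index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
          d.tagl   = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1u << TAGL_BITS) - 1);
          d.tagh   = addr >> (OFFSET_BITS + INDEX_BITS + TAGL_BITS);
          return d;
      }

      int main(void)
      {
          uint32_t addr = 0x00403A64u;  /* arbitrary example address */
          decomposed_addr d = decompose(addr);
          printf("addr=0x%08X -> tagh=0x%X tagl=0x%X index=%u offset=%u\n",
                 addr, d.tagh, d.tagl, d.index, d.offset);
          return 0;
      }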

  6. The Compressed Tag Architecture • The shaded areas of the figure indicate the required cache modifications: only TagL is stored and checked in the tag array, TagH is saved in the Locality Buffer, and each cache line carries LCB (Locality Compressed Bit) bits • Lookup flow: (1) the index bits select a cache set; (2) a TagL miss means a cache miss; (3) on a TagL hit, the LoB is accessed through the line's LCB bits; (4) if the selected LoB entry matches TagH, it is a final hit; (5) on a LoB miss, the other LoB entries are checked while the cache line is filled, and if a match occurs, the corresponding LCB is changed to indicate the correct LoB entry
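
  A minimal sketch in C of the lookup flow above, assuming a direct-mapped cache; the structure layouts, field widths and the function name compressed_tag_hit are illustrative assumptions, and only the hit path (steps 1-4) is modelled.

      #include <stdbool.h>
      #include <stdint.h>

      #define NUM_SETS 128   /* 4 KB direct-mapped cache with 32-byte lines */
      #define NUM_LOB  4     /* 4-entry Locality Buffer -> 2-bit LCB        */

      typedef struct {
          bool     valid;
          uint32_t tagl;     /* small tag kept in the tag array             */
          uint8_t  lcb;      /* Locality Compressed Bit: index into the LoB */
      } cache_line;

      typedef struct {
          bool     valid;
          uint32_t tagh;     /* high-order tag bits                         */
          uint32_t hits;     /* hit counter, used later for LoB replacement */
      } lob_entry;

      static cache_line cache[NUM_SETS];
      static lob_entry  lob[NUM_LOB];

      /* Hit path of this slide: (1) index the set, (2)/(3) compare only TagL,
       * (4) check TagH through the LCB.  Step 5 (re-checking the other LoB
       * entries during a line fill) is not modelled here. */
      bool compressed_tag_hit(uint32_t index, uint32_t tagl, uint32_t tagh)
      {
          cache_line *line = &cache[index];

          if (!line->valid || line->tagl != tagl)
              return false;                   /* TagL miss -> cache miss    */

          lob_entry *e = &lob[line->lcb];     /* LoB accessed through LCB   */
          if (e->valid && e->tagh == tagh) {
              e->hits++;                      /* counters count LoB matches */
              return true;                    /* LoB hit -> final hit       */
          }
          return false;                       /* LoB miss: handled at fill  */
      }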

  7. If There Are Still Misses in All LoB Entries • Goal: support these locality changes • Add a Locality Miss Buffer (LoMB) that saves localities not included in the LoB (providing a second chance); on a LoB miss, all LoMB entries are checked • Add hit counters to each entry of the LoB and LoMB, incremented whenever a match occurs • Replacement mechanism for the locality buffer: (1) if a LoMB entry hits, its hit counter is incremented; if the counter exceeds the threshold, the LoB entry with the smallest hit counter is replaced, otherwise the block is not allocated in the LoB or the cache (this prevents short locality changes from polluting them); (2) if all LoMB entries also miss, the new locality is placed into the LoMB
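
  A minimal C rendering of the replacement rule sketched on this slide, assuming a 4-entry LoB, a 2-entry LoMB and a threshold of one (the configuration chosen later in the talk) and the ">" comparison used here; the entry layout and names are illustrative.

      #include <stdbool.h>
      #include <stdint.h>

      #define NUM_LOB   4
      #define NUM_LOMB  2
      #define THRESHOLD 1

      typedef struct {
          bool     valid;
          uint32_t tagh;   /* high-order tag bits saved for this locality */
          uint32_t hits;   /* hit counter                                 */
      } loc_entry;

      static loc_entry lob[NUM_LOB];
      static loc_entry lomb[NUM_LOMB];

      /* Called when the request missed in every LoB entry but matched LoMB
       * entry i.  Returns true if the locality was promoted into the LoB (so
       * the fetched line may be cached), false if the data bypasses the cache. */
      bool lomb_hit(int i)
      {
          lomb[i].hits++;                           /* LoMB hit counter ++     */

          if (lomb[i].hits > THRESHOLD) {
              int victim = 0;                       /* LoB entry with the      */
              for (int j = 1; j < NUM_LOB; j++)     /* smallest hit counter    */
                  if (lob[j].hits < lob[victim].hits)
                      victim = j;
              lob[victim] = lomb[i];                /* promote this locality   */
              lomb[i].valid = false;
              return true;                          /* allocate in LoB + cache */
          }
          return false;  /* short locality change: do not allocate in LoB/cache */
      }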

  8. Example of the Compressed Tag Cache Operation - 1 • Scenario: TagL misses but the locality buffer hits • (1) TagL miss: this means a cache miss, so the block is fetched from memory • (2) The LoB is accessed by the LCB: LoB hit, so its hit counter is incremented • (3) The block is cached in the L1 cache, and the LCB bits of the hit LoB entry are written into the new cache line • Note that a LoB hit includes both a hit in the entry indexed by the LCB and a hit in any other LoB entry
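
  A minimal sketch of this path in C, reusing the illustrative structures from the earlier sketches: the fetched block is cached and its LCB is set to the index of the LoB entry that matched.

      #include <stdbool.h>
      #include <stdint.h>

      #define NUM_LOB 4

      typedef struct {
          bool     valid;
          uint32_t tagh;
          uint32_t hits;
      } lob_entry;

      typedef struct {
          bool     valid;
          uint32_t tagl;   /* small tag stored in the cache           */
          uint8_t  lcb;    /* which LoB entry holds this line's TagH  */
      } cache_line;

      static lob_entry lob[NUM_LOB];

      /* Fill the victim line after a TagL miss whose TagH matched LoB entry i:
       * the fetched block is cached normally and its LCB points at that entry. */
      void fill_on_lob_hit(cache_line *victim, uint32_t tagl, int i)
      {
          lob[i].hits++;              /* LoB hit counter ++               */
          victim->valid = true;       /* cache the fetched block in L1    */
          victim->tagl  = tagl;       /* only the small tag is stored     */
          victim->lcb   = (uint8_t)i; /* LCB of the new line := hit entry */
      }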

  9. Example of the Compressed Tag Cache Operation - 2 • Scenario: TagL misses, the locality buffer misses, and the locality miss buffer hits • (1) TagL miss: a cache miss, so the block is fetched from memory • (2) The LoB is accessed via the LCB: LoB miss • (3) All LoMB entries are checked: LoMB hit, so its hit counter is incremented • (4) If the LoMB hit counter >= threshold, a locality buffer replacement is performed (the LoMB entry replaces a LoB entry) and the block is cached in the L1 cache with its LCB bits updated accordingly; if the counter is below the threshold, the data is fed to the CPU without caching

  10. Example of the Compressed Tag Cache Operation - 3 • Scenario: TagL misses, the locality buffer misses, and the locality miss buffer misses • (1) TagL miss: a cache miss, so the block is fetched from memory • (2) The LoB is accessed via the LCB: LoB miss • (3) All LoMB entries are checked: LoMB miss • (4) The new locality is placed into the LoMB: if an entry is available it is inserted, otherwise a candidate entry is selected and replaced • (5) The data is fed to the CPU without caching
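
  A minimal sketch of this path in C; the slide does not say how the replacement candidate is selected, so choosing the LoMB entry with the smallest hit counter is an assumption, and the names are illustrative.

      #include <stdbool.h>
      #include <stdint.h>

      #define NUM_LOMB 2

      typedef struct {
          bool     valid;
          uint32_t tagh;
          uint32_t hits;
      } lomb_entry;

      static lomb_entry lomb[NUM_LOMB];

      /* Record a brand-new TagH in the LoMB.  The fetched data is fed to the
       * CPU without being cached (and without touching the LoB). */
      void remember_new_locality(uint32_t tagh)
      {
          int slot = -1;

          for (int i = 0; i < NUM_LOMB; i++)      /* use a free entry if any    */
              if (!lomb[i].valid) { slot = i; break; }

          if (slot < 0) {                         /* otherwise pick a candidate */
              slot = 0;                           /* here: smallest hit counter */
              for (int i = 1; i < NUM_LOMB; i++)
                  if (lomb[i].hits < lomb[slot].hits)
                      slot = i;
          }
          lomb[slot].valid = true;
          lomb[slot].tagh  = tagh;
          lomb[slot].hits  = 0;
      }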

  11. Overall Operations for Each Compressed Tag Component • Table in the slide summarizing the operations of each compressed tag component for Example 1, Example 2, and Example 3

  12. Number of “TagL” Bits vs. Miss Ratio (%) • For a 4 KB cache, the number of partial tag (TagL) bits is varied from 0 to 8 • 1-6 TagL bits are enough to provide a miss ratio comparable to the full tag; on average, 6 bits suffice • When using 5 bits, the increase in miss ratio is 0.83% on average • Bold numbers in the table mark where the miss ratio of the partial tag policy becomes the same as the full tag policy
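
  A back-of-the-envelope check in C of what the partial tag implies for tag storage, using the 5-bit TagL point from this slide and the same assumed geometry as the earlier sketches (32-bit address, 4 KB direct-mapped cache, 32-byte lines, 4-entry LoB, 2-entry LoMB); the resulting bit counts are illustrative, not figures from the paper.

      #include <stdio.h>

      int main(void)
      {
          const int lines    = 4096 / 32;       /* 128 cache lines               */
          const int full_tag = 32 - 7 - 5;      /* 20 tag bits per line          */
          const int tagl     = 5;               /* partial tag kept in the array */
          const int lcb      = 2;               /* selects one of 4 LoB entries  */
          const int tagh     = full_tag - tagl; /* 15 bits, kept in LoB/LoMB     */

          int conv      = lines * full_tag;                 /* conventional store */
          int tag_array = lines * tagl;                     /* TagL bits only     */
          int with_bufs = lines * (tagl + lcb) + (4 + 2) * tagh;

          printf("conventional tag array: %d bits\n", conv);
          printf("TagL-only tag array   : %d bits (%.0f%% of original)\n",
                 tag_array, 100.0 * tag_array / conv);
          printf("incl. LCB + LoB/LoMB  : %d bits (%.0f%% of original)\n",
                 with_bufs, 100.0 * with_bufs / conv);
          return 0;
      }

  With these assumptions, the TagL-only tag array is about one quarter of the conventional one, which is consistent with the reduction quoted in the conclusion.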

  13. Number of LoB Entries vs. Miss Ratio (%) • A four-entry LoB provides miss ratios comparable to the conventional policy, within a 0.3% variation • Decision: the proper number of LoB entries is four

  14. Number of LoMB Entries vs. Miss Ratio (%) • A two-entry LoMB provides miss ratios comparable to the conventional policy, although 3 or 4 entries are slightly better • Decision: the proper number of LoMB entries is two • The following energy evaluation uses this configuration: 4-entry LoB, 2-entry LoMB, and hit counters with a threshold of one

  15. Cache Energy Saving • For 4, 8, and 16 KB caches, the energy saving is up to 27.8%, 17.2%, and 9.8%, respectively; for the 4 KB cache it is 18% on average • The proposed tag architecture provides more energy saving for smaller cache sizes

  16. Conclusion • This paper proposed an energy-efficient compressed tag architecture that exploits the memory access locality exhibited by programs: accesses fall within a small fraction of the address range, so partial tag bits are enough • Most of the tag bits are moved out of the cache into a Locality Buffer (LoB), and locality changes are handled by the LoMB and hit counters • Results show that the tag address bits are reduced to one-fourth of their original size and the energy saving is up to 27.8%, while still providing a performance level comparable to a conventional cache

  17. Comment for This Paper • This paper is one of the tag-organization innovations in my research tree: it reduces the number of tag bits involved in each tag comparison, alongside ideas such as programmable active tag bits, TLB index-based tagging, tag overflow buffering, and selective physical-tag/virtual-tag caches • Things that could be improved: the illustration of how the LoB access can be overlapped with the cache access is unclear (why can the LoB be accessed while the cache is being accessed, when the LCB bits stored in the cache are needed to index a LoB entry?), and the related work on tag innovations used in processor caches is not sufficient
