
Using Prediction to Accelerate Coherence Protocols

Using Prediction to Accelerate Coherence Protocols. Authors: Shubhendu S. Mukherjee and Mark D. Hill, 1998. Proceedings of the 25th Annual International Symposium on Computer Architecture. Publication date: 27 Jun-1 Jul 1998. Pages: 179-190. Presenter: Naresh Sukumar.


Presentation Transcript


  1. Using Prediction to Accelerate Coherence Protocols • Authors: Shubhendu S. Mukherjee and Mark D. Hill, 1998. • Proceedings of the 25th Annual International Symposium on Computer Architecture. • Publication date: 27 Jun-1 Jul 1998. • Pages: 179-190. • Presenter: Naresh Sukumar.

  2. Motivation • In multiprocessors that use directory protocols, some memory references suffer long latencies on misses to remotely cached blocks. • To reduce this latency, standard coherence protocols have been augmented with optimizations for specific sharing patterns (e.g., read-modify-write, producer-consumer, and migratory sharing). • This paper aims to create general prediction logic that adapts to the patterns actually encountered during operation.

  3. What will be covered? • Introduction to the directory protocol • General behavior of a predictor • The Cosmos coherence message predictor • Integrating Cosmos with a coherence protocol • Benchmarking Cosmos • Analysis of the results • Conclusions

  4. Introduction to the Directory Protocol • Directory protocols are the preferred method of cache coherence in large-scale shared-memory multiprocessors. • The protocol associates state with both caches and memory at the granularity of a cache block. • To simplify the discussion, this paper considers a full-map, write-invalidate directory protocol. (Table: a sample of coherence messages usually found in full-map, write-invalidate coherence protocols.)

  5. Disadvantages • A directory protocol often incurs multiple long-latency operations. • A directory may need to exchange messages with other caches before it can respond to a processor's request for a memory block. (Figure: a store to a block residing in another node's cache.)
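The cost of a store to a remotely cached block can be made concrete by listing the messages on its critical path. The following is only a sketch: the message names other than get_ro_request (which appears on slide 8) are assumptions chosen to match its naming style, and the function name `remote_store_messages` is hypothetical.

```python
def remote_store_messages(requester, directory, owner):
    """Return (sender, receiver, message type) hops for a store to a block
    that is currently held in another node's cache.

    Message type names are assumed for illustration; they are not taken
    from the slides.
    """
    return [
        (requester, directory, "get_rw_request"),     # requester asks for a writable copy
        (directory, owner, "inval_rw_request"),       # directory recalls the block from its owner
        (owner, directory, "inval_rw_response"),      # owner gives up the block
        (directory, requester, "get_rw_response"),    # directory finally answers the requester
    ]


msgs = remote_store_messages("cache_A", "directory", "cache_B")
for sender, receiver, mtype in msgs:
    print(f"{sender} -> {receiver}: {mtype}")
```

Under these assumptions the requester waits for four network messages rather than two, which is the extra latency that prediction tries to overlap with other work.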

  6. General Behavior of a Predictor • Predictors predict future sharing patterns and take actions to overlap coherence message activity with current work. • Types: • Read-modify-write • Pair-wise sharing • Dynamic self-invalidation • Migratory protocols • Predictors would sit beside each standard directory and cache module to monitor coherence activity and request appropriate actions.

  7. The Cosmos Coherence Message Predictor • Signature patterns • Basic structure of Cosmos • Updating Cosmos • Adaptability to a complex signature • Filtering noise • Implementation issues for Cosmos

  8. Signature Patterns (Figure: sequence of message signatures seen by the producer cache, the consumer cache, and the directory.) In a slightly more complicated example, two consumers each send a get_ro_request. As shown later, the order in which these requests arrive does not matter.

  9. Basic Structure of Cosmos (Figure: logical structure of the Cosmos coherence message predictor.) • Two pieces of information are required: • The address of the cache block, because patterns may differ from one cache block to another. • The history of messages for that cache block.

  10. Basic Structure of Cosmos, contd. • MHT: Message History Table • PHT: Pattern History Table (Figure: obtaining a prediction from Cosmos.)

  11. Updating Cosmos • Index into the MHT with the address of the cache block to find its MHR (message history register). • Use the MHR entry to index into the corresponding PHT. • Write the new <sender, type> tuple as the new prediction at the PHT index given by the MHR entry. • Left-shift the <sender, type> tuple into the MHR for the cache block.
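The lookup and update steps on slides 10 and 11 can be sketched in a few lines of Python. This is a minimal software model, not the hardware organization from the paper: the class name `Cosmos`, the use of dictionaries for the MHT and PHT, and the example message types are all assumptions made for illustration.

```python
from collections import deque


class Cosmos:
    """Two-level predictor sketch: block address -> MHR -> PHT entry."""

    def __init__(self, depth=1):
        self.depth = depth   # number of <sender, type> tuples kept per MHR
        self.mht = {}        # block address -> MHR (most recent messages)
        self.pht = {}        # block address -> {history tuple: predicted message}

    def predict(self, addr):
        """Return the predicted next <sender, type> tuple, or None if unknown."""
        mhr = self.mht.get(addr)
        if mhr is None or len(mhr) < self.depth:
            return None      # not enough history for this block yet
        return self.pht.get(addr, {}).get(tuple(mhr))

    def update(self, addr, msg):
        """Record that `msg` (a <sender, type> tuple) arrived for block `addr`."""
        mhr = self.mht.setdefault(addr, deque(maxlen=self.depth))
        if len(mhr) == self.depth:
            # The old history now predicts the message that actually arrived.
            self.pht.setdefault(addr, {})[tuple(mhr)] = msg
        # Shift the new tuple into the MHR; the oldest tuple falls out.
        mhr.append(msg)


# Hypothetical stream seen by a directory for one block.
A = ("producer", "get_rw_request")
B = ("consumer", "get_ro_request")

cosmos = Cosmos(depth=1)
for msg in [A, B, A, B]:
    cosmos.update(0x40, msg)

prediction = cosmos.predict(0x40)
print(prediction)
```

After the stream A, B, A, B the MHR holds B and the PHT maps B to A, so the sketch predicts that the producer's request comes next. A block with no recorded history yields no prediction.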

  12. Adaptability to a Complex Signature • Cosmos can adapt to complex message streams. • When the directory receives messages from two or three consumers, Cosmos adapts and becomes immune to the order in which those messages arrive.

  13. Filtering Noise • For example, if message B follows message A 99% of the time, then on seeing message A, Cosmos predicts that the next message will be B. • The prediction should not change if, on rare occasions, the messages arrive in the sequence A, C, B instead of A, B. • Use a counter and update the prediction only after two consecutive mis-predictions for the same block.
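The filtering rule can be sketched as a small counter attached to a stored prediction. This is an illustrative model only: the class name `FilteredEntry` and the choice to keep one counter per stored prediction are assumptions, since the slide only says that a counter is used.

```python
class FilteredEntry:
    """A stored prediction that is replaced only after two consecutive misses."""

    def __init__(self, prediction):
        self.prediction = prediction
        self.misses = 0          # consecutive mis-predictions seen so far

    def observe(self, actual):
        """Compare the stored prediction with the message that arrived.

        Returns True on a correct prediction, False otherwise.
        """
        if actual == self.prediction:
            self.misses = 0      # a hit ends any run of misses
            return True
        self.misses += 1
        if self.misses >= 2:     # second miss in a row: accept the new pattern
            self.prediction = actual
            self.misses = 0
        return False


# After message A the entry predicts B. A single stray C does not disturb it.
entry = FilteredEntry("B")
for msg in ["B", "C", "B"]:
    entry.observe(msg)
after_noise = entry.prediction

# Two misses in a row do replace the prediction.
entry.observe("C")
entry.observe("C")
after_change = entry.prediction
print(after_noise, after_change)
```

The single out-of-order C leaves the prediction at B, while two consecutive C messages switch it to C.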

  14. Implementation Issues for Cosmos • Cosmos is a two-level adaptive predictor. • The first level, which contains the MHRs, can be merged with the cache block state maintained at both directories and caches. • The second level is challenging because it may require large amounts of memory. Empirically, however, the memory overhead for 128-byte cache blocks was found to be less than 14% for an MHR depth of one.

  15. Integrating Cosmos with a Coherence Protocol • Mapping predictions to actions. • Determining when to perform actions. • Detecting and handling mis-predictions. Predicted actions fall into three classes: • Actions that move the protocol between two "legal" states. • Actions that move the protocol to a future state but do not expose this state to the processor. • Actions that allow both the processor and the protocol to move to future states.

  16. Modeling the Performance • A crude execution model translates coherence message prediction rates into a parallel program's speedup. • The simple model has three parameters: • p: prediction accuracy for each message. • f: fraction of delay incurred on messages that are predicted correctly. • r: penalty due to a mis-predicted message.
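The slide defines p, f, and r but does not reproduce the formula. One natural reading of those definitions, stated here as an assumption, is that every message costs one unit of time without prediction, a correctly predicted message costs f, and a mis-predicted message costs 1 + r. The speedup is then 1 / (p*f + (1 - p)*(1 + r)). The function name `speedup` is hypothetical.

```python
def speedup(p, f, r):
    """Speedup from prediction under the assumed per-message cost model.

    p: prediction accuracy for each message (0..1)
    f: fraction of the delay still incurred on a correctly predicted message
    r: extra penalty, in message delays, for a mis-predicted message
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    # Expected cost of one message with prediction, relative to a cost of 1 without.
    time_with_prediction = p * f + (1.0 - p) * (1.0 + r)
    if time_with_prediction == 0.0:
        return float("inf")   # every message fully overlapped (p = 1, f = 0)
    return 1.0 / time_with_prediction


example = speedup(p=0.8, f=0.3, r=1.0)
print(round(example, 4))
```

With p = 0.8, f = 0.3, and r = 1.0 the expected cost per message is 0.24 + 0.40 = 0.64, so the sketch reports a speedup of about 1.56. With p = 0 and r = 0 it reduces to a speedup of 1, as expected when prediction never helps and never hurts.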

  17. Benchmarking Cosmos • Benchmarks that were run. • Prediction accuracy for the various benchmarks.

  18. Analysis of the Results • Filters increase prediction accuracy slightly, but only for predictors with an MHR depth of one. • The time to reach steady-state prediction rates varies with the application. • The memory overhead of Cosmos predictors is generally within 22%.

  19. Conclusions • Cosmos is less complex than composing the predictors of several directed optimizations in a single protocol. • Cosmos can identify application-specific patterns that are not known a priori. • Cosmos achieves prediction accuracies of 80% and above for most applications. • Compared to other optimizations, Cosmos requires more hardware resources to store, access, and update the MHT and PHT.

  20. Thank you. Questions?
