
StarFish: highly-available block storage


Presentation Transcript


  1. StarFish: highly-available block storage Eran Gabber Jeff Fellin Michael Flaster Fengrui Gu Bruce Hillyer Wee Teck Ng Banu Özden Elizabeth Shriver 2003 USENIX Annual Technical Conference Presenter: D00922019 林敬棋

  2. Introduction • Important data needs to be protected. • Protection is achieved by making replicas. • Replication on remote sites • Reduces the amount of data lost in a failure. • Decreases the time required to recover from a catastrophic site failure.

  3. StarFish • A highly-available, geographically-dispersed block storage system. • Does not require expensive dedicated communication lines to all replicas to achieve high availability. • Achieves good performance even during recovery from a replica failure. • Provides single-owner access semantics.

  4. Architecture • StarFish consists of • One Host Element (HE) • Provides storage virtualization and a read cache. • N Storage Elements (SEs) • Q: write quorum size. • Writes propagate synchronously to a quorum of Q SEs and asynchronously to the rest.
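The synchronous-quorum / asynchronous-propagation write path described above can be sketched in Python. This is an illustrative sketch, not the actual StarFish implementation: the class and method names, and the thread-per-write background propagation, are assumptions for clarity.

```python
import threading

class HostElement:
    """Sketch of the HE write path: synchronous to a quorum of Q
    Storage Elements, asynchronous to the remaining N - Q.
    (Illustrative only; not the actual StarFish code.)"""

    def __init__(self, storage_elements, q):
        assert 1 <= q <= len(storage_elements)
        self.ses = storage_elements   # the N Storage Elements
        self.q = q                    # write quorum size Q

    def write(self, block_no, data):
        # Synchronous phase: the write is acknowledged only after
        # the first Q SEs have committed it.
        for se in self.ses[:self.q]:
            se.write(block_no, data)
        # Asynchronous phase: propagate to the remaining SEs in the
        # background; the host does not wait for these.
        for se in self.ses[self.q:]:
            threading.Thread(target=se.write,
                             args=(block_no, data)).start()
        return True  # acknowledged after the quorum commit
```

With N = 3 and Q = 2 (the recommended setup), a write returns once two SEs hold the block, while the third catches up asynchronously.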

  5. Recommended Setup • N = 3, Q = 2 • MAN: Metropolitan Area Network • WAN: Wide Area Network

  6. Another Deployment

  7. SE Recovery • Write log • HE keeps a circular buffer of recent writes. • Each SE maintains a circular buffer of recent writes on a log disk. • Three types of recovery • Quick recovery • Replay recovery • Full recovery
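The circular write log that enables quick and replay recovery can be sketched as follows. This is a minimal illustration under assumed details (per-write sequence numbers, and falling back to full recovery when the log has wrapped past the SE's last committed write); it is not the StarFish code.

```python
from collections import deque

class WriteLog:
    """Sketch of the circular buffer of recent writes kept by the HE
    (in memory) and by each SE (on a log disk). Capacity is fixed,
    so the oldest entries are overwritten. (Illustrative only.)"""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest entries fall off
        self.seq = 0                       # monotonically increasing

    def append(self, block_no, data):
        self.seq += 1
        self.buf.append((self.seq, block_no, data))
        return self.seq

    def replay_from(self, last_seq):
        """Entries newer than last_seq, for replay recovery of a
        lagging SE. Returns None if those entries have already been
        overwritten, which forces a full recovery instead."""
        if self.buf and self.buf[0][0] > last_seq + 1:
            return None  # gap in the log: full copy required
        return [e for e in self.buf if e[0] > last_seq]
```

A recovering SE that missed only a few recent writes replays them from the log; one that has been down too long finds a gap and must perform a full recovery.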

  8. Availability and Reliability • Assume that the failure and recovery processes of the network links and SEs are i.i.d. Poisson processes with combined mean failure and recovery rates of λ and μ per second. • Similarly, the HE fails and recovers at Poisson rates λhe and μhe.

  9. Availability • The steady-state probability that at least Q SEs are available. • Derived from the standard machine repairman model.

  10. Machine Repairman Model
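The availability computation can be sketched numerically from the standard machine repairman model: the state is the number of failed SEs, each up SE fails at rate λ, and a single repairman repairs at rate μ, with ρ = λ/μ. The paper's exact variant may differ (e.g., in the number of repairmen); this sketch just shows the shape of the calculation, including the "number of nines" measure used later.

```python
from math import factorial, floor, log10

def repairman_pi(n, rho):
    """Steady-state distribution of the single-repairman machine
    repairman model: pi[k] is the probability that k of the n SEs
    are failed, with rho = lambda/mu. (Standard model; the paper's
    variant is an assumption here.)"""
    w = [factorial(n) // factorial(n - k) * rho**k for k in range(n + 1)]
    total = sum(w)
    return [x / total for x in w]

def availability(n, q, rho):
    """P(at least Q of N SEs up) = P(number failed <= N - Q)."""
    pi = repairman_pi(n, rho)
    return sum(pi[: n - q + 1])

def nines(a):
    """Number of 9s in an availability figure, e.g. 0.999 -> 3."""
    return floor(-log10(1 - a) + 1e-9)  # epsilon guards float error
```

For fixed N, `availability(n, q, rho)` decreases as Q grows, matching the slide's observation that a larger quorum trades availability for reliability.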

  11. Availability (cont.)

  12. Availability (cont.) • Number of 9s: availability is quoted as the count of 9s in the figure (e.g., 99.99% = four nines). • Availability is much higher when N = 2Q + 1. • For fixed N, availability decreases with a larger quorum size. • Increasing the quorum size trades off availability for reliability.

  13. Reliability • The probability of no data loss. • Reliability increases with larger Q. • Two approaches • Make Q > floor(N/2) and require at least Q SEs to be available. • This reduces availability and performance. • Read-only consistency
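The condition Q > floor(N/2) is the majority-quorum condition: any two quorums of size Q must share at least one SE, so every new quorum sees the latest committed write. A brute-force check (an illustrative sketch, not part of the paper) confirms this:

```python
from itertools import combinations

def quorums_intersect(n, q):
    """Check by enumeration that every pair of size-q quorums out of
    n SEs shares at least one SE. True exactly when q > n // 2.
    (Illustrative check of the majority-quorum condition.)"""
    ses = range(n)
    return all(set(a) & set(b)
               for a in combinations(ses, q)
               for b in combinations(ses, q))
```

For the recommended N = 3, Q = 2, any two quorums overlap; with N = 4, Q = 2 they need not, so a write could be missed.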

  14. Read-only Consistency • The system remains available in read-only mode during failures. • Read-only mode obviates the need for Q SEs to be available, since no updates are accepted. • Increases availability.

  15. Availability with Read-only Consistency

  16. Observations • If ρhe = 0, availability is independent of Q. • The system can always recover from the HE. • As ρhe increases, availability increases with Q. • The largest increase occurs from Q = 1 to Q = 2, and is bounded by 3/16 when ρ = 1. • Diminishing gains after Q = 2. • This suggests Q = 2 for practical systems.

  17. Implementation

  18. Performance Measurements • Compared against a direct-attached RAID unit.

  19. Settings • Different network delays • 1, 2, 4, 8, 23, 36, 65 ms • Different bandwidth limitations • 31, 51, 62, 93, 124 Mb/s. • Benchmark: • Micro-benchmark • Read hit • Read miss • Write • PostMark

  20. Effects of network delays and HE cache size • Near SE delay: 4 ms; Far SE delay: 8 ms • No cache misses if HE cache size = 400 MB

  21. Observations • A large HE cache improves performance. • The HE can answer more read requests without communicating with the SEs. • The cache does not affect write requests. • Especially beneficial when the local SE has significant delays. • With Q = 2 and a 400 MB cache, performance is not influenced by the delay to the local SE. • It depends on the near SE instead.

  22. Normal Operation and placement of the far SE • Far SE delay sets: 1-8 (1, 2, 4, 8 ms); 4-12 (4, 8, 12 ms); 23-65 (23, 36, 65 ms) • Bandwidth set 31-124: 31, 51, 62, 93, 124 Mb/s • Local SE delay: 0 ms • N = 3

  23. Normal Operation and placement of the far SE (Cont.) • N = 3 • 8 threads

  24. Normal Operation and placement of the far SE (Cont.)

  25. Observations • Performance is influenced mostly by two parameters • The write quorum size • The delay to the SEs • StarFish provides adequate performance even when one of the SEs is placed in a remote location. • It retains at least 85% of the performance of a direct-attached RAID.

  26. Recovery • Performance degrades more during full recovery than during quick or replay recovery.

  27. Conclusion • The StarFish system shows significant benefits from a third copy of the data at an intermediate distance. • A StarFish system with 3 replicas, a write quorum size of 2, and read-only consistency yields better than 99.9999% availability, assuming an individual Storage Element availability of 99%.
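The concluding figure can be checked with a back-of-the-envelope binomial calculation, assuming independent SE failures (a simplification of the paper's Markov-chain model): with read-only consistency the system stays at least read-only available whenever any single SE is up.

```python
# Sanity check of the concluding availability claim, assuming
# independent SE failures with per-SE availability 0.99.
p = 0.99      # availability of a single Storage Element
n = 3         # number of replicas (N = 3)

# Write availability with quorum Q = 2: at least 2 of 3 SEs up.
full = p**3 + 3 * p**2 * (1 - p)

# With read-only consistency: the system remains (read-only)
# available as long as at least 1 of the 3 SEs is up.
read_only = 1 - (1 - p)**n

print(f"quorum-available:    {full:.6f}")       # 0.999702 (three nines)
print(f"read-only available: {read_only:.6f}")  # 0.999999 (six nines)
```

The read-only figure, 1 - 0.01^3 = 0.999999, reproduces the "better than 99.9999%" claim, while requiring a full write quorum would give only about three nines.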
