This lecture explores array coding techniques and storage solutions for the large-scale distributed systems run by major tech players like Facebook, Amazon, and Google. It discusses the challenge of node failures and compares traditional replication with erasure codes such as Reed-Solomon and EVENODD. By analyzing failure scenarios and the redundancy each one requires, it aims to improve storage efficiency and recovery, supporting multiple disk failures while keeping the overhead of data protection low.
Large Scale Storage Systems • Big Data Players: Facebook, Amazon, Google, Yahoo, … • [Figure: cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)] • Failures are the norm
Node failures at Facebook • [Figure: number of failed nodes over time (x-axis: Date)] • Source: "XORing Elephants: Novel Erasure Codes for Big Data", M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, VLDB 2013
State-of-the-Art Storing Schemes • 3x Replication: • Easily implemented and maintained • Can tolerate any 2 disk failures • Large storage cost: 3x the data size (200% overhead) - A Big Problem! • More sophisticated schemes: • Reed-Solomon (RS) Codes • The repair problem • [Figure: ten blocks numbered 1 to 10, each stored three times; label: "Widely used"]
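To put numbers on the storage cost, here is a minimal sketch that measures overhead as extra storage relative to the data size. The `storage_overhead` helper and the 10-data/4-parity Reed-Solomon configuration are illustrative assumptions, not figures taken from the slides.

```python
def storage_overhead(data_units, total_units):
    """Extra storage as a fraction of the original data size."""
    return (total_units - data_units) / data_units

# 3x replication: every block is stored three times.
replication = storage_overhead(1, 3)

# Hypothetical RS code with 10 data blocks and 4 parity blocks.
rs_10_4 = storage_overhead(10, 14)

print(f"3x replication: {replication:.0%} overhead, tolerates 2 failures")
print(f"RS 10+4:        {rs_10_4:.0%} overhead, tolerates 4 failures")
```

Under these assumptions the erasure code stores far less redundant data while tolerating more failures, which is why the repair cost becomes the main concern.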
Problem Setup • Disks are stored together in a group (rack) • Disk failures should be supported • Requirements: • Support as many disk failures as possible • And yet… • Optimal and fast recovery • Low complexity
Problem Setup • Question 1: How many extra disks are required to support a single disk failure? Answer: 1, How? • Question 2: How many extra disks are required to support two disk failures? Answer: 2, How? • Question 3: How many extra disks are required to support 3 disk failures? Answer: 3, How?
Problem Setup • Question 1: How many extra disks are required to support a single disk failure? Disks: A, B, C, A+B+C • Question 2: How many extra disks are required to support two disk failures? Disks: A, B, C, A+B+C, $\alpha A+\beta B+\gamma C$ • Question 3: How many extra disks are required to support 3 disk failures? Disks: A, B, C, A+B+C, $\alpha A+\beta B+\gamma C$, $\alpha' A+\beta' B+\gamma' C$
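The single-failure case can be sketched in a few lines: the extra disk holds A+B+C (bytewise XOR), and any one lost disk is the XOR of the survivors. The disk contents and the `xor_blocks` helper are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Three data disks (hypothetical contents) and one parity disk A+B+C.
a, b, c = b"disk", b"data", b"here"
parity = xor_blocks([a, b, c])

# Disk B fails: XOR the three surviving disks to rebuild it.
recovered_b = xor_blocks([a, c, parity])
print(recovered_b == b)
```

The same parity disk cannot repair two losses, since one equation has only one unknown to give; that is what the extra disks with different coefficients are for.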
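Problem Setup • Question 1: How many extra disks are required to support a single disk failure? Disks: A, B, C, A+B+C. Code: $\{(x_1,x_2,x_3,x_4) : x_1+x_2+x_3+x_4 = 0\}$ • Question 2: How many extra disks are required to support two disk failures? Disks: A, B, C, A+B+C, $\alpha A+\beta B+\gamma C$. Code: $\{(x_1,\dots,x_5) : x_1+x_2+x_3+x_4 = 0,\ \alpha x_1+\beta x_2+\gamma x_3+x_5 = 0\}$ • Question 3: How many extra disks are required to support d disk failures? Example with three extra disks: A, B, C, A+B+C, $\alpha A+\beta B+\gamma C$, $\alpha' A+\beta' B+\gamma' C$. Code: $\{(x_1,\dots,x_6) : x_1+x_2+x_3+x_4 = 0,\ \alpha x_1+\beta x_2+\gamma x_3+x_5 = 0,\ \alpha' x_1+\beta' x_2+\gamma' x_3+x_6 = 0\}$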
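Problem Setup • Question 1: How many extra disks are required to support a single disk failure? Disks: A, B, C, A+B+C. Code: $\{(x_1,x_2,x_3,x_4) : x_1+x_2+x_3+x_4 = 0\} = \{x : H_1 \cdot x^T = 0\}$ with $H_1 = (1,1,1,1)$ • Question 2: How many extra disks are required to support two disk failures? Disks: A, B, C, A+B+C, $\alpha A+\beta B+\gamma C$. Code: $\{(x_1,\dots,x_5) : x_1+x_2+x_3+x_4 = 0,\ \alpha x_1+\beta x_2+\gamma x_3+x_5 = 0\} = \{x : H_2 \cdot x^T = 0\}$ with $H_2 = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ \alpha & \beta & \gamma & 0 & 1 \end{pmatrix}$ • Question 3: How many extra disks are required to support d disk failures? Example with three extra disks: A, B, C, A+B+C, $\alpha A+\beta B+\gamma C$, $\alpha' A+\beta' B+\gamma' C$. Code: $\{(x_1,\dots,x_6) : x_1+x_2+x_3+x_4 = 0,\ \alpha x_1+\beta x_2+\gamma x_3+x_5 = 0,\ \alpha' x_1+\beta' x_2+\gamma' x_3+x_6 = 0\} = \{x : H_3 \cdot x^T = 0\}$ with $H_3 = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 & 0 \\ \alpha & \beta & \gamma & 0 & 1 & 0 \\ \alpha' & \beta' & \gamma' & 0 & 0 & 1 \end{pmatrix}$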
Problem Setup • Question 2: How many extra disks are required to support two disk failures? Disks: A, B, C, A+B+C, $\alpha A+\beta B+\gamma C$. Code: $\{(x_1,\dots,x_5) : x_1+x_2+x_3+x_4 = 0,\ \alpha x_1+\beta x_2+\gamma x_3+x_5 = 0\} = \{x : H_2 \cdot x^T = 0\}$ with $H_2 = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ \alpha & \beta & \gamma & 0 & 1 \end{pmatrix}$ • Question: What is the requirement on $H_2$? Answer: Every 2x2 sub-matrix (any two columns) has rank two • Question: What is the requirement on $H_3$? Answer: Every 3x3 sub-matrix (any three columns) has rank three
Problem Setup • Question: How many extra disks are required to support d disk failures? Answer: d, How? • Code: $\{(x_1,x_2,\dots,x_{n-1},x_n) : H \cdot (x_1,x_2,\dots,x_{n-1},x_n)^T = 0\}$, $n = k+d$ • What is the requirement on H? • Answer: Every sub-matrix of size dxd has rank d • Is it possible to construct such matrices?
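The rank requirement can be checked by brute force for small examples. The sketch below works over a prime field GF(p) to keep the arithmetic simple (the lecture's codes may use other fields), and the coefficients 1, 2, 3 in the second row are assumed values chosen for illustration. The helper names `rank_mod_p` and `tolerates` are hypothetical.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a matrix over GF(p), p prime, by Gaussian elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)          # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank

def tolerates(H, d, p):
    """True if every choice of d columns of H has rank d over GF(p)."""
    n = len(H[0])
    return all(
        rank_mod_p([[row[j] for j in cols] for row in H], p) == d
        for cols in combinations(range(n), d)
    )

p = 5
H2_good = [[1, 1, 1, 1, 0],
           [1, 2, 3, 0, 1]]   # assumed coefficients 1, 2, 3
H2_bad = [[1, 1, 1, 1, 0],
          [1, 1, 1, 0, 1]]    # second parity disk just repeats A+B+C

print(tolerates(H2_good, 2, p))  # True
print(tolerates(H2_bad, 2, p))   # False
```

The second matrix fails because its first three columns are identical, so losing any two of the data disks leaves two unknowns constrained by a single independent equation.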
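Reed Solomon Codes • A code with a parity check matrix of the form $$H = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \alpha & \alpha^{2} & \cdots & \alpha^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \alpha^{d-1} & \alpha^{2(d-1)} & \cdots & \alpha^{(d-1)(n-1)} \end{pmatrix}$$ where $\alpha$ is a primitive element of some extension field and $O(\alpha) > n-1$ ($O(\alpha)$ is the multiplicative order of $\alpha$) • Claim: Every sub-matrix of size dxd has full rank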
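A short argument for the claim, assuming the parity check matrix is the Vandermonde matrix with entries $H_{i,j} = \alpha^{ij}$ for $0 \le i \le d-1$ and $0 \le j \le n-1$ (the indexing is an assumption here). Pick any $d$ columns $j_1 < j_2 < \dots < j_d$ and write $x_t = \alpha^{j_t}$:

```latex
\det
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_d \\
\vdots & \vdots & & \vdots \\
x_1^{d-1} & x_2^{d-1} & \cdots & x_d^{d-1}
\end{pmatrix}
= \prod_{1 \le s < t \le d} (x_t - x_s)
```

Since $O(\alpha) > n-1$, the powers $\alpha^0, \alpha^1, \dots, \alpha^{n-1}$ are pairwise distinct, so every factor $x_t - x_s$ is nonzero and the determinant does not vanish. Hence any $d$ columns are linearly independent.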
Reed Solomon Codes • Advantages: • Support the maximum number of disk failures • Are very common in practice and have relatively efficient encoding/decoding schemes • Disadvantages: • Require working over large fields • Recovering even a single failed disk requires reading k of the other disks, so rebuild is not efficient
EVENODD Codes • Designed by Mario Blaum, Jim Brady, Jehoshua Bruck, and Jai Menon • Goal: Construct array codes correcting 2 disk failures using only binary XOR operations • No need for calculations over extension fields • Code construction: • Every disk is a column • The array size is (m-1)x(m+2), m is prime • The last two columns are used for parity
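EVENODD Codes • Redundancy Calculation: • First parity drive: a simple XOR of the m data disks (columns 0 to m-1), $a_{l,m} = \bigoplus_{t=0}^{m-1} a_{l,t}$ for $0 \le l \le m-2$ • Second parity drive: XOR along diagonals, adjusted by $S$, $a_{l,m+1} = S \oplus \bigoplus_{t=0}^{m-1} a_{\langle l-t \rangle_m,\,t}$ for $0 \le l \le m-2$, where $S = \bigoplus_{t=1}^{m-1} a_{\langle m-1-t \rangle_m,\,t}$, $\langle x \rangle_m$ denotes $x \bmod m$, and an imaginary all-zero row $a_{m-1,t} = 0$ is appended to the array • In the slide's example, S = 1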
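The redundancy calculation can be sketched as follows, assuming the standard EVENODD definition: an (m-1) x m array of data bits, an imaginary all-zero row m-1, horizontal parity in column m, and diagonal parity adjusted by S in column m+1. The function name `evenodd_encode` and the sample bits are illustrative.

```python
def evenodd_encode(data, m):
    """Encode an (m-1) x m bit array into an (m-1) x (m+2) EVENODD array."""
    assert len(data) == m - 1 and all(len(row) == m for row in data)
    # Append an imaginary all-zero row so diagonal indices can wrap mod m.
    a = [row[:] for row in data] + [[0] * m]

    # S is the XOR along the diagonal through a[m-2][1], ..., a[0][m-1].
    S = 0
    for t in range(1, m):
        S ^= a[m - 1 - t][t]

    encoded = []
    for l in range(m - 1):
        horizontal = 0
        diagonal = S
        for t in range(m):
            horizontal ^= a[l][t]
            diagonal ^= a[(l - t) % m][t]
        encoded.append(data[l] + [horizontal, diagonal])
    return encoded

# Small example with m = 3 (2 rows, 3 data columns, 2 parity columns).
code = evenodd_encode([[1, 0, 0],
                       [0, 1, 1]], 3)
print(code)  # [[1, 0, 0, 1, 1], [0, 1, 1, 0, 1]]
```

In this example S = 1, and only XOR operations on single bits are used, with no finite-field multiplication.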
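Decoding of EVENODD Codes • Observation: the value of S equals the XOR (sum mod 2) of all the bits in the last two columns, so it can be recovered even when two data columns are lost • In the slide's example, S = 1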
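A minimal sketch of how the observation is used when two data columns i < j are erased, assuming the standard EVENODD zigzag procedure: recover S from the parity columns, compute horizontal and diagonal syndromes from the surviving columns, then alternate between a diagonal equation and a row equation. Function names and the sample data are illustrative.

```python
def encode(data, m):
    """EVENODD encoding: (m-1) x m data bits -> (m-1) x (m+2) array."""
    a = [row[:] for row in data] + [[0] * m]      # imaginary zero row
    S = 0
    for t in range(1, m):
        S ^= a[m - 1 - t][t]
    out = []
    for l in range(m - 1):
        h, d = 0, S
        for t in range(m):
            h ^= a[l][t]
            d ^= a[(l - t) % m][t]
        out.append(data[l] + [h, d])
    return out

def decode_two_data_columns(code, m, i, j):
    """Rebuild erased data columns i < j; their entries are never read."""
    a = [row[:] for row in code] + [[0] * (m + 2)]  # imaginary zero row
    # Observation: S is the XOR of all bits in the two parity columns.
    S = 0
    for l in range(m - 1):
        S ^= a[l][m] ^ a[l][m + 1]
    known = [t for t in range(m) if t not in (i, j)]
    S0 = [0] * m   # S0[u] = a[u][i] ^ a[u][j]              (row u)
    S1 = [0] * m   # S1[u] = a[(u-i)%m][i] ^ a[(u-j)%m][j]  (diagonal u)
    for u in range(m):
        S0[u] = a[u][m]
        S1[u] = S ^ a[u][m + 1]
        for t in known:
            S0[u] ^= a[u][t]
            S1[u] ^= a[(u - t) % m][t]
    # Zigzag: start where the diagonal meets the imaginary zero row.
    s = (-(j - i) - 1) % m
    while s != m - 1:
        a[s][j] = S1[(j + s) % m] ^ a[(s + j - i) % m][i]
        a[s][i] = S0[s] ^ a[s][j]
        s = (s - (j - i)) % m
    return [row[:m] for row in a[:m - 1]]

m = 5
data = [[1, 0, 1, 1, 0],
        [0, 1, 1, 0, 1],
        [1, 1, 0, 0, 0],
        [0, 0, 1, 0, 1]]
code = encode(data, m)

def erase(code, i, j):
    """Replace columns i and j by None to model two failed disks."""
    return [[None if t in (i, j) else bit for t, bit in enumerate(row)]
            for row in code]

all_recovered = all(
    decode_two_data_columns(erase(code, i, j), m, i, j) == data
    for i in range(m) for j in range(i + 1, m)
)
print(all_recovered)
```

Because m is prime, stepping the row index by j-i modulo m visits every row before returning to the imaginary row, which is why the zigzag reaches all erased bits. Failures that involve a parity column need separate, simpler handling that this sketch omits.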