
Improve sketching of Hamming Distance with Error Correcting

Improve sketching of Hamming Distance with Error Correcting. Ely Porat (Bar-Ilan University, Google Inc.) and Ohad Lipsky (Bar-Ilan University, Check Point Inc.). December 2003. Problem definition: Alice holds T_A and Bob holds T_B, each of length n; the goal is hamm(T_A, T_B), given k, a bound on the number of mismatches.


Presentation Transcript


  1. Improve sketching of Hamming Distance with Error Correcting. Ely Porat (Bar-Ilan University, Google Inc.); Ohad Lipsky (Bar-Ilan University, Check Point Inc.). December 2003

  2. Problem Definition (1): Alice holds T_A and Bob holds T_B, each of length n. The goal is hamm(T_A, T_B), given k, a bound on the number of mismatches.

  3. Problem Definition (2): A sketch function S maps T_A to S_A and T_B to S_B. • Calculate hamm(T_A, T_B) given only S_A, S_B. • Find the mistakes. Given k, a bound on the number of mismatches.

  4. Motivations • Databases • Internet • Error correcting (figure: Routers A, B, C, D)

  5. Outline: • Simple Solution • Error Correcting • Improved Solution • Improve more • Recursion • File sharing

  6. Simplest Solution - O(k2log1/) • Binary Alphabet • Allocate k2 cells. • Take the input array and hash each bit to one of the cells. • In each cell remember the xor of all the values hash to it. 0 1 1 0 December 2003

  7. Simplest Solution - O(k2log1/) 0 1 0 0 1 1 0 0 December 2003

  8. Simplest Solution - O(k2log1/) • Due to the birthday principal: The probability that 2 Error will fall to the same cell < 1/2 • log1/ - to get a probability to fail  0 1 1 0 December 2003
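
The cell-hashing idea on slides 6 to 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hash is a shared pseudo-random table keyed by position, and the helper names (`make_hash`, `xor_sketch`, `estimate_hamming`, `sketch_distance`) and the `repetitions` parameter are hypothetical, not taken from the slides. Because a cell differs only when it holds an odd number of mismatches, the count of differing cells never exceeds the true distance, so taking the maximum over independent repetitions is safe.

```python
import random

def make_hash(n, num_cells, seed):
    """Hypothetical shared hash: maps each position 0..n-1 to a cell."""
    rng = random.Random(seed)
    return [rng.randrange(num_cells) for _ in range(n)]

def xor_sketch(bits, cell_of, num_cells):
    """XOR every input bit into the cell its position hashes to."""
    cells = [0] * num_cells
    for i, b in enumerate(bits):
        cells[cell_of[i]] ^= b
    return cells

def estimate_hamming(sketch_a, sketch_b):
    """Count differing cells. Exact when no two mismatches share a cell,
    and never larger than the true Hamming distance."""
    return sum(x != y for x, y in zip(sketch_a, sketch_b))

def sketch_distance(bits_a, bits_b, k, repetitions=8, seed=0):
    """Repeat with independent hashes into k*k cells and keep the max."""
    n = len(bits_a)
    best = 0
    for r in range(repetitions):
        cell_of = make_hash(n, k * k, seed + r)
        sa = xor_sketch(bits_a, cell_of, k * k)   # computed by Alice
        sb = xor_sketch(bits_b, cell_of, k * k)   # computed by Bob
        best = max(best, estimate_hamming(sa, sb))
    return best
```

In a real exchange Alice and Bob would each compute their own sketch and only the sketches would be compared; `sketch_distance` does both sides in one place purely for demonstration.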

  9. Alphabet • Denote by S the size of the alphabet. • We can encode each letter with its unary representation. • The only effect is that each mistake will be counted twice. Encoding: 0 → 1000000…0, 1 → 0100000…0, …, S-1 → 0000000…1. Example: 0 → 1000000…0, 5 → 0000010…0.
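
A small Python illustration of the unary encoding described on slide 9, showing why each symbol mismatch becomes exactly two bit mismatches. The function names are hypothetical, and symbols are assumed to be integers in the range 0..S-1.

```python
def unary_encode(text, alphabet_size):
    """Replace each symbol s by a block of alphabet_size bits with a single 1 at index s."""
    bits = []
    for symbol in text:
        block = [0] * alphabet_size
        block[symbol] = 1
        bits.extend(block)
    return bits

def hamming(x, y):
    """Number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))

# One symbol mismatch (5 vs 3) turns into two bit mismatches:
# the 1 disappears from index 5 and appears at index 3.
text_a = [0, 5, 2]
text_b = [0, 3, 2]
symbol_distance = hamming(text_a, text_b)
bit_distance = hamming(unary_encode(text_a, 8), unary_encode(text_b, 8))
```

Halving the binary distance therefore recovers the distance over the original alphabet.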

  10. Error correcting - O(k^2 log NS) • Here we allocate two kinds of k^2 cells: k^2 of log S bits and k^2 of log NS bits. C1[h(A[i])] += A[i] (example cells: 5 8 3 2); C2[h(A[i])] += i*A[i] (example cells: 15 6 7 8)

  11. Error correcting - O(k^2 log NS) • As before, with probability > 1/2 no 2 errors fall into the same cell. C1[h(A[i])] += A[i] (example cells: 5 8 3 2); C2[h(A[i])] += i*A[i] (example cells: 15 6 7 8)

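  12. Error correcting - O(k^2 log NS) • We get from the red cells (C1[h(A[i])] += A[i]): one side's cells are 5 8 3 2 and the other's are 5 6 3 2. They differ in one cell, where 8 - 6 = 5 - 3, the difference between the two mismatched symbols (5 and 3).

  13. Error correcting - O(k^2 log NS) • We get from the blue cells (C2[h(A[i])] += i*A[i]): the difference in the same cell equals i*(5 - 3), so dividing it by the C1 difference gives the position of the mistake; in the slide's example, i = 2.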

  14. Error correcting - O(k^2 log NS) • The probability of success is about 1/2. • To lower the failure probability we run it 3 times. • We get a list of possible mistakes each time. • Output all the mistakes that appear in at least 2 of the 3 runs.
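
The two cell arrays and the 2-of-3 vote from slides 10 to 14 can be sketched as follows. This is a minimal sketch under assumptions that go beyond the slides: positions (not symbol values) are hashed, sums are kept as plain Python integers with no modulus, and all function names are hypothetical. In a cell with a single mismatch at position i, the C1 difference is a_i - b_i and the C2 difference is i*(a_i - b_i), so their quotient is i.

```python
import random
from collections import Counter

def make_hash(n, num_cells, seed):
    """Hypothetical shared hash from positions to cells."""
    rng = random.Random(seed)
    return [rng.randrange(num_cells) for _ in range(n)]

def locating_sketch(text, cell_of, num_cells):
    """C1 accumulates symbol values, C2 accumulates position * value."""
    c1 = [0] * num_cells
    c2 = [0] * num_cells
    for i, symbol in enumerate(text):
        c1[cell_of[i]] += symbol
        c2[cell_of[i]] += i * symbol
    return c1, c2

def candidate_mismatches(sketch_a, sketch_b, cell_of):
    """For every cell whose C1 values differ, decode (position, a_i - b_i)."""
    (c1a, c2a), (c1b, c2b) = sketch_a, sketch_b
    found = set()
    for cell in range(len(c1a)):
        d1 = c1a[cell] - c1b[cell]
        d2 = c2a[cell] - c2b[cell]
        if d1 == 0 or d2 % d1 != 0:
            continue
        i = d2 // d1
        # Sanity check: the decoded position must hash to this very cell.
        if 0 <= i < len(cell_of) and cell_of[i] == cell:
            found.add((i, d1))
    return found

def find_mismatches(text_a, text_b, k, runs=3, seed=0):
    """Run with independent hashes; keep candidates seen in at least 2 runs."""
    votes = Counter()
    n = len(text_a)
    for r in range(runs):
        cell_of = make_hash(n, k * k, seed + r)
        sa = locating_sketch(text_a, cell_of, k * k)
        sb = locating_sketch(text_b, cell_of, k * k)
        votes.update(candidate_mismatches(sa, sb, cell_of))
    return {m for m, count in votes.items() if count >= 2}
```

Each reported pair gives the mismatch position and the amount by which Alice's symbol exceeds Bob's, so Bob can correct his copy by adding that difference at that position.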

  15. O(k log^2 k) - Solution • The idea is two-stage hashing: first hash into k/log k buckets; w.h.p. each bucket receives O(log k) errors. (Bar-Yossef, Jayram, Kumar, Sivakumar 03)

  16. O(k log^2 k) - Solution • In each bucket keep the accumulated XOR. • The probability of failure is less than 1/2. • Run it 2 log k times and take the max => failure probability less than 1/k^2. • Per bucket: O(log k) errors, O(log^2 k) cells per run, space = O(log^3 k). (Bar-Yossef, Jayram, Kumar, Sivakumar 03)

  17. O(k log^2 k) - Solution • k/log k buckets of O(log^3 k) space each give O(k log^2 k) in total. • P(Failure) ≤ k/log k * 1/k^2 < 1/k. (Bar-Yossef, Jayram, Kumar, Sivakumar 03)
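
The bounds on slides 16 and 17 fit together as below. This is a reconstruction, assuming that the logarithms are base 2, that the runs are independent, and that a run can only underestimate (which is why taking the max means a bucket fails only if every run fails); the last inequality needs k > 2.

```latex
\begin{align*}
\Pr[\text{a given bucket fails}] &< \left(\tfrac{1}{2}\right)^{2\log_2 k} = \frac{1}{k^{2}},\\
\Pr[\text{some bucket fails}] &\le \frac{k}{\log k}\cdot\frac{1}{k^{2}} = \frac{1}{k\log k} < \frac{1}{k},\\
\text{total space} &= \frac{k}{\log k}\cdot O(\log^{3} k) = O(k\log^{2} k).
\end{align*}
```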

  18. O(k2log*klogk) - Idea (recursion): k/log k buckets; Pr(F) < 1/log^c k; log k/log log k; log k/log log k runs, take the max.

  19. Error Correcting O(k log NS): Alice holds T_A, Bob holds T_B (length n each). r_0 r_1 r_2 …; p=(N^3 S). Constant probability.

  20. Error Correcting O(k log NS): Alice holds T_A, Bob holds T_B (length n each). If we are wrong, then w.h.p. j > n.

  21. Error Correcting O(k log NS): Alice holds T_A, Bob holds T_B (length n each). r_j, a_j - b_j

  22. Error Correcting O(k log NS): Alice holds T_A, Bob holds T_B (length n each). O(k ln k)

  23. Recursion (figure: Alice's T_A and Bob's T_B, length n each; ck; T_A and T_B, length n each)

  24. Recursion (figure: Alice's T_A and Bob's T_B, length n each; ck; O(k log NS))

  25. Complexity: a sketch function S maps T_A to S_A and T_B to S_B (length n each). Size: O(k log NS). Computing the sketch: O(n log k). Comparing sketches: O(k log k).

  26. O(k log k) - Solution • We can just encode in unary, hash the input to k^3 cells, and then run the O(k log NS) = O(k log k) algorithm.

  27. Reed-Solomon Codes • We managed to develop a deterministic algorithm based on these codes, but the encoding and the decoding are slower. References: Amir, Farach 95; Feigenbaum, Ishai, Malkin, Nissim, Strauss, Wright 01; Bar-Yossef, Jayram, Kumar, Sivakumar 03; Efremenko, Porat, Rothschild 06; Efremenko, Porat 07

  28. File Sharing: Napster (figure: a source holding a file of n blocks) • The source needs to stay until someone has the whole file (and is willing to stay). • There is a bottleneck at the end.

  29. File Sharing: eMule/Kazaa/torrent (figure: a source holding a file of n blocks) • The source has to send n ln n blocks before disconnecting. • Sometimes there are some bottlenecks.
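
The slide does not say where n ln n comes from. One standard way to arrive at it, assuming each block the source sends is chosen uniformly at random from the n blocks, is the coupon-collector bound on the number of sends T needed before every block has left the source at least once:

```latex
\mathbb{E}[T] \;=\; \sum_{i=1}^{n} \frac{n}{n-i+1} \;=\; n\sum_{j=1}^{n}\frac{1}{j} \;=\; n H_n \;\approx\; n\ln n + 0.577\,n .
```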

  30. Improved File Sharing - Ver 1 (figure: a source holding a file of n blocks a_0 a_1 a_2 … a_{n-1}; n^6)

  31. Improved File Sharing - Ver 1 (figure: n^6) • Each client that gets n points can recreate the file. • There is no more n ln n. • Almost no bottlenecks.
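
One way to read "each client that gets n points can recreate the file" is the Reed-Solomon view from slide 27: treat the blocks a_0 … a_{n-1} as the coefficients of a polynomial and let every packet be one evaluation of it. The Python sketch below illustrates that reading only. The prime field, the constant `P`, and the function names are assumptions, and the slide's n^6 is not modeled.

```python
import random

P = 2_147_483_647  # assumed field size: the prime 2^31 - 1

def evaluate(coeffs, x):
    """Horner evaluation of a_0 + a_1*x + ... + a_{n-1}*x^(n-1) modulo P."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % P
    return acc

def interpolate(points):
    """Recover the n coefficients from n points with distinct x (Lagrange)."""
    n = len(points)
    coeffs = [0] * n
    for j, (xj, yj) in enumerate(points):
        basis = [1]   # coefficients of prod_{m != j} (x - x_m), lowest degree first
        denom = 1     # prod_{m != j} (x_j - x_m)
        for m, (xm, _) in enumerate(points):
            if m == j:
                continue
            shifted = [0] + basis                           # x * basis
            scaled = [(-xm * c) % P for c in basis] + [0]   # -x_m * basis
            basis = [(s + t) % P for s, t in zip(shifted, scaled)]
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, P - 2, P) % P   # divide by denom via Fermat inverse
        for d in range(n):
            coeffs[d] = (coeffs[d] + scale * basis[d]) % P
    return coeffs

# The source sends evaluations at random distinct points; any n of them suffice.
file_blocks = [7, 0, 42, 5]
rng = random.Random(1)
xs = rng.sample(range(1, P), len(file_blocks))
packets = [(x, evaluate(file_blocks, x)) for x in xs]
recovered = interpolate(packets)
```

Because any n distinct evaluations determine the polynomial, no particular packet is indispensable, which matches the slide's claim that the n ln n overhead disappears.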

  32. Improved File Sharing - Ver 2 (figure: a source holding a file of n blocks a_0 a_1 a_2 … a_{n-1}) • Send linear equations on the file.

  33. Improved File Sharing - Ver 2 • Problems: 1. Heavy to encode: for each packet we need to go over the whole file. 2. Very heavy to decode: O(n^2) block operations + O(n^3) field operations. • Facts: 1. If you get n^(1/2-ε) random combinations of two blocks, you won't have dependencies w.h.p. 2. If you have d pair combinations, you can easily reduce your system to n-d variables. • Solution: use sparse functionals.

  34. Improved File Sharing - Ver 2 • Features: backward compatibility. Even if you don't have the whole file, you can mix functionals.
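
To make the "sparse functionals" idea concrete, here is a minimal Python sketch of decoding when packets are either a plain block or the combination of two blocks. It assumes the combination is a bitwise XOR (the slides say only "linear equations" and do not name the field), represents blocks as integers, and uses a hypothetical function name. Each pair equation ties two unknowns together, so once one block of a pair is known the other follows immediately, which is the sense in which d pair combinations leave only n-d free variables.

```python
def peel(singles, pairs):
    """Decode blocks from plain packets and pairwise-XOR packets.

    singles: dict mapping block index -> block value
    pairs:   list of (i, j, block_i XOR block_j)
    Returns a dict with every block reachable from a known one.
    """
    known = dict(singles)
    neighbours = {}
    for i, j, x in pairs:
        neighbours.setdefault(i, []).append((j, x))
        neighbours.setdefault(j, []).append((i, x))
    stack = list(known)
    while stack:
        i = stack.pop()
        for j, x in neighbours.get(i, []):
            if j not in known:
                known[j] = known[i] ^ x   # block_j = block_i XOR (block_i XOR block_j)
                stack.append(j)
    return known

# Four blocks, one received in the clear and three received as pair combinations.
blocks = [3, 9, 12, 7]
received_pairs = [(0, 1, 3 ^ 9), (1, 2, 9 ^ 12), (2, 3, 12 ^ 7)]
decoded = peel({0: 3}, received_pairs)
```

Decoding here costs one XOR per pair packet, in contrast to the O(n^2) block operations the slide attributes to dense equations.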
