
Anonymous communications: High latency systems








  1. Anonymous communications: High latency systems Messaging anonymity & the traffic analysis of hardened systems FOSAD 2010 – Bertinoro, Italy

  2. Network identity today • Networking • Relation between identity and efficient routing • Identifiers: MAC, IP, email, screen name • No network privacy = no privacy! • The identification spectrum today ("The Mess" we are in!): Full Anonymity ... Pseudonymity ... Strong Identification

  3. Network identity today (contd.) • Weak identifiers everywhere: IP, MAC • Logging at all levels • Login names / authentication • PK certificates in clear • Also: location data leaked, application data leakage • No anonymity, yet no identification either: weak identifiers are easy to modulate (IP / MAC address changes, open wifi access points, botnets), and logs are expensive / unreliable • Partial solution: authentication • Open issues: DoS and network-level attacks

  4. Ethernet packet format [Figure: Ethernet frame layout, from Anthony F. J. Levi - http://www.usc.edu/dept/engineering/eleceng/Adv_Network_Tech/Html/datacom/] • Weak identifier: MAC address • No integrity or authenticity

  5. IP packet format
From RFC 791, INTERNET PROTOCOL, DARPA INTERNET PROGRAM PROTOCOL SPECIFICATION, September 1981, Section 3.1 Internet Header Format: "A summary of the contents of the internet header follows:"

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Version|  IHL  |Type of Service|          Total Length         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         Identification        |Flags|      Fragment Offset    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Time to Live |    Protocol   |         Header Checksum       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       Source Address                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Destination Address                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Options                    |    Padding    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Example Internet Datagram Header (Figure 4)

• Weak identifiers link different packets together • No integrity / authenticity • Same for TCP, SMTP, IRC, HTTP, ...
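As a small aside (my own illustration, not part of the slides), a minimal Python sketch of reading the two weak identifiers out of an IPv4 header: the source and destination addresses sit in the clear at fixed offsets, so any on-path observer can link packets by them. All names and addresses below are made up for the example.

```python
# Sketch: parse the first 20 bytes of an IPv4 packet with the standard library.
import socket
import struct

def parse_ipv4_header(packet: bytes):
    """Extract a few fields from the fixed 20-byte IPv4 header."""
    (version_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),   # weak identifier, readable by anyone on path
        "dst": socket.inet_ntoa(dst),   # links sender and receiver together
    }

# Hand-crafted example header (10.0.0.1 -> 10.0.0.2, protocol 6 = TCP)
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
print(parse_ipv4_header(header))
```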

  6. Outline – 4 lectures • Objective: foundations of anonymity & traffic analysis • Why anonymity? • Lecture 1 – Unconditional anonymity • DC-nets in detail, Broadcast & their discontents • Lecture 2 – Black box, long term attacks • Statistical disclosure & its Bayesian formulation • Lecture 3 – Mix networks in detail • Sphinx format, mix-networks • Lecture 4 – Bayesian traffic analysis of mix networks

  7. Anonymity in communications • Specialized applications • Electronic voting • Auctions / bidding / stock market • Incident reporting • Witness protection / whistle blowing • Showing anonymous credentials! • General applications • Freedom of speech • Profiling / price discrimination • Spam avoidance • Investigation / market research • Censorship resistance

  8. Anonymity properties (1) • Sender anonymity • Alice sends a message to Bob. Bob cannot know who Alice is. • Receiver anonymity • Alice can send a message to Bob, but cannot find out who Bob is. • Bi-directional anonymity • Alice and Bob can talk to each other, but neither of them know the identity of the other.

  9. Anonymity properties (2) • 3rd party anonymity • Alice and Bob converse and know each other, but no third party can find this out. • Unobservability • Alice and Bob take part in some communication, but no one can tell if they are transmitting or receiving messages.

  10. Pseudonymity properties • Unlinkability • Two messages sent (received) by Alice (Bob) cannot be linked to the same sender (receiver). • Pseudonymity • All actions are linkable to a pseudonym, which is unlinkable to a principal (Alice)

  11. Anonymity through Broadcast [Figure: the sender broadcasts to all nodes; every node receives E(Junk) except the intended receiver, who receives E(Message) – simple receiver anonymity] • Point 1: Do not re-invent this • Point 2: Many ways to do broadcast: rings, trees, buses – it has all been done • Point 3: Is your anonymity system better than this? • Point 4: What are the problems here? Coordination, sender anonymity, latency, bandwidth

  12. Unconditional anonymity • DC-nets • Dining Cryptographers (David Chaum 1985) • Multi-party computation resulting in a message being broadcast anonymously • 2 twists • How to implement DC-nets through broadcast trees • How to achieve reliability? • Communication cost...

  13. The Dining Cryptographers (1) • “Three cryptographers are sitting down to dinner at their favourite three-star restaurant. • Their waiter informs them that arrangements have been made with the maitre d'hotel for the bill to be paid anonymously. • One of the cryptographers might be paying for the dinner, or it might have been NSA (U.S. National Security Agency). • The three cryptographers respect each other's right to make an anonymous payment, but they wonder if NSA is paying.”

  14. The Dining Cryptographers (2) [Figure: Ron: "I paid"; Adi: "I didn't"; Wit: "I didn't". Did the NSA pay?]

  15. The Dining Cryptographers (2)
• Adi: m_a = 0 ("I didn't"); Ron: m_r = 1 ("I paid"); Wit: m_w = 0 ("I didn't")
• Pairwise coin tosses: c_ar (Adi-Ron), c_rw (Ron-Wit), c_aw (Adi-Wit)
• Broadcasts: b_a = m_a + c_ar + c_aw, b_r = m_r + c_ar + c_rw, b_w = m_w + c_rw + c_aw (mod 2)
• Combine: B = b_a + b_r + b_w = m_a + m_r + m_w = m_r (mod 2)
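A minimal sketch (my own illustration, not part of the slides) of this three-party round, using XOR, which is the same as the modulo-2 additions above:

```python
# One dining-cryptographers round over GF(2): each pair shares a coin, each
# party announces its bit XORed with its two shared coins, and the XOR of all
# announcements reveals only the sum of the bits, never who set it.
import secrets

def dc_round(m_ron: int, m_adi: int, m_wit: int) -> int:
    c_ra = secrets.randbits(1)  # coin shared by Ron and Adi
    c_rw = secrets.randbits(1)  # coin shared by Ron and Wit
    c_aw = secrets.randbits(1)  # coin shared by Adi and Wit

    b_ron = m_ron ^ c_ra ^ c_rw
    b_adi = m_adi ^ c_ra ^ c_aw
    b_wit = m_wit ^ c_rw ^ c_aw

    # Every coin appears exactly twice, so they cancel in the combination.
    return b_ron ^ b_adi ^ b_wit

assert dc_round(1, 0, 0) == 1   # someone paid, but the result does not say who
assert dc_round(0, 0, 0) == 0   # nobody paid (the NSA did)
```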

  16. DC-nets • Generalise • Many participants • Larger message size • Conceptually many coins in parallel (xor) • Or: use +/- (mod 2^|m|) • Arbitrary key (coin) sharing • Graph G: nodes = participants, edges = keys shared • What security?

  17. The Dining Cryptographers (3)
• Adi: m_a = 0; Ron wants to send m_r = message; Wit: m_w = 0
• Toss m coins per pair: c_ar, c_rw, c_aw
• Broadcasts: b_a = m_a - c_ar + c_aw, b_r = m_r + c_ar - c_rw, b_w = m_w + c_rw - c_aw (all mod 2^m)
• Combine: B = b_a + b_r + b_w = m_a + m_r + m_w = m_r (mod 2^m)

  18. Key sharing graph • Derive coins: c_ab,i = H[K_ab, i] for round i, or a stream cipher keyed with K_ab • Alice broadcasts b_a = c_ab + c_ac + m_a [Figure: nodes A, B, C; the edge A-B is labelled with the shared key K_ab]
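A small sketch of this coin derivation, assuming the hash-based construction c_ab,i = H[K_ab, i] mentioned on the slide; the key values, pad size and round number below are made up for illustration:

```python
# Derive the per-round pad from a pairwise shared key, then form Alice's
# broadcast b_a = c_ab + c_ac + m_a (mod 2^m).
import hashlib

M_BITS = 128                       # pad / message size m
MOD = 1 << M_BITS

def coin(shared_key: bytes, round_i: int) -> int:
    """Pseudorandom pad for one round, derived as H[K, i]."""
    digest = hashlib.sha256(shared_key + round_i.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:M_BITS // 8], "big")

k_ab, k_ac = b"key-alice-bob", b"key-alice-charlie"   # hypothetical shared keys
m_a = 42                            # Alice's message for this round
round_i = 7
b_a = (coin(k_ab, round_i) + coin(k_ac, round_i) + m_a) % MOD
print(b_a)
```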

  19. Key sharing graph – security (1) • If B and C are corrupt • Alice broadcasts b_a = c_ab + c_ac + m_a • Adversary's view: it already knows c_ab and c_ac, so from b_a it recovers m_a • No anonymity [Figure: nodes A, B, C; shared key K_ab]

  20. Key sharing graph – security (2) • Adversary nodes partition the graph into a blue and a green sub-graph • Calculate: B_blue = Σ_{j blue} b_j and B_green = Σ_{i green} b_i • Subtract the known keys: B_blue + K_red-blue = Σ_{j blue} m_j and B_green + K'_red-green = Σ_{i green} m_i • Discover the originating sub-graph • Reduction in anonymity [Figure: anonymity set size = 4 (not 11 or 8!)]

  21. DC-net implementation • b_i broadcast graph • Tree – independent of the key sharing graph • = Key sharing graph – no DoS unless the graph is split • Communication in 2 phases: 1) key sharing (off-line), 2) round sync & broadcast • An aggregator combines: B = Σ b_i = m_r (mod 2^m)

  22. DC-net implementation (p2p) • bi broadcast graph • Tree – independent of key sharing graph • = Key sharing graph – No DoS unless split in graph • Communication in 2 phases: • 1) Key sharing (off-line) • 2) Round sync & broadcast (peer-to-peer?) • Ring? • Tree?

  23. DC collisions
• Adi: m_a = 0; Ron wants to send m_r = message1; Wit wants to send m_w = message2
• Toss m coins per pair: c_ar, c_rw, c_aw
• Broadcasts: b_a = m_a - c_ar + c_aw, b_r = m_r + c_ar - c_rw, b_w = m_w + c_rw - c_aw (all mod 2^m)
• Combine: B = b_a + b_r + b_w = m_a + m_r + m_w = collision (mod 2^m)

  24. How to resolve collisions? • Ethernet: detect collisions (redundancy), exponential back-off with random re-transmission • Collisions do not destroy all information • B = b_a + b_r + b_w = m_a + m_r + m_w = collision (mod 2^m) = message1 + message2 (mod 2^m) • N collisions can be decoded in N transmissions. Cool!
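A tiny worked example (hypothetical numbers, my own illustration) of why collisions are not fatal: under addition mod 2^m the collision is just the sum of the colliding messages, so once one of them is learned from a later round the other falls out by subtraction.

```python
# Collisions under addition mod 2^m carry the sum of the messages.
M_BITS = 16
MOD = 1 << M_BITS

msg1, msg2 = 1234, 5678
collision = (msg1 + msg2) % MOD      # round in which both senders transmit
later = msg1                         # msg1 arrives alone in a later round
recovered = (collision - later) % MOD
assert recovered == msg2             # the other message is recovered for free
```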

  25. Reliability – How? • Trick 1: run 2 DC-net rounds in parallel • Trick 2: retransmit if your count is greater than the average [Figure: example run. Round 1 carries (5, m1+m2+m4+m5+m6); later rounds split the collision into (3, m1+m2+m4) and (2, m5+m6), then (2, m2+m4), until each message stands alone: (1, m1), (1, m2), (1, m4), (1, m5), (1, m6)]

  26. Other tricks with DC-nets • Reliability → DoS resistance! • How to protect a DC-network against disruption? • Solved problem? • DC in a Disco – "trap & trace" • Modern crypto: commitments, secret sharing, and banning • Note the homomorphism: collisions = addition • Can tally votes / histograms through DC-nets

  27. DC-net shortcomings • Security is great! • Full key sharing graph → perfect anonymity • Communication cost – BAD (N broadcasts for each message!) • Naive: O(N^2) cost, O(1) latency • Not so naive: O(N) messages, O(N) latency – ring structure for broadcast • Expander graph: O(N) messages, O(log N) latency? • Centralized: O(N) messages, O(1) latency • Not practical for large(r) N! • Local wireless communications? • Perfect anonymity?

  28. Black-box, long term traffic analysis attacks “In the long run we are all dead” – Keynes

  29. Anonymity so far... • DC-nets – setting • Single message from Alice to Bob, or between a set group • Real communications • Alice has a few friends that she messages often • Interactive stream between Alice and Bob (TCP) • Emails from Alice to Bob (SMTP) • Alice is not always on-line (network churn) • Repetition – patterns -> Attacks

  30. Fundamental limits • Even perfect anonymity systems leak information when participants change • Remember: DC-nets do not scale well • Setting: • N senders / receivers – Alice is one of them • Alice messages a small number of friends: • RA in {Bob, Charlie, Debbie} • Through a MIX / DC-net • Perfect anonymity of size K • Can we infer Alice’s friends?

  31. Setting • Alice sends a single message to one of her friends • Anonymity set size = K. Entropy metric E_A = log K • Perfect! • r_A in R_A = {Bob, Charlie, Debbie} [Figure: Alice plus K-1 senders out of N-1 others send through the anonymity system; K-1 receivers out of N others (modelled as random receivers)]

  32. Many rounds • Observe many rounds in which Alice participates • Rounds in which Alice participates will output a message to her friends! • Infer the set of friends! [Figure: rounds T1, T2, T3, T4, ..., Tt; in each round Alice and others send through the anonymity system, and one output (r_A1, r_A2, r_A3, r_A4, ...) goes to one of Alice's friends]

  33. Hitting set attack (1) • Guess the set of friends of Alice (RA’) • Constraint |RA’| = m • Accept if an element is in the output of each round • Downside: Cost • N receivers, m size – (N choose m) options • Exponential – Bad • Good approximations...
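A minimal sketch (my own illustration, with made-up observations) of the hitting-set attack described above; the brute-force enumeration over (N choose m) candidate friend sets is exactly what makes it exponential.

```python
# Keep every candidate friend set of size m that intersects the receiver list
# of every round in which Alice sent a message.
from itertools import combinations

def hitting_sets(rounds, n_receivers, m):
    """rounds: list of receiver lists observed when Alice was sending."""
    candidates = []
    for guess in combinations(range(n_receivers), m):
        if all(any(r in guess for r in receivers) for receivers in rounds):
            candidates.append(set(guess))
    return candidates

# Hypothetical toy data: each round has K receivers, Alice's friend is among them.
rounds = [[3, 7, 1, 9, 2], [7, 4, 0, 2, 8], [5, 7, 2, 6, 1]]
print(len(hitting_sets(rounds, n_receivers=10, m=2)))   # candidates still consistent
```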

  34. Statistical disclosure attack • Note that the friends of Alice will be in the sets more often than random receivers • How often? Expected number of messages per receiver: • μ_other = (1/N) · (K-1) · t • μ_Alice = (1/m) · t + μ_other • Just count the number of messages per receiver when Alice is sending! • μ_Alice > μ_other
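A sketch of the statistical disclosure attack's counting step on hypothetical observations (my own illustration): count how often each receiver appears in rounds where Alice sent, and pick the m receivers that stand out above the background rate, here approximated by simply taking the top m counts.

```python
# Count receivers across the rounds in which Alice participated.
from collections import Counter

def sda(rounds_with_alice, m):
    counts = Counter()
    for receivers in rounds_with_alice:
        counts.update(receivers)
    return [receiver for receiver, _ in counts.most_common(m)]

# Hypothetical observations: receiver 13 keeps showing up whenever Alice sends.
rounds = [[15, 13, 14, 5, 9], [19, 10, 17, 13, 8], [0, 7, 0, 13, 5]]
print(sda(rounds, m=3))
```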

  35. Comparison: HS and SDA
• Parameters: N=20, m=3, K=5, t=45, KA = {0, 13, 19}

Round  Receivers              SDA           SDA_error  #Hitting sets
1      [15, 13, 14, 5, 9]     [13, 14, 15]  2          685
2      [19, 10, 17, 13, 8]    [13, 17, 19]  1          395
3      [0, 7, 0, 13, 5]       [0, 5, 13]    1          257
4      [16, 18, 6, 13, 10]    [5, 10, 13]   2          203
5      [1, 17, 1, 13, 6]      [10, 13, 17]  2          179
6      [18, 15, 17, 13, 17]   [13, 17, 18]  2          175
7      [0, 13, 11, 8, 4]      [0, 13, 17]   1          171
8      [15, 18, 0, 8, 12]     [0, 13, 17]   1          80
9      [15, 18, 15, 19, 14]   [13, 15, 18]  2          41
10     [0, 12, 4, 2, 8]       [0, 13, 15]   1          16
11     [9, 13, 14, 19, 15]    [0, 13, 15]   1          16
12     [13, 6, 2, 16, 0]      [0, 13, 15]   1          16
13     [1, 0, 3, 5, 1]        [0, 13, 15]   1          4
14     [17, 10, 14, 11, 19]   [0, 13, 15]   1          2
15     [12, 14, 17, 13, 0]    [0, 13, 17]   1          2
16     [18, 19, 19, 8, 11]    [0, 13, 19]   0          1
17     [4, 1, 19, 0, 19]      [0, 13, 19]   0          1
18     [0, 6, 1, 18, 3]       [0, 13, 19]   0          1
19     [5, 1, 14, 0, 5]       [0, 13, 19]   0          1
20     [17, 18, 2, 4, 13]     [0, 13, 19]   0          1
21     [8, 10, 1, 18, 13]     [0, 13, 19]   0          1
22     [14, 4, 13, 12, 4]     [0, 13, 19]   0          1
23     [19, 13, 3, 17, 12]    [0, 13, 19]   0          1
24     [8, 18, 0, 10, 18]     [0, 13, 18]   1          1

• Round 16: both attacks give the correct result • SDA can give wrong results – more evidence is needed

  36. HS and SDA (continued)

Round  Receivers              SDA           SDA_error  #Hitting sets
25     [19, 4, 13, 15, 0]     [0, 13, 19]   0          1
26     [13, 0, 17, 13, 12]    [0, 13, 19]   0          1
27     [11, 13, 18, 15, 14]   [0, 13, 18]   1          1
28     [19, 14, 2, 18, 4]     [0, 13, 18]   1          1
29     [13, 14, 12, 0, 2]     [0, 13, 18]   1          1
30     [15, 19, 0, 12, 0]     [0, 13, 19]   0          1
31     [17, 18, 6, 15, 13]    [0, 13, 18]   1          1
32     [10, 9, 15, 7, 13]     [0, 13, 18]   1          1
33     [19, 9, 7, 4, 6]       [0, 13, 19]   0          1
34     [19, 15, 6, 15, 13]    [0, 13, 19]   0          1
35     [8, 19, 14, 13, 18]    [0, 13, 19]   0          1
36     [15, 4, 7, 13, 13]     [0, 13, 19]   0          1
37     [3, 4, 16, 13, 4]      [0, 13, 19]   0          1
38     [15, 13, 19, 15, 12]   [0, 13, 19]   0          1
39     [2, 0, 0, 17, 0]       [0, 13, 19]   0          1
40     [6, 17, 9, 4, 13]      [0, 13, 19]   0          1
41     [8, 17, 13, 0, 17]     [0, 13, 19]   0          1
42     [7, 15, 7, 19, 14]     [0, 13, 19]   0          1
43     [13, 0, 17, 3, 16]     [0, 13, 19]   0          1
44     [7, 3, 16, 19, 5]      [0, 13, 19]   0          1
45     [13, 0, 16, 13, 6]     [0, 13, 19]   0          1

• SDA can give wrong results – more evidence is needed

  37. Disclosure attack family • Counter-intuitive: the larger N, the easier the attack • Hitting-set attacks • More accurate, need less information • Slower to implement • Sensitive to the model (e.g. Alice sends dummy messages with probability p) • Statistical disclosure attacks • Need more data • Very efficient to implement (vectorised) – faster partial results • Can be extended to more complex models (pool mix, replies, ...) • Bayesian modelling of the problem • A systematic approach to traffic analysis

  38. Traffic analysis is probabilistic inference (1) • Full generative model: • Pick a profile for Alice, Θ_Alice, and a profile for others, Θ_Other, from prior distributions (once!) • For each round: • Pick a multi-set of senders uniformly from the multi-sets of size K of possible senders • Pick a matching (permutation) M_1 of inputs to outputs uniformly at random • For each sender pick a receiver according to the probability in their respective profiles [Figure: round T1; Alice, with profile Θ_Alice, sends r_A1 and the others, with profile Θ_Other, send r_other through the anonymity system; matching M_1; observation O_1 = senders (uniform) + receivers (as above)]

  39. Traffic analysis is probabilistic inference (2) • Full generative model: • Pick a profile for Alice, Θ_Alice, and a profile for others, Θ_Other, from the prior distribution (once!) • For each round i: • Pick a multi-set of senders in O_i uniformly from the multi-sets of size K of possible senders • Pick a matching M_i uniformly at random • For each sender pick a receiver, r_Alice or r_other, according to the probability in their respective profiles [Figure: as before]

  40. Traffic analysis is probabilistic inference (3) • The Disclosure attack inference problem: • "Given all known and observed variables (priors, M, observations O_i) determine the distribution over the hidden random variables (profiles Θ_Alice, Θ_Other and matchings M_i)" • Using information from all rounds, assuming profiles are stable • How? (a) maths = Bayesian inference (b) computation = Gibbs sampling [Figure: as before]

  41. Let's apply probability theory • What are profiles? • Keep it simple: a multinomial distribution over all users. • E.g. Θ_Alice = [Bob: 0.4, Charlie: 0.3, Debbie: 0.3], Θ_Other = uniform (1/(N-1) each) • Prior distribution on multinomials: Dirichlet distribution, i.e. samples from a Dirichlet distribution are multinomial distributions • Other profile models are also possible: restrictions on number of contacts, cliques, mutual friends, … • Other quantities have known conditional probabilities. • What are those?
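A small sketch (my own illustration, assuming numpy is available) of what "the prior over profiles is a Dirichlet" means in practice: a draw from the Dirichlet is itself a multinomial profile, from which receivers can then be sampled. Sizes and the seed are arbitrary.

```python
# Sample a profile from a symmetric Dirichlet prior, then sample receivers from it.
import numpy as np

rng = np.random.default_rng(0)
n_receivers = 5
alpha = np.ones(n_receivers)                  # symmetric Dirichlet prior

theta_alice = rng.dirichlet(alpha)            # one sampled profile (sums to 1)
receivers = rng.choice(n_receivers, size=10, p=theta_alice)  # messages drawn from it
print(theta_alice, receivers)
```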

  42. Traffic analysis is probabilistic inference (4) • Full generative model, annotated with its conditional probabilities: • Pick a profile for Alice, Θ_Alice ~ Pr[Θ_Alice | prior], and a profile for others, Θ_Other ~ Pr[Θ_Other | prior] (once!) • For each round i: • Pick a multi-set of senders in O_i uniformly from the multi-sets of size K of possible senders: Pr[O_i | K] • Pick a matching M_i uniformly at random: Pr[M_i | M] • For each sender pick a receiver according to their profile: Pr[r_i^A | Θ_Alice, M_i, O_i] and Pr[r_i^other | Θ_Other, M_i, O_i]

  43. Re-arrange probabilities • What we have: Pr[Θ_Alice, Θ_Other, M_i, r_i^A, r_i^other | O_i, M, prior, K] = Pr[r_i^A | Θ_Alice, M_i, O_i] x Pr[r_i^other | Θ_Other, M_i, O_i] x Pr[M_i | M] x Pr[Θ_Alice | prior] x Pr[Θ_Other | prior] • What we really want: Pr[Θ_Alice, Θ_Other, M_i | r_i^A, r_i^other, O_i, M, prior, K]

  44. Revision: Bayes theorem • Derivation: Pr[A=a, B=b] = Pr[A=a | B=b] x Pr[B=b] = Pr[B=b | A=a] x Pr[A=a] • Meaning: Pr[A=a | B=b] (inverse probability) = Pr[B=b | A=a] (forward probability) x Pr[A=a] (prior) / Pr[B=b] (normalising factor) • Normalising factor: Pr[B=b] = Σ_a Pr[A=a, B=b] = Σ_a Pr[B=b | A=a] x Pr[A=a] • Large spaces = no direct computation
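A tiny numeric illustration (made-up numbers, my own example) of the inversion above: given a forward model Pr[B | A] and a prior Pr[A], the posterior Pr[A | B] is obtained by normalising over all values of A; it is this normalising sum that becomes intractable in large spaces.

```python
# Bayes by direct enumeration over a two-valued hidden variable A.
prior = {"alice_sent": 0.3, "other_sent": 0.7}          # Pr[A]
likelihood = {"alice_sent": 0.8, "other_sent": 0.2}     # Pr[B = bob_received | A]

evidence = sum(likelihood[a] * prior[a] for a in prior)  # Pr[B], the normalising factor
posterior = {a: likelihood[a] * prior[a] / evidence for a in prior}
print(posterior)                                         # Pr[A | B = bob_received]
```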

  45. Apply Bayes
• Pr[Θ_Alice, Θ_Other, M_i | r_i^A, r_i^other, O_i, M, prior, K] =
  Pr[r_i^A, r_i^other | Θ_Alice, Θ_Other, M_i, O_i, M, prior, K]  (A)
  x Pr[Θ_Alice, Θ_Other, M_i | O_i, M, prior, K]  (B)
  / Σ over all values of the hidden variables Θ_Alice, Θ_Other, M_i of (A x B)
• Large spaces = no direct computation

  46. The good, the bad and the ugly • Good: can compute the probability sought up to a multiplicative constant • Pr[Θ_Alice, Θ_Other, M_i | r_i^A, r_i^other, O_i, M, prior, K] ∝ Pr[r_i^A, r_i^other | Θ_Alice, Θ_Other, M_i, O_i, M, prior, K] x Pr[Θ_Alice, Θ_Other, M_i | O_i, M, prior, K] • Bad: forget about computing the constant – in general • Ugly: use computational methods to estimate quantities of interest for traffic analysis

  47. Quantities of Interest • Full hidden state: Θ_Alice, Θ_Other, M_i • Marginal probabilities & distributions • Pr[Alice -> Bob] – Are Alice and Bob friends? • M_x – Who is talking to whom at round x? • How can we get information about those? • Without computing the exact probability!

  48. Revision: Sampling • Consider a distribution Pr[A=a] • Direct computation of mean: E[A] = Σ_a Pr[A=a] · a • Computation of mean through sampling: • Samples A1, A2, …, An ~ A • E[A] ≈ (Σ_i Ai) / n

  49. Revision: Sampling (2) • Consider a distribution Pr[A=a] • Computation of mean through sampling: • Samples A1, A2, …, An ~ A • E[A] ≈ (Σ_i Ai) / n • More: • Estimates of variance • Estimate of percentiles – confidence intervals (0.5%, 2.5%, 50%, 97.5%, 99.5%)
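A short sketch (my own illustration, assuming numpy) of these sampling-based estimates: draw samples from some distribution standing in for A, then estimate its mean and the percentiles that give confidence intervals, with no closed-form computation of the distribution itself.

```python
# Monte Carlo estimates of the mean and of confidence-interval percentiles.
import numpy as np

rng = np.random.default_rng(1)
samples = rng.beta(2.0, 5.0, size=10_000)     # stand-in for samples A1, ..., An

mean_estimate = samples.mean()
percentiles = np.percentile(samples, [0.5, 2.5, 50, 97.5, 99.5])
print(mean_estimate, percentiles)
```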

  50. Traffic analysis is probabilistic inference (5) • The Disclosure attack inference problem: • "Given all known and observed variables (priors, M, observations O_i) determine the distribution over the hidden random variables (profiles Θ_Alice, Θ_Other and matchings M_i)" • Using information from all rounds, assuming profiles are stable • How? (a) maths = Bayesian inference (b) computation = Gibbs sampling, i.e. drawing samples of the hidden state [Figure: as before, with the hidden profiles and matching now sampled]
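To make the Gibbs-sampling idea concrete, here is a heavily simplified sketch (my own illustration, not the full model of the slides): the Other profile is fixed to uniform, each round contains exactly one message from Alice, and the sampler alternates between guessing which observed receiver was Alice's and resampling her profile from its conjugate Dirichlet posterior. Assumes numpy; all data below are hypothetical.

```python
import numpy as np

def gibbs_disclosure(rounds, n_receivers, alpha=1.0, iters=2000, rng=None):
    rng = rng or np.random.default_rng(0)
    theta = rng.dirichlet(np.full(n_receivers, alpha))   # initial guess for Alice's profile
    posterior_mean = np.zeros(n_receivers)
    for _ in range(iters):
        counts = np.zeros(n_receivers)
        for receivers in rounds:                          # (a) which receiver was Alice's?
            weights = theta[receivers]                    # Other is uniform, so it cancels
            probs = weights / weights.sum()
            chosen = receivers[rng.choice(len(receivers), p=probs)]
            counts[chosen] += 1
        theta = rng.dirichlet(np.full(n_receivers, alpha) + counts)  # (b) conjugate update
        posterior_mean += theta
    return posterior_mean / iters                         # marginal estimate of Pr[Alice -> receiver]

# Hypothetical observations: rounds in which Alice sent one message each.
rounds = [np.array(r) for r in ([15, 13, 14, 5, 9], [19, 10, 17, 13, 8], [0, 7, 0, 13, 5])]
print(gibbs_disclosure(rounds, n_receivers=20).round(3))
```

Averaging the sampled profiles over iterations is how the marginal quantities of interest from slide 47 (e.g. Pr[Alice -> Bob]) are estimated without ever computing the normalising constant.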
