Understand how implicit identifiers such as SSIDs, network destinations, broadcast packet sizes, and MAC protocol fields can compromise user privacy in 802.11 environments. Learn about the potential risks and implications associated with user fingerprinting methods.
802.11 User Fingerprinting Jeff Pang, Ben Greenstein, Ramki Gummadi, Srini Seshan, and David Wetherall Most slides borrowed from Ben
Location Privacy is at Risk Your MAC address: 00:0E:35:CE:1F:59 Usually < 100m You “The adversary” (a.k.a., some dude with a laptop)
MAC address now: 00:0E:35:CE:1F:59 MAC address later: 00:AA:BB:CC:DD:EE Are pseudonyms enough?
Implicit Identifiers Remain • Consider one user at SIGCOMM 2004 • Visible in an “anonymized” trace • MAC addresses scrubbed • Effectively a pseudonym • Transferred 512MB via BitTorrent • => Crappy performance for everyone else • Let’s call him Bob • Can we figure out who Bob is?
Implicit Identifier: SSIDs • SSIDs in Probe Requests • Windows XP, Mac OS X probe for your preferred networks by default • Set of networks advertised in a traffic sample • Determined by a user’s preferred networks list SSID Probe: “roofnet” Bob
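A minimal sketch of how an observer might turn probe requests into this identifier, assuming the capture has already been parsed into hypothetical `(mac, ssid)` tuples (the parsing step and the second SSID in the example are illustrative, not from the slides):

```python
from collections import defaultdict

def ssid_sets(probes):
    """Group the SSIDs seen in probe requests by source MAC (or pseudonym)."""
    seen = defaultdict(set)
    for mac, ssid in probes:
        if ssid:  # an empty SSID is a wildcard probe and names no network
            seen[mac].add(ssid)
    return dict(seen)

# Hypothetical parsed probe requests: (source MAC, probed SSID).
probes = [
    ("00:0E:35:CE:1F:59", "roofnet"),
    ("00:0E:35:CE:1F:59", "linksys"),
    ("00:AA:BB:CC:DD:EE", "roofnet"),
    ("00:AA:BB:CC:DD:EE", ""),
]
sets = ssid_sets(probes)
print(sets)
```

Both MAC addresses probe for “roofnet”, which is the kind of overlap that can link two pseudonyms to the same user.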
What if Bob used pseudonyms? • “roofnet” probe occurred during different session than BitTorrent download • Can no longer explicitly associate “roofnet” with poor network etiquette • Can we do it implicitly?
Implicit Identifier: Network Destinations • Network Destinations • Set of IP <address, port> pairs in a traffic sample • In SIGCOMM, each visited by 1.15 users on average • A user is likely to visit a site repeatedly (e.g., an email server) SSH/IMAP server: 159.16.40.45 Bob
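The “users per destination” statistic can be sketched as follows. The flow records, user labels, and the 192.0.2.1 address are made up for illustration; only the idea of counting distinct users per IP <address, port> pair comes from the slide.

```python
from collections import defaultdict

def users_per_destination(flows):
    """Average number of distinct users that contact each (ip, port) pair."""
    visitors = defaultdict(set)
    for user, ip, port in flows:
        visitors[(ip, port)].add(user)
    return sum(len(users) for users in visitors.values()) / len(visitors)

# Hypothetical flows: (user, destination IP, destination port).
flows = [
    ("bob", "159.16.40.45", 22),     # SSH
    ("bob", "159.16.40.45", 143),    # IMAP
    ("bob", "192.0.2.1", 80),
    ("alice", "192.0.2.1", 80),
]
avg_users = users_per_destination(flows)
print(avg_users)
```

A value close to 1, such as the 1.15 reported for SIGCOMM, means most destinations are visited by a single user, so a destination alone often singles out that user.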
What if network is encrypted? • Can’t see IP addresses through link-layer encryption like WPA • Is Bob safe now?
Implicit Identifier: Broadcast Packet Sizes • Broadcast Packet Sizes • Set of 802.11 broadcast packet sizes in a traffic sample • E.g., Windows machines’ NetBIOS naming advertisements; FileMaker and Microsoft Office advertise themselves • In SIGCOMM, only 16% more unique <application, size> tuples than unique sizes Broadcast packet sizes: 239, 245, 257 Bob
Implicit Identifier: MAC Protocol Fields • MAC Protocol Fields • Header bits (e.g., power mgmt., order) • Supported rates • Offered authentication algorithms MAC Protocol Fields: 11, 4, 2, 1 Mbps, WEP, etc. Bob
What else do implicit identifiers tell us? • Anonymized 802.11 traces from SIGCOMM 2004 • A pseudonym • Search on Wigle for “djw” in the Seattle area • Google pinpoints David’s home (to within 200 ft) • David J. Wetherall
Automating Implicit Identifiers • TRAINING: Collect some traffic known to be from Bob • OBSERVATION: Which traffic is from Bob?
Methodology • Simulate using SIGCOMM, UCSD traces • Split trace into training data and observation data • Sample = 1 hour of traffic to/from a user • Assume pseudonyms “The adversary”
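One way to picture the sampling step is the sketch below. The `(timestamp, user, feature)` record layout and the time-based cutoff between training and observation data are assumptions for illustration; the slides do not say how the trace was split.

```python
from collections import defaultdict

SAMPLE_SECONDS = 3600  # one sample = 1 hour of traffic to/from a user

def hourly_samples(packets):
    """Bucket (timestamp_seconds, user, feature) records into per-user hourly feature sets."""
    samples = defaultdict(set)
    for ts, user, feature in packets:
        samples[(user, int(ts // SAMPLE_SECONDS))].add(feature)
    return dict(samples)

def split(samples, cutoff_hour):
    """Assumed split: hours before the cutoff train the profiles, the rest are observed."""
    training = {k: v for k, v in samples.items() if k[1] < cutoff_hour}
    observation = {k: v for k, v in samples.items() if k[1] >= cutoff_hour}
    return training, observation

# Hypothetical records: (seconds since trace start, user, observed feature).
packets = [
    (10, "bob", "roofnet"),
    (3599, "bob", "linksys"),
    (3600, "bob", "roofnet"),
    (7300, "alice", "SIGCOMM_1"),
]
samples = hourly_samples(packets)
training, observation = split(samples, cutoff_hour=1)
print(training, observation)
```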
Did this traffic sample come from Bob? Naïve Bayesian Classifier: We say sample s (with features fi) is from Bob if Pr[s from Bob | s has features fi] > T How to convert implicit identifiers into features?
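The decision rule can be sketched with Bayes' rule under the naïve assumption that features are independent. The prior and the per-feature likelihoods below are invented numbers, and the helper names are hypothetical; the slides do not give the authors' estimation procedure.

```python
import math

def posterior(prior, likelihoods_bob, likelihoods_other):
    """Pr[s from Bob | features], assuming features are conditionally independent.

    likelihoods_bob[i]   = Pr[feature i | sample is from Bob]
    likelihoods_other[i] = Pr[feature i | sample is not from Bob]
    All probabilities must be strictly between 0 and 1.
    """
    log_bob = math.log(prior) + sum(math.log(p) for p in likelihoods_bob)
    log_other = math.log(1 - prior) + sum(math.log(p) for p in likelihoods_other)
    # Normalize in log space so many small factors do not underflow.
    m = max(log_bob, log_other)
    num = math.exp(log_bob - m)
    return num / (num + math.exp(log_other - m))

def is_from_bob(prior, likelihoods_bob, likelihoods_other, threshold):
    """Apply the slide's rule: accept if the posterior exceeds the threshold T."""
    return posterior(prior, likelihoods_bob, likelihoods_other) > threshold

# Illustrative numbers only: Bob is 1 of 25 users, two features observed.
p = posterior(0.04, [0.9, 0.8], [0.05, 0.1])
print(p, is_from_bob(0.04, [0.9, 0.8], [0.05, 0.1], threshold=0.5))
```

Raising T trades true positives for fewer false positives, which is the knob behind the TPR/FPR figures on the accuracy slides.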
Did This Traffic Sample Come from Bob? • Features: set similarity (Jaccard Index), weighted by frequency • [Figure: SSIDs ordered from rare to common (djw, linksys, IR_Guest, SIGCOMM_1), comparing the sample for validation with the profile from training]
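A frequency-weighted Jaccard index might look like the sketch below. The weighting (1 divided by the number of users who emit an element) and the user counts are assumptions chosen so that rare elements such as “djw” count more than common ones such as “SIGCOMM_1”; the slide does not state the exact weighting used.

```python
def weighted_jaccard(profile, sample, weight):
    """Sum of weights over the intersection divided by sum of weights over the union."""
    union = profile | sample
    if not union:
        return 0.0
    shared = profile & sample
    return sum(weight(e) for e in shared) / sum(weight(e) for e in union)

# Hypothetical counts of how many users emit each SSID.
user_counts = {"djw": 1, "linksys": 50, "IR_Guest": 5, "SIGCOMM_1": 100}

def rarity(ssid):
    """Assumed weight: rarer SSIDs are more identifying."""
    return 1.0 / user_counts[ssid]

profile = {"djw", "linksys", "SIGCOMM_1"}   # profile from training
sample = {"djw", "SIGCOMM_1", "IR_Guest"}   # sample for validation

score = weighted_jaccard(profile, sample, rarity)
plain = weighted_jaccard(profile, sample, lambda _: 1.0)  # unweighted Jaccard
print(score, plain)
```

With these made-up counts the weighted score is higher than the plain Jaccard index because the match on the rare “djw” dominates the mismatch on more common names.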
Individual Feature Accuracy • 60% TPR with 99% FPR • Higher FPR, likely due to not being user specific • Useful in combination with other features, to rule out identities
Multi-feature Accuracy • Samples from 1 in 4 users are identified >50% of the time with 0.001 FPR bcast + ssids + fields + netdests bcast + ssids + fields bcast + ssids
Was Bob here today? • Maybe… • Suppose N users present • Over an 8 hour day, 8*N opportunities to misclassify a user’s traffic • Instead, say Bob is present iff multiple samples are classified as his
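The effect of requiring multiple matching samples can be worked through with a binomial tail. This is a sketch under a loud assumption: each of the 8*N samples is misclassified as Bob's independently with the same per-sample FPR. The values N = 25 and FPR = 0.001 are taken from elsewhere in the slides for illustration.

```python
from math import comb

def prob_at_least(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p): chance of k or more false matches."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

N = 25              # concurrent users
n = 8 * N           # hourly samples over an 8 hour day
fpr = 0.001         # assumed per-sample false positive rate

one_match = prob_at_least(1, n, fpr)   # "present" after a single matching sample
two_matches = prob_at_least(2, n, fpr) # "present" only after two matching samples
print(one_match, two_matches)
```

Under these assumptions, requiring two matches instead of one cuts the chance of wrongly concluding that Bob was present from roughly 18% to under 2%.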
Was Bob here today? • In a busy coffee shop with 25 concurrent users, more than half (54%) can be identified with 90% accuracy • 4 hour median to detect (4 samples) • 27% can be identified with 99% accuracy (two 9s)
Conclusion: Pseudonyms Are Insufficient • 4 new identifiers: netdests, ssids, fields, bcast • Average user emits highly distinguishing identifiers • Adversary can combine features • Future • Uncover more identifiers (timing, etc.) • Validate on longer/more diverse traces (SSIDs stable in home setting for >=2 weeks) • Build a better link layer