Privacy Week 7 - February 28, March 2
Administrivia • Assign roles for class debate
Privacy policies • Policies let consumers know about a site’s privacy practices • Consumers can decide whether practices are acceptable and when to opt out • Presence increases consumer trust • Make companies subject to FTC privacy-related enforcement • Rapid adoption 1998-2001* * G.R. Milne and M.J. Culnan 2002. Using the Content of Online Privacy Notices to Inform Public Policy: A Longitudinal Analysis of the 1998-2001 US Web Surveys. The Information Society 18, 5, 345-359.
How are online privacy concerns different from offline privacy concerns?
Web privacy concerns • Data is often collected silently • Web allows large quantities of data to be collected inexpensively and unobtrusively • Data from multiple sources may be merged • Non-identifiable information can become identifiable when merged • Data collected for business purposes may be used in civil and criminal proceedings • Users given no meaningful choice • Few sites offer alternatives
Browser chatter • Browsers chatter about: IP address, domain name, organization; referring page; platform (O/S, browser); what information is requested (URLs and search terms); cookies • To anyone who might be listening: end servers; system administrators; Internet Service Providers; other third parties (advertising networks); anyone who might subpoena log files later
Typical HTTP request with cookie
GET /retail/searchresults.asp?qu=beer HTTP/1.0
Referer: http://www.us.buy.com/default.asp
User-Agent: Mozilla/4.75 [en] (X11; U; NetBSD 1.5_ALPHA i386)
Host: www.us.buy.com
Accept: image/gif, image/jpeg, image/pjpeg, */*
Accept-Language: en
Cookie: buycountry=us; dcLocName=Basket; dcCatID=6773; dcLocID=6773; dcAd=buybasket; loc=; parentLocName=Basket; parentLoc=6773; ShopperManager%2F=ShopperManager%2F=66FUQULL0QBT8MMTVSC5MMNKBJFWDVH7; Store=107; Category=0
Referer log problems • GET methods result in values in URL • These URLs are sent in the referer header to next host • Example: http://www.merchant.com/cgi_bin/order?name=Tom+Jones&address=here+there&credit+card=234876923234&PIN=1234&->index.html • Access log example
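To make the referer problem concrete, here is a minimal sketch of what the next host could recover from the example URL once it arrives in the Referer header. The function name `leaked_fields` is made up for illustration, and the trailing arrow to index.html from the slide is left out of the URL.

```python
from urllib.parse import urlsplit, parse_qs

def leaked_fields(referer: str) -> dict:
    """Return the form fields readable from a Referer URL's query string."""
    query = urlsplit(referer).query
    # parse_qs decodes '+' as a space and returns a list of values per field;
    # each field appears once here, so keep the first value.
    return {name: values[0] for name, values in parse_qs(query).items()}

referer = ("http://www.merchant.com/cgi_bin/order"
           "?name=Tom+Jones&address=here+there"
           "&credit+card=234876923234&PIN=1234")
fields = leaked_fields(referer)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Any host that receives this header, and anyone who later reads its logs, sees the same fields.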
Cookies • What are cookies? • Why are people concerned about cookies? • What useful purposes do cookies serve?
Cookies 101 • Cookies can be useful • Used like a staple to attach multiple parts of a form together • Used to identify you when you return to a web site so you don’t have to remember a password • Used to help web sites understand how people use them • Cookies can do unexpected things • Used to profile users and track their activities, especially across web sites
How cookies work – the basics • A cookie stores a small string of characters • A web site asks your browser to “set” a cookie • Whenever you return to that site your browser sends the cookie back automatically • First visit to site: site tells browser “Please store cookie xyzzy” • Later visits: browser tells site “Here is cookie xyzzy”
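The set-and-replay exchange can be sketched with Python's standard `http.cookies` module. This is an illustration only: the cookie name `id` is an assumption (the slide gives only the value xyzzy), and a plain dict stands in for the browser's cookie store.

```python
from http.cookies import SimpleCookie

# First visit: the site's response carries "Set-Cookie: id=xyzzy".
# The name "id" is assumed for illustration; only the value comes from the slide.
set_cookie_value = "id=xyzzy"

# The browser parses the header and stores the cookie.
jar = {}
received = SimpleCookie()
received.load(set_cookie_value)
for name, morsel in received.items():
    jar[name] = morsel.value

# Later visits: the browser sends every stored cookie back in a Cookie header.
cookie_header = "; ".join(f"{name}={value}" for name, value in jar.items())
print("Cookie:", cookie_header)
```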
How cookies work – advanced • Cookies are only sent back to the “site” that set them – but this may be any host in domain • Sites setting cookies indicate path, domain, and expiration for cookies • Cookies can store user info or a database key that is used to look up user info – either way the cookie enables info to be linked to the current browsing session • Scope examples: “Send me with requests for index.html on y.x.com for this session only”; “Send me with any request to x.com until 2008” • Cookie storing user info: User=Joe Email=Joe@x.com Visits=13 • Cookie storing a database key: User=4576904309 • Database: Users … Email … Visits …
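The two scopes on this slide can be sketched as a small matching function. This is a simplified model, not the full browser algorithm: path matching is reduced to a prefix test, the `Cookie` class and `should_send` function are hypothetical names, and "until 2008" is assumed to mean an expiration of January 1, 2008.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Cookie:
    name: str
    value: str
    domain: str                   # "x.com" covers x.com and every host under it
    path: str                     # "/" covers every path on matching hosts
    expires: Optional[datetime]   # None means a session cookie

def should_send(cookie: Cookie, host: str, path: str,
                now: datetime, session_open: bool = True) -> bool:
    """Decide whether the browser replays this cookie with a request."""
    domain_ok = host == cookie.domain or host.endswith("." + cookie.domain)
    path_ok = path.startswith(cookie.path)      # simplified prefix match
    if cookie.expires is None:
        alive = session_open                    # session cookie: dies with the session
    else:
        alive = now < cookie.expires            # persistent cookie: dies at expiration
    return domain_ok and path_ok and alive

# Narrow scope: index.html on y.x.com, this session only.
narrow = Cookie("User", "Joe", "y.x.com", "/index.html", None)
# Broad scope: any request to x.com until 2008 (assumed to be Jan 1, 2008).
broad = Cookie("User", "4576904309", "x.com", "/", datetime(2008, 1, 1))

now = datetime(2006, 3, 1)
print(should_send(narrow, "y.x.com", "/index.html", now))
print(should_send(narrow, "www.x.com", "/index.html", now))
print(should_send(broad, "www.x.com", "/shop", now))
```

Note that a request for index.html on y.x.com matches both cookies, so the narrow and the broad cookie travel together.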
Cookie terminology • Cookie Replay – sending a cookie back to a site • Session cookie – cookie replayed only during current browsing session • Persistent cookie – cookie replayed until expiration date • First-party cookie – cookie associated with the site the user requested • Third-party cookie – cookie associated with an image, ad, frame, or other content from a site with a different domain name that is embedded in the site the user requested • Browser interprets third-party cookie based on domain name, even if both domains are owned by the same company
Web bugs • Invisible “images” (1-by-1 pixels, transparent) embedded in web pages that cause referer info and cookies to be transferred • Also called web beacons, clear gifs, tracker gifs, etc. • Work just like banner ads from ad networks, but you can’t see them unless you look at the code behind a web page • Also embedded in HTML formatted email messages, MS Word documents, etc. • For software to detect web bugs see: http://www.bugnosis.org
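Looking "at the code behind a web page" can be automated. The sketch below flags 1-by-1 images served from a different host than the page. It is a rough heuristic written for illustration, not how Bugnosis works; the class name and the example hosts are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class WebBugFinder(HTMLParser):
    """Collect <img> tags that are 1x1 pixels and come from another host."""

    def __init__(self, page_host: str):
        super().__init__()
        self.page_host = page_host
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        tiny = a.get("width") == "1" and a.get("height") == "1"
        # Relative URLs have no hostname, so they count as first-party.
        host = urlsplit(a.get("src") or "").hostname
        third_party = host is not None and host != self.page_host
        if tiny and third_party:
            self.suspects.append(a["src"])

# Hypothetical page with one ordinary image and one web bug.
page = (
    '<html><body>'
    '<img src="/logo.gif" width="120" height="40">'
    '<img src="http://tracker.example/bug.gif?page=home" width="1" height="1">'
    '</body></html>'
)
finder = WebBugFinder("www.shop.example")
finder.feed(page)
print(finder.suspects)
```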
How data can be linked • Every time the same cookie is replayed to a site, the site may add information to the record associated with that cookie • Number of times you visit a link, time, date • What page you visit • What page you visited last • Information you type into a web form • If multiple cookies are replayed together, they are usually logged together, effectively linking their data • Narrow scoped cookie might get logged with broad scoped cookie
Ad networks • Search service: user searches for medical information; the ad on the page sets a cookie • CD store: user buys a CD; the ad on the page replays the cookie • Ad company can get your name and address from the CD order and link them to your search
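The search-then-purchase scenario can be simulated in a few lines. This is a toy model under stated assumptions: the `AdNetwork` class, the cookie format, and the example hosts are all hypothetical, and the personal data reaches the ad network through the referer URL, as in the earlier referer example.

```python
from collections import defaultdict
from typing import Optional

class AdNetwork:
    """Toy ad server that logs each ad request under the cookie sent with it."""

    def __init__(self):
        self.next_id = 1
        self.profiles = defaultdict(list)   # cookie value -> list of referers

    def serve_ad(self, cookie: Optional[str], referer: str) -> str:
        if cookie is None:
            # First contact: set a new cookie.
            cookie = f"uid{self.next_id}"
            self.next_id += 1
        # Every request with this cookie adds to the same record.
        self.profiles[cookie].append(referer)
        return cookie

network = AdNetwork()

# Ad on the search results page: no cookie yet, so one is set.
cookie = network.serve_ad(None, "http://search.example/results?q=medical+condition")

# Ad on the CD store's order page: the same cookie is replayed.
cookie = network.serve_ad(
    cookie, "http://cdstore.example/order?name=Tom+Jones&address=here+there")

for referer in network.profiles[cookie]:
    print(cookie, referer)
```

Both log entries share one cookie value, so the name and address from the order sit next to the medical search in a single profile.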
What ad networks may know… • Personal data: email address; full name; mailing address (street, city, state, and Zip code); phone number • Transactional data: details of plane trips; search phrases used at search engines; health conditions • “It was not necessary for me to click on the banner ads for information to be sent to DoubleClick servers.” – Richard M. Smith
Online and offline merging • In November 1999, DoubleClick purchased Abacus Direct, a company possessing detailed consumer profiles on more than 90% of US households. • In mid-February 2000 DoubleClick announced plans to merge “anonymous” online data with personal information obtained from offline databases • By the first week in March 2000 the plans were put on hold • Stock dropped from $125 (12/99) to $80 (03/00)
Offline data goes online… The Cranor family’s 25 most frequent grocery purchases (sorted by nutritional value)!
Subpoenas • Data on online activities is increasingly of interest in civil and criminal cases • The only way to avoid subpoenas is to not have data • In the US, your files on your computer in your home have much greater legal protection than your files stored on a server on the network
P3P: Introduction Original Idea behind P3P • A framework for automated privacy discussions • Web sites disclose their privacy practices in standard machine-readable formats • Web browsers automatically retrieve P3P privacy policies and compare them to users’ privacy preferences • Sites and browsers can then negotiate about privacy terms
P3P history • Idea discussed at November 1995 FTC meeting • Ad Hoc “Internet Privacy Working Group” convened to discuss the idea in Fall 1996 • W3C began working on P3P in Summer 1997 • Several working groups chartered with dozens of participants from industry, non-profits, academia, government • Numerous public working drafts issued, and feedback resulted in many changes • Early ideas about negotiation and agreement ultimately removed • Automatic data transfer added and then removed • Patent issue stalled progress, but ultimately became non-issue • P3P issued as official W3C Recommendation on April 16, 2002 • http://www.w3.org/TR/P3P/
P3P1.0 – A first step • Offers an easy way for web sites to communicate about their privacy policies in a standard machine-readable format • Can be deployed using existing web servers • This will enable the development of tools that: • Provide snapshots of sites’ policies • Compare policies with user preferences • Alert and advise the user
The basics • P3P provides a standard XML format that web sites use to encode their privacy policies • Sites also provide XML “policy reference files” to indicate which policy applies to which part of the site • Sites can optionally provide a “compact policy” by configuring their servers to issue a special P3P header when cookies are set • No special server software required • User software to read P3P policies called a “P3P user agent”
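A policy reference file maps parts of a site to policies. The sketch below shows one way a user agent might read such a file. The element names and namespace follow the P3P 1.0 format as best understood here; the policy file names and paths are hypothetical, and Python's `fnmatchcase` is only an approximation of P3P wildcard matching.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

NS = {"p3p": "http://www.w3.org/2002/01/P3Pv1"}

# Hypothetical policy reference file for illustration.
POLICY_REFERENCE_FILE = """\
<META xmlns="http://www.w3.org/2002/01/P3Pv1">
  <POLICY-REFERENCES>
    <POLICY-REF about="/w3c/catalog-policy.xml#catalog">
      <INCLUDE>/catalog/*</INCLUDE>
    </POLICY-REF>
    <POLICY-REF about="/w3c/policy.xml#general">
      <INCLUDE>/*</INCLUDE>
      <EXCLUDE>/private/*</EXCLUDE>
    </POLICY-REF>
  </POLICY-REFERENCES>
</META>
"""

def policy_for(path, xml_text=POLICY_REFERENCE_FILE):
    """Return the policy URI covering `path`, or None if no policy applies."""
    root = ET.fromstring(xml_text)
    # References are checked in document order; the first one that applies wins.
    for ref in root.iterfind(".//p3p:POLICY-REF", NS):
        included = any(fnmatchcase(path, e.text)
                       for e in ref.findall("p3p:INCLUDE", NS))
        excluded = any(fnmatchcase(path, e.text)
                       for e in ref.findall("p3p:EXCLUDE", NS))
        if included and not excluded:
            return ref.get("about")
    return None

print(policy_for("/catalog/item1.html"))
print(policy_for("/index.html"))
print(policy_for("/private/account.html"))
```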
P3P: Enabling your web site – overview and options What’s in a P3P policy? • Name and contact information for site • The kind of access provided • Mechanisms for resolving privacy disputes • The kinds of data collected • How collected data is used, and whether individuals can opt-in or opt-out of any of these uses • Whether/when data may be shared and whether there is opt-in or opt-out • Data retention policy
A simple HTTP transaction • Browser to web server, request web page: GET /index.html HTTP/1.1 Host: www.att.com . . . • Web server to browser, send web page: HTTP/1.1 200 OK Content-Type: text/html . . .
… with P3P 1.0 added • Request policy reference file: GET /w3c/p3p.xml HTTP/1.1 Host: www.att.com • Send policy reference file • Request P3P policy • Send P3P policy • Request web page: GET /index.html HTTP/1.1 Host: www.att.com . . . • Send web page: HTTP/1.1 200 OK Content-Type: text/html . . .
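The request order in this transaction can be sketched as a function that lists the URLs a P3P user agent would fetch. Only the /w3c/p3p.xml location comes from the slide; the function name and the policy URI passed in are hypothetical.

```python
from urllib.parse import urljoin, urldefrag

def p3p_request_sequence(page_url: str, policy_ref_about: str) -> list:
    """List the URLs fetched, in order, before and including the page itself."""
    # 1. Policy reference file at the well-known location on the same host.
    reference_file = urljoin(page_url, "/w3c/p3p.xml")
    # 2. The policy it points to; the #fragment names a policy inside the file
    #    and is not sent to the server.
    policy_file = urldefrag(urljoin(reference_file, policy_ref_about)).url
    # 3. The page the user asked for.
    return [reference_file, policy_file, page_url]

sequence = p3p_request_sequence("http://www.att.com/index.html",
                                "/w3c/policy.xml#general")
for url in sequence:
    print("GET", url)
```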
P3P in IE6 • Automatic processing of compact policies only; third-party cookies without compact policies blocked by default • Privacy icon on status bar indicates that a cookie has been blocked – pop-up appears the first time the privacy icon appears
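The default rule stated here can be sketched as follows. This models only that one rule: a real browser also evaluates what the compact policy tokens say, which is not attempted. The function names are hypothetical, and the header format (a `CP="..."` field inside the P3P response header) is the compact policy syntax as best understood here.

```python
import re
from typing import Optional

def compact_policy_tokens(p3p_header: Optional[str]) -> list:
    """Extract the compact policy tokens from a P3P response header value."""
    match = re.search(r'CP\s*=\s*"([^"]*)"', p3p_header or "")
    return match.group(1).split() if match else []

def accept_cookie(is_third_party: bool, p3p_header: Optional[str]) -> bool:
    """Block third-party cookies that arrive without a compact policy."""
    if is_third_party and not compact_policy_tokens(p3p_header):
        return False
    return True

header = 'policyref="/w3c/p3p.xml", CP="NOI DSP COR"'
print(compact_policy_tokens(header))
print(accept_cookie(is_third_party=True, p3p_header=None))
print(accept_cookie(is_third_party=True, p3p_header=header))
```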
Users can click on privacy icon for list of cookies; privacy summaries are available at sites that are P3P-enabled
Privacy summary report is generated automatically from full P3P policy
P3P in Netscape 7 • Preview version similar to IE6, focusing on cookies; cookies without compact policies (both first-party and third-party) are “flagged” rather than blocked by default • Screenshot callout: indicates flagged cookie
Users can view English translation of (part of) compact policy in Cookie Manager
A policy summary can be generated automatically from full P3P policy
Privacy Bird • Free download of beta from http://privacybird.com/ • Originally developed at AT&T Labs • Released as open source • “Browser helper object” for IE6 • Reads P3P policies at all P3P-enabled sites automatically • Bird icon at top of browser window indicates whether site matches user’s privacy preferences • Clicking on bird icon gives more information
Privacy Finder • Prototype developed at AT&T Labs, improved and deployed by CUPS • Uses Google or Yahoo! API to retrieve search results • Checks each result for P3P policy • Evaluates P3P policy against user’s preferences • Reorders search results • Composes search result page with privacy annotations next to each P3P-enabled result • Users can retrieve “Privacy Report” similar to Privacy Bird policy summary
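The check, evaluate, reorder, and annotate steps can be sketched as a small pipeline. The slide does not say how results are reordered, so the ranking below (matching sites first, sites without P3P next, conflicting sites last) is purely an assumption, as are the `Result` class, the label text, and the example URLs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Result:
    url: str
    # True/False: P3P policy found and compared with the user's preferences.
    # None: no P3P policy found for this result.
    policy_matches: Optional[bool]

def annotate_and_reorder(results):
    """Return (url, annotation) pairs; only P3P-enabled results are annotated."""
    def rank(result):
        # Assumed ordering, not taken from the slide.
        if result.policy_matches is True:
            return 0
        if result.policy_matches is None:
            return 1
        return 2

    labels = {True: "matches your preferences",
              False: "conflicts with your preferences",
              None: None}
    # sorted() is stable, so the search engine's order is kept within each rank.
    return [(r.url, labels[r.policy_matches]) for r in sorted(results, key=rank)]

search_results = [
    Result("http://a.example/", False),
    Result("http://b.example/", None),
    Result("http://c.example/", True),
    Result("http://d.example/", True),
]
page = annotate_and_reorder(search_results)
for url, note in page:
    print(url, f"[{note}]" if note else "")
```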
Is Privacy Finder useful? • Do users care about web site privacy? • Have enough web sites adopted P3P that typical search results contain sites with P3P policies? • Do users have meaningful choices among privacy policies? • Do users understand information provided by Privacy Finder? • Does Privacy Finder influence online purchasing decisions?
Have enough sites adopted P3P? • We weren’t sure, so we did a study…. • Draft paper at http://lorrie.cranor.org/pubs/www06.pdf • Previous studies examined lists of “most popular” web sites for P3P adoption, but this gives incomplete picture