200 likes | 322 Vues
Private Communications Issues in the Social Web era. CS315 – Web Search and Data Mining. The AOL search data release. By the way, …. … search companies log your searches …. Privacy concerns. Data is often collected silently
 
                
                E N D
Private Communications Issues in the Social Web era CS315 – Web Search and Data Mining
By the way, … … search companies log your searches …
Privacy concerns • Data is often collected silently • Web allows large quantities of data to be collected inexpensively and unobtrusively • Data from multiple sources may be merged • Non-identifiable information can become identifiable when merged • Data collected for business purposes may be used in civil and criminal proceedings • Users given no meaningful choice • Few sites offer alternatives
Browser Chatter • Browsers chatter about • IP address, domain name, organization, • Referring page • Platform: O/S, browser • What information is requested • URLs and search terms • Cookies • To anyone who might be listening • End servers • System administrators • Internet Service Providers • Other third parties • Advertising networks • Anyone who might subpoena log files later
Cookies 101 • Cookies can be useful • Used like a staple to attach multiple parts of a form together • Used to identify you when you return to a web site so you don’t have to remember a password • Used to help web sites understand how people use them • Cookies can do unexpected things • Used to profile users and track their activities, especially across web sites
How cookies work – the basics • A cookie stores a small string of characters • A web site asks your browser to “set” a cookie • Whenever you return to that site your browser sends the cookie back automatically Please store cookie xyzzy Here is cookie xyzzy site browser site browser First visit to site Later visits
How cookies work – advanced • Cookies are only sent back to the “site” that set them – but this may be any host in domain • Sites setting cookies indicate path, domain, and expiration for cookies • Cookies can store user info or a database key that is used to look up user info Send me with requests for index.html on y.x.com for this session only Send me with any request to x.com until 2008 DatabaseUsers … Email … Visits … User=Joe Email=Joe@x.com Visits=13 User=4576904309
Cookie terminology • Cookie Replay – sending a cookie back to a site • Session cookie – cookie replayed only during current browsing session • Persistent cookie – cookie replayed until expiration date • First-party cookie – cookie associated with the site the user requested • Third-party cookie – cookie associated with an image, ad, frame, or other content from a site with a different domain name that is embedded in the site the user requested • Browser interprets third-party cookie based on domain name, even if both domains are owned by the same company
“Web bugs” • Invisible “images” (1-by-1 pixels, transparent) embedded in web pages and cause referrer info and cookies to be transferred • Also called web beacons, clear gifs, tracker gifs,etc. • Work just like banner ads from ad networks, but you can’t see them unless you look at the page source • Also embedded in HTML formatted email messages, MS Word documents, etc.
search for medical information buy CD replay cookie set cookie Random Ad Ad networks Medical Ad Ad companycan get your name & address from CD order and link them to your search (This is NOT how Google Ads work) Search Service CD Store
What ad networks may know… • Personal data: • Email address • Full name • Mailing address (street, city, state, and Zip code) • Phone number • Transactional data: • Details of plane trips • Search phrases used at search engines • Health conditions “It was not necessary for me to click on the banner ads for information to be sent to DoubleClick servers.” – Richard M. Smith
Offline data goes online… • The Stop and Shop grocery store began posting purchase information for customers who had frequent shopper cards • The Cranor family’s 25 most frequentgrocery purchases (sorted by nutritional value)!
Spyware • Spyware: Software that employs a user's Internet connection, without their knowledge or explicit permission, to collect info • Most products use pseudonymous, but unique ID • Over 800 known freeware and shareware products contain Spyware, for example: • Beeline Search Utility • GoZilla Download Manager • Comet Cursor • Often difficult to uninstall! • Anti-Spyware Sites: • http://grc.com/oo/spyware.htm • http://www.adcop.org/smallfish • http://www.spychecker.com • http://cexx.org/adware.htm
Devices that monitor you Creative Labs Nomad JukeBox Music transfer software reportsall uploads to Creative Labs. http://www.nomadworld.com Sony eMarker Lets you figure out the artitst and title of songs you hear on the radio. And keeps a personal log of all the music you like on the emarker Web site. http://www.emarker.com Sportbrain Monitors daily workout. Customphone cradle uploads data to company Web site for analysis. http://www.sportbrain.com/ :CueCat Keeps personal log of advertisements you‘re interested in. http://www.crq.com/cuecat.html See http://www.privacyfoundation.org/
HTTP request that sets a Cookie Web Page Request Header GET /models/model_overview.asp?ModelName=S2000 HTTP/1.1 Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */* Referer: http://www.google.com/search?hl=en&q=s2000 Accept-Language: en-us User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) Host: automobiles.honda.com Proxy-Connection: Keep-Alive Web Site which uses Cookies Web Page Response Header HTTP/1.1 200 OK Server: Microsoft-IIS/5.0 Date: Wed, 05 Oct 2005 19:51:07 GMT X-Powered-By: ASP.NET P3P: policyref="http://www.honda.com/w3c/privacy.xml ", CP="IDC DSP COR ADMiDEViTAIaPSAiPSDiIVAiCONi OUR SAMi IND PHY ONL COM NAV STA" pragma: no-cache cache-control: no-store Content-Length: 1435 Content-Type: text/html Expires: Sat, 18 Jan 1997 17:36:16 GMT Set-Cookie: bhCookieSaveSess=1; path=/ Set-Cookie: bhCookieSess=1; path=/ Set-Cookie: bhCookiePerm=1; expires=Fri, 07-Oct-2005 19:51:06 GMT; path=/ Set-Cookie: BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False&BrowserVer=6&BrowserName=IE; path=/ Cache-control: private
HTTP request after the cookie is set Web Site which uses Cookies Request Header after cookie is set GET /models/model_overview.asp?ModelName=S2000&bhcp=1 HTTP/1.1 Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */* Accept-Language: en-us User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) Host: automobiles.honda.com Proxy-Connection: Keep-Alive Cookie: bhCookieSaveSess=1; bhCookieSess=1; bhCookiePerm=1; BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False&BrowserVer=6&BrowserName=IE; bhResults=bhjs=1; bhPrevResults=bhjs=1