Spam and Botnet Reputation Randomized Control Trials and Policy

John S. Quarterman, Quarterman Creations, antispam@quarterman.com
Leigh L. Linden, Department of Economics, University of Texas at Austin, leigh.linden@austin.utexas.edu
Qian Tang, Gene Moo Lee, Andrew B. Whinston, Center for Research in Electronic Commerce, University of Texas at Austin, abw@uts.cc.utexas.edu
Supported by NSF grants no. 1228990 and 0831338; the usual disclaimers apply.

Abstract • Formal randomized control trials (RCT) to test outbound spam rankings as proxy for Internet security: implementation and policy issues. • Ongoing, frequent, regular, comprehensive time-series data; very different from traditional medical or educational RCT. • Endogeneity, causality, botnet migration, other non-obvious features, difficulties, solutions. • Implications for public policy, governmental, trade organizations, participation, etc.

Reputational Incentives for Organizational Cooperation • Detection is much more important than prevention –Bruce Schneier • Yet orgs try to secure themselves separately • Hoping nobody can see their security problems • Few cross-organizational metrics to compare their competitors; even fewer the public can see • Like baseball with no scoreboard and no spectators: who can tell who's winning or losing?

Background • 50 years of social comparison theory plus recent econometric extensions to organizations, including on the Internet, indicate reputational comparisons of competitors should change their behavior (see paper and WEIS 2013 paper for references). • Current emphasis on “the importance of implementing a dynamic, continuous improvement process” (NIST Draft Cybersecurity Framework)

A proxy, not a panacea • Outbound spam is a proxy for poor infosec • Data available from anti-spam blocklists • A sneeze indicates disease, and outbound spam indicates poor infosec • Test reputational rankings as ongoing incentives (and measurements) for org infosec improvements • Provide drilldowns to help orgs improve • Seldom tried, never before with RCT • Won't solve all problems, but could help many

Research Objective • Evaluate potential infosec improvement through disclosure of security performance • Presentation is as important as disclosure: • Appearance, delivery, frequency, regularity, comprehensiveness, participation • Many potential policy venues: • Legislation, stock markets, trade groups, user groups, and more reputational rankings • All this augments, enables, and evaluates pre-existing security methods

Active vs. passive disclosure
Traditional passive disclosure • Obscure databases • Arbitrary order • Infrequent • Not publicized • Can make disclosed companies targets • VERIS Community Database of breaches is a step forward
Active public disclosure • On the web • Organized • Frequent, regular • Publicized • Helps all orgs with their security issues • Promotes competition and cooperation

SpamRankings.net, est. 2011

From top 10 worldwide to all U.S.
From: • Top 10 ASN rankings • World, US, CA, BE, TR • One industry ranking • CBL and PSBL volume rankings • Website + ad hoc publicity • No search • Custom drilldowns
To: • Rankings of 1,000s of orgs • Just U.S. (homogeneity) • All orgs by SIC | NAICS codes • Composite Borda + CBL and PSBL volume and host • Website + email to treated orgs + marketing campaign + control • Search page • Some drilldowns included

Why just the United States? • Homogeneity: same language, mostly the same culture and legal regime • Enough organizations for RCT (more than 2,000) • Reserves rest of the world for further experiments • Organizational characterization: availability of information for stratification by industry (SIC, NAICS) and within industry (ISP vs. hosting, etc.)

Search

Data Sources: Spam is already disclosed! • Outgoing spam usually from botnets or phished accounts: computers' owners often don't know • Bot herders will try to infest any Internet-connected computer that can send email, not just ISPs • Anti-spam blocklists use spamtraps (and other) to collect lists of spamming addresses for use in blocking spam • We get some of those lists (plus volume and other custom data) from PSBL and CBL blocklists (and others not yet used in published rankings) • Each blocklist has its own spamtraps and detection methods; we continually compare them

Live time-series data, not surveys • More detailed; More independent • We don't need cooperation to rank an org • If an organization stops sending spam, that's good • Not just ISPs (or just banks, or just...): all orgs • Every monthly ranking is a longitudinal study • Often do much longer studies from our archives • Which continually accumulate: ongoing, not static • Continually tuning; can layer on new treatments

Data Analysis • Data aggregation: IP address to netblock to Autonomous System (AS), and now to organization, using BGP routing data (now separate for each day) via Team Cymru and CBL, accounting for overlaps at each level to avoid overcounting volume • Plus: botnet labels from CBL; manual categorizations (Edu, Hosting, ISP, Medical, Financial, etc.); website, services offered, etc.; plus SIC and NAICS codes from LexisNexis and other sources • Choose groupings and process data into rankings, with tables and Google Chart line, pie, and bar charts; plus composite Borda count ranking • Many technical hours dedicated to data integrity, plus automated alarms at each pipeline step
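The aggregation step above can be illustrated with a minimal sketch. It is not the SpamRankings.net pipeline: the route table, the AS-to-organization map, and the function names are hypothetical, and the overlap rule shown (credit each address only to its most specific covering netblock) is one plausible way to avoid counting volume twice.

```python
import ipaddress
from collections import Counter

# Hypothetical sample data: (netblock, ASN) pairs such as might be derived
# from one day's BGP routing data. Addresses and ASNs are documentation values.
ROUTES = [
    ("192.0.2.0/24", 64500),
    ("192.0.2.128/25", 64501),   # more specific block inside the /24
    ("198.51.100.0/24", 64502),
]
# Hypothetical AS-to-organization mapping.
AS_TO_ORG = {64500: "Example ISP", 64501: "Example Hosting", 64502: "Example ISP"}


def most_specific_route(ip, routes):
    """Return (network, asn) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, asn in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best


def aggregate(volume_by_ip, routes, as_to_org):
    """Roll per-address spam volume up to AS and organization totals.

    Each address is credited to exactly one netblock (the most specific),
    so overlapping netblocks do not count the same volume twice.
    """
    by_as, by_org = Counter(), Counter()
    for ip, volume in volume_by_ip.items():
        match = most_specific_route(ip, routes)
        if match is None:
            continue  # unrouted address: skipped in this sketch
        _, asn = match
        by_as[asn] += volume
        by_org[as_to_org.get(asn, "unknown")] += volume
    return by_as, by_org


if __name__ == "__main__":
    sample = {"192.0.2.10": 100, "192.0.2.200": 50, "198.51.100.7": 25}
    print(aggregate(sample, ROUTES, AS_TO_ORG))
```

A linear scan over routes is only for readability; a real routing table would call for a prefix trie or similar structure.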

An organizational ranking page

Can't do ranking tables: can't list orgs in control group

Can do "You are Here" on distribution graph

This also goes in treatment email. Can spell out rank and percent.

Composite Borda score • Universe n ~ 9,000 • For each constituent ranking • Points = n – rank • Borda score = sum of points for all rankings

Constituent rankings

Composite Borda ranking • Sort orgs by composite Borda score • Highest score is highest rank, rank 1 • Score zero is lowest rank, with a very low rank number (many orgs with this rank)
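The scoring and sorting rules on the two Borda slides can be sketched as follows. This assumes each constituent ranking is a mapping from org to rank (1 = most spam), that an org absent from a constituent ranking earns zero points from it, and that orgs with equal scores share a rank; the tie rule and the function names are illustrative assumptions, not details confirmed by the slides.

```python
from collections import defaultdict


def borda_scores(rankings, n):
    """Sum `n - rank` points per org over all constituent rankings.

    rankings: list of dicts mapping org -> rank within one constituent ranking.
    n: size of the universe of orgs.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        for org, rank in ranking.items():
            scores[org] += n - rank
    return dict(scores)


def composite_ranking(scores, universe):
    """Return [(rank, org, score)] with the highest score at rank 1.

    Orgs in the universe with no points get score zero. Tied scores share
    the same rank number (assumed tie rule), so all zero-score orgs end up
    at one common lowest rank.
    """
    full = {org: scores.get(org, 0) for org in universe}
    ordered = sorted(full.items(), key=lambda kv: (-kv[1], kv[0]))
    result = []
    prev_score, prev_rank = None, 0
    for position, (org, score) in enumerate(ordered, start=1):
        rank = prev_rank if score == prev_score else position
        result.append((rank, org, score))
        prev_score, prev_rank = score, rank
    return result


if __name__ == "__main__":
    universe = ["A", "B", "C", "D", "E"]
    volume = {"A": 1, "B": 2, "C": 3}   # toy constituent ranking by volume
    hosts = {"B": 1, "A": 2}            # toy constituent ranking by host count
    for row in composite_ranking(borda_scores([volume, hosts], len(universe)), universe):
        print(row)
```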

Detail to help orgs find problems

Drilldowns • Microsoft asked why #1 on U.S. Rankings from PSBL data: drilldowns said because of Hotmail • Medical rankings typically result from very small numbers (1-5) of addresses: easy to find and fix • Infestation by a botnet seen elsewhere (Festi?) can indicate trying infosec that worked elsewhere • We can supply such information; doing so promotes collaboration and improved infosec • Drilldowns are also publicization and treatment

RCT Groups • Treatment group(s): how many? • More help separate independent variables • But also decrease sample size per group • Control group: one (1) • Researchers can see rankings & drilldowns • Nobody else can • Although there are ways orgs can guess some aspects: it's the Internet, not a private medical study

The endogeneity problem (causal relationship) • Reduced spam is good: lower rank shows that • But we really want improved security • How to distinguish that from botnets or spammers moved to control group? • AKA botnet migration (special case of botnet epidemiology) • Need some treatments bot herders and spammers can't see: private treatments • Plus drilldowns to help orgs fix security

Treatment and Control Groups • Treatment choices: • Public or private • Public: on public website • Private: only treated org can see org web page • Directly contact org (email) or not • Drilldown or not • Doesn't mean 2 * 2 * 2 = 8 treatment groups • Can layer on successive treatments over time • Probably add more later as we think of them • Or as treated orgs request new features
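One way to assign orgs to control and treatment groups is stratified randomization, sketched below. The arm names, the stratum labels, and the balancing rule are assumptions for illustration; the slides do not specify the assignment procedure, only that stratification by industry is available.

```python
import random
from collections import defaultdict


def stratified_assignment(orgs, arms, seed=0):
    """Randomly assign orgs to arms, balancing arm sizes within each stratum.

    orgs: list of (org_id, stratum) pairs, e.g. stratum = industry category.
    arms: list of arm names, e.g. one control plus one or more treatments.
    seed: fixed seed so the assignment can be reproduced and audited.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for org_id, stratum in orgs:
        by_stratum[stratum].append(org_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        members = sorted(by_stratum[stratum])
        rng.shuffle(members)
        # Random starting arm, so leftover orgs do not always land in arms[0].
        offset = rng.randrange(len(arms))
        for i, org_id in enumerate(members):
            assignment[org_id] = arms[(i + offset) % len(arms)]
    return assignment


if __name__ == "__main__":
    orgs = [(f"isp{i}", "ISP") for i in range(6)] + [(f"med{i}", "Medical") for i in range(4)]
    print(stratified_assignment(orgs, ["control", "public", "private"], seed=42))
```

Because treatments can be layered on over time, a later treatment could be assigned by rerunning the same procedure within an existing treatment group rather than multiplying the number of groups up front.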

Resistance: from chronic spamming to chronic not

Resistance • Keeps botnets out a long time • Before resistance, orgs just keep spamming • Medical orgs developed resistance in 2011: much less spam for a long time • Botnet migration might look like resistance • but if many botnets are ejected from the same org, more likely due to security changes • So resistance to multiple botnets probably isn't botnet migration

Resilience: bots get back in and spam gets out, but fixed quickly

Resilience • Ejects spamming quickly • Medical orgs didn't keep spammers out forever • But when spam comes back, it's ejected quickly now • Resilience is not botnet migration: • it's a bot getting back in and being ejected • Don't need additional treatments or treatment groups to measure resistance or resilience
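Resistance and resilience can both be read off an org's existing time series, which is why no extra treatment groups are needed. The sketch below is one possible operationalization, not the authors' metric: it assumes a per-period spam volume series, treats any period above a hypothetical threshold as "spamming", and summarizes run lengths. Long clean runs suggest resistance; short spam episodes suggest resilience.

```python
from itertools import groupby


def runs(series, threshold=0):
    """Split a volume series into (is_spamming, length) runs of consecutive periods."""
    flags = (volume > threshold for volume in series)
    return [(flag, sum(1 for _ in group)) for flag, group in groupby(flags)]


def summarize(series, threshold=0):
    """Summarize run lengths for one org's spam volume series."""
    all_runs = runs(series, threshold)
    spam = [length for flag, length in all_runs if flag]
    clean = [length for flag, length in all_runs if not flag]
    return {
        "longest_clean_run": max(clean, default=0),     # resistance indicator
        "longest_spam_episode": max(spam, default=0),   # resilience: smaller is better
        "spam_episodes": len(spam),
    }


if __name__ == "__main__":
    # Toy monthly volumes: chronic spamming, then clean, one brief relapse, clean again.
    monthly = [50, 40, 60, 0, 0, 0, 0, 5, 0, 0, 0, 0]
    print(summarize(monthly))
```

Distinguishing security improvement from botnet migration would still need the per-botnet labels mentioned earlier, for example by running the same summary separately for each botnet seen in the org.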

Additional rankings • In the NSF proposal: add phishing rankings • Derived from Anti-Phishing Working Group (APWG) database • If one ranking is publicized and another is not • yet an org improves on both rankings • Likely the org improved its underlying security • Not botnet migration (bots don't spam and phish) • Also org did not just squelch outbound spam • Possibly more rankings from other data sources

Endogeneity, causality, security • Private treatments • Drilldowns • Resistance to multiple botnets* • Resilience* • Multiple rankings* • That's at least five ways to examine statistically • *Most of which don't require more treatment groups • Plus spot checks by asking orgs what they did

Interaction • Contacts • Send only one (1) email per org • Which email address to use? • Delivery • Email must be portable across internal departments • Intelligible to all: sports score analogy, FAQ, etc. • Social media, traditional press, etc. • Responses • CRM, integrating email, voice, social media

Policy Implications • Principal objective: demonstrate security reporting can enable solutions for improved infosec • Beyond passive disclosure to active reputation • Disclosure → rankings → reputation → competition and cooperation → better security

Possible requirers of disclosure • Governments (legislation and regulation) • Trade associations (best practices) • Standards bodies (standards) • Stock markets (rules) • Insurers (policy requirements) • Customers (vote with their purchases) • The public (societal norms)

Not all private; not all government • Government has not addressed infosec well • Top-down mandates aren't enough • Private parties cannot alone beat miscreants • Whose big advantage is participation and cooperation • Need multi-level multi-organizational loose cooperation for Internet commons governance • Beyond producer-consumer to participatory competition • Transparent disclosure to enable rewards for good reputation in the public goods game

Many levels of policy implications • Customers differentiate vendors (from ISPs to hospitals) based on infosec • Organizations compare to their peers and reduce their risk of downtime, information theft, and loss of business due to customers going elsewhere • Society should benefit through fewer outages, less information theft, better allocation of infosec that actually works, etc. • National and world security should be improved by more robust infosec less vulnerable to attack

Acknowledgements • This material is based upon work supported by the National Science Foundation under Grants No. 1228990 and 0831338. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. • We also gratefully acknowledge custom data from CBL, PSBL, the University of Texas Computer Science Department, Quarterman Creations, and especially Team Cymru. None of them are responsible for anything we do, either.
