Future Issues Vern Paxson September 20, 2005
Bots • Likely increasingly relevant compared to worms with commercialization of malware, due to finer-grained control and lower profile • Blended version: use worm to get bots (or use bots to flash worm) • Today, detectable via monitoring IRC • near term: need IRC-on-random-port detectors (easy) • longer term: arms race • Would like to understand protocols, intent behind uses to not only monitor but actively participate ... • ... modulo pesky issue of legality • may be able to fake up lame bots in the near-term • likely labor-intensive
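The "IRC-on-random-port detector" idea can be illustrated by classifying a connection on payload content rather than port number. The following is a minimal sketch, not a production detector: the function name `looks_like_irc`, the `min_hits` threshold, and the split into "strong" and "weak" commands are assumptions made for illustration. The command names themselves come from the IRC protocol (RFC 1459).

```python
import re

# IRC client commands (RFC 1459). The strong/weak split is an assumption:
# "weak" commands such as USER also occur in other protocols (e.g. FTP),
# so at least one "strong" command is required before flagging a flow.
STRONG = (b"NICK", b"JOIN", b"PRIVMSG")
WEAK = (b"USER", b"PING", b"PONG")

# One IRC line: optional ":prefix ", a command, then optional parameters.
_LINE_RE = re.compile(
    rb"^(?::\S+ )?(" + b"|".join(STRONG + WEAK) + rb")(?: .*)?$"
)


def looks_like_irc(payload: bytes, min_hits: int = 2) -> bool:
    """Heuristically decide whether a reassembled TCP payload is IRC.

    Returns True when at least `min_hits` lines parse as IRC commands
    and at least one of them is a strong command. The port is ignored.
    """
    hits = 0
    strong_seen = False
    for line in payload.split(b"\n"):
        match = _LINE_RE.match(line.rstrip(b"\r"))
        if match:
            hits += 1
            if match.group(1) in STRONG:
                strong_seen = True
    return strong_seen and hits >= min_hits
```

For example, `looks_like_irc(b"NICK bot123\r\nUSER bot 0 * :bot\r\n")` flags the flow, while an FTP login (`USER` followed by `PASS`) does not. A check like this is easy to evade (encryption, custom protocols), which is the "longer term: arms race" point above.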
Camouflage • If we're discovered, attackers may • avoid, diminishing our analysis • mislead, skewing our analysis • bombard, complicating our lives • How much time do we have before attackers care about us? • depends on publicity we engender • also, may detect us incidentally, since they care about thwarting other efforts based on similar technology
Camouflage, cont'd • Hiding the telescope feeds • would like highly diverse set of endpoints • in fact, want this even w/o camouflage problem • end-user clients? trust & privacy? • dynamic endpoints (prevents long-term address-based fingerprinting) - exploit DHCP? • client takes leases, tunnels them back • modified dhcpd that tunnels unallocated addresses • issues WRT outbound response? • Hiding the faux interaction • how do we prevent fingerprinting based on front-end filtering? • fingerprinting based on VM detection? • Fundamental: fingerprinting based on containment?
Attracting traffic • How do we plug our honeyfarm capabilities into application-level topologies? • join P2P networks, IM buddy lists, Web links, IRC? • How do we get indexed in directories? • DNS zone files, email address books, host content so Google finds us? • Opportunistic use of hotspots? • e.g., 18.104.22.168/23; acme.com • Any principles we can pursue rather than one-offs? Can we leverage failures? • e.g., SMTP servers when asked for non-existent name might occasionally tunnel to us • e.g., Web servers on 404 might send to us
Attracting traffic, cont'd • Going out and looking for trouble • partnering with MSR Strider Honeymonkeys? • Eliza technology for agitating eleet IRC channels? • These all add to camouflage issues • Long shot: sniff live traffic and replay it looking for trouble
Worm growth prediction • How can we predict early in the onset just how a worm will evolve? • Relates to "situational awareness" issues • Would like to have a quantitative basis for worm "threat level" • Basic idea: take analytic model for growth and invert it • fit measured probe behavior and solve for growth rate and size of susceptible population • Challenges • noisy telescope data leads to big swings in prediction • doable for pure random worms; more complex target selection tougher • possible to incorporate service density information to improve prediction quality early in life cycle?
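The "take analytic model and invert it" idea can be sketched for the simplest case, a pure random-scanning worm. The slides do not name a model, so the sketch assumes logistic growth, I(t) = N / (1 + (N/I0 - 1) e^(-rt)). It also assumes the telescope's probe measurements have already been converted into estimates of the infected count at equally spaced times. The function names and the parameters `r` (growth rate), `n` (susceptible population), `i0`, and `dt` are hypothetical. Under this model 1/I(t+dt) is a linear function of 1/I(t), so an ordinary least-squares line recovers both unknowns.

```python
import math


def logistic_infected(t, r, n, i0):
    """Infected count I(t) under the assumed logistic growth model."""
    return n / (1.0 + (n / i0 - 1.0) * math.exp(-r * t))


def fit_logistic(counts, dt):
    """Estimate (r, n) from infected counts sampled every `dt` time units.

    Uses the identity 1/I(t+dt) = a * (1/I(t)) + b, where
    a = exp(-r*dt) and b = (1 - a) / n, and fits a and b by least squares.
    """
    u = [1.0 / c for c in counts]
    x, y = u[:-1], u[1:]
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    if not (0.0 < a < 1.0) or b <= 0.0:
        raise ValueError("data are not consistent with logistic growth")
    r = -math.log(a) / dt
    n = (1.0 - a) / b
    return r, n
```

On noise-free synthetic data generated with `logistic_infected`, the fit returns the generating `r` and `n`. With real telescope data the reciprocal transform amplifies noise at small counts, which is one concrete form of the "big swings in prediction" challenge listed above.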
VM-based replay/introspection • Speculative execution potentially great boon to containment • Introspection key technology for detecting malware that does not manifest in network activity • monitor and manipulate guest OS from safety of VMM • Utility of replay • can try malcode against "panel" of OS/application versions • can "undo" effects (at least local effects) of infection (possible defense strategy) • can crudely evaluate whether host infected _after_ infection, then replay with high level of instrumentation turned on • ReVirt from Xen available soon (Umich)
Host-level Analysis • How to understand what a piece of malcode does • Doing in real-time in generality appears intractable • at least, given our resources • one could picture having technology for matching malware components against a library of programming idioms • presumably commercial AV is way ahead of us here • Possibly doable in empirical fashion, given replay + introspection, for understanding behavior @ high-level ... • ... if code not written to evade • Extracting abstract malcode descriptions • Take known instance of exploit (provided by honeyfarm) • Replay to generate execution/state transition stream • Attempt to generalize to variants exploiting the same flaw • Distill broad host-based signature • Honeyfarm as oracle for finding bad things, facilitates replay
Legal issues • Liability issues WRT our use of honeypots • Evidentiary issues about how well some of our detection/identification technologies would serve in court
TAB Questions Revisited • Are we considering the right threats? • what about mobile phone malware, spyware, bots, etc. • Are there technical approaches we should be considering? • Are we missing any important partnership opportunities? • for data, technology, or expertise? • Are we missing any key capabilities on our team? • What education/training is necessary/missing for practitioners in the field? What should we be thinking about incorporating into our outreach workshop in this regard?