
A Renewal Theory Approach to Anomaly Detection in Communication Networks



Presentation Transcript


Brian Thompson† (bthom@cs.rutgers.edu), Tina Eliassi-Rad†‡ (eliassirad1@llnl.gov)
†Rutgers University, ‡Lawrence Livermore Lab

Introduction/Motivation
• A communication network is a time-evolving graph that models interactions between entities over time.
• Pervasive in today's world: phone calls, blog posts, email, social network messages, IP connections.
• Volatile: static network analysis tools are not sufficient.
• Goal: efficiently identify local or global changes in communication activity or graph structure over time.

Our Approach
• Communication across an edge is modeled as a sequence of time-stamped events, which yields a distribution of inter-arrival times (IATs).
• IATs for human interaction frequently follow a power-law distribution.
• The Bounded Pareto distribution allows us to model communication concisely and to make updates in real time and in constant space.
• The recency function $\mathrm{Rec} : 2^T \times T \to [0,1]$ assigns a weight to edge $e$ at time $t$ based on its age, i.e. the time since the last event, subject to the constraints:
  ◦ $\mathrm{Rec}(e,t) = 0$ at the time an event occurs, $1$ when age $= x_{\max}$, and is increasing in between.
  ◦ $\mathrm{Rec}(e,t)$ is uniform over $[0,1]$ when sampled uniformly in time.
• Rec is uniquely determined by the constraints.
• The uniformity property eliminates time-scale bias.

Model
• Consider the weighted graph $G_t = (V,E)$ representing a communication network at time $t$, with $w(e) = \mathrm{Rec}(e,t)$.
• For $E' \subseteq E$, let $X_{E',p}$ be the number of edges in $E'$ with $w(e) \le p$.
• We define the p-divergence of $E'$ as follows:
  $$\mathrm{Div}_p(E') = \frac{1}{P(X \ge X_{E',p})}, \quad \text{where } X \sim \mathrm{Bin}(|E'|, p)$$
• Intuitively, a p-divergence of $d$ means that the probability of at least $X_{E',p}$ edges occurring p-recently is $1/d$.
• Example: let $E'$ be the set of thick edges in the figure. Then $|E'| = 6$, $X_{E',0.3} = 4$, $P(X \ge 4) \approx 0.07$, and $\mathrm{Div}_{0.3}(E') \approx 14.2$.
• The max-divergence of $E'$ is the largest p-divergence over all thresholds: $\max_{p \in [0,1]} \mathrm{Div}_p(E')$.
• A (maximal) p-component of $G = (V,E)$ is a connected subgraph $C = (V',E')$ such that (1) $w(e) \le p$ for all $e$ in $E'$ and (2) $w(e) > p$ for all $e$ not in $E'$ incident to $V'$.
• The set of p-components partitions $V$, for all $p$ in $[0,1]$.
• The p-components of $G_t$ for $p = 0.3$ are shown in blue in the figure.

Algorithm
• The MCD Algorithm:
  ◦ Calculate edge weights using the Recency function.
  ◦ Gradually increase the edge threshold, updating components and divergence values as necessary.
  ◦ Output: disjoint components with max divergence.

Experimental Results
• Experiments on 4 datasets: Enron email, LBNL IP traffic, Twitter messages, and Reality Mining Bluetooth proximity.
• Clear and intuitive visualization reveals anomalous activity in the Bluetooth dataset at two points in time.
• A simple plot of MCD over time (left) identifies hand-labeled scanning activity in the LBNL dataset, as well as other anomalies overlooked by human analysts.
• The plot at right shows scalability using the Twitter dataset (263k nodes, 308k edges, 1.1 million timestamps).

Conclusions
• Traditional network analysis is inadequate for dealing with communication networks, which are dynamic and volatile.
• Studying the inter-arrival time distributions of edges is a novel approach for analyzing communication networks.
• Our algorithms are streaming, and run in O(m) space and O(m log m) time, where m is the number of edges in the dataset.
• MCD analysis can be easily visualized and used as a tool for monitoring activity in a variety of real-world domains.

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
