Anomaly Detection Systems

Contents • Statistical methods • Systems with learning • Clustering in anomaly detection systems

Anomaly detection • Anomaly detection involves a process of establishing profiles of normal behaviour, comparing actual user/network behaviour to those profiles, and flagging deviations from the normal. • The basis of anomaly detection is the assertion that abnormal behaviour patterns indicate misuse of systems.

Anomaly detection • Profiles are defined as sets of metrics. Metrics are measures of particular aspects of user behaviour. • Each metric is associated with a threshold or range of values.

Anomaly detection • Anomaly detection depends on an assumption that users exhibit predictable, consistent patterns of system usage. • The approach also accommodates adaptations to changes in user behaviour over time.

Anomaly detection • The completeness of anomaly detection has yet to be verified (no one knows whether any given set of metrics is rich enough to express all anomalous behaviour).

Statistical methods • Parametric methods • Analytical approaches in which assumptions are made about the underlying distribution of the data being analyzed. • The usual assumption is that the distributions of usage patterns are Gaussian: $p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - x_0)^2}{2\sigma^2}\right)$, where $x_0$ is the mean and $\sigma$ is the standard deviation.

Statistical methods • Non-parametric methods • Involve nonparametric data classification techniques, specifically cluster analysis.

Statistical methods • Denning's model (the IDES model for intrusion detection). • Four statistical models may be included in the system: • Operational model • Mean and standard deviation model • Multivariate model • Markov process model. • Each model is suitable for a particular type of system metric.

Statistical methods • Operational model • This model applies to metrics such as event counters for the number of password failures in a particular time interval. • The model compares the metric to a set threshold, triggering an anomaly when the metric exceeds the threshold value.
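The threshold comparison described above can be sketched as follows. This is a minimal illustration, not Denning's implementation: the class name `FailureCounter`, the sliding-window bookkeeping, and the threshold and window values are all assumptions chosen for the example.

```python
from collections import deque


class FailureCounter:
    """Counts events inside a sliding time window and compares the count to a threshold."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold            # hypothetical limit, e.g. 3 failures
        self.window_seconds = window_seconds  # hypothetical interval length
        self.timestamps = deque()

    def record(self, timestamp):
        """Record one event; return True if the count in the window exceeds the threshold."""
        self.timestamps.append(timestamp)
        # Drop events that have fallen out of the time window.
        while self.timestamps and self.timestamps[0] <= timestamp - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold


# Four password failures within 60 seconds trigger an alert; a later isolated failure does not.
counter = FailureCounter(threshold=3, window_seconds=60)
alerts = [counter.record(t) for t in [0, 10, 20, 30, 200]]
```

The difficulty noted later in the deck applies here directly: the result depends entirely on how the threshold and the interval are chosen.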

Statistical methods • Mean and standard deviation model • A classical mean and standard deviation characterization of data. • The assumption is that all the analyzer knows about a system behaviour metric is its mean and standard deviation.

Statistical methods • Mean and standard deviation model (cont.) • A new behaviour observation is defined to be abnormal if it falls outside a confidence interval. • This confidence interval is defined as d standard deviations from the mean for some parameter d (usually d=3).

Statistical methods • Mean and standard deviation model (cont.) • This characterization is applicable to event counters, interval timers, and resource measures (memory, CPU, etc.) • It is possible to assign weights to these computations, such that more recent data are assigned greater weights.
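A rough sketch of the mean and standard deviation test with weighting of recent data is given below. The exponential decay factor is one possible weighting scheme and is an assumption, as are the class name `WeightedProfile` and the sample values; the slides only state that more recent data may receive greater weights.

```python
import math


class WeightedProfile:
    """Weighted mean/standard deviation profile with a d-sigma abnormality test."""

    def __init__(self, decay=0.9, d=3.0):
        self.decay = decay   # assumed weighting: older data is scaled by decay at each update
        self.d = d           # number of standard deviations (the slides suggest d = 3)
        self.weight = 0.0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def is_abnormal(self, x):
        """True if x lies more than d standard deviations from the weighted mean."""
        if self.weight == 0.0:
            return False  # no profile yet, nothing to compare against
        mean = self.sum_x / self.weight
        variance = max(self.sum_x2 / self.weight - mean * mean, 0.0)
        return abs(x - mean) > self.d * math.sqrt(variance)

    def update(self, x):
        """Fold a new observation into the profile, down-weighting older ones."""
        self.weight = self.decay * self.weight + 1.0
        self.sum_x = self.decay * self.sum_x + x
        self.sum_x2 = self.decay * self.sum_x2 + x * x


profile = WeightedProfile()
for cpu_seconds in [10, 12, 11, 9, 10, 11, 10, 12]:
    profile.update(cpu_seconds)

typical = profile.is_abnormal(11)
unusual = profile.is_abnormal(50)
```

With very few observations the estimated standard deviation is close to zero, so a practical system would wait for a minimum amount of history before raising alerts.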

Statistical methods • Multivariate model • This is an extension to the mean and standard deviation model. • It is based on performing correlations among two or more metrics. • Instead of basing the detection of an anomaly strictly on one measure, one might base it on the correlation of that measure with another measure.

Statistical methods • Multivariate model (cont.) • Example: • Instead of detecting an anomaly based solely on the observed length of a session, one might base it on the correlation of the length of the session with the number of CPU cycles utilized.
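One way to make the session-length and CPU example concrete is the Mahalanobis distance, which accounts for the correlation between two metrics. The slides do not name a specific technique, so the choice of Mahalanobis distance, the function names, and the sample data are assumptions for illustration only.

```python
import math


def fit_profile(samples):
    """Return means and sample covariance entries for a list of (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return mx, my, sxx, syy, sxy


def mahalanobis(profile, point):
    """Distance of point from the profile mean, scaled by the 2x2 covariance."""
    mx, my, sxx, syy, sxy = profile
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy * sxy
    # Quadratic form (dx, dy) * inverse(covariance) * (dx, dy)^T for a 2x2 matrix.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


# Hypothetical history: (session length in minutes, CPU seconds), roughly 2 CPU s per minute.
history = [(10, 21), (20, 39), (30, 62), (40, 79), (50, 101), (60, 118)]
profile = fit_profile(history)

consistent = mahalanobis(profile, (55, 110))    # long session, high CPU: fits the pattern
inconsistent = mahalanobis(profile, (35, 110))  # each value is in range, the pair is not
```

In this example both 35 minutes and 110 CPU seconds are individually unremarkable, yet the second point lies far from the profile because the combination breaks the usual relationship.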

Statistical methods • Markov process model • Under this model, the detector considers each different type of audit event as a state variable and uses a state transition matrix to characterize the transition frequencies between states (not the frequencies of the individual states/audit records).

Statistical methods • Markov process model (cont.) • A new observation is defined as anomalous if its probability, as determined by the previous state and value in the state transition matrix, is too low/high. • This allows the detector to spot unusual command or event sequences, not just single events. • This introduces the notion of performing stateful analysis of event streams (frequent episodes, etc.)
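The transition-matrix idea can be sketched as follows. The event names, the function names, and the probability cut-off `min_prob` are hypothetical, and this sketch only flags transitions whose probability is too low.

```python
from collections import defaultdict


def train_transitions(events):
    """Estimate transition probabilities between consecutive audit event types."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, cur in zip(events, events[1:]):
        counts[prev][cur] += 1
    matrix = {}
    for prev, row in counts.items():
        total = sum(row.values())
        matrix[prev] = {cur: c / total for cur, c in row.items()}
    return matrix


def unlikely_transitions(matrix, events, min_prob=0.05):
    """Return (previous, current, probability) for transitions below min_prob."""
    flagged = []
    for prev, cur in zip(events, events[1:]):
        p = matrix.get(prev, {}).get(cur, 0.0)  # unseen transitions get probability 0
        if p < min_prob:
            flagged.append((prev, cur, p))
    return flagged


# Hypothetical training history of audit event types.
history = ["login", "read", "write", "read", "logout"] * 20
matrix = train_transitions(history)

# "write" followed directly by "logout" never occurred in the history.
flagged = unlikely_transitions(matrix, ["login", "read", "write", "logout"])
```

Every individual event in the new sequence is common; only the order is unusual, which is exactly what a per-event frequency count would miss.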

Statistical methods • Markov process model (cont.) • Example - NIDES (Next-generation Intrusion Detection Expert System) • Developed by SRI International (formerly the Stanford Research Institute) in the 1990s. • Measures various activity levels. • Combines these into a single “normality” measure. • Checks this against a threshold. • If the measure is above the threshold, the activity is considered abnormal.

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • NIDES measures • Intensity measures • An example would be the number of audit records (log entries) generated within a set time interval. • Several different time intervals are used in order to track short-, medium-, and long-term behaviour.

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • NIDES measures (cont.) • Distribution measures • The overall distribution of the various audit records (log file entries) is tracked via histograms. • A difference measure is defined to determine how close a given short-term histogram is to “normal” behaviour.

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • NIDES measures (cont.) • Categorical data • The names of files accessed or the names of remote computers accessed are examples of categorical data used.

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • NIDES measures (cont.) • Counting measures • These are numerical values that measure parameters such as the number of seconds of CPU time used. • They are generally taken over a fixed amount of time or over a specific event, such as a single login. • Thus, they are similar in character to intensity measures, although they measure a different kind of activity.

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • The different measurements each define a statistic $S_j$. • These statistics are assumed to be (that is, constructed to be) appropriate, which includes normalization, and are combined to produce a $\chi^2$-like statistic: $T^2 = \frac{1}{n}\sum_{j=1}^{n} S_j^2$

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • A more complicated measure would include the correlation between the events (as was done with IDES): $IS = (S_1, \ldots, S_n)\, C^{-1}\, (S_1, \ldots, S_n)^{\mathsf{T}}$ • Here, $C$ is the correlation matrix between $S_i$ and $S_j$ for all $i$ and $j$. $IS$ is called the IDES score.

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • NIDES compares recent activity with past activity, using a methodology that amounts to a sliding window on the past. • Thus it is designed to detect changes in activity and to adapt to new activity levels.

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • NIDES intensity measures are counts of audit records per time unit etc. • This provides an overall activity level for the system. • These are updated continuously rather than recomputed at each time interval.
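A continuously updated intensity measure can be sketched with an exponentially decayed counter. The half-life formulation, the class name `IntensityMeasure`, and the numbers are assumptions for illustration and are not taken from the NIDES specification.

```python
import math


class IntensityMeasure:
    """Activity level that decays over time and is bumped by each audit record."""

    def __init__(self, half_life_seconds):
        # After half_life_seconds with no activity, the value drops to half.
        self.decay_rate = math.log(2) / half_life_seconds
        self.value = 0.0
        self.last_time = None

    def record(self, timestamp):
        """Update the measure for one audit record and return the new value."""
        if self.last_time is not None:
            elapsed = timestamp - self.last_time
            self.value *= math.exp(-self.decay_rate * elapsed)
        self.value += 1.0
        self.last_time = timestamp
        return self.value


measure = IntensityMeasure(half_life_seconds=60)
first = measure.record(0)    # 1.0
second = measure.record(60)  # old contribution halves, then the new record is added
```

Running several such measures with different half-lives is one way to track short-, medium-, and long-term behaviour at the same time without storing or rescanning the audit records.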

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • Possible elements that can be monitored with this basic idea: • Average system load. • Number of active processes. • Number of E-mails received. • Different types of audit records (can be tracked separately).

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • The obvious extension of the intensity measures idea is to track the different types of audit records. • This leads to a distribution (histogram) for the audit records. • Similarly, one could track the sizes of E-mail messages received, or the types of files accessed. • These can be updated continuously. • Distributions are then compared by means of a squared error metric.
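The histogram comparison can be illustrated with a plain sum of squared differences between a long-term and a short-term distribution. The record types and counts are made up for the example, and the exact difference measure used by NIDES may be normalized differently.

```python
from collections import Counter


def to_distribution(events, categories):
    """Relative frequency of each category in a list of audit record types."""
    counts = Counter(events)
    total = len(events)
    if total == 0:
        return [0.0 for _ in categories]
    return [counts[c] / total for c in categories]


def squared_error(p, q):
    """Sum of squared differences between two histograms over the same bins."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


categories = ["login", "file_read", "file_write", "exec"]  # hypothetical record types

long_term = to_distribution(
    ["login"] * 50 + ["file_read"] * 30 + ["file_write"] * 15 + ["exec"] * 5, categories)
recent_similar = to_distribution(
    ["login"] * 5 + ["file_read"] * 3 + ["file_write"] * 1 + ["exec"] * 1, categories)
recent_unusual = to_distribution(
    ["login"] * 1 + ["file_read"] * 1 + ["file_write"] * 1 + ["exec"] * 7, categories)

small = squared_error(long_term, recent_similar)
large = squared_error(long_term, recent_unusual)
```

The same code applies to categorical measures such as file names, where each bin is a single category rather than a range of values.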

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • Categorical measures can be, for example, the names of files accessed. • They are treated just like distributional measures. • Here each bin corresponds to a single category, whereas with distributional measures a bin can correspond to a range of values. • The updates are still done continuously.

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • All the measures are combined into the $T^2$ statistic. • The value is compared with a threshold to determine if the activity is “abnormal”. • The threshold is usually set empirically, based on the observed network behaviour in some period of time.

Statistical methods • Markov process model (cont.) • Example - NIDES (cont.) • NIDES produces a single, overall measure of “normality”, which could allow further investigation into the components that make up the statistic upon an alert. • The problem with this is that an unusually low value for one statistic can mask a high one for another – multifaceted measures are more useful.
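The masking problem can be shown with a small numeric sketch. It assumes that the combined statistic is the mean of the squared component scores, which should be treated as an assumption here; the threshold, the per-component limit, and the scores are hypothetical.

```python
def t_squared(scores):
    """Mean of squared component scores (assumed form of the combined statistic)."""
    return sum(s * s for s in scores) / len(scores)


def high_components(scores, limit):
    """Indices of individual scores whose magnitude exceeds limit."""
    return [j for j, s in enumerate(scores) if abs(s) > limit]


THRESHOLD = 4.0        # hypothetical; in practice set empirically
COMPONENT_LIMIT = 3.0  # hypothetical per-measure limit

# One measure is far from normal, the other three are unusually quiet.
scores = [3.5, 0.2, 0.1, 0.2]

combined = t_squared(scores)
flagged_overall = combined > THRESHOLD
flagged_components = high_components(scores, COMPONENT_LIMIT)
```

The combined value stays under the threshold even though the first measure is well outside its usual range, so only the per-component view reveals it.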

Statistical methods • Advantages of parametric approach • Statistical anomaly detection could reveal interesting, sometimes suspicious, activities that could lead to discoveries of security breaches.

Statistical methods • Advantages of parametric approach (cont.) • Parametric statistical systems do not require the constant updates and maintenance that misuse detection systems do. • However, metrics must be well chosen, adequate for good discrimination, and well-adapted to changes in behaviour (that is, changes in user behaviour must produce a consistent, noticeable change in the corresponding metrics).

Statistical methods • Disadvantages of parametric approach • Batch mode processing of audit records, which eliminates the capability to perform automated responses to block damage. • Although more recent systems attempt to perform real-time analysis of audit data, the memory and processing loads involved in using and maintaining the user profile knowledge base usually cause the system to lag behind audit record generation.

Statistical methods • Disadvantages of parametric approach (cont.) • The nature of statistical analysis reduces the capability of taking into account the sequential relationships between events. • The exact order of the occurrence of events is not provided as an attribute in most of these systems.

Statistical methods • Disadvantages of parametric approach (cont.) • Because many anomalies indicating attack depend on such sequential event relationships, this situation represents a serious limitation to the approach. • In cases when quantitative methods (Denning's operational model) are utilized, it is also difficult to select appropriate values for thresholds and ranges.

Statistical methods • Disadvantages of parametric approach (cont.) • The false positive rates associated with statistical analysis systems are high, which sometimes leads to users ignoring or disabling the systems. • The false negative rates are also difficult to reduce in these systems.

Statistical methods • Nonparametric measures • One of the problems of parametric methods is that error rates are high when the assumptions about the distribution are incorrect.

Statistical methods • Nonparametric measures (cont.) • When researchers began collecting information about system usage patterns that included attributes such as system resource usage, the distributions were discovered not to be normal. • Building the normal-distribution assumption into the measures therefore led to high error rates.

Statistical methods • Nonparametric measures (cont.) • A way of overcoming these problems is to utilize nonparametric techniques for performing anomaly detection. • This approach provides the capability of accommodating users with less predictable usage patterns and allows the analyzer to take into account system measures that are not easily accommodated by parametric schemes.

Statistical methods • Nonparametric measures (cont.) • The nonparametric approach involves nonparametric data classification techniques, specifically cluster analysis. • In cluster analysis, large quantities of historical data are collected (a sample set) and organized into clusters according to some evaluation criteria.

Statistical methods • Nonparametric measures (cont.) • Preprocessing is performed in which features associated with a particular event stream (often mapped to a specific user) are converted into a vector representation (for example, Xi = [f1, f2, ..., fn] in an n-dimensional space).
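The preprocessing step might look like the sketch below. The event dictionary layout (keys `type` and `cpu_seconds`) and the four chosen features are hypothetical; a real system would derive its features from its own audit record format.

```python
def to_feature_vector(events):
    """Convert one user's event stream into a fixed-length numeric vector.

    events: list of dicts with hypothetical keys 'type' and optional 'cpu_seconds'.
    Returns [event count, failed logins, files read, total CPU seconds].
    """
    n_events = len(events)
    failed_logins = sum(1 for e in events if e["type"] == "login_failed")
    files_read = sum(1 for e in events if e["type"] == "file_read")
    cpu_total = sum(e.get("cpu_seconds", 0.0) for e in events)
    return [float(n_events), float(failed_logins), float(files_read), float(cpu_total)]


stream = [
    {"type": "login_failed"},
    {"type": "file_read", "cpu_seconds": 0.5},
    {"type": "file_read", "cpu_seconds": 1.5},
]
vector = to_feature_vector(stream)
```

Three audit records collapse into a single four-element vector, which is where the data reduction attributed to this approach comes from.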

Statistical methods • Nonparametric measures (cont.) • A clustering algorithm is used to group vectors into classes by behaviours, attempting to group them so that members of each class are as close as possible to each other while different classes are as far apart as they can be.
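As one concrete instance of such a grouping step, the sketch below uses a basic k-means loop with Euclidean distance. The slides do not prescribe k-means; it is chosen here only because it is short, and the vectors and starting centres are invented for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_centre(vector, centres):
    """Index of the centre closest to vector."""
    return min(range(len(centres)), key=lambda i: distance(vector, centres[i]))


def k_means(vectors, centres, iterations=10):
    """Alternate between assigning vectors to centres and recomputing the centres."""
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for v in vectors:
            groups[nearest_centre(v, centres)].append(v)
        # Keep the old centre if a group ends up empty.
        centres = [mean_vector(g) if g else c for g, c in zip(groups, centres)]
    labels = [nearest_centre(v, centres) for v in vectors]
    return centres, labels


# Hypothetical two-feature behaviour vectors.
vectors = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [8.0, 9.0], [8.2, 9.1]]
centres, labels = k_means(vectors, centres=[vectors[0], vectors[3]])
```

Which cluster represents normal activity is not decided by the algorithm itself; that interpretation has to come from the analyst or from labelled history.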

Statistical methods • Nonparametric measures (cont.) • In nonparametric statistical anomaly detection, the premise is that a user's activity data, as expressed in terms of the features, falls into two distinct clusters: one indicating anomalous activity and the other indicating normal activity.

Statistical methods • Nonparametric measures (cont.) • Various clustering algorithms are available. These range from algorithms that use simple distance measures to determine whether an object falls into a cluster, to more complex concept-based measures (in which an object is "scored" according to a set of conditions and that score is used to determine membership in a particular cluster). • Different clustering algorithms usually best serve different data sets and analysis goals.

Statistical methods • Nonparametric measures (cont.) • The advantages of nonparametric approaches include the capability of performing reliable reduction of event data (in the transformation of raw event data to vectors). • This reduction may reach as much as two orders of magnitude compared to the classical approach that does not use vectors.

Statistical methods • Nonparametric measures (cont.) • Other benefits are improvement in the speed of detection and improvement in accuracy over parametric statistical analysis. • Disadvantages involve concerns that expanding features beyond resource usage would reduce the efficiency and the accuracy of the analysis.

Systems with learning • Two phases of system operation: • The learning phase, in which the system is taught what a normal behaviour is. • The recognition phase, in which the system classifies the input vectors according to the knowledge acquired in the learning process. • These systems also include a conversion of raw data into feature vectors.

Systems with learning • Example: Neural networks • Neural networks use adaptive learning techniques to characterize anomalous behaviour. • This analysis technique operates on historical sets of training data, which are presumably cleansed of any data indicating intrusions or other undesirable user behaviour.

Systems with learning • Example: Neural networks (cont.) • Neural networks consist of numerous simple processing elements called neurons that interact by using weighted connections. • The knowledge of a neural network is encoded in the structure of the net in terms of connections between units and their weights. • The actual learning process takes place by changing weights and adding or removing connections.
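The two phases can be illustrated with the smallest possible case: a single linear neuron trained with the delta rule on intrusion-free data, then used to judge new observations by how badly it predicts them. This is a deliberately reduced sketch, not a full neural network; the function names, learning rate, training data, and the use of prediction error as the anomaly signal are all assumptions.

```python
def train_neuron(samples, epochs=500, rate=0.1):
    """Learning phase: adjust weights and bias on normal (inputs, target) pairs."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = bias + sum(w * x for w, x in zip(weights, inputs))
            error = target - output
            # Delta rule: move each weight in proportion to the error and its input.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


def prediction_error(model, inputs, target):
    """Recognition phase: how far the observed target is from the neuron's output."""
    weights, bias = model
    output = bias + sum(w * x for w, x in zip(weights, inputs))
    return abs(target - output)


# Hypothetical normal behaviour: (scaled session length) -> CPU minutes, about 2x + 1.
normal = [([0.0], 1.0), ([0.25], 1.5), ([0.5], 2.0), ([0.75], 2.5), ([1.0], 3.0)]
model = train_neuron(normal)

expected_use = prediction_error(model, [0.6], 2.2)  # fits the learned pattern
odd_use = prediction_error(model, [0.6], 6.0)       # far more CPU than the profile predicts
```

All knowledge of normal behaviour ends up in the weight and bias values, which mirrors the statement that a network's knowledge is encoded in its connections and their weights.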
