
User Modeling Through Symbolic Learning: The LUS Method and Initial Results



  1. User Modeling Through Symbolic Learning: The LUS Method and Initial Results
Guido Cervone, Ken Kaufman, Ryszard Michalski
Machine Learning and Inference Laboratory
School of Computational Sciences
George Mason University, Fairfax, VA, USA
{cervone, kaufman, michalski}@gmu.edu
http://www.mli.gmu.edu

  2. Research Objectives
The main objectives of this research are:
(1) To develop a new methodology for user modeling, called LUS (Learning User Style)
(2) To test and evaluate LUS on datasets consisting of real user activity
(3) To implement an experimental computer intrusion detection system based on the LUS methodology

  3. Main Features of LUS
• User models are created automatically through a process of symbolic inductive learning from training datasets characterizing users’ interaction with computers
• Models are in the form of symbolic descriptions based on attributional calculus, a representation system that combines elements of propositional logic, first-order predicate logic, and multiple-valued logic
• Generated user models are easy for human experts to interpret, and can thus be modified or adjusted manually
• Generated user models are evaluated automatically on testing datasets using an episode classifier

  4. Terminology
• An event is a description of an entity (e.g., a user activity) at a given time or during a given time period, represented by a vector of attribute values that characterizes the use of the computer by a user at a specific time.
• A session is a sequence of events characterizing a user’s interaction with the computer from logon to logoff.
• An episode is a sequence of user states extracted from a session that is used for training or testing/execution of user models; it may contain consecutive states or selected states from one or more sessions.
• In the training phase (during which user models are learned), it is generally desirable to use long episodes, as this helps to generate more accurate and complete user models. In the testing (or execution) phase, it is desirable to be able to use short episodes, so that a legitimate or illegitimate user can be identified from as little information as possible.

  5. Approach
• The system polls active processes every half-second and logs information on the processes and the users responsible for them
• Data extracted from the logs take the form of vectors of values of nominal, temporal and structured attributes
• Initial experiments concentrated on one attribute, mode, a derived attribute based on the class of process that was running (e.g., compiler)
• Data from successive records are combined into n-grams, e.g., <compiler, print, web, print>
• Sets of n-grams comprising an episode are passed to the AQ20 learner
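The sliding-window n-gram construction described above can be sketched as follows. This is a minimal illustration; the function name and the sample mode sequence are assumptions, not part of the LUS implementation:

```python
# Minimal sketch of building n-grams from a logged sequence of user "modes".
# The function name and sample data are illustrative assumptions, not LUS code.
def make_ngrams(modes, n=4):
    """Slide a window of length n over the mode sequence."""
    return [tuple(modes[i:i + n]) for i in range(len(modes) - n + 1)]

session_modes = ["compiler", "print", "web", "print", "mail"]
ngrams = make_ngrams(session_modes)
# ngrams -> [('compiler', 'print', 'web', 'print'),
#            ('print', 'web', 'print', 'mail')]
```

A session shorter than n yields no n-grams, so very short sessions contribute nothing to training.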

  6. AQ20 Algorithm Application
• Each training n-gram is used as an example of the class representing the user whose activity it reflects.
• To learn a user’s profile, AQ20 divides the n-grams into positive examples (examples representing the user whose profile is being learned) and negative examples (examples representing other users’ activities)
• AQ20 searches for maximal conjunctive rules that cover positive examples but not negative ones, and selects the best ones according to user-specified criteria
• The rule [User = 1] if [mode1 = compiler] and [mode2 = print] and [mode4 = print] will be returned in the form: [User = 1] <= <compiler, print, *, print>
• Rules and conditions may be annotated with weights (e.g., p, n, u)
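The returned rule form above can be read as a pattern with "don't care" positions. A small sketch of matching a 4-gram against such a pattern follows; the data structures are assumptions chosen for illustration, and AQ20's actual rule representation is richer:

```python
# Sketch: match a 4-gram against a rule pattern such as <compiler, print, *, print>.
# A pattern component is either a set of allowed modes or "*" (don't care).
# These structures are illustrative assumptions, not AQ20's internal format.
def matches(pattern, ngram):
    return all(cond == "*" or mode in cond
               for cond, mode in zip(pattern, ngram))

rule_user1 = ({"compiler"}, {"print"}, "*", {"print"})
matches(rule_user1, ("compiler", "print", "web", "print"))  # True
matches(rule_user1, ("compiler", "web", "web", "print"))    # False
```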

  7. EPICn: Episode Classification for User Identification by Matching Episodes with n-gram Patterns
• EPICn matches episodes against n-gram-based patterns of different users’ behavior and computes a degree of match for each user
• EPICn employs the ATEST program for matching individual events with patterns
• The results from ATEST for each n-gram in the episode are aggregated to give overall episode scores for each class (profile)
• EPICn allows flexible classification: all classes whose scores are both above the episode threshold and within the episode tolerance of the best achieved score are returned as classifications
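A minimal sketch of the aggregation and threshold-tolerance step described above. The scoring here is simplified to the fraction of n-grams matched by any rule in a user's profile (ATEST supports more elaborate degrees of match), and all names and structures are assumptions for illustration:

```python
# Sketch of EPICn-style episode classification with an episode threshold and
# episode tolerance. Names and structures are illustrative assumptions.
def classify_episode(episode, profiles, threshold=0.5, tolerance=0.1):
    def match(pattern, ngram):  # "*" means don't care, otherwise a set of modes
        return all(c == "*" or m in c for c, m in zip(pattern, ngram))

    # Score per user: fraction of the episode's n-grams covered by any rule.
    scores = {user: sum(any(match(r, g) for r in rules) for g in episode)
                    / len(episode)
              for user, rules in profiles.items()}
    best = max(scores.values())
    # Return every user whose score clears the threshold and lies within the
    # tolerance of the best achieved score.
    return sorted(u for u, s in scores.items()
                  if s >= threshold and best - s <= tolerance)
```

With a single clear winner this returns one user; with closely matched profiles it returns several, signaling that the episode cannot be attributed reliably.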

  8. Experiments
• Two sets of preliminary experiments were performed for different training and testing data sizes:
• Small: first 7 users (SD)
• Large: all 23 users (LD)
• Rules were learned with AQ19 and AQ20, using different control parameters (TF and PD modes; three different LEFs for SD and LD)
• EPICn was used to test the learned hypotheses.

  9. Data Used in the Experiments
• 24 users for a total of 4,808,024 4-grams.
• Each user has a different number of sessions, each varying in length.
• The data contain many repetitions.
• This is by far the largest dataset AQ20 has been applied to.

  10. Distribution of the Sessions for Each User

  11. 4-grams for Each User

  12. Experiment 1: A Sample of Results from AQ20 (7 Users)

[user = 0] <{explorer,web,office,sql,rundll32,system,time,install},
            {explorer,web,logon,rundll32,system,time,install},
            {explorer,web,office,logon,printing,rundll32,system,time,install},
            {web,office,rundll32,system,time,install,multimedia}>
            : pd=171, nd=52, ud=27, pt=2721, nt=710, ut=160, qd=0.372459, qt=0.60304

[user = 1] <{netscape,msie,telnet,explorer,web,acrobat,logon,system,welcome,help},
            {netscape,msie,telnet,explorer,web,acrobat,logon,rundll32,welcome,help},
            {netscape,msie,telnet,explorer,web,acrobat,logon,printing,welcome,dos,help},
            {netscape,msie,telnet,explorer,web,acrobat,logon,welcome,dos,help}>
            : pd=260, nd=54, ud=28, pt=20713, nt=132, ut=2019, qd=0.610064, qt=0.986564
...

  13. Distribution of Positive and Negative Events in the Training Set for Each User (80% of the total data; the remaining 20% constituted the testing dataset)

  14. Predictive Accuracy of User Models Generated Using PD Mode and LEF1 (MaxNewPositives,0; MinNumSelectors,0)

  15. Predictive Accuracy of User Models Generated Using PD Mode and LEF2 (MaxQ,0; MaxNewPositives,0; MinNumSelectors,0)

  16. Predictive Accuracy of User Models Generated Using PD Mode and LEF3 (MaxTotQ,0; MaxNewPositives,0; MinNumSelectors,0)

  17. Sample Results Using TF

  18. Sample Rules for User 0 (PD mode, LEF1)

# -- This learning took:
# -- System time 10.45
# -- User time 10
# -- Number of stars generated = 46
# -- Number of rules for this class = 42
# -- Average number of rules kept from each star = 1
# -- Size of the training events in the target class: 345
# -- Size of the training events in the other class(es): 5236
# -- Size of the total training events in the target class: 3573
# -- Size of the total training in the other class(es): 616828

[User = 0]
• <{mail,office,printing,rundll32,system,time,install} {web,rundll32,system,time,install} {explorer,web,mail,office,logon,rundll32,system,install,multimedia} {explorer,web,office,logon,sql,rundll32,system,help,install,multimedia}> : pd=149, nd=20, ud=22, pt=2490, nt=75, ut=37, qd=0.377406, qt=0.676398
• <{explorer,web,office,sql,rundll32,system,time,install,multimedia} {explorer,web,office,logon,sql,rundll32,system,time,install} {web,office,logon,sql,printing,rundll32,system,time,install} {web,rundll32,system,time,install}> : pd=136, nd=30, ud=8, pt=2481, nt=1148, ut=14, qd=0.318267, qt=0.473443
• <{explorer,web,rundll32,system,multimedia} {explorer,system,time,install} {explorer,rundll32,system,time,install,multimedia} {explorer,rundll32,system,time,install,multimedia}> : pd=107, nd=21, ud=32, pt=2453, nt=930, ut=474, qd=0.255909, qt=0.496713

  19. Experiment 2
• In this experiment, hypotheses were generated to describe the behavior of all 24 users
• The training set consisted of approximately 4 million 4-grams
• The testing set consisted of approximately 1 million 4-grams

  20. Description of Experiment 2
• Experiments were performed using 20% and 100% of the training data (the training set comprised 80% of the sessions)
• Experiments were performed in PD and TF modes
• Three different LEFs were used:
• LEF1 (TF mode): <MaxNewPositives,0; MinNumSelectors,0>
• LEF2 (TF mode): <MaxEstimatedPositives,0; MinEstimatedNegatives,0; MaxNewPositives,0; MinNumSelectors,0>
• LEF3 (PD mode): <MaxQ,0; MaxNewPositives,0; MinNumSelectors,0>

  21. Experiment 2
• When combining all of a user’s testing data into a single long episode, out of the 24 users:
• 20 users were classified correctly
• 3 users could not be classified because the degrees of match of the best-scoring users were insufficiently separated
• 1 user was classified incorrectly

  22. Users 0-2

  23. Users 3-5

  24. Users 6-8

  25. Users 9-11

  26. Users 12-14

  27. Users 15-17

  28. Users 18-20

  29. Users 21-23

  30. Sample Rules for User 0

# -- This learning took:
# -- System time 767.15
# -- User time 768
# -- Number of stars generated = 57
# -- Number of rules for this class = 52
# -- Average number of rules kept from each star = 1
# -- Size of the training events in the target class: 346
# -- Size of the training events in the other class(es): 71931
# -- Size of the total training events in the target class: 1826
# -- Size of the total training in the other class(es): 3750169

[user = 0]
<- <explorer,install,multimedia,system,time> <multimedia,system> <explorer,install,system> <explorer,install,multimedia,system> : pd=64, nd=31, ud=8, pt=916, nt=404, ut=11, qd=0.124322, qt=0.348035 # 18648
<- <explorer,install,office,rundll32,system,time> <multimedia,system> <install,multimedia,rundll32,system,time> <explorer,install,rundll32,system,time> : pd=68, nd=42, ud=9, pt=919, nt=73, ut=11, qd=0.121131, qt=0.466232 # 24747
<- <explorer,help,install,mail,multimedia,rundll32,system,time,web> <help,install,logon,mail,office,rundll32,system,time,web> <help,install,mail,office,printing,rundll32,system,time,web> <help,install,rundll32,system,time> : pd=140, nd=343, ud=41, pt=1316, nt=701, ut=66, qd=0.1159, qt=0.470102 # 5068
<- <install,office,printing,system> <install,rundll32,time> <install,multimedia,office,sql,system,web> <explorer,install,multimedia,rundll32,system,web> : pd=43, nd=4, ud=2, pt=397, nt=4, ut=2, qd=0.11365, qt=0.215245 # 7642

  31. Best Rule for User 23

# -- This learning took:
# -- System time -39.9073
# -- User time 4256
# -- Number of stars generated = 658
# -- Number of rules for this class = 533
# -- Average number of rules kept from each star = 1
# -- Size of the training events in the target class: 9712
# -- Size of the training events in the other class(es): 40602
# -- Size of the total training events in the target class: 1337548
# -- Size of the total training in the other class(es): 2063808

[user = 23]
<- <ControlPanel,activesync,id,mail,multimedia,netscape,network,spreadsheet,system,wordprocessing> <ControlPanel,activesync,explorer,id,logon,mail,msie,multimedia,netscape,network,printing,spreadsheet,web,wordprocessing> <ControlPanel,activesync,mail,multimedia,netscape,printing,spreadsheet,wordprocessing> <ControlPanel,activesync,mail,multimedia,netscape,spreadsheet,web,wordprocessing> : pd=4685, nd=878, ud=975, pt=1296647, nt=1166, ut=34254, qd=0.388046, qt=0.967985 # 3524022

  32. Experiments with Smaller Test Episodes
• In experiments with 150 session-sized testing episodes, some performed with traditional “best matching” and others with threshold-tolerance matching, identification accuracy was as follows:
• Traditional ATEST (Rform) scoring, threshold-tolerance matching: 169 classifications, 75 correct, 84 incorrect
• Traditional scoring, best-only matching: 71 (47.3%) correct
• Simple scoring, threshold-tolerance matching: 165 classifications, 117 correct, 48 incorrect
• Simple scoring, best-only matching: 112 (74.7%) correct

  33. Prediction-Based Approach In the prediction-based approach, events characterizing a user are pairs <predecessor, successor>, where: predecessor is a sequence of lb states of the user (in the experiments, modes) that directly precede a given time instance t, and successor is a sequence of lf states of the user (in the experiments, modes) that occur immediately after t. Parameters lb and lf, called look-back and look-forward respectively, are determined experimentally.
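The construction of <predecessor, successor> events from a mode sequence can be sketched as follows; the function and parameter names are assumptions chosen for illustration:

```python
# Sketch: build <predecessor, successor> pairs from a mode sequence, with
# look-back lb and look-forward lf. Names are illustrative assumptions.
def prediction_events(modes, lb=2, lf=1):
    """At each time t, pair the lb states before t with the lf states after t."""
    return [(tuple(modes[t - lb:t]), tuple(modes[t:t + lf]))
            for t in range(lb, len(modes) - lf + 1)]

prediction_events(["compiler", "print", "web", "print"], lb=2, lf=1)
# -> [(('compiler', 'print'), ('web',)), (('print', 'web'), ('print',))]
```

Larger lb and lf give more context per event but fewer events per session, which is why the slide notes that both parameters are determined experimentally.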

  34. An Initial Small Experiment
Rules were learned using the decomposition model with look-backs of 1, 2, 3, and 4. The results provided by EPICp were as follows:

CONFUSION MATRIX
         Data-1  Data-2  Data-3
User 1:     374      86      66
User 2:     202     141     130
User 3:     176      97     557

  35. Topics for Further Research
• Comparative study of the n-gram-based methodology on currently available datasets using different control parameters
• Study of performance degradation with reduced session size
• Annotating process tables with window information
• Testing the ability to identify unknown users
• Development and implementation of a prediction-based approach using a dedicated sequential pattern discovery program (SPARCum)
• Employment of multivariate representation, e.g., <mode, process name, time>
• Improving the representational space through constructive induction
• Handling drift and shift of user models
• Coping with incremental growth and change in the user population

  36. Conclusions
• The LUS methodology uses symbolic learning to generate user signatures
• Unlike traditional classifiers, EPICn classifies based on episodes rather than individual events
• Initial experiments have been promising, but several real-world situations have yet to be addressed in full
• Multistrategy approaches may lead to further performance improvement
