Sequence Classification Using Statistical Pattern Recognition

José Antonio Iglesias, Agapito Ledezma, and Araceli Sanchis. Computer Science Department, Universidad Carlos III de Madrid, Avda. de la Universidad, 30, 28911 Leganés, Spain. {jiglesia, ledezma, masm}@inf.uc3m.es

Presentation Transcript


  1. Sequence Classification Using Statistical Pattern Recognition. José Antonio Iglesias, Agapito Ledezma, and Araceli Sanchis. Computer Science Department, Universidad Carlos III de Madrid, Avda. de la Universidad, 30, 28911 Leganés, Spain. {jiglesia, ledezma, masm}@inf.uc3m.es

  2. Outline: Motivation and Introduction; Sequence classification; Our approach (Library Creation, Classification); Target Environment Description; Experiments and Results; Conclusions and Future Works.

  4. Motivation: opponent behavior modelling / classification (environment: soccer simulation domain). [Diagram labels: Opponent Modeling; Pattern Recognition; Pattern Detection; Base Strategy Pattern; RoboCup Soccer Server; Pattern LogFile; No-Pattern LogFile; Off-Line Analysis; On-Line Detection; Comparing Method; Environment Information; Recognized Patterns; Advice to Players.]

  5. Introduction. Behavior classification: a behavior is represented as a sequence of elements, so classifying a behavior becomes a sequence classification problem. Sequence: "set of elements ordered so that they can be labelled with the positive integers" (Merriam-Webster Dictionary).

  7. Sequence classification. Given: a set of classes $C = \{c_1, c_2, \dots, c_n\}$ and a sequence $E = \{e_1, e_2, \dots, e_m\}$. Determine: which class $c_i \in C$ the sequence $E$ belongs to.

  9. Our approach. [Diagram, two phases. Library Creation: each labelled sequence (Sequence 1 / Class 1, Sequence 2 / Class 2, …, Sequence n / Class n; e.g. "pwd fs fg … finger more ls ...", "vi man ls …", "vi more ls …") is turned into a pattern (Pattern 1, Pattern 2, Pattern 3, …) stored in the Pattern Library. Classification: the sequence to classify is turned into a pattern to classify, Compare_Patterns is run against each pattern in the library, and the outcome is the on-line sequence classification result (sequence class).]

  11. Library Creation. Trie (retrieval) data structure: a special search tree used for storing elements and their prefixes. Every node represents an element and stores useful information (number of times it appeared, …).
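
A minimal sketch of such a node, assuming Python dataclasses. The names `TrieNode`, `count`, `chi2` and `child` are illustrative and do not come from the slides; the slides only say that a node represents an element and stores information such as how many times it appeared.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TrieNode:
    """One trie node: an element plus the statistics kept for it (hypothetical layout)."""
    element: str = ""                 # "" marks the root, which represents no element
    count: int = 0                    # number of times this node appeared
    chi2: Optional[float] = None      # statistical value, filled in at a later stage
    children: Dict[str, "TrieNode"] = field(default_factory=dict)

    def child(self, element: str) -> "TrieNode":
        """Return the child for `element`, creating it on first use."""
        if element not in self.children:
            self.children[element] = TrieNode(element)
        return self.children[element]


root = TrieNode()
node = root.child("pwd")   # first visit creates the node
node.count += 1            # record one appearance
```

Keeping the children in a dictionary keyed by element makes the lookup of "element after this prefix" a single hash access, which is the property the trie is used for here.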

  12. Library Creation - An example trie. Sequence to insert initially in the trie: {pwd → vi → pwd → vi → pwd → ls}.

  13. Library Creation - An example trie. Sequence to insert initially in the trie: {pwd → vi → pwd → vi → pwd → ls}. Sub-sequence length: 3. Sub-sequences to insert in the trie: {pwd → vi → pwd} and {vi → pwd → ls}.
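
The split shown on this slide is consistent with cutting the sequence into consecutive, non-overlapping windows of the chosen length. The sketch below assumes exactly that; `split_subsequences` is a hypothetical helper name, and keeping a shorter final window is an assumption the slides do not address.

```python
from typing import List, Sequence


def split_subsequences(sequence: Sequence[str], length: int) -> List[List[str]]:
    """Cut `sequence` into consecutive, non-overlapping windows of `length` elements.

    Assumption: a final window shorter than `length` is kept as it is.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [list(sequence[i:i + length]) for i in range(0, len(sequence), length)]


sequence = ["pwd", "vi", "pwd", "vi", "pwd", "ls"]
print(split_subsequences(sequence, 3))
# [['pwd', 'vi', 'pwd'], ['vi', 'pwd', 'ls']]
```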

  14. Library Creation - An example trie. Sub-sequences to insert in the trie: {pwd → vi → pwd} and {vi → pwd → ls}. Trie: Root (empty).

  15. Library Creation - An example trie. Trie (bracketed number: times the node appeared): Root → pwd [1] → vi [1] → pwd [1].

  16. Library Creation - An example trie. Trie: Root → pwd [1] → vi [1] → pwd [1]; Root → vi [1] → pwd [1].

  17. Library Creation - An example trie. Trie: Root → pwd [2] → vi [1] → pwd [1]; Root → vi [1] → pwd [1].

  18. Library Creation - An example trie. Trie: Root → pwd [2] → vi [1] → pwd [1]; Root → vi [2] → pwd [2] → ls [1].

  19. Library Creation - An example trie. Trie: Root → pwd [3] → vi [1] → pwd [1]; Root → pwd [3] → ls [1]; Root → vi [2] → pwd [2] → ls [1].

  20. Library Creation - An example trie. Trie: Root → pwd [3] → vi [1] → pwd [1]; Root → pwd [3] → ls [1]; Root → vi [2] → pwd [2] → ls [1]; Root → ls [1].

  21. Library Creation - An example trie. Final trie for the sequence {pwd → vi → pwd → vi → pwd → ls}: Root → pwd [3] → vi [1] → pwd [1]; Root → pwd [3] → ls [1]; Root → vi [2] → pwd [2] → ls [1]; Root → ls [1].
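
The counts in this example can be reproduced by inserting each sub-sequence together with all of its suffixes and incrementing every node on the inserted path. That reading is inferred from the counts on the slides, not stated by them. The sketch stores the trie as a flat mapping from path to count, which is a simplification chosen for brevity; `build_trie_counts` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def build_trie_counts(subsequences: Iterable[Sequence[str]]) -> Counter:
    """Return a Counter mapping each trie path (tuple of elements) to its node count."""
    counts: Counter = Counter()
    for sub in subsequences:
        for start in range(len(sub)):             # the sub-sequence and each of its suffixes
            suffix = tuple(sub[start:])
            for end in range(1, len(suffix) + 1):
                counts[suffix[:end]] += 1          # every node on the path is visited once
    return counts


trie = build_trie_counts([["pwd", "vi", "pwd"], ["vi", "pwd", "ls"]])
for path, count in sorted(trie.items()):
    print(" -> ".join(path), f"[{count}]")
```

With the two sub-sequences of the example this yields pwd [3], pwd → vi [1], pwd → vi → pwd [1], pwd → ls [1], vi [2], vi → pwd [2], vi → pwd → ls [1] and ls [1].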

  22. Library Creation - Evaluating Dependences. Goal: evaluate the relation/dependence between an element and its prefix. Two approaches: a frequency-based method and a statistical dependence method. Our approach uses a statistical value, the chi-square value, which is stored in every node of the trie.

  23. Library Creation - Evaluating Dependences. Expected value: $E_{ij} = \frac{\text{Row}_i\ \text{Total} \times \text{Column}_j\ \text{Total}}{\text{Grand Total}}$. Chi-square statistic: $X^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$. The observed values come from a 2 x 2 contingency table. $O_{11}$: how many times the current node/element follows its prefix. $O_{12}$: how many times the current node/element follows a different prefix. $O_{21}$: how many times a different prefix (of the same length) is followed by the same node. $O_{22}$: how many times a different prefix (of the same length) is followed by a different node.
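
The two formulas on this slide can be sketched directly in plain Python. The function takes the observed table as given and does not attempt to count the four cells from a trie. Returning 0 for an empty table and skipping cells whose expected value is 0 are assumptions added to avoid division by zero, and the numbers in the usage lines are made up for illustration.

```python
from typing import Sequence


def chi_square(observed: Sequence[Sequence[float]]) -> float:
    """Chi-square statistic of an r x k table of observed counts O_ij."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        return 0.0                                 # assumption: empty table scores 0
    total = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / grand_total   # expected value E_ij
            if e_ij > 0:                           # assumption: skip empty rows/columns
                total += (o_ij - e_ij) ** 2 / e_ij
    return total


# Illustrative 2 x 2 table [[O11, O12], [O21, O22]] (values are not from the slides).
print(round(chi_square([[10, 20], [30, 40]]), 4))   # 0.7937
print(chi_square([[10, 20], [20, 40]]))             # 0.0, rows are proportional
```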

  24. Library Creation - Evaluating Dependences. Sequence Pattern Trie (node [count] [chi-square value]): Root → pwd [3] → vi [1] [5.1] → pwd [1] [4.3]; Root → pwd [3] → ls [1] [4.3]; Root → vi [2] → pwd [2] [3.5] → ls [1] [4.3]; Root → ls [2]. A Sequence Pattern Trie is created for each class.

  26. Classification. [Diagram: the two-phase overview from the "Our approach" slide. The on-line sequence to classify is turned into a pattern to classify, stored as a TestingTrie; Compare_Patterns compares it with each ClassTrie in the Pattern Library (Pattern 1, Pattern 2, Pattern 3, …) to obtain the sequence class.]

  27. Classification – Comparing Process. Class Trie (node [count] [chi-square value]): Root → …; Root → ls [2]; Root → pwd [3] → vi [1] [7.1] → pwd [1] [7.3]; Root → vi [2] → pwd [2] [1.5] → ls [1] [0.3]. Testing Trie: Root → pwd [3] → vi [1] [5.1] → who [1] [4.3]; Root → vi [2] → who [2] [3.5]. Rule: if the node (and its prefix) is in both tries and abs(chi2TestingTrie - chi2ClassTrie) ≤ ThresholdValue, there is a similarity between both tries. Result → [ElementTestingTrie, PrefixTestingTrie, Chi2TestingTrie].

  28. Classification – Comparing Process. Example with the tries of slide 27: vi with prefix pwd is in both tries. If abs(5.1 - 7.1) ≤ ThresholdValue, there is a similarity between both tries. Result → [vi, pwd, 5.1].

  29. Classification – Comparing Process. Rule: if the node (and its prefix) is only in the Testing Trie, there is a difference between both tries. Result → [ElementTestingTrie, PrefixTestingTrie, (Chi2TestingTrie * -1)].

  30. Classification – Comparing Process. Example with the tries of slide 27: who with prefix pwd → vi is only in the Testing Trie. Result → [who, pwd → vi, (-4.3)].

  31. Classification – Comparing Process. Example with the tries of slide 27: who with prefix vi is only in the Testing Trie. Result → [who, vi, (-3.5)].

  32. Classification – Comparing Process. Each comparison (ClassTrie, TestingTrie) produces a list of results, [Element1, Prefix1, Value1], [Element2, Prefix2, Value2], …, [Elementn, Prefixn, Valuen], which is reduced to a single comparison value.

  33. Classification – Comparing Process. Result: [vi, pwd, +5.1], [who, pwd → vi, -4.3], [who, vi, -3.5]. Comparison value: 5.1 - 4.3 - 3.5 = -2.7.
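
The comparing process can be sketched as follows, with each trie reduced to a mapping from path (prefix plus element) to chi-square value for the nodes that have a prefix. Several points are assumptions rather than facts from the slides: the name `compare_tries`, the threshold of 2.5 (the slides give no value), the choice to contribute nothing when a node is in both tries but the values differ by more than the threshold, and the use of the prefixes listed on the step-by-step comparison slides.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]
Result = Tuple[str, Path, float]   # (element, prefix, value)


def compare_tries(class_trie: Dict[Path, float],
                  testing_trie: Dict[Path, float],
                  threshold: float) -> Tuple[List[Result], float]:
    """Compare a testing trie with a class trie; return the results and their sum."""
    results: List[Result] = []
    for path, chi2_testing in testing_trie.items():
        prefix, element = path[:-1], path[-1]
        if path in class_trie:
            # Node and prefix in both tries: similarity if the values are close.
            if abs(chi2_testing - class_trie[path]) <= threshold:
                results.append((element, prefix, chi2_testing))
            # Assumption: otherwise the node contributes nothing.
        else:
            # Node only in the testing trie: difference, value is negated.
            results.append((element, prefix, -chi2_testing))
    return results, sum(value for _, _, value in results)


class_trie = {("pwd", "vi"): 7.1, ("pwd", "vi", "pwd"): 7.3,
              ("vi", "pwd"): 1.5, ("vi", "pwd", "ls"): 0.3}
testing_trie = {("pwd", "vi"): 5.1, ("pwd", "vi", "who"): 4.3, ("vi", "who"): 3.5}

results, comparison_value = compare_tries(class_trie, testing_trie, threshold=2.5)
print(results)
print(round(comparison_value, 1))   # -2.7
```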

  34. Classification. [Diagram: Compare_Patterns is applied between the pattern to classify and each pattern in the Pattern Library (Pattern 1, Pattern 2, Pattern 3, …), producing one comparison value per class.]

  35. Classification. [Diagram: same as slide 34; the class with the greatest comparison value is selected as the on-line sequence class.]
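
Selecting the class then amounts to taking the maximum over the per-class comparison values. A minimal sketch, in which the class names and values are invented for illustration and `classify` is a hypothetical helper:

```python
from typing import Dict


def classify(comparison_values: Dict[str, float]) -> str:
    """Return the class whose comparison value is the greatest."""
    if not comparison_values:
        raise ValueError("at least one class is required")
    return max(comparison_values, key=comparison_values.get)


# Illustrative values only; one comparison value per class trie in the library.
values = {"class_1": -2.7, "class_2": 4.0, "class_3": 0.5}
print(classify(values))   # class_2
```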

  37. Environment – UNIX command line sequences. Command histories of 9 UNIX computer users over 2 years, from the UCI Repository of ML Databases [Newman C., Hettich S., Merz, C. (1998)].
      Raw history:
          # Start session 1
          cd ~/private/docs
          ls -laF | more
          cat foo.txt bar.txt zorch.txt > a.txt
          exit
          # End session 1
          # Start session 2
          cd ~/games/
          xquake &
          fg
          …
      Parsed history: **SOF** cd <1> ls -laF | more cat <3> > <1> exit **EOF** …
      Here <1> stands for one "file name" argument and <3> for three "file name" arguments.
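
The transformation from raw to parsed history can be illustrated with a small heuristic. This is not the preprocessing used to build the data set, which the slides do not describe. It assumes that any token that is neither a command, a flag (starting with "-") nor a shell operator is a file name, and that runs of file names collapse into a `<count>` marker. `sanitize_command` and `sanitize_session` are hypothetical names.

```python
from typing import List

OPERATORS = {"|", ">", ">>", "<", "&", ";"}
STARTS_COMMAND = {"|", ";", "&"}     # the token after these is a command name


def sanitize_command(line: str) -> str:
    """Replace each run of file-name arguments in one command line with <count>."""
    out: List[str] = []
    run = 0                          # file-name arguments seen in the current run
    expect_command = True
    for token in line.split():
        is_file = (not expect_command
                   and token not in OPERATORS
                   and not token.startswith("-"))
        if is_file:
            run += 1
            continue
        if run:
            out.append(f"<{run}>")   # close the pending run of file names
            run = 0
        out.append(token)
        expect_command = token in STARTS_COMMAND
    if run:
        out.append(f"<{run}>")
    return " ".join(out)


def sanitize_session(lines: List[str]) -> List[str]:
    """Wrap one sanitized session in the **SOF** / **EOF** markers."""
    return ["**SOF**"] + [sanitize_command(line) for line in lines] + ["**EOF**"]


session = ["cd ~/private/docs", "ls -laF | more",
           "cat foo.txt bar.txt zorch.txt > a.txt", "exit"]
print(sanitize_session(session))
# ['**SOF**', 'cd <1>', 'ls -laF | more', 'cat <3> > <1>', 'exit', '**EOF**']
```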

  39. Experiments – UNIX command line sequences. Data: 9 files (users) containing from about 10,000 to 60,000 commands each. Step 1, extracting patterns: a trie is created for each user → Pattern Library.

  40. Experiments – UNIX command line sequences. Step 2, classification algorithm: the sequence to classify (sequences of very different sizes were used) is assigned to the class with the greatest value (result value).

  41. Experiments – UNIX command line sequences. Step 3, evaluating the result: calculate the difference between the greatest value and the second greatest value (+) when the real class obtains the greatest value, or the difference between the real classification value and the greatest value (-) otherwise. The greater the difference, the better the classification.
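
One reading of this evaluation measure is a signed margin: positive when the real class obtains the greatest value, negative otherwise. The sketch below follows that reading, which is an interpretation of the slide rather than a confirmed definition. `classification_margin` is a hypothetical name and the values are invented.

```python
from typing import Dict


def classification_margin(values: Dict[str, float], true_class: str) -> float:
    """Signed margin of a classification given one result value per class."""
    if len(values) < 2:
        raise ValueError("at least two classes are required")
    ranked = sorted(values, key=values.get, reverse=True)
    best, second = ranked[0], ranked[1]
    if best == true_class:
        return values[best] - values[second]      # (+) greatest minus second greatest
    return values[true_class] - values[best]      # (-) real class minus greatest


values = {"user_1": 10.0, "user_2": 4.0, "user_3": -1.0}   # illustrative values
print(classification_margin(values, "user_1"))   # 6.0
print(classification_margin(values, "user_2"))   # -6.0
```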

  42. Results – UNIX command line sequences. [Figure: Unix commands classification for User 6. Axes: length of the sequence to classify; classification value (average of 25 simulation results).]

  43. Results – UNIX command line sequences. [Figure: minimum length for classifying a UNIX computer user correctly. Axes: UNIX computer user (class); length of the sequence to classify.]

  45. Conclusions. A threshold must be found. Creating the tries takes a long time. Results depend on the length of the sub-sequences used to create the trie.

  46. Conclusions. Effective method to classify UNIX users. If a behavior can be represented by sequences, the proposed classification method can be used. If a new class is added, only its trie must be created (the others are not modified). This method could be used for other tasks: sequence prediction, sequence clustering… RoboCup Coach 2006 Competition (successful results).

  47. Future Works. Pattern Library → one trie for all classes (users). Classification method without a threshold value. Analysis comparing our approach to others (HMMs).

  48. Thank you!

  49. Questions

  50. Related to Questions...
