OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage

Presentation Transcript

  1. Ben-Gurion University of the Negev, Faculty of Engineering Sciences, Department of Information Systems Engineering. OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage. Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici

  2. Definitions

  3. Definitions

  4. Definitions. TA: A = {a1, a2, a3, …, an}, |A| = n, |TA| = number of records in TA, r(a) = a record from TA. TB: B = {b1, b2, b3, …, bm}, |B| = m, |TB| = number of records in TB, r(b) = a record from TB.

  5. Definitions. TA × TB: each record is a pair r = (r(a), r(b)), with r(a) a record from TA and r(b) a record from TB.
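
As a rough illustration of the pair notation above, the following sketch enumerates TA × TB for two tiny tables. The tables, attribute names, and the helper `cartesian_pairs` are hypothetical and are not taken from the presentation.

```python
from itertools import product

# Hypothetical toy tables; attribute names and values are illustrative only.
TA = [{"a1": "x", "a2": 1}, {"a1": "y", "a2": 2}]
TB = [{"b1": "p"}, {"b1": "q"}, {"b1": "r"}]


def cartesian_pairs(ta, tb):
    """Return every pair r = (r(a), r(b)) with r(a) from ta and r(b) from tb."""
    return list(product(ta, tb))


pairs = cartesian_pairs(TA, TB)
print(len(pairs))  # 6, i.e. |TA| * |TB|
```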

  6. Definitions. TA × TB: TAB

  7. Definitions. TA × TB: TAB

  8. Definitions

  9. Definitions. Ad ⊆ A: the subset of attributes of TA that were already selected as splitting attributes on the path from the root of the tree to node d. Examples: Ad4 = {a1, a2}, Ad2 = {a1}.
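
The definition of Ad can be sketched in code by walking from a node back to the root and collecting the splitting attributes met on the way. The `Node` class, the helper `selected_attributes`, and the tree shape (root splits on a1, d2 splits on a2, d4 is a child of d2) are assumptions chosen only so that the result matches the two examples on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    # Attribute of TA that this node splits on (None for a leaf).
    split_attribute: Optional[str] = None


def selected_attributes(d):
    """A_d: attributes used as splits by the ancestors of d, from the root down."""
    attrs = set()
    node = d.parent
    while node is not None:
        if node.split_attribute is not None:
            attrs.add(node.split_attribute)
        node = node.parent
    return attrs


# Assumed tree shape, for illustration only.
root = Node("d1", split_attribute="a1")
d2 = Node("d2", parent=root, split_attribute="a2")
d4 = Node("d4", parent=d2)

print(selected_attributes(d2) == {"a1"})        # True
print(selected_attributes(d4) == {"a1", "a2"})  # True
```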

  10. Running Examples

  11. The data set

  12. The data set – cont.

  13. Coarse Grained Jaccard
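
For reference, the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to two sets of records, assuming that "coarse grained" means two records count as the same only when they are identical in every attribute; that reading, the function name `jaccard`, the toy records, and the choice to return 0.0 for two empty sets are all assumptions, not details from the slides.

```python
def jaccard(records_1, records_2):
    """Jaccard coefficient |S1 ∩ S2| / |S1 ∪ S2| over whole records."""
    s1, s2 = set(records_1), set(records_2)
    union = s1 | s2
    if not union:
        return 0.0  # assumed convention for two empty sets
    return len(s1 & s2) / len(union)


# Hypothetical records (city, customer type); not the presentation's data set.
subset_1 = [("Berlin", "private"), ("Bonn", "business")]
subset_2 = [("Berlin", "private"), ("Hamburg", "private"), ("Bonn", "private")]

print(jaccard(subset_1, subset_2))  # 0.25 (1 shared record out of 4 distinct)
```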

  14. Coarse Grained Jaccard – Splitting the root of the tree Three candidates for split: • Request location • Request day of week • Request part of day

  15. CGJ: Splitting the root of the tree. Branch conditions: reqLocation = Bonn, reqLocation = Berlin, reqLocation = Hamburg; complements: reqLocation != Hamburg, reqLocation != Berlin, reqLocation != Bonn. W1 = 16/31, Score1 = 1/23; W2 = 9/31, Score2 = 2/23; W3 = 6/31, Score3 = 1/23. Score(Split reqLocation) = W1 * Score1 + W2 * Score2 + W3 * Score3 = 0.0561

  16. CGJ: Splitting the root of the tree. Branch conditions: dayOfWeek = Wednesday, dayOfWeek = Friday, dayOfWeek = Thursday, dayOfWeek = Friday, dayOfWeek = Monday; complements: dayOfWeek != Wednesday, dayOfWeek != Thursday, dayOfWeek != Friday, dayOfWeek != Friday, dayOfWeek != Monday. W1 = 7/31, Score1 = 3/15; W2 = 5/31, Score2 = 5/15; W3 = 3/31, Score3 = 3/15; W4 = 9/31, Score4 = 5/15; W5 = 7/31, Score5 = 3/15. Score(Split dayOfWeek) = W1 * Score1 + W2 * Score2 + W3 * Score3 + W4 * Score4 + W5 * Score5 = 0.260
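
The weights and per-branch scores on slides 15 and 16 are consistent with a weighted sum, Score(Split) = Σ Wi * Scorei, and the short check below reproduces both reported totals from the slide values. The function name `weighted_split_score` is hypothetical, and the weighted-sum reading is inferred from the numbers rather than stated in the transcript.

```python
from fractions import Fraction as F


def weighted_split_score(branches):
    """Sum of W_i * Score_i over the (weight, score) pairs of a candidate split."""
    return sum(w * s for w, s in branches)


# (W_i, Score_i) pairs copied from slide 15 (reqLocation) and slide 16 (dayOfWeek).
req_location = [(F(16, 31), F(1, 23)), (F(9, 31), F(2, 23)), (F(6, 31), F(1, 23))]
day_of_week = [
    (F(7, 31), F(3, 15)),
    (F(5, 31), F(5, 15)),
    (F(3, 31), F(3, 15)),
    (F(9, 31), F(5, 15)),
    (F(7, 31), F(3, 15)),
]

score_location = weighted_split_score(req_location)  # exactly 40/713
score_day = weighted_split_score(day_of_week)        # exactly 121/465

print(f"{float(score_location):.4f}")  # 0.0561
print(f"{float(score_day):.3f}")       # 0.260
```

In both cases the weights sum to 1 (31/31), which fits reading each Wi as the fraction of records that fall into branch i.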

  17. CGJ: Splitting the root of the tree. Branch conditions: partOfDay = Morning, partOfDay = Afternoon. Score1 = 4/23. Score(Split partOfDay) = 0.173

  18. Coarse Grained Jaccard – Splitting the root of the tree. Three candidates for split: • Request location: 0.0561 • Request day of week: 0.260 • Request part of day: 0.173. The split in the root

  19. Fine Grained Jaccard

  20. Fine Grained Jaccard – Splitting the root of the tree Req. Location = Berlin d Req. Location != Berlin

  21. Least Probable Intersection

  22. LPI – Splitting the root of the tree Req. Location = Berlin d Req. Location != Berlin

  23. Req. Location = Berlin Req. Location != Berlin

  24. LPI – Splitting the root of the tree Req. Location = Berlin d Req. Location != Berlin

  25. Maximum Likelihood Estimation

  26. MLE: Splitting the root of the tree. Attributes: Cust. City, Cust. Type. Models: p(Cust. City | Cust. Type), p(Cust. Type | Cust. City)
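
The two conditional distributions named on this slide can be illustrated with plain relative-frequency counting, which is the maximum likelihood estimate for a categorical variable. The transcript does not say how the presentation estimates them, so this is a stand-in sketch only; the helper `conditional_mle`, the field names, and the customer records are hypothetical.

```python
from collections import Counter


def conditional_mle(records, target, given):
    """Estimate p(target | given) by relative frequency; keys are (given, target)."""
    joint = Counter((r[given], r[target]) for r in records)
    marginal = Counter(r[given] for r in records)
    return {(g, t): count / marginal[g] for (g, t), count in joint.items()}


# Hypothetical customer records; not the presentation's data set.
customers = [
    {"city": "Berlin", "type": "private"},
    {"city": "Berlin", "type": "business"},
    {"city": "Bonn", "type": "private"},
    {"city": "Berlin", "type": "private"},
]

p_city_given_type = conditional_mle(customers, target="city", given="type")
p_type_given_city = conditional_mle(customers, target="type", given="city")

print(p_city_given_type[("business", "Berlin")])  # 1.0
print(p_type_given_city[("Bonn", "private")])     # 1.0
```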