# OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage

##### Presentation Transcript

1. OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage. Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici. Department of Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev.

2. Definitions

3. Definitions

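4. Definitions: TA and TB are the two tables being linked. A = {a1, a2, a3, …, an} is the set of attributes of TA, with |A| = n; |TA| is the number of records in TA; r(a) is a record from TA. B = {b1, b2, b3, …, bm} is the set of attributes of TB, with |B| = m; |TB| is the number of records in TB; r(b) is a record from TB.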

5. Definitions: a record of the Cartesian product TA × TB is a pair r = (r(a), r(b)), where r(a) is a record from TA and r(b) is a record from TB.
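The pair construction on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the two toy tables, their attribute names, and their values are hypothetical and do not come from the presentation. The sketch enumerates TA × TB and says nothing about which pairs actually match.

```python
from itertools import product

# Hypothetical toy tables; each record is a dict of attribute -> value.
TA = [{"a1": "x", "a2": 1}, {"a1": "y", "a2": 2}]
TB = [{"b1": "p"}, {"b1": "q"}, {"b1": "r"}]

# Every record r of TA x TB is a pair (r(a), r(b)).
cartesian = [(ra, rb) for ra, rb in product(TA, TB)]

# The product has |TA| * |TB| records.
print(len(cartesian))
```

With 2 records in TA and 3 in TB, the product holds 6 pairs.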

6. Definitions: TA × TB and TAB.

7. Definitions: TA × TB and TAB.

8. Definitions

9. Definitions: Ad ⊆ A is the subset of attributes of TA that were already selected as splitting attributes on the path from the root of the tree to node d. For example, Ad4 = {a1, a2} and Ad2 = {a1}.
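One way to read the definition of Ad is as a walk from node d up to the root, collecting the splitting attribute of every ancestor. The sketch below assumes a hypothetical tree shape (root d1 splits on a1, its child d2 splits on a2, and d4 is a child of d2) chosen so that it reproduces the two example sets on the slide. The `Node` class and `selected_attributes` function are illustrative names, and counting only ancestors (not d itself) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    split_attr: Optional[str] = None  # attribute this node splits on, if any


def selected_attributes(d: Node) -> Set[str]:
    """Return A_d: attributes used as splits by the ancestors of d."""
    attrs: Set[str] = set()
    node = d.parent
    while node is not None:
        if node.split_attr is not None:
            attrs.add(node.split_attr)
        node = node.parent
    return attrs


# Hypothetical tree shape, assumed for illustration only.
d1 = Node("d1", split_attr="a1")
d2 = Node("d2", parent=d1, split_attr="a2")
d4 = Node("d4", parent=d2)

print(selected_attributes(d2))
print(selected_attributes(d4))
```

Under this assumed shape, the function returns {a1} for d2 and {a1, a2} for d4, matching the slide.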

10. Running Examples

11. The data set

12. The data set – cont.

13. Coarse Grained Jaccard

14. Coarse Grained Jaccard – Splitting the root of the tree. Three candidates for the split: request location, request day of week, request part of day.

15. CGJ – Splitting the root of the tree on reqLocation. Each candidate value v in {Bonn, Berlin, Hamburg} gives a binary split of node d: reqLocation = v versus reqLocation != v. W1 = 16/31, Score1 = 1/23; W2 = 9/31, Score2 = 2/23; W3 = 6/31, Score3 = 1/23. Score(Split reqLocation) = W1 * Score1 + W2 * Score2 + W3 * Score3 = 0.0561.

16. CGJ – Splitting the root of the tree on dayOfWeek. Each candidate value v gives a binary split of node d: dayOfWeek = v versus dayOfWeek != v (branch labels as transcribed: Wednesday, Friday, Thursday, Friday, Monday). W1 = 7/31, Score1 = 3/15; W2 = 5/31, Score2 = 5/15; W3 = 3/31, Score3 = 3/15; W4 = 9/31, Score4 = 5/15; W5 = 7/31, Score5 = 3/15. Score(Split dayOfWeek) = W1 * Score1 + W2 * Score2 + W3 * Score3 + W4 * Score4 + W5 * Score5 = 0.260.

17. CGJ – Splitting the root of the tree on partOfDay. Node d splits into partOfDay = Morning and partOfDay = Afternoon. Score1 = 4/23. Score(Split partOfDay) = 0.173.

18. Coarse Grained Jaccard – Splitting the root of the tree. Scores of the three candidates for the split: request location 0.0561; request day of week 0.260; request part of day 0.173. These scores determine the split in the root.
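The scores on slides 15 and 16 can be rechecked by reading each `*` and `+` on the slide as a weighted sum, Score(Split) = Σ Wi * Scorei. That reading is inferred from the slide numbers rather than stated in the transcript, and `split_score` is a hypothetical helper name. The partOfDay slide lists only Score1 = 4/23, so it is not recomputed here.

```python
from fractions import Fraction as F


def split_score(branches):
    """Weighted sum of per-branch scores: sum of W_i * Score_i."""
    return sum(w * s for w, s in branches)


# (W_i, Score_i) pairs copied from slides 15 and 16.
req_location = [(F(16, 31), F(1, 23)), (F(9, 31), F(2, 23)), (F(6, 31), F(1, 23))]
day_of_week = [
    (F(7, 31), F(3, 15)),
    (F(5, 31), F(5, 15)),
    (F(3, 31), F(3, 15)),
    (F(9, 31), F(5, 15)),
    (F(7, 31), F(3, 15)),
]

loc = split_score(req_location)
dow = split_score(day_of_week)

print(loc, round(float(loc), 4))  # 40/713 0.0561
print(dow, round(float(dow), 3))  # 121/465 0.26
```

Exact fractions are used so that the rounding to 0.0561 and 0.260 happens only at the end, which makes the comparison with the slide values unambiguous.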

19. Fine Grained Jaccard

20. Fine Grained Jaccard – Splitting the root of the tree. Node d splits into Req. Location = Berlin and Req. Location != Berlin.

21. Least Probable Intersection

22. LPI – Splitting the root of the tree. Node d splits into Req. Location = Berlin and Req. Location != Berlin.

23. Req. Location = Berlin versus Req. Location != Berlin.

24. LPI – Splitting the root of the tree. Node d splits into Req. Location = Berlin and Req. Location != Berlin.

25. Maximum Likelihood Estimation

26. MLE – Splitting the root of the tree. Attributes: Cust. City and Cust. Type, with the conditional probabilities p(Cust. City | Cust. Type) and p(Cust. Type | Cust. City).
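The two conditional probabilities named on this slide can be estimated from counts; for categorical attributes the relative frequency is the maximum likelihood estimate. The sketch below shows only that estimation step. The customer records are hypothetical (not data from the presentation), `conditional` is an illustrative helper, and how OCCT turns these probabilities into a splitting score is not covered by the transcript.

```python
from collections import Counter

# Hypothetical (city, type) records; not data from the presentation.
records = [
    ("Berlin", "private"),
    ("Berlin", "business"),
    ("Bonn", "private"),
    ("Berlin", "private"),
]
CITY, TYPE = 0, 1


def conditional(rows, target, given):
    """Relative-frequency (maximum likelihood) estimate of p(target | given)."""
    joint = Counter((r[given], r[target]) for r in rows)
    marginal = Counter(r[given] for r in rows)
    return {(g, t): n / marginal[g] for (g, t), n in joint.items()}


# Keys are (given value, target value).
p_city_given_type = conditional(records, target=CITY, given=TYPE)
p_type_given_city = conditional(records, target=TYPE, given=CITY)

print(p_city_given_type[("private", "Berlin")])
print(p_type_given_city[("Berlin", "business")])
```

In this toy data, two of the three private customers are in Berlin, and one of the three Berlin customers is a business customer.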