
OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage

This presentation introduces the One-Class Clustering Tree (OCCT), an approach to one-to-many data linkage. The slides first define the notation, then walk through choosing the splitting attribute at the root of the tree under four criteria: Coarse-Grained Jaccard, Fine-Grained Jaccard, Least Probable Intersections, and Maximum Likelihood Estimation. A running example scores three candidate attributes (request location, request day of week, and request part of day) to show how a split is evaluated.


Presentation Transcript


  1. Ben-Gurion University of the Negev, Faculty of Engineering Sciences, Department of Information Systems Engineering. OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage. Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici

  2. Definitions

  3. Definitions

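  4. Definitions. TA and TB are the two tables. For TA: A = {a1, a2, a3, …, an} is its attribute set, |A| = n, |TA| = number of records in TA, and r(a) = a record from TA. For TB: B = {b1, b2, b3, …, bm} is its attribute set, |B| = m, |TB| = number of records in TB, and r(b) = a record from TB.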

  5. Definitions. A record pair is r = (r(a), r(b)), with r(a) from TA and r(b) from TB. TA x TB is the set of all such pairs.
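The pair definition above can be illustrated with a minimal sketch. The records and field values below are hypothetical and do not come from the slides; only the notation (TA, TB, and the pair r = (r(a), r(b))) does.

```python
from itertools import product

# Hypothetical toy tables; the values are illustrative only.
TA = [{"a1": "x", "a2": 1}, {"a1": "y", "a2": 2}]   # |TA| = 2
TB = [{"b1": "p"}, {"b1": "q"}, {"b1": "r"}]        # |TB| = 3

# TA x TB: every pair r = (r(a), r(b)).
pairs = list(product(TA, TB))

print(len(pairs))  # |TA| * |TB| pairs in total
```

The product grows as |TA| * |TB|, which is why a linkage method has to decide which of these pairs actually match rather than treating them all alike.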

  6. Definitions [figure: TA x TB and TAB]

  7. Definitions [figure: TA x TB and TAB]

  8. Definitions

  9. Definitions. Ad ⊆ A: the subset of attributes of TA that were already selected as splitting attributes on the path from the root of the tree to node d. Examples: Ad4 = {a1, a2}, Ad2 = {a1}.
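The definition of Ad can be sketched as a walk from a node up to the root. The tree shape below is an assumption chosen to reproduce the two examples on the slide (a root d1 that splits on a1, and its child d2 that splits on a2); the `Node` class and `path_attributes` helper are hypothetical names, not part of the presentation.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    split_attr: Optional[str] = None  # attribute this node splits on, if any


def path_attributes(d: Node) -> Set[str]:
    """Collect the splitting attributes used by the ancestors of node d."""
    attrs: Set[str] = set()
    node = d.parent
    while node is not None:
        if node.split_attr is not None:
            attrs.add(node.split_attr)
        node = node.parent
    return attrs


# Assumed tree: d1 (root, splits on a1) -> d2 (splits on a2) -> d4
d1 = Node("d1", split_attr="a1")
d2 = Node("d2", parent=d1, split_attr="a2")
d4 = Node("d4", parent=d2)

print(sorted(path_attributes(d2)))  # attributes on the path to d2
print(sorted(path_attributes(d4)))  # attributes on the path to d4
```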

  10. Running Examples

  11. The data set

  12. The data set – cont.

  13. Coarse Grained Jaccard

  14. Coarse Grained Jaccard – Splitting the root of the tree. Three candidates for the split: request location, request day of week, request part of day.

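  15. CGJ – Splitting the root of the tree. Candidate: reqLocation. The slide shows node d once per value, with branches reqLocation = Bonn / reqLocation != Bonn, reqLocation = Berlin / reqLocation != Berlin, and reqLocation = Hamburg / reqLocation != Hamburg. Weights and scores: W1 = 16/31, Score1 = 1/23; W2 = 9/31, Score2 = 2/23; W3 = 6/31, Score3 = 1/23. Score(Split reqLocation) = W1 * Score1 + W2 * Score2 + W3 * Score3 = 0.0561.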

  16. CGJ – Splitting the root of the tree. Candidate: dayOfWeek. The slide shows node d once per value, each with an "=" branch and a "!=" branch; the values listed are Wednesday, Friday, Thursday, Friday, and Monday. Weights and scores: W1 = 7/31, Score1 = 3/15; W2 = 5/31, Score2 = 5/15; W3 = 3/31, Score3 = 3/15; W4 = 9/31, Score4 = 5/15; W5 = 7/31, Score5 = 3/15. Score(Split dayOfWeek) = W1 * Score1 + W2 * Score2 + W3 * Score3 + W4 * Score4 + W5 * Score5 = 0.260.
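The arithmetic on the reqLocation and dayOfWeek slides can be checked with a short sketch. It assumes the split score is the weighted sum of the per-branch scores, which is what the "*" and "+" marks on the slides suggest and what the printed totals agree with. The function name `split_score` is hypothetical; the weights and scores are copied from the slides.

```python
from fractions import Fraction as F


def split_score(parts):
    """Weighted sum of branch scores: sum of W_i * Score_i (assumed form)."""
    return sum(w * s for w, s in parts)


# (W_i, Score_i) pairs as given on the reqLocation slide.
req_location = [(F(16, 31), F(1, 23)),
                (F(9, 31), F(2, 23)),
                (F(6, 31), F(1, 23))]

# (W_i, Score_i) pairs as given on the dayOfWeek slide.
day_of_week = [(F(7, 31), F(3, 15)),
               (F(5, 31), F(5, 15)),
               (F(3, 31), F(3, 15)),
               (F(9, 31), F(5, 15)),
               (F(7, 31), F(3, 15))]

loc = split_score(req_location)   # exact value 40/713
dow = split_score(day_of_week)    # exact value 121/465

print(round(float(loc), 4))  # 0.0561, matching the slide
print(round(float(dow), 3))  # 0.26, matching the slide's 0.260
```

Exact fractions are used so the comparison with the slide values is not affected by floating-point rounding. In both cases the weights sum to 31/31, which is consistent with 31 records being distributed across the branches.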

  17. CGJ – Splitting the root of the tree. Candidate: partOfDay, with node d and branches partOfDay = Morning and partOfDay = Afternoon. Score1 = 4/23. Score(Split partOfDay) = 0.173.

  18. Coarse Grained Jaccard – Splitting the root of the tree. Scores of the three candidates for the split: request location 0.0561, request day of week 0.260, request part of day 0.173. The split in the root [figure not preserved]

  19. Fine Grained Jaccard

  20. Fine Grained Jaccard – Splitting the root of the tree. Node d with branches Req. Location = Berlin and Req. Location != Berlin.

  21. Least Probable Intersections

  22. LPI – Splitting the root of the tree. Node d with branches Req. Location = Berlin and Req. Location != Berlin.

  23. Branches Req. Location = Berlin and Req. Location != Berlin.

  24. LPI – Splitting the root of the tree. Node d with branches Req. Location = Berlin and Req. Location != Berlin.

  25. Maximum Likelihood Estimation

  26. MLE – Splitting the root of the tree. Attributes shown: Cust. City and Cust. Type. Probabilities shown: p(Cust. City | Cust. Type) and p(Cust. Type | Cust. City).
