
Data Fusion

Jens Bleiholder and Felix Naumann. Presented by Aaron Stewart.


Presentation Transcript


  1. Data Fusion Jens Bleiholder and Felix Naumann Presented by Aaron Stewart

  2. Data Integration • Schema mapping • Duplicate detection • Data fusion

  3. Complete / Concise • Like recall/precision • Complete: coverage of real-world objects • Concise: avoid duplicates
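
The recall/precision analogy on this slide can be made concrete with two ratios. The following is a minimal sketch, assuming each record has already been tagged with the real-world object it describes; the function names and the sample data are hypothetical and do not come from the slides.

```python
def completeness(dataset_ids, universe_ids):
    """Share of real-world objects in the universe that the data set covers."""
    universe = set(universe_ids)
    return len(set(dataset_ids) & universe) / len(universe)


def conciseness(dataset_ids):
    """Share of records in the data set that are not redundant duplicates."""
    return len(set(dataset_ids)) / len(dataset_ids)


# Hypothetical data: one entry per record, naming the object it describes.
records = ["alice", "alice", "bob", "carol"]
universe = ["alice", "bob", "carol", "dave", "erin"]

print(completeness(records, universe))  # 3 of 5 objects are covered
print(conciseness(records))             # 3 unique objects in 4 records
```

Adding a source typically raises completeness, while removing duplicates raises conciseness; fusion aims at both.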

  4. Conflicts • Schematic conflicts • Identity conflicts • Data conflicts • Uncertainty • Contradiction
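
The two kinds of data conflict can be told apart mechanically. The sketch below assumes the usual reading of the terms: an uncertainty is a known value set against a missing one, and a contradiction is two or more different known values. The function name is hypothetical and `None` stands in for a NULL.

```python
def classify_data_conflict(values):
    """Classify the values that duplicate records give for one attribute."""
    non_null = {v for v in values if v is not None}
    has_null = any(v is None for v in values)
    if len(non_null) > 1:
        return "contradiction"   # different known values disagree
    if non_null and has_null:
        return "uncertainty"     # one known value against a missing value
    return "no conflict"


print(classify_data_conflict([24, None]))  # uncertainty
print(classify_data_conflict([24, 25]))    # contradiction
print(classify_data_conflict([24, 24]))    # no conflict
```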

  5. Data Fusion Strategies

  6. Uniqueness • Uniqueness-preserving • Uniqueness-enforcing

  7. Value preservation • Value-preserving • Non-value-preserving • Object-preserving

  8. Motivating Example

  9. Joins • Equi-join • Natural join • Full outer join • Key join • Left join • Right join

  10. Equi-join
      SELECT U1.Name, U2.Name, U1.Age, U2.Age, U1.Status, U2.Status,
             U1.Address, U2.Address, U1.Field, U2.Field, U1.Library, U2.Phone
      FROM U1 JOIN U2 ON U1.Name = U2.Name

  11. Equi-join Result

  12. Natural Join
      SELECT U1.Name, U1.Age, U1.Status, U1.Address, U1.Field, U1.Library, U2.Phone
      FROM U1 JOIN U2
        ON U1.Name = U2.Name AND U1.Age = U2.Age AND U1.Status = U2.Status
       AND U1.Address = U2.Address AND U1.Field = U2.Field

  13. Natural Join Result

  14. Full Outer Join
      SELECT U1.Name, U2.Name, U1.Age, U2.Age, U1.Status, U2.Status,
             U1.Address, U2.Address, U1.Field, U2.Field, U1.Library, U2.Phone
      FROM U1 FULL OUTER JOIN U2 ON U1.Name = U2.Name

  15. Full Outer Join Result
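
To show how the equi-join and the full outer join differ in which rows they keep, here is a small sketch over in-memory rows. The sample rows for `u1` and `u2` are invented for illustration (the slides' actual tables are not in the transcript), and the helper assumes `Name` is unique within each table.

```python
def full_outer_join(left, right, key):
    """Pair rows on `key`; a missing partner is represented by None."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    # Left keys first, then keys that appear only on the right.
    keys = list(left_by_key) + [k for k in right_by_key if k not in left_by_key]
    return [(left_by_key.get(k), right_by_key.get(k)) for k in keys]


# Hypothetical sample data, not the tables shown on the slides.
u1 = [{"Name": "Alice", "Age": 22, "Library": "yes"},
      {"Name": "Bob", "Age": 24, "Library": "no"}]
u2 = [{"Name": "Alice", "Age": 23, "Phone": "555-0100"},
      {"Name": "Carol", "Age": 21, "Phone": "555-0101"}]

outer = full_outer_join(u1, u2, "Name")
# The equi-join keeps only the pairs that have a partner on both sides.
inner = [(l, r) for l, r in outer if l is not None and r is not None]

print(len(outer))  # 3 pairs: Alice (both), Bob (U1 only), Carol (U2 only)
print(len(inner))  # 1 pair: Alice
```

The outer join keeps Bob and Carol, which helps completeness, but Alice's two ages (22 and 23) still sit side by side in one row; resolving that is left to the fusion step.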

  16. Full Disjunction • Generalizes outer join to more than two tables

  17. Information Systems for Data Fusion • Conflict resolution • Conflict avoidance • Conflict ignorance • No conflict handling
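
The first three strategies differ in how much they look at the conflicting values. The sketch below is one possible illustration, not the behavior of any particular system: ignorance passes every value on, avoidance decides from metadata (here a hypothetical source priority) without inspecting the values, and resolution inspects the values and applies a function to them.

```python
def conflict_ignorance(candidates):
    """Pass all distinct values on and leave the decision to the user."""
    return list(dict.fromkeys(value for _, value in candidates))


def conflict_avoidance(candidates, source_priority):
    """Pick by source preference only, without comparing the values."""
    by_source = dict(candidates)
    for source in source_priority:
        if source in by_source:
            return by_source[source]
    return None


def conflict_resolution(candidates, resolve):
    """Inspect the known values and combine them with a resolution function."""
    values = [value for _, value in candidates if value is not None]
    return resolve(values) if values else None


# Hypothetical conflict: two sources report different ages for one person.
age = [("U1", 22), ("U2", 23)]

print(conflict_ignorance(age))                # [22, 23]
print(conflict_avoidance(age, ["U2", "U1"]))  # 23
print(conflict_resolution(age, max))          # 23
```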

  18. Architecture • Database management system (DBMS) • Multidatabase management system (MDBMS) • Mediator-wrapper (MW) • Multi-agent system (MAS) • Stand-alone application (APP)

  19. Integration Model • Global-as-view (GaV) • Local-as-view (LaV) • Global-Local-as-view (GLaV)

  20. 1. Conflict-Resolving Systems • Multibase • Hermes • Fusionplex • HumMer • Ajax

  21. Multibase • C. 1983 • Solution: • Outer join • Aggregation (min, max, sum, choose, etc.)
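
The combination of an outer join with per-attribute aggregation can be sketched as grouping the records that describe the same object and applying one function per column. This is an illustration of the idea only, not Multibase's implementation; `fuse`, `choose`, and the sample rows are hypothetical.

```python
from collections import defaultdict


def choose(values):
    """Default resolution: take the first known value."""
    return values[0]


def fuse(rows, key, resolvers, default=choose):
    """Merge rows sharing `key`, resolving each column with its function."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    fused = []
    for key_value, group in groups.items():
        out = {key: key_value}
        # Every non-key column seen in the group, in first-seen order.
        columns = dict.fromkeys(c for row in group for c in row if c != key)
        for col in columns:
            values = [row[col] for row in group if row.get(col) is not None]
            resolve = resolvers.get(col, default)
            out[col] = resolve(values) if values else None
        fused.append(out)
    return fused


# Hypothetical rows, as they might come out of an outer join of two sources.
rows = [
    {"Name": "Alice", "Age": 22, "Phone": None},
    {"Name": "Alice", "Age": 23, "Phone": "555-0100"},
    {"Name": "Bob", "Age": 24},
]

result = fuse(rows, "Name", {"Age": max})
print(result)
```

Here the contradiction on Alice's age is resolved with `max`, and the uncertainty on her phone number is resolved by taking the only known value.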

  22. Hermes • HEterogeneous Reasoning and MEdiator System • C. 1996 • Mediator-specified conflict resolution • Created by an expert

  23. Fusionplex • Multiplex, Fusionplex, Autoplex • Classifies quality of data • User-prioritized feature “importance” • Able to incorporate new/unknown databases

  24. HumMer • Humboldt-Merger • C. 2006 • Handles conflicts in schema, identity, data • Clusters duplicates • User-defined aggregation functions

  25. Ajax • Format and unit conversion • User-defined cleansing process • Compiled to Java

  26. 2. Conflict-Avoiding Systems • TSIMMIS • SIMS and Ariadne • Infomix • HIPPO • ConQuer • Rainbow

  27. 3. Conflict-Ignoring Systems • Pegasus • Nimble • Carnot • InfoSleuth • Potter’s Wheel

  28. Other Systems • Research Systems • Trio • Information Manifold • Garlic • Disco (Distributed Information Search Component) • Papyrus, Nomenclature • DIOM, KOMET, Infomaster, Occam, SIMS, Internet Softbot • Singapore, Magic, Observer • Lore, Tukwila • SIRIUS-DELTA, DDTS, Mermaid, UNIBASE • MRDSM, OMNIBASE, CALIDA, DQS

  29. Other Systems • Commercial • IBM, Oracle, Microsoft, others • IBM Information Server (IIS) • Microsoft SQL Server Integration Services (SSIS)

  30. Other Systems • Peer Data Management Systems • Orchestra • Hyper

  31. Analysis • Weaknesses • Difficult to show utility of a tool on paper • Strengths • Covered a lot of theory • Covered a lot of systems
