Streamlining XML Data Cleaning with XClean: Methodologies and Applications

XClean is an XML data cleaning system that addresses common data quality problems: typos, inconsistent data formats, missing data, contradictory data, and duplicates. Instead of one-off fixes, it follows a clear methodology made of clearly defined processing stages. Cleaning programs are written in XClean/PL, a declarative, modular, and readable language built on a set of clearly defined cleaning operators. These programs are compiled to XQuery, which an XQuery processor runs on the dirty XML data to produce clean data. The XClean Java plugin, demonstrated at CIDR 2007, supports writing, compiling, and executing XClean/PL programs.

Presentation Transcript


  1. XClean in Action
     Melanie Weis, HPI Potsdam, Germany; Ioana Manolescu, INRIA Futurs, France
     CIDR 2007, 05.11.2006

  2. What is XClean?
     • XClean is an XML data cleaning system.
     • Types of errors that require data cleaning:
       • Typos
       • Different data formats (e.g., dates, abbreviations, language)
       • Missing data
       • Contradictory data
       • Duplicates
     Melanie Weis, Hasso Plattner Institut Potsdam, 18.01.2007
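Two of the error types above, different data formats and missing data, can be illustrated with a small normalization step. The sketch below is not XClean code: `normalize_date` is a hypothetical Python function that maps a few assumed date formats to ISO 8601 and returns `None` for missing or unrecognized values, so that a later stage can flag them. The list of accepted formats is an assumption chosen for the example.

```python
from datetime import datetime
from typing import Optional

# Hypothetical set of input date formats; a real cleaning job would
# derive these from the data it actually receives.
KNOWN_DATE_FORMATS = ("%d.%m.%Y", "%Y-%m-%d", "%m/%d/%Y")


def normalize_date(raw: Optional[str]) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if missing or unparseable."""
    if raw is None or not raw.strip():
        return None  # missing data: nothing to normalize
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next one
    return None  # unknown format: leave it for a later stage to flag


if __name__ == "__main__":
    for value in ["18.01.2007", "2007-01-18", "01/18/2007", "", None]:
        print(repr(value), "->", normalize_date(value))
```

The order of patterns encodes domain knowledge: a value such as `01.02.2007` becomes `2007-02-01` only because the assumed dotted pattern is day-first.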

  3. Where do we find duplicates?
     [Figure: example records, one labeled "False Duplicate"]
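The "False Duplicate" label points at the core difficulty of duplicate detection: records that look alike are not necessarily the same real-world object. A common way to find candidate duplicates, assumed here rather than taken from XClean, is to compare records pairwise with a string-similarity measure and keep pairs above a threshold. The sketch uses Python's `difflib.SequenceMatcher` on the text of hypothetical `<person>` elements; the sample data, the helper names, and the threshold values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample: a typo variant of one person plus a different,
# similar-looking person.
SAMPLE = """<people>
  <person><name>Melanie Weis</name><city>Potsdam</city></person>
  <person><name>Melanie Weiss</name><city>Potsdam</city></person>
  <person><name>Melanie Weber</name><city>Potsdam</city></person>
</people>"""


def record_text(elem: ET.Element) -> str:
    """Join the lower-cased text of an element's children into a comparison key."""
    return " ".join((child.text or "").strip().lower() for child in elem)


def candidate_duplicates(root: ET.Element, tag: str, threshold: float = 0.9):
    """Return (i, j, score) for every pair of <tag> elements with similarity >= threshold."""
    records = [record_text(e) for e in root.iter(tag)]
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for t in (0.9, 0.85):
        print(t, candidate_duplicates(root, "person", threshold=t))
```

With `threshold=0.9` only the first two records (the assumed typo) are paired. Lowering it to `0.85` also pairs both spellings of Weis with Weber, who in this sample is assumed to be a different person: those extra pairs are false duplicates. The threshold trades missed duplicates against false ones.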

  4. How do we get rid of dirty data?
     • Quick fix (get glasses)
     • Start over again next year (get new, expensive glasses)
     No!
     • Clear methodology (clearly defined processing stages that combine)
     • Possibility to reuse (parts of) a solution
     Yes!
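The slide argues for clearly defined processing stages that can be reused instead of one-off fixes. One way to express that idea, offered as an assumption rather than a description of XClean internals, is to write each stage as a function over a list of records and compose stages into pipelines; a composed pipeline is itself a stage and can be reused. The stage names (`trim_whitespace`, `expand_country`, `drop_exact_duplicates`), the `pipeline` helper, and the abbreviation table are hypothetical.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, str]
Stage = Callable[[List[Record]], List[Record]]

# Hypothetical abbreviation table used by one of the stages.
COUNTRY_NAMES = {"DE": "Germany", "FR": "France"}


def trim_whitespace(records: List[Record]) -> List[Record]:
    """Stage: strip leading and trailing whitespace from every field value."""
    return [{k: v.strip() for k, v in r.items()} for r in records]


def expand_country(records: List[Record]) -> List[Record]:
    """Stage: replace known country abbreviations with full names."""
    return [
        {**r, "country": COUNTRY_NAMES[r["country"]]}
        if r.get("country") in COUNTRY_NAMES
        else dict(r)
        for r in records
    ]


def drop_exact_duplicates(records: List[Record]) -> List[Record]:
    """Stage: keep only the first occurrence of each identical record."""
    seen = set()
    result = []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            result.append(r)
    return result


def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right; the result is itself a reusable stage."""
    def run(records: List[Record]) -> List[Record]:
        return reduce(lambda acc, stage: stage(acc), stages, records)
    return run


# A reusable standardization sub-pipeline, and a full pipeline that reuses it.
standardize = pipeline(trim_whitespace, expand_country)
full_clean = pipeline(standardize, drop_exact_duplicates)

data = [
    {"name": " Ioana Manolescu", "country": "FR"},
    {"name": "Ioana Manolescu ", "country": "France"},
]
print(full_clean(data))
```

Because `standardize` is an ordinary stage, several cleaning jobs can share it; only the stages that differ need to be written anew.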

  5. Data Cleaning with XClean
     • XClean/PL
       • Declarative
       • Modular
       • Readable
     • Set of clearly defined cleaning operators
     [Diagram: an XClean/PL program is compiled to XQuery; an XQuery processor runs it on dirty XML data and outputs clean XML data.]
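This architecture separates a declarative cleaning program from its execution: the program is compiled to XQuery, and an ordinary XQuery processor does the work. The sketch below illustrates only that compile step. It does not reproduce XClean/PL syntax or XClean's actual compiler; the spec format (a list of `(operator, child element)` pairs), the operator names, and `compile_to_xquery` are hypothetical. Each operator maps to an XQuery 1.0 built-in string function (`normalize-space`, `upper-case`, `lower-case`).

```python
from typing import List, Tuple

# Hypothetical operator vocabulary: each name maps to an XQuery 1.0 built-in.
OPERATORS = {
    "trim": "normalize-space",
    "upper": "upper-case",
    "lower": "lower-case",
}


def compile_to_xquery(source: str, record: str, steps: List[Tuple[str, str]]) -> str:
    """Translate (operator, child element) steps into a single XQuery FLWOR expression."""
    fields = []
    for op, child in steps:
        if op not in OPERATORS:
            raise ValueError(f"unknown operator: {op}")
        # Emits e.g. <name>{normalize-space($r/name)}</name>
        fields.append(f"    <{child}>{{{OPERATORS[op]}($r/{child})}}</{child}>")
    body = "\n".join(fields)
    return (
        f'for $r in doc("{source}")//{record}\n'
        "return\n"
        f"  <{record}>\n{body}\n  </{record}>"
    )


if __name__ == "__main__":
    print(compile_to_xquery("dirty.xml", "person", [("trim", "name"), ("upper", "country")]))
```

The returned string could then be run against the dirty document by any XQuery processor (Saxon and BaseX are two examples). This simplified translation emits only the listed child elements; a fuller compiler would also copy through elements that no operator touches.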

  6. Come see the demo!
     • XClean Java plugin
     • Supports:
       • Writing XClean/PL
       • Compiling XClean/PL to XQuery
       • Executing XQuery to obtain clean data
