This document discusses the challenges and solutions in Linked Open Data (LOD) quality assessment. It highlights the necessity of publishing quality datasets for effective interlinking in the Semantic Web. Existing methods for data quality evaluation are critiqued for focusing on syntactic validation rather than inherent quality. Our solution introduces a metrics-driven framework for assessing the inherent quality of datasets before their integration into the LOD cloud, using the Linked Open Data Quality Model (LODQM) and its metrics to ensure consistency and accessibility.
Metrics-Driven Approach for LOD Quality Assessment
2014-May-07
Outline
• What is the problem?
• What have others done?
• What is our solution?
• Does it work?
What is the problem?
• Linked Open Data (LOD): realizing the Semantic Web by interlinking existing but dispersed data
• Main components of LOD:
  • URIs to identify things
  • RDF to describe data
  • HTTP to access data
(A minimal illustration of these components follows below.)
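A minimal sketch, using Python's rdflib, of how the three components fit together: a URI names the resource, RDF triples describe it, and HTTP is how a client dereferences the URI. The example.org URIs and property names are invented for illustration.

```python
# Minimal sketch of the three LOD building blocks (assumed example URIs).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")

g = Graph()
person = URIRef("http://example.org/people/alice")  # a URI identifies the thing
g.add((person, RDF.type, EX.Person))                # RDF describes the data
g.add((person, RDFS.label, Literal("Alice")))
g.add((person, EX.knows, URIRef("http://example.org/people/bob")))

# Over HTTP, a Linked Data server would return a serialization like this
# (Turtle, RDF/XML, N-Triples, ...) when the URI is dereferenced.
print(g.serialize(format="turtle"))
```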
What is the problem?
• Datasets: 295
• Triples: over 30,000,000,000 (30 B)
• Links: over 500,000,000 (500 M)
What is the problem?
Inclusion criteria for publishing and interlinking datasets into the LOD cloud:
• Resolvable http:// or https:// URIs
• Presented in one of the standard Semantic Web formats (RDF, RDFa, RDF/XML, Turtle, N-Triples)
• Contains at least 1,000 triples
• Connected via at least 50 RDF links to existing datasets in the LOD cloud
• Accessible via RDF crawling, an RDF dump, or a SPARQL endpoint
Is the dataset ready to publish? (A sketch of checking the countable criteria follows below.)
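Two of these criteria (triple count and outbound RDF links) can be tested mechanically on a local dump. Below is a minimal sketch with rdflib; the host-comparison heuristic for "external link" and the function name are assumptions for illustration, not the official LOD cloud procedure.

```python
# Hedged sketch: check the two countable inclusion criteria on a local dump.
from urllib.parse import urlparse
from rdflib import Graph, URIRef

def check_inclusion_criteria(dump_path, dataset_host):
    g = Graph()
    g.parse(dump_path)  # rdflib guesses the format from the file extension

    # Assumption: an "RDF link to an existing dataset" is approximated as a
    # triple whose object URI lives on a different host than the dataset.
    external_links = sum(
        1 for _, _, o in g
        if isinstance(o, URIRef) and urlparse(str(o)).netloc != dataset_host
    )
    return {
        "triples": len(g),
        "triples_ok": len(g) >= 1000,
        "external_links": external_links,
        "links_ok": external_links >= 50,
    }

# Usage (hypothetical file and host):
# print(check_inclusion_criteria("mydata.ttl", "example.org"))
```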
What is the problem?
• The idea behind LOD: publish first, improve later
• Result: quality problems in the published datasets
• Missing link: data quality evaluation before release
What have others done?
Data quality in the context of LOD:
• Validators:
  • General validators
  • Parsing and syntax
  • Accessibility / dereferenceability
• Quality assessment of published data:
  • Classifying quality problems of LOD
  • Using metadata for quality assessment
  • Filtering poor-quality data (WIQA)
  • Semantic annotation using ontologies
What have others done?
Limitations of related work:
• Syntax validation, not quality evaluation
• Not scalable
• Not fully automated
• Evaluation only after publishing
What is our solution?
Proposing a set of metrics for inherent quality assessment of datasets before interlinking them into the LOD cloud.
2. Proposing Metrics
Example:
• Goal: assess the consistency of a dataset in the context of LOD
• Question: what is the degree of conflict at the level of data values?
• Metric: the number of functional properties with inconsistent values
(A sketch of this metric follows below.)
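One way to operationalize this metric is a SPARQL query that counts owl:FunctionalProperty instances holding two different values for the same subject. This is a hedged sketch of that reading; the paper's exact operationalization may differ.

```python
# Sketch: count functional properties with conflicting values via SPARQL.
from rdflib import Graph

QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT (COUNT(DISTINCT ?p) AS ?inconsistent)
WHERE {
  ?p a owl:FunctionalProperty .
  ?s ?p ?v1 , ?v2 .          # same subject, same property, two values
  FILTER (?v1 != ?v2)        # the values conflict
}
"""

def inconsistent_functional_properties(dump_path):
    g = Graph()
    g.parse(dump_path)
    row = next(iter(g.query(QUERY)))
    return int(row.inconsistent)
```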
3. Developing LODQM
• LODQM: Linked Open Data Quality Model
• 6 quality dimensions
• 32 metrics
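As a rough illustration, a quality model like LODQM can be represented in code as a mapping from dimensions to metric functions. Only the dimensions actually named in this talk (Consistency, Syntactic accuracy, Interlinking) appear below; the remaining dimensions, the 32 concrete metrics, and the equal-weight aggregation are placeholders, not the model's actual definition.

```python
# Hypothetical skeleton of a dimensions -> metrics quality model.
from typing import Callable, Dict, List
from rdflib import Graph

Metric = Callable[[Graph], float]  # each metric scores one dataset

LODQM: Dict[str, List[Metric]] = {
    "Consistency": [],          # e.g. inconsistent functional property values
    "Syntactic accuracy": [],   # e.g. malformed or ill-typed literals
    "Interlinking": [],         # e.g. outbound RDF links per entity
    # ... three further dimensions; 32 metrics in total in the full model
}

def assess(g: Graph) -> Dict[str, float]:
    # Assumption: a dimension's score is the plain average of its metrics.
    return {
        dim: sum(m(g) for m in metrics) / len(metrics) if metrics else 0.0
        for dim, metrics in LODQM.items()
    }
```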
5. Empirical Evaluation
Result (metric correlations):
• Three pairs of metrics are correlated: {IFP, Im_DT}, {Im_DT, Sml_Cls}, {Inc_Prp_Vlu, IF}
• The other metric pairs are independent
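The slides do not state how correlation was tested. A typical setup, sketched below under that assumption, scores every dataset with every metric and tests each pair of score vectors; Spearman's rank correlation and the 0.05 threshold are assumed choices, not the paper's stated method.

```python
# Hedged sketch of pairwise correlation testing between metric scores.
from itertools import combinations
from scipy.stats import spearmanr

def correlated_metric_pairs(scores, alpha=0.05):
    """scores: dict mapping metric name -> array of values,
    one value per evaluated dataset."""
    pairs = []
    for a, b in combinations(scores, 2):
        rho, p = spearmanr(scores[a], scores[b])
        if p < alpha:  # reject independence at the assumed threshold
            pairs.append((a, b, round(float(rho), 3)))
    return pairs
```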
5. Empirical Evaluation
Result (dimension correlations):
• Only one pair of quality dimensions is correlated: {Interlinking, Syntactic accuracy}
• The other dimension pairs are independent
6. Quality Prediction
• Method: a neural network (MultiLayerPerceptron)
• Result: 20 out of 32 metrics are selected
(A sketch of this setup follows below.)
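A hedged sketch of this setup using scikit-learn: select 20 of the 32 metric features, then train a multilayer perceptron to predict a dataset's quality label. SelectKBest with an F-test stands in for whatever selection procedure the authors actually used, and X / y are hypothetical training data.

```python
# Sketch: feature selection (20 of 32 metrics) + MLP quality predictor.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_quality_predictor():
    return make_pipeline(
        StandardScaler(),              # MLPs are sensitive to feature scale
        SelectKBest(f_classif, k=20),  # keep 20 of the 32 metric features
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    )

# Usage (hypothetical data): X has shape (n_datasets, 32); y holds labels.
# model = build_quality_predictor().fit(X, y)
# predictions = model.predict(X_new)
```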
Thank you for your attention and comments.