This paper explores the transformation of web pages into accessible semantic web data structures. It highlights the challenges of extracting information from web content, particularly product catalogs, and of storing it effectively in databases or semantic web portfolios. The work proposes a data structure estimation method based on machine learning and contrasts it with a naïve algorithm. The proposed approach reduces complexity while keeping only the functional dependencies that carry information, which improves the usability of the extracted data in structured formats such as RDF and OWL.
Transforming Current Web Sources for Semantic Web Usage
Martin Řimnáč (rimnacm@cs.cas.cz)
SOFSEM 2006, Měřín, Czech Republic, January 21-27, 2006
Institute of Computer Science, Czech Republic
Data Structure Estimation - Motivation
Web pages contain a lot of information. Many of them present a view into database systems (product catalogues).
The aim of this work:
- Extract information from a web page
- Store it back in an effective way into:
  - a database system
  - a semantic web portfolio (portal)
To provide this:
- The data structure has to be known (database normal form, RDF/OWL structure).
Data Structure Estimation
[Diagram: nodes labelled X, Y, Z and x, y, z]
A machine learning method using a global criterion.
Proposed Algorithm
Reduces complexity by:
- Consideration of a model skeleton
- Functional dependency hierarchy
  - a description that is as simple as possible
- Operations of polynomial complexity
  - only attribute decomposition is not polynomial
The result includes:
- only functional dependencies carrying information
Naïve Algorithm
The result includes:
- trivial functional dependencies
- functional dependencies carrying no information
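The contrast between the two result sets can be illustrated with a small sketch. It is not the author's algorithm: it only checks single-attribute dependencies, and it assumes that "carrying no information" covers trivial dependencies (A -> A) and dependencies already implied by transitivity. The function names `holds`, `naive_fds` and `skeleton` and the sample rows are hypothetical.

```python
from itertools import product

def holds(rows, lhs, rhs):
    """True if every value of `lhs` maps to exactly one value of `rhs`."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

def naive_fds(rows):
    """Naive enumeration: test every ordered attribute pair, trivial ones included."""
    attrs = sorted(rows[0])
    return {(a, b) for a, b in product(attrs, repeat=2) if holds(rows, a, b)}

def skeleton(fds):
    """Drop trivial pairs and pairs implied by transitivity.

    Assumption: no two attributes determine each other (the graph is acyclic);
    with such cycles this simple reduction could drop too much.
    """
    nontrivial = {(a, b) for a, b in fds if a != b}
    attrs = {x for pair in nontrivial for x in pair}
    return {
        (a, c)
        for a, c in nontrivial
        if not any(
            (a, b) in nontrivial and (b, c) in nontrivial
            for b in attrs - {a, c}
        )
    }

# Hypothetical sample data in which X -> Y and Y -> Z hold.
rows = [
    {"X": "x1", "Y": "y1", "Z": "z1"},
    {"X": "x2", "Y": "y1", "Z": "z1"},
    {"X": "x3", "Y": "y2", "Z": "z1"},
    {"X": "x4", "Y": "y3", "Z": "z2"},
]

all_fds = naive_fds(rows)   # includes X->X, Y->Y, Z->Z and the implied X->Z
kept = skeleton(all_fds)    # only X->Y and Y->Z remain
print(sorted(all_fds))
print(sorted(kept))
```

On this sample the naive enumeration returns six dependencies, while the reduced set keeps only the chain X -> Y -> Z, which matches the shape of the diagram on the slide.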
Interpretation of Model for SW
[Diagram: nodes labelled X, Y, Z and x1, x2, y, z]
RDF:
  ...
  <dse:fdi x="x1" y="y" />
  <dse:fdi x="x2" y="y" />
  <dse:fdi y="y" z="z" />
  ...
OWL:
- No support for complex attributes
- Idea based on "tuple activations"
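As a rough illustration of how such elements could be produced, the sketch below emits one element per distinct value pair of each functional dependency. It assumes the element is named `dse:fdi` and carries one attribute per column, as the slide's snippet suggests. The function name `fdi_elements` and the input layout are hypothetical, and no namespace declaration is emitted because the slide does not give one.

```python
from xml.sax.saxutils import quoteattr

def fdi_elements(rows, fds):
    """Yield one dse:fdi element per distinct value pair of each dependency.

    `rows` is a list of dicts (attribute name -> value); `fds` is a list of
    (lhs, rhs) attribute pairs. Both layouts are assumptions for illustration.
    """
    seen = set()
    for lhs, rhs in fds:
        for row in rows:
            key = (lhs, row[lhs], rhs, row[rhs])
            if key in seen:
                continue  # the same value pair is emitted only once
            seen.add(key)
            # quoteattr escapes the value and adds the surrounding quotes
            yield (f"<dse:fdi {lhs.lower()}={quoteattr(row[lhs])} "
                   f"{rhs.lower()}={quoteattr(row[rhs])} />")

rows = [
    {"X": "x1", "Y": "y", "Z": "z"},
    {"X": "x2", "Y": "y", "Z": "z"},
]
fds = [("X", "Y"), ("Y", "Z")]

for element in fdi_elements(rows, fds):
    print(element)
```

With the two rows above, the output reproduces the three elements of the slide: two for X -> Y (one per value of X) and a single one for Y -> Z, since both rows share the same pair of values.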