TANGO: Table Analysis for Generating Ontologies
David W. Embley (BYU) & George Nagy (RPI)
under NSF Awards 0414644 and 0414854
INFORMATION & KNOWLEDGE MANAGEMENT, Dr. Maria Zemankova
(a) Table Interpretation (b) Query by Table
NSF TANGO BYU/RPI
TANGO STEPS
TABLE → Wang Notation Tool → INTERPRETED TABLE (Wang Notation & XML) → MINI ONTOLOGY → Ontology Editor → GROWING ONTOLOGY
Remaining diagram labels: Annotated Semantic Web Pages; Standard Ontology Language (OWL); Ontology Based Web Services; Form Based Specification; Extraction Ontologies; Relational Databases; Query By Table
This presentation
TABLE → Wang Notation Tool → INTERPRETED TABLE (Wang Notation & XML) → MINI ONTOLOGY → Ontology Editor → GROWING ONTOLOGY
Remaining diagram labels: Annotated Semantic Web Pages; Standard Ontology Language (OWL); Ontology Based Web Services; Form Based Specification; Extraction Ontologies; Relational Databases; Query By Table
(a) Table Interpretation
Diagram labels: Confirm or correct; HTML web pages; Extract table; Matlab table; XML table; Wang Notation; Construct Wang notation; Confirm or correct; Mini Ontology
Median Income table: http://www40.statcan.ca/l01/cst01/famil108a.htm?sdi=median%20income
Median Income table from Statistics Canada, displayed in the TANGO Wang Notation Tool
Wang Notation
• An abstract table is specified by an ordered pair (C, δ): (category, delta).
• C is a finite set of labeled domains (headers, subheaders of tables, etc.).
• δ maps each combination of labels drawn from C to the individual table value it identifies.
Categories
• Two categories in the previous table.
• CATEGORY 1: (Region_Virtual, {(Canada,phi), (Newfoundland and Labrador,phi), (Prince Edward Island,phi), (Nova Scotia,phi), (New Brunswick,phi), (Quebec,phi), (Ontario,phi), (Manitoba,phi), (Saskatchewan,phi), (Alberta,phi), (British Columbia,phi), (Yukon Territory,phi), (Northwest Territories,phi), (Nunavut,phi)})
• CATEGORY 2: (Year_Virtual, {(2001,phi), (2002,phi), (2003,phi), (2004,phi), (2005,phi)})
Content (leaf) cells
• Delta notation for two (of 15) rows:
delta({Year_Virtual.2001,Region_Virtual.Canada})=53,500
delta({Year_Virtual.2002,Region_Virtual.Canada})=55,000
delta({Year_Virtual.2003,Region_Virtual.Canada})=56,000
delta({Year_Virtual.2004,Region_Virtual.Canada})=58,100
delta({Year_Virtual.2005,Region_Virtual.Canada})=60,600
delta({Year_Virtual.2001,Region_Virtual.Newfoundland and Labrador})=41,400
delta({Year_Virtual.2002,Region_Virtual.Newfoundland and Labrador})=43,200
delta({Year_Virtual.2003,Region_Virtual.Newfoundland and Labrador})=44,800
delta({Year_Virtual.2004,Region_Virtual.Newfoundland and Labrador})=46,100
delta({Year_Virtual.2005,Region_Virtual.Newfoundland and Labrador})=47,600
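The (C, delta) pair above can be held in ordinary data structures. The following is a minimal sketch, assuming a dictionary for the categories and a dictionary keyed by sets of (category, label) pairs for delta; the layout and the helper name `cell_key` are illustrative choices, not the format used by the TANGO tools. Only the two regions whose values appear on the slide are included.

```python
# Sketch of Wang notation (C, delta) using the slide's Median Income values.
# The dict/frozenset layout is an illustrative assumption, not TANGO's format.

# C: category name -> labels ("phi" on the slide marks a label with no subcategories).
categories = {
    "Region_Virtual": ["Canada", "Newfoundland and Labrador"],  # 2 of the 14 regions
    "Year_Virtual": [2001, 2002, 2003, 2004, 2005],
}


def cell_key(**labels):
    """Build an order-independent key from one label per category."""
    return frozenset(labels.items())


# Values copied from the delta entries on the slide, in year order.
values_by_region = {
    "Canada": [53500, 55000, 56000, 58100, 60600],
    "Newfoundland and Labrador": [41400, 43200, 44800, 46100, 47600],
}

# delta: set of (category, label) pairs -> content cell value.
delta = {}
for region, values in values_by_region.items():
    for year, value in zip(categories["Year_Virtual"], values):
        delta[cell_key(Year_Virtual=year, Region_Virtual=region)] = value

# The order in which the labels are given does not matter.
print(delta[cell_key(Region_Virtual="Canada", Year_Virtual=2003)])  # 56000
```

Keying delta by a set rather than a (row, column) position mirrors the notation's independence from physical layout.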
XML Representation: Schema for (1) table (2) categories (3) data cells (4) augmentation
<InterpretedTable xsi:noNamespaceSchemaLocation="G:\RPI\XML\02_TableInterface.XS.070803.xml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Table TableOID="Table2" Number="2" DocumentCitation="Wang's Thesis" Title="Wang table" Caption="Grades in 1991 and 1992">
    <CategoryNodes>
      <CategoryNode CategoryNodeOID="C1" Label="Median Total Income"></CategoryNode>
      <CategoryNode CategoryNodeOID="C11" Label="Canada"></CategoryNode>
      <CategoryNode CategoryNodeOID="C12" Label="Newfoundland and Labrador"></CategoryNode>
      <CategoryNode CategoryNodeOID="C13" Label="Prince Edward Island"></CategoryNode>
      <CategoryNode CategoryNodeOID="C14" Label="Nova Scotia"></CategoryNode>
      <CategoryNode CategoryNodeOID="C15" Label="New Brunswick"></CategoryNode>
      <CategoryNode CategoryNodeOID="C16" Label="Quebec"></CategoryNode>
      <CategoryNode CategoryNodeOID="C17" Label="Ontario"></CategoryNode>
      <CategoryNode CategoryNodeOID="C18" Label="Manitoba"></CategoryNode>
      <CategoryNode CategoryNodeOID="C19" Label="Saskatchewan"></CategoryNode>
      <CategoryNode CategoryNodeOID="C110" Label="Alberta"></CategoryNode>
      <CategoryNode CategoryNodeOID="C111" Label="British Columbia"></CategoryNode>
      <CategoryNode CategoryNodeOID="C112" Label="Yukon Territory"></CategoryNode>
      <CategoryNode CategoryNodeOID="C113" Label="Northwest Territories"></CategoryNode>
      <CategoryNode CategoryNodeOID="C114" Label="Nunavut"></CategoryNode>
      <CategoryNode CategoryNodeOID="C2" Label="Year (Virtual)"></CategoryNode>
      <CategoryNode CategoryNodeOID="C21" Label="2001"></CategoryNode>
      <CategoryNode CategoryNodeOID="C22" Label="2002"></CategoryNode>
      <CategoryNode CategoryNodeOID="C23" Label="2003"></CategoryNode>
      <CategoryNode CategoryNodeOID="C24" Label="2004"></CategoryNode>
      <CategoryNode CategoryNodeOID="C25" Label="2005"></CategoryNode>
    </CategoryNodes>
  </Table>
  <CategoryParentNodes>
    <CategoryParentNode CategoryParentNodeOID="C1">
      <CategoryNodes>
        …
XML file for this table has ~350 lines of Object Identifier tags
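A file of this shape can be read back with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a shortened fragment; the element and attribute names (`CategoryNode`, `CategoryNodeOID`, `Label`) come from the slide, while the closing tags, the omission of the schema attributes, and the function name `category_labels` are assumptions made so the example is well-formed and self-contained. It does not try to infer the parent/child hierarchy, since the slide's excerpt is cut off before that part of the file.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed fragment modeled on the slide's excerpt.
# The real file is truncated on the slide; this closure is an assumption.
SNIPPET = """
<InterpretedTable>
  <Table TableOID="Table2" Number="2">
    <CategoryNodes>
      <CategoryNode CategoryNodeOID="C1" Label="Median Total Income"></CategoryNode>
      <CategoryNode CategoryNodeOID="C11" Label="Canada"></CategoryNode>
      <CategoryNode CategoryNodeOID="C12" Label="Newfoundland and Labrador"></CategoryNode>
      <CategoryNode CategoryNodeOID="C2" Label="Year (Virtual)"></CategoryNode>
      <CategoryNode CategoryNodeOID="C21" Label="2001"></CategoryNode>
    </CategoryNodes>
  </Table>
</InterpretedTable>
"""


def category_labels(xml_text):
    """Map each CategoryNodeOID to its Label (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    return {
        node.get("CategoryNodeOID"): node.get("Label")
        for node in root.iter("CategoryNode")
    }


labels = category_labels(SNIPPET)
print(labels["C11"])  # Canada
```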
Verification tool: category headers for a selected content cell
Verification tool: content cells for a selected header
Verification tool: hierarchical category structure for a selected content cell
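The first two verification views amount to lookups over the delta mapping in opposite directions. The sketch below is one way to express them, assuming delta is stored as a dictionary keyed by sets of (category, label) pairs; the function names and the storage layout are hypothetical and do not describe how the TANGO tool is implemented. The hierarchical view is not covered.

```python
def key(region, year):
    """Order-independent key for one content cell (illustrative layout)."""
    return frozenset({("Region_Virtual", region), ("Year_Virtual", year)})


# A few content cells taken from the Median Income table.
delta = {
    key("Canada", 2001): 53500,
    key("Canada", 2002): 55000,
    key("Newfoundland and Labrador", 2001): 41400,
    key("Newfoundland and Labrador", 2002): 43200,
}


def headers_for_cell(delta, value):
    """Return the category headers of every content cell holding `value`."""
    return [dict(k) for k, v in delta.items() if v == value]


def cells_for_header(delta, category, label):
    """Return the sorted content values of all cells under one header label."""
    return sorted(v for k, v in delta.items() if (category, label) in k)


print(headers_for_cell(delta, 43200))
print(cells_for_header(delta, "Year_Virtual", 2001))
```

A reviewer can use the first lookup to check that a value was attached to the right headers, and the second to check that a header covers the expected cells.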
(b) Query by Table
Query table (empty): Income for 2002, 2003, 2004, 2005
Filled table: Income: 2002 $4500; 2003 $3300; 2004 $1240; 2005 $3400
Diagram labels: QBT; Interpret Query Table; Database; Ontology from many tables
Query Table: composed in MS-Excel by a person seeking information from an ontology compiled from many web tables
Display of automatically processed Query Table for human verification
Wang notation for Query Table
QBT identifies requested data
URLs of tables in the Example Database
• Median Total Income: http://www40.statcan.ca/l01/cst01/famil108a.htm?sdi=median%20income*
• Number of Induced Abortions: http://www40.statcan.ca/l01/cst01/health40a.htm?sdi=abortions
• Number of Divorces: http://www40.statcan.ca/l01/cst01/famil02.htm?sdi=number%20divorces
• Infant Mortality Rate: http://www40.statcan.ca/l01/cst01/health21a.htm?sdi=infant%20mortality%20rate*
• Trips By Canadians in Canada: http://www40.statcan.ca/l01/cst01/arts26a.htm
• Number of Homicides: http://www40.statcan.ca/l01/cst01/legal12a.htm?sdi=homicide
• Population: http://www40.statcan.ca/l01/cst01/demo02a.htm?sdi=population
• Number of Persons with Diabetes: http://www40.statcan.ca/l01/cst01/health54a.htm?sdi=diabetes
• Number of Persons with Asthma: http://www40.statcan.ca/l01/cst01/health50a.htm?sdi=asthma
• University Degrees Awarded to Males: http://www40.statcan.ca/l01/cst01/educ51b.htm
• University Degrees Awarded to Females: http://www40.statcan.ca/l01/cst01/educ51c.htm
• Food services and drinking places (13 tables): http://www40.statcan.ca/l01/cst01/serv24j
Fields in the Example Database
• IDENTIFIER
• REGION
• YEAR
• NUMBER_OF_ABORTIONS
• ABORTION_RATE
• NUMBER_OF_DIVORCES
• INFANT_MORTALITY_RATE
• NUMBER_OF_TRIPS
• MEDIAN_TOTAL_INCOME
• POPULATION
• NUMBER_OF_HOMICIDES
• GENDER
• INCIDENCE_OF_DIABETES
• UNIVERSITY_DEGREES_AWARDED
• INCIDENCE_OF_ASTHMA
• RESTAURANT_OPERATING_REVENUE
• RESTAURANT_OPERATING_EXPENSES
• RESTAURANT_OPERATING_PROFIT_MARGIN
• RESTAURANT_OPERATING_WAGES
QBT fills in requested data from Example Database
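The fill-in step can be pictured as one database lookup per empty cell of the query table. The sketch below uses an in-memory SQLite database with three of the Example Database fields (REGION, YEAR, MEDIAN_TOTAL_INCOME) and the Canada values from the Median Income table. The table name `example`, the mapping from the query label "Income" to MEDIAN_TOTAL_INCOME, the explicit region argument, and the function name `fill_query_table` are all assumptions for illustration; the slides do not say how QBT resolves labels or which database engine it uses.

```python
import sqlite3

# In-memory stand-in for the Example Database (three of its fields only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (REGION TEXT, YEAR INTEGER, MEDIAN_TOTAL_INCOME INTEGER)"
)
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?)",
    [
        ("Canada", 2001, 53500),
        ("Canada", 2002, 55000),
        ("Canada", 2003, 56000),
        ("Canada", 2004, 58100),
        ("Canada", 2005, 60600),
    ],
)

# Hypothetical mapping from a query-table label to a database field.
LABEL_TO_FIELD = {"Income": "MEDIAN_TOTAL_INCOME"}


def fill_query_table(conn, column_label, years, region):
    """Return {year: value} for each requested year; None if no row matches."""
    field = LABEL_TO_FIELD[column_label]  # taken from a fixed whitelist
    filled = {}
    for year in years:
        row = conn.execute(
            f"SELECT {field} FROM example WHERE REGION = ? AND YEAR = ?",
            (region, year),
        ).fetchone()
        filled[year] = row[0] if row else None
    return filled


result = fill_query_table(conn, "Income", [2002, 2003, 2004, 2005], "Canada")
print(result)  # {2002: 55000, 2003: 56000, 2004: 58100, 2005: 60600}
```

Cells with no matching row come back as None, so a person verifying the result can see which requests the database could not answer.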
A current puzzle: How can QBT tell that these two query tables represent the same request?
NB: Although plausible, both of these tables exemplify poor layout.
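The two query tables are not reproduced in this transcript, so the following is only a sketch of one possible direction, not the authors' answer. It assumes the simplest case, in which the two tables differ only in orientation (headers down the side versus across the top), and compares them by the sets of header labels that address each cell rather than by cell position. The grid format and the name `request_signature` are hypothetical.

```python
def request_signature(grid):
    """Return a layout-independent signature for a simple query table.

    `grid` is a list of rows. The first row holds column headers, the first
    column holds row headers, and the top-left cell is ignored. Each cell is
    identified by the unordered pair of its row and column header labels.
    """
    column_headers = grid[0][1:]
    row_headers = [row[0] for row in grid[1:]]
    return frozenset(
        frozenset((r, c)) for r in row_headers for c in column_headers
    )


# Same request laid out two ways (illustrative tables, not the slide's).
years_as_rows = [
    ["", "Income"],
    [2002, None],
    [2003, None],
]
years_as_columns = [
    ["", 2002, 2003],
    ["Income", None, None],
]

print(request_signature(years_as_rows) == request_signature(years_as_columns))  # True
```

Tables that differ in wording, nesting, or implied categories would need more than this comparison.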
Next steps
• Complete the conversion of Wang/XML table descriptions to mini ontologies
• Improve the interface for generating a cumulative ontology from mini ontologies
• Implement database generation from the ontology
• Embed logging routines for statistical evaluation of time/error trade-offs