Explore the impact of structured databases for powerful data queries and discover the potential of the structured web. Learn how WebTables offer easy data analysis through auto-synonym discovery and structure autocomplete. Realize the promise of a database of everything on the web.
Querying The Web Database
Michael J. Cafarella
University of Michigan
CS4HS, August 18, 2010
Two kinds of databases
• Structured databases (your bank)
  • Expensive, hard to use
  • Few sources of data
  • Powerful queries
    • “Who lives in Ypsilanti and has a balance between $800 and $1400?”
• Unstructured databases (the Web)
  • Cheap, easy to use
  • Many sources of data
  • Very boring “topic” queries
    • britney spears, etc.
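To make the "powerful queries" point concrete, here is a minimal sketch of the Ypsilanti question as a structured query, using Python's built-in sqlite3 module. The `accounts` table, its column names, and every row in it are hypothetical, invented only for illustration; they are not from the talk.

```python
import sqlite3


def customers_in_range(conn, city, low, high):
    """Return names of customers in `city` with balance in [low, high]."""
    rows = conn.execute(
        "SELECT name FROM accounts "
        "WHERE city = ? AND balance BETWEEN ? AND ? "
        "ORDER BY name",
        (city, low, high),
    )
    return [name for (name,) in rows]


# Hypothetical schema and data, made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, city TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        ("Alice", "Ypsilanti", 950.0),
        ("Bob", "Ypsilanti", 2300.0),
        ("Carol", "Ann Arbor", 1200.0),
        ("Dave", "Ypsilanti", 1400.0),
    ],
)

# SQL BETWEEN is inclusive on both ends, so a balance of exactly 1400 matches.
result = customers_in_range(conn, "Ypsilanti", 800, 1400)
print(result)
```

A keyword search over unstructured pages has no notion of a "city" column or a numeric "balance" range, which is why the same question cannot be asked of the Web as it stands.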
The Structured Web?
• What if we had a structured-data version of everything on the Web?
• “A Database of Everything”
• “List all scientists from Belgium who were left-handed”
• “Which heart surgeon in Michigan has the highest success rate?”
• “List Miami hotels with hot tubs near a beach”
This page contains 16 distinct HTML tables, but only one structured database
WebTables
• The WebTables system automatically extracts databases from a web crawl
• An extracted database is one table plus labeled columns
• We estimate that our crawl of 14.1B raw HTML tables contains ~154M good structured databases
[Figure: raw crawled pages → raw HTML tables → recovered databases]
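The extraction step can be sketched roughly as follows. This is not the WebTables implementation: the simple rule used here (a header row of `<th>` cells, at least two columns, and a couple of data rows of matching width) is an assumed stand-in for whatever filtering the real system applies, and the sample page and its values are made up. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as {"header": [...], "rows": [[...], ...]}."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._table = None  # table currently being built
        self._row = None    # cells of the current row
        self._cell = None   # text fragments of the current cell
        self._row_is_header = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = {"header": [], "rows": []}
            self._row = self._cell = None
        elif tag == "tr" and self._table is not None:
            self._row = []
            self._row_is_header = False
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            if tag == "th":
                self._row_is_header = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row_is_header and not self._table["header"]:
                self._table["header"] = self._row
            elif self._row:
                self._table["rows"].append(self._row)
            self._row = None
        elif tag == "table" and self._table is not None:
            self.tables.append(self._table)
            self._table = None
            self._row = self._cell = None


def looks_relational(table, min_rows=2):
    """Assumed heuristic: labeled columns plus uniformly shaped data rows."""
    header, rows = table["header"], table["rows"]
    return (
        len(header) >= 2
        and len(rows) >= min_rows
        and all(len(row) == len(header) for row in rows)
    )


def extract_databases(html):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return [t for t in parser.tables if looks_relational(t)]


# Made-up page: one layout table (navigation) and one data table.
page = """
<table><tr><td>Home</td><td>About</td></tr></table>
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Ann Arbor</td><td>100000</td></tr>
  <tr><td>Ypsilanti</td><td>20000</td></tr>
</table>
"""
dbs = extract_databases(page)
print(dbs)
```

The navigation table is discarded because it has no labeled columns, which mirrors the slide's point that most raw HTML tables are not usable databases.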
Easy Data Analysis
• A knowledge worker queries for “city population” [VLDB08, “WebTables: Exploring…”, Cafarella et al]
Conclusions
• The Structured Web exists in raw form today, but tools largely ignore it
• Information Extraction helps gather structural information from existing Web info
• These techniques bring the promise of the Structured Web much closer