Pedro DeRose, University of Wisconsin-Madison
The DBLife Prototype System in the Cimple Project on Community Information Management
Community Information Management
• Numerous Web communities
  • database researchers, movie fans, legal professionals, bioinformatics, etc.
  • enterprise intranets, tech support groups
• Each community = many data sources + many members
• Members often want to integrate data, query, and discover community information:
  • any interesting connection between researchers X and Y?
  • find all citations of this paper on the Web in the past week
  • what is new in the database community in the past 24 hours?
  • what are the current hot topics?
  • who has moved where?
Cimple Project @ Wisconsin/Yahoo! Research
• Structured community portal, driven by extraction + integration + mass collaboration
• Services: keyword search, SQL querying, question answering, browsing, mining, alerts/monitoring, news summaries
• Data sources: researcher homepages, conference pages, group pages, DBworld mailing list, DBLP, Web pages, text documents
• Members personalize the system and provide feedback
[Figure: entities and relationships extracted from the data sources, e.g., Jim Gray, SIGMOD-04, and a give-talk relationship]
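The extraction + integration + mass collaboration pipeline can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not Cimple's actual code: the entity dictionaries `PEOPLE` and `EVENTS`, the keyword heuristic in `extract_give_talk`, the `rejected` set standing in for member feedback, and the sample snippets, which are made up.

```python
import re
from collections import Counter

# Hypothetical entity dictionaries; a real portal would build these from
# sources such as DBLP and conference pages.
PEOPLE = {"Jim Gray", "David DeWitt"}
EVENTS = {"SIGMOD-04"}

def find_mentions(text, names):
    """Return the dictionary names that occur in text as whole words."""
    return {n for n in names if re.search(r"\b" + re.escape(n) + r"\b", text)}

def extract_give_talk(text):
    """Extraction step: emit (person, 'give-talk', event) tuples when a known
    person and a known event co-occur with the word 'talk' (an assumed heuristic)."""
    if not re.search(r"\btalks?\b", text, re.IGNORECASE):
        return set()
    people = find_mentions(text, PEOPLE)
    events = find_mentions(text, EVENTS)
    return {(p, "give-talk", e) for p in people for e in events}

def integrate(docs, rejected=frozenset()):
    """Integration step: count how many documents support each tuple, skipping
    tuples that members flagged as wrong (a stand-in for mass-collaboration feedback)."""
    support = Counter()
    for doc in docs:
        support.update(t for t in extract_give_talk(doc) if t not in rejected)
    return support

# Made-up sample snippets, not real crawled pages.
docs = [
    "Jim Gray will give a talk at SIGMOD-04.",
    "Talk: Jim Gray at SIGMOD-04",
    "David DeWitt attended SIGMOD-04.",
]
print(integrate(docs))  # Counter({('Jim Gray', 'give-talk', 'SIGMOD-04'): 2})
```

The support count is one simple way to let evidence from several sources reinforce a fact; passing that tuple in `rejected` shows how member feedback could override extraction.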
The Research Team
• Core Members: Pedro DeRose, Warren Shen, AnHai Doan, Raghu Ramakrishnan
• Supporting Members: Fei Chen, Yoonkyong Lee, Doug Burdick, Mayssam Sayyadian, Xiaoyong Chai, Ting Chen
Prototype System: DBLife
• Integrates data of the DB research community
• Live at dblife-labs.cs.wisc.edu
• 1,075 data sources:
  • 463 researcher homepages
  • 103 department homepages
  • 54 conference homepages
  • 99 faculty hubs
  • 56 database group pages
  • 203 project homepages
  • 85 colloquia
  • 11 event pages
  • DBWorld
  • DBLP
• Crawled daily: 11,000+ pages = 160+ MB/day
Data Integration
• Example: Raghu Ramakrishnan, co-authors = A. Doan, Divesh Srivastava, ...
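One integration subproblem behind a co-author list is recognizing that mentions such as "A. Doan" and "AnHai Doan" may refer to the same person. The sketch below assumes a simple rule (same last name, and first names either equal or matching by initial) plus greedy clustering; `parse_name`, `compatible`, and `merge_mentions` are hypothetical helpers, not DBLife's matcher.

```python
def parse_name(name):
    """Split 'First Last' or 'F. Last' into lowercased (first, last)."""
    parts = name.replace(".", "").split()
    return parts[0].lower(), parts[-1].lower()

def compatible(a, b):
    """Assumed rule: last names must match; first names must be equal,
    or one is an initial that matches the other's first letter."""
    fa, la = parse_name(a)
    fb, lb = parse_name(b)
    if la != lb:
        return False
    if len(fa) == 1 or len(fb) == 1:
        return fa[0] == fb[0]
    return fa == fb

def merge_mentions(mentions):
    """Greedily put each mention into the first cluster it is compatible with."""
    clusters = []
    for m in mentions:
        for c in clusters:
            if all(compatible(m, x) for x in c):
                c.append(m)
                break
        else:
            clusters.append([m])
    # Use the longest spelling in each cluster as its canonical name.
    return {max(c, key=len): sorted(set(c)) for c in clusters}

print(merge_mentions(["A. Doan", "AnHai Doan", "Divesh Srivastava", "D. Srivastava"]))
# {'AnHai Doan': ['A. Doan', 'AnHai Doan'], 'Divesh Srivastava': ['D. Srivastava', 'Divesh Srivastava']}
```

The rule is deliberately naive: two different researchers who share a last name and first initial would be merged, so a real system would need more evidence (co-authors, affiliations) or member feedback.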
Resulting ER Graph
[Figure: ER graph linking the paper “Proactive Re-optimization”, the researchers Pedro Bizarro, Shivnath Babu, David DeWitt, and Jennifer Widom, and SIGMOD 2005 through write, coauthor, advise, PC-member, and PC-Chair relationships]
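Once entities and relationships sit in an ER graph, a question such as "any interesting connection between researchers X and Y?" can be approximated as a shortest-path search. The sketch below runs breadth-first search over hypothetical placeholder entities (`Advisor A1`, `Student S1`, `Paper P`, ...) using relationship types from the figure; it is an illustration, not the query engine DBLife uses.

```python
from collections import deque

# Hypothetical ER graph as (source, relationship, target) triples.
EDGES = [
    ("Student S1", "write", "Paper P"),
    ("Student S2", "write", "Paper P"),
    ("Advisor A1", "advise", "Student S1"),
    ("Advisor A2", "advise", "Student S2"),
    ("Advisor A2", "PC-Chair", "Conference C"),
]

def build_graph(edges):
    """Adjacency lists; each relationship can be traversed in both directions."""
    adj = {}
    for src, rel, dst in edges:
        adj.setdefault(src, []).append((rel, dst))
        adj.setdefault(dst, []).append((rel, src))
    return adj

def connection(adj, start, goal):
    """Return the shortest list of (entity, relationship, entity) hops from
    start to goal, or None if they are unconnected. Hops are listed in
    traversal order, so (Paper P, write, Student S2) means S2 wrote P."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while prev[node] is not None:
                parent, rel = prev[node]
                path.append((parent, rel, node))
                node = parent
            return path[::-1]
        for rel, nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, rel)
                queue.append(nxt)
    return None

for hop in connection(build_graph(EDGES), "Advisor A1", "Advisor A2"):
    print(hop)
```

Plain shortest paths treat every relationship equally; ranking which connections are "interesting" would need extra weighting that this sketch does not attempt.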
Summary
• Community Information Management
  • an increasingly crucial problem
• The Cimple project
  • sample challenges: information extraction, data integration, mass collaboration
  • extends the footprint of DB technologies to Web data
  • develops new DB technologies
• DBLife prototype
  • more at dblife.cs.wisc.edu; latest features (e.g., wiki) at dblife-labs.cs.wisc.edu
  • research/education tool, community service, benchmark, challenge problem