Managing Unstructured Data

AnHai Doan, University of Wisconsin-Madison

Presentation Transcript


  1. Managing Unstructured Data AnHai Doan University of Wisconsin-Madison

  2. Unstructured Data ...
     • Appears in many forms
       • emails, Web pages, memos, call-center text records, etc.
     • Is pervasive
       • 80% of the world's data, and growing
     • Managed by many players
       • SIGIR/WWW/KDD/AAAI, Google/Yahoo/Microsoft/IBM
     We should work on it, or risk missing the boat!
     But what sets us apart from the guys above?

  3. Structure + System Focus!
     • Make it very easy to extract structures from raw data
       • in raw form → keyword search / bag analysis
       • many apps want to go beyond that; they want structure
       • we should encourage this → back to our playground
       • not just DB + IR, but DB + IR + IE
     • Instead of working on isolated research problems, let's build an end-to-end UDMS
       • should repeat what we did with System R / Ingres: a system blueprint, followed by 20 years of rapid progress
       • unifies & accelerates our research efforts
       • keeps the work grounded, makes an impact
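To make the contrast on slide 3 concrete, here is a minimal Python sketch, assuming a single made-up call-center note and three hand-written regular expressions standing in for an IE component. It compares the raw-form view, read here as a bag of words where all you can do is count or match keywords, with an extracted record whose fields can be queried directly. The note text, the field names, and the patterns are hypothetical illustrations, not part of any system described in the talk.

```python
import re
from collections import Counter

# Hypothetical call-center note, made up for illustration only.
TEXT = ("Customer Jane Smith called on 2008-03-14 about order 4471. "
        "Call back at 608-555-0142.")


def bag_of_words(text: str) -> Counter:
    """Raw-form view: lowercase alphanumeric tokens and their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical hand-written patterns standing in for an IE component.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "order_id": re.compile(r"\border (\d+)\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def extract_record(text: str) -> dict:
    """Structured view: one field for each pattern that matches the text."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Use the capture group when the pattern has one, else the whole match.
            record[field] = match.group(1) if pattern.groups else match.group(0)
    return record


if __name__ == "__main__":
    print(bag_of_words(TEXT).most_common(3))
    print(extract_record(TEXT))
```

In the bag view the phone number survives only as the unrelated tokens "608", "555" and "0142"; in the record it is a single `phone` field that a query can filter on. A real extractor would be learned or curated rather than three regexes; the sketch only illustrates the gap between the two views.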

  4. What Does This System Look Like?
     DB + IR + IE + II, in a best-effort, Web 2.0 fashion
     • Joe Hellerstein
     • Joe Six-Pack
     • Flexible modes of interaction
     • Extraction + Integration
     • Mass collaboration
     • Best-effort, pay-as-you-go, improving over time
     • Scale up to huge data (by running over clusters)

  5. Broader Impacts
     • Great for many current applications
       • e-science, business, personal data, Web data, etc.
     • Great for many current research topics
       • IR, integration, PIM, data spaces
       • user interfaces, HCI, mashups
       • provenance, uncertainty
       • cluster management
       • query processing
       • monitoring, handling changes, pub/sub systems
     • Raises novel research issues
       • mass collaboration, best-effort, extraction, helping Joe Six-Pack
     • Helps define data management principles in broader contexts