
UIUC People Finder

UIUC People Finder. Info. University of Illinois at Urbana Champaign Advanced Database Management Systems CS511 Instructor ChengXiang Zhai Sena Lee (senalee2@uiuc.edu) Heewon Jung (hjung20@uiuc.edu) Seung Pyo Lee (slee232@uiuc.edu) Ricardo Redder (rredder2@uiuc.edu)

Presentation Transcript


  1. UIUC People Finder

  2. Info • University of Illinois at Urbana-Champaign • Advanced Database Management Systems (CS511) • Instructor: ChengXiang Zhai • Sena Lee (senalee2@uiuc.edu) • Heewon Jung (hjung20@uiuc.edu) • Seung Pyo Lee (slee232@uiuc.edu) • Ricardo Redder (rredder2@uiuc.edu) • John Laipple (laipple@uiuc.edu)

  3. Agenda • Problem • Motivation • Common problem • Definition • Challenges • Solution • Implementation • Retrieval • Interpretation • Decision • Demo • Future work

  4. Motivation • For a given person • The information about the person stored in relational databases is very limited, e.g.: name, age, address, etc. • There is a lot of information about him or her on the internet, e.g.: web pages, papers, blogs, pictures • Use the best of both worlds

  5. Common problem ChengXiang Zhai Search

  6. Phonebook ChengXiang Zhai Search

  7. Google Images

  8. Search engines

  9. Entity retrieval • Given: • a set of entities E • a relational table where each tuple describes some aspects of an entity • a set of documents • A user who is interested in an entity e_i poses a query (Q) and expects the tuple that represents e_i and the documents associated with e_i.

  10. Our example • Query = keywords (usually name) • Table = Phonebook • Documents = Results from search engines

  11. Challenges • Semantic problem • It is different from finding a document that is mathematically similar to the query • It is subjective: the final target is in our mind and is not expressed by a function

  12. Solving • Use the information from the relational database to improve the document search • The information from the phonebook is reliable and very accurate • The search engines are more generic; a simple search for a name might not be useful.

  13. Our example again ChengXiang Zhai Search

  14. Sequence • User types a query • User clicks the Search button • Application searches in the Phonebook • Application retrieves the information from the Phonebook • Application searches in the search engines, using the previous information
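The last step of this sequence, searching with the Phonebook information, can be sketched as a query-expansion helper. This is only an illustration: the `PhonebookEntry` record, its fields, and the `expand_queries` function are hypothetical names, since the slides do not show the actual Phonebook schema or how the queries were combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhonebookEntry:
    # Hypothetical record shape; the real Phonebook fields are not listed in the slides.
    name: str
    department: str = ""
    email: str = ""


def expand_queries(entry: PhonebookEntry) -> List[str]:
    """Build search-engine queries that pair the name with each known Phonebook field."""
    quoted_name = f'"{entry.name}"'
    queries = [quoted_name]  # the plain name search always comes first
    for extra in (entry.department, entry.email):
        if extra:  # skip fields the Phonebook did not provide
            queries.append(f'{quoted_name} "{extra}"')
    return queries
```

Each returned string would then be sent to a search engine in place of the bare name typed by the user.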

  15. Implementing the idea • How to retrieve the information and documents from the web? • How to interpret the results? • How to decide whether a given document relates to the entity or not?

  16. How to retrieve the information and documents from the web?

  17. Web-sites as functions • Search engines • User types the text • Click on the button • Read the results • Click on the results • UIUC People Finder • Application sends the text to the search engine (1, 2) • Store the results (3, 4)

  18. Using exposed HTTP interface • Search engines • Use GET or POST methods to receive information • Send the results in HTML • Application • Convert the query to a GET or POST request, and send it • Read the HTML

  19. Wrappers • Receive the text • Build the appropriate URL • Connect to the URL • Read the response • [Diagram: Query text → Wrapper → HTML] • Example: http://www.google.com/search?hl=en&q=chengxiang+zhai&btnG=Google+Search
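A minimal sketch of such a wrapper, assuming only the Python standard library. The function names `build_search_url` and `fetch_html` and the `User-Agent` string are illustrative assumptions, not part of the original project; the parameter names `hl`, `q`, and `btnG` are taken from the example URL on the slide. Search engines may reject or throttle automated requests, so the fetch step is shown for shape only.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_url(query_text: str,
                     base_url: str = "http://www.google.com/search") -> str:
    """Receive the text and build the URL, mirroring the example on the slide."""
    # urlencode escapes spaces as '+', which matches 'chengxiang+zhai' in the example.
    params = {"hl": "en", "q": query_text, "btnG": "Google Search"}
    return f"{base_url}?{urlencode(params)}"


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Connect to the URL and read the response body as text."""
    # The User-Agent value is a placeholder chosen for this sketch.
    request = Request(url, headers={"User-Agent": "people-finder-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `build_search_url("chengxiang zhai")` reproduces the example URL shown on the slide; one wrapper per search engine would differ only in the base URL and parameter names.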

  20. How to interpret the results?

  21. HTML – good for humans

  22. HTML – hard for computers

  23. How do we interpret? • Visual language • Different styles → different meanings • Underline → Links • Useful information → Center

  24. Extraction from HTML • HTML is Tag based < > • Different styles • <font size =…> • <h2> • <bgcolor =…> • Links • <a href = …> • Center • <body>
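As a rough illustration of tag-based extraction, the sketch below uses Python's standard `html.parser` module to collect link targets from `<a href=…>` and the text of heading tags such as `<h2>`. The class name `LinkAndHeadingExtractor` and the choice of heading tags are assumptions made for this example; the slides do not say which parser or which tags the project actually relied on.

```python
from html.parser import HTMLParser
from typing import List, Tuple

HEADING_TAGS = ("h1", "h2", "h3")  # assumed set of "emphasized" tags


class LinkAndHeadingExtractor(HTMLParser):
    """Collect href values of <a> tags and the text inside heading tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.headings: List[str] = []
        self._in_heading = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in HEADING_TAGS:
            self._in_heading = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_heading:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._in_heading:
            self.headings.append("".join(self._buffer).strip())
            self._in_heading = False


def extract(html: str) -> Tuple[List[str], List[str]]:
    """Return (links, headings) found in an HTML string."""
    parser = LinkAndHeadingExtractor()
    parser.feed(html)
    return parser.links, parser.headings
```

The extracted links give the candidate documents to follow, while the heading text is one cheap signal of what a page presents as important.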

  25. How to decide whether a given document relates to the entity or not?

  26. How do we decide? • Look for related information • Context • Names • Other information

  27. Application • Search for keywords found in the Phonebook. • Search for the name • Search for the department • Search for the address • etc. • Rank the pages • Name → +100 points • Department → +50 points • Email → +250 points
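The point scheme above can be sketched as a small scoring function. The weights (100, 50, 250) come from the slide; everything else is an assumption for illustration, in particular that a field counts once per page regardless of how often it appears, that matching is a case-insensitive substring test, and the names `score_page` and `rank_pages`.

```python
from typing import Dict, List

# Weights taken from the slide: name +100, department +50, email +250.
WEIGHTS = {"name": 100, "department": 50, "email": 250}


def score_page(page_text: str, phonebook_fields: Dict[str, str]) -> int:
    """Add a field's points once if its Phonebook value occurs in the page text."""
    text = page_text.lower()
    score = 0
    for field, weight in WEIGHTS.items():
        value = phonebook_fields.get(field, "")
        if value and value.lower() in text:  # assumed: simple substring match
            score += weight
    return score


def rank_pages(pages: Dict[str, str], phonebook_fields: Dict[str, str]) -> List[str]:
    """Return page URLs ordered from highest to lowest score."""
    return sorted(pages,
                  key=lambda url: score_page(pages[url], phonebook_fields),
                  reverse=True)
```

Under these assumptions a page mentioning only the name and department scores 150, while a page that also contains the email address scores 400 and ranks first, which reflects the idea that an email match is the strongest evidence that the page is about the right person.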

  28. Problem • Performance • Problem: Search engines return thousands or millions of results • Solution: Limit the number of retrieved web pages • Problem: Even with a limit on the number of analyzed web pages, many pages are still accessed • Solution: Cache
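Both solutions, a cap on retrieved pages and a cache, can be combined in one small helper. This is a sketch under assumptions: the in-memory dictionary, the default limit of 20 pages, and the `CachingFetcher` name are illustrative, and the slides do not describe how the project's cache was stored or invalidated.

```python
from typing import Callable, Dict, List


class CachingFetcher:
    """Fetch at most max_pages URLs per call and remember pages already fetched."""

    def __init__(self, fetch: Callable[[str], str], max_pages: int = 20) -> None:
        self._fetch = fetch          # any function that maps a URL to its HTML
        self._max_pages = max_pages  # assumed default; the slides give no number
        self._cache: Dict[str, str] = {}

    def get(self, url: str) -> str:
        # Only go to the network on a cache miss.
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]

    def get_top(self, urls: List[str]) -> List[str]:
        # Limit the number of retrieved pages to the first max_pages results.
        return [self.get(url) for url in urls[: self._max_pages]]
```

Repeating the same search then costs no additional page downloads, since every URL within the limit is served from the cache on the second call.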

  29. Final architecture • [Diagram; labels: Query text, Searchers, Phonebook, Google, Yahoo, www, cache, online, offline; outputs: Information, Picture, Documents]

  30. Demo

  31. Demo

  32. Demo

  33. Future work • Extend to other domains • MySpace, ACM, Papers, Blogs, etc… • Automatic link extraction • Better ranking function • User feedback • Owner feedback

  34. Questions

  35. Thank you
