Summarization and Personal Information Management

Summarization and Personal Information Management. Carolyn Penstein Rosé, Language Technologies Institute/ Human-Computer Interaction Institute. Announcements. Questions? Homework 2 due! Only 2 people turned it in so far… Homework 3 assigned! Due before Spring Break. Plan for Today


Presentation Transcript


  1. Summarization and Personal Information Management Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

  2. Announcements • Questions? • Homework 2 due! • Only 2 people turned it in so far… • Homework 3 assigned! • Due before Spring Break • Plan for Today • Summarizing Web Searches • Pirolli chapter

  3. Homework 3 • Start with AnnotatedData.csv • Load it into SIDE • Train a model to predict the Summary class • Create a summary that selects the top 5 instances of the POS class • Take a screenshot of the generated summary • Create a pie chart of the Summary class • Take a screenshot • Turn in your two screenshots
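The selection step in Homework 3 can be sketched outside of SIDE. This is a minimal Python illustration, not SIDE's own code: the `Instance` record, its `confidence` field, and ranking by confidence are all assumptions made for the example. It picks the top-k instances predicted as POS and computes the class proportions that a pie chart would show.

```python
from collections import Counter
from typing import NamedTuple


class Instance(NamedTuple):
    text: str
    label: str         # predicted Summary class, e.g. "POS" or "NEG"
    confidence: float  # hypothetical: classifier confidence in that label


def top_k_summary(instances, target="POS", k=5):
    """Return the k most confident `target` instances, kept in document order."""
    indexed = [(i, inst) for i, inst in enumerate(instances) if inst.label == target]
    best = sorted(indexed, key=lambda pair: pair[1].confidence, reverse=True)[:k]
    # Restore original order so the summary reads like the source text.
    return [inst.text for _, inst in sorted(best, key=lambda pair: pair[0])]


def class_shares(instances):
    """Fraction of instances per label: the numbers a pie chart would display."""
    counts = Counter(inst.label for inst in instances)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Keeping the selected instances in their original order is a design choice for readability; sorting the output by confidence instead would be equally defensible.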

  4. SIDE • Download from http://www.cs.cmu.edu/~cprose/SIDE.html • Download the code and the documentation • Note they are not in the same file! • Video lecture: • http://fatman-vm.isri.cmu.edu/CourseCast/Viewer/Default.aspx?id=add722f7-b566-479a-a564-495f56474747 • Email elijah@cmu.edu if you have trouble • Data file is at http://www.cs.cmu.edu/~cprose/AnnotatedData.csv

  5. SIDE

  6. Goals • SIDE is a development environment for building summarization systems • Design inspired by Teufel and Moens • Architecture allows you to add your own plugins: • Feature extractors • Classification algorithms • Ranking metrics • Presentation algorithms • Visualizations
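To make the plugin idea concrete, here is a small Python sketch of a summarizer assembled from interchangeable stages. It does not reproduce SIDE's actual interfaces: the type aliases, `build_summarizer`, and the toy plugins are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical plugin signatures, one per stage named on the slide.
FeatureExtractor = Callable[[str], Dict[str, float]]
Classifier = Callable[[Dict[str, float]], str]
Ranker = Callable[[str, Dict[str, float]], float]
Presenter = Callable[[List[str]], str]


def build_summarizer(extract: FeatureExtractor, classify: Classifier,
                     rank: Ranker, present: Presenter,
                     keep_label: str = "POS", k: int = 5):
    """Wire four plugins into one summarize(sentences) function."""
    def summarize(sentences: List[str]) -> str:
        scored = []
        for sentence in sentences:
            feats = extract(sentence)
            if classify(feats) == keep_label:
                scored.append((rank(sentence, feats), sentence))
        top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
        return present([sentence for _, sentence in top])
    return summarize


# Toy plugins, for illustration only.
def toy_extract(sentence):
    return {"length": float(len(sentence.split())),
            "has_digit": float(any(ch.isdigit() for ch in sentence))}

def toy_classify(feats):
    return "POS" if feats["has_digit"] else "NEG"

def toy_rank(sentence, feats):
    return feats["length"]

def toy_present(items):
    return "\n".join("- " + item for item in items)
```

Because each stage is just a function with a fixed signature, swapping in a different ranking metric or presentation style means passing a different argument to `build_summarizer`, which is the kind of extensibility the slide describes.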

  7. Important note!! • Documentation gives instructions for using the error analysis interface • You’re not required to use it, but it would be helpful!!

  8. Summarizing Web Search or Summarization in the Process of Web Search

  9. What is the problem we are trying to solve here?

  10. Summarizing Web Search

  11. Data Set Description • 2,000 users, each assigned 1 of 10 search tasks • We have a log of all of their click behavior • URLs for every page they looked at • A write-up summary of what they found • 20 “gold standard” users • 3 different people did each of the 10 tasks • Gold standard click behavior • Gold standard write-ups • Recently annotated pages as relevant or not relevant – ask Naman Gupta for this (nkgupta@andrew.cmu.edu)

  12. Information Foraging • Note that you can find a more extensive journal article linked to Peter’s homepage! • Also a psychological approach • Examines both the landscape (the WWW) and a cognitive model of human foraging behavior • Lab studies with tasks that are like real information seeking tasks • Based on “critical incident analysis”

  13. “The results illustrate how the structure of the Web environment and the goals and heuristics of human information foragers mutually shape foraging behavior.” • But what accounts for the Web’s structure? • What are “decentralized social evolutionary processes?”

  14. Web Organization • A hierarchy of patches • Link structure mirrors directory structure as well as similarity structure • Probably also mirrors organizational structure • Also reflects the role in scientific discourse • Hubs have many outbound links (e.g., review articles) • Authorities have many inbound links (e.g., important, seminal work) • Related to “page rank” measure used by search engines
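The hub and authority distinction can be computed from link structure alone. The slide does not name an algorithm, so the sketch below uses Kleinberg's HITS as one standard formulation; the toy graph and page names are made up for illustration.

```python
def hits(links, iterations=50):
    """Compute hub and authority scores.

    links: dict mapping each page to the list of pages it links to.
    Returns (hub, authority), two dicts of L2-normalized scores.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for target in targets:
                auth[target] += hub[src]
        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores stay bounded across iterations.
        for scores in (auth, hub):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical graph: a review article citing three papers, a blog citing one.
toy_links = {
    "review": ["paperA", "paperB", "paperC"],
    "blog": ["paperA"],
}
hub_scores, auth_scores = hits(toy_links)
```

On this toy graph the review article comes out as the strongest hub and `paperA`, with two inbound links, as the strongest authority, matching the examples on the slide.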

  15. Critical Incident Technique ** How do you think this approach might bias the results?

  16. What are people doing on the web?

  17. What do people want to know about?

  18. Surprise? • “What is surprising about these results is that the Web is mainly aimed at helping users find specific pieces of information (e.g., through search engines), and this suggests a latent demand for tools to support these broader sense-making activities.” ** Do you agree with this statement? Why or why not?

  19. Information Foraging Behavior

  20. Student Quote • Exposure to extremely similar information is something that may be indicative of convergence on an optimal solution for comparative searching, but is likely something that will create frustration or falsely signal the terminus of useful information availability on the topic.

  21. Is there such a thing as a successful search strategy in the abstract? • The point I'm trying to make is that search activities #1 and #2 are very different, and the strategies applied to one are likely to generate unsatisfactory outcomes in the other.

  22. Student Quote • The www is indeed structural in the sense as discussed in the chapter. It has 'information patches' and 'hubs' and 'authorities' which are exploited by search engines to refine their search results. This structure of the web can help in summarization tasks since foraging is one of the most important aspects of summarization. The more relevant results one can get, the more relevant will be the summarization.

  23. Student Quote • The correlation between the link structure and the topical similarity of the web pages has also been discussed, which is quite fascinating.

  24. Student Quote • The results from information scent … I think that's an actual problem with user face in using search engines where a person just drills down into a website and is presented with information patches with low information scent. This problem can be averted by giving the user a glimpse of information patches (summary snippet) along with the main page result which is presented to the user. This would help alleviate the problem of going through all the low information bearing pages in the same website.

  25. Questions?
