Summarization and Personal Information Management
Explore the field of summarization and learn how to effectively manage personal information overload. Topics include text mining, text categorization, topic modeling, and sentiment analysis.
Presentation Transcript
Summarization and Personal Information Management • Carolyn Penstein Rosé • Language Technologies Institute / Human-Computer Interaction Institute
What’s New about This Course? • Information Science and HCI • Understand what problem we are trying to solve • Linguistics • Rhetorical style and strategies • Integrating Multiple Language Technologies • Summarization • Text mining • Text categorization • Topic modeling • Sentiment analysis
Overwhelmed with data… Innovation to the rescue! www.powerfulinformation.org
Who we are and how to find us… • Carolyn Penstein Rosé, Course Instructor, GHC 4515, cprose@cs.cmu.edu • Mahesh Joshi, Teaching Assistant, GHC 5517, maheshj@cs.cmu.edu
What is this course about?
Information Overload … more harmful than smoking marijuana?
Personal connection… Now 86,703!!!
Industry Relevance
Summarization and my research
Zooming in and Out of Text • Using statistical text compression models • Exploring the use of syntactic dependency features to increase robustness at severe levels of compression • In-depth analysis of how humans compress text
Analysis of What Happened in a Conversation • Conversational Roles • Positioning in Negotiations • Exchanging social support • Socialization processes • Knowledge integration processes
Processing conversational data [Figure: axes labeled Topic and Time]
Supporting Project Course Instructors (Rosé et al., 2007; Gweon et al., In Press) • Interviews with 9 project course instructors • 3 Important types of Assessment Categories • Group processes most important
Course Objectives • Explore summarization from a needs-focused perspective • Broaden the definition of summarization • Let’s revolutionize the field! • Explore a variety of analytical and technical approaches • Learn from your fellow students in addition to learning from your instructor • Gain practical experience while doing a cool project!
Course Requirements • Reading Assignments + Postings • 1 in-class paper presentation • 3 (short!) Homework Assignments • Summary Design • Rhetorical analysis • SIDE exercise • Term Project (Poster and 4-page Report) • Final Exam (critique)
Term Project: Grand Challenges • Multi-document summarization of scientific literature • Summarizing Web searches • Text Compression and Summarization for handhelds • Summarizing Social Interactions
SIDE: Summarization Integrated Development Environment • Annotate Data • Define Summaries • Train automatic annotators • Visualize Annotated Data • SIDE facilitates rapid prototyping of summarization systems
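The annotate, train, and define-summary steps listed for SIDE can be illustrated with a small standalone sketch. This is not SIDE's code or API: the Naive Bayes classifier, the function names, the labels ("planning", "social"), and the example sentences are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()

def train_annotator(labeled_segments):
    """Train a multinomial Naive Bayes model from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_segments:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def annotate(model, text):
    """Return the most probable label for a segment (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def summarize(model, segments, keep_label):
    """A 'summary definition': keep only segments annotated with keep_label."""
    return [s for s in segments if annotate(model, s) == keep_label]

# Step 1: hand-annotated data (hypothetical sentences and labels).
training = [
    ("we should split the work into two parts", "planning"),
    ("let us plan the schedule for next week", "planning"),
    ("i will plan the next meeting", "planning"),
    ("great job everyone", "social"),
    ("thanks for the help", "social"),
    ("nice work today", "social"),
]
# Step 2: train an automatic annotator.
model = train_annotator(training)
# Step 3: define the summary as "all planning segments" and apply it.
conversation = [
    "thanks everyone great work",
    "we should plan the work for next week",
]
summary = summarize(model, conversation, "planning")
print(summary)
```

The point of the sketch is the separation of concerns: the annotation scheme, the trained annotator, and the rule that selects what goes into the summary are independent pieces, so each can be swapped without touching the others.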
SIDE • Download: www.cs.cmu.edu/~cprose/SIDE.html • Documentation: www.cs.cmu.edu/~emayfiel/SIDE-documentation.pdf • If you need help: elijah@cmu.edu • We’re adding support for topic modeling and text compression
Data set from Rajiv Gandhi University for Knowledge Technologies • Student population: the highest-achieving kid in each rural village • All computer-based instruction: every kid is given a laptop • All English-medium instruction • 2,000 students • Mostly grew up with Telugu-medium instruction • 10 different search tasks
Investigating how low English literacy affects information seeking • Students were given a search task in English • We collected logs of their search behavior and the result of their search • Analysis • We examined the results of their search • We modeled their search strategy • We looked for connections between these two things Search Task: Imagine that you have uncle in Pittsburgh who recently went to a dentist and was diagnosed with an abscess in his tooth. He had to undergo a painful treatment for the infection. You have to search for the necessary information on the Internet, in order to prevent your uncle from having a recurrence of the abscess or any other tooth disease in general.
Example • Student didn’t understand “uncle” • Searches for “uncle” and finds a page about “calling uncle” • Concludes that “uncle” means “child” • Finds a page about infant tooth problems • Thinks she has found the correct information
Problems we can address… • Evidence that students ignore portions of tasks that they don’t understand • Students frequently found information about abscesses but not about prevention • Trouble with query formation • Example query: “MY UNCLE HAS A TOOTH PROBLEM WT CAN I DO FOR HIM” • Evidence that students don’t know how to “recover” from an unsuccessful attempt • Repeated queries • Queries with minimal modifications
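The last two symptoms, repeated queries and queries with minimal modifications, are the kind of pattern that can be flagged automatically in a search log. The sketch below is one possible way to do it, not the analysis used in the study: the function names, the 0.8 similarity threshold, and all queries except the first are hypothetical.

```python
from difflib import SequenceMatcher

def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(query.lower().split())

def flag_unproductive_queries(queries, near_threshold=0.8):
    """Label each query relative to the one issued just before it.

    Returns (query, label) pairs where label is "first", "repeated",
    "minimal_modification", or "new". The threshold is an arbitrary
    illustrative value, not one taken from the study.
    """
    labeled = []
    previous = None
    for query in queries:
        current = normalize(query)
        if previous is None:
            label = "first"
        elif current == previous:
            label = "repeated"
        elif SequenceMatcher(None, previous, current).ratio() >= near_threshold:
            label = "minimal_modification"
        else:
            label = "new"
        labeled.append((query, label))
        previous = current
    return labeled

# Hypothetical session; only the first query appears in the slides.
session = [
    "MY UNCLE HAS A TOOTH PROBLEM WT CAN I DO FOR HIM",
    "my uncle has a tooth problem wt can i do for him",
    "my uncle has a tooth problem what can i do for him",
    "how to prevent tooth abscess",
]
labels = flag_unproductive_queries(session)
for query, label in labels:
    print(f"{label:22s} {query}")
```

Comparing each query only to its immediate predecessor keeps the sketch simple; a fuller model of search strategy would probably also look at longer-range repetition and at which result pages were visited in between.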
Grading • Class participation (10%) • Homework Assignments (10% each) • Class paper presentation (10%) • Term project (40%) • Final exam (10%)
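As a quick check of the arithmetic, the weights add up to 100% when the three homework assignments listed under Course Requirements are counted at 10% each. A minimal sketch, assuming every component is scored on a 0 to 100 scale (the component names and the example scores are hypothetical):

```python
# Weights from the Grading slide; three homeworks at 10% each.
WEIGHTS = {
    "class_participation": 0.10,
    "homework_1": 0.10,
    "homework_2": 0.10,
    "homework_3": 0.10,
    "paper_presentation": 0.10,
    "term_project": 0.40,
    "final_exam": 0.10,
}

def final_grade(scores):
    """Weighted average of component scores, each assumed to be in 0..100."""
    total_weight = sum(WEIGHTS.values())
    # Guard against a weight table that does not sum to 100%.
    assert abs(total_weight - 1.0) < 1e-9
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical student: strong project, weaker final exam.
example_scores = {
    "class_participation": 100,
    "homework_1": 80,
    "homework_2": 90,
    "homework_3": 100,
    "paper_presentation": 90,
    "term_project": 95,
    "final_exam": 70,
}
print(round(final_grade(example_scores), 2))
```

With the term project at 40%, it carries as much weight as the paper presentation and all three homework assignments combined.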
For next time! • On Drupal: Read one of the two papers below and post to the discussion board • Kim, K., Lustria, M., & Burke, D. (2007). Predictors of cancer information overload: Findings from a national survey. Information Research, 12(4). • Janssen, R., & de Poot, H. (2006). Information overload: Why some people seem to suffer more than others. Proceedings of NordiCHI.