
Summarization and Personal Information Management

Carolyn Penstein Rosé, Language Technologies Institute / Human-Computer Interaction Institute


Presentation Transcript


  1. Summarization and Personal Information Management Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

  2. Announcements • Questions? • Homework 3 will be assigned in 1 week • Until then: • Finalize your group and your plans • Collect readings and resources • Plan for Today • Hidden Markov Modeling • Jing, 2002 paper

  3. Getting into Technology • [Diagram labels: Problem, Human Behavior, Solution, Design, Technology, Component Technology] • Today’s focus: Hidden Markov Models, a tool for understanding the process of generating summaries • Problem?

  4. Hidden Markov Modeling

  5. Hidden Markov Modeling • Different from typical Markov models because the states are not directly observable • From one sequence of observations, more than one sequence of states is possible • Viterbi search is used at decoding time to identify the most likely sequence of states
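
The decoding step on this slide can be made concrete with a small sketch. The function below is a generic textbook-style Viterbi search, not code from the lecture or from Jing (2002); the state names, observation names, and all probabilities in the usage example are made-up assumptions for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_prob) for the most likely hidden state sequence.

    start_p[s], trans_p[r][s], and emit_p[s][o] are plain probability tables.
    """
    def log(p):
        # Work in log space to avoid underflow; log(0) is treated as -infinity.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any state path ending in s at time t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: the predecessor state on that best path.
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Pick the best final state, then follow the back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    # Hypothetical two-state model; every number here is an assumption.
    states = ["x1", "x2"]
    start_p = {"x1": 0.6, "x2": 0.4}
    trans_p = {"x1": {"x1": 0.7, "x2": 0.3}, "x2": {"x1": 0.4, "x2": 0.6}}
    emit_p = {"x1": {"y1": 0.9, "y2": 0.1}, "x2": {"y1": 0.2, "y2": 0.8}}
    print(viterbi(["y1", "y1", "y2"], states, start_p, trans_p, emit_p))
```

Several state sequences could have produced the observations y1 y1 y2 under this toy model; the search returns only the single most probable one, here x1 x1 x2.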

  6. Hidden Markov Modeling • Pattern • y1 y1 y1 y3 y4 • State Sequences • x1 x2 x1 x2 x1 • x1 x2 x1 x2 x3

  7. Resources in Wikipedia

  8. Jing, 2002

  9. Simplistic Summarization • Select a subset of sentences from the source document or documents • Present them in the same order in which they appeared in the source
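
As a rough illustration of the two steps on this slide, the sketch below selects k sentences and returns them in source order. The slide does not say how sentences are selected, so the word-frequency score, the naive sentence splitter, and the names `split_sentences` and `extract_summary` are all assumptions made for this example.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphabetic tokens; apostrophes are kept inside words.
    return re.findall(r"[a-z']+", sentence.lower())


def extract_summary(text, k=2):
    """Pick the k highest-scoring sentences and return them in source order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Assumed heuristic: average document frequency of the sentence's words.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore the original document order
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    doc = "Cats sleep. Dogs bark loudly. Cats and dogs sleep. Birds sing."
    print(extract_summary(doc, k=2))
```

Any other selection criterion could replace `score`; the point is only that the output is an unmodified subset of the source sentences in their original order.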

  10. Less Simplistic Summarization • Select a subset of sentences from the source document or documents • Paraphrase those sentences • Present them in the same order in which they appeared in the source, or in a different order

  11. Advantages of Solving the Decomposition Problem • Gain insight into desirable generation techniques for summarization • They could have provided more analysis to this end • Automatically produce training data for extraction-based summarization approaches

  12. Paraphrase Operations • Sentence reduction • Sentence combination • Syntactic transformation • Lexical paraphrasing • Generalization or specification

  13. Student Quote • They say "based on careful analysis of human-written summaries", which suggests that they sat in a room by themselves reading summaries and original texts, trying to figure out what human summarizers do. Why didn't they just go out and talk to some real people?

  14. If someone asked you how you generate a summary, what would you tell them?

  15. Paraphrase Operations • Sentence reduction • Sentence combination • Syntactic transformation • Lexical paraphrasing • Generalization or specification • Can you think of another way of paraphrasing that is not mentioned here?

  16. Sentence Reduction • Non-essential phrases are removed • What counts as non-essential?

  17. Sentence Combination • Merge sentences, typically after reducing both • How you merge depends on overlap between sentences • When is it advantageous to merge?

  18. Syntactic Transformation • Changing the syntactic structure • Which syntactic transformations are allowed? • Do these two sentences mean the same thing?

  19. Lexical Paraphrasing • Replacing a phrase with something that means the same thing • “hits the nail on the head” versus “fit squarely into” • What counts as a lexical paraphrase?

  20. Generalization or Specification • Similar to lexical paraphrasing

  21. Problem Formulation • Identify the most likely position in the document (if any) of each summary word • Then apply the decomposition operations
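
The first step of this formulation can be sketched as a search over candidate document positions for each summary word. The code below is a simplified illustration, not the model from the paper: the transition weights in `transition_weight` are invented values that merely prefer adjacent words and words from the same sentence, and the function names are hypothetical.

```python
import math


def transition_weight(p, q):
    """Assumed heuristic weights (not taken from Jing, 2002)."""
    if p is None or q is None:
        return 0.1  # at least one word has no source position in the document
    (s1, w1), (s2, w2) = p, q
    if s1 == s2 and w2 == w1 + 1:
        return 1.0  # next word in the same sentence
    if s1 == s2 and w2 > w1:
        return 0.8  # later in the same sentence
    if s1 == s2:
        return 0.6  # earlier in the same sentence
    return 0.3      # a different sentence


def align(summary_words, doc_sentences):
    """Map each summary word to a (sentence, word) position, or None."""
    index = {}
    for si, sentence in enumerate(doc_sentences):
        for wi, word in enumerate(sentence):
            index.setdefault(word.lower(), []).append((si, wi))
    # Words absent from the document get the single candidate None.
    candidates = [index.get(w.lower(), [None]) for w in summary_words]
    if not candidates:
        return []

    best = [{c: 0.0 for c in candidates[0]}]  # log-scores of partial paths
    back = [{}]
    for t in range(1, len(candidates)):
        best.append({})
        back.append({})
        for q in candidates[t]:
            prev = max(
                candidates[t - 1],
                key=lambda p: best[t - 1][p] + math.log(transition_weight(p, q)),
            )
            best[t][q] = best[t - 1][prev] + math.log(transition_weight(prev, q))
            back[t][q] = prev

    last = max(candidates[-1], key=lambda c: best[-1][c])
    path = [last]
    for t in range(len(candidates) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    doc = [["the", "committee", "approved", "the", "budget"],
           ["the", "mayor", "praised", "the", "committee"]]
    print(align(["the", "committee", "approved", "budget"], doc))
```

In this toy document, "the" and "committee" each occur in both sentences; the preference for adjacent positions resolves both to the first sentence, which is the kind of disambiguation the position search is meant to provide.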

  22. Example

  23. Evaluations • Alignment • How accurately can this approach align summary sentences with document sentences? • Only tests the HMM • Decomposition • Humans judged whether decomposition was correct • Only tests decomposition operators • Portability evaluation – test of generality

  24. Alignment • Used 10 documents paired with human-written summaries • Other humans looked at the pairs and matched summary sentences to document sentences • Precision, Recall, and F-measure can be computed by comparing these extracts with the automatic ones • Error analysis: problems with creative rewordings or when irrelevant sentences contain summary words
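
For reference, the three measures named on this slide can be computed as below. The sketch assumes each alignment is represented as a set of (summary sentence index, document sentence index) pairs; that representation and the sample pairs are assumptions for illustration, not data from the paper.

```python
def precision_recall_f(predicted, gold):
    """Compare automatic alignments against human alignments.

    Both arguments are iterables of hashable items, for example
    (summary_sentence_index, document_sentence_index) pairs.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical alignments, invented for this example.
    automatic = {(0, 1), (1, 3), (2, 4)}
    human = {(0, 1), (1, 2), (2, 4), (3, 5)}
    print(precision_recall_f(automatic, human))
```

Precision is the fraction of automatic pairs that the human annotators also chose; recall is the fraction of human pairs that the program recovered.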

  25. Alignment

  26. Decomposition • 50 summaries from telecommunications corpus • Ran decomposition program • 93.8% of sentences were correctly decomposed • Seems like a weak definition of correct decomposition • Correct pairing between sentences • Correctly identified where phrases came from

  27. Portability • Test on a new type of data • Performed well

  28. But what did we learn about how humans generate summaries? • Analyzed 300 human-written summaries • 19% of summary sentences did not have a matching sentence • 42% matched a single sentence • Often along with sentence reduction • 36% were created by combining 2 or 3 sentences • 3% created by combining more than that

  29. What would be interesting next steps?

  30. Questions?
