
Summarizing documents based on cue-phrases and references

Summarizing documents based on cue-phrases and references. Goal: coherent, focused summaries. What is a focused summary? It reveals in short what the document says about the key entity (the focus), within the context of the whole document. Why focused summaries?

kass

Presentation Transcript


  1. Summarizing documents based on cue-phrases and references

  2. Goal: coherent, focused summaries
      What is a focused summary? It reveals in short what the document says about the key entity (the focus), within the context of the whole document.
      Why focused summaries? For example, when searching the web for an entity:
      • avoid browsing a tremendous list of links to documents mentioning that entity (as given by a normal search engine)
      • read abstracts that mention the searched entity
      • if it is of minor importance in a document, the searched entity will not appear in a normal abstract

  3. The idea
      (Diagram: cue-phrases and anaphoric references lead to a discourse structure, from which Veins Theory (VT) yields the summary.)

  4. The proposed method (1) Preparatory phases: • POS-tagging • Syntactic tagging done by an FDG parser • NP-tagging

  5. The proposed method (2) Step 1: segmentation into elementary discourse units (edu-s)
      The example text, with its edu-s numbered 1 to 9:
      [Maria went alone to the market]1 because [Simon had to stay at home with the baby.]2 [Simon is a good friend of mine]3 and [he also helped me in a number of situations.]4 For instance [he was very helpful]5 when [I had the problem with the car.]6 [I think she has a lot of trust in him to let him alone with the child.]7 [You know how Maria is:]8 [she is not very hurried to give credit to anybody.]9

  6. The proposed method (2) Step 2: building up sentence-level discourse trees (sdt-s)
      Maria went alone to the market because Simon had to stay at home with the baby. Simon is a good friend of mine and he also helped me in a number of situations. For instance he was very helpful when I had the problem with the car. I think she has a lot of trust in him to let him alone with the child. You know how Maria is: she is not very hurried to give credit to anybody.
      (Diagram: edu-s 1 to 9, with the markers "for instance" and "when" highlighted over edu-s 5 and 6.)

  7. The proposed method (2) Step 3: anaphora resolution
      Maria went alone to the market because Simon had to stay at home with the baby. Simon is a good friend of mine and he also helped me in a number of situations. For instance he was very helpful when I had the problem with the car. I think she has a lot of trust in him to let him alone with the child. You know how Maria is: she is not very hurried to give credit to anybody.

  8. The proposed method (2) Step 4: integration of sdt-s in a global structure
      (Diagram labels: pdt(i-1), sdt(i), pdt(i), and a foot node marked *.)

  9. The proposed method (2) Step 5: generating the summary

  10. Step 1: Text segmentation method
      • Identification of finite verbs
      • Extraction of the FDG sub-tree rooted in each finite verb
      • (If the FDG tagging is correct, every sub-tree represents a clause)
      • Grouping clauses, if necessary, into discourse units
      Maria went alone to the market because Simon had to stay at home with the baby. Simon is a good friend of mine and he also helped me in a number of situations. For instance he was very helpful when I had the problem with the car. I think she has a lot of trust in him to let him alone with the child. You know how Maria is: she is not very hurried to give credit to anybody.
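The segmentation steps on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Token` class, the hand-built parse, and the choice to cut a nested finite-verb sub-tree out of its parent clause are hypothetical and are not taken from the slides or from any FDG parser's real output format.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    head: int             # index of the governing token; -1 marks the root
    finite: bool = False  # True for finite verbs (assumed to come from the parser)


def clause_root(tokens, i):
    """Follow head links upward from token i to the nearest finite verb."""
    while i != -1 and not tokens[i].finite:
        i = tokens[i].head
    return i


def segment(tokens):
    """Group tokens by the finite verb that governs them, in text order."""
    clauses = {}
    for i, tok in enumerate(tokens):
        clauses.setdefault(clause_root(tokens, i), []).append(tok.form)
    return [" ".join(words) for _, words in sorted(clauses.items())]


# Hypothetical, hand-built parse of a shortened version of the first sentence.
SENTENCE = [
    Token("Maria", 1), Token("went", -1, finite=True), Token("alone", 1),
    Token("because", 5), Token("Simon", 5),
    Token("stayed", 1, finite=True), Token("home", 5),
]

print(segment(SENTENCE))
```

In this toy version the cue word "because" stays attached to the second clause; the slides treat markers separately from edu-s, so a fuller version would split them off.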

  11. Step 2: Inference of the sdt-s (1)
      (Diagram: cue words or phrases (markers) and inter-edu local dependencies lead to sentence-level discourse trees, whose inner nodes are labeled with markers and whose terminal nodes are labeled with edu-s.)

  12. Step 2: Inference of the sdt-s (2)
      Cue-phrases usually suggest patterns of displacement of the connected arguments, nuclearity included.
      Example: John is determined to pass the NLP exam so, because he has missed many courses and was only vaguely implicated at the working sessions, he will have a hard time until summer.
      Schematically: 1 so, because 2 and 3, 4
      Ambiguities: [1] so [2,3,4]; because [2,3,4], [2,3,4]; [1,2] and [3,4]
      Inferring the sdt = finding the proper arguments and nuclearities.
      (Diagram: edu-s 1, 2, 3, 4a, 4b connected by the markers "so", "because", "," and "and".)

  13. Step 2: Inference of the sdt-s (3)
      Consistency constraints for elementary sdt-s: the “nesting-arguments” rule.
      If an edu x ∈ sub-tree(marker_i) ∩ sub-tree(marker_j) with i ≠ j, then one marker is in the other one’s sub-tree.
      This rule states that the tree cannot have two inner nodes that cover crossing text spans on the terminal frontier.
      (Diagram labels: marker_i … marker_j.)
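The no-crossing-spans reading of the rule can be checked mechanically. The sketch below assumes each marker's sub-tree is summarized by the (first, last) edu it covers; that representation, the function names, and the two sample span assignments are illustrative assumptions, not part of the slides.

```python
from itertools import combinations


def crossing(a, b):
    """True if spans a and b overlap but neither contains the other."""
    (a1, a2), (b1, b2) = a, b
    overlap = a1 <= b2 and b1 <= a2
    nested = (a1 <= b1 and b2 <= a2) or (b1 <= a1 and a2 <= b2)
    return overlap and not nested


def obeys_nesting_rule(spans):
    """spans maps each marker to the (first, last) edu its sub-tree covers."""
    return not any(crossing(a, b) for a, b in combinations(spans.values(), 2))


# Hypothetical span assignments for markers over four edu-s.
nested_reading = {"so": (1, 4), "because": (2, 4), "and": (2, 3)}
crossing_reading = {"so": (1, 3), "and": (2, 4)}

print(obeys_nesting_rule(nested_reading))
print(obeys_nesting_rule(crossing_reading))
```

A candidate sdt whose marker spans fail this check would be discarded before nuclearities are considered.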

  14. Step 3: Anaphora resolution. The AR-engine
      AR-engine is a general framework for anaphora resolution, able to accommodate different AR-models.
      (Diagram: text goes into the AR-engine, which is configured with AR-model1, AR-model2 or AR-model3 and outputs anaphoric links.)
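The separation between a fixed engine and interchangeable models might look roughly like the sketch below. Everything here is a hypothetical illustration: the slides do not give the AR-engine's interface, so `DiscourseEntity`, `ARModel`, `run_engine`, and the toy string-matching model are assumed names and behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DiscourseEntity:
    mentions: List[str] = field(default_factory=list)


# Assumption: a "model" is a function that picks an antecedent entity for a
# referring expression, or returns None to signal that a new entity is needed.
ARModel = Callable[[str, List[DiscourseEntity]], Optional[DiscourseEntity]]


def same_string_model(re_text, entities):
    """Toy model: link an expression to the latest entity with the same string."""
    for de in reversed(entities):
        if re_text.lower() in (m.lower() for m in de.mentions):
            return de
    return None


def run_engine(expressions, model):
    """Process referring expressions in text order with the given model."""
    entities = []
    for re_text in expressions:
        de = model(re_text, entities)
        if de is None:              # no antecedent found: open a new entity
            de = DiscourseEntity()
            entities.append(de)
        de.mentions.append(re_text)
    return entities


chains = run_engine(["Maria", "Simon", "maria"], same_string_model)
print([de.mentions for de in chains])
```

Swapping in a different function for `same_string_model` changes the resolution behavior without touching the engine loop, which is the design point the slide makes.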

  15. The three-layered anaphora resolution process
      • text layer: reference expressions (RE), e.g. REa, REb, REc, REd, REx
      • projection layer: projected structures (PS), e.g. PSx
      • semantic layer: discourse entities (DE), e.g. DE1, DEj, DEm

  16. What is an AR-model?
      (Diagram: the same three layers, text (REa, REb, REc, REd, REx), projection (PSx) and semantic (DE1, DEj, DEm), annotated with the components of an AR-model: knowledge sources, primary attributes, heuristics/rules, and the domain of referential accessibility.)

  17. Types of anaphora resolved • Common nouns referring to proper nouns • Common nouns with different lemmas • Pronominal references

  18. Step 4: Compiling the final discourse structure (1)
      A discourse structure tree must be derived by combining the sdt-s.
      (Diagram: the five sdt-s, with the markers "because", "and", "for instance", "when" and ":", over edu-s 1 to 9; each edu is annotated with the entities it refers to: a = Maria, b = Simon, c = the child, d = I, empty = any other REs.)

  19. Step 4: Compiling the final discourse structure (2)
      pdt1 (= sdt1) + sdt2 => pdt2
      (Diagram: sdt1, "because" over edu-s 1 and 2, combined with sdt2, "and" over edu-s 3 and 4.)

  20. Step 4: Compiling the final discourse structure (3)
      pdt2 + sdt3 => pdt3
      (Diagram: pdt2 combined with sdt3, "for instance" and "when" over edu-s 5 and 6.)

  21. Step 4: Compiling the final discourse structure (4)
      pdt3 + sdt4 => pdt4
      (Diagram: pdt3 combined with sdt4, which consists of edu 7.)

  22. Step 4: Compiling the final discourse structure (5)
      pdt4 + sdt5 => pdt5
      (Diagram: pdt4 combined with sdt5, ":" over edu-s 8 and 9.)

  23. Step 5: Generating the summary (1) Veins Theory
      • Head expression of a node: the sequence of the most important units within the corresponding span of text:
        • the head of a terminal node: its label
        • the head of a non-terminal node: the concatenation of the head expressions of the nuclear children
      • The important units are projected up to the level where the corresponding span is seen as a satellite
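The two head-expression rules translate almost directly into a recursive function. The sketch below is a minimal illustration; the `Node` class is a hypothetical representation, and the nuclearities given to the two sample sub-trees ("when" with edu 6 as satellite, ":" with two nuclei) are assumptions chosen to be consistent with the H values on the deck's example tree.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    label: Union[int, str]  # edu number for a leaf, marker for an inner node
    nuclear: bool = True
    children: List["Node"] = field(default_factory=list)


def head(node):
    """Head expression: own label for a leaf, else heads of nuclear children."""
    if not node.children:
        return [node.label]
    units = []
    for child in node.children:
        if child.nuclear:
            units.extend(head(child))
    return units


# Two small sub-trees; nuclearities are assumed for illustration.
when = Node("when", children=[Node(5), Node(6, nuclear=False)])
colon = Node(":", children=[Node(8), Node(9)])

print(head(when), head(colon))
```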

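  24. Step 5: Generating the summary (2) Computing head expressions
      (Diagram: the final discourse tree, with inner nodes "because", "and", "for instance", "when", ":" and three unlabeled ("…") nodes over edu-s 1 to 9, each node annotated with its head expression. Terminal nodes: H=1 through H=9, each edu being its own head. Inner nodes: H=1; H=2,7; H=7; H=2; H=8,9; H=3,4; H=4; H=5.)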

  25. Step 5: Generating the summary (3) Veins
      Vein expression of a node: the sequence of units that are required to understand the span of text covered by the node, in the context of the whole discourse.
      To understand a piece of text in the context of the whole discourse, one needs the significant units within the span together with other surrounding units.

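  26. Step 5: Generating the summary (4) Computing vein expressions
      Vein expressions are computed top-down, starting with the root (the vein expression of the root is its head expression).
      (Diagram: a node whose parent has vein expression v gets V=v if it is a nucleus, and V=seq(h, v) if it is a satellite with head expression h.)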
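A top-down pass in the spirit of this slide can be sketched as follows. Two things are assumptions here: the tree is a reconstruction of the deck's running example (the nuclearities were read off the flattened diagram, not stated in the text), and only the two cases legible on the slide are implemented (a nucleus inherits v, a satellite gets seq(h, v), taken in text order); the full Veins Theory may distinguish further cases.

```python
def leaf(n, nuclear=True):
    return {"label": n, "nuclear": nuclear, "children": []}


def inner(marker, children, nuclear=True):
    return {"label": marker, "nuclear": nuclear, "children": children}


def head(node):
    """Set of units in the head expression of a node."""
    if not node["children"]:
        return {node["label"]}
    return set().union(*(head(c) for c in node["children"] if c["nuclear"]))


def veins(node, parent_vein=None, out=None):
    """Return {edu: sorted vein expression} for every leaf under node."""
    out = {} if out is None else out
    if parent_vein is None:
        vein = head(node)                # root: vein = head
    elif node["nuclear"]:
        vein = parent_vein               # nucleus: V = v
    else:
        vein = head(node) | parent_vein  # satellite: V = seq(h, v)
    if not node["children"]:
        out[node["label"]] = sorted(vein)
    for child in node["children"]:
        veins(child, vein, out)
    return out


# Assumed reconstruction of the final tree of the running example.
TREE = inner("because", [
    leaf(1),
    inner("...", [
        inner("...", [
            leaf(2),
            inner("and", [
                leaf(3),
                inner("for instance", [
                    leaf(4),
                    inner("when", [leaf(5), leaf(6, nuclear=False)],
                          nuclear=False),
                ]),
            ], nuclear=False),
        ]),
        inner("...", [
            leaf(7),
            inner(":", [leaf(8), leaf(9)], nuclear=False),
        ]),
    ], nuclear=False),
])

VEINS = veins(TREE)
for edu in sorted(VEINS):
    print(edu, VEINS[edu])
```

Under these assumptions the distinct vein expressions produced are the same six that appear on the deck's vein-expression diagram.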

  27. Step 5: Generating the summary (5) Vein expressions
      (Diagram: the final discourse tree over edu-s 1 to 9, each node annotated with its vein expression. Values shown: V=1 (2 nodes); V=1,2,7 (5 nodes); V=1,2,3,4,7 (4 nodes); V=1,2,7,8,9 (3 nodes); V=1,2,3,4,5,7 (2 nodes); V=1,2,3,4,5,6,7 (1 node).)

  28. The summaries
      • Maria is referred to in edu-s 1, 7, 8, 9 => summary focused on Maria: {1,2,7,8,9}
        Maria went alone to the market because Simon had to stay at home with the baby. I think she has a lot of trust in him to let him alone with the child. You know how Maria is: she is not very hurried to give credit to anybody.
      • Simon is referred to in edu-s 2, 3, 4, 5, 7 => summary focused on Simon: {1,2,3,4,5,7}
      • The child is referred to in edu-s 2, 7 => summary focused on the child: {1,2,7}
        Maria went alone to the market because Simon had to stay at home with the baby. I think she has a lot of trust in him to let him alone with the child.
      • I is referred to in edu-s 3, 4, 6, 7 => summary focused on I: {1,2,3,4,5,6,7}
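One reading that reproduces all four summaries on this slide is to take the union of the vein expressions of the edu-s that mention the focus entity. The slide does not state this rule explicitly, so treat it as an assumption; the per-edu vein table below is likewise a reading of the vein-expression diagram rather than something spelled out in the text.

```python
# Vein expression of each edu (assumed reading of the deck's diagram).
VEIN = {
    1: {1},
    2: {1, 2, 7},
    3: {1, 2, 3, 4, 7},
    4: {1, 2, 3, 4, 7},
    5: {1, 2, 3, 4, 5, 7},
    6: {1, 2, 3, 4, 5, 6, 7},
    7: {1, 2, 7},
    8: {1, 2, 7, 8, 9},
    9: {1, 2, 7, 8, 9},
}

# Edu-s in which each entity is referred to, as listed on the slide.
MENTIONS = {
    "Maria": [1, 7, 8, 9],
    "Simon": [2, 3, 4, 5, 7],
    "the child": [2, 7],
    "I": [3, 4, 6, 7],
}


def focused_summary(entity):
    """Edu-s to keep: union of the veins of the edu-s mentioning the entity."""
    units = set()
    for edu in MENTIONS[entity]:
        units |= VEIN[edu]
    return sorted(units)


for name in MENTIONS:
    print(name, focused_summary(name))
```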

  29. Results
      Segmentation step: when the input contains errors made by the FDG parser, the precision and recall of the segmentation method are around 75%. When the input is corrected (that is, when all words are properly related to one another), precision and recall are 100%.
      Anaphora resolution step: the best results showed 100% precision, with recall ranging from 70% to 100%.
      These figures should be taken with care because of the small size of the corpus we used.

  30. Conclusions
      • The proposed method is based on an earlier investigation which showed a correlation between references and vein structure (antecedents can be found along veins; 99.1% of references obey this conjecture)
      • It is a deterministic method in the sense that only one tree is obtained
      • Degrees of non-determinism show up when:
        - building sdt-s, due to different cue-phrase patterns
        - combining sdt-s into a final discourse tree

  31. Further work • Assess how far the method can be trusted overall • Improve the method of building the global structure (scores for the types of antecedents) • Transform it, by using CT, into a beam-search type of processing • Derive more sophisticated sdt integration rules by learning • Represent only vein expressions, not the entire tree

  32. Thank you!
