
Cross-lingual projection of Semantics

Cross-lingual projection of Semantics. Sebastian Pado IGK Colloquium Dec 16th 2004. Overview. Background: Role Semantics Semantic Projection Current and Future Work. Framework: Role semantics. Predicate-argument structure, Theta roles, who did what to whom. Agent. Recipient. Theme.

Presentation Transcript


  1. Cross-lingual projection of Semantics Sebastian Pado IGK Colloquium Dec 16th 2004

  2. Overview • Background: Role Semantics • Semantic Projection • Current and Future Work

  3. Framework: Role semantics • Predicate-argument structure, theta roles: who did what to whom • Example: [Peter]Agent gives [Mary]Recipient [a book]Theme • NB. No treatment of discourse relations, modality, negation, etc.

  4. Flavours of role semantics • Top-down approach: common, intuitively defined role set for all verbs • give: is Mary Recipient or Goal or Patient? • resemble: Subj vs. Obj • Bottom-up approach: Frame Semantics • Frames: conceptual representation of a situation, e.g. Statement, Giving, Transaction • Each frame is introduced by a target, e.g. say, give, buy • Roles are frame-specific

  5. Frame Semantics • An Example Frame: Giving • Targets: give, hand out, receive • Roles: Donor, Recipient, Theme • The Berkeley FrameNet Project • English Frame Lexicon • ~ 200 Frames, ~ 2,500 words (V/N/Adj) • Typically 3-6 roles per frame • Corpus of ~ 60,000 annotated instances

  6. Frame Semantics: An Example

  7. What does Role Semantics buy us? • Surface-independent representation • Solves the paraphrase problem: "Peter gives the book to Mary" / "Mary receives the book from Peter" • Flexible basis for QA, Inference etc. • Aljoscha Burchardt’s PhD • Common cross-lingual semantic representation

  8. Semantic Role Assignment • Task: Automatic tagging of roles on free text • Important for NLP applications • Linking (syntax-semantics interface) • Statistical modelling (as classification) • Frame = semantically coherent targets • Targets show linking idiosyncrasies • Give: Subj - Donor, Dobj - Theme, To-PP/Iobj - Recipient • Get: Subj - Recipient, Dobj - Theme, From-PP - Donor • Needs lots of training data
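
The linking idiosyncrasies on this slide can be illustrated with a small lookup table. This is only a sketch: the `LINKING` dictionary, the grammatical-function labels and `assign_roles` are hypothetical names chosen for illustration, and a real system would learn such mappings statistically from annotated data rather than hard-code them.

```python
# Hypothetical linking tables: grammatical function -> frame role, per target.
# The entries mirror the Give/Get example on the slide; all names are illustrative.
LINKING = {
    "give": {"subj": "Donor", "dobj": "Theme", "to-pp": "Recipient", "iobj": "Recipient"},
    "get": {"subj": "Recipient", "dobj": "Theme", "from-pp": "Donor"},
}


def assign_roles(target, arguments):
    """Map {grammatical function: filler} to {role: filler} for one target."""
    table = LINKING[target]
    return {table[gf]: filler for gf, filler in arguments.items() if gf in table}


# Two different surface realisations of the same Giving situation.
give_roles = assign_roles("give", {"subj": "Peter", "dobj": "the book", "to-pp": "Mary"})
get_roles = assign_roles("get", {"subj": "Mary", "dobj": "the book", "from-pp": "Peter"})
```

Both calls yield the same role assignment, which is the sense in which the frame representation is independent of the surface form.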

  9. Moving to another language… • SALSA: Manual creation and use of a German corpus with semantic annotation • Basis: TIGER newspaper corpus, 1.5m words • English frames (mostly) work for German • Frame concept language-independent • But: Annotation slow and error-prone • Total effort: > 10 person years • Question: Can we use the English data for German?

  10. Overview • Background: Role Semantics • Semantic Projection • Current and Future Work

  11. Central idea: Semantic Projection • Find a large, parallel bilingual corpus • E/G part of EUROPARL (25m words) • Assign semantic roles on English side • Train automatic tagger on English data • Project semantics over to German • Step 1: Find semantic equivalences via word alignment • Step 2: Project frame • Step 3: Project roles • Result: Large German annotated corpus

  12. Projection: Example • Arriving: "Peter comes home" / Arriving: "Peter kommt nach Hause" • Three assumptions to make this work
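
The three projection steps can be sketched on this example sentence pair. The code is a minimal illustration under stated assumptions, not the method used in the talk: the word alignment is written by hand rather than produced by an aligner, the role names `Theme` and `Goal` are assumed for the Arriving frame, and `project_annotation` is a hypothetical helper.

```python
def project_annotation(en_annotation, alignment):
    """Project a frame and its roles from English token indices to German ones.

    en_annotation: {"frame": str, "target": int, "roles": {role: [en indices]}}
    alignment: set of (en_index, de_index) pairs, assumed to come from a word aligner.
    """
    def aligned(en_indices):
        # Step 1: follow the alignment links from English to German tokens.
        return sorted({de for en, de in alignment if en in en_indices})

    de_target = aligned([en_annotation["target"]])
    if not de_target:
        return None  # no aligned target, so there is nothing to project
    # Step 2: carry the frame over; Step 3: carry each role span over.
    de_roles = {role: aligned(span) for role, span in en_annotation["roles"].items()}
    return {"frame": en_annotation["frame"], "target": de_target, "roles": de_roles}


en_tokens = ["Peter", "comes", "home"]
de_tokens = ["Peter", "kommt", "nach", "Hause"]
alignment = {(0, 0), (1, 1), (2, 2), (2, 3)}  # hand-written for illustration
annotation = {"frame": "Arriving", "target": 1, "roles": {"Theme": [0], "Goal": [2]}}

projected = project_annotation(annotation, alignment)
goal_words = [de_tokens[i] for i in projected["roles"]["Goal"]]
```

Here "home" is linked to both "nach" and "Hause", so the projected Goal covers the whole German prepositional phrase. With a noisy automatic alignment the projected spans would not always be this clean.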

  13. Assumption 1 • Semantic representation is parallel • Arriving: "Peter comes home" / Arriving: "Peter kommt nach Hause"

  14. Semantic (im-)parallelism • Frame definition based on realisable roles • German and English typologically similar • Mostly, same frames evoked • Aspect is problematic • Proper differences: "We finish by 12 o’clock" (Activity_finish) vs. "Wir sind um 12 Uhr fertig" (Activity_done_state) • Same aspect, lexicalised differently: "I finish by saying" vs. "Abschliessend sage ich"

  15. Assumption 2 • There is always parallel lexical material that is semantically equivalent • Arriving: "Peter comes home" / Arriving: "Peter kommt nach Hause"

  16. (Im)parallelism of lexical material • We only need semantic parallelism, only for targets and roles • Don’t care about discourse, modality, etc. • Don’t care about exact wording • Insights from translation science • Translation = Recreation of text based on content and target language norms • Frame structures ~ propositional content • Specific register • Specific domain (no cultural differences)

  17. Assumption 3 • Word Alignment provides semantic equivalence • Arriving: "Peter comes home" / Arriving: "Peter kommt nach Hause"

  18. Word Alignment as Semantic Equivalence • Current Word Alignment models use co-occurrence to determine alignment • But co-occurrence != semantic equivalence • decide: entscheiden / Entscheidung treffen • insist: bestehen darauf • Problems: Phrasal verbs, Idioms, Support Verbs (Funktionsverbgefuege), Noise proper

  19. Overview • Background: Role Semantics • Semantic Projection • Current and Future Work

  20. Current Work (1) • Empirical assessment of assumptions • Manual annotation of parallel corpus sample • Independent annotation of German / English • Evaluation of semantic parallelism • Evaluation of lexical parallelism • Evaluation of automatic word alignment

  21. Current Work (2) • Token-wise word alignment too noisy • decide - treffen: Deciding? • Instead: Find reliable type equivalences • Statistics over complete corpus, filtering • Removal of German collocations • Result: German frame lexicon • Target x can evoke frames a,b,c • Project frame only if licensed by German lexicon
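
The type-level filtering described on this slide could look roughly like the following. This is a sketch only: the thresholds `min_count` and `min_ratio`, the helper names, the toy counts and the frame label `Meet_with` are illustrative assumptions, and the removal of German collocations is not modelled.

```python
from collections import Counter, defaultdict


def build_frame_lexicon(projected_pairs, min_count=2, min_ratio=0.2):
    """Build a German frame lexicon from noisy token-level projections.

    projected_pairs: iterable of (german_lemma, frame) pairs, one per projected token.
    A frame is kept for a lemma only if it was seen often enough, both in
    absolute terms and relative to all projections for that lemma.
    """
    counts = defaultdict(Counter)
    for lemma, frame in projected_pairs:
        counts[lemma][frame] += 1
    lexicon = {}
    for lemma, frames in counts.items():
        total = sum(frames.values())
        kept = {f for f, c in frames.items() if c >= min_count and c / total >= min_ratio}
        if kept:
            lexicon[lemma] = kept
    return lexicon


def licensed(lexicon, lemma, frame):
    """Project a frame onto a German target only if the lexicon allows it."""
    return frame in lexicon.get(lemma, set())


# Toy counts: "treffen" is aligned to "decide" once (from "Entscheidung treffen").
pairs = (
    [("entscheiden", "Deciding")] * 5
    + [("treffen", "Deciding")] * 1
    + [("treffen", "Meet_with")] * 9  # illustrative frame label
)
lexicon = build_frame_lexicon(pairs)
```

With these toy counts, `licensed(lexicon, "treffen", "Deciding")` is false, so the spurious decide - treffen projection is blocked while entscheiden still evokes Deciding.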

  22. Current Work (3) • Projection of roles: Find equivalences between constituents • Define pairwise similarities • Efficiently identify best match • Graph matching • Probabilistic model • Choice points: • Definition of similarities • Bijective correspondence, yes or no? • Implementation
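
One way to read "define pairwise similarities, identify best match" is as an assignment problem between English role spans and German constituents. The sketch below is an assumption-laden illustration rather than the talk's implementation: similarity is taken to be the Jaccard overlap of word-alignment links, the correspondence is forced to be one-to-one, and the search is exhaustive instead of an efficient graph-matching algorithm.

```python
from itertools import permutations


def similarity(en_span, de_span, alignment):
    """Jaccard overlap between a German span and the German tokens aligned to an English span."""
    projected = {de for en, de in alignment if en in en_span}
    union = projected | set(de_span)
    return len(projected & set(de_span)) / len(union) if union else 0.0


def best_match(en_roles, de_constituents, alignment):
    """Exhaustively find the one-to-one assignment with the highest total similarity.

    Assumes there are at least as many German constituents as English role spans.
    """
    roles = list(en_roles)
    best, best_score = {}, -1.0
    for choice in permutations(range(len(de_constituents)), len(roles)):
        score = sum(
            similarity(en_roles[role], de_constituents[i], alignment)
            for role, i in zip(roles, choice)
        )
        if score > best_score:
            best = {role: de_constituents[i] for role, i in zip(roles, choice)}
            best_score = score
    return best, best_score


# "Peter comes home" / "Peter kommt nach Hause", with a hand-written alignment.
alignment = {(0, 0), (1, 1), (2, 2), (2, 3)}
en_roles = {"Theme": (0,), "Goal": (2,)}   # role names assumed for illustration
de_constituents = [(0,), (1,), (2, 3)]     # Peter | kommt | nach Hause

match, score = best_match(en_roles, de_constituents, alignment)
```

The exhaustive search is factorial in the number of constituents, so it only serves to make the objective explicit. Dropping the one-to-one restriction, one of the choice points listed on the slide, would change the search space rather than the similarity function.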

  23. Future Work • Thorough Evaluation • Filtering • Projection will be noisy • Training a German semantic tagger • Evaluation wrt coverage, accuracy • Combination with manually annotated data (SALSA) • Using another language • English/French part of EUROPARL

  24. Conclusion • Automatic creation of semantically annotated data for a new language • Projection of annotation from known language using a word-aligned parallel corpus • Theory in place • Potential Problems: • Semantics may diverge • Lexical material may diverge • Word Alignment noisy • Empirical evaluation underway
