Ontology-driven Predicate-Argument Structure Analysis for Event Annotation Paul Buitelaar and Anette Frank Language Technology Lab DFKI GmbH, Saarbrücken
Overview • Motivation • Background • Application: Knowledge-based QA in the SmartWeb Project • Resources: Domain Ontology and Data Set • Approach: Ontology-based Information Extraction (OBIE) • Text to Knowledge Mapping • LingInfo: A Lexicon Model for Ontologies • Ontology-based IE: Use of LingInfo in a Type-driven IE System (SProUT) • Extension of LingInfo to Predicate Argument Structure (PAS) • LingInfoPAS: Mapping Argument Structure to Ontologies • Local Argument Structures: Extending Shallow IE with PA Structure • Automatic Acquisition of LingInfo Instances and Extraction Rules • Conclusions and Future Work
Motivation • Automating Semantic Web annotation of textual data • The Semantic Web vision is only feasible with large-scale automatic (semantic) annotation (diagram elements: Web, Knowledge Markup, Ontologies, Semantic Web Services, Intelligent Man-Machine Interface)
Motivation • Improved Event Extraction through Deeper Semantic Modeling • Aim: Understanding Complex Event Sequences • Wie wurde das Führungstor für Ecuador vorbereitet? (How was Ecuador’s leading goal prepared?) Das Führungstor für Ecuador durch Lara fiel nach einer Vorlage des technisch ausgezeichneten Nicer Reasco, der einen langen und zu ungenauen Pass des Argentiniers Carlos Tévez in den gegnerischen Strafraum abfangen konnte. (The goal by Lara giving Ecuador the lead was scored after a delivery from the technically skilled Nicer Reasco, who intercepted a long and overly inaccurate cross by the Argentine player Carlos Tévez into the penalty area.)
Motivation • Improved Event Extraction through Deeper Semantic Modeling • Aim: Understanding Complex Event Sequences • How was Ecuador’s leading goal prepared? • e1: Pass: [CommittedBy Tévez] • e2: Intercept: [CommittedBy Reasco, CommittedOn Tévez] • e3: Assist: [CommittedBy Reasco] • e4: ScoreGoal: [CommittedBy Lara, Team: Ecuador] • e1 < e2 < e3 < e4 The goal_e4 by Lara giving Ecuador the lead was scored after a delivery_e3 from the skilled Nicer Reasco, who intercepted_e2 a long and inaccurate cross_e1 by the Argentine player Carlos Tévez into the penalty area.
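The event list on this slide can be read as a small data structure: typed events with role fillers plus a precedence relation. The Python sketch below is a hypothetical illustration (the `Event` class and the `preparation_of` helper are not part of SmartWeb or SProUT). It shows how such a representation could answer "How was Ecuador’s leading goal prepared?" by collecting every event that precedes the goal.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event record: identifier, ontology concept, role fillers."""
    eid: str
    concept: str
    roles: dict = field(default_factory=dict)


# Events e1..e4 as listed on the slide, stored in temporal order.
EVENTS = [
    Event("e1", "Pass", {"CommittedBy": "Tévez"}),
    Event("e2", "Intercept", {"CommittedBy": "Reasco", "CommittedOn": "Tévez"}),
    Event("e3", "Assist", {"CommittedBy": "Reasco"}),
    Event("e4", "ScoreGoal", {"CommittedBy": "Lara", "Team": "Ecuador"}),
]

# Direct precedence pairs encoding e1 < e2 < e3 < e4.
PRECEDES = [("e1", "e2"), ("e2", "e3"), ("e3", "e4")]


def preparation_of(goal_id, events, precedes):
    """Return all events that transitively precede goal_id, earliest first."""
    predecessors = {}
    for before, after in precedes:
        predecessors.setdefault(after, []).append(before)
    seen, stack = set(), [goal_id]
    while stack:
        current = stack.pop()
        for p in predecessors.get(current, []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    # `events` is already in temporal order, so filtering preserves that order.
    return [e for e in events if e.eid in seen]


for e in preparation_of("e4", EVENTS, PRECEDES):
    print(e.eid, e.concept, e.roles)
```

Keeping the ordering as explicit pairs, rather than as list positions alone, would also accommodate partially ordered events.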
Background • Application: Knowledge-based QA • Resources: Domain Ontology and Data Set • Approach: Ontology-based Information Extraction
Application: SmartWeb Project • SmartWeb Project http://www.smartweb-projekt.de/ • Large German BMBF-funded project built around the 2006 FIFA World Cup • Intelligent, mobile information services • Application scenarios for pedestrians (Deutsche Telekom), motorbikes (BMW) and cars (DaimlerChrysler) • Provides information on entities and events, e.g. • World Cup: match results, team players, scoring events, fouls, etc. • Open domain: tourist sights, navigation information, weather, etc. • Integrates • IR-based Question Answering • Accessing unstructured Web resources • Ontology-Based Question Answering • Accessing a structured knowledge base that is automatically generated from unstructured Web resources through Ontology-Based Information Extraction
Resources: Domain Ontology • SmartWeb Integrated Ontology (SWIntO) covers • Foundational (DOLCE) and general (SUMO) knowledge • Domain- and task-specific knowledge • Football (soccer) entities and events • Navigation, discourse, multimedia • Modeling of (Semantic) Web Services (Figure: excerpt of the SWIntO class hierarchy, including SmartDOLCE:Entity; SmartSUMO:Attribute, SmartSUMO:Proposition, SmartSUMO:SocialRole; SportEvent:FootballPlayer, SportEvent:FootballOrganizationPerson; SportEvent:Goalkeeper, SportEvent:FootballClubPresident)
Resources: Data Set (Figure: text, images with captions and semi-structured data, together with the knowledge base, KB)
Resources: Data Integration • Integration of heterogeneous data (tables, match reports, images with captions) via so-called 'CrossRef files'
Approach: Semi-Structured Data • Wrapping of HTML tables to SWIntO-aligned XML, followed by XML2FLogic (semantic integration):

    semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30":sportevent#LeagueFootballMatch [
      externalRepresentation@(de) ->> "Uruguay vs. Bolivien (29. Maerz 2000 19:30)";
      dolce#"HAPPENS-AT" -> semistruct#"29. Maerz 2000 19:30_interval";
      sportevent#heldIn -> semistruct#"Montevideo_Centenario_29_Maerz_2000_19_30_Stadium";
      sportevent#team1Result -> 1;
      sportevent#team2Result -> 0;
      sportevent#attendance -> 49811;
      sportevent#team1 -> semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30_Uruguay_MatchTeam";
      sportevent#team2 -> semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30_Bolivien_MatchTeam";
      (…)
    ].

    semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30_Bolivien_MatchTeam":sportevent#FootballMatchTeam [
      externalRepresentation@(de) ->> "Bolivien";
      sportevent#name -> "Bolivien";
      sportevent#lineup -> semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30_Jose_FERNANDEZ_PFP";
      sportevent#lineup -> semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30_Juan_PENA_PFP";
      sportevent#lineup -> semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30_Marco_SANDY_PFP";
      sportevent#lineup -> semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30_Vladimir_SORIA_PFP";
      sportevent#lineup -> semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30_Luis_RIBEIRO_PFP";
      sportevent#lineup -> semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30_Luis_CRISTALDO_PFP";
      (…)
    ].

    semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30_Luis_CRISTALDO_PFP":sportevent#FieldMatchFootballPlayer [
      externalRepresentation@(de) ->> "Luis CRISTALDO (8)";
      sportevent#number -> 8;
      sportevent#impersonatedBy -> semistruct#"Luis_CRISTALDO"
    ].

    semistruct#"Luis_CRISTALDO":dolce#"natural-person" [
      externalRepresentation@(de) ->> "Luis CRISTALDO";
      dolce#"HAS-DENOMINATION" -> semistruct#"Luis_CRISTALDO_NaturalPersonDenomination"
    ].

    semistruct#"Luis_CRISTALDO_NaturalPersonDenomination":dolce#"natural-person-denomination" [
      externalRepresentation@(de) ->> "Luis CRISTALDO";
      dolce#LASTNAME -> "CRISTALDO";
      dolce#FIRSTNAME -> "Luis"
    ].
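The XML2FLogic step turns SWIntO-aligned records into F-Logic frames like the ones on this slide. The Python sketch below is a hypothetical illustration of only the final rendering step. The `Ref` class and the `to_flogic` and `render_value` functions are assumptions, and the output merely mimics the frame layout shown above (`object:Class [ property -> value; ... ].`); it is not the SmartWeb converter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Reference to another object, rendered as namespace#"local-id"."""
    ns: str
    local: str

    def __str__(self):
        return f'{self.ns}#"{self.local}"'


def render_value(value):
    """Objects render as references, numbers bare, anything else as a quoted string."""
    if isinstance(value, Ref):
        return str(value)
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace('"', '\\"')
    return f'"{escaped}"'


def to_flogic(obj, cls, slots):
    """Render one frame. `slots` is a list of (property, value) pairs, so a
    multi-valued property such as sportevent#lineup can simply be repeated."""
    body = ";\n".join(f"  {prop} -> {render_value(val)}" for prop, val in slots)
    return f"{obj}:{cls} [\n{body}\n]."


match = Ref("semistruct", "Uruguay_vs_Bolivien_29_Maerz_2000_19:30")
out = to_flogic(match, "sportevent#LeagueFootballMatch", [
    ("sportevent#team1Result", 1),
    ("sportevent#team2Result", 0),
    ("sportevent#attendance", 49811),
    ("sportevent#team1",
     Ref("semistruct", "Uruguay_vs_Bolivien_29_Maerz_2000_19:30_Uruguay_MatchTeam")),
])
print(out)
```

The sketch handles only single-valued `->` slots; set-valued slots such as `externalRepresentation@(de) ->> ...` would need a separate case.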
Approach: Image Captions (Text) • Linguistic/semantic annotation (SProUT), followed by XML2FLogic (semantic integration):

    semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30" [ sportevent#matchEvents -> soba#ID25 ].
    soba#ID25:sportevent#Foul [ sportevent#committedBy -> semistruct#"Uruguay_vs_Bolivien_(…)_Luis_CRISTALDO_PFP" ].
    mediainst#ID67:media#Picture [
      media#URL -> "http://fifaworldcup.yahoo.com/06/de/photos/124155.jpg";
      media#shows -> ID25
    ].
Approach: Text Data • Linguistic/semantic annotation with SProUT, followed by XML2FLogic (semantic integration):

    semistruct#"Uruguay_vs_Bolivien_29_Maerz_2000_19:30" [ sportevent#matchEvents -> soba#ID11 ].
    soba#ID11:sportevent#Ban [ sportevent#committedBy -> semistruct#"Uruguay_vs_Bolivien_(…)_Luis_CRISTALDO_PFP" ].
Text to Knowledge Mapping • LingInfo: A Lexicon Model for Ontologies • LingInfo for Ontology-based Information Extraction
Motivations for LingInfo • Pragmatic motivation: information extraction from text • Providing a lexicon for Ontology-based Information Extraction, since most ontologies to date lack lexical information • General motivation: the semiotic triangle • Ogden & Richards, 1923 • Based on structural linguistics studies (de Saussure, 1916) • Adopted in Knowledge Representation (e.g. Sowa, 1984)
LingInfo for OBIE - I • DFKI IE system SProUT (Drozdzynski et al., 2004) • Finite-state transduction with typed feature structure unification • Rich functionality with declarative semantics • Generic rules by exploitation of inheritance • Mapping the ontology class hierarchy to a SProUT type hierarchy, e.g.:

    PlayerAction :< SportMatchAction.
    SingleFootballPlayerAction :< PlayerAction.
    FootballTeamAction :< PlayerAction.
    GoalKeeperAction :< SingleFootballPlayerAction.
    AnyPlayerAction :< SingleFootballPlayerAction.

• Type hierarchy defined in TDL, the Type Description Language (Krieger and Schäfer, 1994)
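Because SProUT rules are stated over types, a rule written for a general class such as `PlayerAction` also applies to all of its subtypes. The Python sketch below is a simplified, hypothetical stand-in for TDL type subsumption (real TDL with typed feature structure unification does far more). It parses `Sub :< Super.` declarations of the form shown on this slide and checks inheritance transitively. The helper names are illustrative only.

```python
TDL_DECLS = """
PlayerAction :< SportMatchAction.
SingleFootballPlayerAction :< PlayerAction.
FootballTeamAction :< PlayerAction.
GoalKeeperAction :< SingleFootballPlayerAction.
AnyPlayerAction :< SingleFootballPlayerAction.
"""


def parse_subtype_decls(src):
    """Map each type to its set of direct supertypes from 'Sub :< Super.' lines."""
    parents = {}
    for line in src.splitlines():
        line = line.strip()
        if not line:
            continue
        sub, sup = (part.strip() for part in line.rstrip(".").split(":<"))
        parents.setdefault(sub, set()).add(sup)
    return parents


def subsumes(general, specific, parents):
    """True if `specific` is `general` or inherits from it (transitively)."""
    stack, seen = [specific], set()
    while stack:
        t = stack.pop()
        if t == general:
            return True
        if t in seen:
            continue
        seen.add(t)
        stack.extend(parents.get(t, ()))
    return False


PARENTS = parse_subtype_decls(TDL_DECLS)
# A generic rule for PlayerAction also fires on goalkeeper actions ...
print(subsumes("PlayerAction", "GoalKeeperAction", PARENTS))        # True
# ... but not across sibling branches of the hierarchy.
print(subsumes("FootballTeamAction", "GoalKeeperAction", PARENTS))  # False
```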
LingInfo for OBIE - II • SWIntO properties are translated into TDL attributes, e.g.:

    SingleFootballPlayerAction := swinto_out & [COMMITTEDBY FootballPlayer].

• Property inheritance is mirrored in the SProUT TDL hierarchy • Multilingual terms encoded as LingInfo instances in SWIntO are compiled to SProUT TDL lexical types (German: team action, player action, goalkeeper action, banned):

    "Teamaktion" :< FootballTeamAction.
    "Spieleraktion" :< PlayerAction.
    "Torwartaktion" :< GoalkeeperAction.
    "Gesperrt" :< Banned.

• SProUT extraction patterns can then be triggered by lexical types as defined in SWIntO, and define output structures that correspond to classes and properties of the SWIntO ontology
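The compilation step can be sketched in a few lines of Python, assuming LingInfo instances have been flattened to (language, term, class) triples. Real LingInfo instances also carry morphological and syntactic information that this sketch ignores. The functions `compile_lexical_types` and `compile_property` are hypothetical; they only reproduce the shape of the TDL lines on this slide.

```python
# Hypothetical, simplified LingInfo records: (language, term, ontology class).
LINGINFO = [
    ("de", "Teamaktion", "FootballTeamAction"),
    ("de", "Spieleraktion", "PlayerAction"),
    ("de", "Torwartaktion", "GoalkeeperAction"),
    ("de", "Gesperrt", "Banned"),
]


def compile_lexical_types(entries, lang):
    """Emit one TDL-style lexical type declaration per term of the given language."""
    return [f'"{term}" :< {cls}.' for term_lang, term, cls in entries if term_lang == lang]


def compile_property(cls, prop, range_cls, base="swinto_out"):
    """Translate an ontology property into a TDL attribute on its domain class."""
    return f"{cls} := {base} & [{prop.upper()} {range_cls}]."


for line in compile_lexical_types(LINGINFO, "de"):
    print(line)
print(compile_property("SingleFootballPlayerAction", "committedBy", "FootballPlayer"))
```

Filtering by language reflects the multilingual design: the same ontology class can receive lexical types for each language in which terms are available.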
Extension of LingInfo to Predicate-Argument Structure • Extending Shallow IE with Syntactic Argument Structure • The LingInfo Model for Predicate-Argument Structure • Acquisition of LingInfo Instances and Extraction Rules
PredArg Structure in Shallow IE Systems • Strengths of shallow IE systems: robust and precise event extraction using local context clues • Token level: surface forms (für, durch, des) • Morphological level: uninflected lemma forms, case, tense features, …
Das Führungstor für Ecuador_Team durch Lara_Player fiel nach einer Vorlage des technisch ausgezeichneten Nicer Reasco_Player, der einen langen und zu ungenauen Pass des Argentiniers Carlos Tévez_Player in den gegnerischen Strafraum abfangen konnte.
(The goal by Lara giving Ecuador the lead was scored after a delivery from the skilled Nicer Reasco, who intercepted a long and inaccurate cross by the Argentine player Carlos Tévez into the penalty area.)
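To make "local context clues" concrete, here is a hypothetical Python sketch that applies regular expressions to the example sentence. It assumes the named entities are already tagged; the `<Team>`/`<Player>` markup is invented for illustration. Each pattern keys on a trigger word plus nearby function words such as für, durch and des, which is the kind of evidence a shallow system exploits. SProUT itself uses typed feature structure rules, not Python regexes.

```python
import re

# Example sentence with (hypothetically) pre-tagged named entities.
TAGGED = (
    "Das Führungstor für <Team>Ecuador</Team> durch <Player>Lara</Player> fiel "
    "nach einer Vorlage des technisch ausgezeichneten <Player>Nicer Reasco</Player>, "
    "der einen langen und zu ungenauen Pass des Argentiniers "
    "<Player>Carlos Tévez</Player> in den gegnerischen Strafraum abfangen konnte."
)

# Each pattern: trigger word + short local window of words + tagged entity.
PATTERNS = [
    ("ScoreGoal", re.compile(
        r"\w*[Tt]or für <Team>(?P<Team>[^<]+)</Team> "
        r"durch <Player>(?P<CommittedBy>[^<]+)</Player>")),
    ("Assist", re.compile(
        r"Vorlage (?:\w+ ){0,4}<Player>(?P<CommittedBy>[^<]+)</Player>")),
    ("Pass", re.compile(
        r"Pass (?:\w+ ){0,2}<Player>(?P<CommittedBy>[^<]+)</Player>")),
]


def extract_events(text):
    """Return (concept, roles) pairs for every local pattern match."""
    events = []
    for concept, pattern in PATTERNS:
        for match in pattern.finditer(text):
            events.append((concept, match.groupdict()))
    return events


for event in extract_events(TAGGED):
    print(event)
```

No local pattern of this kind recovers e2 (Intercept, CommittedOn Tévez). The trigger abfangen sits at the end of the relative clause, far from its arguments, which is what motivates adding syntactic argument structure.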
PredArg Structure in Shallow IE Systems • Limits of shallow IE systems: recognition of arguments in complex linguistic structures • How was Ecuador’s leading goal prepared? Did Reasco defend successfully? • e1: Pass: [CommittedBy Tévez] • e2: Intercept: [CommittedBy Reasco, CommittedOn Tévez] • e3: Assist: [CommittedBy Reasco] • e4: ScoreGoal: [CommittedBy Lara, Team: Ecuador] • e1 < e2 < e3 < e4
Das Führungstor für Ecuador_Team durch Lara_Player fiel nach einer Vorlage [des technisch ausgezeichneten Nicer Reasco_Player [[der] [einen langen und zu ungenauen Pass [des Argentiniers Carlos Tévez_Player] [in den gegnerischen Strafraum]]] abfangen_e2 konnte].
(The goal by Lara giving Ecuador the lead was scored after a delivery from the skilled Nicer Reasco, who intercepted a long and inaccurate cross by the Argentine player Carlos Tévez into the penalty area.)
Integrating Syntactic Argument Structure from Deep Parsing • Local argument structures vs. complex hierarchical structure • Robustness and flexibility
Dort kam Herrera an den Ball und drückte ihn per Kopf zu seinem Mitspieler mit der Nr 10 herunter. (There Herrera got to the ball and headed it down to his teammate wearing no. 10.)
Local argument structures:
• kam: MO ADV dort • SUBJ NP Herrera • MO PP an Ball
• drückte: SUBJ NP Herrera • OBJ PPER ihn • MO PP per Kopf • MO PP zu Mitspieler • MO ADV herunter
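The verb-centred structures above fit a very simple data structure. The Python sketch below is hypothetical (the `Arg` class and the `args_with` helper are not part of Heart-of-Gold or SProUT). It illustrates why a flat per-verb argument list is easy for a shallow system to consume while still encoding a dependency that spans the coordination: Herrera is the subject of both verbs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arg:
    gf: str    # grammatical function: SUBJ, OBJ, MO (modifier), ...
    cat: str   # category: NP, PP, PPER (personal pronoun), ADV, ...
    text: str  # argument words, as listed on the slide


# One flat argument list per verb, rather than a full parse tree.
LOCAL_ARGS = {
    "kam": [
        Arg("MO", "ADV", "dort"),
        Arg("SUBJ", "NP", "Herrera"),
        Arg("MO", "PP", "an Ball"),
    ],
    "drückte": [
        Arg("SUBJ", "NP", "Herrera"),
        Arg("OBJ", "PPER", "ihn"),
        Arg("MO", "PP", "per Kopf"),
        Arg("MO", "PP", "zu Mitspieler"),
        Arg("MO", "ADV", "herunter"),
    ],
}


def args_with(head, gf, table=LOCAL_ARGS):
    """Return the arguments of `head` that bear grammatical function `gf`."""
    return [a.text for a in table.get(head, []) if a.gf == gf]


# The subject of the second conjunct is available even though it is not
# adjacent to "drückte" in the sentence (shared subject in coordination).
print(args_with("drückte", "SUBJ"))
print(args_with("drückte", "MO"))
```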
Hybrid Processing in HoG • Heart-of-Gold: system architecture for integrated, hybrid processing • Callmeier et al. 2004, Schäfer 2006 (Figure components within Heart-of-Gold: text, parsing, extraction of local argument structure, OBIE with SProUT)
A Shallow Hierarchy of Syntactic Objects • SProUT: Extended Type Hierarchy (Figure: syntactic object types syn, arg, head, subj, obj, iobj, pobj_mod, act_subj, pass_subj, noun, verb, …; extended SProUT type hierarchy with top, token, morph and syn-arg. Feature structures shown: token/morph features such as CAT, INFL, STEM, SURFACE, CSTART, CEND; syn-arg with CSTART, CEND, a HEAD of type syn (CAT, LB, STEM, PASSIVE, CSTART, CEND) and an ARGS list of syn objects, each with CAT, LB (grammatical function), HEAD, CSTART and CEND.)
Extended Event Recognition • Parser input: local argument structure
• machte Endzustand perfekt ("completed the final score"): Subj "Hektor Piti Altamirano" • Obj "den Endzustand" • Mod-PP "in der 77. Minute" ("in the 77th minute") • Mod-PP "nach Kopfballvorlage von Borgetti" ("after a headed assist from Borgetti") • Mod-PP "mit einem unhaltbaren Schuß" ("with an unstoppable shot")
• Kopfballvorlage ("headed assist"): Mod-PP "von Borgetti"
Recognised events:

    s_playeraction : SPORTACTIONTYPE scoregoal
                     SPORTACTIONDESC "machte Endzustand perfekt"
                     COMMITTEDBY s_footballplayer : IMPERSONATEDBY ne-person : NAME Altamirano

    s_playeraction : SPORTACTIONTYPE assist
                     SPORTACTIONDESC "Vorlage"
                     COMMITTEDBY s_footballplayer : IMPERSONATEDBY ne-person : NAME Borgetti
Event Recognition Rules • Manually coded rules • Mapping head and arguments to ontology concepts and roles • Mapping encodes LingInfo instances for PredArg structure

    scoregoal :>
      syn_args & [HEAD syn_verb & [SYN_STEM goalscore & #descr,
                                   SYN_CSTART #cs,
                                   SYN_CEND #ce],
                  ARGS #args]
      -> s_playeraction & [SPORTACTIONDESCR #descr,
                           SPORTACTIONTYPE scoregoal,
                           COMMITTEDBY s_footballplayer &
                             [IMPERSONATEDBY ne-person & [NECSTART #start,
                                                          NECEND #end,
                                                          SURFACE #lemma]],
                           NECSTART #cs,
                           NECEND #ce],
      where #act_subj = InList(act_subj, #args),
            #start = FeatVal("SYN_CSTART", #act_subj),
            #end = FeatVal("SYN_CEND", #act_subj),
            #lemma = FeatVal("SYN_STEM", #act_subj).
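For readers less familiar with SProUT notation, the `scoregoal` rule can be paraphrased in Python. This is a hypothetical, simplified re-implementation, not SProUT. Feature structures become dictionaries and unification becomes equality tests. `in_list` plays the role of `InList`, and direct dictionary access plays the role of `FeatVal`. The set of stems standing in for the lexical type `goalscore` is an assumption, as are the illustrative character offsets.

```python
# Hypothetical stems standing in for the lexical type `goalscore` in the rule.
GOALSCORE_STEMS = {"erzielen", "einschießen", "verwandeln"}


def in_list(label, args):
    """First argument carrying grammatical-function label `label` (cf. InList)."""
    return next((a for a in args if a["LB"] == label), None)


def scoregoal_rule(syn_args):
    """Map a local argument structure to an s_playeraction, or return None."""
    head = syn_args["HEAD"]
    if head["CAT"] != "verb" or head["SYN_STEM"] not in GOALSCORE_STEMS:
        return None
    act_subj = in_list("act_subj", syn_args["ARGS"])
    if act_subj is None:
        return None
    return {
        "type": "s_playeraction",
        "SPORTACTIONDESCR": head["SYN_STEM"],        # #descr
        "SPORTACTIONTYPE": "scoregoal",
        "COMMITTEDBY": {
            "type": "s_footballplayer",
            "IMPERSONATEDBY": {
                "type": "ne-person",
                "NECSTART": act_subj["SYN_CSTART"],  # #start
                "NECEND": act_subj["SYN_CEND"],      # #end
                "SURFACE": act_subj["SYN_STEM"],     # #lemma
            },
        },
        "NECSTART": head["SYN_CSTART"],              # #cs
        "NECEND": head["SYN_CEND"],                  # #ce
    }


# Illustrative input for "Paul Kakai erzielte das erste Tor ..."
example = {
    "HEAD": {"CAT": "verb", "SYN_STEM": "erzielen", "SYN_CSTART": 11, "SYN_CEND": 19},
    "ARGS": [
        {"LB": "act_subj", "SYN_STEM": "Paul Kakai", "SYN_CSTART": 0, "SYN_CEND": 10},
        {"LB": "obj", "SYN_STEM": "Tor", "SYN_CSTART": 20, "SYN_CEND": 33},
    ],
}
print(scoregoal_rule(example))
```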
Event Recognition Rules • Manually Coded Rules • Mapping head and arguments to ontology concepts and roles • Mapping encodes LingInfo for PredArg Structure • Automation Techniques • Automatic generation of event recognition rules from LingInfo instances • Automatic acquisition of LingInfo instances using domain-specific external resources
Automated Acquisition of Predicate Argument Structure – Ontology Mappings • Acquisition from Existing (Shallow) Annotations • Generalisations using Linguistic and Ontological Structure • Learning Mapping Rules from Semi-Structured Data
Induction from Annotated Texts • Aligning existing (shallow) SProUT annotations with syntactic argument structure information • Extraction of head-to-concept and argument-to-property (role) mappings (Figure: SProUT XML output (semantic annotation) and syntactic argument structure (parser input) are aligned via character positions) • Map info: • Head-to-concept mapping • Argument-to-property mapping • Generation of • LingInfoPAS instances • OBIE extraction rules
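The alignment via character positions can be sketched as span containment. A semantic role whose span lies inside a syntactic argument's span licenses an argument-to-role mapping, and a trigger that lines up with the verb head licenses a head-to-concept mapping. The Python sketch below is hypothetical. The record layouts, the `pobj_für` label and the use of a `Team` role are assumptions for illustration, not SProUT's XML format or the parser's actual output.

```python
def span(text, substring):
    """Character span (start, end) of the first occurrence of `substring`."""
    start = text.index(substring)
    return (start, start + len(substring))


def contains(outer, inner):
    """True if span `inner` lies within span `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def align(sem, syn):
    """Derive head-to-concept and argument-to-role mappings for one sentence."""
    if not contains(syn["head_span"], sem["trigger_span"]):
        return None  # trigger and verb head do not line up: no hypothesis
    roles = {}
    for role, role_span in sem["roles"].items():
        for label, arg_span in syn["args"]:
            if contains(arg_span, role_span):
                roles[label] = role
    return {"head": (syn["head_stem"], sem["concept"]), "roles": roles}


TEXT = "Paul Kakai erzielte das erste Tor für die Salomon-Inseln"
sem = {  # hypothetical semantic annotation
    "concept": "ScoreGoal",
    "trigger_span": span(TEXT, "erzielte"),
    "roles": {"CommittedBy": span(TEXT, "Paul Kakai"),
              "Team": span(TEXT, "Salomon-Inseln")},
}
syn = {  # hypothetical local argument structure from the parser
    "head_stem": "erzielen",
    "head_span": span(TEXT, "erzielte"),
    "args": [
        ("act_subj", span(TEXT, "Paul Kakai")),
        ("obj", span(TEXT, "das erste Tor")),
        ("pobj_für", span(TEXT, "für die Salomon-Inseln")),
    ],
}
print(align(sem, syn))
```

Containment rather than exact span equality is used because a semantic annotation typically covers only the name inside a larger syntactic argument, as with "Salomon-Inseln" inside the prepositional object.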
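Induction from Annotated Texts • Paul Kakai erzielte das erste Tor für die Salomon-Inseln, doch schon bald sorgte der aufstrebende Star Veresa Toma für den Ausgleich, sodass sich beide Mannschaften zur Pause mit einem Unentschieden in die Kabinen begaben. (Paul Kakai scored the first goal for the Solomon Islands, but the up-and-coming star Veresa Toma soon equalised, so that both teams went into the dressing rooms level at half-time.)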
Generalisation • Using linguistic and ontological information to induce novel argument-role mappings
• Ontology side: sportevent_class_X (roles: CommittedBy) and sportevent_class_Y (roles: CommittedBy, CommittedOn); concepts: scoregoal (CommittedBy: player), assist (CommittedBy), foul (CommittedBy: player, CommittedOn: player), intercept (CommittedBy, CommittedOn), pass (CommittedBy, CommittedOn)
• Valency side: behindern ("obstruct"), frame subj_obj, args act_subj, obj • verwandeln ("convert, score"), frame subj, arg act_subj • abfangen ("intercept"), frame subj_obj, args act_subj, obj • abspielen ("pass"), frame subj_pobj, args act_subj, pobj_auf
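One way to read this diagram is as a heuristic. Once a verb's head-to-concept mapping is known but its argument-to-role mapping is not, copy the argument mapping from another verb that has the same valency frame and maps to a concept with the same role signature. The Python sketch below encodes that heuristic under explicit assumptions. The pairings abfangen → intercept, behindern → foul and abspielen → pass, and the known argument mapping for abfangen, are illustrative and not taken from the authors' data.

```python
# Role signatures of the ontology concepts, as shown on the slide.
ROLE_SIGNATURE = {
    "scoregoal": frozenset({"CommittedBy"}),
    "assist": frozenset({"CommittedBy"}),
    "foul": frozenset({"CommittedBy", "CommittedOn"}),
    "intercept": frozenset({"CommittedBy", "CommittedOn"}),
    "pass": frozenset({"CommittedBy", "CommittedOn"}),
}

# Assumed: a verb whose full argument-to-role mapping is already known.
KNOWN = {
    "abfangen": ("intercept", "subj_obj",
                 {"act_subj": "CommittedBy", "obj": "CommittedOn"}),
}

# Assumed: verbs with a known head-to-concept mapping but no role mapping yet.
HEAD_ONLY = {
    "behindern": ("foul", "subj_obj"),
    "abspielen": ("pass", "subj_pobj"),
}


def generalise(lemma):
    """Borrow an argument-to-role mapping from a verb with the same valency
    frame whose concept has the same role signature; None if there is none."""
    concept, frame = HEAD_ONLY[lemma]
    for known_concept, known_frame, arg_roles in KNOWN.values():
        if (known_frame == frame
                and ROLE_SIGNATURE[known_concept] == ROLE_SIGNATURE[concept]):
            return dict(arg_roles)
    return None


print(generalise("behindern"))  # {'act_subj': 'CommittedBy', 'obj': 'CommittedOn'}
print(generalise("abspielen"))  # None: no known verb shares the subj_pobj frame
```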
Exploiting Semi-Structured Data • Cross-referenced semi-structured and textual data on football matches (e.g. the F-Logic record for the match Uruguay vs. Bolivien, 29 March 2000) • Wrapping of HTML tables to SWIntO-aligned XML, followed by XML2FLogic (semantic integration)
Learning New Argument Mappings • Learning LingInfo PAS for "certified facts" in the KB from identified fact mentions in textual data

    erhöhen :< ScoreGoal.

[Landon Donovan] erhöhte [in der Schlussphase] [vor gut 25.000 Zuschauern] noch [auf 2:0], als der Gegner nur noch mit zehn Mann spielte. (Landon Donovan made it 2:0 in the closing stages in front of a good 25,000 spectators, when the opponents were down to ten men.)

    semistruct#"Mexiko_vs_USA_Montag__17__Juni_2002_15_30_Landon_DONOVAN65_Score":sportevent#ScoreGoal [
      sportevent#committedBy -> semistruct#"Mexiko_vs_USA_Montag__17__Juni_2002_15_30_Landon_DONOVAN_PFP";
      dolce#"HAPPENS-AT" -> semistruct#"--__timepoint+65_RelativeTimePoint";
      sportevent#scoreAfterGoal -> "0:2"
    ].
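The certified-fact strategy can be sketched as a simple matching procedure. A sentence whose arguments mention both the scorer and the resulting score of a KB `ScoreGoal` fact yields a new lexical mapping for its head verb. The Python sketch below is hypothetical: the flattened fact and sentence records, the string-matching criterion and the role labels are assumptions made for illustration. The KB stores the score as 0:2, presumably because it lists Mexico first, while the text says 2:0, so both orientations are tried.

```python
def score_variants(score):
    """'0:2' and '2:0' denote the same result seen from the two teams."""
    home, away = score.split(":")
    return {f"{home}:{away}", f"{away}:{home}"}


def learn_from_fact(fact, sentence):
    """Propose a LingInfo-PAS style mapping if the sentence mentions the fact.

    The fact counts as mentioned when the player's last name occurs in one
    argument and the score (in either orientation) occurs in another.
    """
    roles = {}
    scores = score_variants(fact["scoreAfterGoal"])
    for label, text in sentence["args"]:
        if fact["lastname"].lower() in text.lower():
            roles[label] = "committedBy"
        elif any(s in text for s in scores):
            roles[label] = "scoreAfterGoal"
    if set(roles.values()) != {"committedBy", "scoreAfterGoal"}:
        return None
    return {"lexical_type": f"{sentence['head']} :< {fact['concept']}.",
            "roles": roles}


fact = {"concept": "ScoreGoal", "lastname": "DONOVAN", "scoreAfterGoal": "0:2"}
sentence = {
    "head": "erhöhen",
    "args": [
        ("act_subj", "Landon Donovan"),
        ("mod", "in der Schlussphase"),
        ("mod", "vor gut 25.000 Zuschauern"),
        ("pobj_auf", "auf 2:0"),
    ],
}
print(learn_from_fact(fact, sentence))
```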
Detecting New Concept Mappings • Alignment via argument named entities only • Extracting hypotheses for new head-to-concept and argument-to-role mappings, using the density of alignment as a confidence measure • Detecting new head-to-concept (and role) mappings for existing concepts • Nur zwei Minuten später war es Moreno, der plötzlich an den Ball kam und zum 2:0 einschoss. ("Only two minutes later it was Moreno who suddenly got to the ball and shot in to make it 2:0.") • Daraus machte Nakamura in der 29. Minute das 3:0. ("From this, Nakamura made it 3:0 in the 29th minute.") • In der ersten Halbzeit war Hongkong durch Ng Wai-chiu mit 1:0 in Führung gegangen. ("In the first half, Hong Kong had taken a 1:0 lead through Ng Wai-chiu.")
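One possible reading of "density of alignment" is the fraction of a KB fact's fillers (player, score, minute) that can be found among the arguments of a sentence's head verb. The Python sketch below ranks (head, concept) hypotheses by that fraction. The functions are hypothetical. The Assist fact and its minute are invented toy data, and the score and minute fillers are matched as plain substrings; the slide itself states only that argument named entities are used.

```python
def alignment_density(fact_fillers, arg_texts):
    """Fraction of a fact's fillers that occur in some argument of the sentence."""
    hits = sum(
        1 for filler in fact_fillers
        if any(filler.lower() in text.lower() for text in arg_texts)
    )
    return hits / len(fact_fillers)


def rank_hypotheses(head, arg_texts, facts):
    """Score each (head, concept) hypothesis by its alignment density, best first."""
    scored = [
        (alignment_density(fillers, arg_texts), head, concept)
        for concept, fillers in facts
    ]
    return sorted(scored, reverse=True)


# Arguments of "machte" (lemma: machen) in
# "Daraus machte Nakamura in der 29. Minute das 3:0."
ARGS = ["Daraus", "Nakamura", "in der 29. Minute", "das 3:0"]

# Toy KB facts (concept, fillers); the Assist fact is invented.
FACTS = [
    ("ScoreGoal", ["Nakamura", "3:0", "29"]),
    ("Assist", ["Nakamura", "41"]),
]

for density, head, concept in rank_hypotheses("machen", ARGS, FACTS):
    print(f"{head} -> {concept}: {density:.2f}")
```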
Conclusions • Ontology-based Event Extraction • Extending the LingInfo lexicon model for ontologies to PredArg structure • Integrating syntactic argument structure in a type-based OBIE system • Strategies for automatic acquisition of the LingInfo lexical base and event recognition rules • Learning from text annotations • Generalisations over existing valence – ontology mappings • Alignment of text annotations and semi-structured data • Future Work • Application and refinement of acquisition strategies • Extending event annotation in SProUT to the TimeML framework