LIRICS WP3: Morpho-syntactic and syntactic annotations Thierry Declerck declerck@dfki.de DFKI-LT - Saarbrücken 23rd May 2006
Summary of the presentation
• WP3: 1st year objectives
• Objectives and work done in T3.1, T3.2
• Status of deliverables in T3.1, T3.2
• Main Achievements: SynAF NWI accepted and Working Draft discussed at ISO level
• Next Steps: Data Categories for Syntactic Annotation
• Synergies with other LIRICS WPs, ISO activities, meetings
Task 3.1: Survey
Task 3.1: Evaluation of initiatives for morpho-syntactic and syntactic annotation.
Objectives: Evaluate past and current initiatives dealing with standardisation, guidelines and recommendations for morpho-syntactic and syntactic annotation. This provides the background for starting an ISO standardisation activity on syntactic annotation (model + data categories).
Results: Deliverable 3.1 (V1 and V2).
Task 3.1: Deliverables
• D.3.1: Evaluation of initiatives for morpho-syntactic and syntactic annotation
• D.3.1 Updates
Task 3.2: ISO Proposal for a standard for Syntactic Annotation
Submit to ISO TC37/SC4 a New Work Item on Syntactic Annotation, and follow the ISO procedure through to the final document.
Objectives: Proposal for a morpho-syntactic annotation meta-model standard, together with a wide-coverage selection of data categories for morpho-syntactic annotation.
Results: New Work Item on Syntactic Annotation (SynAF) successfully submitted to ISO TC37/SC4. Discussion of the Working Draft is ongoing.
Task 3.2: Deliverables
• Interim Report: WD of the morpho-syntactic annotation standard for CD ballot
• CD of the morpho-syntactic annotation standard for ISO DIS ballot
The SynAF Proposal
Syntactic annotation has 2 functions in NLP:
• 1) To represent linguistic constituency: constituents such as Noun Phrases (NP) describe a structured sequence of morpho-syntactically annotated items. Constituents built from non-contiguous elements are also considered.
• 2) To represent dependency relations: dependency information can hold between morpho-syntactically annotated items within a phrase (e.g. an adjective is the modifier of the head noun within an NP), or it can describe a specific relation between syntactic constituents at the clausal and sentential level (e.g. an NP being the "subject" of the main verb of a clause or sentence). In the first case we speak of an internal dependency, in the second of an external dependency. A dependency relation can also involve empty elements (as with the pro-drop property in Romance languages).
The SynAF Proposal (2)
SynAF is thus concerned with a meta-model that covers both dimensions, syntactic constituency and dependency. It proposes a multi-layered annotation framework that allows the combined and interrelated annotation of language data along both dimensions. The data categories to be proposed for ISO standardisation will likewise cover the basic annotation for both dimensions.
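The two dimensions can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the SynAF meta-model itself: the class names (Token, Constituent, Dependency), the helper is_internal and the English example phrase are all hypothetical and do not come from the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Token:
    """A morpho-syntactically annotated item (hypothetical class)."""
    id: str
    form: str
    pos: str


@dataclass
class Constituent:
    """Constituency layer: a labelled node over tokens and sub-constituents."""
    id: str
    cat: str
    children: List[Union[Token, "Constituent"]] = field(default_factory=list)

    def tokens(self) -> List[Token]:
        """Collect all tokens dominated by this constituent, depth first."""
        out: List[Token] = []
        for child in self.children:
            if isinstance(child, Token):
                out.append(child)
            else:
                out.extend(child.tokens())
        return out


@dataclass
class Dependency:
    """Dependency layer: a labelled relation between two annotated units."""
    label: str
    head: Union[Token, Constituent]
    dependent: Union[Token, Constituent]


def is_internal(dep: Dependency, phrase: Constituent) -> bool:
    """True when both ends are tokens sitting directly inside `phrase`."""
    return (
        isinstance(dep.head, Token)
        and isinstance(dep.dependent, Token)
        and dep.head in phrase.children
        and dep.dependent in phrase.children
    )


# Illustrative sentence (not from the slides): "the old house collapsed"
the = Token("t1", "the", "DET")
old = Token("t2", "old", "ADJ")
house = Token("t3", "house", "NOUN")
collapsed = Token("t4", "collapsed", "VERB")

np = Constituent("c1", "NP", [the, old, house])
s = Constituent("c2", "S", [np, collapsed])

# Internal dependency: adjective modifies the head noun inside the NP.
mod = Dependency("modifier", head=house, dependent=old)
# External dependency: the whole NP is the subject of the main verb.
subj = Dependency("subject", head=collapsed, dependent=np)
```

Keeping the dependency objects separate from the constituent tree is one way to let both layers refer to the same tokens without either layer owning them.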
The SynAF Proposal (3)
A main starting point: Tiger
The Tiger annotation framework foresees 2 types of annotation: one for constituency (represented by a node label in the annotation framework) and one for dependency (represented by an edge label in the annotation framework).
The SynAF Proposal (4)
Another starting point: ISST
The approach followed in the ISST (Italian Syntactic-Semantic Treebank) framework is similar to the one proposed in Tiger, in that it also adopts a multi-layered syntactic annotation strategy: one level for constituency and one level for dependency, with a pointing mechanism for referring from the second level to the first.
Example for Tiger
<body>
  <s id="s5">
    <graph root="s5_504">
      <terminals>
        <t id="s5_1" word="Die" pos="ART" morph="Def.Fem.Nom.Sg"/>
        <t id="s5_2" word="Tagung" pos="NN" morph="Fem.Nom.Sg.*"/>
        <t id="s5_3" word="hat" pos="VVFIN" morph="3.Sg.Pres.Ind"/>
        <t id="s5_4" word="mehr" pos="PIAT" morph="--"/>
        <t id="s5_5" word="Teilnehmer" pos="NN" morph="Masc.Akk.Pl.*"/>
        <t id="s5_6" word="als" pos="KOKOM" morph="--"/>
        <t id="s5_7" word="je" pos="ADV" morph="--"/>
        <t id="s5_8" word="zuvor" pos="ADV" morph="--"/>
      </terminals>
      <nonterminals>
        <nt id="s5_500" cat="NP">
          <edge label="NK" idref="s5_1"/>
          <edge label="NK" idref="s5_2"/>
        </nt>
        <nt id="s5_501" cat="AVP">
          <edge label="CM" idref="s5_6"/>
          <edge label="MO" idref="s5_7"/>
          <edge label="HD" idref="s5_8"/>
        </nt>
        <nt id="s5_502" cat="AP">
          <edge label="HD" idref="s5_4"/>
          <edge label="CC" idref="s5_501"/>
        </nt>
        …..
      </nonterminals>
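As a rough illustration of how the node and edge labels in the Tiger excerpt can be read, the following Python sketch resolves each nonterminal to the words it covers. It assumes a cut-down copy of the excerpt: the closing graph tag is added here only to make the fragment well-formed, the elided nonterminals (including the root s5_504) are simply left out, and the helper names terminal_ids, words and is_contiguous are hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the slide excerpt. The closing </graph> is added for
# well-formedness; the nonterminals elided on the slide are not reconstructed.
TIGER_FRAGMENT = """
<graph root="s5_504">
  <terminals>
    <t id="s5_1" word="Die" pos="ART" morph="Def.Fem.Nom.Sg"/>
    <t id="s5_2" word="Tagung" pos="NN" morph="Fem.Nom.Sg.*"/>
    <t id="s5_3" word="hat" pos="VVFIN" morph="3.Sg.Pres.Ind"/>
    <t id="s5_4" word="mehr" pos="PIAT" morph="--"/>
    <t id="s5_5" word="Teilnehmer" pos="NN" morph="Masc.Akk.Pl.*"/>
    <t id="s5_6" word="als" pos="KOKOM" morph="--"/>
    <t id="s5_7" word="je" pos="ADV" morph="--"/>
    <t id="s5_8" word="zuvor" pos="ADV" morph="--"/>
  </terminals>
  <nonterminals>
    <nt id="s5_500" cat="NP">
      <edge label="NK" idref="s5_1"/>
      <edge label="NK" idref="s5_2"/>
    </nt>
    <nt id="s5_501" cat="AVP">
      <edge label="CM" idref="s5_6"/>
      <edge label="MO" idref="s5_7"/>
      <edge label="HD" idref="s5_8"/>
    </nt>
    <nt id="s5_502" cat="AP">
      <edge label="HD" idref="s5_4"/>
      <edge label="CC" idref="s5_501"/>
    </nt>
  </nonterminals>
</graph>
"""

graph = ET.fromstring(TIGER_FRAGMENT.strip())
terminals = {t.get("id"): t for t in graph.iter("t")}
nonterminals = {nt.get("id"): nt for nt in graph.iter("nt")}
# Surface position of each terminal, taken from document order.
order = {tid: i for i, tid in enumerate(terminals)}


def terminal_ids(node_id):
    """Follow edges down from a node and return its terminals in surface order."""
    if node_id in terminals:
        return [node_id]
    ids = []
    for edge in nonterminals[node_id].findall("edge"):
        ids.extend(terminal_ids(edge.get("idref")))
    return sorted(ids, key=order.__getitem__)


def words(node_id):
    """Word forms covered by a terminal or nonterminal node."""
    return [terminals[tid].get("word") for tid in terminal_ids(node_id)]


def is_contiguous(node_id):
    """True when the node covers an unbroken stretch of the sentence."""
    positions = [order[tid] for tid in terminal_ids(node_id)]
    return positions == list(range(positions[0], positions[-1] + 1))


for nt_id, nt in nonterminals.items():
    print(nt.get("cat"), words(nt_id), "contiguous:", is_contiguous(nt_id))
```

In this fragment the AP node skips over "Teilnehmer", so it is an instance of a constituent built from non-contiguous elements.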
Example for ISST
<frase id="0" morfofile="sole.morph026" rs="Presentato un libro bianco del Governo Major .">
  <nodo tipo="F3">
    <nodo tipo="SV3" id="0">
      <foglia lemma="presentare" href="mw_001"/>
      <nodo tipo="COMPT" id="1">
        <nodo tipo="SN" id="2">
          <foglia lemma="un" href="mw_002"/>
          <foglia lemma="libro" href="mw_003"/>
          <nodo tipo="SA" id="3">
            <foglia lemma="bianco" href="mw_004"/>
          </nodo>
          <nodo tipo="SPD" id="4">
            <foglia lemma="di" href="mw_005"/>
            <nodo tipo="SN" id="5">
              <foglia lemma="governo" href="mw_006"/>
              <nodo tipo="SN" id="6">
                <foglia lemma="major" href="mw_007"/>
              </nodo>
            </nodo>
          </nodo>
        </nodo>
      </nodo>
    </nodo>
    <foglia lemma="." href="mw_008"/>
  </nodo>
</frase>
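The nested constituency layer of the ISST example can be read with a short Python sketch like the one below. It turns the nodo/foglia nesting into a bracketed string and collects the href values of the leaves, which presumably point into the morpho-syntactic file named in the morfofile attribute (that reading is an assumption, as are the helper names bracket and leaf_pointers).

```python
import xml.etree.ElementTree as ET

# The ISST example from the slide, copied unchanged.
ISST_EXAMPLE = """
<frase id="0" morfofile="sole.morph026" rs="Presentato un libro bianco del Governo Major .">
  <nodo tipo="F3">
    <nodo tipo="SV3" id="0">
      <foglia lemma="presentare" href="mw_001"/>
      <nodo tipo="COMPT" id="1">
        <nodo tipo="SN" id="2">
          <foglia lemma="un" href="mw_002"/>
          <foglia lemma="libro" href="mw_003"/>
          <nodo tipo="SA" id="3">
            <foglia lemma="bianco" href="mw_004"/>
          </nodo>
          <nodo tipo="SPD" id="4">
            <foglia lemma="di" href="mw_005"/>
            <nodo tipo="SN" id="5">
              <foglia lemma="governo" href="mw_006"/>
              <nodo tipo="SN" id="6">
                <foglia lemma="major" href="mw_007"/>
              </nodo>
            </nodo>
          </nodo>
        </nodo>
      </nodo>
    </nodo>
    <foglia lemma="." href="mw_008"/>
  </nodo>
</frase>
"""

frase = ET.fromstring(ISST_EXAMPLE.strip())


def bracket(elem):
    """Render a nodo subtree as a labelled bracketing; leaves show their lemma."""
    if elem.tag == "foglia":
        return elem.get("lemma")
    inner = " ".join(bracket(child) for child in elem)
    return f"[{elem.get('tipo')} {inner}]"


def leaf_pointers(elem):
    """href values of all leaves, in document order."""
    return [leaf.get("href") for leaf in elem.iter("foglia")]


top = frase.find("nodo")   # the F3 node directly under <frase>
bracketing = bracket(top)
pointers = leaf_pointers(frase)

print(bracketing)
print(frase.get("morfofile"), pointers)
```

Unlike the Tiger format, where nonterminals reference their children by idref, the constituency here is expressed through XML nesting, so a plain recursive walk is enough.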
Current Work and Next Steps
• Refine the meta-model for syntactic annotation.
• Establish a list of data categories for syntactic annotation.
• Create a profile for these data categories in the Data Category Registry at LORIA.
WP3 Activities, Meetings, Synergies
LIRICS WPs Meetings:
• CNR-ILC – DFKI, 5.5.2005: convergences between morpho-syntactic and syntactic data; issues for the submission of the NWI on Syntax (SynAF) to ISO
• DFKI – Tilburg, 2 meetings in Nov. and Dec. 2005 on the relation between argument structures and semantic roles
• LORIA, CNR-ILC, DFKI, March 2006: meeting in Paris discussing the data categories (with definitions) to be included in the DCR. Redesign of the DCs, introduction of a profile for syntactic data categories
LIRICS Meetings:
• Paris, 16-17.3.2005: progress of work within WP3
• Barcelona, 21-22.6.2005: LIRICS Industrial Advisory Board Meeting
• Barcelona, 22.6.2005: presentation of a first batch of information relevant for lexical description
• Nancy, 8-9.12.2005: WP4 TDG3 Workshop on convergences between lexico-semantic representation and semantic roles in the lexicon and in annotation
ISO Meetings:
• Berlin, 8-9.4.2005: ISO TC37/SC4 WG4 meetings
• Warsaw, 21-26.8.2005: plenary meeting of ISO TC37/SC4. Presentation of the SynAF proposal (accepted as NWI in July 2005)
• Jeju, Korea, January 2006: ISO TC37/SC4 meeting. Discussion of SynAF with Asian and North American representatives of standardisation bodies
• Marina Del Rey: joint ISO TC37/SC4, TDG3 and LIRICS meeting, discussing with American partners the issue of semantic annotation, including the relation between syntactic and semantic annotation