
SemAF – Basics: Semantic annotation framework

TC 37/SC 4/WG 2 (Kiyong Lee, convenor). SemAF – Basics: Semantic annotation framework. Harry Bunt, Tilburg University. isa-6, Joint ISO - ACL/SIGSEM workshop, Oxford, 11-12 January 2011.


Presentation Transcript


  1. TC 37/SC 4/WG 2 (Kiyong Lee, convenor). SemAF – Basics: Semantic annotation framework. Harry Bunt, Tilburg University. isa-6, Joint ISO - ACL/SIGSEM workshop, Oxford, 11-12 January 2011

  2. Outline
     • Background: ad-hoc Task Domain Group TDG 3; LIRICS; SemAF part 1 (time and events); part 2 (dialogue acts); ...
     • General (ISO, LAF) considerations on annotation standards
     • Specific LAF requirements
     • Additional or elaborated methodological requirements
     • Principle of Additivity (Complementarity)
     • Abstract versus concrete syntax
     • Semantics for abstract syntax
     • Requirements on representation formats
     • Metamodel and abstract syntax
     • Core entities, extensions and subschemas
     • Layers and integrated annotation/representation
     • Conclusion: How to move further?

  3. Aims
     • Make explicit what is, or should be, common to the various parts of SemAF (24617)
     • Ensure consistency of the various parts of SemAF (24617):
        • Their aims
        • Their methodology
        • Their annotation schemes
        • Their representation schemes
     • Provide guidelines for future parts of SemAF

  4. General requirements on linguistic annotation standards
     • Media independence (common mechanisms should be provided to handle all media types, including text, audio, and video)
     • Data integrity (use a standoff rather than an inline representation format)
     • Machine processability (representations must be machine-readable and interpretable; the burden of interpretation should not be left to the processing software)
     • Human readability (representations must be human-readable, at least for creation and editing)

  5. LAF requirements:
     • Distinguish annotation from representation.
        • An annotation is linguistic information that is added to language data, independent of its representation.
        • A representation is the format into which an annotation is rendered, independent of its content.
     • Distinguish systematically between content and reference in annotation representations
     • Uniform and TEI-compliant way of referring to relevant segments of source data
     • Uniform way of cross-referencing between different layers of annotation
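The standoff and content/reference requirements above can be illustrated with a minimal sketch. The class names, the character-offset anchoring, and the feature values are assumptions made for illustration only; they are not the TEI-compliant referencing mechanism that LAF prescribes.

```python
from dataclasses import dataclass

# Primary data is never modified; annotations live apart from it (standoff).
SOURCE = "John left at noon."


@dataclass(frozen=True)
class Segment:
    """Reference: a character span in the source (hypothetical anchoring scheme)."""
    start: int
    end: int

    def text(self, source: str) -> str:
        return source[self.start:self.end]


@dataclass(frozen=True)
class Annotation:
    """Content (label, features) is kept separate from the reference (target)."""
    target: Segment
    label: str
    features: tuple = ()


# Illustrative labels and feature values, not taken from any SemAF part.
annotations = [
    Annotation(Segment(5, 9), "EVENT", (("class", "transition"),)),
    Annotation(Segment(13, 17), "TIME", (("type", "time-of-day"),)),
]

for a in annotations:
    print(a.label, repr(a.target.text(SOURCE)))
```

Because the annotations only point into `SOURCE`, several annotation layers can refer to the same segments without touching or duplicating the primary data.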

  6. SemAF-specific requirements (1)
     • Semantic additivity (semantic annotations should add semantic information to source data rather than, e.g., merely ‘flag’ semantic phenomena)
     • Semantic explicitness (information in an annotation scheme must be explicit: the burden of interpretation should not be left to the processing software)
     • Conceptual consistency (concepts used in annotations in different SemAF parts should have the same meaning; related concepts in different SemAF parts should be semantically consistent; underlying metamodels should be mutually consistent)
     • Representational consistency (a single mechanism should be used to represent the same type of information; there must be a consistent underlying data model)

  7. SemAF-specific requirements (2)
     • Methodological consistency (Bunt, ICGL-2, Hong Kong, January 2010; Ide & Bunt, LAW-IV, Uppsala, July 2010):
        • Conceptual analysis: metamodel
        • Abstract syntax: extended formal specification of metamodel
        • Definition of formal semantics of abstract syntax
        • Concrete syntax: definition of ‘ideal’ representation format
        • Core entities; extensions; subschemas
        • Relation to Data Category Registry

  8. Additivity and Explicitness
     • Annotations (ad notare ≈ adding notes to) add information to portions of source text (cf. LAF); semantic annotations add semantic information to source text.
     • Semantic annotations can only count as such if they have a formal semantics (Bunt & Romary, 2002), which makes them machine-interpretable.

  9. Conceptual consistency
     • ISO-TimeML: events subdivided into transitions, processes, and states; ISO-Semantic Roles?
     • ISO-TimeML: event-time relations like AT, DURING, DURATION; ISO-Semantic Roles: temporal semantic roles
     • ISO-Space: event-location relations; ISO-Semantic Roles: semantic roles relating motion events to locations etc. (Location, Source, Goal, Distance, ...)
     • ISO-Dialogue Acts: rhetorical relations between dialogue acts, like Explanation, Justification, Exemplification; ISO-DS: similar discourse relations

  10. Abstract and concrete syntax of an annotation language
      • Abstract syntax is a formal specification of the categories of objects and relations in a metamodel, describing how these elements may be combined to form annotations, defined as set-theoretical constructs.
      • Concrete syntax specifies a particular format for the representation of annotations.
      The abstract/concrete syntax distinction implements the fundamental distinction between annotations and representations made by LAF.
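As a rough illustration of the distinction, the sketch below builds an annotation as a set-theoretical construct (tuples and frozensets) and then renders it in one possible concrete format. The pairing of markables with semantic information, the identifiers `m1` and `m2`, and the XML element and attribute names are all assumptions for this example, not part of any SemAF specification.

```python
import xml.etree.ElementTree as ET

# Abstract syntax (illustrative): an annotation structure is a pair
# (entity structures, link structures), built from tuples and frozensets.
event = ("m1", ("EVENT", "transition"))   # (markable, semantic information)
time = ("m2", ("TIME", "noon"))
link = (event, time, "AT")                # (entity, entity, relation)
annotation = (frozenset({event, time}), frozenset({link}))


def to_xml(annotation) -> str:
    """One possible concrete syntax: render the abstract structure as XML."""
    entities, links = annotation
    root = ET.Element("annotation")
    # Sorting gives a deterministic order; sets themselves are unordered.
    for markable, (category, value) in sorted(entities):
        ET.SubElement(root, "entity", target=markable, category=category, value=value)
    for (src, _), (dst, _), relation in sorted(links):
        ET.SubElement(root, "link", source=src, target=dst, relation=relation)
    return ET.tostring(root, encoding="unicode")


print(to_xml(annotation))
```

The abstract structure carries no commitment to XML: a different rendering function could serialize the same tuples and sets in another format without changing what the annotation says.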

  11. Semantics, abstract and concrete syntax
      • The semantics of semantic annotations should be defined for the abstract syntax, rather than for some concrete representation format.
      • Advantage: every representation format for the same abstract syntax has the same semantics.
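A small sketch of this idea: the interpretation function is written once, over abstract structures, and any concrete representation obtains its meaning by first being decoded. The triple-based abstract structure, the two toy formats, and the formula notation are hypothetical choices for illustration.

```python
import json

# Abstract annotation structure (illustrative): a set of (event, relation, time) triples.
abstract = frozenset({("e1", "AT", "t1")})


def interpret(structure) -> str:
    """Toy interpretation function over the abstract syntax:
    yields a first-order-style formula as a string."""
    return " & ".join(sorted(f"{rel.lower()}({ev},{t})" for ev, rel, t in structure))


# Two hypothetical concrete formats, each with its own decoder back to abstract syntax.
def decode_json(text: str):
    return frozenset(tuple(triple) for triple in json.loads(text))


def decode_lines(text: str):
    return frozenset(tuple(line.split()) for line in text.splitlines() if line.strip())


as_json = '[["e1", "AT", "t1"]]'
as_lines = "e1 AT t1\n"

# The semantics of a representation is the semantics of the structure it encodes.
assert interpret(decode_json(as_json)) == interpret(decode_lines(as_lines)) == interpret(abstract)
print(interpret(abstract))  # prints: at(e1,t1)
```

Adding a third format would require only a new decoder; `interpret` stays untouched.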

  12. Requirements on representation formats
      • Expressive adequacy: each annotation structure can be represented in this format;
      • ‘Unambiguity’: each representation encodes a unique annotation structure.
      A representation format that satisfies these requirements is called ideal (Bunt, ICGL-2, Hong Kong, January 2010).
      Representations in one ideal format can be converted in a meaning-preserving way to any other ideal format.
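The two requirements and the resulting convertibility can be sketched as follows, assuming that conversion between formats is composed from decoding into the abstract structure and re-encoding. The two formats (canonical JSON and tab-separated lines) and the function names `f1`, `f2`, `c12`, `c21` are hypothetical stand-ins chosen for this example.

```python
import json

# Abstract annotation structures (illustrative): frozensets of
# (source, relation, target) triples. Values are assumed to contain no tabs or newlines.


def f1(structure) -> str:
    """Encoding into hypothetical format 1: canonical (sorted) JSON."""
    return json.dumps(sorted(structure))


def f1_inv(text: str):
    """Decoding format 1 back into an abstract structure."""
    return frozenset(tuple(t) for t in json.loads(text))


def f2(structure) -> str:
    """Encoding into hypothetical format 2: one tab-separated triple per line."""
    return "\n".join("\t".join(t) for t in sorted(structure))


def f2_inv(text: str):
    """Decoding format 2 back into an abstract structure."""
    return frozenset(tuple(line.split("\t")) for line in text.split("\n") if line)


def c12(text: str) -> str:
    """Conversion from format 1 to format 2: decode, then re-encode."""
    return f2(f1_inv(text))


def c21(text: str) -> str:
    """Conversion from format 2 to format 1: decode, then re-encode."""
    return f1(f2_inv(text))


structure = frozenset({("e1", "AT", "t1"), ("e1", "DURING", "t2")})

# Expressive adequacy and unambiguity: the structure can be encoded in each
# format and decodes back to exactly the same structure.
assert f1_inv(f1(structure)) == structure
assert f2_inv(f2(structure)) == structure

# Conversion between the two formats loses nothing.
assert c21(c12(f1(structure))) == f1(structure)
assert f2_inv(c12(f1(structure))) == structure
```

Since both formats decode to the same abstract structure, and meaning is assigned to that structure, a conversion built this way preserves meaning by construction.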

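  13. Ideal concrete syntax
      [Diagram: the abstract syntax is linked to ideal concrete syntax-1 by F1 (inverse F1⁻¹) and to ideal concrete syntax-2 by F2 (inverse F2⁻¹); C12 and C21 convert between the two concrete syntaxes; Ia relates the abstract syntax to the semantics.]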

  14. Core concepts, extensions, and subschemas; and the DCR
      • A standard specifies:
         • core concepts;
         • principles for adding elements to the set of core concepts;
         • principles for subschemas of a standard annotation schema.
      • Core concepts should be entered into the ISO DCR

  15. Things that cut across SemAF parts
      • Overlaps, e.g.
         • Events and their classification (ISO-TimeML, ISO-Space, Semantic roles)
         • Time and place (ISO-TimeML, ISO-Space, Semantic roles, ISO-NE)
         • Rhetorical and other coherence relations in dialogue and discourse (ISO-Dialogue acts, ISO-DS)
      • Cutting across:
         • Negation; modality
         • Quantification; modification

  16. References
      Bunt, Harry (2010) A methodology for designing semantic annotation languages. In Proceedings of the 2nd International Conference on Global Interoperability for Language Resources (ICGL-2), Hong Kong, January 2010, pp. 29-46.
      Bunt, Harry (2011) Multifunctionality in dialogue. Computer Speech and Language 25, 225-245.
      Ide, Nancy and Harry Bunt (2010) Anatomy of semantic annotation schemes: Mappings to GrAF. In Proceedings of the 4th Linguistic Annotation Workshop (LAW-IV), Uppsala, July 2010.
