
Finalizing Linguistic Annotation Framework: Addressing Media, Entities, and Document Formatting

The Linguistic Annotation Framework (LAF) document is nearly final and needs to be finalized in the next few weeks. The remaining issues concern the definitions of anchors, regions, layers, and media, the resource header, the replacement for <tagUsage>, the feature structure specification, and the document format. Suggestions are solicited for wording on how the definitions apply to different media, including speech and image.


Presentation Transcript


  1. Linguistic Annotation Framework ISO TC37 SC4 Working Group 1 4/11/11 Brandeis University

  2. Status of LAF document
     • Nearly final version
     • Needs to be finalized in next few weeks
     • Prior document has been distributed for comment from member country groups and changes have been made on this basis
     • GrAF schemas have been extensively implemented in two major corpora
       • Open American National Corpus (OANC) and Manually Annotated Sub-Corpus (MASC)

  3. Remaining Issues
     • Definitions of
       • Anchors, regions (segmentation)**
       • Layers
       • Media
     • Need to verify that these are adequate for all media, including speech, image, etc. Would appreciate suggestions for wording concerning application to media
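To make the concern about media concrete, the sketch below models a region as a sequence of anchors whose interpretation depends on the medium. Everything in it is an illustrative assumption rather than wording from the LAF document: the `Region` class, the medium names, and the anchor interpretations (character offsets for text, time offsets in seconds for speech, (x, y) points for images) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical anchor types: an offset (int or float) or an (x, y) point.
Anchor = Union[int, float, Tuple[int, int]]


@dataclass(frozen=True)
class Region:
    """A region of primary data bounded by anchors (illustrative only)."""
    medium: str                 # assumed values: "text", "speech", "image"
    anchors: Tuple[Anchor, ...]

    def is_valid(self) -> bool:
        if self.medium == "text":
            # Assumed: two integer character offsets, start <= end.
            return (len(self.anchors) == 2
                    and all(isinstance(a, int) for a in self.anchors)
                    and self.anchors[0] <= self.anchors[1])
        if self.medium == "speech":
            # Assumed: two time offsets in seconds, start <= end.
            return (len(self.anchors) == 2
                    and all(isinstance(a, (int, float)) for a in self.anchors)
                    and self.anchors[0] <= self.anchors[1])
        if self.medium == "image":
            # Assumed: at least two (x, y) points, e.g. corners of a box.
            return (len(self.anchors) >= 2
                    and all(isinstance(a, tuple) and len(a) == 2
                            for a in self.anchors))
        return False  # unknown medium


text_region = Region("text", (0, 5))
speech_region = Region("speech", (1.25, 2.5))
image_region = Region("image", ((10, 10), (40, 30)))
```

A definition of "anchor" that is adequate for all media would need to cover at least these three shapes without privileging character offsets.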

  4. Remaining Issues
     • Resource Header
     • Definitions of various entities
     • Consistency of attribute names etc.
     • Consistency of reference from annotation documents etc.

  5. Remaining Issues
     • Replace <tagUsage> in annotation document header with means to provide annotation categories used (and number of times used)
     Possibilities:
     • List of the categories (or ISOCat references) and frequencies
       • need to define an element for this
     • External document with the information
       • XML?
     • Specification of the categories without frequencies, e.g. documentation of the scheme
     • ???
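The first possibility, a list of categories with frequencies, could be generated automatically from an annotation document. The sketch below is only an illustration under stated assumptions: the element names `annotationCategories` and `category` are hypothetical (the slide notes that an element still needs to be defined), and the input is a simplified stand-in in which each `<a>` element carries its category in a `label` attribute.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified, hypothetical annotation document: each <a> has a category label.
SAMPLE = """<graph>
  <a label="NP" ref="n1"/>
  <a label="VP" ref="n2"/>
  <a label="NP" ref="n3"/>
</graph>"""


def count_categories(xml_text):
    """Count how many times each annotation category (label) is used."""
    root = ET.fromstring(xml_text)
    return Counter(a.get("label") for a in root.iter("a") if a.get("label"))


def build_category_list(counts):
    """Build a hypothetical header element listing categories and frequencies."""
    parent = ET.Element("annotationCategories")
    for name, freq in sorted(counts.items()):
        ET.SubElement(parent, "category", name=name, frequency=str(freq))
    return parent


counts = count_categories(SAMPLE)
header_fragment = ET.tostring(build_category_list(counts), encoding="unicode")
print(header_fragment)
```

The same counts could just as easily be written to an external document, which is the second possibility listed on the slide.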

  6. Remaining Issues
     • Rewording of feature structure specification to reflect change in fs spec that accommodates GrAF format for <f>
       • <f name="FE" value="perceiver"/>
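The compact form for `<f>`, with the feature name and value both given as attributes, maps directly onto name/value pairs. The sketch below reads such elements into a dictionary; the enclosing `<fs>` wrapper and the second feature (`GF`) are assumptions added for illustration, and only the `FE` feature comes from the slide.

```python
import xml.etree.ElementTree as ET

# The first <f> is from the slide; the <fs> wrapper and second <f> are hypothetical.
SAMPLE_FS = '<fs><f name="FE" value="perceiver"/><f name="GF" value="Ext"/></fs>'


def read_features(fs_xml):
    """Return a dict mapping each feature name to its value attribute."""
    root = ET.fromstring(fs_xml)
    return {f.get("name"): f.get("value") for f in root.iter("f")}


features = read_features(SAMPLE_FS)
print(features)
```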

  7. Remaining Issues
     • Document format
     • Placement of examples
