This presentation reports the status of the Linguistic Annotation Framework (LAF) document, which is nearly final and needs to be finalized. The remaining issues include the definitions of anchors, regions, layers, and media, the resource header, and the annotation document header, among others. Suggestions are solicited for wording that ensures the definitions apply to all media types.
Linguistic Annotation Framework ISO TC37 SC4 Working Group 1 4/11/11 Brandeis University
Status of LAF document • Nearly final version • Needs to be finalized in next few weeks • Prior document has been distributed for comment from member country groups and changes have been made on this basis • GrAF schemas have been extensively implemented in two major corpora • Open American National Corpus (OANC) and Manually Annotated Sub-Corpus (MASC)
Remaining Issues • Definitions of • Anchors, regions (segmentation)** • Layers • Media • Need to verify that these are adequate for all media, including speech, image, etc. Would appreciate suggestions for wording concerning application to media
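To make the media question above concrete, here is a minimal sketch of how a region might be expressed as a sequence of anchors whose type depends on the medium. The class name `Region`, the `medium` labels, and the choice of character offsets, seconds, and pixel coordinates as anchor types are all illustrative assumptions, not definitions taken from the LAF document.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical anchor types: a character offset (int), a time in
# seconds (float), or an (x, y) pixel coordinate. These are assumptions
# for illustration, not the wording of the standard.
Anchor = Union[int, float, Tuple[int, int]]


@dataclass(frozen=True)
class Region:
    """A region of primary data, delimited by an ordered tuple of anchors."""
    medium: str
    anchors: Tuple[Anchor, ...]


# Text: two character offsets bound a span.
text_span = Region("text", (10, 15))

# Speech: two timestamps bound an interval.
audio_span = Region("audio", (1.25, 2.80))

# Image: four corner coordinates bound a rectangular area.
image_area = Region("image", ((0, 0), (40, 0), (40, 20), (0, 20)))

for region in (text_span, audio_span, image_area):
    print(region.medium, len(region.anchors), "anchors")
```

The point of the sketch is only that a single region definition has to remain meaningful when the anchors are offsets, times, or coordinates, which is the adequacy check the slide asks for.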
Remaining Issues • Resource Header • Definitions of various entities • Consistency of attribute names etc. • Consistency of reference from annotation documents etc.
Remaining Issues • Replace <tagUsage> in annotation document header with means to provide annotation categories used (and number of times used) Possibilities: • List of the categories (or ISOCat references) and frequencies • need to define an element for this • External document with the information • XML? • Specification of the categories without frequencies, e.g. documentation of the scheme • ???
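The first possibility above (a list of categories with frequencies) could be derived automatically from an annotation document. The following is a minimal sketch under stated assumptions: the sample XML, the element name `a`, and the attribute name `label` are hypothetical stand-ins, since the element for reporting categories is exactly what remains to be defined.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical annotation document; the element and attribute names
# are assumptions chosen for illustration only.
SAMPLE = """<annotations>
  <a label="NP"/>
  <a label="VP"/>
  <a label="NP"/>
</annotations>"""


def category_frequencies(xml_text, element="a", attribute="label"):
    """Count how often each annotation category occurs in the document."""
    root = ET.fromstring(xml_text)
    return Counter(
        el.get(attribute)
        for el in root.iter(element)
        if el.get(attribute) is not None
    )


freqs = category_frequencies(SAMPLE)
for category, count in freqs.most_common():
    print(category, count)
```

A table like this could be written into the header or into an external document; which of the two is preferable is the open question on the slide.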
Remaining Issues • Rewording of feature structure specification to reflect change in fs spec that accommodates GrAF format for <f> • <f name="FE" value="perceiver"/>
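As an illustration of the compact `<f>` form shown on this slide, the sketch below reads name/value pairs from `<f>` elements into a dictionary. Only the `<f name="FE" value="perceiver"/>` element comes from the slide; the enclosing `<fs>` wrapper and the absence of a namespace are assumptions made so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Only the <f> element is taken from the slide; the <fs> wrapper is an
# assumption added so the fragment is well-formed XML.
SAMPLE = '<fs><f name="FE" value="perceiver"/></fs>'


def features_to_dict(fs_xml):
    """Collect the name/value attribute pairs of all <f> elements."""
    root = ET.fromstring(fs_xml)
    return {f.get("name"): f.get("value") for f in root.iter("f")}


features = features_to_dict(SAMPLE)
print(features)
```

With the value carried in an attribute, a flat feature needs no child elements, which is the property the reworded specification would have to describe.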
Remaining Issues • Document format • Placement of examples