
CLARIN? ISOCAT! Ineke Schuurman, ISOcat content coördinator, CLARIN-NL, Amsterdam, 30-08-2012


Presentation Transcript


  1. CLARIN? ISOCAT! Ineke Schuurman, ISOcat content coördinator, CLARIN-NL, Amsterdam, 30-08-2012

  2. Overview: ISOcat in general; its use in CLARIN; an example; your task with respect to ISOcat.

  3. ISOcat: a Data Category Registry defining widely accepted data categories (DCs), http://www.isocat.org. The registry stores DCs for language resources and their metadata, together with properties of the DCs (definition, administration, examples, etc.).

  4. Use in CLARIN: What is meant by DC X in resource A? There may be several (valid) definitions! Does X have the same meaning in resources A and B? In CLARIN this is needed first and foremost for tools (so that they ‘know’ what the elements in resources mean). It is especially important for search in data and metadata, but also for other tools that apply to data (cf. the last talk, on TTNWW). Human use is only secondary, but humans must, after all, fill the ISOcat registry and make the right mappings.

  5. An example with ‘ev’: Have a look at these two tags: WW(pv,tgw,ev) and N(soort,ev,dim,onz,stan). All parts of such tags, like ev, are to be included in ISOcat. The full tags are to be included as well. The same notion appears under many labels: ev, enkelvoud, sg, sing, singular, singulier, …

  6. singular: All these representations can be mapped onto one DC: singular (DC-4918), "word form indicating that one entity is involved". In full: http://www.isocat.org/datcat/DC-4918
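The mapping on slides 5 and 6 can be sketched as a simple lookup table. This is only a minimal illustration: the names `LABEL_TO_DC` and `dc_uri` are hypothetical, and the table holds only the labels and the one DC identifier shown on the slides, not data retrieved from ISOcat.

```python
# Base of the full DC address, as given on slide 6.
ISOCAT_BASE = "http://www.isocat.org/datcat/"

# Hypothetical local table: every label for "singular" listed on slide 5
# points to the same data category, DC-4918.
LABEL_TO_DC = {
    label: "DC-4918"
    for label in ("ev", "enkelvoud", "sg", "sing", "singular", "singulier")
}


def dc_uri(label):
    """Return the full ISOcat address for a local label, or None if unmapped."""
    dc_id = LABEL_TO_DC.get(label.strip().lower())
    return ISOCAT_BASE + dc_id if dc_id else None


if __name__ == "__main__":
    for label in ("ev", "sing", "mv"):
        print(label, "->", dc_uri(label))
```

A label that is not in the table (here "mv") yields None, which is the signal that a mapping still has to be made.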

  7. Other cats: ISOcat, defining DCs (ongoing). RELcat, relating DCs (started). SCHEMAcat, a registry of schemas, a schema being a description of the structure of your data format (just started).

  8. Call 4 projects: Each Call 4 project must (1) check, for each DC used in its resource or its metadata, whether a corresponding DC exists in ISOcat; (2) if not, extend ISOcat with such a DC, with all its properties (definitions, examples, etc.); (3) create a schema with a mapping that maps each DC used in the resources and metadata to an ISOcat DC. All this will be explained in tutorials.
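The check described on slide 8 could be sketched roughly as follows. The function names `tag_parts` and `missing_dcs` are hypothetical, and the `known` table is a local stand-in containing only the one mapping shown on slide 6; a real project would consult the ISOcat registry itself, which this sketch does not do.

```python
import re


def tag_parts(tag):
    """Split a tag such as 'WW(pv,tgw,ev)' into its head and its features."""
    match = re.fullmatch(r"(\w+)\((.*)\)", tag)
    if match is None:
        return [tag]  # no feature list, the tag is a single part
    head, features = match.groups()
    return [head] + [f for f in features.split(",") if f]


def missing_dcs(tags, known):
    """List every full tag and tag part that has no entry in `known`."""
    used = set()
    for tag in tags:
        used.add(tag)                # the full tags are to be included as well
        used.update(tag_parts(tag))  # and so are all their parts
    return sorted(item for item in used if item not in known)


if __name__ == "__main__":
    tags = ["WW(pv,tgw,ev)", "N(soort,ev,dim,onz,stan)"]
    known = {"ev": "DC-4918"}  # assumption: only the mapping from slide 6
    for item in missing_dcs(tags, known):
        print("no mapping yet:", item)
```

Everything the function reports is a candidate for step (2) or (3): either ISOcat needs a new DC, or the project's mapping needs a new entry.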

  9. Call 4 projects: Do NOT underestimate this ISOcat task! Good news: DCs used in some common formats are already included in ISOcat: the CGN / D-Coi tagset, TEI header elements, and many DCs concerning metadata. Contact a CLARIN centre ASAP to help you with this, or contact the helpdesk (helpdesk@clarin.nl).

  10. Thank you for your attention. Any questions? CLARIN-NL

  11. XML-format: CGN, in CGN-format: <pw ref="fn000248.20.4" w="is" pos="WW(pv,tgw,ev)" lem="zijn" … pq="man" /> VU-DNC, in FoLiA-format: <w xml:id="BAObi1.s.5.w.18"> <t>is</t> <lemma class="zijn"/> <pos class="WW(pv,tgw,ev)"> … </pos> <t class="ocroutput">is</t> </w>
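Slide 11 shows the same tag stored in two different XML layouts, which is why a tool needs a mapping per format. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming trimmed versions of the two snippets (the parts elided with "…" on the slide are left out, and the FoLiA sample carries no namespace, unlike real FoLiA files); the function names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Trimmed samples based on slide 11; elided attributes and children are omitted.
CGN_SAMPLE = '<pw ref="fn000248.20.4" w="is" pos="WW(pv,tgw,ev)" lem="zijn"/>'
FOLIA_SAMPLE = (
    '<w xml:id="BAObi1.s.5.w.18">'
    '<t>is</t><lemma class="zijn"/><pos class="WW(pv,tgw,ev)"/>'
    '</w>'
)


def pos_from_cgn(xml_text):
    """CGN-format: the tag is the 'pos' attribute of the <pw> element."""
    return ET.fromstring(xml_text).get("pos")


def pos_from_folia(xml_text):
    """FoLiA-format: the tag is the 'class' attribute of the <pos> child."""
    pos = ET.fromstring(xml_text).find("pos")
    return pos.get("class") if pos is not None else None


if __name__ == "__main__":
    print(pos_from_cgn(CGN_SAMPLE))
    print(pos_from_folia(FOLIA_SAMPLE))
```

Both functions return the same tag string, so everything downstream (splitting the tag, looking up its DCs) can be shared between the formats.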
