
CMD and TEI

CMD and TEI. CMDI interoperability workshop, 2013-06-04, Utrecht. Matej Ďurčo, ICLTT, Vienna. TEI at ICLTT. AAC: Austrian Academy Corpus, a diachronic corpus of ~500 million tokens, being converted into TEI. C4: distributed corpus of 20th-century German (Basel, Berlin, Bozen, Wien).


Presentation Transcript


  1. CMD and TEI • CMDI interoperability workshop, 2013-06-04, Utrecht • Matej Ďurčo, ICLTT, Vienna

  2. TEI at ICLTT • AAC: Austrian Academy Corpus • diachronic corpus, ~500 million tokens • being converted into TEI • C4: distributed corpus of 20th-century German • Basel, Berlin, Bozen, Wien • harmonized format (TEI/teiHeader) • Dict-Gate • TEI-encoded multilingual lexicons (Persian, Arabic, German, English) • however, described with LexicalResourceProfile • Abacus: Austrian Baroque Corpus • 3 (5) historical texts encoded in TEI • elaborate teiHeader

  3. TEI (and friends?) in CMD • overview of currently existing TEI-ish CMD profiles

  4. teiHeader (ICLTT) • size = reuse in other profiles

  5. teiHeader (DTA) • size = count of elements in instance data

  6. datcats in teiHeader (DTA)

  7. TEI and ISOcat • a special DCS: TEI Header (2.1.0) • Windhouwer, 2012 • a datcat for every element of the teiHeader (135 datcats) • based on an ODD file (ODD2DCIF.xsl and DCIF2ODD.xsl available) • owed to CLARIN-NL projects using the TEI header • an enriched schema was generated, i.e. annotated with these new data categories (dcr:datcat attribute), and put in SCHEMAcat: http://lux13.mpi.nl/schemacat/schema/teiHeader • define relations between TEI and other data categories in RELcat (the relation registry)
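
As an illustration of what "annotated with data categories" means in practice, the following minimal sketch collects the `dcr:datcat` attribute from each element declaration of an XML Schema. The sample schema, the element names chosen, and the `example.org` data category URIs are made up for the example; the `dcr` namespace URI is an assumption and should be checked against the actual enriched schema.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
# Assumed namespace for the dcr: prefix; verify against the real schema.
DCR = "http://www.isocat.org/ns/dcr"

# Hypothetical miniature schema; the datcat URIs are placeholders.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dcr="http://www.isocat.org/ns/dcr">
  <xs:element name="titleStmt" dcr:datcat="http://example.org/datcat/DC-0001"/>
  <xs:element name="title" dcr:datcat="http://example.org/datcat/DC-0002"/>
  <xs:element name="unannotated"/>
</xs:schema>"""


def datcats_by_element(xsd_text):
    """Map each annotated element name to its data category URI."""
    root = ET.fromstring(xsd_text)
    result = {}
    for el in root.iter(f"{{{XS}}}element"):
        name = el.get("name")
        datcat = el.get(f"{{{DCR}}}datcat")
        # Skip element declarations that carry no annotation.
        if name and datcat:
            result[name] = datcat
    return result


mapping = datcats_by_element(SAMPLE)
for name, uri in mapping.items():
    print(name, "->", uri)
```

Elements without the attribute are skipped, so the size of the returned mapping gives a quick count of how many declarations are annotated.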

  8. Next Step(s)? • create (or adapt existing) teiHeader profile • as a union of the existing profiles? • based on the enriched schema • i.e. linking to the new TEI data categories • define a relation set in RELcat between TEI and ISOcat (and Dublin Core) data categories
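
A relation set of the kind proposed here can be pictured as a list of (source, relation, target) triples. The sketch below is not the RELcat format or API; the identifiers (`tei:title`, `dc:creator`, ...), the pairings, and the relation name `sameAs` are hypothetical and only show how such a set could be used to look up equivalent data categories in either direction.

```python
from collections import defaultdict

# Hypothetical relation set: (source datcat, relation type, target datcat).
RELATIONS = [
    ("tei:title", "sameAs", "dc:title"),
    ("tei:author", "sameAs", "dc:creator"),
    ("tei:publisher", "sameAs", "dc:publisher"),
]


def build_index(relations):
    """Index equivalence relations symmetrically for lookup from either side."""
    index = defaultdict(set)
    for source, relation, target in relations:
        if relation == "sameAs":
            index[source].add(target)
            index[target].add(source)
    return index


INDEX = build_index(RELATIONS)


def equivalents(datcat):
    """Return the data categories declared equivalent to `datcat`, sorted."""
    return sorted(INDEX.get(datcat, set()))


print(equivalents("dc:creator"))
print(equivalents("tei:title"))
```

Keeping the relations outside the profiles themselves means that a search over one vocabulary could be expanded to the other without editing any profile.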

  9. profile: data (LINDAT) dublincore + metashare

  10. profile: data (LINDAT) resourceInfo-component

  11. dublincore I • 2 profiles with dc-terms (55 data categories) • 2 profiles with dc-elements (called „dc-terms“) • as of 2013-01

  12. dublincore II • currently (2013-06): 4 DCMI-terms profiles

  13. dublincore III • (almost) all datcats shared by all
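
The comparison behind "shared by all" amounts to a set intersection over the data categories each profile references. A minimal sketch, with invented profile names and data category labels standing in for the real ones:

```python
# Hypothetical profiles and the data categories they reference.
PROFILES = {
    "profile_a": {"dc:title", "dc:creator", "dc:language", "dc:date"},
    "profile_b": {"dc:title", "dc:creator", "dc:language"},
    "profile_c": {"dc:title", "dc:creator", "dc:language", "dc:rights"},
}


def shared_and_extras(profiles):
    """Return datcats common to every profile, plus each profile's extras."""
    shared = set.intersection(*profiles.values())
    extras = {name: sorted(cats - shared) for name, cats in profiles.items()}
    return shared, extras


shared, extras = shared_and_extras(PROFILES)
print(sorted(shared))
print(extras)
```

The per-profile extras make the "(almost)" visible: any non-empty list marks a data category that is not common to all profiles.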

  14. dublincore IV • 1 profile has an extra component: DANS-DC-metadata • example: language
