A Common Standard for Data and Metadata: The ESDS Qualidata Document Type Definition (DTD)

Libby Bishop. Online Qualitative Data Resources: Best Practice in Metadata Creation and Web Standards, Centre Point, London, 15 November 2005.


Presentation Transcript


  1. A Common Standard for Data and Metadata: The ESDS Qualidata Document Type Definition (DTD)
     Libby Bishop
     Online Qualitative Data Resources: Best Practice in Metadata Creation and Web Standards
     Centre Point, London, 15 November 2005

  2. Why another DTD?
     • need a standard that:
       • includes both file-level metadata and content-level metadata
       • enables more precise searching/browsing
       • extends to linking between sources (e.g. text, annotations, analysis, audio)
     • need one customised to social science research that:
       • meets generic needs of varied data types
       • is more ‘analytical’ than ones adapted from the TEI speech schema (e.g. oral history projects)
       • is less granular than the highly detailed ones used for conversation analysis

  3. What does a DTD enable?
     • marking up data to an XML standard, so that data providers can publish to online systems such as ESDS Qualidata Online (formerly Edwardians)
     • meeting the needs of researchers who have asked for a standard they can follow
     • encouraging more qualitative data analysis software companies to pursue XML outputs (and import/export tools) based on this standard

  4. Hybrid of two standards
     For the metadata: the DDI standard at study, file and variable level
     • Level 1: DDI Document description
     • Level 2: DDI Study description
     • Level 3: DDI Data file description (file contents; format; data checks; processing; software)
     • Level 4: DDI Variable description, for study survey data (mixed methods) or numeric outputs from qualitative data:
       • demographic profile of sample
       • other quantified responses to qualitative data (attributes or thematic classifications, often assigned (coded) in CAQDAS software)
     • Level 5: DDI Other study related materials
     • Level 6: TEI-based qualitative content

  5. DDI mark-up of metadata
     |----2.0 stdyDscr+ (ATT == ID, xml-lang, source, access)
     |    |----2.1 citation+ (ATT == ID, xml-lang, source, MARCURI)
     |    |    |----2.1.1 titlStmt (ATT == ID, xml-lang, source)
     |    |    |    |----2.1.1.1 titl (ATT == ID, xml-lang, source)   [Study Name]
     |    |    |    |----2.1.1.2 subTitl* (ATT == ID, xml-lang, source)
     |    |    …
     |    |    |----2.1.4 distStmt? (ATT == ID, xml-lang, source)
     |    |    |    |----2.1.4.1 distrbtr* (ATT == ID, xml-lang, source, abbr, affiliation, URI)
     |    |    |    |----2.1.4.2 contact* (ATT == ID, xml-lang, source, affiliation, URI, email)
     |    |    |    |----2.1.4.3 depositr* (ATT == ID, xml-lang, source, abbr, affiliation)   [Depositor]
     …
     |----3.0 fileDscr* (ATT == ID, xml-lang, source, URI, sdatrefs, methrefs, pubrefs, access)
     |    |----3.1 fileTxt* (ATT == ID, xml-lang, source)
     |    |    |----3.1.1 fileName? (ATT == ID, xml-lang, source)
     |    |    |----3.1.2 fileCont? (ATT == ID, xml-lang, source)
     |    |    |----3.1.3 fileStrc? (ATT == ID, xml-lang, source, type)
     |    |    |----3.1.4 dimensns? (ATT == ID, xml-lang, source)
     |    |    |    …
     |    |    |    +----3.1.4.5 recNumTot* (ATT == ID, xml-lang, source)   [filesize?]
     |    |    |----3.1.5 fileType? (ATT == ID, xml-lang, source, charset)
     |    |    |----3.1.6 format? (ATT == ID, xml-lang, source)   [file format]
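The suffixes on the element names in the outline above (`+`, `*`, `?`, or none) appear to follow the usual DTD occurrence indicators. As a rough illustration only, the sketch below decodes a few entries from the slide; the function name `describe` and the wording of the descriptions are assumptions made for this example, not part of the DDI documentation.

```python
import re
from typing import Tuple

# Occurrence indicators as used in DTD content models.
OCCURRENCE = {
    "": "exactly one (required)",
    "?": "zero or one (optional)",
    "*": "zero or more (optional, repeatable)",
    "+": "one or more (required, repeatable)",
}

def describe(entry: str) -> Tuple[str, str]:
    """Split an outline entry such as 'fileDscr*' into element name and cardinality."""
    match = re.fullmatch(r"([A-Za-z]+)([?*+]?)", entry)
    if match is None:
        raise ValueError(f"not an outline entry: {entry!r}")
    name, marker = match.groups()
    return name, OCCURRENCE[marker]

# Entries copied from the outline on the slide.
for entry in ["stdyDscr+", "titl", "subTitl*", "distStmt?", "fileDscr*"]:
    name, meaning = describe(entry)
    print(f"{name:10} {meaning}")
```

Read this way, a study description needs at least one `stdyDscr` with exactly one `titl`, while file descriptions (`fileDscr`) are optional and repeatable.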

  6. TEI for content mark-up
     • standard for text mark-up in humanities and social sciences
     • elements of the header under a TEI-conformant DTD: <teiHeader type="text"> or <teiHeader type="corpus">, containing
         <fileDesc> (standard bibliographic reference to the text)
         <encodingDesc>
         <profileDesc>
         <revisionDesc>
     • mandatory:
         <teiHeader type="text">
           <fileDesc>
             <titleStmt> <!-- ... --> </titleStmt>
             <publicationStmt> <!-- ... --> </publicationStmt>
             <sourceDesc> <!-- ... --> </sourceDesc>
           </fileDesc>
           <!-- remainder of TEI Header here -->
         </teiHeader>
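A minimal sketch of how the mandatory part of the header could be checked programmatically, using only the Python standard library. The function name `missing_header_parts` and all sample content are invented for illustration; a real workflow would more likely validate against the DTD itself.

```python
import xml.etree.ElementTree as ET

# Mandatory children of <fileDesc>, as listed on the slide.
REQUIRED_IN_FILEDESC = ("titleStmt", "publicationStmt", "sourceDesc")

def missing_header_parts(header_xml: str) -> list:
    """Return the names of mandatory header elements that are absent."""
    root = ET.fromstring(header_xml.strip())
    if root.tag != "teiHeader":
        return ["teiHeader"]
    file_desc = root.find("fileDesc")
    if file_desc is None:
        return ["fileDesc"]
    return [name for name in REQUIRED_IN_FILEDESC if file_desc.find(name) is None]

# Placeholder content: the title and statements below are invented for this example.
SAMPLE = """
<teiHeader type="text">
  <fileDesc>
    <titleStmt><title>Placeholder interview title</title></titleStmt>
    <publicationStmt><p>Placeholder publication statement</p></publicationStmt>
    <sourceDesc><p>Placeholder source description</p></sourceDesc>
  </fileDesc>
</teiHeader>
"""

# An empty list means every mandatory element was found.
print(missing_header_parts(SAMPLE))
```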

  7. Excerpt from interview transcript

  8. Excerpt with XML mark-up
     <u n="31">
       …
       <s n="44"> My father was, in the daytime he was a boilermaker on the old <name type="organisation">North <add place="supralinear">Staffordshire</add> <del type="word change">Circular</del> Railway</name> and then every night he played in the theatre orchestra. </s>
       <s n="45"> And sometimes <add place="supralinear">even</add> after the theatre he would go on and play for an hour or two at a dance, well they called them balls in those days. </s>
       <s n="46">And he <add place="supralinear">'d to go to</add> <del>had got to be at</del> work at six the next morning! <note place="end of paragraph">Cornet player.</note> </s>
     </u>
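To show what this mark-up makes possible, here is a small sketch that reads two of the sentences above and recovers the corrected reading (additions kept, deletions and notes left out) together with the tagged names. It uses only the Python standard library; the helper names and the choice to skip `<note>` content are assumptions made for this example, not ESDS Qualidata practice.

```python
import xml.etree.ElementTree as ET

# Two sentences copied from the excerpt; the elided material is left out.
EXCERPT = """<u n="31">
<s n="44"> My father was, in the daytime he was a boilermaker on the old <name type="organisation">North <add place="supralinear">Staffordshire</add> <del type="word change">Circular</del> Railway</name> and then every night he played in the theatre orchestra. </s>
<s n="46">And he <add place="supralinear">'d to go to</add> <del>had got to be at</del> work at six the next morning! <note place="end of paragraph">Cornet player.</note> </s>
</u>"""

# Elements whose content is dropped from the corrected reading.
SKIP = {"del", "note"}

def _raw(elem):
    """Concatenate text inside elem, omitting the content of SKIP elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in SKIP:
            parts.append(_raw(child))
        parts.append(child.tail or "")
    return "".join(parts)

def reading_text(elem):
    """Corrected reading of elem with whitespace normalised."""
    return " ".join(_raw(elem).split())

root = ET.fromstring(EXCERPT)
names = [reading_text(n) for n in root.iter("name")]

for s in root.iter("s"):
    print(s.get("n"), reading_text(s))
print(names)
```

Because the corrections are tagged rather than applied silently, the same file can yield either the original or the corrected reading, and the `<name>` elements give a direct route to name and place searching.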

  9. Four components of a TEI DTD
     • core tag set: available to all TEI docs
     • base tag set: transcription of speech
         <!ENTITY % TEI.spoken 'INCLUDE' >
     • additional tag sets: optional
       • linking
       • analysis
       • certainty and responsibility
       • transcription
       • names and dates
       • corpora
     • entity tag sets: not needed

  10. Issues this DTD will resolve
     • multiple speakers
     • turn taking
     • researcher annotations of transcripts
     • thematic coding (as well as is possible with XML)
     • name and place references
     • compatibility with existing XML-enabled qualitative data analysis software (e.g. Atlas.ti output)
     • as always, formatting elements handled with style sheets, not in the DTD

  11. Much work remains…
     • further integration of the required DDI and TEI elements
     • define the DTD for an individual case (e.g. a transcript), for a collection, or for both?
     • elements selected: not too many, not too few; assign each as mandatory or optional
     • how elements are used: follow existing norms, set a standard where necessary
     • need the DDI specialist interest group/DDI structural reform group to help define and refine a suitable DTD

  12. Selected elements from Atlas.ti for codes (themes) and pointers
     <codes size="52">
       <code name="A Formula" id="co_5" au="Thomas M" cDate="2003-03-04T14:30:57" mDate="2003-03-07T13:19:42" cCount="0" qCount="1"> </code>
     <q name="And the name of the star is ca.." id="q1_1" au="Admin" cDate="1991-03-11T13:27:48" mDate="1993-10-08T21:45:00" loc="5 @ 27, 98 @ 27"/>
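As a hedged illustration of how such output could be consumed, the sketch below reads the `<code>` element with the Python standard library. The slide shows fragments only, so the `<export>` wrapper and the closing `</codes>` tag are added here purely to make the sample well-formed, and the meaning attached to each attribute (`au` as author, `cDate` as creation date, `qCount` as number of quotations) is inferred from the names rather than taken from Atlas.ti documentation.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# <export> and </codes> are hypothetical additions so the fragments parse.
SAMPLE = """<export>
  <codes size="52">
    <code name="A Formula" id="co_5" au="Thomas M" cDate="2003-03-04T14:30:57" mDate="2003-03-07T13:19:42" cCount="0" qCount="1"></code>
  </codes>
  <q name="And the name of the star is ca.." id="q1_1" au="Admin" cDate="1991-03-11T13:27:48" mDate="1993-10-08T21:45:00" loc="5 @ 27, 98 @ 27"/>
</export>"""

def read_codes(root):
    """Collect basic details for every <code> element under root."""
    return [
        {
            "id": code.get("id"),
            "name": code.get("name"),
            "author": code.get("au"),  # assumed meaning of 'au'
            "created": datetime.fromisoformat(code.get("cDate")),
            "quotations": int(code.get("qCount")),  # assumed meaning of 'qCount'
        }
        for code in root.iter("code")
    ]

root = ET.fromstring(SAMPLE)
codes = read_codes(root)
quotation_ids = [q.get("id") for q in root.iter("q")]

print(codes)
print(quotation_ids)
```

The `loc` attribute is left uninterpreted because the slide does not say how the position is encoded.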

  13. Need for publishing tools
     • once the DTD is more developed, the next step is to develop publishing tools that automate as much of the mark-up as possible
     • currently using simple scripts to find and mark <u> and <s>; much work is still done manually
     • looking into options for automatic mark-up of some components (e.g. natural language processing and information extraction):
       • customising existing NLP tools at Essex and Edinburgh
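The slide does not show the scripts in use, so the following is only a sketch of what a simple script for marking `<u>` and `<s>` might look like. It assumes each speaker turn arrives as one plain-text string and that a sentence ends at `.`, `!` or `?` followed by whitespace; the function name `mark_up` and its parameters are hypothetical. Sentence numbers run continuously across utterances, as in the transcript excerpt.

```python
import re
from xml.sax.saxutils import escape

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def mark_up(turns, first_u=1, first_s=1):
    """Wrap each turn in <u> and each sentence in <s>, numbering both."""
    lines = []
    s_n = first_s
    for u_n, turn in enumerate(turns, start=first_u):
        lines.append(f'<u n="{u_n}">')
        for sentence in SENTENCE_END.split(turn.strip()):
            if sentence:
                # escape() protects &, < and > in the transcript text.
                lines.append(f'  <s n="{s_n}">{escape(sentence)}</s>')
                s_n += 1
        lines.append("</u>")
    return "\n".join(lines)

print(mark_up(["He played in the orchestra. Then he went to work!"]))
```

A rule this simple will split wrongly on abbreviations such as "e.g.", which is one reason manual checking, or the NLP tools mentioned above, remains necessary.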

  14. Collaborators
     • Oxford Computer Centre (TEI)
     • NLP team at Sheffield
     • NLP team at Essex
     • NLP team at Edinburgh
     • Atlas.ti developers (Berlin)
     • Cardiff Ethnography Group
     • E-social science programme text mining groups
     • academics in the UK who wish to use the standard
     • FSD
     • US and rest of world?
     • DDI, IASSIST, CESSDA

  15. Selected references
     • ESDS Qualidata Online web site, www.esds.ac.uk/qualidata/online/
     • Barker, E. and Corti, L. (2002) “Enhancing access to qualitative data: Edwardians On-line.” ASLIB Journal, Assignation, 20, pp. 40-43.
     • Carmichael, P. (2002) “Extensible mark-up language and qualitative data.” FQS 3(2), http://www.qualitative-research.net/fqs-texte/2-02/2-02carmichael-e.htm
     • DeRose, S. (1999) “XML and the TEI.” Computers and the Humanities, 33, pp. 11-30.
     • Kuula, A. (2000) “Making qualitative data fit the ‘Data Documentation Initiative’ or vice versa?” FQS 1(3), www.qualitative-research.net/fqs-texte/3-00/3-00kuula-e.htm
     • Muhr, T. (2000) “Increasing the reusability of qualitative data with XML.” FQS 1(3), www.qualitative-research.net/fqs-texte/3-00/3-00muhr-e.htm#g42
     • Muller, E. et al. “Using XML for long-term preservation.” http://edoc.hu-berlin.de/etd2003/hansson-peter/HTML/
     • Sperberg-McQueen, C.M. and Burnard, L. (eds.) (2002) TEI P4: Guidelines for Electronic Text Encoding and Interchange. Text Encoding Initiative Consortium. (XML Version: Oxford, Providence, Charlottesville, Bergen)

  16. For more information
     • ESDS Qualidata: www.esds.ac.uk/qualidata/introduction.asp
     • ESDS Qualidata Online: www.esds.ac.uk/qualidata/online/
