Enhancing Dutch Regional Dictionaries: D-Square Digital Tools and Methodologies
This paper explores the D-square project aimed at digitizing and enhancing Dutch regional dictionaries. It covers the project phases, conversion to new formats, and end-user access. Key topics include macrostructure and microstructure of the dictionaries, agricultural and craft terminologies, as well as phonetic variations within entries. The methodology emphasizes an XML encoding tailored for dialect dictionaries, promoting flexibility and standards adherence. Examples illustrate the technical specs, focusing on enriched data management, user accessibility, and the conceptual linking challenges faced in the project.
Presentation Transcript
D-square: Digital Databases and Tools for Dutch Regional Dictionaries • Folkert de Vriend • Methods XII, Moncton, Canada, 2005
Topics • Introduction dictionaries • Overview of project phases • Conversion • End user access
Macrostructure WBD & WLD • Agricultural terminology • Other technical or craft terminologies • Common vocabulary
Microstructure WBD & WLD Linguistic information in entries: • Phonetic (vol. I&II) or lexical (vol. III) variant • Heteronym (functions as headword) • Lexical meaning (word, or description of a concept) Non-linguistic information in entries: • Place name
Overview phases D-square • Conversion to a new format • End user access to data • Enrichment of data • Data management
Data flow (diagram; labels only) • Raw data: questionnaires from Nijmegen and Leuven and (chiefly) Meertens; filing cards • Source formats for (parts of) Vol. I+II and Vol. III: MacWrite, MS-Word, FileM Pro • Edited data (Vol. I+II, Vol. III) in XML • Enriched data in XML • Products: website WBD/WLD with tools for searching and cartography; specialized print editions (dialect atlas or local dictionary); online DB WBD (Polderland); SGV on CD (Polderland)
Reasoning behind new encoding • XML, not relational database • Tailored to WBD and WLD • Flexible enough to be used for other dialect dictionaries • Standards: ISO TC 37/SC 4 DCR and LMF. (DCR = flat concept registries)
Example XML-encoding
<LEXICON>
  <ENTRY>
    <CONCEPT ontol_ID="492">
      <WORD lang="Dutch">Meikever</WORD>
      <ILLUSTRATION uri="meikever.jpg"/>
      <DESCRIPTION></DESCRIPTION>
    </CONCEPT>
    <VARIANTS>
      <VARIANT type="heteronym">Bakkertje
        <VARIANT type="lexical">Bakkerke
          <VARIANT type="raw" import="diplomatic" source1="N83">bakkərkə
            <LOCATION>K178</LOCATION>
          </VARIANT>
        </VARIANT>
      </VARIANT>
    </VARIANTS>
  </ENTRY>
  …
</LEXICON>
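The nesting in this entry (heteronym, then lexical variant, then raw attestation with its location) can be read back with a short recursive walk. The following is a minimal sketch, not part of the D-square tooling: it uses Python's standard `xml.etree.ElementTree`, the sample is the slide's entry written as well-formed XML, and the helper name `walk_variants` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The entry from the slide, written as well-formed XML
# (straight quotes, every VARIANT element closed).
SAMPLE = """<LEXICON>
  <ENTRY>
    <CONCEPT ontol_ID="492">
      <WORD lang="Dutch">Meikever</WORD>
      <ILLUSTRATION uri="meikever.jpg"/>
      <DESCRIPTION></DESCRIPTION>
    </CONCEPT>
    <VARIANTS>
      <VARIANT type="heteronym">Bakkertje
        <VARIANT type="lexical">Bakkerke
          <VARIANT type="raw" import="diplomatic" source1="N83">bakkərkə
            <LOCATION>K178</LOCATION>
          </VARIANT>
        </VARIANT>
      </VARIANT>
    </VARIANTS>
  </ENTRY>
</LEXICON>"""


def walk_variants(node, depth=0):
    """Yield (depth, type, form, locations) for each VARIANT below node.

    Hypothetical helper: the form is the text that precedes the first
    child element, and locations are the direct LOCATION children.
    """
    for variant in node.findall("VARIANT"):
        form = (variant.text or "").strip()
        locations = [loc.text for loc in variant.findall("LOCATION")]
        yield depth, variant.get("type"), form, locations
        # Descend into nested variants (heteronym > lexical > raw).
        yield from walk_variants(variant, depth + 1)


root = ET.fromstring(SAMPLE)
entry = root.find("ENTRY")
concept = entry.find("CONCEPT")
rows = list(walk_variants(entry.find("VARIANTS")))

print(concept.get("ontol_ID"), concept.findtext("WORD"))
for depth, vtype, form, locations in rows:
    print("  " * depth + f"{vtype}: {form} {locations}")
```

Because the walk only follows `VARIANT` children, the same function also works on entries whose variants are not nested.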
XML-encoding for WALD
<LEXICON>
  <ENTRY>
    <CONCEPT ontol_ID="492">
      <WORD lang="Achterhoeks">Hals</WORD>
      <DESCRIPTION></DESCRIPTION>
    </CONCEPT>
    <VARIANTS>
      <VARIANT type="heteronym">Hals
        <LOCATION>Gor</LOCATION>
        <LOCATION>Harf</LOCATION>
        <LOCATION>Alf</LOCATION>
        <LOCATION>Eef</LOCATION>
        <LOCATION>...</LOCATION>
      </VARIANT>
      <VARIANT type="heteronym">Kael
        <LOCATION>Gor</LOCATION>
        <LOCATION>Eef</LOCATION>
        <LOCATION>Wich</LOCATION>
        <LOCATION>Gen</LOCATION>
        <LOCATION>...</LOCATION>
      </VARIANT>
    </VARIANTS>
  </ENTRY>
</LEXICON>
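Since each variant carries its own list of locations, the encoding can be inverted into a place-to-forms index, which is the kind of lookup a mapping tool would need. This is only an illustrative sketch under assumptions: the function name `forms_by_location` is hypothetical, the elided `...` locations from the slide are left out of the sample rather than filled in, and nothing here describes how the D-square website actually does it.

```python
import xml.etree.ElementTree as ET

# WALD entry from the slide; the elided "..." LOCATION elements are omitted.
WALD_SAMPLE = """<LEXICON>
  <ENTRY>
    <CONCEPT ontol_ID="492">
      <WORD lang="Achterhoeks">Hals</WORD>
      <DESCRIPTION></DESCRIPTION>
    </CONCEPT>
    <VARIANTS>
      <VARIANT type="heteronym">Hals
        <LOCATION>Gor</LOCATION>
        <LOCATION>Harf</LOCATION>
        <LOCATION>Alf</LOCATION>
        <LOCATION>Eef</LOCATION>
      </VARIANT>
      <VARIANT type="heteronym">Kael
        <LOCATION>Gor</LOCATION>
        <LOCATION>Eef</LOCATION>
        <LOCATION>Wich</LOCATION>
        <LOCATION>Gen</LOCATION>
      </VARIANT>
    </VARIANTS>
  </ENTRY>
</LEXICON>"""


def forms_by_location(entry):
    """Map each location code to the variant forms attested there.

    Hypothetical helper: visits every VARIANT at any depth and credits
    its form to each of its direct LOCATION children.
    """
    index = {}
    for variant in entry.iter("VARIANT"):
        form = (variant.text or "").strip()
        for loc in variant.findall("LOCATION"):
            index.setdefault(loc.text, []).append(form)
    return index


entry = ET.fromstring(WALD_SAMPLE).find("ENTRY")
index = forms_by_location(entry)
for place, forms in index.items():
    print(place, forms)
```

In this sample, places such as Gor and Eef end up with both heteronyms, while Harf and Wich have one each.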
XML-encoding for WZD
<LEXICON>
  <ENTRY>
    <CONCEPT ontol_ID="292">
      <WORD lang="Dutch">de haard</WORD>
      <DESCRIPTION></DESCRIPTION>
    </CONCEPT>
    <VARIANTS>
      <VARIANT type="raw">'aerd (d'n)
        <LOCATION>Z.eil</LOCATION>
        <LOCATION>Z.V.W.</LOCATION>
        <LOCATION>L.v.Ax</LOCATION>
        <LOCATION>Hlt</LOCATION>
        <LOCATION>...</LOCATION>
      </VARIANT>
      <VARIANT type="raw">êêrd
        <LOCATION>Z.V.W.</LOCATION>
      </VARIANT>
    </VARIANTS>
  </ENTRY>
</LEXICON>
The trouble with linking • At the conceptual level • At the level of the variants
Information about D-square www.ru.nl/dialect/d2