
Karlheinz Mörth 1, Stephan Procházka 2, Ines Dallaji 2

Laying the Foundations for a Diachronic Dictionary of Tunis Arabic: A First Glance at an Evolving New Language Resource. Karlheinz Mörth 1, Stephan Procházka 2, Ines Dallaji 2. 1 Institute of Corpus Linguistics and Text Technology (Austrian Academy of Sciences)




Presentation Transcript


  1. Laying the Foundations for a Diachronic Dictionary of Tunis Arabic: A First Glance at an Evolving New Language Resource. Karlheinz Mörth 1, Stephan Procházka 2, Ines Dallaji 2. 1 Institute of Corpus Linguistics and Text Technology (Austrian Academy of Sciences), 2 Department of Oriental Studies (University of Vienna). karlheinz.moerth@oeaw.ac.at, stephan.prochazka@univie.ac.at, ines.dallaji@univie.ac.at

  2. Introduction: Two projects • Vienna Corpus of Arabic Varieties (VICAV) • Linguistic Dynamics in the Greater Tunis Area: A Corpus-based Approach (TUNICO) • Text technology + Linguistics

  3. Introduction: VICAV ==> Vienna Corpus of Arabic Varieties • Digital language resources for a wide range of spoken Arabic varieties: dictionaries, corpora, bibliographies, language profiles, best practices • Cooperation between the University of Vienna and the Austrian Academy of Sciences • http://corpus3.aac.oeaw.ac.at/vicav2/

  4. Introduction: VICAV

  5. Introduction: VICAV

  6. Introduction: VICAV

  7. Introduction: TUNICO ==> Linguistic Dynamics in the Greater Tunis Area: A Corpus-based Approach • Funded by the Austrian Science Fund (FWF, P 25706-G23) • Main objectives: exploration of spoken, contemporary Arabic • Two digital language resources: a corpus of spoken youth language and a dictionary of Tunis Arabic

  8. Arabic dialect lexicography • No comprehensive dictionary of the Arabic dialect of Tunis • Basis for diachronic research: • Nicolas, A. (1911). Dictionnaire français-arabe • Beaussier, M. (2006). Dictionnaire pratique arabe-français (arabe maghrébin) • Quéméneur, J. (1961). “Notes sur quelques vocables du parler Tunisien” • Quéméneur, J. (1962). “Glossaire de dialectal” • Abdellatif, K. (2010). Dictionnaire «le Karmous» du Tunisien • Marçais, W., Guîga, A. (1958-61). Textes arabes de Takroûna. II: Glossaire

  9. Dictionary of Tunis Arabic - micro-diachronic and machine-readable - up-to-date and easily accessible lexical information - incorporation of: a) contemporary data from a digital corpus b) various historical sources (e.g. Stumme, H.) - every piece of added information is kept traceable to its origin - basis: data taken from didactic materials - 3 other main sources: newly created corpus, interviews and historical publications

  10. Dictionary of Tunis Arabic: Contemporary sources 1) Corpus of spoken youth language (dialogues, narratives): an uncommon approach in Arabic dialectology, which has traditionally been interested in the language of older people --> only older forms of particular varieties are known; the focus here is on modern language, contemporary usage and lexical neologisms 2) Additional interviews to complete the data gained from the corpus and the historical sources

  11. Dictionary of Tunis Arabic: Historical sources - 800-page grammar of the Medina of Tunis by Hans-Rudolf Singer (1984): evaluation of data, integration of excerpted lexicographic data into the dictionary - Verification and completion of the collected data with other historical resources - The diachronic dimension helps to understand processes in the development of the lexicon - The material gathered will allow analysis of recent developments (migration of parents from rural areas, influence of other Arabic varieties, influence of the revolution, foreign elements)

  12. Dictionary of Tunis Arabic

  13. Dictionary of Tunis Arabic: Technical issues • Modelling the data • Interoperability • TEI P5

  14. Dictionary of Tunis Arabic: Technical issues • Using the TEI dictionary module to encode digitised print dictionaries is a fairly common procedure in the digital humanities. • The TEI dictionary module needs to be further constrained: • to enhance interoperability • to reduce alternative constructs • to achieve a high degree of compliance with LMF (ISO 24613) • Such constraints are easy to impose when creating born-digital dictionaries.
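What "further constrained" could mean in practice can be pictured as a small check that rejects alternative constructs. The sketch below is illustrative only: the slides do not spell out the project's actual rules, so the two constraints used here (exactly one lemma form per entry, a closed set of `cit` types) and the function name `check_entry` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical constraint, chosen for illustration only:
# the slides do not list the cit types the project allows.
ALLOWED_CIT_TYPES = {"translation", "example"}


def check_entry(entry):
    """Return a list of human-readable constraint violations for one <entry>."""
    problems = []
    # Assumed rule 1: every entry carries exactly one lemma form.
    lemmas = [f for f in entry.findall("form") if f.get("type") == "lemma"]
    if len(lemmas) != 1:
        problems.append(f"expected exactly one lemma form, found {len(lemmas)}")
    # Assumed rule 2: every form has an <orth> child.
    for form in entry.findall("form"):
        if form.find("orth") is None:
            problems.append("form without <orth>")
    # Assumed rule 3: cit types come from a closed list.
    for cit in entry.iter("cit"):
        if cit.get("type") not in ALLOWED_CIT_TYPES:
            problems.append(f"unexpected cit type: {cit.get('type')!r}")
    return problems


GOOD = ET.fromstring(
    '<entry id="ktaab_001">'
    '<form type="lemma"><orth>ktāb</orth></form>'
    '<sense><cit type="translation" lang="en"><quote>book</quote></cit></sense>'
    '</entry>'
)
BAD = ET.fromstring(
    '<entry id="x_001">'
    '<form type="inflected"><orth>ktub</orth></form>'
    '<sense><cit type="gloss"><quote>books</quote></cit></sense>'
    '</entry>'
)

for name, entry in (("GOOD", GOOD), ("BAD", BAD)):
    print(name, check_entry(entry))
```

In a real project such rules would more plausibly live in a customised TEI schema than in ad hoc code; the Python version only makes the idea concrete.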

  15. Dictionary of Tunis Arabic: Basic schema

      <TEI>
        <teiHeader> ... </teiHeader>
        <text>
          <body>
            <div type="entries">
              <entry>...</entry>
              <entry>...</entry>
              <entry>...</entry>
              ... ... ...
            </div>
          </body>
        </text>
      </TEI>

  16. Dictionary of Tunis Arabic: Basic schema

      <body>
        <div type="entries">
          <entry>...</entry>
          <entry>...</entry>
          <entry>...</entry>
          ... ... ...
        </div>
        <div type="examples">
          <cit type="example">...</cit>
          <cit type="example">...</cit>
          <cit type="example">...</cit>
          ... ... ...
        </div>
      </body>
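The slide keeps entries and examples in two separate divisions but does not show how they are connected. The sketch below assumes one possible linking mechanism: a `<ptr type="example" target="#..."/>` inside the entry's sense pointing at the `id` of a `<cit type="example">`. The pointer element, the identifiers and the helper `examples_for` are all assumptions made for illustration, and the example text itself is left elided.

```python
import xml.etree.ElementTree as ET

# Hypothetical body: the <ptr> element and both id values are assumptions.
BODY = """
<body>
  <div type="entries">
    <entry id="ktaab_001">
      <form type="lemma"><orth>ktāb</orth></form>
      <sense><ptr type="example" target="#ex_001"/></sense>
    </entry>
  </div>
  <div type="examples">
    <cit type="example" id="ex_001"><quote>...</quote></cit>
    <cit type="example" id="ex_002"><quote>...</quote></cit>
  </div>
</body>
"""


def examples_for(body, entry_id):
    """Return the example <cit> elements referenced by the given entry."""
    # Index every example by its id so pointers can be resolved directly.
    index = {c.get("id"): c for c in body.findall("div[@type='examples']/cit")}
    entry = next(e for e in body.iter("entry") if e.get("id") == entry_id)
    targets = [
        p.get("target").lstrip("#")
        for p in entry.iter("ptr")
        if p.get("type") == "example"
    ]
    # Silently skip pointers whose target does not exist.
    return [index[t] for t in targets if t in index]


body = ET.fromstring(BODY)
found = examples_for(body, "ktaab_001")
print([c.get("id") for c in found])
```

Under this assumption, an example is stored once and could be referenced from several entries, which may be one motivation for keeping the two divisions apart.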

  17. Dictionary of Tunis Arabic: Basic schema

      <entry id="ktaab_001">
        <form type="lemma">
          <orth lang="ar-aeb-x-tunis-vicav">ktāb</orth></form>
        <form type="inflected" ana="#n_pl">
          <orth lang="ar-aeb-x-tunis-vicav">ktub</orth></form>
        <gramGrp>
          <gram type="pos">noun</gram>
          <gram type="root" lang="ar-aeb-x-tunis-vicav">ktb</gram>
        </gramGrp>
        <sense>
          <cit type="translation" lang="en">
            <quote>book</quote></cit>
          <cit type="translation" lang="de">
            <quote>Buch</quote></cit>
          <cit type="translation" lang="fr">
            <quote>livre</quote></cit>
        </sense>
      </entry>
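To show what "machine-readable" buys in practice, here is a minimal sketch that reads the entry from this slide with Python's standard `xml.etree.ElementTree` and collects its forms, grammatical information and translations. The function name `summarise` and the shape of the returned dictionary are assumptions; the attribute names are reproduced as they appear in the transcript.

```python
import xml.etree.ElementTree as ET

# The entry from the slide, with attribute spacing restored.
ENTRY_XML = """
<entry id="ktaab_001">
  <form type="lemma"><orth lang="ar-aeb-x-tunis-vicav">ktāb</orth></form>
  <form type="inflected" ana="#n_pl"><orth lang="ar-aeb-x-tunis-vicav">ktub</orth></form>
  <gramGrp>
    <gram type="pos">noun</gram>
    <gram type="root" lang="ar-aeb-x-tunis-vicav">ktb</gram>
  </gramGrp>
  <sense>
    <cit type="translation" lang="en"><quote>book</quote></cit>
    <cit type="translation" lang="de"><quote>Buch</quote></cit>
    <cit type="translation" lang="fr"><quote>livre</quote></cit>
  </sense>
</entry>
"""


def summarise(entry):
    """Flatten one <entry> into a plain dictionary (hypothetical helper)."""
    # Map each form type (lemma, inflected) to its orthographic form.
    forms = {f.get("type"): f.findtext("orth") for f in entry.findall("form")}
    # Map each gram type (pos, root) to its value.
    grams = {g.get("type"): g.text for g in entry.findall("gramGrp/gram")}
    # Map each target language to its translation equivalent.
    translations = {
        c.get("lang"): c.findtext("quote")
        for c in entry.findall("sense/cit[@type='translation']")
    }
    return {
        "id": entry.get("id"),
        "lemma": forms.get("lemma"),
        "inflected": forms.get("inflected"),
        "pos": grams.get("pos"),
        "root": grams.get("root"),
        "translations": translations,
    }


summary = summarise(ET.fromstring(ENTRY_XML))
print(summary)
```

Because every piece of information sits in a typed element, the same few path expressions work for any entry that follows this schema.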

  18. Dictionary of Tunis Arabic: Representing diachrony

      …
      <bibl>
        <author>Ritt-Benmimoun</author>
        <date>2014</date>
      </bibl>
      …
      <bibl>
        <author>Singer</author>
        <date>1958</date>
        <biblScope unit="page">56</biblScope>
      </bibl>
      …
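Since each piece of information carries its source as a `<bibl>` with a date, a chronological view can be derived mechanically. The following sketch assumes the two `<bibl>` elements from the slide sit inside some common parent; the wrapper element `<fragment>` and the helper `attestations` are hypothetical, because the slide elides the surrounding markup.

```python
import xml.etree.ElementTree as ET

# <fragment> is a hypothetical wrapper; the slide elides the real parent.
FRAGMENT = """
<fragment>
  <bibl><author>Ritt-Benmimoun</author><date>2014</date></bibl>
  <bibl>
    <author>Singer</author>
    <date>1958</date>
    <biblScope unit="page">56</biblScope>
  </bibl>
</fragment>
"""


def attestations(node):
    """Return (year, author, page) tuples sorted from oldest to newest."""
    rows = []
    for bibl in node.iter("bibl"):
        year = int(bibl.findtext("date"))
        author = bibl.findtext("author")
        # page is None when the source gives no page reference.
        page = bibl.findtext("biblScope[@unit='page']")
        rows.append((year, author, page))
    # Sort on the year only, so a missing page never takes part in comparison.
    return sorted(rows, key=lambda row: row[0])


timeline = attestations(ET.fromstring(FRAGMENT))
for year, author, page in timeline:
    print(year, author, page)
```

The first tuple then gives the earliest source recorded for the item, which is the kind of question a micro-diachronic dictionary is meant to answer.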

  19. Dictionary of Tunis Arabic: Tools • Viennese Lexicographic Editor (VLE) • XML editor providing functionalities typically needed in compiling lexicographic data • Web-based standalone application • Designed to process standards-based lexicographic and terminological data such as LMF, TBX, RDF or TEI • Automating procedures • Freely configurable visualisation (via XSLT) • Validation: MSXML Schema • Client-server architecture (PHP + MySQL) • Freely available and easy to set up

  20. Dictionary of Tunis Arabic: Tools

  21. Dictionary of Tunis Arabic: Tools • Corpus – Dictionary interface

  22. Dictionary of Tunis Arabic: Tools • corpus_shell ... a modular framework of reusable software components to access and publish heterogeneous and distributed language resources such as language corpora, dictionaries, encyclopaedic databases, prosopographic databases, bibliographies, metadata, and schemata. • Language Resources Portal • clarin.oeaw.ac.at/ccv/corpus_shell • clarin.oeaw.ac.at/ccv/

  23. Dictionary of Tunis Arabic: Status and outlook • CLARIN-ERIC (Common Language Resources and Technology Infrastructure) • Open access and open source • ~5000 entries

  24. Thank you for your attention! شكراً لانتباهكم! Karlheinz Mörth 1, Stephan Procházka 2, Ines Dallaji 2. 1 Institute of Corpus Linguistics and Text Technology (Austrian Academy of Sciences), 2 Department of Oriental Studies (University of Vienna). karlheinz.moerth@oeaw.ac.at, stephan.prochazka@univie.ac.at, ines.dallaji@univie.ac.at
