
Digitization of toponymic data at the Institute of the Estonian Language

Digitization of toponymic data at the Institute of the Estonian Language. Peeter Päll, 9th Baltic Division meeting, Jūrmala, October 2005. Overview: existing collections in Estonia; two types of collections (systematic databases; archived materials); digitization options; ETOP, the digital archive.



Presentation Transcript


  1. Digitization of toponymic data at the Institute of the Estonian Language. Peeter Päll, 9th Baltic Division meeting, Jūrmala, October 2005

  2. Overview • Existing collections in Estonia • Two types of collections (systematic databases; archived materials) • Digitization options • ETOP, the digital archive • Data structure, encoding problems • Named features identification • Perspectives

  3. History of names collections • Various initiatives in the 19th century (1888 M. J. Eisen, 1895 J. Jung, 1901 F. Kuhlbars) • Systematic collection since the 1920s by the Mother Tongue Society • Scholarship-funded collecting and field expeditions by researchers and students (1930s–1990s)

  4. Current collections • Place Names Archive at the Institute of the Estonian Language (EKI): over 500,000 cards (also integrates the collections of the Mother Tongue Society) • Endel Varep’s archive at EKI (ca. 95,000 A5 pages) • M. J. Eisen’s collections at the National Museum (dispersed; copy in Helsinki in 3 vol.) • Collections at various museums and institutes • Main Estonian-Swedish collections at Uppsala

  5. Place Names Archive at EKI
      • Collections since the 1920s, arranged by parishes; general alphabetic index
      • Data on cards:
          • name in standard spelling and pronunciation
          • locative case forms (external or internal)
          • references to other connected names
          • short description of the named feature (maps usually not included)
          • background information, examples of name usage
          • village, parish, informant, collector

  6. Place name card (1930’s)

  7. Alphabetic index card

  8. Place name card (1960’s)

  9. Place name “maps”

  10. Problems • Parish collections are uneven in the number of cards and in their quality • Usually no map references are given • Using the collection for research requires a lot of time

  11. Digital resources • Place Names Database at EKI (KNAB): 90,000 records (244,000 names) • Toponymic Database for historic Võrumaa by the Institute of Võru: ca. 10,000 records • National Place Names Register: test version only • Databases of private publishers (e.g. Regio road map index contains ca. 11,000 names)

  12. KNAB (www.eki.ee/knab/knab.htm)

  13. Võrumaa (www.ekk.ee/avka/)

  14. Two types of collections
      • 1. Archives
          • unsystematic
          • uneven
          • excessive data
      • 2. Systematic databases
          • processed data, based on various sources
          • standard quality criteria
          • structured data, even coverage

  15. Digitization of names archives
      • Aims:
          • to protect against destruction of data
          • to give better access to data
          • to enable further processing
      • Models:
          • Finland (partial type-in of card information)
          • Sweden (scanning + digital headwords)
          • Norway (type-in and scanning)

  16. ETOP, the digital archive of EKI • Data format: SGML-text • Character format: Unicode-enabled • Uniformity with KNAB, the systematic place names database • Recognition of international standards on digital gazetteers (Alexandria Digital Library, ADL Gazetteer Protocol) • Availability through the Internet (both for entering and accessing the data)

  17. Data fields
      • <nimi>name [label] {comment}</nimi>
      • <var>name variants [] {}; …</var> (includes pronunciation in standardized transcription + the original notation)
      • <koh>use of locative forms</koh>
      • <vrd>references to other names</vrd>
      • <sel>description of the feature</sel>
      • <lkood>feature type</lkood> (new information)
      • <txt>background, examples, text</txt>
      • <all>name source (informant)</all>
      • <khk>parish</khk>
      • <lähik>village, collection point</lähik>
      • <aut>collecting person, owner of collection, typist</aut>
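
The field layout on slide 17 can be illustrated with a small reader. This is only a sketch under stated assumptions: the slides name the tags and the `name [label] {comment}` pattern but do not show a real ETOP record or any ETOP software, so the sample record, the function names `parse_record` and `parse_name`, and the assumption that every field is a simple non-nested tag pair are all hypothetical.

```python
import re

# Field tags exactly as listed on slide 17.
FIELD_TAGS = ["nimi", "var", "koh", "vrd", "sel", "lkood",
              "txt", "all", "khk", "lähik", "aut"]

# Matches <tag>body</tag>; [^\W\d]+ accepts Unicode letters such as "ä".
FIELD_RE = re.compile(r"<(?P<tag>[^\W\d]+)>(?P<body>.*?)</(?P=tag)>", re.DOTALL)

# Matches the "name [label] {comment}" pattern; label and comment are optional.
NAME_RE = re.compile(
    r"^\s*(?P<name>[^\[\{]+?)\s*"
    r"(?:\[(?P<label>[^\]]*)\])?\s*"
    r"(?:\{(?P<comment>[^}]*)\})?\s*$"
)


def parse_record(text):
    """Collect the body of every known field; a tag may occur more than once."""
    record = {}
    for match in FIELD_RE.finditer(text):
        tag = match.group("tag")
        if tag in FIELD_TAGS:
            record.setdefault(tag, []).append(match.group("body").strip())
    return record


def parse_name(value):
    """Split 'name [label] {comment}' into its three parts."""
    match = NAME_RE.match(value)
    if match is None:
        return {"name": value.strip(), "label": None, "comment": None}
    return {key: match.group(key) for key in ("name", "label", "comment")}


# Invented sample record, for illustration only (not taken from the archive).
SAMPLE = """<nimi>Mustjärv [lake] {spelling as on card}</nimi>
<khk>Rõuge</khk>
<lähik>Example village</lähik>
<sel>small lake in the forest</sel>"""

record = parse_record(SAMPLE)
name_parts = parse_name(record["nimi"][0])
```

Storing each field as a list keeps repeated tags from overwriting one another, which fits the stated principle of including all content of the original card.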

  18. Principles of digitization • inclusion of all the content of the original • maximum preservation of original structure • all additions and corrections are clearly marked • no interpretations, but comments are allowed

  19. Phases of digitization • All new cards are processed first; they are no longer copied into the alphabetic index file • Other materials (not in card format) obtained from other sources are integrated where possible • New contributions may be submitted digitally via the web • Entry of the main collections starts with sample parishes • After the main collection is entered, it is compared with the alphabetic index, which may subsequently be discarded

  20. Perspectives • Estimated duration of digitization: 10–20 years • Possible additions to data: geocoding • Possible further processing: linking records of identical features • Research continues…
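
As a rough illustration of "linking records of identical features", records could first be grouped into candidate sets by parish and normalized name. The slides do not describe how linking would actually be done, so this is a naive sketch: the record layout (`id`, `khk`, `nimi` keys), the function names, and the matching rule are assumptions, and the sample data is invented.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Unify Unicode composition and letter case so spelling variants compare equal."""
    return unicodedata.normalize("NFC", name).casefold().strip()


def link_candidates(records):
    """Group record ids sharing the same parish and normalized name.

    Only groups with more than one record are returned; these are candidates
    for manual review, not confirmed identical features.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_name(rec["khk"]), normalize_name(rec["nimi"]))
        groups[key].append(rec["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Invented records; the second uses upper case and decomposed characters.
records = [
    {"id": 1, "khk": "Rõuge", "nimi": "Mustjärv"},
    {"id": 2, "khk": "Rõuge", "nimi": unicodedata.normalize("NFD", "MUSTJÄRV")},
    {"id": 3, "khk": "Rõuge", "nimi": "Valgjärv"},
]
candidates = link_candidates(records)
```

Exact matching on name and parish will both miss true links (dialect variants, respellings) and suggest false ones (two different features with the same name in one parish), which is one reason geocoding is a natural complement.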
