Learn about data management techniques like creating a catalog, metadocumentation, data versions, transfers, and sharing. Understand metadata types and character encoding to ensure proper organization and preservation of language-related materials.
Data management (part 2). LingDy, February 14, 2012, TUFS, Tokyo. David Nathan, Endangered Languages Archive, Hans Rausing Endangered Languages Project, SOAS, University of London
Also (for Part 2) • creating a catalogue/inventory/index • metadocumentation • data/file versions • transferring data • sharing data • backup • character encoding
Different types of metadata • there are many types of metadata • different types of materials may have different metadata • e.g. metadata for photos and videos may include technical parameters and lists of the people who appear • e.g. metadata for transcriptions may include date, version, who transcribed, and notes on progress
Your collection catalogue • first, define your collection/corpus/project as some coherent (logical) set of materials • your collection catalogue/inventory/index is a type of metadata • this should list and describe all files in your collection • it usually contains the categories of information that are relevant for many files
Your collection catalogue • you could have one large catalogue that covers every file, or • you could have a catalogue that is subdivided according to types of files, and/or groups of resources • there is no “one size fits all” solution!
Making an “active” catalogue • this is not necessary, but may be useful • if you use a spreadsheet, you can embed links to actual files to make using your collection easier
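As a rough illustration of an "active" catalogue, the sketch below writes one CSV row per file and adds a spreadsheet link to each. The function name `build_catalogue` and the column names are hypothetical, chosen only for this example. `HYPERLINK` is a formula recognised by Excel and LibreOffice Calc, but whether a relative link resolves depends on the program and on where the catalogue is saved, so treat this as a starting point rather than a finished tool.

```python
import csv
from pathlib import Path

# Hypothetical column names; adapt them to the categories your collection needs.
FIELDS = ["file", "type", "size_bytes", "link", "description"]


def build_catalogue(collection_dir, catalogue_path):
    """Write one CSV row per file in collection_dir and return the rows."""
    collection = Path(collection_dir)
    rows = []
    for path in sorted(collection.rglob("*")):
        if not path.is_file():
            continue  # skip folders; only files are catalogued
        relative = path.relative_to(collection).as_posix()
        rows.append({
            "file": relative,
            "type": path.suffix.lower().lstrip("."),
            "size_bytes": path.stat().st_size,
            # Spreadsheet formula that becomes a clickable link when opened
            "link": f'=HYPERLINK("{relative}", "open")',
            "description": "",  # to be filled in by hand
        })
    with open(catalogue_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Saving the catalogue outside the collection folder (or excluding it by name) keeps it from listing itself on later runs.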
Metadocumentation • you should keep an up-to-date description of the methods, conventions and abbreviations you use • so that somebody else could fully understand (and use) your data and methods in your absence • example
Data/file versions • whether you need to distinguish or keep versions depends on your purposes • you can distinguish versions by adding a suffix to the filename, e.g. • fugu1.txt, fugu2.txt, or • fugu_1.txt, fugu_2.txt • which of the above methods is better?
Data/file versions • fugu_14022013.txt • fugu_20130214.txt • 14022013_fugu.txt • 20130214_fugu.txt • which of the above methods would be best? • note: do not rely on system dates!
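One way to compare the date-stamped naming patterns is to see how each sorts in an ordinary file listing. The sketch below is only an illustration: the helper `dated_filename` and the three dates are made up for the example. It shows that a year-month-day stamp sorts chronologically as plain text, while a day-month-year stamp does not.

```python
from datetime import date


def dated_filename(stem, version_date, extension="txt"):
    """Return a name such as fugu_20130214.txt (stem, then year-month-day)."""
    return f"{stem}_{version_date:%Y%m%d}.{extension}"


# Hypothetical version dates, deliberately out of order
versions = [date(2013, 2, 14), date(2012, 12, 31), date(2013, 1, 5)]

# Alphabetical order, as a file browser would show it
year_first = sorted(dated_filename("fugu", d) for d in versions)
day_first = sorted(f"fugu_{d:%d%m%Y}.txt" for d in versions)

print(year_first)  # oldest to newest
print(day_first)   # the oldest file (31 December 2012) ends up last
```

Because the date is written into the name explicitly, it also survives copying and transfer, which system dates may not.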
Data/file versions • do you need to keep every version? • often, it is fine to keep the “original” plus the current version • if information is regularly updated or corrected, you can keep one filename and put dates in the document itself, or record dates in a catalogue/metadata file • a series of files may have inherent value, e.g. your transcriptions/annotations, as your understanding and analysis changes, so • date and keep files • use different tiers in ELAN?
Transferring data • ensure your computer is not a “walled garden” • you can use • drives/devices (but avoid DVDs!!) • email • upload (where available) • send links • “cloud” e.g. Dropbox • issues include cost, potential viruses and assuring the integrity of copies, but these are generally minor problems
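A common way to check the integrity of a copy after a transfer is to compare checksums of the original and the copy. The following is a minimal sketch using SHA-256 from Python's standard library; the function names `sha256_of` and `copies_match` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large audio or video files fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True if both files have identical content, judged by checksum."""
    return sha256_of(original) == sha256_of(copy)
```

The checksum can also be recorded in the collection catalogue, so that a copy can be verified later even when the original is not at hand.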
Sharing • can we work in a shared, collaborative space? • Dropbox • Google Docs • blogs, Tumblr etc can have shared “authors”, and contributors with controlled roles
Character encoding • if your document contains any characters other than those on a US keyboard, use a Unicode (UTF) character encoding such as UTF-8 • how can I tell if characters in my MS Word document are encoded as UTF-8? • save as plain text and check the encoding options • copy into a plain text editor such as Notepad++
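Once a document has been saved as plain text, a short script can also test whether the file decodes as UTF-8 and list the characters that fall outside basic ASCII. This is a sketch only, and the function names are hypothetical. Note that a file containing only ASCII characters also passes the check, and that decoding without error shows the bytes are valid UTF-8, not that every character is the one you intended.

```python
def is_valid_utf8(path):
    """Return True if the file's bytes can be decoded as UTF-8."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_ascii_characters(text):
    """Return the distinct non-ASCII characters in text, sorted by code point."""
    return sorted({ch for ch in text if ord(ch) > 127})
```

For example, `non_ascii_characters("ŋa ɛ ŋ")` returns `['ŋ', 'ɛ']`, which can then be looked up in a tool such as BabelMap.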
Character encoding • useful tools • Notepad++ http://notepad-plus-plus.org/ • SIL ViewGlyph http://scripts.sil.org/cms/scripts/page.php?item_id=ViewGlyph_home • BabelMap http://www.babelstone.co.uk/software/babelmap.html • ExSite9 http://www.intersect.org.au/exsite9
Your projects • discuss in groups • what are the problems or weaknesses in our “data management plan” or data management methods?