
Tools for Language Documentation

Week 1: Overview. Claire Bowern, Yale University. LSA Summer Institute, 2013.




  1. Week 1: Overview Tools for Language Documentation Claire Bowern Yale University LSA Summer Institute: 2013

  2. Overview, Goals of Class

  3. Tools for documentation • Physical tools: • Hardware • Software • Stimuli • Conceptual tools: • What makes a good documentary corpus • Procedural tools: • How to go about documenting a language • Tools for disseminating results

  4. Overview • Week 1: overview, hardware, software • Week 2: elicitation techniques, grammar writing • Week 3: narratives, conversation, corpus building • Week 4: lexicon, archiving

  5. About the class • “How to describe/document a language” • *No practical component* (in that we won’t be working with speakers) • However, there will be time (I hope!) to talk about your own field data • And we will be doing some exercises with existing data • I will provide datasets for exercises (if you don’t have data of your own to use) • You can also use data from the field methods class here at the Institute.

  6. A few assumptions for this class • Not talking about community-oriented materials here (I see documentary materials as feeding into that though) • Assuming that the language doesn’t have a lot of other materials apart from what the linguist will be producing • Assuming that the linguist will be the one doing most of the writing. • Implicitly assuming a grammar/dictionary/texts model (more on this below). • None of these assumptions is crucial; they’re just there so we can limit the topic a bit.

  7. Principles of documentation

  8. What is language documentation? • Documentary Linguistics as its own subfield. • Doing things with linguistic data: • Getting the data • Preserving it • Processing it • (Analyzing it) • Cf. Woodbury (2002): Language documentation is the creation, annotation, preservation, and dissemination of transparent records of a language. • Important for both theoretical and empirical branches of linguistics: • typology, historical linguistics, etc.

  9. What shapes the language record? • The linguist (i.e. you!) • Their interests • Their abilities • The speakers and their interests! • External circumstances • funding • time available • lucky breaks • unlucky breaks

  10. Language Documentation as a Language Legacy • Particularly relevant for endangered languages. • Your work might be the only substantive record of a language: • few speakers • field might view the language as “done” • speakers might view the language as “done”

  11. Planned Documentation vs “Collect it all” • “making a record of the language” : ‘comprehensive grammar’ • You can’t collect everything. • All documentation is sampling. • Unstructured, unanalyzed corpora usually aren’t very useful • They are hard to use; • They don’t get worked on; • They usually aren’t big enough to test hypotheses computationally; • They require native speakers (or people who are already very familiar with the language). That is fine for languages with a major presence, but what about the quarter of the world’s languages with fewer than 10,000 speakers?

  12. What counts as documentation? • When is a collection big enough to count as language documentation? • Is an article in Linguistic Inquiry language documentation? • creation • annotation • preservation • dissemination • but only a very small fragment of a language.

  13. How much time/space does a documentary corpus take? • Depends on the resources: • Time • Speakers • Money • Levels of Interest

  14. Grammar, Dictionary, Texts • “The Boasian Trilogy” • Structure, Lexicon, Culture • Way to present the analysis and also allow others to recreate it (or challenge it) from the underlying data. • Conceived broadly: • Capture language structure • Capture language in use • Capture lexicon and meaning

  15. Sampling: Documentation as snapshots • A big part of documentation is constructing a good set of “samples”. • To do that, you will need to consider what the purpose of the documentary record is. That is, why are you collecting data on the language? • “to make a lasting record of the language” • “to reclaim the language for future speakers” • “to write a reference grammar” • “to document the culture in the traditional language” • “to investigate a particular aspect of the language” • all of the above… • …

  16. Sampling • Are your “snapshots” representative? • Speakers • Subjects/Topics • Grammatical constructions • Lexicon • …
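One rough way to check whether the snapshots are representative is to tally sessions by speaker, topic, and genre and flag anything that is thinly covered. The sketch below assumes a hypothetical session log; the field names (`speaker`, `topic`, `genre`), the sample records, and the 30% threshold are all invented for illustration and are not from the slides.

```python
from collections import Counter

# Hypothetical session log; every value here is invented for illustration.
SESSIONS = [
    {"speaker": "A", "topic": "fishing", "genre": "narrative"},
    {"speaker": "A", "topic": "kinship", "genre": "elicitation"},
    {"speaker": "B", "topic": "fishing", "genre": "conversation"},
    {"speaker": "A", "topic": "fishing", "genre": "narrative"},
]


def coverage(sessions, field):
    """Count how many sessions fall under each value of `field`."""
    return Counter(s[field] for s in sessions)


def underrepresented(sessions, field, min_share=0.3):
    """Return values of `field` whose share of sessions is below `min_share`."""
    counts = coverage(sessions, field)
    total = sum(counts.values())
    return sorted(v for v, n in counts.items() if n / total < min_share)


if __name__ == "__main__":
    for name in ("speaker", "topic", "genre"):
        print(name, dict(coverage(SESSIONS, name)), underrepresented(SESSIONS, name))
```

A tally like this only shows what has been recorded so far; deciding which gaps matter still depends on the purpose of the documentation.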

  17. Planned versus opportunistic collection • Planned: • translated sentences. • grammaticality judgments • etc. • Unplanned (or planning gone wrong): • Speakers reinterpret your prompts and construct narratives from them. • New speaker comes to a session and wants to tell stories. • You find a new (to you) morpheme in your data and want to find out how it works. • You overhear a new construction in conversation.

  18. What constitutes a documentary corpus? • ***Everything*** • sound files • videos • transcripts • (elicitation prompts – part of the annotation) • photographs • maps • (artifacts) • metadata (data about the data) • metametadata • …
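Since metadata is part of the corpus, it helps to record it in a consistent, machine-readable form alongside each recording. The following is a minimal sketch of a per-session metadata record; the class name `SessionRecord`, all of its fields, and the example file names are assumptions made for illustration, not a standard schema or anything prescribed in the slides.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class SessionRecord:
    """Metadata for one recording session (all field names are assumptions)."""
    session_id: str
    date: str                      # ISO 8601, e.g. "2013-07-01"
    speakers: list
    linguist: str
    audio_file: str
    transcript_file: str = ""      # empty until the session is transcribed
    prompts: list = field(default_factory=list)  # elicitation prompts used
    notes: str = ""


def to_json(record):
    """Serialize a record to pretty-printed JSON, keeping non-ASCII characters."""
    return json.dumps(asdict(record), ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Hypothetical example values.
    rec = SessionRecord(
        session_id="s001",
        date="2013-07-01",
        speakers=["Speaker A"],
        linguist="Linguist X",
        audio_file="s001.wav",
        prompts=["numerals 1-10"],
    )
    print(to_json(rec))
```

Plain JSON is used here only because it is easy to read and archive; a project would normally follow whatever metadata format its archive requires.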

  19. Workflow and data types

  20. Workflow: • What do you need to do to document a language? • What order do you need to do it in? • (How will you know if it’s been done right?)

  21. Scaled workflow • Project as a whole (timescale of years) • e.g. “Bardi language documentation” • Immediate tasks (timescale of weeks or months) • e.g. “Bardi learner’s guide” • Subtasks (timescale of days or weeks) • e.g. “write the section on numbers” • Data gathering (timescale of single session) • e.g. “get data on numerals in use”

  22. Workflow while on fieldwork

  23. Hardware

  24. Sample field kit: • Equipment: • Laptop • Audio recorder • Video recorder • + microphones • + backup means of recording (e.g. from laptop, second recorder) • Media: • backup devices [hard drive, DVDs, etc] • memory cards for recorders • paper! pens! • Other • ways of keeping the equipment clean • carry bag • stills camera (cell phone, ipad, etc) • batteries, other power equipment • tripod • Stimuli/research prompts
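Backup devices are only useful if the copies on them are intact. As a hedged sketch of one way to check this, the code below copies a file to a backup folder and compares SHA-256 checksums of the original and the copy. The function names `sha256_of` and `backup_and_verify` and the example paths are hypothetical; only the Python standard library is used.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy `source` into `backup_dir` and confirm the copy is byte-identical."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source.name} does not match the original")
    return target


if __name__ == "__main__":
    # Hypothetical paths; replace with a real recording and backup drive.
    print(backup_and_verify("session001.wav", "backup_drive/audio"))
```

Storing the checksum with the session metadata would also allow the same check to be repeated later, for example after files are moved to an archive.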

  25. Audio • The field has converged on solid state recorders using SD cards • Zoom Handy H2 or H4 (or H6 coming soon!) • Edirol R-09 • Marantz PMD 660 or 670 • And/or laptops • (or laptop plus external sound card/preprocessor) • Desirable features: • small/portable • AA batteries • high quality, lossless formats • easy to use • easy to transfer data
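Lossless recording has a storage cost that is worth estimating before buying memory cards and backup drives. For uncompressed PCM audio (the usual content of a WAV file), size is seconds × sample rate × bytes per sample × channels. The sketch below works this out; the default settings (44.1 kHz, 16-bit, stereo) and the 32 GB card are assumptions chosen for illustration, not recommendations from the slides.

```python
def pcm_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Size in bytes of uncompressed PCM audio (WAV header not included)."""
    return seconds * sample_rate * (bit_depth // 8) * channels


def hours_per_card(card_gb, **fmt):
    """Approximate hours of audio that fit on a card of `card_gb` GB (10**9 bytes)."""
    return card_gb * 1_000_000_000 / pcm_bytes(3600, **fmt)


if __name__ == "__main__":
    one_hour = pcm_bytes(3600)
    print(f"1 hour at 44.1 kHz/16-bit/stereo: {one_hour / 1e6:.0f} MB")
    print(f"Hours on a 32 GB card: {hours_per_card(32):.1f}")
```

At these settings one hour of audio is 635,040,000 bytes, roughly 635 MB, so a field season of daily sessions quickly reaches tens of gigabytes before any video is counted.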

  26. Not recommended: • Dictaphones • Cassette recorders • DAT

  27. Video • Less consensus on models • Major component of the documentation or side-project? • Options: • smart phone • ipad • stills camera with video function • dedicated video camera • SD card • mic jack • Problems: • mpeg vs other proprietary video formats • large files • memory-intensive

  28. Microphones • headset vs lapel vs meeting microphone • dynamic vs condenser • cardioid vs omnidirectional • wired vs wireless • XLR vs 1/8” jack • The built-in mics in the Edirol, Zoom, etc., are also OK • You get what you pay for, approximately. • Remember that microphone placement and volume monitoring are much more important than the quality of the microphone (far more recordings are ruined through the former than the latter).

  29. Computer • Laptop • Lots of memory • Lots of hard drive space • Usually don’t need ruggedization features • Get the cheapest possible and assume it won’t last for more than a season, or try for a higher-end model • Special considerations for high altitude, high humidity, or low temperature work. • High altitude: hard drives fail; use solid-state drives • High humidity: condensation issues • Low temperatures: battery issues (See Lanz 2010)

  30. Tablets? • Most language software won’t run on ipads or other tablets. • Great for stimuli, backup recorder, camera, etc. • Too much data

  31. Sample field kit: • Equipment: • Laptop • Audio recorder • Video recorder • + microphones • + backup means of recording (e.g. from laptop, second recorder) • Media: • backup devices [hard drive, DVDs, etc] • memory cards for recorders • paper! pens! • Other • ways of keeping the equipment clean • carry bag • stills camera (cell phone, ipad, etc) • batteries, other power equipment • tripod • Stimuli/research prompts
