
From CHILDES to TalkBank

From CHILDES to TalkBank: An International Database of Communicative Interaction. Brian MacWhinney (Carnegie Mellon University, Psychology; Child Language Data Exchange System, CHILDES); Steven Bird and Mark Liberman (University of Pennsylvania, Linguistics; Linguistic Data Consortium, LDC).


Presentation Transcript


  1. From CHILDES to TalkBank An International Database of Communicative Interaction

  2. TalkBank • Brian MacWhinney • Carnegie Mellon University, Psychology • Child Language Data Exchange System CHILDES • Steven Bird, Mark Liberman • University of Pennsylvania, Linguistics • Linguistic Data Consortium, LDC • Howard Wactlar • Carnegie Mellon University, Computer Science • Informedia Project

  3. Basic Premise of TalkBank • Human Communication is a unified fact, • but it is studied by 8 disciplines and up to 40 subdisciplines. • Analysis is important, but so is synthesis. • We can put the puzzle back together by focusing all the disciplines on the data.

  4. Some Examples • “My Theory” • Bettino Craxi • Nixon’s Watergate Tapes • MacWhinney’s Lectures • Ross and Mark • Graphics lesson • Bilingual Classroom

  5. My Theory: An Example Special Issue of Discourse Processes edited by Tim Koschmann with articles from • Rogers Hall • Jay Lemke • Annemarie Palincsar • Carl Frederiksen • Commentary by • Judith Green & Marleen McClelland • Jeremy Roschelle

  6. TalkBank Areas • Classroom Discourse - CMU Dec 99 • Conversation Analysis - Odense Oct • Text and Discourse - Santa Barbara July • Child Language Disorders - Madison 2002 • Language and Gesture - CMU October • Child Language Learning - Madison Aug 2002 • Animal Communication - Penn May 2000

  7. More areas …. • Field Linguistics - LSA Dec 99, Penn Dec 2000 • Aphasia • Corpus Linguistics • Signed Language • Second Language Learning • Anthropological Linguistics • Cross-cultural studies

  8. More areas ... • Multilingualism, code-switching - LIDES • Mother-infant interaction • Psychiatry • Conflict Resolution • Management Styles • Small-group Interaction - soon • Human-computer Interaction

  9. More areas ... • Speech Technology - ongoing • Virtual Reality • Guided Robots, Social Robots

  10. Why data-sharing is important • Increasing the size and reliability of the empirical basis • Opening science to the community, practitioners, and students • Opening science to collaborative commentary • Creating transparency across disciplines

  11. Key Features of TalkBank • Multimodal digitized data • Internet access • Defense of confidentiality • Codon: transcription, coding, viewing, and analysis • XML standard for underlying representation • Alliance of databases from many fields

  12. Why TalkBank can be built now • The Internet • Fast computers, big disks, cheap storage • Good audio and video digitization • Advances in web-based database design • Emergence of annotation standards • Maturation of the social sciences

  13. CHILDES: A Prototype • Brian MacWhinney - CMU • Leonid Spektor - CMU • Catherine Snow - Harvard • 2000 Members • 400 Active contributors

  14. 1850-1950 Darwin and Diaries • Darwin, Stern, Ament • Emotion, gesture, language, the soul • Card files and shoe boxes

  15. 1950-1984 Tapes • Nagras and TEAC, VHS and Beta • Dittos, mimeo, notes in the margins • Good “raw” data, unclear transcription

  16. 1984 - 1994 PCs • CHILDES, Concord, Massachusetts, 1984

  17. 1994 - 2001 childes.psy.cmu.edu

  18. 2000 - ? TalkBank

  19. Universals • Are there basic patterns to babbling? • Are early word orders universal? • Does UG give children a universal set of functional categories? • Is the vocabulary spurt universal? The answer requires LOTS of data

  20. Particulars • Do children have individual styles? • Gestalt vs. Analytic • Enactive (1S) vs. Depictive (3S) • Do children respond differentially to parental recasts? • Do children vary in their match to cue validity? Again, we need LOTS of data

  21. Comparisons • How should we match SLI children to normal controls: by MLU, morphology, or TTR? • How should we compare language socialization processes across social classes? Between cultures? • How should we compare the course of development across languages? The case of Romance.

  22. Three Components • CHAT -- Transcription System • CLAN -- Programs • Database

  23. CHAT Format

      @Begin
      @Participants: CHI Target_Child Sid, MOT Mother
      *MOT: you want them to go in there?
      *CHI: yeah. [+ Q]
      *CHI: yeah. [+ SR]
      *MOT: okay.
      *CHI: okay. [+ I]
      *CHI: look at this.
      %act: CHI picks up piece of paper
      @End
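The CHAT sample above is line-oriented, so it can be read with very little code. What follows is a minimal sketch, not CLAN's actual reader. It assumes only what the sample shows: header lines start with `@`, main speaker tiers with `*`, dependent tiers with `%`, and postcodes appear as `[+ ...]`. The function name `parse_chat` and the dictionary layout are hypothetical, chosen for illustration.

```python
import re

# The transcript from the slide, one tier per line.
SAMPLE = """\
@Begin
@Participants: CHI Target_Child Sid, MOT Mother
*MOT: you want them to go in there?
*CHI: yeah. [+ Q]
*CHI: yeah. [+ SR]
*MOT: okay.
*CHI: okay. [+ I]
*CHI: look at this.
%act: CHI picks up piece of paper
@End
"""

POSTCODE = re.compile(r"\[\+ ([^\]]+)\]")


def parse_chat(text):
    """Split CHAT-style text into headers and utterances (illustrative only)."""
    headers = []
    utterances = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@"):
            # Header tier, e.g. "@Participants: ..." or bare "@Begin".
            name, _, value = line[1:].partition(":")
            headers.append((name.strip(), value.strip()))
        elif line.startswith("*"):
            # Main tier: speaker code, utterance text, optional postcodes.
            speaker, _, rest = line[1:].partition(":")
            utterances.append({
                "speaker": speaker.strip(),
                "text": POSTCODE.sub("", rest).strip(),
                "codes": POSTCODE.findall(rest),
                "tiers": {},
            })
        elif line.startswith("%") and utterances:
            # Dependent tier: attach to the most recent utterance.
            tier, _, value = line[1:].partition(":")
            utterances[-1]["tiers"][tier.strip()] = value.strip()
    return headers, utterances


headers, utterances = parse_chat(SAMPLE)
```

Here the `%act` line ends up attached to the final `*CHI` utterance, which mirrors how a dependent tier annotates the main tier it follows.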

  24. CLAN Programs

  25. String Search • Freq • KWAL • Combo • Gem • GemFreq, GemList
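To give a flavor of what string-search programs of this kind do, here is a rough Python analogue of a word-frequency count and a keyword-and-line search. It is a sketch under stated assumptions, not the CLAN implementation: the function names `freq` and `kwal`, the tokenizer, and the `(speaker, text)` input format are all hypothetical, and CLAN's own options and output format are not reproduced.

```python
import re
from collections import Counter

# Utterances as (speaker, text) pairs; an assumed, simplified input format.
UTTERANCES = [
    ("MOT", "you want them to go in there?"),
    ("CHI", "yeah."),
    ("CHI", "yeah."),
    ("MOT", "okay."),
    ("CHI", "okay."),
    ("CHI", "look at this."),
]


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def freq(utterances, speaker=None):
    """Count word frequencies, optionally restricted to one speaker."""
    counts = Counter()
    for who, text in utterances:
        if speaker is None or who == speaker:
            counts.update(tokenize(text))
    return counts


def kwal(utterances, keyword):
    """Return (line number, speaker, text) for utterances containing keyword."""
    keyword = keyword.lower()
    return [
        (number, who, text)
        for number, (who, text) in enumerate(utterances, start=1)
        if keyword in tokenize(text)
    ]


child_counts = freq(UTTERANCES, speaker="CHI")
okay_lines = kwal(UTTERANCES, "okay")
```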

  26. Indexes • MLU • MLT • WdLen, MaxWd • VOCD • DSS • IPSyn (in progress)
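Two of the simplest indexes can be illustrated in a few lines. The sketch below computes a mean length of utterance and a type-token ratio (TTR). One assumption matters here: MLU is conventionally counted in morphemes, whereas this example counts whitespace-separated words as a stand-in because no morphological analysis is available. The function names `mlu_words` and `ttr` are hypothetical and the code is not taken from CLAN.

```python
def mlu_words(utterances):
    """Mean utterance length in words (a stand-in for morpheme-based MLU)."""
    lengths = [len(u.split()) for u in utterances if u.split()]
    return sum(lengths) / len(lengths) if lengths else 0.0


def ttr(utterances):
    """Type-token ratio: distinct word forms divided by total word tokens."""
    tokens = [word.lower() for u in utterances for word in u.split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Child utterances with punctuation already removed (assumed preprocessing).
child = ["yeah", "yeah", "okay", "look at this"]

child_mlu = mlu_words(child)   # (1 + 1 + 1 + 3) / 4
child_ttr = ttr(child)         # 5 distinct forms over 6 tokens
```

Both measures are sensitive to sample size, which is one reason a large shared database matters for comparisons across children.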

  27. Profiles • Chains • Cooccur • Dist • CHIP • KeyMap • TimeDur

  28. Phonology • MakeMod • ModRep • PhonFreq • UniCode • Inventory (in progress, LIPP, CompProf) • Process Analysis (in progress)

  29. Utilities • Dates • Rely • Lines • SaltIn • Check

  30. The Database • English - 25 corpora • Non-English - 18 languages • Clinical - 14 corpora, aphasia, SLI, Down, autism, Williams, and other groups • Narrative - Frog stories, Red Balloon • Childhood Bilingualism • Adult Second Language Learning

  31. Morphology • MOR • Post, PostTrain -- Christophe Parisse • Parse -- Kenji Sagae • --> revised DSS, LARSP, IPSyn • MinMor for 14 languages • MaxMor for English, Spanish, Italian, Hungarian, Dutch, German

  32. New Technologies • Sonic CHAT • Bullets • QuickTime Movies • Sound editor by wave • Movie editor by dragging • Fast mode editing • Web streaming of audio and video

  33. Sample Topics • Past tense debate • Functional categories, tenseless verbs • Verb frame generalization • Fine-tuning of the input • Theory of mind • Lexical range and communicative context • MLU and vocabulary growth in disorders

  34. Research based on CHILDES • Over 1200 published studies • Syntax • Morphology • Discourse • Lexicon • Narrative, Literacy • Language Impairments • Phonology

  35. Allied Efforts • JCHAT, Chinese, Korean • Dutch, Nordic, Celtic • Romance (Italian, Spanish, Portuguese) • Slavic (Krakow, Vienna) • Bilingualism -- Catalan, Basque • Frogs, Disorders, Code-switching • Classroom discourse

  36. CHILDES/BIB On-Line

  37. Format Babel • Alembic Annotator Archivage CA CHAT COCOSDA CSAE CSLU DAISY DAMSL Delta DRI EAGLES Emu Festival FSA’s GATE HIAT Hyperlex Intex ISIP LDC MATE MICASE MPEG MPI Multitext Observer Partitur Praat SABLE SAMPA SGREP SignStream SIL SLAM SMDL SNACK StandOff SUSANN TalkBank TEI Tipster Transcriber TreeBank TSNLP Unicode UTF

  38. Video Tools • Media Tagger, CLAN, Digital Lava, Informedia ….

  39. The Script

  40. syncWRITER

  41. SignStream


  43. Audio on the Web

  44. Anthropology on the Web • Chagnon’s Yanomamö

  45. Touch and Click for Audio

  46. Pawnee Lexicon

  47. Lexicon -> Cultural Encyclopedia

  48. Cornell Bioacoustics Laboratory

  49. Confidentiality Levels • 1 - fully public • 2 - copying block • 3 - transcripts public, audio/video protected • 4 - non-disclosure • 5 - non-disclosure, no copying • 6 - data-viewing with approval • 7 - data-viewing under direct supervision • 8 - archived only
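The eight confidentiality levels form an ordered scale, which could be represented in software as an enumeration. The following is only a sketch of one possible encoding: the class name `ConfidentialityLevel`, the member names, and the `describe` helper are hypothetical, and nothing beyond the level numbers and labels on the slide is assumed about how access is actually enforced.

```python
from enum import IntEnum


class ConfidentialityLevel(IntEnum):
    """Levels 1-8 from the slide; member names are illustrative."""
    FULLY_PUBLIC = 1
    COPYING_BLOCK = 2
    TRANSCRIPTS_PUBLIC_MEDIA_PROTECTED = 3
    NON_DISCLOSURE = 4
    NON_DISCLOSURE_NO_COPYING = 5
    VIEWING_WITH_APPROVAL = 6
    VIEWING_UNDER_SUPERVISION = 7
    ARCHIVED_ONLY = 8


# Labels copied from the slide text.
LABELS = {
    ConfidentialityLevel.FULLY_PUBLIC: "fully public",
    ConfidentialityLevel.COPYING_BLOCK: "copying block",
    ConfidentialityLevel.TRANSCRIPTS_PUBLIC_MEDIA_PROTECTED:
        "transcripts public, audio/video protected",
    ConfidentialityLevel.NON_DISCLOSURE: "non-disclosure",
    ConfidentialityLevel.NON_DISCLOSURE_NO_COPYING: "non-disclosure, no copying",
    ConfidentialityLevel.VIEWING_WITH_APPROVAL: "data-viewing with approval",
    ConfidentialityLevel.VIEWING_UNDER_SUPERVISION:
        "data-viewing under direct supervision",
    ConfidentialityLevel.ARCHIVED_ONLY: "archived only",
}


def describe(level):
    """Return 'N - label' for a level number or enum member."""
    member = ConfidentialityLevel(level)
    return f"{member.value} - {LABELS[member]}"


summary = describe(3)
```

Using an integer-backed enumeration keeps the ordering of the scale, so levels can be compared directly (for example, level 1 is less restrictive than level 8).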
