Text to speech to text: a third orality?

Lawrie Hunter, Kochi University of Technology. http://www.core.kochi-tech.ac.jp/hunter

Presentation Transcript


  1. Text to speech to text: a third orality? Lawrie Hunter Kochi University of Technology http://www.core.kochi-tech.ac.jp/hunter

  2. Current state: Fragmentation of knowledge as a result of the ongoing creation of research niches A voracious, yet protective and covetous knowledge industry

  3. Current state: Isn’t CALL just a subset of User Experience (UX)?

  4. Hunter (2006)*: Learners are evolving. URGENT: Just-in-time learner sociology. URGENT: Near-instant learner profiling. Upgrade: Learner => USER. User Experience (UX) practice. UZANTO’s MindCanvas: user profiling for a large target group in a matter of hours. RUMM: rapid user mental modelling. GEMS: game emulation. This may be very fruitfully adapted to the foundation explorations leading to CALL decision-making. * The expanding palette: Emergent CALL paradigms (Invited virtual presentation). Antwerp CALL 2006. http://www.core.kochi-tech.ac.jp/hunter/professional/CALLparadigms/index.html

  5. Text-to-speech and speech-to-text (T2S2T) software has now become usable in a practical sense. This blurs the line between speech and text in a very immediate way. http://www.nextuptech.com/ http://www.nuance.com/naturallyspeaking/

  6. Usable T2S2T: No more typing. No more reading. No more hands. Composition by speaking...ooh! Information acquisition by listening...ahh! If we do this, we will be in a new orality.

  7. What?? Audio is lame: VIDEO is the game. We are in the YouTube era. Get a second life!

  8. T2S will be fully usable in 2 (or x) years; we must assume the future and shift our place of work there.

  9. QUESTION: For second language learning systems development, is audio going out?

  10. TODAY: A search for principles governing the use of voice in CALL

  11. Investigation of voice and cognition

  12. Walter Ong, 1982 Orality and Literacy: The Technologizing of the Word PRIMARY ORAL cultures (no system of writing) think differently from CHIROGRAPHIC cultures

  13. Walter Ong, 1982 Orality and Literacy: The Technologizing of the Word: “Electronic media (e.g. telephone, radio and television) brought about a second orality” [paraphrase] “Both primary and secondary oralities afford a strong sense of membership in a group.” [paraphrase]

  14. Walter Ong, 1982 Orality and Literacy: The Technologizing of the Word: “Electronic media (e.g. telephone, radio and television) brought about a second orality” [paraphrase] “Both primary and secondary oralities afford a strong sense of membership in a group.” [paraphrase] BUT Secondary orality is "essentially a more deliberate and self-conscious orality, based permanently on the use of writing and print," and produces much larger groups.

  15. Kathleen Welch* rejects claims that Ong posits a mutually exclusive, competitive, reductive orality-literacy divide. Welch argues that Ong emphasizes -a mingling of these types of consciousness -the tenacity of established forms as new ones appear. * Welch, K. (1999) Electric Rhetoric: Classical Rhetoric, Oralism, and a New Literacy. MIT Press. p. 59

  16. Welch argues that TV's ubiquity has resulted in a new, electronic literacy. We shall not go there today.

  17. Workable T2S2T promises to change the nature of cognitive load constraints in text production/decoding, and hence in language learning tasks.

  18. Workable T2S2T There is now S2T (Dragon Voice) for Indian English*, British English... but not for Japanese English yet. (Ever?) * http://labnol.blogspot.com/2007/01/dragon-naturallyspeaking-9-speech.html

  19. Workable T2S2T There is now S2T (Dragon Voice) for Indian English*, British English... but not for Japanese English yet. (Ever?) * http://labnol.blogspot.com/2007/01/dragon-naturallyspeaking-9-speech.html So the tech is there for computers to decode human speech better than humans can...?

  20. HOWEVER we don’t know much about how orality works. Perhaps that is because orality is so ingrained in us.

  21. Walter Ong, 1982 Orality and Literacy: The Technologizing of the Word. The three stages of consciousness: Primary orality, 200,000 years. Literacy, 2800 years (invention of phonetic alphabet in 8th century BCE*). Secondary orality, 163 years (telegraphy, USA, 1844). * Rhys Carpenter (1933) The antiquity of the Greek alphabet. American Journal of Archaeology 37: 8-29.

  22. WIRED FOR SPEECH* Orality has been part of human life for a long time. After 200,000 years of evolution: “...humans have become voice-activated, with brains that are wired to equate voices with people and to act quickly on that information.” * Nass, C. & S. Brave (2005) Wired for speech. MIT Press.

  23. Writing: ‘a secondary modelling system’ Lotman, J., trans. R. Vroon (1977) The structure of the artistic text. Michigan Slavic Studies, 7. Writing can never exist without orality. p. 8 Speeches that were studied as rhetoric could only be studied if they were transcribed. Ong, W. (1982) Orality and literacy: The technologizing of the word. 1997 reprint: Routledge.

  24. Writing: ‘a secondary modelling system’ Lotman, J., trans. R. Vroon (1977) The structure of the artistic text. Michigan Slavic Studies, 7. “...to this day no concepts have yet been formed for effectively, let alone gracefully, conceiving of oral art as such without reference, conscious or unconscious, to writing.” p.10 Ong, W. (1982) Orality and literacy: The technologizing of the word. 1997 reprint: Routledge.

  25. Psychodynamics of orality “...you know what you can recall.” Ong, W. (1982) Orality and literacy: The technologizing of the word. 1997 reprint: Routledge.

  26. Psychodynamics of orality: Pythagoras and the acousmatics. The term acousmatic dates back to Pythagoras, who is believed to have tutored his students from behind a screen so as not to let his presence distract them from the content of his lectures. wikipedia.org May 20, 2007: edited from Chion, M. (1994) "Audio-Vision: Sound on Screen", Columbia University Press.

  27. Psychodynamics of orality: Pythagoras and the acousmatics. In cinema, acousmatic sound is sound one hears without seeing an originating cause - an invisible sound source. Radio, phonograph and telephone, all of which transmit sounds without showing the source cause, are acousmatic media. wikipedia.org May 20, 2007: edited from Chion, M. (1994) "Audio-Vision: Sound on Screen", Columbia University Press.

  28. Psychodynamics of orality: Acousmatic sound is ubiquitous in CALL. Aren’t there situations where acousmatic sound is appropriate, and situations where it is not?

  29. Orality and writing production. Kellogg: Sentence Production Demands: Verbal Working Memory. “Orthographic as well as phonological representations must be activated for written spelling.” -Bonin, Fayol, & Gombert (1997). “Verbal WM is necessary to maintain representations during grammatical, phonological, and orthographic encoding.” -Levy & Marek (1999) -Chenoweth & Hayes (2001) -Kellogg, Olive, & Piolat (2006). Kellogg, R. (2006) Training writing skills: A cognitive developmental perspective. EARLI SigWriting 2006 Antwerp. http://webhost.ua.ac.be/sigwriting2006/Kellogg_SigWriting2006.pdf

  30. Audio sources in life: John Thackara* tells of Ivan Illich’s finding that in the 1930s, 9 out of 10 words a man heard by age 20 were spoken directly to him. In the 1970s, 9 out of 10 words a man heard by age 20 were spoken through a loudspeaker. Illich (1982): “Computers are doing to communication what fences did to pastures and what cars did to streets.” * Book: In the Bubble. Blog: http://www.doorsofperception.com/

  31. We are innately orate. Human beings “can quickly distinguish one person’s voice from another.” p. 3 * We know these things from differing heartbeat responses. Nass, C. & S. Brave (2005) Wired for speech. MIT Press.

  32. We are innately orate. Human beings “can quickly distinguish one person’s voice from another.” p. 3 -even in the womb we can distinguish our mother’s voice from that of another.* * We know these things from differing heartbeat responses. Nass, C. & S. Brave (2005) Wired for speech. MIT Press.

  33. We are innately orate. Human beings “can quickly distinguish one person’s voice from another.” p. 3 -even in the womb we can distinguish our mother’s voice from that of another.* -a few days after birth, newborns prefer their mother’s voice to that of others, and can distinguish one unfamiliar voice from another.* * We know these things from differing heartbeat responses. Nass, C. & S. Brave (2005) Wired for speech. MIT Press.

  34. We are innately orate. Human beings “can quickly distinguish one person’s voice from another.” p. 3 -even in the womb we can distinguish our mother’s voice from that of another.* -a few days after birth, newborns prefer their mother’s voice to that of others, and can distinguish one unfamiliar voice from another.* -by 8 months of age we can attend to one voice even when another is speaking at the same time. * We know these things from differing heartbeat responses. Nass, C. & S. Brave (2005) Wired for speech. MIT Press.

  35. Humans: experts at extracting social from speech. Word choice carries social information. UX work makes choices such as blaming: 1. “Speak up.” 2. “I’m sorry, I didn’t catch that.” 3. “We seem to have a bad connection. Could you please repeat that?” Nass, C. & S. Brave (2005) Wired for speech. MIT Press.

  36. Humans: experts at extracting social from speech. Word choice carries social information. UX work makes choices such as voice quality: Booming deep voice: “Could I possibly ask you if you wouldn’t mind doing a tiny favor?” High-pitched, soft voice: “Pick up that shovel and start digging!” Nass, C. & S. Brave (2005) Wired for speech. MIT Press.

  37. Humans: automatically react socially to ‘voice’. “...the conscious knowledge that speech can have a non-human origin is not enough for the brain to overcome the historically appropriate activation of social relationships by voice [even when voice quality is low and speech understanding is poor].” Nass, C. & S. Brave (2005) Wired for speech. MIT Press.

  38. Interiority of sound “...in an oral noetic economy, mnemonic serviceability is sine qua non...” p. 70 In other words, oral information must be arranged in a certain way [a visual way] if it is to be remembered. Ong, W. (1982) Orality and literacy: The technologizing of the word. 1997 reprint: Routledge.

  39. Incorporating interiority The eye cannot perceive interiority, only surfaces. Taste and smell are not much help in registering interiority/exteriority. Touch can detect interiority but in the process damages it. Hearing can register interiority without violating it. Sight isolates, sound incorporates. Ong, W. (1982) Orality and literacy: The technologizing of the word. 1997 reprint: Routledge.

  40. Incorporating interiority

  41. Oral memory In primary oral cultures, need for an aide memoire: -heavily rhythmic speech -balanced patterns -epithetic expressions -formulary expressions -standard thematic settings Ong, W. (1982) Orality and literacy: The technologizing of the word. 1997 reprint: Routledge. p. 33

  42. Oral memory In primary oral cultures, thought and expression are additive rather than subordinate. Ong, W. (1982) Orality and literacy: The technologizing of the word. 1997 reprint: Routledge. p. 37 ff.

  43. Tentative observations based on the exploratory hands-on experience of second language users. Samples: Innisfree 1, Innisfree 2, Innisfree 3, Coney Island 1, Coney Island 2, Coney Island 3. PhD technical writing class, KUT, May 24, 2007

  44. Tentative observations based on the exploratory hands-on experience of second language users. PhD technical writing class, KUT, May 24, 2007

  45. Tentative observations based on the exploratory hands-on experience of second language users. PhD technical writing class, KUT, May 24, 2007

  46. Tentative observations based on the exploratory hands-on experience of second language users. Self-reported estimates of comprehension of samples. PhD technical writing class, KUT, May 24, 2007

  47. Tentative observations based on the exploratory hands-on experience of second language users. Self-reported estimates of comprehension of samples. PhD technical writing class, KUT, May 24, 2007

  48. How might language learning support systems be influenced by the new T2S2T technological reality?

  49. Articulation at the phrase level. In the learner’s awareness: S2T software foregrounds articulation; T2S foregrounds intonation, blending, pausing.

  50. Articulation at the phrase level. Can S2T be used to improve pronunciation? Mitra, S., Tooley, J., Inamdar, P. and Dixon, P. (2003) Improving English Pronunciation: An Automated Instructional Approach. Information Technologies and International Development, Volume 1, Number 1, Fall 2003, 75–84. Massachusetts Institute of Technology. http://www.mitpressjournals.org/doi/abs/10.1162/itid.2003.1.1.75
