
IX Language and Computer



Presentation Transcript


  1. IX Language and Computer

  2. Contents 10.0 Introduction 10.1 Computer-assisted language learning 10.2 Machine translation 10.3 Corpus linguistics 10.4 Computer-mediated communication

  3. 10.0 Introduction: Computational linguistics • A branch of applied linguistics dealing with the computer processing of human language (Johnson & Johnson 1999), including: • 1. The analysis of language data so as to establish the order in which learners acquire various grammatical rules, or the frequency of occurrence of some particular item • 2. The electronic production of artificial speech and the automatic recognition of human speech • 3. Research on automatic translation between natural languages • 4. Text processing and communication between people and computers

  4. 10.1 Computer-assisted language learning

  5. 10.1.1 CAI / CAL vs. CALL • CAI (computer-assisted instruction): the use of a computer in a teaching program. • 1. A teaching program which is presented by a computer in a sequence: students respond, and the computer indicates whether each response is correct or not. • 2. The use of a computer to monitor students' progress and offer directions to students.
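The CAI cycle described above (present items in a fixed sequence, judge each response as correct or not, keep a record of progress) can be sketched in Python. This is a minimal illustration, not a description of any particular CAI system; the items, the `Progress` record and the `run_drill` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Progress:
    """Hypothetical record the program keeps to monitor a student's progress."""
    attempted: int = 0
    correct: int = 0
    missed: list = field(default_factory=list)


def run_drill(items, responses):
    """Present items in a fixed sequence and judge each response as correct or not.

    items: list of (prompt, expected_answer) pairs (illustrative data).
    responses: the student's answers, in the same order as the items.
    """
    progress = Progress()
    feedback = []
    for (prompt, expected), answer in zip(items, responses):
        progress.attempted += 1
        # Ignore case and surrounding spaces before comparing.
        if answer.strip().lower() == expected.lower():
            progress.correct += 1
            feedback.append(f"{prompt} -> correct")
        else:
            progress.missed.append(prompt)
            feedback.append(f"{prompt} -> not correct (expected '{expected}')")
    return feedback, progress


items = [
    ("She ___ (go) to school every day.", "goes"),
    ("They ___ (be) late yesterday.", "were"),
]
feedback, progress = run_drill(items, ["goes", "was"])
for line in feedback:
    print(line)
print(f"Score: {progress.correct}/{progress.attempted}; review: {progress.missed}")
```

The `missed` list stands in for the directions such a program might offer, for example sending the student back to review those items.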

  6. CAL (computer-assisted learning): emphasizes the use of computers in both teaching and learning in order to help learners achieve educational objectives through their own reasoning and practice, a reflection of the newly advocated autonomous learning. • 1. Leading students through a learning task step by step, checking comprehension, and providing further practice and materials. • 2. Interaction through the exploration of a subject or problem.

  7. CALL (computer-assisted language learning) • It refers to the use of computers in the teaching or learning of a second or foreign language. • 1. Activities which parallel learning through other media but which use the facilities of the computer. • 2. Activities which are extensions or adaptations of print-based or classroom-based activities. • 3. Activities which are unique to CALL.

  8. 10.1.2 Phases of CALL development • 1. Large mainframe machines in institutions, accessed through a terminal; conventional grammatical explanation and audio-lingualism • 2. Small, portable computers with tapes or floppy disks; eclectic, pragmatic and student-oriented • 3. Cognitive problem-solving techniques and interaction among students in a group: the computer as a trigger • 4. Word processing enables students to compose and carry out their own writing; spoken language and moving video become available

  9. 10.1.3 Technology • Customizing, template, and authoring programs: teachers use the program to design their own lessons to fit their own purposes. • Computer networks: local area networks allow more interaction between teachers and students • Compact disc technology • Digitized sound • USB (Universal Serial Bus)
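As an illustration of what a template or authoring program lets a teacher do, the sketch below turns any passage into a gap-fill (cloze) exercise by blanking words the teacher chooses. It is a hypothetical example, not the behaviour of a specific authoring package; `make_cloze` and the sample passage are invented for illustration.

```python
import re


def make_cloze(passage, targets):
    """Blank out each target word to build a gap-fill exercise.

    passage: the teacher's own text (illustrative).
    targets: words the teacher wants students to supply.
    Returns the exercise text and the answer key in order of appearance.
    """
    target_set = {t.lower() for t in targets}
    answers = []

    def blank(match):
        word = match.group(0)
        if word.lower() in target_set:
            answers.append(word)
            return f"({len(answers)}) ____"
        return word

    # [A-Za-z']+ matches only the words, so punctuation is left in place.
    exercise = re.sub(r"[A-Za-z']+", blank, passage)
    return exercise, answers


exercise, key = make_cloze(
    "Yesterday we went to the market and bought apples.", ["went", "bought"]
)
print(exercise)  # Yesterday we (1) ____ to the market and (2) ____ apples.
print(key)       # ['went', 'bought']
```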

  10. 10.2 Machine translation • The use of machines to translate texts from one natural language to another. • Unassisted MT, which takes pieces of text and translates them into output for immediate use with no human involvement. • Assisted MT, where a human translator cleans up the text after, and sometimes before, translation in order to get better-quality results. • Philosophical and religious concerns • Political concerns • Economic concerns

  11. 10.2.1 History of development • 1. Independent work by MT researchers --Early 1950s. Limitations: hardware, memory, slow access to storage, programming languages, and little assistance from linguistics --Crude dictionary-based approach, statistical methods; low quality, hence human involvement --Both pre-editing and post-editing were required
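The weakness of the crude dictionary-based approach can be seen in a word-for-word sketch: each word is looked up and replaced while the source word order is kept. The four-entry English-French dictionary and the `word_for_word` function are hypothetical and far simpler than any real early system.

```python
# A toy bilingual dictionary (hypothetical, for illustration only).
EN_FR = {"the": "le", "black": "noir", "cat": "chat", "sleeps": "dort"}


def word_for_word(sentence, dictionary):
    """Replace each word by its dictionary equivalent, keeping source word order.

    Unknown words are passed through in angle brackets so a post-editor can spot them.
    """
    output = []
    for word in sentence.lower().split():
        output.append(dictionary.get(word, f"<{word}>"))
    return " ".join(output)


print(word_for_word("The black cat sleeps", EN_FR))
# le noir chat dort
```

French places most colour adjectives after the noun ("le chat noir dort"), so the output has to be reordered by hand; correcting errors of this kind is the work that post-editing took on.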

  12. 2. Towards good-quality output • Improved hardware, the first programming languages, developments in syntactic analysis • Around 1960, good quality was believed to be achievable. • Assumption: the goal of MT must be the development of fully automatic systems producing high-quality translations; human assistance was regarded as an interim arrangement, and post-editing would become less and less necessary. • The emphasis of research was on the search for theories and methods for the achievement of “perfect” translation. • Bar-Hillel: critical of Fully Automatic High Quality Translation (FAHQT); proposed “man-machine symbiosis”

  13. The development of translation tools • Since the 1970s, development has continued in three main strands: • 1. Computer-based tools for translators --1960s: real-time interactive computer environments; 1970s: word processing; 1980s: microcomputers with networking and large storage capacity --Dictionaries and terminological databanks, multilingual word processing, management of glossaries and terminology resources, input and output communication • 2. Operational MT systems involving human assistance in various ways • 3. “Pure” theoretical research towards the improvement of MT methods

  14. 10.2.2 Research methods • 1. Linguistic approach --A test bed for any kind of linguistic theory that attempts to account for language or grammatical rules • 2. The transfer approach • 3. The interlingual approach --An interlingua (a language-independent intermediate representation) between any languages • 4. The knowledge-based approach --Linguistic knowledge independent of context: semantic features --Linguistic knowledge that relates to context: pragmatic knowledge --Common-sense / real-world knowledge (non-linguistic knowledge)
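A frequently cited argument for the interlingual approach concerns the number of components needed when many languages must be translated in every direction. Assuming a transfer system needs one analysis and one generation module per language plus one transfer module per ordered language pair, while an interlingual system needs only the analysis and generation modules, the counts can be computed as follows (the function names are illustrative):

```python
def transfer_modules(n):
    """Components for a transfer system translating in every direction among n languages:
    n analysis + n generation modules, plus one transfer module per ordered pair, n * (n - 1)."""
    return {"analysis": n, "generation": n, "transfer": n * (n - 1)}


def interlingua_modules(n):
    """Components for an interlingual system: every language is analysed into, and
    generated from, the shared interlingua, so no pair-specific modules are needed."""
    return {"analysis": n, "generation": n, "transfer": 0}


for n in (2, 5, 9):
    t = sum(transfer_modules(n).values())
    i = sum(interlingua_modules(n).values())
    print(f"{n} languages: transfer {t} modules, interlingua {i} modules")
# 2 languages: transfer 6 modules, interlingua 4 modules
# 5 languages: transfer 30 modules, interlingua 10 modules
# 9 languages: transfer 90 modules, interlingua 18 modules
```

The transfer count grows quadratically with the number of languages while the interlingual count grows linearly; the price is that the interlingua must be rich enough to represent the meaning of texts in all of them.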

  15. 10.2.3 MT quality: • still poor

  16. 10.2.4 MT and the Internet: • --An accelerating growth of real-time online translation on the Internet itself --A further profound impact of the Internet on MT: stand-alone PCs replaced by networked computers --Fewer “pure” MT systems, but many more computer-based tools and applications in which automatic translation is just one component

  17. 10.2.5 Speech translation: • Small-domain natural language applications

  18. 10.2.6 MT and human translation • They can and will co-exist in relative harmony. • MT: large-scale or rapid translation, repetitive documents, lower cost, cases where the quality of output is less important • Human translators: non-repetitive, linguistically sophisticated texts; one-off texts in highly specialized technical subjects; one-to-one interchange of information; spoken language translation

  19. 10.3 Corpus linguistics

  20. 10.3.1 Definition • Corpus (pl. corpora): a collection of linguistic data, compiled as written texts or as a transcription of recorded speech. The main purpose of a corpus is to verify a hypothesis about language, for example, to determine how the usage of a particular sound, word, or syntactic construction varies. • Corpus linguistics deals with the principles and practice of using corpora in language study. A computer corpus is a large body of machine-readable texts. • --Crystal, David. 1992:85. An Encyclopedic Dictionary of Language and Languages
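Checking how the usage of a word varies usually means comparing corpora of different sizes, so raw counts are normalized, here to a rate per 1,000 tokens. The two one-line "corpora" are invented placeholders, and the simple regular-expression tokenizer is an assumption; real corpus tools tokenize more carefully.

```python
import re
from collections import Counter


def per_thousand(word, text):
    """Frequency of `word` per 1,000 tokens, so corpora of different sizes can be compared."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[word.lower()] * 1000 / len(tokens)


# Two tiny made-up samples standing in for, e.g., a spoken and a written sub-corpus.
spoken = "well I mean it was sort of fine you know it was sort of okay"
written = "The results were broadly satisfactory and the method was sound"

for name, text in (("spoken", spoken), ("written", written)):
    print(name, round(per_thousand("sort", text), 1))
# spoken 133.3
# written 0.0
```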

  21. Another definition • CORPUS (pl. corpora) (1) a collection of texts, esp. if complete and self-contained: the corpus of Anglo-Saxon verse. (2) plural also corpuses. In linguistics and lexicography, a body of texts, utterances or other specimens considered more or less representative of a language, and usu. stored as an electronic database. • Corpus linguistics studies data in any such corpus.

  22. 10.3.2 Criticism and revival of corpus linguistics • Chomsky: empiricism vs. rationalism --Rejected the corpus as a valid source of evidence in linguistic enquiry --The description of rules in a language --Emphasis on competence rather than performance --Practicability --Ungrammatical sentences vs. new sentences

  23. Revival of corpus linguistics • Quirk (1959): Survey of English Usage (SEU) • Jan Svartvik (1975): London-Lund Corpus (SEU and the Brown corpus) • Jan Svartvik: computerized the SEU

  24. 10.3.3 Concordance • Definition: a way of sorting data, for example, alphabetically by the words occurring in the immediate context of a word. --Search for a particular word and retrieve all the examples of it. --This is the tool most often used in corpus linguistics to examine corpora. • Usage: comparing different usages of the same word --Analyzing word frequencies --Finding and analyzing phrases and idioms --Creating indexes and word lists
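A key-word-in-context (KWIC) concordance retrieves every occurrence of a search word with a few words of context on each side and sorts the lines, here alphabetically by the right-hand context. This is a minimal sketch with a made-up sample text; the `kwic` function and its `width` parameter are illustrative, not a particular concordancer's interface.

```python
import re


def kwic(text, node, width=3):
    """Key-word-in-context concordance.

    Returns one (left, node, right) line per occurrence of `node`, where left/right
    hold up to `width` tokens, sorted alphabetically by the right-hand context.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    # Sorting on the right context groups recurring patterns such as "run a".
    return sorted(lines, key=lambda line: line[2].lower())


sample = ("She will run a shop. They run fast every morning. "
          "Rivers run into the sea. He decided to run a marathon.")
for left, node, right in kwic(sample, "run"):
    print(f"{left:>22} [{node}] {right}")
```

Because punctuation is discarded, context can run across sentence boundaries (as in "a shop They"); sorting on the left context instead is a one-line change to the `key` function.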

  25. 10.3.4 Text encoding and annotation • Annotated corpora are corpora which have been enhanced with various types of linguistic information. --Implicit linguistic information is made explicit through the process of concrete annotation. --Example: Claire_NP1 collects_VVZ shoes_NN2. (CLAWS tags: NP1 = singular proper noun, VVZ = -s form of a lexical verb, NN2 = plural common noun)
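Once implicit information is made explicit as tags, the corpus can be queried by grammatical category rather than by word form. The sketch assumes the `word_TAG` format of the example above; `parse_tagged` is a hypothetical helper.

```python
def parse_tagged(text):
    """Split 'word_TAG' tokens into (word, tag) pairs; trailing punctuation is dropped."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if not sep:
            continue  # untagged token: skip it in this simple sketch
        pairs.append((word, tag.rstrip(".,;:!?")))
    return pairs


tagged = "Claire_NP1 collects_VVZ shoes_NN2."
pairs = parse_tagged(tagged)
print(pairs)
# [('Claire', 'NP1'), ('collects', 'VVZ'), ('shoes', 'NN2')]
print([w for w, t in pairs if t == "NN2"])  # plural common nouns
# ['shoes']
```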

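  26. Leech (1993): seven maxims for the annotation of corpora • 1. It should be possible to remove the annotation and recover the raw corpus • 2. It should be possible to extract the annotation by itself from the text • 3. Annotation guidelines should be available to the end user • 4. It should be made clear how, and by whom, the annotation was carried out • 5. Annotation is not infallible, but a potentially useful tool • 6. Annotation should be based on widely agreed and theory-neutral principles • 7. No annotation scheme has a priori authority as a standard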
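Maxims 1 and 2 are easy to satisfy when annotation is kept in a regular, separable format. Assuming the underscore-plus-tag convention of the example sentence on the previous slide (tags made of capital letters and digits), a single regular expression can both strip the annotation and extract it on its own; the function names are illustrative.

```python
import re

# Matches an underscore followed by a tag of capitals and digits (assumed CLAWS-style format).
TAG = re.compile(r"_([A-Z0-9]+)")


def remove_annotation(tagged):
    """Maxim 1: strip the tags to recover the raw text."""
    return TAG.sub("", tagged)


def extract_annotation(tagged):
    """Maxim 2: pull out the annotation on its own, in text order."""
    return TAG.findall(tagged)


tagged = "Claire_NP1 collects_VVZ shoes_NN2."
print(remove_annotation(tagged))   # Claire collects shoes.
print(extract_annotation(tagged))  # ['NP1', 'VVZ', 'NN2']
```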

  27. 10.3.5 Roles of corpus data • Speech research --A wide selection of variables (gender, age, class, etc.), allowing generalization --Variation within a spoken language --A sample of naturalistic speech --Large-scale quantitative study • Lexical studies --Dictionaries --Definitions --Word combinations, co-occurring words
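Studying word combinations usually starts from co-occurrence counts: for each occurrence of a node word, the words within a fixed window on either side are counted. The sample text and the `collocates` helper below are illustrative assumptions.

```python
import re
from collections import Counter


def collocates(text, node, window=2):
    """Count words that co-occur with `node` within `window` tokens on either side."""
    node = node.lower()
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            span = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(span)
    return counts


sample = "strong tea and strong coffee; powerful engines and strong tea again"
counts = collocates(sample, "strong", window=1)
print(counts["tea"], counts["coffee"])  # 2 1
```

Raw counts favour frequent function words such as "and"; collocation studies therefore commonly rank candidates with association measures such as mutual information or the t-score.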

  28. • Semantics --An objective approach to the study of semantics: semantic distinctions are context-related, and corpora make it possible to examine the context --Fuzziness and absoluteness: gradable • Sociolinguistics: natural quantitative data • Psycholinguistics:

  29. 10.4 Computer-mediated communication (CMC) • The study of CMC is characterized by its focus on language and language use in computer-networked environments, and by its use of the methods of discourse analysis to address that focus. • CMC takes a variety of forms whose linguistic properties vary depending on the kind of messaging system used and the social and cultural context embedding particular instances of use. • Mail and news

  30. PowerPoint: an application which enables one to create slide shows on one's computer screen. It is presentation authoring software that creates graphical presentations with or without audio. • PowerPoint as a tool can be used to write outlines or create presentation visuals on the slides. • PowerPoint as a text has been broadly understood as the product created visually, graphically, acoustically, or audio-visually. • PowerPoint as a genre refers to a recurring type of activity, just like a letter, a note, etc.

  31. Blog: mid-1990s • A weblog, or blog for short, is defined by Dan Gillmor as “an online journal comprised of links and postings in reverse chronological order, meaning the most recent posting appears at the top of the page”. • Features of blogs: 1. Post-centric 2. Arranged in reverse chronological order 3. Serial and cumulative, open-ended 4. Brief and independent narratives, some fictional, some frame of the narratives 5. Great variety in quality, content and ambition 6. Free access 7. Personal and informal style 8. Genuine human passion
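The reverse chronological order in the definition amounts to sorting posts by publication date, newest first. A minimal sketch with hypothetical `Post` records and a hypothetical `front_page` function:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    title: str
    published: date


def front_page(posts):
    """Order posts newest first, so the most recent posting appears at the top."""
    return sorted(posts, key=lambda p: p.published, reverse=True)


posts = [
    Post("First entry", date(2024, 1, 5)),
    Post("A follow-up", date(2024, 3, 2)),
    Post("Some thoughts", date(2024, 2, 14)),
]
for post in front_page(posts):
    print(post.published.isoformat(), post.title)
```

Adding a new post simply appends to the list; the page order comes from the sort, which reflects the serial and cumulative character of blogs.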

  32. Chat room --A chat room is an online forum where people can chat with one another. • Emoticons or smileys --Less punctuation; abbreviations and acronyms such as U (you), 4 (for), r (are), brb (be right back) --Short sentences, informal expressions --Emoticons: :) :-) :( :< :-> :c
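The chat-room features listed above (abbreviations and emoticons) can be handled by a small normalizer, for example before counting words in chat data. The abbreviation table holds only the slide's four examples and the emoticon pattern covers only the forms shown; both, and the `normalise` function, are illustrative assumptions rather than a complete inventory.

```python
import re

# Illustrative expansions taken from the slide's examples; a real list would be much longer.
ABBREVIATIONS = {"u": "you", "4": "for", "r": "are", "brb": "be right back"}
# Covers the slide's emoticons :) :-) :( :< :-> :c (and a few close variants).
EMOTICON = re.compile(r":-?[)(<>c]")


def normalise(message):
    """Split off emoticons, then expand chat abbreviations word by word."""
    emoticons = EMOTICON.findall(message)
    text = EMOTICON.sub(" ", message)
    words = [ABBREVIATIONS.get(w.lower(), w) for w in text.split()]
    return " ".join(words), emoticons


print(normalise("brb :) u r 4 real?"))
# ('be right back you are for real?', [':)'])
```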

  33. Summary • CAI, CAL, CALL • MT • Corpus linguistics • Concordance • CMC
