
Kocaeli University, Izmit, June 12, 2014



Presentation Transcript


  1. Kocaeli University, Izmit, June 12, 2014
     Man-Machine Communication
     Introduction: Language Technologies
     Zygmunt Vetulani, Adam Mickiewicz University in Poznań, Dept of Computer Linguistics and Artificial Intelligence
     vetulani@amu.edu.pl

  2. I. PLAN OF THE LECTURE
     Introduction to Language Technologies
     - a historical perspective
     - review of problems, resources and tools

  3. Natural Language; Technology - definitions
     Definition 1. Natural language: "everyday language" spontaneously spoken by some community. It is typically acquired by young members of the community as their mother tongue. It may also be learned by adult speakers, typically from outside this community. Its primary role is to serve communication between humans. Natural language is a result of evolution.
     Definition 2. By a technology we mean an organized set of tools, methods and techniques which constitute the "know-how" used to solve problems, to perform specific functions, or to produce artifacts and/or information.
     Human Language Technologies are closely connected with information processing.

  4. Some history: beginnings
     • Neanderthal man was (probably) able to speak
     • Humans started speaking some 50,000 years ago or earlier
     • Humans invented writing some 7000 (?) years ago (invention of writing = beginning of historical times)
     - Now: most of the roughly 7000 languages spoken on Earth have a writing system

  5. The first language resources (picture: Dispilio Tablet, Greece)
     Dictionaries
     - Egypt: VII c. BC
     - India: Amarakosha, the first lexicon of Sanskrit (V c. BC)
     - Ancient Greece: Homer glossary (V c. BC)
     - Ancient Rome: Onomasticon (II c. BC)
     Grammars
     - India: VI c. BC, Yaska; IV c. BC, Panini (Sanskrit)
     - Ancient Greece: III c. BC
     - Ancient Rome: I c. BC (Latin)
     - Arabic grammar: VIII c.
     (Wiki)

  6. Text reproduction technologies
     • Middle Ages: writing serves the "global" culture of the Christian and Muslim world
     • Renaissance: Gutenberg's revolution (1452-1455), the first serial reproduction of the Bible
     • Movable printing type: known in China since the XI c. (Bi Sheng)
     • XVIII-XIX c.: Industrial Revolution, high-speed printing machines
     (pictures from Wiki)

  7. Recent revolutions
     The human environment has changed over the last 7000 years: virgin natural environment -> rural -> urban -> artificial (saturated with artifacts and technologies)
     - XX c.: computer revolution -> text processing and the Internet
     Public information is accessible to (almost) everybody in the form of text, sound, images, inscriptions, messages, advertising, instructions, ...
     Computers make automatic processing of text (spoken and written) easy and cheap
     -> Computer technologies of natural language appear
     -> Invention of the Internet
     New phenomenon: in the XX century the environment is full of information-bearing artifacts

  8. Present-day revolutions
     XXI c.: technologies of the Information Society epoch
     A new phenomenon of the XX/XXI century: the (information-rich) environment becomes interactive with respect to humans. Some examples:
     • dialogue between (human) users and machines (robots),
     • automatic voice command recognition,
     • virtual reality,
     • sociable (social) robots.
     (Many science-fiction ideas have been implemented or are close to being implemented.)

  9. Some history: the present-day revolution
     - XXI c.: A NEW GLOBALISATION EPOCH
     In the past: the Roman, Christian and Islamic civilizations aspired to be global
     Characteristic features (typical phenomena):
     - common use of a "lingua franca": Latin -> English
     - general knowledge of writing as a result of the industrial revolution (XVIII-XIX c.)
     - common access to the industrial infrastructure (XX c.)
     - common access to information: the Internet (XX/XXI c.)
     Now: information is the driving force of the new globalization
     Globalization needs technological support, in particular in the domain of information (language) technologies.
     TECHNOLOGICAL EXCLUSION PROBLEM for people using "less-resourced languages"

  10. Review of problems, resources and tools
     HIGH LEVEL TECHNOLOGIES (AI)
     Systems with linguistic competence.
     Human-machine NL interfaces: software which organizes communication between the user and the computer system on the basis of natural language, i.e. software equipped with components which emulate human language competence. (cf. Turing, "Computing Machinery and Intelligence", 1950)
     Machine translation (MT): automatic translation performed solely by software, without any human intervention or assistance.
     MT is historically one of the first language technologies.
     - Anticipated by the ideas of René Descartes.
     - Couturat and Leau also noted a lost text by Wilhelm L. Rieger: "Zifferen-grammatik, welche mit Hilfe der Wörterbücher ein mechanisches Übersetzen aus einer Sprache in alle andere ermöglicht" (Prag, 1903). (See also Bernd Spillner, in Übersetzung im Umbruch, 1996, p. 209.)
     - "Translation" memorandum by Warren Weaver, 1949.
     - Georgetown Experiment, 1954 (60 sentences).
     Automatic summarization: automatic generation of a summary (accepting the loss of secondary information).

  11. Review of problems, resources and tools
     MID LEVEL TECHNOLOGIES: NLP
     Parsing: identification of the structure of text units (typically sentences)
     Natural language generation
     Natural language understanding
     Discourse analysis: description of the structure of discourse, spoken or written (structural, statistical)
     Information Processing
     Information retrieval:
     - search (in a corpus of documents) for documents containing the required information
     - search for information about documents (metadata extraction)
     Information extraction: extraction of structured information from (unstructured) text documents
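The information retrieval item on slide 11 (searching a corpus for documents containing the required information) can be illustrated with a minimal sketch. It assumes a tiny in-memory corpus and a plain inverted index with AND semantics. The sample documents and the names `build_index` and `search` are illustrative and do not come from the lecture.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercased word to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?")].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = [w.lower() for w in query.split()]
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return result

# Hypothetical three-document corpus, for illustration only.
documents = {
    "d1": "Machine translation is one of the first language technologies.",
    "d2": "Speech recognition converts speech to text.",
    "d3": "Statistical machine translation uses parallel corpora.",
}
index = build_index(documents)
print(sorted(search(index, "machine translation")))
```

Practical retrieval systems add ranking, stemming or lemmatisation, and stop-word handling on top of this basic lookup.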

  12. Text Processing
     Relationship extraction: extraction of relations (e.g. temporal)
     Named-entity recognition
     Terminology extraction from corpora
     Part-of-speech tagging: tagging words in the text with labels containing information connected with these words
     Text annotation: tagging text with metadata
     Text segmentation: morphemes, words, sentences, ...
     Co-reference processing, anaphora
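Text segmentation into sentences and words, as listed on slide 12, can be sketched with two regular expressions. This is a deliberately naive illustration: it assumes that a sentence ends at ".", "!" or "?" followed by whitespace, so abbreviations such as "e.g." would be split incorrectly. The function names are hypothetical.

```python
import re

def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace (naive rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def split_words(sentence):
    """Return word tokens, keeping internal hyphens and apostrophes."""
    return re.findall(r"\w+(?:[-']\w+)*", sentence)

text = "Humans speak. Machines listen! Do they understand?"
for sentence in split_sentences(text):
    print(split_words(sentence))
```

Segmentation into morphemes generally requires a lexicon or a morphological analyser and is not covered by this sketch.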

  13. Processing of words
     Lemmatisation, morphological analysis
     Word sense disambiguation
     Sentiment analysis: study and detection of positive/negative connotations of text elements
     Optical character recognition (OCR): text as picture -> text file
     T9: fast text entry technology for the 9-key keypad (typical cellular phone)
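The idea behind T9-style text entry on slide 13 can be sketched as a dictionary lookup: each word is encoded as the sequence of keys that carry its letters, and one key sequence may match several words. The sketch assumes the standard phone keypad layout with letters on keys 2 to 9 and a hypothetical six-word vocabulary. It is not the actual T9 implementation, which also ranks the candidates.

```python
from collections import defaultdict

# Letters on a standard phone keypad (keys 2 to 9).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}

def to_digits(word):
    """Encode a word as the sequence of keys that carry its letters."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())

def build_t9_table(vocabulary):
    """Group the vocabulary by key sequence, preserving input order."""
    table = defaultdict(list)
    for word in vocabulary:
        table[to_digits(word)].append(word)
    return table

# Hypothetical vocabulary, for illustration only.
vocabulary = ["home", "good", "gone", "hood", "text", "speech"]
table = build_t9_table(vocabulary)
print(table["4663"])  # all words typed with the keys 4-6-6-3
```

The sequence 4663 is ambiguous between four words of this vocabulary, which is why such systems need a lexicon and some way of ordering the candidates.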

  14. BASIC TECHNOLOGIES
     Corpora
     - speech
     - text
     - dialogue
     Grammars
     Lexicons, thesauri
     Ontologies (wordnets)
     SPEECH PROCESSING
     Speech recognition: recognition of speech captured with a microphone
     Speech-to-text
     Speaker identification
     Speech segmentation (phone identification): segmentation of the voice signal
     Text-to-speech, speech generation
