
GENERAL AND COMPUTATIONAL LINGUISTICS, PART 2

GENERAL AND COMPUTATIONAL LINGUISTICS, PART 2. Lecture 1: What Is Computational Linguistics? Introduction to the Course. COMPUTATIONAL LINGUISTICS.


Presentation Transcript


  1. GENERAL AND COMPUTATIONAL LINGUISTICS, PART 2

    Lecture 1: What Is Computational Linguistics? Introduction to the Course
  2. COMPUTATIONAL LINGUISTICS This second part of LG&C is an introduction to COMPUTATIONAL LINGUISTICS: the study of computational and statistical models of the INTERPRETATION of language. It is normally distinguished from CORPUS LINGUISTICS (the use of computational and statistical models to analyse CORPORA).
  3. THIS LECTURE Summary of the relevant concepts of general linguistics. Interpretation: what are the problems? Applications of computational linguistics. Course plan.
  4. Levels of linguistic analysis: a quick summary
  5. LEVELS OF LINGUISTIC ANALYSIS Phonetics and phonology (“cat” = /k/ + /æ/ + /t/); words; parts of speech; morphology; syntax; semantics; discourse
  6. PARTS OF SPEECH NOUNS (tavolo, Simona); VERBS (camminare, mangiare, colpire); ADJECTIVES (rosso, rapido); ADVERBS (probabilmente, subito); PRONOUNS (io, lui, ci); ARTICLES (il, la, un); PREPOSITIONS (di, a, con); CONJUNCTIONS (e, ma, o); [Italian]: INTERJECTIONS (ahi!)
  7. MORPHOLOGY Words are not ‘atomic’ units: (in Italian at least) they can almost always be decomposed into smaller units, the MORPHEMES. A MORPHEME is “the smallest linguistic unit that has a meaning of its own”.
  8. TWO EXAMPLES REPURIFICARE = RE- (‘repetition’) + PUR- (‘free of contaminants’) + -IFICARE (‘to make’)
  9. WORD STRUCTURE ENGLISH: ROOT + AFFIXES. ROOT (boy); AFFIXES (-s in boy+s). ITALIAN: STEM + AFFIXES. ROOT (ragazz-); STEM (root + thematic vowel, e.g., ragazzo); AFFIXES (-i in ragazz+i).
  10. SYNTAX Words are organized in PHRASES: I put THE BAGELS in the freezer; I put THE BAGELS THAT WE HAD NOT EATEN in the freezer. Phrases are classified according to their main CONSTITUENT, or HEAD. Noun phrases: the bagels; the homeless old man that I tried to help yesterday; Mary; she; one of them. Verb phrases: Mary went to the store and bought a bagel. Adjective phrases: John is tall / very tall / quite certain to succeed. Sentences.
  11. Marking Phrase Constituents
    BRACKETING: [S [NP The children] [VP ate [NP the cake]]]
    TREES:
      S
        NP
          AT the
          NNS children
        VP
          VBD ate
          NP
            AT the
            NN cake
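A labelled bracketing is a linear encoding of a tree, so it can be read back into a nested structure mechanically. The following is a minimal sketch in plain Python; the names `parse_brackets` and `leaves` are illustrative and do not come from NLTK or any other library, and the input is assumed to be well-formed.

```python
def parse_brackets(text):
    """Read "[LABEL child child ...]" into a nested (label, children) tuple."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip the opening "["
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # plain word
                pos += 1
        pos += 1                      # skip the closing "]"
        return (label, children)

    return node()


def leaves(tree):
    """Return the words of a tree, left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


tree = parse_brackets("[S [NP The children] [VP ate [NP the cake]]]")
print(tree[0])       # label of the root constituent
print(leaves(tree))  # the sentence, recovered from the tree
```

The nested tuples carry the same information as the tree drawing: each constituent has a label and an ordered list of children, which are either words or further constituents.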
  12. Syntax: goal Recognize the constituents; recognize a correct structure.
    “(io) Nel mezzo del cammin di nostra vita mi ritrovai per una selva oscura”
    [(io)]NP [Nel mezzo del cammin di nostra vita]PP [mi ritrovai]VP [per una selva oscura]PP
    An Italian sentence with the structure (NP PP VP PP) is correct.
    “[Oscura per mezzo] [nel selva] [del nostra] [mi] [ritrovai] [di cammin vita una]”
    ?? PP ?? NP VP ??
    An Italian sentence with the structure (?? PP ?? NP VP ??) is incorrect.
  13. SEMANTICS Two kinds of semantic knowledge about words: ‘denotational’ knowledge and ‘compositional’ knowledge. Four kinds of theories: referential; cognitive / mentalist; prototype theory; structural / relational.
  14. Denotational knowledge and compositional knowledge DENOTATIONAL knowledge: knowledge about the ‘word in itself’: a HORSE is an ANIMAL with a long mane … (the kind of knowledge typically found in definitions). COMPOSITIONAL knowledge: knowledge about how the word combines with other words.
  15. COMPOSITIONAL KNOWLEDGE From the compositional point of view, at least two distinctions can be made: between PREDICATES and ARGUMENTS, and between FUNCTION words and ‘CONTENT’ words.
  16. PREDICATES AND ARGUMENTS Maria ha noleggiato una macchina (‘Maria rented a car’). PREDICATE: ha noleggiato. ARGUMENTS: Maria, una macchina.
  17. Discourse Anaphora: John arrived late. He always does that. My car didn’t start this morning. There was some problem with the engine fan. Discourse relations: My car didn’t start this morning BECAUSE there was some problem with the engine fan. NLE
  18. Dave Bowman: “Open the pod bay doors, HAL” HAL 9000: “I’m sorry Dave. I’m afraid I can’t do that.”
  19. COMPUTATIONAL LINGUISTICS IN 2014: WHERE WE SHOULD BE ...
    Amer. Good afternoon, Hal. How's everything going?
    Hal. Good afternoon, Mr Amer. Everything is going extremely well.
    Amer. Hal, you have an enormous responsibility on this mission, in many ways perhaps the greatest responsibility of any single mission element. You are the brain and central nervous system of the ship, and your responsibilities include watching over the men in hibernation. Does this ever cause you any lack of confidence?
    Hal. Let me put it this way, Mr Amer. The 9000 series is the most reliable computer ever made. No 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error.
    Amer. Hal, despite your enormous intellect, are you ever frustrated by your dependence on people to carry out actions?
    Hal. Not in the slightest bit. I enjoy working with people. I have a stimulating relationship with Dr Poole and Dr Bowman. My mission responsibilities range over the entire operation of the ship, so I am constantly occupied. I am putting myself to the fullest possible use, which is all, I think, that any conscious entity can ever hope to do. ANLE
  20. … AND WHERE WE ARE In February 2011 the WATSON system developed by IBM won at Jeopardy!, beating two of the best-known past champions http://www.youtube.com/watch?v=otBeCmpEKTs ELN
  21. Models of interpretation in computational linguistics
  22. INTERPRETATION: THE ‘PIPELINE’ MODEL PREPROCESSING, then LEXICAL PROCESSING, then SYNTACTIC PROCESSING, then SEMANTIC PROCESSING, then DISCOURSE PROCESSING
  23. INTERPRETATION: THE PIPELINE MODEL Example input: When did Watson win Jeopardy?
    PREPROCESSING: TOKENIZATION
    LEXICAL PROCESSING: POS TAGGING / WORD SENSE
    SYNTACTIC PROCESSING: IDENTIFY PHRASES
    SEMANTIC PROCESSING: PREDICATE/ARGUMENT
    DISCOURSE: ANAPHORA
  24. THE STRUCTURE OF WATSON
  25. TOKENIZATION
    C’ERA UNA VOLTA UN PEZZO DI LEGNO.
    C’ERA | UNA | VOLTA | UN | PEZZO | DI | LEGNO. |
    C’ | ERA | UNA | VOLTA | UN | PEZZO | DI | LEGNO | . |
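The difference between the two segmentations above is whether elided forms (C’) and punctuation become tokens of their own. A minimal regular-expression sketch of the finer-grained segmentation follows; the pattern and the name `tokenize` are illustrative assumptions, not the tokenizer used in the course, and real Italian text needs more cases (numbers, abbreviations, words such as "po'").

```python
import re

# Three alternatives, tried in order:
#   1. an elided form ending in an apostrophe (C', L', DELL'), straight or curly
#   2. an ordinary word
#   3. a single punctuation character
TOKEN = re.compile(r"\w+['’]|\w+|[^\w\s]")


def tokenize(text):
    """Split text into elided forms, words and punctuation marks."""
    return TOKEN.findall(text)


print(" | ".join(tokenize("C’ERA UNA VOLTA UN PEZZO DI LEGNO.")))
```

Putting the elision pattern first matters: if the plain-word alternative came first, the apostrophe would be split off as a separate punctuation token.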
  26. PARTS OF SPEECH Television/NN has/HVZ yet/RB to/TO work/VB out/RP a/AT living/VBG arrangement/NN with/IN jazz/NN ,/, which/WDT comes/VBZ to/IN the/AT medium/NN more/QL as/CS an/AT uneasy/JJ guest/NN than/CS as/CS a/AT relaxed/VBN member/NN of/IN the/AT family/NN ./.
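The word/TAG notation is easy to process by program. As a small sketch (the helper name `read_tagged` is hypothetical), each item can be split at its last slash, so that punctuation tokens such as ./. are handled correctly, and the tags can then be counted:

```python
from collections import Counter


def read_tagged(text):
    """Turn "word/TAG word/TAG ..." into a list of (word, tag) pairs."""
    pairs = []
    for item in text.split():
        word, _, tag = item.rpartition("/")  # split at the LAST slash
        pairs.append((word, tag))
    return pairs


pairs = read_tagged("a/AT relaxed/VBN member/NN of/IN the/AT family/NN ./.")
tag_counts = Counter(tag for _, tag in pairs)
print(pairs)
print(tag_counts["AT"], tag_counts["NN"])
```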
  27. SYNTACTIC ANALYSIS WITH CONTEXT-FREE GRAMMARS The cat sat on the mat: [S [NP [Det the] [N cat]] [VP [V sat] [PP [Prep on] [NP [Det the] [N mat]]]]] NLP
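The tree for "the cat sat on the mat" uses four rules: S → NP VP, NP → Det N, VP → V PP and PP → Prep NP. Because every rule is binary, a CKY-style chart recognizer is one simple way to check whether a word sequence is licensed by this grammar. The slide does not say which parsing algorithm is intended, so the sketch below is an assumption; the names `BINARY`, `LEXICON` and `recognize` are illustrative.

```python
from itertools import product

# Grammar read off the tree: (left child, right child) -> parent
BINARY = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "PP"): "VP",
    ("Prep", "NP"): "PP",
}
LEXICON = {"the": "Det", "cat": "N", "mat": "N", "sat": "V", "on": "Prep"}


def recognize(words):
    """Return True if the grammar derives the whole word sequence as an S."""
    n = len(words)
    # chart[i][j] holds the categories that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        if word in LEXICON:
            chart[i][i + 1].add(LEXICON[word])
    for width in range(2, n + 1):          # span length
        for i in range(n - width + 1):     # span start
            j = i + width
            for k in range(i + 1, j):      # split point
                for left, right in product(chart[i][k], chart[k][j]):
                    parent = BINARY.get((left, right))
                    if parent:
                        chart[i][j].add(parent)
    return "S" in chart[0][n]


print(recognize("the cat sat on the mat".split()))
print(recognize("cat the sat on mat the".split()))
```

The first call succeeds and the second fails, because no rule combines N followed by Det. A full parser would also store back-pointers in the chart in order to rebuild the tree.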
  28. Processing Steps, IV: Semantic Processing
    John went to the book store. John, store1; go(John, store1)
    John bought a book. buy(John, book1)
    John gave the book to Mary. give(John, book1, Mary)
    Mary put the book on the table. put(Mary, book1, table1)
  29. WHERE IS THE PROBLEM? Noise (typos, ungrammatical language, etc.); ambiguity; the role of common sense
  30. BAD ENGLISH (AND ITALIAN) ON THE WEB
    CHINGLISH: To take notice of safe: The slippery are very crafty (“Take care, slippery”). Note that the level of gap (“Mind the gap”).
    LANGUAGE CHANGE: I brought two apple's. Black is different to white.
    SPAM: Buongiorno sono sempre in attesa delle vostre informazioni affinché possa rapidamente le trasmetta al mio avvocato perché possa rapidamente fare l’analisi della vostra cartella più rapidamente che il possibile. Grazie rapidamente di me gli inviati.
  31. AMBIGUITY IN GRAMMATICAL CLASSIFICATION Many word forms can be associated with different parts of speech: STATO is both a noun (LO STATO ITALIANO, ‘the Italian state’) and a verb (NON SONO STATO IO, ‘it wasn’t me’).
  32. PART-OF-SPEECH AMBIGUITY: LEGGE1 (noun, ‘law’) 1 A rule, issued by the legislative bodies of the State, that establishes the rights and duties of citizens. Legge delega: a law issued by the executive under delegation from the legislature, within a precisely defined scope. Legge ponte: a law issued pending a more comprehensive one. A norma, a termini di legge: according to what the law prescribes. 2 (ext.) The body of rules that make up the legal system of a State: la legge è uguale per tutti. Essere fuori della legge: not to be protected by the law, or not to feel bound by it. Dettar legge: to impose one's will on everyone. 3 Legal science: laurea in legge; dottore in legge; facoltà di legge. Uomo di legge: a specialist in legal science. 4 Judicial authority: ricorrere alla legge. In nome della legge: formula with which representatives of the judicial authority order someone to obey one of its commands: in nome della legge, aprite! 5 (ext.) Any rule that governs the individual or social conduct of people: le leggi della società. 6 (ext.) A fundamental rule of a technique, an art and the like: le leggi della pittura. 7 A fixed and constant relation between the variable quantities involved in a phenomenon: le leggi della matematica, della fisica.
  33. LEGGE2 leggere (‘to read’) v. tr. (pres. io lèggo, tu lèggi; pass. rem. io lèssi, tu leggésti; part. pass. lètto) 1 To recognize words from the signs of writing and understand their meaning: imparare, insegnare a leggere; leggere a voce alta. (abs.) To read, to devote oneself to reading: trascorro gran parte della giornata leggendo. 2 To interpret certain conventional or natural signs: i ciechi leggono con le dita; leggere un diagramma. (fig.) Leggere la mano: to derive information about someone's character and destiny from the lines of the hand. 3 (lit.) To interpret a text or a passage: i critici dell'Ottocento leggevano erroneamente questa strofa. (ext.) To interpret or evaluate writings, events and the like according to particular criteria: leggere un film in chiave ironica. 4 (fig.) To sense someone's thoughts and intentions: gli si legge il terrore sul volto.
  34. AMBIGUITY STATISTICS IN THE B.C.
    Unambiguous (1 tag): 35,340
    Ambiguous (2-7 tags): 4,100
      2 tags: 3,760
      3 tags: 264
      4 tags: 61
      5 tags: 12
      6 tags: 2
      7 tags: 1 (“still”)
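A quick calculation puts these figures in proportion. The sketch below only re-adds the numbers given in the table; it assumes that the counts refer to distinct word forms (types) rather than running words, which the slide does not state.

```python
# Number of word forms by how many tags they can receive (from the table)
counts = {1: 35340, 2: 3760, 3: 264, 4: 61, 5: 12, 6: 2, 7: 1}

total = sum(counts.values())
ambiguous = total - counts[1]
share = ambiguous / total

print(f"{ambiguous} of {total} forms are ambiguous ({share:.1%})")
```

The breakdown by number of tags adds up to the 4,100 ambiguous forms reported, which is roughly one form in ten. The share of running text that is ambiguous is typically much higher than this, because many ambiguous forms are very frequent words.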
  35. Part of Speech Tagging and Word Sense Disambiguation
    [verb Duck]! [noun Duck] is delicious for dinner.
    I went to the bank to deposit my check.
    I went to the bank to look out at the river.
    I went to the bank of windows and chose the one dealing with last names beginning with “d”.
  36. Syntactic Disambiguation Structural ambiguity: I made her duck
    [S [NP I] [VP [V made] [NP her] [VP [V duck]]]]
    [S [NP I] [VP [V made] [NP [det her] [N duck]]]]
  37. Semantics: same event, different sentences John broke the window with a hammer. John broke the window with the crack. The hammer broke the window. The window broke.
  38. Scope ambiguity
  39. THE ROLE OF COMMON SENSE Winograd (1974): The city council refused the women a permit because they feared violence. The city council refused the women a permit because they advocated violence.
  40. NLP APPLICATIONS
    Mature, everyday technology that hardly anybody notices anymore. E.g., tokenization, normalization, regular expression search.
    Solid technology that is intensively used but can still be (and is being) improved. E.g., lemmatization; spelling correctors; IR / Web search; speech synthesis.
    Used in real applications, but substantial improvements still desired. E.g., POS tagging; term extraction; summarization; speech recognition; text classification (e.g., for spam detection); sentiment analysis; spoken dialogue systems for simple information seeking (railways, phone).
    ‘Almost there’ technology that exists in prototype form. E.g., information extraction, generation systems, simple speech translation systems.
    Pie in the sky: full machine translation, more advanced dialogue.
  41. Part I: Mature Technologies Research in NLE has been going on for many years and in many forms, e.g., as part of compiler technology, information retrieval, etc. This work has produced a number of well-established technologies that are hardly considered ‘research’ anymore.
  42. Basic Word Processing
    TOKENISATION:
      import java.util.StringTokenizer;

      // Split the string on whitespace and print each token
      StringTokenizer st = new StringTokenizer("this is a test");
      while (st.hasMoreTokens()) {
          System.out.println(st.nextToken());
      }
    prints out, one token per line: this, is, a, test
    WORD COUNTING / FREQUENCIES
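Word counting follows directly from tokenization. Since the course itself works in Python, here is a minimal standard-library sketch of a frequency count; the function name `word_frequencies` and the example sentence are made up for illustration, and splitting on whitespace is the same simplification that the Java snippet makes.

```python
from collections import Counter


def word_frequencies(text):
    """Count how often each whitespace-separated, lower-cased token occurs."""
    return Counter(text.lower().split())


freq = word_frequencies("This is a test and this is only a test")
for word, count in freq.most_common(3):
    print(word, count)
```

`Counter.most_common` returns the entries sorted by decreasing frequency, which is usually the first thing one wants to inspect in a corpus.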
  43. Regular Expressions for Search, Validation and Parsing
    Basics (e.g., search in Google): cat OR dog; “Regular * in Java”
    More advanced (e.g., regular expressions in Perl, Java, etc.), for advanced search, user input validation, etc.:
      /[Ww]ordnet/
      /colou?r/
      /Mas*imo Poesio/
      /[a-zA-Z]*/
      /[^A-Z]/
      /\$[0-9]+\.[0-9][0-9]/
    Note also SUBSTITUTION: s/colour/color/
    ELIZA: s/.* YOU ARE (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE \1/
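The same patterns can be tried out with Python's standard `re` module. In the sketch below the helper `first_match` and all the test strings are invented for illustration; the patterns are the ones listed on the slide, with the dollar sign escaped so that it matches a literal "$" rather than the end of the line.

```python
import re


def first_match(pattern, text):
    """Return the first substring of text matching pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None


# Character class: upper- or lower-case initial
print(first_match(r"[Ww]ordnet", "we use wordnet here"))
# Optional character: British or American spelling
print(first_match(r"colou?r", "the colour chart"))
# "s*" means zero or more "s"
print(first_match(r"Mas*imo Poesio", "by Massimo Poesio"))
# A price: literal "$", digits, a dot, exactly two digits
print(first_match(r"\$[0-9]+\.[0-9][0-9]", "It costs $19.99 today"))

# Substitution, the equivalent of s/colour/color/
print(re.sub(r"colour", "color", "my favourite colour"))

# ELIZA-style rule: \1 copies the adjective captured by the group
reply = re.sub(
    r".* YOU ARE (depressed|sad) .*",
    r"I AM SORRY TO HEAR YOU ARE \1",
    "WELL, YOU ARE sad TODAY",
)
print(reply)
```

Note that the ELIZA rule as written requires a space and further text after the adjective, so an input ending in "sad" would be left unchanged.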
  44. Part II: Solid Technology that could still use improvements Over the last ten to twenty years new applications have appeared which are by now fairly well established, but whose results are still not 100% accurate (nor is it clear they ever will be!)
  45. Stemming, lemmatization and morphological analysis
    Stemming: FOXES -> FOX
    Lemmatization: 'screeching', 'screeches', 'screeched' and 'screech' -> 'screech' (e.g., 'screech' +ING); 'were' -> 'be' +PAST
    (Sometimes) used for: Information Retrieval. But: not in GOOGLE
    More general morphological analysis:
      Wissenschaftlichemitarbeiter -> Wissenschaft + mitarbeiter: scientific collaborator (researcher)
      Uygarlastiramadiklarimizdanmissinizcasina ->
        Uygar +las +tir +ama +dik +lar +imiz +dan +mis +siniz +casina
        Civilized +BEC +CAUSE +NEGABLE +PPART +PL +P1PL +ABL +PAST +2PL +AsIf
        ‘(behaving) as if you are among those whom we could not civilize’
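To make the contrast between stemming and lemmatization concrete, here is a deliberately naive suffix-stripping stemmer. The rule list and the name `stem` are invented for this sketch; it is not the Porter stemmer and not what NLTK implements.

```python
# Suffixes are tried in order; the first applicable one is removed.
SUFFIXES = ["ing", "ed", "es", "s"]


def stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


for form in ["FOXES", "screeching", "screeches", "screeched", "screech", "were"]:
    print(form, "->", stem(form))
```

All four forms of "screech" collapse to the same stem, but "were" is returned unchanged: mapping it to "be" +PAST requires a lexicon of irregular forms, which is what distinguishes lemmatization and morphological analysis from plain suffix stripping. Blind stripping also over-applies (for example, "horses" becomes "hors").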
  46. Morphological Analysis: the Xerox tools
  47. Word Prediction Systems that can complete the current word / sentence (e.g., to help people with disabilities). E.g., the Aurora System, or textHelp!
  48. Spelling correction
    Word: Gettin -> getting; Alway -> always
    But: olways -/-> always; Definittely -/-> definitely
    Some shells:
      > set correct = cmd
      > lz /usr/bin
      CORRECT>ls /usr/bin (y|n|e|a)?
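One standard way to propose corrections is to pick the dictionary word with the smallest edit (Levenshtein) distance to the misspelling. The sketch below is an illustration under that assumption; it is not a description of how Word or any shell actually does it, and the names `edit_distance` and `correct` and the three-word vocabulary are invented.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))   # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def correct(word, vocabulary, max_distance=1):
    """Return the closest vocabulary word, or None if nothing is close enough."""
    best = min(vocabulary, key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= max_distance else None


vocabulary = ["getting", "always", "definitely"]
for typo in ["gettin", "alway", "olways", "definittely"]:
    print(typo, "->", correct(typo, vocabulary))
```

All four misspellings on the slide are a single edit away from the intended word, so this simple approach recovers each of them. With a realistic dictionary the hard part is ranking the many candidates at the same distance.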
  49. Part of Speech Tagging Assign a PART OF SPEECH to each word: ‘dog’ -> NOUN; ‘eat’ -> VERB. Book/VB that/DT flight/NN. Applications: all over the place! IR, IE, translation.
  50. TEXT CLASSIFICATION: SPAM DETECTION
    Dear Hamming Seminar Members
    The next Hamming Seminar will take place on Wednesday 25th May and the details are as follows:
    Who: Dave Robertson
    Title: Formal Reasoning Gets Social
    Abstract: For much of its history, formal knowledge representation has aimed to describe knowledge independently of the personal and social context in which it is used, with the advantage that we can automate reasoning with such knowledge using mechanisms that also are context independent. This sounds good until you try it on a large scale and find out how sensitive to context much of reasoning actually is. Humans, however, are great hoarders of information and sophisticated tools now make the acquisition of many forms of local knowledge easy. The question is: how to combine this beyond narrow individual use, given that knowledge (and reasoning) will inevitably be contextualised in ways that may be hidden from the people/systems that may interact to use it? This is the social side of knowledge representation and automated reasoning. I will discuss how the formal reasoning community has adapted to this new view of scale.
    When: 4pm, Wednesday 25 May 2011
    Where: Room G07, Informatics Forum
    There will be wine and nibbles afterwards in the atrium café area.

    From: "" <takworlld@hotmail.com>
    Subject: real estate is the only way... gem oalvgkay
    Anyone can buy real estate with no money down
    Stop paying rent TODAY !
    There is no need to spend hundreds or even thousands for similar courses
    I am 22 years old and I have already purchased 6 properties using the methods outlined in this truly INCREDIBLE ebook.
    Change your life NOW !
    =================================================
    Click Below to order: http://www.wholesaledaily.com/sales/nmd.htm
    =================================================
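Telling the seminar announcement from the real-estate message is a text classification task. One common baseline is a Naive Bayes classifier over word counts; the slide does not name a method, so this choice is an assumption. The sketch below uses four invented training snippets and hypothetical function names (`train`, `classify`); a real filter would be trained on thousands of labelled messages.

```python
import math
from collections import Counter


def train(examples):
    """examples: list of (text, label). Returns per-label word counts and label counts."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability under Naive Bayes."""
    vocab = set().union(*word_counts.values())
    n_docs = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(label_counts[label] / n_docs)   # prior
        for word in text.lower().split():
            # add-one smoothing, so unseen words do not zero out the product
            score += math.log((counts[word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for illustration
examples = [
    ("the seminar will take place on wednesday", "ham"),
    ("formal reasoning and knowledge representation", "ham"),
    ("buy real estate with no money down", "spam"),
    ("click below to order now", "spam"),
]
word_counts, label_counts = train(examples)
print(classify("order real estate now", word_counts, label_counts))
print(classify("the seminar on knowledge representation", word_counts, label_counts))
```

Working with sums of logarithms rather than products of probabilities avoids numerical underflow on long messages.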
  51. SENTIMENT ANALYSIS Id: Abc123 on 5-1-2008 “I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. The voice quality is clear too. It is much better than my old Blackberry, which was a terrible phone and so difficult to type with its tiny keys. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, …”
  54. Stylometry: Who wrote this? “On the far side of the river valley the road passed through a stark black burn. Charred and limbless trunks of trees stretching away on every side. Ash moving over the road and the sagging hands of blind wire strung from the blackened lightpoles whining thinly in the wind.”
  55. Stylometry: Who wrote this? “On the far side of the river valley the road passed through a stark black burn. Charred and limbless trunks of trees stretching away on every side. Ash moving over the road and the sagging hands of blind wire strung from the blackened lightpoles whining thinly in the wind.” Cormac McCarthy
  56. Speech Synthesis Speech synthesis (the automatic production of speech from text or another computer-encoded source) is much easier than speech RECOGNITION and is currently a very hot area in industry. For a British example, check out Rhetorical Systems; in the US: AT&T.
  57. Part 3: more advanced technologies From technologies that have been in use for years but are still problematic, to technologies available only in prototype form. (We will not cover these technologies in the course.)
  58. Speech Recognition Speech recognition is fairly solid (and works very well for digits). E.g., IBM’s Via Voice: http://www4.ibm.com/software/speech/enterprise/dcenter/demo_0.html
  59. Summarization Summarization is the production of a summary either from a single source (single-document summarization) or from a collection of articles (multi-document summarization). An example is the Columbia Newsblaster.
  60. Machine Translation Machine translation is one of the earliest attempts at language technology (from the ’40s). It is still mostly useful for getting a quick idea of the content of a text, but can sometimes work reasonably well. An example: Newstran.com
  61. Machine Translation
  62. Machine Translation
  63. INFORMATION EXTRACTION: REFERENCES TO (NAMED) ENTITIES SITE LOC CULTURE
  64. EXAMPLE OF IE APPLICATION: FINDING JOBS FROM THE WEB
    foodscience.com-Job2
    JobTitle: Ice Cream Guru
    Employer: foodscience.com
    JobCategory: Travel/Hospitality
    JobFunction: Food Services
    JobLocation: Upper Midwest
    Contact Phone: 800-488-2611
    DateExtracted: January 8, 2001
    Source: www.foodscience.com/jobs_midwest.html
    OtherCompanyJobs: foodscience.com-Job1
  65. COURSE CONTENT The NLTK package, implemented in Python, makes it possible to experiment with CL techniques even for those with little programming experience. During the course we will introduce Python while recapping the aspects of computational linguistics already introduced in IDUL. We will use NLTK to experiment with: POS tagging; parsing; classification. We will follow the textbook by Bird, Klein & Loper fairly closely.
  66. The Textbook http://www.nltk.org/book
  67. OTHER INFORMATION Website: http://clic.cimec.unitn.it/massimo/Teach/ELN/ Exam: develop a small project in Python, to be presented at the oral exam. Office hours: by appointment.
  68. DOWNLOADING Python and NLTK First assignment: follow the instructions at http://www.nltk.org/download