Natural Language Processing (NLP) Lecture No 1 Institute of Southern Punjab Multan Department of Computer Science
Computers can “see” text in any language but cannot manipulate it meaningfully, because computers naturally have no common sense, no reasoning capacity, and no life experience, while humans naturally have all of these. So how can this ability be developed in computers? The solution is NATURAL LANGUAGE PROCESSING (NLP).
What is Natural Language Processing? Definitions
• The field of study that focuses on the interactions between human language and computers is called Natural Language Processing, or NLP for short. It sits at the intersection of computer science, artificial intelligence, and computational linguistics (Wikipedia).
• NLP is a way for computers to analyze, understand, and derive meaning from human language in a smart and useful way.
• NLP focuses on the development of computers that can understand and speak human (“natural”) language.
What does “natural” language mean?
It means languages such as English, Japanese, French, Urdu, Arabic, Punjabi, Saraiki, ..., and NOT programming languages such as Java, C++, Perl, ...
GOAL: Our ultimate goal is natural human-to-computer communication.
Forms of NLP
• The input/output of an NLP system can be:
  • written text
  • speech
• We will mostly be concerned with written text (not speech).
• To process written text, we need:
  • lexical, syntactic, and semantic knowledge about the language
  • discourse information and real-world knowledge
• To process spoken language, we need everything required to process written text, plus solutions to the challenges of speech recognition and speech synthesis.
BİL711 Natural Language Processing
Subfields of Natural Language Processing (NLP): Natural Language Understanding
• Taking some spoken/typed sentence and working out what it means.
• Different levels of analysis are required:
  • Morphological Analysis
  • Syntactic Analysis
  • Semantic Analysis
  • Discourse Analysis
NLP - Prof. Carolina Ruiz
Subfields of Natural Language Processing (NLP): Natural Language Generation
• Taking some formal representation of what you want to say and working out a way to express it in a natural (human) language (e.g., English).
• Producing output in the natural language from some internal representation.
• Different levels of synthesis are required:
  • Deep Planning (what to say)
  • Syntactic Generation
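As a rough illustration of producing output from an internal representation, the sketch below turns a small event record into an English sentence. The record fields (agent, action, object, tense), the IRREGULAR_PAST table, and the realize function are hypothetical names chosen for this example, and the grammar handling is deliberately naive.

```python
# Hypothetical table of irregular past-tense forms (illustrative only).
IRREGULAR_PAST = {"buy": "bought", "make": "made", "eat": "ate"}


def realize(event):
    """Turn a small event record into an English sentence (toy sketch)."""
    action = event["action"]
    if event["tense"] == "past":
        # Irregular verbs come from the table; otherwise naively add "ed".
        verb = IRREGULAR_PAST.get(action, action + "ed")
    else:
        # Naive third-person singular present: just add "s".
        verb = action + "s"
    # Choose "a" or "an" from the first letter of the object noun.
    article = "an" if event["object"][0] in "aeiou" else "a"
    return f'{event["agent"]} {verb} {article} {event["object"]}.'


print(realize({"agent": "Ali", "action": "buy", "object": "book", "tense": "past"}))
print(realize({"agent": "Sara", "action": "cook", "object": "egg", "tense": "present"}))
```

Here the event record plays the role of "what to say", and realize plays the role of syntactic generation. A real generator would need far richer planning and morphology than this.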
Why is NL understanding hard for computers?
1- Natural language is extremely rich in form and structure, and very ambiguous.
  • How to represent meaning.
  • Which structures map to which meaning structures.
2- One input can mean many different things. Ambiguity can arise at different levels:
  • Lexical ambiguity (word level): different meanings of the words.
  • Syntactic ambiguity: different ways to parse the sentence.
  • Interpreting partial information: how to interpret pronouns.
  • Contextual information: the context of a sentence may affect the meaning of that sentence.
3- Many inputs can mean the same thing.
Where does it fit in the CS taxonomy?
• Computers: Databases, Artificial Intelligence, Algorithms, Networking
• Artificial Intelligence: Search, Robotics, Natural Language Processing
• Natural Language Processing: Information Retrieval, Machine Translation, Language Analysis
• Language Analysis: Semantics, Parsing
Linguistics Levels of Analysis
• Speech and text (and sign language)
• Levels
  • Phonology: sounds / letters / pronunciation
  • Morphology: the structure of words
  • Syntax: how sequences of words are structured
  • Semantics: the meaning of the strings
  • Pragmatics: what we use language to accomplish
  • Discourse: how preceding sentences affect interpretation
• Interaction between levels
Linguistics Levels of Analysis (cont.)
• Morphology: concerns the way words are built up from smaller meaning-bearing units (e.g., comes = come + s, not co + mes).
• Syntax: concerns how words are put together to form correct sentences and what structural role each word has.
• Semantics: concerns what words mean and how these meanings combine in sentences to form sentence meanings.
• Pragmatics: concerns how sentences are used in different situations and how use affects the interpretation of the sentence.
• Discourse: concerns how the immediately preceding sentences affect the interpretation of the next sentence.
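The morphology level can be made concrete with a toy segmenter that splits a word into a stem and a suffix. This is only a sketch: the STEMS and SUFFIXES tables and the segment function are made up for illustration, and real morphological analyzers handle spelling changes and far larger lexicons.

```python
# Toy lexicon of stems and suffixes (illustrative assumptions, not a real resource).
STEMS = {"come", "walk", "duck", "make"}
SUFFIXES = ["ing", "es", "ed", "s"]


def segment(word):
    """Return (stem, suffix) if the word is a known stem plus a known suffix, else None."""
    if word in STEMS:
        return (word, "")
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Only accept the split if the remaining part is a meaningful unit.
            if stem in STEMS:
                return (stem, suffix)
    return None


print(segment("comes"))    # splits into come + s, because "com" is not a known stem
print(segment("walking"))
print(segment("making"))   # fails here: dropping the final "e" is not handled
```

The lexicon check is what rules out a split such as co + mes: neither piece is a meaning-bearing unit.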
Ambiguity
I made him duck.
A few questions to be answered first:
1- How many different interpretations does this sentence have?
2- What are the reasons for the ambiguity?
3- The categories of knowledge of language can be thought of as ambiguity-resolving components.
4- How can each ambiguous piece be resolved?
5- Does speech input make the sentence even more ambiguous?
  • Yes: word boundaries must also be decided.
Ambiguity (cont.)
• Some interpretations of: I made him duck.
  1- I cooked duck for him.
  2- I cooked the duck belonging to him.
  3- I created a toy duck which he owns.
  4- I caused him to quickly lower his head or body.
  5- I used magic and turned him into a duck.
• duck: morphologically and syntactically ambiguous (noun or verb).
• him: syntactically ambiguous between dative and possessive readings (the possessive readings are clearest with a pronoun such as “her”, as in “I made her duck”).
• make: semantically ambiguous (cook or create).
• make: syntactically ambiguous:
  • Transitive: takes a direct object. => interpretation 2
  • Ditransitive: takes two objects. => interpretation 5
  • Takes a direct object and a verb. => interpretation 4
Resolve Ambiguities
• We will introduce models and algorithms to resolve ambiguities at different levels.
  • part-of-speech tagging: deciding whether duck is a verb or a noun.
  • word-sense disambiguation: deciding whether make means create or cook.
  • lexical disambiguation: part-of-speech tagging and word-sense disambiguation are two important kinds of lexical disambiguation.
  • syntactic disambiguation: deciding whether the pronoun and duck form a single unit (as in “his duck”) is an example of syntactic ambiguity, and can be addressed by probabilistic parsing.
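Word-sense disambiguation for make can be sketched with a simple word-overlap heuristic in the spirit of the simplified Lesk approach: pick the sense whose description shares the most words with the sentence. The sense descriptions, the stopword list, and the function names below are invented for this example and do not come from any dictionary.

```python
import re

# Hypothetical one-line descriptions of two senses of "make" (illustrative only).
SENSES = {
    "cook": "prepare food or a meal by heating ingredients for dinner",
    "create": "build or produce an object such as a toy or model from materials",
}
STOPWORDS = {"i", "a", "an", "the", "or", "by", "for", "such", "as", "from", "of"}


def content_words(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def disambiguate_make(sentence):
    """Return the sense whose description overlaps most with the sentence."""
    context = content_words(sentence)
    scores = {
        sense: len(context & content_words(description))
        for sense, description in SENSES.items()
    }
    # On a tie, max() keeps the first sense listed in SENSES.
    return max(scores, key=scores.get)


print(disambiguate_make("I made him duck for dinner."))
print(disambiguate_make("I made him a wooden toy duck."))
```

With no helpful context words, as in the bare sentence "I made him duck", both senses score zero and the choice is arbitrary, which is exactly why richer models are needed.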
Models to Represent Linguistic Knowledge
• We will use certain formalisms (models) to represent the required linguistic knowledge.
  • State Machines: FSAs, FSTs, HMMs, ATNs, RTNs
  • Formal Rule Systems: Context Free Grammars, Unification Grammars, Probabilistic CFGs
  • Logic-based Formalisms: first order predicate logic, some higher order logic
  • Models of Uncertainty: Bayesian probability theory
• We will use algorithms to manipulate the models of linguistic knowledge to produce the desired behavior.
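To give a feel for the simplest of these formalisms, here is a minimal sketch of a deterministic finite-state automaton (FSA). It recognizes the toy "sheep language" b a a+ ! (baa!, baaa!, ...), a commonly used textbook illustration. The state numbering and the names TRANSITIONS, ACCEPT_STATES, and accepts are choices made for this example.

```python
# Transition table for a deterministic FSA accepting: b, then two or more a's, then !
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of extra a's
    (3, "!"): 4,
}
START_STATE = 0
ACCEPT_STATES = {4}


def accepts(text):
    """Run the FSA over the text; accept only if it ends in an accepting state."""
    state = START_STATE
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined for this symbol: reject
            return False
    return state in ACCEPT_STATES


for candidate in ["baa!", "baaaa!", "ba!", "baa"]:
    print(candidate, accepts(candidate))
```

The model (the transition table) is kept separate from the algorithm that manipulates it (the accepts loop), which mirrors the split between models and algorithms described above.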
Applications of NLP
• Automatic Text Summarization
• Sentiment Analysis
• Topic Extraction
• Named Entity Recognition
• Part of Speech Tagging
• Relationship Extraction
• Stemming
• Text Mining
• Machine Translation
• Automated Question Answering
• Text Classification
• Text Categorization
• Indexing and searching large texts
• Automatic Translation
• Speech understanding
• Information Extraction
• Co-reference resolution
• Discourse analysis
• Morphological Segmentation
• Natural Language generation
• Natural Language recognition
• Sentence boundary disambiguation
• Word Sense disambiguation
How to Develop Applications
1- Developers use NLP algorithms
  - NLP algorithms are typically based on machine learning rather than on hand-coding large sets of rules.
  - NLP can rely on machine learning to learn these rules automatically by analyzing a set of examples.
2- Developers use open-source NLP libraries
  • Apache OpenNLP: a machine learning toolkit that provides tokenizers, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, co-reference resolution, and more.
  • Natural Language Toolkit (NLTK): a Python library that provides modules for processing text, classification, tokenizing, tagging, parsing, and more.
  • Stanford NLP: a suite of NLP tools that provides part-of-speech tagging, a named entity recognizer, a co-reference resolution system, sentiment analysis, and more.
  • Mallet: a Java package that provides document classification, clustering, topic modeling, information extraction, and more.
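The idea of learning from examples instead of hand-coding rules can be sketched with a tiny multinomial Naive Bayes text classifier written with the Python standard library only. The four training sentences, the labels "pos" and "neg", and the function names are made up for illustration; a real application would use one of the libraries listed above and far more data.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability under Naive Bayes."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training examples (illustrative only).
EXAMPLES = [
    ("great movie loved it", "pos"),
    ("wonderful acting great story", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring story awful acting", "neg"),
]
model = train(EXAMPLES)
print(classify("great story", model))
print(classify("awful boring movie", model))
```

No rule such as "great means positive" is written anywhere in the code; the association is picked up from the word counts in the examples.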