Natural Language Generation: An Introductory Tour
Anupam Basu, Dept. of Computer Science & Engineering, IIT Kharagpur
Summer School on Natural Language Processing and Text Mining, 2008
Language Technology
• Natural Language Understanding: text → meaning
• Natural Language Generation: meaning → text
• Speech Recognition: speech → text
• Speech Synthesis: text → speech
What is NLG? Thought / conceptualization of the world → Expression
• The block c is on block a
• The block a is under block c
• The block b is by the side of a
• The block b is on the right of a
• The block b has its top free
• The block b is alone
• ...
Conceptualization
• Some intermediate form of representation:
ON(C, A); ON(A, TABLE); ON(B, TABLE); RIGHT_OF(B, A); ...
What to say?
Conceptualization
[Figure: semantic network: C is_a Block; B is_a Block; C on A; B right_of A]
What to say?
What to say? How to say? Natural language generation is the process of deliberately constructing a natural language text in order to meet specified communicative goals. [McDonald 1992]
Some of the Applications • Machine Translation • Question Answering • Dialogue Systems • Text Summarization • Report Generation
Thought / Concept → Expression • Objective: • produce understandable and appropriate texts in human languages • Input: • some underlying non-linguistic representation of information • Knowledge sources required: • Knowledge of language and of the domain
Involved Expertise • Knowledge of Domain • What to say • Relevance • Knowledge of Language • Lexicon, Grammar, Semantics • Strategic Rhetorical Knowledge • How to achieve goals, text types, style • Sociolinguistic and Psychological Factors • Habits and Constraints of the end user as an information processor
Asking for a pen
• Situation: have(X, z), not have(Y, z)
• Why? Goal: want have(Y, z)
• What? Conceptualization: ask(give(X, z, Y))
• How? Expression: Could you please give me a pen?
Some Examples
Example System #1: FoG • Function: • Produces textual weather reports in English and French • Input: • Graphical/numerical weather depiction • User: • Environment Canada (Canadian Weather Service) • Developer: • CoGenTex • Status: • Fielded, in operational use since 1992
Example System #2: STOP • Function: • Produces a personalised smoking-cessation leaflet • Input: • Questionnaire about smoking attitudes, beliefs, history • User: • NHS (UK National Health Service) • Developer: • University of Aberdeen • Status: • Undergoing clinical evaluation to determine its effectiveness
STOP: Output Dear Ms Cameron Thank you for taking the trouble to return the smoking questionnaire that we sent you. It appears from your answers that although you're not planning to stop smoking in the near future, you would like to stop if it was easy. You think it would be difficult to stop because smoking helps you cope with stress, it is something to do when you are bored, and smoking stops you putting on weight. However, you have reasons to be confident of success if you did try to stop, and there are ways of coping with the difficulties.
Approaches
Template-based generation • Most common technique, and the most common sort of NLG found in commercial systems • In simplest form, words fill in slots: • "The train from ⟨source⟩ to ⟨destination⟩ will leave platform ⟨number⟩ at ⟨time⟩ hours"
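The slot-filling pattern above can be sketched in a few lines of Python; the slot names and the example train are illustrative assumptions, not from a real system:

```python
# Minimal template-based generator for the train-announcement pattern.
# Every output is the same sentence with different slot values, which is
# why template systems sound monotonous.

TEMPLATE = ("The train from {source} to {destination} "
            "will leave platform {platform} at {time} hours")

def announce(source, destination, platform, time):
    """Fill the fixed template with the given slot values."""
    return TEMPLATE.format(source=source, destination=destination,
                           platform=platform, time=time)

print(announce("Howrah", "New Delhi", 9, "17:00"))
# The train from Howrah to New Delhi will leave platform 9 at 17:00 hours
```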
Pros and Cons • Pros • Conceptually simple • No specialized knowledge needed • Can be tailored to a domain with good performance • Cons • Not general • No variation in style – monotonous • Not scalable
Modern Approaches • Rule-based approach • Machine-learning approach
Some Critical Issues
Context Sensitivity in Connected Sentences • X-town was a blooming city. Yet, when the hooligans started to invade the place, __________ . The place was not livable any more. • the place was abandoned by its population • the place was abandoned by them • the city was abandoned by its population • it was abandoned by its population • its population abandoned it……..
Referencing
John is Jane's friend. He loves to swim with his dog in the pool. It is really lovely.
I am taking the Shatabdi Express tomorrow. It is a much better train than the Rajdhani Express. It has a nice restaurant car, while the other has nice seats.
Referencing John stole the book from Mary, but he was caught. John stole the book from Mary, but the fool was caught.
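One simple, illustrative heuristic for referring-expression generation is to pronominalize an entity only when it was the most recent mention of its gender; the pronoun lexicon and entity names below are toy assumptions:

```python
# Toy referring-expression rule: use a pronoun only when the entity was
# the latest mention of its gender in the discourse; otherwise repeat
# the name, so the reference stays unambiguous.

PRONOUN = {"masc": "he", "fem": "she", "neut": "it"}

def refer(entity, gender, last_mentioned):
    """Return a pronoun if unambiguous in context, else the full name."""
    if last_mentioned.get(gender) == entity:
        return PRONOUN[gender]
    return entity

last = {"masc": "John"}              # John was just mentioned
print(refer("John", "masc", last))   # he
print(refer("Mary", "fem", last))    # Mary (no prior feminine mention)
```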
Aggregation
The dress was cheap. The dress was beautiful.
→ The dress was cheap and beautiful. / The dress was cheap yet beautiful.
I found the boy. The boy was lost.
→ I found the boy who was lost. / I found the lost boy.
Sita bought a story book. Geeta bought a story book.
→ Sita and Geeta bought a story book? Or: Sita bought a story book and Geeta also bought a story book?
Choice of words (Lexicalization)
• The bus was in time. The journey was fine. The seats were bad.
• The bus was in perfect time. The journey was fantastic. The seats were awful.
• The bus was in perfect time. The journey was fantastic. However, the seats were not that good.
General Architecture
Component Tasks in NLG
• Content Planning (Macroplanning)
• Document structuring
• Sentence Planning (Microplanning)
• Aggregation; lexicalization; referring-expression generation
• Surface Realization
• Linguistic realization; structure realization
A Pipelined Architecture: Document Planning → Document Plan → Microplanning → Text Specification → Surface Realization
An Example
Consider two assertions:
has(Hotel_Bliss, food(bad))
has(Hotel_Bliss, ambience(good))
Content planning selects the information ordering:
Hotel Bliss has bad food but its ambience is good.
Hotel Bliss has good ambience but its food is bad.
Sentence Planning for has(Hotel_Bliss, food(bad))
• Choose syntactic templates (e.g. a Have frame with Entity, Feature and Modifier slots mapped to Subj and Obj)
• Choose lexicon: bad or awful? food or cuisine? good or excellent?
• Aggregate the two propositions
• Generate referring expressions: it or this restaurant?
• Ordering: a big red ball, not a red big ball
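The lexical choices above (bad vs. awful, food vs. cuisine, good vs. excellent) can be modelled as a lookup keyed by style; the lexicon entries and style labels below are illustrative assumptions:

```python
# Toy lexicalization: map a concept to a word, parameterized by style.

LEXICON = {
    "bad":  {"plain": "bad",  "intense": "awful"},
    "good": {"plain": "good", "intense": "excellent"},
    "food": {"plain": "food", "formal": "cuisine"},
}

def lexicalize(concept, style="plain"):
    """Pick a word for the concept, falling back to the plain form."""
    entry = LEXICON[concept]
    return entry.get(style, entry["plain"])

print(lexicalize("bad", "intense"))   # awful
print(lexicalize("food", "formal"))   # cuisine
```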
Realization
• Correct verb inflection: have → has
• May require noun inflection (not in this case)
• Articles required? Where?
• Conversion into the final string: capitalization and punctuation
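A minimal sketch of these realization steps, assuming only the irregular verb "have", a naive "+s" rule for other verbs, and toy capitalization/punctuation:

```python
# Toy surface realizer: verb agreement, joining, capitalization, final stop.

IRREGULAR_3SG = {"have": "has"}

def realize(subject, verb_lemma, obj, third_singular=True):
    """Inflect the verb, join constituents, capitalize and punctuate."""
    if third_singular:
        verb = IRREGULAR_3SG.get(verb_lemma, verb_lemma + "s")
    else:
        verb = verb_lemma
    sentence = f"{subject} {verb} {obj}"
    return sentence[0].upper() + sentence[1:] + "."

print(realize("Hotel Bliss", "have", "bad food"))
# Hotel Bliss has bad food.
```

Article insertion is left out here: deciding where "a"/"the" is required needs discourse information (has the referent been mentioned before?), which this sketch does not track.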
Content Planning • What to say • Data collection • Making domain-specific inferences • Content selection • Proposition formulation • Each proposition → a clause • Text structuring • Sequential ordering of propositions • Specifying rhetorical relations
Content Planning Approaches • Schema-based (McKeown 1985) • Specify what information, in which order • The schema is traversed to generate the discourse plan • Application of operators, similar to a rule-based approach (Hovy 1993) • The discourse plan is generated dynamically • Output is a content-plan tree
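A schema-based planner can be sketched as a fixed ordering that is filtered against the available data. The message names below reuse the WeatherReporter message set from the slides; treating the schema as a flat ordered list is a simplifying assumption:

```python
# Toy schema-based content planner: the schema fixes which message types
# may appear and in what order; traversal keeps only the messages
# actually present in the input data.

SCHEMA = ["MonthlyTempMsg", "MonthlyRainfallMsg",
          "RainyDaysMsg", "RainSoFarMsg"]

def plan(messages):
    """Traverse the schema, emitting messages available in the data."""
    return [m for m in SCHEMA if m in messages]

print(plan({"RainSoFarMsg": "...", "MonthlyTempMsg": "..."}))
# ['MonthlyTempMsg', 'RainSoFarMsg']
```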
[Figure: content-plan tree: a Discourse node expands into group nodes such as a demographic summary (Name, Age) and Care (Blood Sugar), each with a detailed view]
Content Plan • Plan tree generation • Ordering of group nodes • Propositions • Rhetorical relations between leaf nodes • Paragraph and sentence boundaries
Rhetorical Relations
[Figure: text spans linked by rhetorical relations: ENABLEMENT, MOTIVATION and EVIDENCE connect spans such as "You should ...", "I'm in ...", "The show ...", "It got a ..." and "You can get ..."]
Rhetorical Relations Three basic rhetorical relationships: • SEQUENCE • ELABORATION • CONTRAST Others like • Justification • Inference
Nucleus and Satellites
[Figure: "I love to collect classic cars" is the nucleus (N); "My favourite car is Toyota Innova" is an Elaboration satellite; "I drive my Maruti 800" is a Contrast satellite]
Target Text The month was cooler and drier than average, with the average number of rain days, but the total rain for the year so far is well below average. Although there was rain on every day for 8 days from 11th to 18th, rainfall amounts were mostly small.
Document Structuring in WeatherReporter
The message set:
• MonthlyTempMsg ("cooler than average")
• MonthlyRainfallMsg ("drier than average")
• RainyDaysMsg ("average number of rain days")
• RainSoFarMsg ("well below average")
• RainSpellMsg ("8 days from 11th to 18th")
• RainAmountsMsg ("amounts mostly small")
Document Structuring in WeatherReporter
[Figure: document-plan tree: a SEQUENCE whose parts are related by ELABORATION and CONTRAST over MonthlyTempMsg, MonthlyRainfallMsg, RainyDaysMsg, RainSoFarMsg, RainSpellMsg and RainAmountsMsg]
Some Common RST Relationships
• Elaboration: the satellite presents more details about the content of the nucleus
• Contrast: the nuclei present things that are similar in some respects but different in some other relevant way
• Multinuclear: no distinction between nucleus and satellite
• Purpose: the satellite presents the goal of performing the activity presented in the nucleus
• Condition: the satellite presents something that must occur before the situation presented in the nucleus can occur
• Result: the nucleus results from the satellite
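A content-plan (RST) tree can be represented as nodes that link a nucleus to satellites, with leaves as messages. The sketch below builds a fragment of the WeatherReporter plan and reads its messages off in nucleus-first order; the node layout and traversal strategy are illustrative assumptions:

```python
# Toy RST document-plan node and a nucleus-first linearization.
from dataclasses import dataclass, field

@dataclass
class RSTNode:
    relation: str                 # e.g. "ELABORATION", "CONTRAST"
    nucleus: object               # RSTNode or a message name (str)
    satellites: list = field(default_factory=list)

def leaves(node):
    """Collect messages, visiting the nucleus before its satellites."""
    if isinstance(node, str):
        return [node]
    out = leaves(node.nucleus)
    for sat in node.satellites:
        out += leaves(sat)
    return out

tree = RSTNode("CONTRAST",
               RSTNode("ELABORATION", "MonthlyRainfallMsg", ["RainyDaysMsg"]),
               ["RainSoFarMsg"])
print(leaves(tree))
# ['MonthlyRainfallMsg', 'RainyDaysMsg', 'RainSoFarMsg']
```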
Planning Approach
[Figure: plan tree for Save Document ("the system saves the document"), decomposed into substeps: Choose Save option (a dialog box is displayed), Select Folder, Type Filename, Click Save button (dialog box closed)]
Planning Operator
Name: Expand Purpose
Effect: (COMPETENT hearer (DO-ACTION ?action))
Constraints: (AND (get_all_substeps ?action ?subaction)
                  (NOT (singular-list ?subaction)))
Nucleus: (COMPETENT hearer (DO-SEQUENCE ?subaction))
Satellite: ((RST-PURPOSE (INFORM hearer (DO ?action))))
Name: Expand Subactions
Effect: (COMPETENT hearer (DO-SEQUENCE ?actions))
Constraints: NIL
Nucleus: (for each ?actions (RST-SEQUENCE (COMPETENT hearer (DO-ACTION ?actions))))
Satellites: NIL
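Applied together, the two operators expand a high-level goal into an ordered sequence of leaf steps, as in the save-document example. The sketch below mimics that expansion procedurally; the action names and substep table are illustrative assumptions:

```python
# Toy plan expansion: "Expand Purpose" rewrites an action into its
# substeps when it has more than one (the singular-list constraint);
# "Expand Subactions" emits one SEQUENCE step per substep.

SUBSTEPS = {
    "save-document": ["choose-save-option", "select-folder",
                      "type-filename", "click-save-button"],
}

def expand(action):
    """Recursively expand an action into an ordered list of leaf steps."""
    subs = SUBSTEPS.get(action)
    if not subs or len(subs) == 1:   # no substeps, or a singular list
        return [action]
    steps = []
    for sub in subs:                 # one sequence step per substep
        steps += expand(sub)
    return steps

print(expand("save-document"))
# ['choose-save-option', 'select-folder', 'type-filename', 'click-save-button']
```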
[Figure: discourse plan: the step sequence (Choose Save, Choose Folder, ...) forms the nucleus under a Purpose satellite; "Dialog box opens" is attached by a Result relation]
Discourse
To save a file:
1. Choose the Save option from the File menu. (A dialog box will appear.)
2. Choose the folder.
3. Type the file name.
4. Click the Save button. (The system will save the document.)
Rhetorical Relations: Difficult to Infer
John abused the duck. The duck buzzed John.
• John abused the duck that had buzzed him
• The duck buzzed John who had abused it
• The duck buzzed John and he abused it
• John abused the duck and it buzzed him