Natural Language Processing
This paper presents a computational framework aimed at identifying and characterizing linguistic aspects of politeness in requests, utilizing a corpus of annotated data from online communities like Wikipedia and Stack Exchange. It details the annotation process, employs regular expressions for isolating politeness markers, and leverages machine learning models, particularly Support Vector Machines, for predicting politeness levels. The findings suggest that linguistically informed models outperform traditional approaches, demonstrating effective generalization across domains, while also exploring the connection between politeness and social dynamics.
Presentation Transcript
Natural Language Processing: A COMPUTATIONAL APPROACH TO POLITENESS with application to social factors (Danescu-Niculescu-Mizil, Sudhof, Jurafsky, Leskovec, Potts) By: Sakaar Khurana, Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur
Abstract • A computational framework for identifying linguistic aspects of politeness. • Starting point: a corpus of requests annotated for politeness, used to evaluate various aspects of politeness theory. • The framework identifies and characterizes politeness marking in REQUESTS, because a request involves the speaker imposing on the addressee and therefore calls for negative politeness (minimizing the imposition).
Politeness Data • Requests in online communities • Wikipedia community of editors • Stack Exchange community
Annotating Data • Data labelled by Amazon Mechanical Turk (AMT) workers. • Context – each request consists of 2 sentences. • Each annotator – 13 requests. • Each request – 5 annotators. • Ratings range from very impolite to very polite (a slider was presented). • Scores are z-score normalized per annotator.
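The per-annotator normalization can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the `(annotator_id, request_id, raw_score)` input layout, the use of the population standard deviation, and the averaging of the normalized scores per request are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_annotator(ratings):
    """ratings: list of (annotator_id, request_id, raw_score) tuples.

    Returns {request_id: mean of the z-scored ratings it received}.
    """
    # Collect every raw score given by each annotator.
    by_annotator = defaultdict(list)
    for annotator, _, score in ratings:
        by_annotator[annotator].append(score)

    # Per-annotator mean and (population) standard deviation.
    stats = {a: (mean(s), pstdev(s)) for a, s in by_annotator.items()}

    # Z-score each rating against its own annotator, then group by request.
    by_request = defaultdict(list)
    for annotator, request, score in ratings:
        mu, sd = stats[annotator]
        z = (score - mu) / sd if sd > 0 else 0.0  # constant raters map to 0
        by_request[request].append(z)

    return {request: mean(zs) for request, zs in by_request.items()}
```

Normalizing per annotator removes individual slider habits (for example, one rater who never leaves the upper half of the scale), so scores from different raters become comparable before they are combined.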
Data Distribution • Request scores have an average of 0 (interesting) • Standard deviation – 0.7 • Binary perception – requests in the 1st and 4th quartiles show the highest binary consensus among annotators
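One way to turn the continuous scores into binary classes is to keep only the two extreme quartiles. The sketch below assumes that the bottom quartile is treated as impolite, the top quartile as polite, and the middle half is discarded; the function name and label strings are hypothetical.

```python
def quartile_labels(scores):
    """scores: {request_id: politeness_score}.

    Returns {request_id: "impolite" | "polite"} for requests in the bottom
    and top quartiles; requests in the middle two quartiles are omitted.
    """
    ranked = sorted(scores, key=scores.get)  # lowest score first
    k = len(ranked) // 4                     # size of one quartile
    labels = {r: "impolite" for r in ranked[:k]}
    labels.update({r: "polite" for r in ranked[len(ranked) - k:]})
    return labels
```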
Politeness Markers • Requests exhibiting politeness markers are extracted by regular-expression matching over the dependency parse produced by the Stanford Dependency Parser, combined with specialized lexicons.
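As a rough illustration of marker extraction, the sketch below matches regular expressions against the raw request text. This is a simplification: the presentation describes matching over a dependency parse, which is not reproduced here, and the marker names and patterns are illustrative stand-ins rather than the lexicons used in the paper.

```python
import re

# Hypothetical marker patterns; real lexicons would be far larger.
MARKERS = {
    "please": re.compile(r"\bplease\b", re.I),
    "gratitude": re.compile(r"\b(thanks|thank you|i appreciate)\b", re.I),
    "apologizing": re.compile(r"\b(sorry|i apologize|forgive me)\b", re.I),
    "counterfactual_modal": re.compile(r"\b(could|would) you\b", re.I),
    "direct_start": re.compile(r"^\s*(so|then|and|but|or)\b", re.I),
}


def find_markers(request):
    """Return the set of marker names whose pattern occurs in the request."""
    return {name for name, pattern in MARKERS.items() if pattern.search(request)}
```

For example, `find_markers("Could you please check this? Thanks!")` reports the `please`, `gratitude`, and `counterfactual_modal` markers. Surface matching cannot tell sentence-initial "please" from sentence-medial "please", which is one reason a parse-based approach is more precise.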
Predicting Politeness • Wikipedia – training set • Stack Exchange – test set • BOW model – SVM with a unigram feature representation • Linguistically informed classifier (Ling.) – SVM using the features in the previous table in addition to the unigram features.
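The difference between the two classifiers lies in their feature sets, which can be sketched as below. This covers feature construction only; SVM training is omitted. The tokenizer, the `w=` and `marker=` feature-name prefixes, and the `marker_patterns` argument are assumptions made for illustration.

```python
import re
from collections import Counter


def unigram_features(text):
    """BOW features: a count for each lowercased word token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(f"w={t}" for t in tokens)


def ling_features(text, marker_patterns):
    """Ling. features: the unigram counts plus one binary feature per marker.

    marker_patterns: {marker_name: regex string} (hypothetical lexicon).
    """
    feats = unigram_features(text)
    for name, pattern in marker_patterns.items():
        if re.search(pattern, text, flags=re.I):
            feats[f"marker={name}"] = 1
    return feats
```

Each resulting dictionary would then be converted to a sparse vector and passed to an SVM; the Ling. vector is a strict superset of the BOW vector for the same request.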
Results • The Ling. model performed 3-4% better than the BOW model. • Results are within 3% of human performance. • Hence the theory-inspired features are effective and generalize well to new domains.
Relation to social factors • Relation to social outcome: • Politeness and Power:
Other Work • Other researchers have identified politeness marking • across different text and media types (Herring) • between social groups (Burke and Kraut) • This paper had more data, which allowed a fuller survey of different strategies.