Natural Language Processing

A Computational Approach to Politeness with Application to Social Factors (Danescu-Niculescu-Mizil, Sudhof, Jurafsky, Leskovec, Potts)
By: Sakaar Khurana, Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur

Abstract
• A computational framework for identifying linguistic aspects of politeness.
• Starting point: a corpus of requests annotated for politeness, used to evaluate various aspects of politeness theory.
• The framework identifies and characterizes politeness marking in requests, because a request involves the speaker imposing on the addressee; negative politeness aims to minimize that imposition.

Politeness Data
• Requests in online communities
• Wikipedia community of editors
• Stack Exchange community

Annotating Data
• Data labelled by Amazon Mechanical Turk (AMT) workers.
• Context: each request consists of 2 sentences.
• Each annotator rated 13 requests.
• Each request was rated by 5 annotators.
• Ratings ranged from very impolite to very polite (a slider was presented).
• Z-score normalization was applied to each annotator's ratings.
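
The per-annotator normalization step can be sketched as follows. This is a minimal illustration, not the authors' code: the tuple layout, the function names, and the choice to average the normalized ratings into one score per request are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by_annotator(ratings):
    """Normalize raw slider ratings separately for each annotator.

    ratings: list of (annotator_id, request_id, raw_score) tuples
             (a hypothetical layout chosen for this sketch).
    """
    by_annotator = defaultdict(list)
    for annotator, _, score in ratings:
        by_annotator[annotator].append(score)

    # Mean and population standard deviation of each annotator's ratings.
    stats = {a: (mean(s), pstdev(s)) for a, s in by_annotator.items()}

    normalized = []
    for annotator, request, score in ratings:
        mu, sigma = stats[annotator]
        # An annotator who gave identical ratings everywhere carries no signal.
        z = (score - mu) / sigma if sigma > 0 else 0.0
        normalized.append((annotator, request, z))
    return normalized


def request_scores(normalized):
    """Average the normalized ratings of each request (assumed aggregation)."""
    per_request = defaultdict(list)
    for _, request, z in normalized:
        per_request[request].append(z)
    return {r: mean(zs) for r, zs in per_request.items()}
```

Normalizing per annotator removes individual habits, such as one annotator using only the upper half of the slider, before ratings from different annotators are combined.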

Data Distribution
• Request scores have an average of 0 (interesting).
• Standard deviation: 0.7
• Binary perception: the 1st and 4th quartiles have the highest binary consensus among annotators.
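
One way to turn the quartile observation into binary labels is sketched below. It assumes that requests in the lowest-scoring quartile are treated as impolite, those in the highest-scoring quartile as polite, and the middle half is left unlabelled; the function name and label strings are hypothetical.

```python
def label_extreme_quartiles(scores):
    """Label the bottom quartile 'impolite' and the top quartile 'polite'.

    scores: dict mapping request_id -> politeness score.
    Requests in the two middle quartiles get no label.
    """
    ranked = sorted(scores, key=scores.get)  # request ids, lowest score first
    k = len(ranked) // 4                     # size of one quartile

    labels = {}
    for request in ranked[:k]:
        labels[request] = "impolite"
    for request in ranked[len(ranked) - k:]:
        labels[request] = "polite"
    return labels
```

Dropping the middle half keeps only the requests on which annotators most often agree, at the cost of discarding half of the annotated data.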

Politeness Markers
• Requests exhibiting politeness markers are extracted by regular-expression matching over dependency parses (produced by the Stanford dependency parser), combined with specialized lexicons.
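
A simplified sketch of marker extraction follows. It matches regular expressions against the raw request text rather than a dependency parse, so it is only an approximation of the approach described above; the marker names and word lists are illustrative assumptions, not the paper's lexicons.

```python
import re

# Hypothetical marker patterns, matched on plain text (no parser involved).
MARKERS = {
    "please": re.compile(r"\bplease\b", re.I),
    "please_start": re.compile(r"^\s*please\b", re.I),
    "gratitude": re.compile(r"\b(thanks|thank you|i appreciate)\b", re.I),
    "apologizing": re.compile(r"\b(sorry|apologi[sz]e|excuse me)\b", re.I),
    "counterfactual_modal": re.compile(r"\b(could|would) you\b", re.I),
    "indicative_modal": re.compile(r"\b(can|will) you\b", re.I),
}


def politeness_markers(request):
    """Return the set of marker names whose pattern occurs in the request."""
    return {name for name, pattern in MARKERS.items() if pattern.search(request)}
```

For example, `politeness_markers("Could you please fix this? Thanks!")` returns the set containing `counterfactual_modal`, `please`, and `gratitude`. Surface patterns like these cannot distinguish syntactic roles, which is the reason a dependency parse is used in the original approach.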

Predicting Politeness
• Wikipedia: training set
• Stack Exchange: test set
• BOW model: an SVM with a unigram feature representation.
• Linguistically informed classifier (Ling.): an SVM using the features in the previous table in addition to the unigram features.
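
The difference between the two feature representations can be sketched as below. The tokenizer, the feature-name prefixes, and the `marker_patterns` argument are assumptions made for illustration; training the SVM itself is left out.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z']+")  # crude tokenizer, assumed for this sketch


def bow_features(text):
    """Unigram counts: the representation used by the BOW model."""
    return Counter("w=" + token for token in TOKEN.findall(text.lower()))


def ling_features(text, marker_patterns):
    """Unigram counts plus one binary feature per matched politeness marker.

    marker_patterns: dict mapping a marker name to a regular expression
                     (hypothetical stand-ins for the features in the table).
    """
    features = bow_features(text)
    for name, pattern in marker_patterns.items():
        if re.search(pattern, text, flags=re.IGNORECASE):
            features["marker=" + name] = 1
    return features
```

Feature dictionaries of this shape could then be vectorized and passed to a linear SVM, for instance with scikit-learn's `DictVectorizer` and `LinearSVC`, trained on the Wikipedia requests and evaluated on the Stack Exchange requests.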

Results
• The Ling. model performed 3-4% better than the BOW model.
• Its results are within 3% of human performance.
• Hence the theory-inspired features are effective and generalize well to new domains.

Relation to Social Factors
• Relation to social outcome:
• Politeness and power:

Other Work
• Other researchers have identified politeness marking:
• Across different text and media types (Herring)
• Between social groups (Burke and Kraut)
• This paper uses more data, which allows a fuller survey of different strategies.
