1 / 14

Modeling Latent Biographic Attributes in Conversational Genres

Modeling Latent Biographic Attributes in Conversational Genres. Nikesh Garera David Yarowsky. Introduction. Latent classification of biographic attribute such as gender, age and native language. Modeling and classifying biographic attributes based on lexical and discourse factors.

kalare
Télécharger la présentation

Modeling Latent Biographic Attributes in Conversational Genres

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Modeling Latent Biographic Attributes in Conversational Genres NikeshGarera David Yarowsky

  2. Introduction • Latent classification of biographic attribute such as gender, age and native language. • Modeling and classifying biographic attributes based on lexical and discourse factors. • A speaker’s lexical choice and discourse style may differ substantially depending on the gender/age of the speaker’s interlocutor.

  3. Corpus Detail • Fisher telephone conversation corpus • Standard Switchboard conversational corpus • Both are transcript and annotated.

  4. Modeling Gender via Ngram Features • Using unigram and bigram features with tf-idf weighting. • Stop words were retained in the feature set. • Only Ngrams with frequency greater than 5 were retained. • SVM model. • 90.84% on Fisher corpora for gender classification • 90.22% on Switchboard corpora for gender classification

  5. Effect of Partner’s Gender • Modeling of speaker gender/age based on the prior and joint modeling of the partner speaker’s gender/age • People tend to use stronger gender-specific, age-specific or dialect-specific word/phrase usage and discourse properties when speaking with someone of a similar gender/age/dialect than when speaking with someone of a different gender/age/dialect, when they may adapt a more neutral speaking style

  6. Effect of Partner’s Gender

  7. Effect of Partner’s Gender • Oracle Experiment • Assume we know whether the conversation is homogeneous (same gender) or heterogeneous (different gender). • Classify both the test conversation side and the partner side, and if the classifier is more confident about the partner side then we choose the gender of the test conversation side based on the heterogeneous/homogeneous information. • Test vs. Partner • If Confidence(T) > Confidence(P) • If Confidence(T) < Confidence(P) • If Hetero, Class(T) = !Class(P) • If Hemo, Class(T) = Class(P) • Overall accuracy improves to 96.46% on Fisher corpus from 90.84%

  8. Effect of Partner’s Gender • Replacing Oracle by a Hemo vs. Hetero Classify • Classifying the conversation as mixed or single-gender • Low accuracy 68.35% on Fisher Corpus as male-male and female-female conversations are grouped into one class. • Create two different classifiers • Male-male vs. Rest • Female-female vs. Rest

  9. Effect of Partner’s Gender • Modeling partner via conditional model and whole-conversation model • The following classifiers were trained and each of their scores was used as a feature in a meta SVM classifier: • Male-male vs. Rest • Female-female vs. Rest • Conditional model of gender given most likely partner's gender • Ngram model

  10. Effect of Partner’s Gender • Conditional model of gender given most likely partner's gender • Two separate classifiers were trained for classifying the gender of a given conversation side, one where the partner is male and other where the partner is female. • Given a test conversation side, we first choose the most likely gender of the partner’s conversation side using the Ngram based model. • Choose the gender of the test conversation side using the appropriate conditional model.

  11. Effect of Partner’s Gender • Incorporating Sociolinguistic Features • 1. % of conversation spoken: • 2. Speaker rate: • 3. %of pronoun usage: • 4. % of back-channel responses such as "(laughter)" and "(lipsmacks)". • 5. % of passive usage: • 6. % of short utterances • 7. % of modal auxiliaries, subordinate clauses. • 8. % of “mm” tokens such as "mhm", "um", "uh-huh", "uh", "hm", "hmm",etc. • 9. Type-token ratio • 10. Mean inter-utterance time: • 11. % of “yeah” occurences. • 12. % of WH-question words. • 13. % Mean word and utterance length.

  12. Gender Classification Results

  13. Gender Classification Results

  14. Reference NikeshGarera. Multilingual Acquisition of Structured Information via Novel Relationship Extraction Models over Diverse Knowledge Sources. Ph.D. Thesis, Johns Hopkins University, Baltimore, Maryland, September 2009.

More Related