Affective patterns using words and emoticons in Twitter

Tyler Schnoebelen • NWAV 40 • Georgetown University • Oct 30, 2011

Hello, Readers • If you’re reading this presentation on the web—Hi! • I’ve put in notes for most of the slides, which should help build out what they are about and give some of the narrative. • I’ve reduced the quality of the images to try to get the presentation smaller, but the file is still kind of big (sorry). • Feel free to tweet this presentation: • http://bit.ly/tyleremo (this presentation) • http://bit.ly/tyleremotion (a link to my main web page about emotions and language)

Get out your phones #NWAV40 @TSchnoebelen

What’s ahead • Situated cues and broader patterns • In this presentation: • What can we say about the meaning of various emoticons? • What are their usage patterns? • And which words do they co-occur with? • Words describe not just the emoticons, but users, stance objects, and types of audiences that they are most/least consistent with

Emoticon Dialectology • (^_^) smiling • (^_~) winking • (>_<) angry • (-_-) not amused • d-_-b listening to music

Among English-Speakers? • The biggest emoticons, worldwide and in America, are faces on their sides: =) :P ;-) (: XD • :) • :D • :( • ;) • :-) • Some have equal eyes • Some face right • There are tongues, winks, and wrinkly eyes, too • Some have noses

Smiley stuff :) :-) (: :D :-D XD =) =D

Winky stuff ;) ;-) (; ;P

Tongue-y stuff :P :-P ;P =P

Frown-y stuff =( :( :-( ): D:

Slant-y stuff :/ :-/

O-mouths :O

Coverage: Worldwide • 39 million tokens for 1,479 emoticons • http://www.infochimps.com/datasets/twitter-census-smileys

Corpus for this Presentation • 3,209,102 American English tweets with one of these emoticons :) and :-) and (: :D and :-D ;) and ;-) and (; :P and :-P ;P XD :O :( and :-( and ): :/ and :-/ D: =) and =D and =( and =P • 32,252,909 word tokens, 13,586 unique words

Fill in the blank • @KevinHarvick Awww, leave the cute little ground hogs alone. That is so sad… [_EMOTICON_]

Closest to “so sad” • @KevinHarvick Awww, leave the cute little ground hogs alone. That is so sad… [_EMOTICON_]

That “awww” • @KevinHarvick Awww, leave the cute little ground hogs alone. That is so sad… [_EMOTICON_]

Leave X alone • @KevinHarvick Awww, leave the cute little ground hogs alone. That is so sad… [_EMOTICON_]

Qual and quant • Our intuitions are qualitative and nuanced. • But do these intuitions actually hold? • Are they built on quantifiable generalizations? • We can and should make reference to how the various linguistic resources we are using as cues get used in other situations.

Probability • There are 16,348 tokens of sad that appear with our 25 emoticons • :) occurs with 12,531,809 words • There are 32,252,909 words that appear with any emoticon • If :) were really just a random tag with no meaning, then we’d expect there to be: • (16,348/32,252,909)*(12,531,809/32,252,909)*(32,252,909) = 6,351.986 tokens of sad alongside :) • Observed tokens of :) and sad together: only 1,972 • That is 31% of what we’d expect • Highly significant by Fisher’s exact test (p ≈ 5.91e-05) • Throughout this presentation, I’ll report Observed/Expected values that are significant at minimally p<0.05
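The observed/expected arithmetic on this slide can be reproduced in a few lines. This is only a sketch of the calculation as stated here: the function and variable names are made up for illustration, and the Fisher's exact test itself is not recomputed.

```python
def expected_cooccurrence(word_count, emoticon_word_count, total_words):
    """Expected co-occurrences if the word and the emoticon were independent."""
    p_word = word_count / total_words
    p_emoticon = emoticon_word_count / total_words
    return p_word * p_emoticon * total_words


# Counts as reported on the slide
sad_tokens = 16_348          # tokens of "sad" with any of the emoticons
smiley_words = 12_531_809    # words that occur alongside :)
total_words = 32_252_909     # words that occur alongside any emoticon
observed = 1_972             # tokens of "sad" observed alongside :)

expected = expected_cooccurrence(sad_tokens, smiley_words, total_words)
oe_ratio = observed / expected

print(f"expected = {expected:.3f}")
print(f"observed/expected = {oe_ratio:.2f}")
```

A ratio below 1 means the word shows up with the emoticon less often than chance would predict; a ratio above 1 means more often.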

:-/

Scope and affect • Notice that “cute” and “little” are positively valenced. • But since they occur within the scope of “leave_alone”, they presumably become LESS likely to appear with smiles • @KevinHarvick Awww, leave the cute little ground hogs alone. That is so sad… [_EMOTICON_]

We also can think about… • The author’s gender (female) • The main recipient’s gender (male) • The author’s social network make-up • As defined not by followers/following but by mutual-@’ing across time • (in this case, mixed)

A quick note about gender • Nearly all emoticons are used by a higher percentage of women than men • The one exception is :-P • Once we distinguish tweeters based on the gender composition of their network • Instead of using “followers/following”, we use “who has consistently and mutually @’ed each other” • We see that gender makeup doesn’t change how women use most emoticons • Men are much more sensitive • In the domain of “unhappy emoticons”, let’s compare gender-biased networks with mixed gender networks • :( men with male networks avoid this, men with female networks use it a lot • :-( men with female networks use it a lot • :-/ women with female networks avoid this • :/ men with male networks avoid, but women with male networks use a lot; women with female networks avoid • Joint work with David Bamman and Jacob Eisenstein

Fill in the blank • @iShell_Beelieve I LOOVE YOU MOORE!!! yessss please skype me [_EMOTICON_] haha im excited lol

Clustering so far • So far we’ve looked at 14 words • They’ve distinguished happy vs. sad in the first case • And noses from no-noses in the second case • What happens when we look at our 25 emoticons across 13,586 words?

Clustering overview • We can pick out 2 or 3 dimensions, but we have 25 dimensions • Lots of ways to cluster • Hierarchical clustering • Factor analysis • K-means • Model-based • Basically, they all look at distances between points • Close pairs should go in the same cluster • Distant pairs should go in different clusters

Hierarchical clustering overview • Agglomerative hierarchical clustering: • Start with each point as an individual and start fusing like points together. • Then take the “fused points” and fuse them with more, building up, ultimately to one giant cluster that shows a hierarchy beneath it. • Once a fusion is made, it’s done. • You can’t appear in more than one group.
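The fusing procedure described above can be sketched in plain Python. Everything specific here is an assumption for illustration: the slides do not say which distance or linkage was used, so this uses Euclidean distance with average linkage, and the emoticon profiles are made-up numbers, not values from the corpus.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)


def agglomerate(points):
    """Fuse the closest pair of clusters until one remains.

    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = [(name,) for name in points]  # start: every point alone
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        merges.append((a, b, average_linkage(a, b, points)))
        # A fusion is final: the two clusters are replaced by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical observed/expected profiles over three words (invented numbers)
toy = {
    ":)": (1.4, 0.3, 1.0),
    ":-)": (1.3, 0.4, 1.1),
    ":(": (0.3, 1.8, 0.9),
    ":-(": (0.5, 1.7, 1.0),
}

merges = agglomerate(toy)
for a, b, d in merges:
    print(a, "+", b, f"at distance {d:.2f}")
```

With these toy numbers the two smiles fuse first, then the two frowns, and the last merge joins the happy and sad groups into the single top-level cluster.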

Results of hierarchical clustering (all words) • Noses cluster separately • :O seems “playful” • Positive and negative cluster separately • Why are D: and XD together?

Factor analysis • We use factor analysis to discover “latent” variables • Imagine a test that had 30 geometry questions and 20 literature questions. You give it to a few hundred kids. • Going in, we know that there are kids who will do better in one section than the other • If we did a factor analysis on their data, we’d expect to discover a latent “math” variable and a latent “reading” variable. • Here, we look for variables (emoticons) that are correlated • Combine them into factors • Each emoticon is then more-or-less associated with each factor
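The first step named above, finding emoticons whose usage is correlated, can be illustrated with a plain Pearson correlation. This is only that first step, not a factor analysis, and the profiles below are invented numbers standing in for per-word observed/expected values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical profiles over the same five words (invented numbers)
frown = [1.8, 1.6, 0.3, 0.4, 1.0]       # stands in for :(
frown_nose = [1.7, 1.5, 0.4, 0.5, 1.1]  # stands in for :-(
smile = [0.3, 0.5, 1.6, 1.5, 1.0]       # stands in for :)

r_frowns = pearson(frown, frown_nose)
r_opposed = pearson(frown, smile)

print(f"frown vs. frown with nose: r = {r_frowns:.2f}")
print(f"frown vs. smile:           r = {r_opposed:.2f}")
```

Emoticons that correlate strongly and positively would be candidates for loading on the same factor; a strong negative correlation corresponds to the "vs." contrasts reported in the factor slides.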

1. Negative vs. smile • 2. “Extra-expressives”: D: XD :O vs. :) • 3. Right-facing+: (: (; ;D • 4. Noses: :-D ;-) :-)

Top 4 factors • Factor 1: :( :-( :/ :-/ =( vs. :) • disappointed, expired, allergies, migraine, grrrrr • DOES NOT GO WITH…mwah, terrific, kk, thankyou, notorious • Factor 2: D: XD :O vs. :) • facepalm, jizz, shitting, mexicans, omfg, (a lot of Spanish) • DOES NOT GO WITH…imy, yayyy, thankss, ughhhhh, sry • Factor 3: (: (; ;D • ithink, swagg, yur, idgaf, kickback, wassup, cutee • DOES NOT GO WITH…jajaja, iya, wicked, wkwk, odd, brainstorm • Factor 4: :-D ;-) :-) • twitterville, hubby’s, hee, pmsl, 4get, w00t • DOES NOT GO WITH…hahaa, ooc, heyy, fever, cus, nooo

What patterns emerge? • Happy and sad are different • Thank goodness this is recovered • Noses and no-noses are different • Consistent in the hierarchical clustering and factor analysis • There may also be a “right-facing” dialect • The factor analysis shows this most clearly • There may also be an “equal-eyes” dialect • The hierarchical cluster analysis shows this most clearly • Why are D: (worry) and XD (laughing face) clustering together? • Across all analyses here • And sometimes with :O • Aren’t tongues and winks different? • They don’t seem to be here

But surely :) and :-) mean the same thing??? • Well, sort of. • Different types of people use them • Which is to say “people who use :) use a different vocabulary than people who use :-)” • Each emoticon’s meaning is how it is used AND who it is used by • One way of getting at their emotional meaning is to stop looking at collocations with all words and start looking at collocations with words we know to be emotional (angry, happy, sadness, frighten) • I gather 13 different lists of “emotion terms”, with 10,592 unique terms • I restrict myself to the 432 words that are on 3 or more lists
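The "on 3 or more lists" restriction is simple to express in code. A minimal sketch, assuming each emotion-term list is just a list of strings; the four toy lists and the function name are hypothetical and are not the 13 lists used in the presentation.

```python
from collections import Counter


def terms_on_at_least(lists, min_lists=3):
    """Return the terms that appear on at least `min_lists` of the lists."""
    # set(lst) ensures a term repeated within one list is counted once.
    counts = Counter(term for lst in lists for term in set(lst))
    return {term for term, n in counts.items() if n >= min_lists}


# Hypothetical emotion-term lists (invented for illustration)
toy_lists = [
    ["happy", "sad", "angry"],
    ["happy", "sad", "frighten"],
    ["happy", "angry", "sad", "bliss"],
    ["happy", "bliss"],
]

kept = terms_on_at_least(toy_lists)
print(sorted(kept))
```

Requiring agreement across several lists trades coverage for confidence that a kept word really is an emotion term.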

XD now in a better spot • :O still playful • (EqEyes may be a bit set apart) • With or without noses, similar affective meaning • Tongues not quite the same as winks • Although not for neg

1. “Elaborate sad” vs. “elaborate happy” • 2. :( and :/ vs. :) • 3. Tongues • 4. D:, :-/, :O vs. =)

Just emotion terms: Top Factors • Factor 1 ): =( =/ :-( vs. :D and :-) • upset, depressed, poor, sleepy, sore, hurt • DOES NOT GO WITH…nervous, worried, awful, confused, terrible • Factor 2 :( and :/ vs. :) • heartbroken, devastated, tragic, nauseous • DOES NOT GO WITH…excited, gratitude, dumb, bliss, confused, hate • Factor 3 =P :-P ;P :P • silly, lame, lazy, blah, blame, fantastic, crazy • DOES NOT GO WITH…sad, great, hope, welcome, beautiful, love, happy • Factor 4 D: :-/ :O vs. =) • nervous, ugly, attack, worried, awkward, awful, scared, ire • DOES NOT GO WITH…upset, lonely, disappointed, blah, nag, depressed, mad, blame

Redux • Happy and sad are still different • Phew • Noses and non-noses are AFFECTIVELY similar • Right-facing and equal-eye emoticons also seem to be affectively similar to their left-facing/normal-eye counterparts • D: now patterns with negative stuff and away from XD • :O is playful in the hierarchical cluster analysis, more grave in the factor analysis • Tongues and winks are distinct from each other affectively

So what’s the difference between… • We could ask a lot of questions, but I’ll restrict myself to… • Noses and no-noses • Winks and tongues

No noses and noses

Full Disclosure (Fake nose)

~14% of people vary THEIR nose use • 58,367 users with 10+ emoticon uses

Functional Reduction? • Length of tweet • Frequency of emoticon use

Save a character? • There’s a 140-character limit on Twitter • Do noses get tossed out to make room? • Actually, no • 30,000 random :) tweets and 30,000 random :-) tweets • People who use noses are writing MORE, not less • Another way to say this is that people who leave off the noses are shortening other things, too • Chart: average number of characters • Significant, p=2.96e-21 (by t-test)
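A comparison of tweet lengths like the one above comes down to a two-sample t statistic. The slide does not say which variant of the t-test was used, so the sketch below assumes Welch's unequal-variance version; the tweet lengths are invented, and the p-value lookup is left out.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical tweet lengths in characters (invented numbers)
nose_lengths = [92, 101, 88, 110, 97, 105]    # tweets containing :-)
no_nose_lengths = [70, 85, 78, 90, 66, 81]    # tweets containing :)

t, df = welch_t(nose_lengths, no_nose_lengths)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A positive t here means the nose sample is longer on average; turning t and df into a p-value needs a t-distribution table or a statistics library.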

Frequent use = shorter • People who use emoticons a lot don’t use noses as much • Significant, p=1.623e-28 (by t-test)
