Discover how Twitter, a powerful social network and mass media platform, influences society through research, big data analysis, and sentiment valence. Uncover the methods and tools used for data collection, cleaning, and analysis to measure influence and viral spreading rates on Twitter. Learn about the impact of sentiment analysis in online social networks and the challenges involved in classifying users and messages for research purposes.
Twitter Based Research • Benny Bornfeld • Mentors: Professor Sheizaf Rafaeli, Dr. Daphne Raban
Where research meets Bigbird • Research • Twitter • My Research & Tools • Big Data
Twitter Research Big Data
About Twitter • Facts • Established in 2006 • ~140 million active users • ~340 million messages per day • Superlatives • “the stream of the world’s collective consciousness” • “the first rough draft of history”
How does it work? Followers
[Diagram: tweets spreading through the follower network as retweets]
Research Twitter Research Big Data
What is Twitter? Social network! Social Network? Mass Media?
Twitter based predictions • “I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper”
Influence • What’s the influence of Twitter on society? • “Why the revolution will not be tweeted” (Malcolm Gladwell) vs. technological determinism (Clay Shirky)
Influence in Twitter • How do we measure influence? • Number of followers? • Centrality? • Creating action/reaction? • Viral spreading?
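Two of the candidate measures above, number of followers and creating action (being retweeted), can be compared on the same set of users. The sketch below is only an illustration: the input formats (`follows` and `retweets` as pairs of user names) and both function names are assumptions, not part of the original study.

```python
from collections import Counter


def follower_counts(follows):
    """Count followers per user.

    follows: iterable of (follower, followed) pairs (hypothetical format).
    """
    return Counter(followed for _, followed in follows)


def retweets_received(retweets):
    """Count how often each author was retweeted.

    retweets: iterable of (retweeter, original_author) pairs (hypothetical format).
    """
    return Counter(author for _, author in retweets)


follows = [("a", "c"), ("b", "c"), ("c", "a")]
retweets = [("c", "a"), ("b", "a"), ("c", "a")]

most_followed = follower_counts(follows).most_common(1)[0][0]
most_retweeted = retweets_received(retweets).most_common(1)[0][0]
```

In this toy data the most followed user and the most retweeted user are different people, which is why the choice of measure matters.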
Twitter Research Big Data
Big Data in SN Research • Pros: • Supports exploratory research (vs confirmatory research) • Avoids the sampling reliability issue (power law) • Collects what people are actually saying • Non-intrusive • Allows analysis of many dimensions • Catches irregular events
Big Data in SN Research • Cons: • Lots of noise • It is sometimes hard to map the data to your research question • Cost of collecting the data • Lack of tools/knowledge on how to store and analyze the data • May come at the expense of theory
Where Research meets Bigbird • Research • Twitter • My Research & Tools • Big Data
Influence: the capacity or power of persons or things to be a compelling force on or produce effects on the actions, behavior, opinions, etc., of others
Influence in online social networks • Tweet • ReTweet • Sentiment Valence
The research question • Which is more viral, i.e. more likely to spread in a social network (Twitter): messages of negative or of positive sentiment valence?
The Data • Collected ~2 million tweets about new movies • Why movies: • People have opinions about movies • People share their opinions about movies • Results can be compared with other studies (benchmarks)
Collecting the Tweets • Twitter provides an API for collecting tweets • Until mid-2010, full data streams were available for free; currently the rate is very limited (~150/hour) • Full data streams (the “firehose”) are available via a company called Gnip
Tweet-collecting architecture • [Diagram: PowerTrack architecture, with components labeled Rules, Filter, HTTP Streaming (JSON), My App / Collect App, JSON parser, Files, DB]
Data Fields • User data: #followers, #following, #tweets, Klout, tweet rate, creation date, language, name, description, location • Message data: sender, content, type (original/RT), post time, device • Computed fields: # of RTs, total exposure, sentiment
Reading Tasks • Handle partial messages • Handle broken messages • Handle duplicate messages • Handle special characters
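The four reading tasks above can be sketched as a single generator. This is a minimal sketch, not the collection app from the talk. It assumes the stream delivers one JSON object per line, that each tweet carries an `id` and a `text` field, and that Unicode NFKC normalization is an acceptable way to handle special characters. The function name `read_tweets` is hypothetical.

```python
import json
import unicodedata


def read_tweets(chunks):
    """Yield parsed tweets from an iterable of raw text chunks.

    Assumption: the feed is newline-delimited JSON, one tweet per line.
    """
    buffer = ""
    seen_ids = set()
    for chunk in chunks:
        buffer += chunk
        # Partial messages: the unfinished tail stays in the buffer
        # until the rest of the line arrives in a later chunk.
        *lines, buffer = buffer.split("\n")
        for line in lines:
            line = line.strip()
            if not line:
                continue  # blank line, nothing to parse
            try:
                tweet = json.loads(line)
            except json.JSONDecodeError:
                continue  # broken message: skip it
            if not isinstance(tweet, dict) or "id" not in tweet:
                continue  # valid JSON, but not a tweet record
            if tweet["id"] in seen_ids:
                continue  # duplicate message
            seen_ids.add(tweet["id"])
            # Special characters: fold compatibility forms (for example
            # full-width letters) into their standard equivalents.
            tweet["text"] = unicodedata.normalize("NFKC", tweet.get("text", ""))
            yield tweet
```

A final line that never receives its terminating newline is left in the buffer and not emitted. A real collector would also want to log skipped lines instead of silently dropping them.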
Clean the data • Unrelated messages [build your dream house] • Spammers • Gibberish messages • Normalize the data (e.g. Tweets/Time)
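The cleaning steps above could look roughly like the following. Everything here is an illustrative assumption, not the rules used in the study: the function names, the keyword and blocked-phrase lists, the 0.5 letter-ratio threshold, the repeat limit for spam, and the choice of one hour as the time unit. The field names `sender` and `content` follow the Data Fields slide.

```python
from collections import Counter
from datetime import datetime


def is_relevant(text, keywords, blocked_phrases=()):
    """Keep a tweet only if it mentions a keyword and no blocked phrase."""
    lower = text.lower()
    if any(phrase in lower for phrase in blocked_phrases):
        return False
    return any(keyword in lower for keyword in keywords)


def looks_like_gibberish(text, min_alpha_ratio=0.5):
    """Flag text in which fewer than half the visible characters are letters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return True
    alpha = sum(c.isalpha() for c in chars)
    return alpha / len(chars) < min_alpha_ratio


def repeat_posters(tweets, max_repeats=3):
    """Return senders who posted the same content more than max_repeats times."""
    counts = Counter((t["sender"], t["content"]) for t in tweets)
    return {sender for (sender, _), n in counts.items() if n > max_repeats}


def tweets_per_hour(timestamps):
    """Normalize raw tweets into a tweets-per-hour series."""
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return dict(counts)
```

For example, `is_relevant("Build your dream house today!", ["dream house"], ["build your dream house"])` rejects the advertising message from the slide while still accepting tweets that mention the title in other contexts.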
Tools for data analysis • Sorting • Filtering • Counting • Histograms • Sentiment analysis
Sentiment Analysis • Classify each message as positive/neutral/negative • Classification methods • Manual (~10 sec per tweet) • Automatic
Sentiment Analysis : Some challenging Tweets examples • Just saw #Footloose with my sisters. The movie fab, and I even spotted my karaoke machine! Did you dolls catch it? • Paranormal Activity 3 seems almost as scary as a level 9 magikarp • My kids want to see Jack and Jill. Its making it hard to love them.
Naïve Bayes classifier • Machine learning – supervised learning • [Diagram: tweets labeled POS / NEG / NEU, split into a training set and a testing set]
Naïve Bayes classifier • NGRAM = 2 • [Diagram: tweets labeled POS / NEG / NEU]
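A Naïve Bayes sentiment classifier with NGRAM = 2 can be sketched in plain Python. This is a minimal illustration, not the classifier used in the study. It assumes whitespace tokenization, features made of word n-grams up to length 2 (the slides do not say whether unigrams were included), and add-one smoothing. The class name, method names and toy training tweets are all invented for the example.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n=2):
    """Return all word n-grams of length 1..n (assumed feature set)."""
    words = text.lower().split()
    feats = []
    for size in range(1, n + 1):
        feats += [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]
    return feats


class NaiveBayes:
    def __init__(self, n=2):
        self.n = n
        self.class_counts = Counter()               # tweets seen per label
        self.feature_counts = defaultdict(Counter)  # label -> n-gram counts
        self.vocab = set()

    def train(self, labeled):
        """labeled: iterable of (text, label) pairs, e.g. label 'POS'."""
        for text, label in labeled:
            self.class_counts[label] += 1
            for feat in ngrams(text, self.n):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)

    def classify(self, text):
        """Return the label with the highest log posterior."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total_docs)  # log prior
            total = sum(self.feature_counts[label].values())
            for feat in ngrams(text, self.n):
                # Add-one smoothing so unseen n-grams do not zero the score.
                seen = self.feature_counts[label][feat]
                score += math.log((seen + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


training = [
    ("loved this movie", "POS"),
    ("great fun movie", "POS"),
    ("hated this movie", "NEG"),
    ("boring and bad", "NEG"),
    ("going to the movie tonight", "NEU"),
    ("movie starts at nine", "NEU"),
]
model = NaiveBayes(n=2)
model.train(training)
```

Following the training/testing split on the slide, accuracy would be measured on labeled tweets held out from `train`. A model this simple is unlikely to handle sarcastic tweets like the challenging examples shown earlier.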
Manual classification The Dictionaries
References • Why the revolution will not be tweeted? • Clay Shirky: How social media can make history [TED] • Looking At The World Through Twitter Data • Twitter mood predicts the stock market • Six Provocations for Big Data • Susan Blackmore on memes and “temes” [TED]