
Deriving Marketing Intelligence from Online Discussion


Presentation Transcript


  1. Deriving Marketing Intelligence from Online Discussion Natalie Glance and Matthew Hurst CMU Information Retrieval Seminar, April 19, 2006 © 2006 Nielsen BuzzMetrics, A VNU business affiliate

  2. Overview • Motivation • Content Segment: The Blogosphere • Structural Aspects • Topical Aspects • Deriving market intelligence • Conclusion

  3. Motivation • Social Media • Mobile phone data • “The celly 31 is awesome, but the screen is a bit too dim.”

  4. The Blogosphere


  6. Profile Analysis Hurst, “24 Hours in the Blogosphere”, 2006 AAAI Spring Symposium on Computational Approaches to Analysing Weblogs.

  7. Hypotheses • Different hosts attract users with different capacity to disclose profile information (?) • Blogspot users are more disposed to disclose information (?) • Different interface implementations perform differently at extracting/encouraging information from users (?)

  8. Per Capita: Spaces • variance in average age • variance in profiles with age • variance in per capita bloggers

  9. Per Capita: Blogspot

  10. The graphical structure of the blogosphere

  11. Graphical Structure of the Blogosphere • Citations between blogs indicate some form of relationship, generally topical. • A link is certainly evidence of awareness; consequently, reciprocal links are evidence of mutual awareness. • Mutual awareness suggests some commonality, perhaps common interests. • The graph of reciprocal links can be considered a social network. • Areciprocal links suggest topical relationships, but not social ones.


  13. Graph Layout • Hierarchical Force Layout • Graph has 2 types of links: reciprocal links and areciprocal links • Create a set of partitions P where each partition is a connected component in the reciprocal graph. • Create a graph whose nodes are the members of P and whose edges are formed from areciprocal links between (nodes within) members of P. • Lay out the partition graph. • Lay out each partition.
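The partitioning steps on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `partition_graph` and the edge-list input format are assumptions, self-links are dropped, and the two layout steps themselves are left out.

```python
from collections import defaultdict


def partition_graph(links):
    """Split a directed blog graph as described on the slide.

    links: iterable of (source, target) pairs (assumed input format).
    Returns (partitions, partition_edges): partitions are the connected
    components of the reciprocal graph; partition_edges connect the
    components through areciprocal links.
    """
    links = {(a, b) for a, b in links if a != b}  # ignore self-links
    nodes = {n for edge in links for n in edge}

    # A link is reciprocal when the reverse link also exists.
    reciprocal = {(a, b) for a, b in links if (b, a) in links}
    areciprocal = links - reciprocal

    # Union-find over reciprocal links gives the connected components.
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in reciprocal:
        parent[find(a)] = find(b)

    members = defaultdict(set)
    for n in nodes:
        members[find(n)].add(n)
    partitions = [frozenset(m) for m in members.values()]

    # Edges of the partition graph come from areciprocal links that
    # cross from one partition to another.
    index = {n: i for i, part in enumerate(partitions) for n in part}
    partition_edges = {
        (index[a], index[b]) for a, b in areciprocal if index[a] != index[b]
    }
    return partitions, partition_edges
```

A force-directed layout would then be run once on the partition graph and once inside each partition; any standard layout routine could be plugged in at that point.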

  14. [Figure: blog graph layout, r = 2, p = 25. Labeled clusters: Japanese, cooking, knitting.]

  15. [Figure: blog graph layout, r = 2, p = 1. Labeled blogs: kbcafe/rss, scoble, engadget, instapundit, boingboing, gizmodo, powerline, michellemalkin, crooksandliars.]

  16. [Figure: blog graph layout, r = 3, p = 100. Labeled clusters: technology, social/politics.] The English blogosphere is political.

  17. Political Blogosphere L. Adamic and N. Glance, “The Political Blogosphere and the 2004 U.S. Election: Divided They Blog”, 2nd Annual Workshop on the Weblogging Ecosystem, Chiba, Japan, 2005.

  18. Political Blogs & Readership • Pew Internet & American Life Project Report, January 2005, reports: • 63 million U.S. citizens use the Internet to stay informed about politics (mid-2004, Pew Internet Study) • 9% of Internet users read political blogs preceding the 2004 U.S. Presidential Election • 2004 Presidential Campaign Firsts • Candidate blogs: e.g. Dean’s blogforamerica.com • Successful grassroots campaign conducted via websites & blogs • Bloggers credentialed as journalists & invited to nominating conventions

  19. Research Goals & Questions • Are we witnessing a cyberbalkanization of the Internet? • Linking behavior of blogs may make it easier to read only like-minded bloggers • On the other hand, bloggers systematically react to and comment on each other’s posts, both in agreement and disagreement (Balkin 2004) • Goal: study the linking behavior & discussion topics of political bloggers • Measure the degree of interaction between liberal and conservative bloggers • Find any differences in the structure of the two communities: is there a significant difference in “cohesiveness” between the two communities?

  20. The Greater Political Blogosphere • Citation graph of greater political blogosphere • Front page of each blog crawled in February 2005 • Directed link from blog A to blog B if A links to B • Method is biased toward blogroll/sidebar links (as opposed to links in posts) • Results • 91% of links point to a blog of the same persuasion (liberal vs. conservative) • Conservative blogs show greater tendency to link • 82% of conservative blogs are linked to at least once; 84% link to at least one other blog • 67% of liberal blogs are linked to at least once; 74% link to at least one other blog • Average # of links per blog is similar: 13.6 for liberal; 15.1 for conservative • Higher proportion of liberal blogs that are not linked to at all
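As a rough illustration of how statistics like those on this slide could be computed from a labeled citation graph, here is a short Python sketch. The function names, the edge-list format, and the label dictionary are all assumptions made for the example; the study's actual tooling is not described on the slide.

```python
def same_side_fraction(links, side):
    """Fraction of links whose source and target share a label.

    links: iterable of (source, target) blog pairs (assumed format).
    side:  dict mapping blog -> label, e.g. "liberal" or "conservative".
    Links touching an unlabeled blog are ignored.
    """
    labeled = [(a, b) for a, b in links if a in side and b in side]
    if not labeled:
        return 0.0
    same = sum(1 for a, b in labeled if side[a] == side[b])
    return same / len(labeled)


def link_coverage(blogs, links):
    """Return (share of blogs linked to at least once,
    share of blogs that link to at least one other blog)."""
    blogs = set(blogs)
    if not blogs:
        return 0.0, 0.0
    cited = {b for a, b in links if a != b and b in blogs}
    citing = {a for a, b in links if a != b and a in blogs}
    return len(cited) / len(blogs), len(citing) / len(blogs)
```

Calling `link_coverage` separately on the liberal and the conservative blogs would give the two pairs of percentages that the slide compares.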

  21. Citations between blogs extracted from posts (Aug 29th – Nov 15th, 2004) • All citations between A-list blogs in 2 months preceding the 2004 election • Citations between A-list blogs with at least 5 citations in both directions • Edges further limited to those exceeding 25 combined citations • Only 15% of the citations bridge communities
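The two edge filters named on this slide can be sketched as below. The thresholds (at least 5 citations each way, more than 25 combined) come from the slide; the function name `filter_edges` and the dictionary input format are assumptions for illustration.

```python
def filter_edges(citations, min_each_way=5, min_combined=None):
    """Keep blog pairs that cite each other often enough.

    citations: dict mapping (citing_blog, cited_blog) -> citation count
               (assumed input format).
    Returns a dict mapping each kept pair (a, b), with a < b, to the
    combined number of citations in both directions.
    """
    kept = {}
    for (a, b), a_to_b in citations.items():
        if not a < b:
            continue  # visit each unordered pair once
        b_to_a = citations.get((b, a), 0)
        if a_to_b < min_each_way or b_to_a < min_each_way:
            continue
        combined = a_to_b + b_to_a
        if min_combined is not None and combined <= min_combined:
            continue  # "exceeding" is read as strictly greater than
        kept[(a, b)] = combined
    return kept
```

With `min_combined=None` this gives the graph restricted to at least 5 citations in both directions; `min_combined=25` applies the stricter limit from the last filter.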

  22. Are political blogs echo chambers? • Performed pairwise comparison of URL citations and phrase usage from blog posts • Link-based similarity measure • Cosine similarity: cos(A,B) = (vA · vB) / (||vA|| ||vB||), where vA is a binary vector whose entries are 1 or 0, depending on whether blog A cites a particular URL • Average similarity: cos(L,R) = 0.03; cos(R,R) = 0.083; cos(L,L) = 0.087 • Phrase-based similarity measure • Extracted set of phrases that are informative with respect to a background model • Entries in vA are the TF*IDF weight for each phrase = (# of phrase mentions by blog) * log[(# blogs) / (# blogs citing the phrase)] • Average similarity: cos(L,R) = 0.10; cos(R,R) = 0.54; cos(L,L) = 0.57
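Both similarity measures on this slide reduce to a cosine between per-blog vectors. The Python sketch below follows the formulas as stated (binary URL vectors, and TF*IDF phrase weights with a log of the blog-count ratio); the function names and input formats are assumptions, and phrase extraction against a background model is not shown.

```python
import math
from collections import Counter


def binary_cosine(urls_a, urls_b):
    """Cosine similarity of two binary URL-citation vectors.

    For 0/1 vectors the dot product is the number of shared URLs and
    each norm is the square root of the number of URLs cited.
    """
    a, b = set(urls_a), set(urls_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def tfidf_vectors(phrase_counts):
    """phrase_counts: dict blog -> Counter(phrase -> number of mentions).

    Weight = mentions * log(#blogs / #blogs using the phrase).
    """
    n_blogs = len(phrase_counts)
    blogs_using = Counter(
        p for counts in phrase_counts.values() for p, c in counts.items() if c > 0
    )
    return {
        blog: {
            p: c * math.log(n_blogs / blogs_using[p])
            for p, c in counts.items()
            if c > 0
        }
        for blog, counts in phrase_counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Averaging these pairwise values over all liberal-conservative, conservative-conservative, and liberal-liberal pairs would yield figures of the kind reported on the slide.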

  23. Influence on mainstream media • Notable examples of blogs breaking a story • Swiftvets.com anti-Kerry video • Bloggers linked to this in late July, keeping accusations alive • Kerry responded in late August, bringing mainstream media coverage • CBS memos alleging preferential treatment of Pres. Bush during the Vietnam War • Powerline broke the story on Sep. 9th, launching a flurry of discussion • Dan Rather apologized later in the month • “Was Bush Wired?” • Salon.com asked the question first on Oct. 8th, echoed by Wonkette & PoliticalWire.com • MSM followed up the next day

  24. Deriving Market Intelligence N. Glance, M. Hurst, K. Nigam, M. Siegler, R. Stockton and T. Tomokiyo. Deriving Marketing Intelligence from Online Discussion. Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2005).

  25. Automating Market Research • Brand managers want to know: • Do consumers prefer my brand to another? • Which features of my product are most valued? • What should we change or improve? • Alert me when a rumor starts to spread!

  26. Comparative mentions: Halo 2 ‘halo 2’

  27. Case Study: PDAs • Collect online discussion in target domain (order of 10K to 10M posts) • Classify discussion into domain-specific topics (brand, feature, price) • Perform base analysis over combination of topics: buzz, sentiment/polarity, influencer identification

  28. Dell Axim, 11.5% buzz, 3.4 polarity

  29. Interactive analysis • Top-down approach: drill down from aggregate findings to drivers of those findings • Global view of data used to determine focus • Model parent and child slice • Use data-driven methods to identify what distinguishes one data set from the other

  30. SD card

  31. Social network analysis for discussion about the Dell Axim

  32. Drilling down to sentence level • Discussion centers on poor quality of sound hardware & IR ports • “It is very sad that the Axim’s audio AND Irda output are so sub-par, because otherwise it is a great Pocket PC.” • “Long story made short: the Axim has a considerably inferior audio output than any other Pocket PC we have ever tested.” • “When we tested it we found that there was a problem with the audio output of the Axim.” • “The Dell Axim has a lousy IR transmitter AND a lousy headphone jack.” • Note: these examples are automatically extracted.

  33. Technology • Data Collection: • Document acquisition and analysis • Classification (relevance/topic) • Topical Analysis: • Topic classification using a hierarchy of topic classifiers operating at sentence level. • Phrase mining and association. • Intentional Analysis: • Interpreting sentiment/polarity • Community analysis • Aggregate metrics

  34. Topical Analysis • Hierarchy of topics with specific ‘dimensions’: • Brand dimension • Pocket PC: • Dell Axim • Toshiba • e740 • Palm • Zire • Tungsten • Feature dimension: • Components • Battery

  35. Topical Analysis • Each topic is a classifier, e.g. a boolean expression with sentence and/or message scoped sub-expressions. • Measured precision of classifier allows for projection of raw counts. • Intersection of typed dimensions allows for a basic approach to association (e.g. find sentences discussing the battery of the Dell Axim).
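To make the idea of a topic as a boolean expression concrete, here is a toy Python sketch at sentence scope only. The keyword lists, the helper names `mentions`, `all_of`, and `projected_count`, and the reading of "projection" as raw count times measured precision are all assumptions for illustration, not the production classifiers.

```python
import re


def mentions(*terms):
    """Build a sentence-level predicate: true if any term occurs."""
    pattern = re.compile(
        "|".join(r"\b" + re.escape(t) + r"\b" for t in terms), re.IGNORECASE
    )
    return lambda sentence: bool(pattern.search(sentence))


def all_of(*predicates):
    """Boolean AND of predicates: the intersection of topics."""
    return lambda sentence: all(p(sentence) for p in predicates)


# Hypothetical topic definitions, one per dimension.
dell_axim = mentions("axim")                  # brand dimension
battery = mentions("battery", "batteries")    # feature dimension
axim_battery = all_of(dell_axim, battery)     # association by intersection


def projected_count(raw_count, precision):
    """Estimate how many of the raw matches are truly on topic."""
    return raw_count * precision
```

For example, if `axim_battery` fires on 200 sentences and the classifier's measured precision is 0.9, the projected number of on-topic sentences is about 180.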

  36. Polarity: What is it? • Opinion, evaluation/emotional state with respect to some topic. • It is excellent. • I love it. • Desirable or undesirable condition • It is broken (objective, but negative). • We use a lexical/syntactic approach. • Cf. related work on boolean document classification task using supervised classifiers.

  37. Polarity Identification • Sentence: This car is really great

  38. Polarity Identification • Sentence: This car is really great • POS: DT NN VB RR JJ

  39. Polarity Identification • Sentence: This car is really great • POS: DT NN VB RR JJ • Lexical orientation: 0 0 0 0 +

  40. Polarity Identification • Sentence: This car is really great • POS: DT NN VB RR JJ • Lexical orientation: 0 0 0 0 + • Chunking: BNP BVP BADJP

  41. Polarity Identification • Sentence: This car is really great • POS: DT NN VB RR JJ • Lexical orientation: 0 0 0 0 + • Chunking: BNP BVP BADJP • Interpretation (parsing): Positive
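The build-up across these slides (tokens, lexical orientation, chunks, interpretation) can be imitated with a deliberately tiny Python sketch. Everything here is a stand-in: the lexicon is hypothetical, the chunks are supplied by hand rather than produced by a chunker, and the interpretation rule (sign of the summed orientations) is far simpler than a real lexical/syntactic system.

```python
# Hypothetical orientation lexicon: +1 positive, -1 negative, 0 otherwise.
LEXICON = {"great": 1, "awesome": 1, "lousy": -1, "dim": -1}


def lexical_orientation(tokens):
    """Look up each token's prior orientation in the lexicon."""
    return [LEXICON.get(token.lower(), 0) for token in tokens]


def interpret(tokens, chunks):
    """Combine token orientations chunk by chunk.

    chunks: list of (label, start, end) spans over tokens, end exclusive
            (supplied by hand here; a real system would run a chunker).
    Returns (overall_label, per_chunk_scores).
    """
    orientation = lexical_orientation(tokens)
    chunk_scores = [
        (label, sum(orientation[start:end])) for label, start, end in chunks
    ]
    total = sum(score for _, score in chunk_scores)
    if total > 0:
        return "Positive", chunk_scores
    if total < 0:
        return "Negative", chunk_scores
    return "Neutral", chunk_scores


tokens = "This car is really great".split()
chunks = [("NP", 0, 2), ("VP", 2, 3), ("ADJP", 3, 5)]
label, scores = interpret(tokens, chunks)
```

On the slide's sentence the only polar word sits in the adjective phrase, so the toy rule agrees with the slide's "Positive" interpretation; it would not cope with negation or reported speech.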

  42. Polarity Challenges • Methodological: ‘She told me she didn’t like it.’ • Syntactic: ‘His cell phone works in some buildings, but in others it doesn’t.’ • Valence: • ‘I told you I didn’t like it’, • ‘I heard you didn’t like it’, • ‘I didn’t tell you I liked it’, • ‘I didn’t hear you liked it’: many verbs (tell, hear, say, …) require semantic/functional information for polarity interpretation. • Association

  43. Polarity Examples

  44. Polarity Metric • Function of counts of polar statements on a topic: f(size, f_top, f_top+pos, f_top+neg) • Use empirical priors to smooth observed counts (helps with low counts) • Use precision/recall (P/R) of system to project true counts and provide error bars (requires labeled data) • Example: +/- ratio metric maps ratio to 0-10 score
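One way such a metric could look is sketched below in Python. The slide does not give the actual function, so every choice here is an assumption: add-one style pseudo-counts stand in for the empirical priors, the 0-10 score is taken as ten times the smoothed positive share, and the projection simply scales an observed count by precision over recall. Error bars are not computed.

```python
def polarity_score(pos, neg, prior_pos=1.0, prior_neg=1.0):
    """Map positive/negative statement counts to a 0-10 score.

    prior_pos and prior_neg are hypothetical pseudo-counts that pull
    sparse topics toward the middle of the scale.
    """
    smoothed_pos = pos + prior_pos
    smoothed_neg = neg + prior_neg
    return 10.0 * smoothed_pos / (smoothed_pos + smoothed_neg)


def projected_true_count(observed, precision, recall):
    """Estimate the true number of statements from the observed count.

    observed * precision approximates the correct detections; dividing
    by recall accounts for statements the system missed.
    """
    if recall <= 0:
        raise ValueError("recall must be positive")
    return observed * precision / recall
```

With no evidence at all the score sits at 5.0, and 8 positive versus 2 negative statements give 7.5 under these assumed priors.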


  46. Predicting Movie Sales from Blogger Sentiment G. Mishne and N. Glance, “Predicting Movie Sales from Blogger Sentiment,” 2006 AAAI Spring Symposium on Computational Approaches to Analysing Weblogs.

  47. Blogger Sentiment and Impact on Sales • What we know: • There is a correlation between references to a product in the blogspace and its financial figures • Tong 2001: Movie buzz in Usenet is correlated with sales • Gruhl et al. 2005: Spikes in Amazon book sales follow spikes in blog buzz • What we want to find out: • Does taking into account the polarity of the references yield a better correlation? • Product of choice: movies • Methodology: compare correlation of references to sales with the correlation of polar references to sales

  48. Experiment • 49 movies • Budget > $1M • Released between Feb. and Aug. 2005 • Sales data from IMDB • “Income per Screen” = opening weekend sales / screens • Blog post collection • References to the movies in a 2-month window • Used IMDB link + simple heuristics • Measure: • Pearson’s r between the Income per Screen and {references in blogs, positive/polar references in blogs} • Applied to various context lengths around the reference
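The measure on this slide is the standard Pearson correlation coefficient applied to per-movie numbers. A small self-contained Python sketch follows; the function names are assumptions, and no real sales or blog data is included.

```python
import math


def income_per_screen(opening_weekend_sales, screens):
    """Income per Screen as defined on the slide."""
    return opening_weekend_sales / screens


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)
```

In the experiment's terms, `xs` would hold each movie's Income per Screen and `ys` either its count of blog references or its count of positive references; comparing the two resulting coefficients is the test the slide describes.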

  49. Results Income per screen vs. positive references • For 80% of the movies, r > 0.75 for pre-release positive sentiment • 12% improvement compared with correlation of movie sales with simple buzz count (0.542 vs. 0.484)
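Assuming the 12% figure is a relative improvement of the positive-reference correlation over the plain buzz-count correlation, the arithmetic works out as:

```latex
\[
\frac{0.542 - 0.484}{0.484} = \frac{0.058}{0.484} \approx 0.12
\]
```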

  50. Conclusion • The intersection of Social Media and Data/Text Mining algorithms presents a viable business opportunity set to replace traditional forms of market research/social trend analysis/etc. • Key elements include topic detection and sentiment mining. • The success of the blogosphere has driven interest in a distinct form of online content which has a long history but is becoming more and more visible. • The blogosphere itself is a fascinating demonstration of social content and interaction and will enjoy many applications of traditional and novel analysis.
