
Introduction of “Insights into Internet Memes”

Introduction of “Insights into Internet Memes”. Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (July 17, 2011 – July 21, 2011). Christian Bauckhage, Fraunhofer IAIS, Bonn, Germany.


Presentation Transcript


  1. Introduction of “Insights into Internet Memes” Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (July 17, 2011 – July 21, 2011) Christian Bauckhage Fraunhofer IAIS Bonn, Germany

  2. Introduction • Dawkins, R. 1976. The Selfish Gene. Dawkins postulates memes as a cultural analogue of genes in order to explain how rumors, catch-phrases, melodies, or fashion trends replicate through a population. • The paper focuses on observable characteristics of Internet memes in epidemic outbreaks. • Means of spreading: email, instant messaging, forums, blogs, or social networking sites.

  3. Introduction • Content Example: • Internet memes are inside jokes or pieces of hip underground knowledge that many people are in on.

  4. Introduction • Most Internet memes spread rapidly, voluntarily, unpredictably, and uncontrollably. • Viral marketing: public relations, advertising, political campaigning. Image rather than information. • Meme collections: knowyourmeme.com, memedump.com, or memebase.com. • Data collected from: Google Insights, delicious.com, digg.com, and stumbleupon.com.

  5. Introduction • Use mathematical epidemiology and log-normal distributions to model the temporal dynamics of Internet memes. • Analyze similarities and differences among the time series data from different sources. • Introduce mathematical models of outbreak data and apply them to characterize Internet memes and their evolution.

  6. Related Work • Identify influential members in a community so as to contain the spread of misinformation or rumors (Budak, Agrawal, and Abbadi 2010; Shah and Zaman 2009). • Set up models of how events disseminate through online communities and use these to track memes through specific social media (Adar and Adamic 2005; Lin et al. 2010). • Investigate the interplay between social and traditional media (Leskovec, Backstrom, and Kleinberg 2009).

  7. Related Work • Outbreak analysis for trend prediction is an active area of research in epidemic modeling (Britton 2010). • Outbreaks are treated as an emergent property of the scale-free nature of social or communication networks (Keeling and Eames 2005; Lloyd and May 2001; Pastor-Satorras and Vespignani 2001). • Social relations can be inferred from observations of information propagation among individuals (Myers and Leskovec 2010).

  8. Related Work • Yang and Leskovec (2011) and Kubo et al. (2007) cluster time series obtained from a microblogging service in order to predict future interest in a topic. • The results indicate that more elaborate compartment models and log-normal distributions support more accurate trend prediction. • Log-normal distributions accurately model a wide range of long-tail phenomena (Limpert, Stahel, and Abbt 2001), including Internet measurements such as communication times or the growth of the web graph (Downey 2005; Mitzenmacher 2004).

  9. Data Collection and Preprocessing • Collection of 150 Internet memes.

  10. Data Collection and Preprocessing • Google Insights: a service by Google that provides statistics on the query terms users have entered into the Google search engine. • Delicious: a social bookmarking service for storing web bookmarks. • Digg: a social news service where users can vote on web content submitted by others. • StumbleUpon: a discovery engine that recommends web content that has been entered by its users.

  11. Data Collection and Preprocessing • Time series: $z = [z_1, \ldots, z_T]$. • $T$: from January 2004 to December 2010 (84 months). • To compare meme-related activities across different sources, the data were turned into probability vectors $x$ where $x_t = z_t / \sum_{k=1}^{T} z_k$. • Onset times were determined using the Teager-Kaiser operator, a signal processing technique to detect abrupt variations in a data stream: $\mathrm{TK}(x_t) = x_t^2 - x_{t-1} x_{t+1}$. • For each time series, the earliest such variation was taken to define the onset time $t_o$. Model fitting was done using the series $x = [x_{t_o}, \ldots, x_T]$.
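The preprocessing steps on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code: the function names are made up for this example, and the `threshold` parameter is hypothetical, since the slides do not say how large a Teager-Kaiser response must be to count as an onset.

```python
def to_probability_vector(z):
    """Normalize raw counts z so that the entries sum to 1."""
    total = sum(z)
    if total <= 0:
        raise ValueError("series needs a positive total")
    return [value / total for value in z]


def teager_kaiser(x):
    """TK(x_t) = x_t^2 - x_{t-1} * x_{t+1} for every interior index t."""
    return [x[t] ** 2 - x[t - 1] * x[t + 1] for t in range(1, len(x) - 1)]


def onset_index(x, threshold):
    """Index of the earliest point whose |TK| exceeds threshold, else None.

    The threshold is a hypothetical tuning parameter, not taken from the paper.
    """
    for offset, energy in enumerate(teager_kaiser(x)):
        if abs(energy) > threshold:
            return offset + 1  # teager_kaiser starts at index 1 of x
    return None


# Toy series: no activity at first, then a burst that fades out.
z = [0, 0, 0, 1, 4, 9, 6, 3, 1, 0]
x = to_probability_vector(z)
t_o = onset_index(x, threshold=1e-4)
x_from_onset = x[t_o:]  # the part of the series that would be used for fitting
```

Normalizing first makes series from services with very different traffic volumes comparable before any onset detection or model fitting is attempted.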

  12. Immediate Observations and Implications • Over the years, there is a growing correlation between the frequencies of meme-related queries to Google and the activities of the Delicious community.

  13. Immediate Observations and Implications • In an attempt to quantify this observation, the author examined weighted average annual correlations between series from Google Insights and their counterparts from the other services. • avgcorr(y): the weighted average, over the memes, of the correlation in year y between a meme's Google Insights series and its counterpart from another service. • The largest correlations are between the Google and the Delicious time series.
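A rough sketch of such an average annual correlation is given below. The slides do not preserve the exact formula or the weighting scheme, so everything here is an assumption for illustration: Pearson correlation is used within each year, `weights` defaults to uniform, and the series are assumed to be monthly and aligned in time.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def avg_annual_correlation(google, other, year, weights=None, per_year=12):
    """Weighted mean over memes of the within-year correlation.

    google, other: lists of series (one per meme), aligned in time.
    year: 0-based year index into the series.
    weights: hypothetical per-meme weights; uniform if omitted.
    """
    lo, hi = year * per_year, (year + 1) * per_year
    if weights is None:
        weights = [1.0] * len(google)
    corrs = [pearson(g[lo:hi], o[lo:hi]) for g, o in zip(google, other)]
    return sum(w * c for w, c in zip(weights, corrs)) / sum(weights)
```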

  14. Immediate Observations and Implications • The author compared the interests and behaviors of different communities by determining average daily activities and ranking them in descending order. • The twenty highest-ranking memes according to per-day popularity in the Delicious, Digg, and StumbleUpon communities.

  15. Immediate Observations and Implications • The ranking for Digg reflects its role as a social news service: if content or stories that just showed up on the Internet are posted at Digg, users react quickly to the news. • The shorter the time since onset, the more meme-related daily activity there is. On the other hand, memes that have been around for a while hardly provoke further reactions from the Digg community. • A quarter of the memes that are most popular among the StumbleUpon community have to do with rather artistic content, which is in contrast to the most popular memes determined from Delicious. Users of recommendation engines seem to be after sophisticated content more than mundane jokes or fads.

  16. Modeling Meme Dynamics • Investigate the use of two classes of models and argue why log-normal distributions are well suited to represent the temporal dynamics of Internet memes.

  17. Compartment Models • Compartment models are an established approach to describe the progress of an epidemic in a large population. • SIRS model: the population is divided into those who are susceptible to the disease (S), those who are infectious (I), and those who have recovered (R). The three groups are governed by the following differential equations: $\dot{S}(t) = -\beta I(t) S(t) + \phi R(t)$, $\dot{I}(t) = \beta I(t) S(t) - \nu I(t)$, $\dot{R}(t) = \nu I(t) - \phi R(t)$.
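To make the SIRS dynamics concrete, the sketch below integrates the three equations with a plain forward-Euler scheme. It only illustrates how the compartments interact; the step size, parameter values, and function name are arbitrary assumptions, and it does not reproduce the model fitting described in the talk.

```python
def simulate_sirs(beta, nu, phi, s0, i0, r0, steps, dt=0.1):
    """Forward-Euler integration of the SIRS equations.

    beta: infection rate, nu: recovery rate, phi: rate of losing immunity.
    Returns a list of (S, I, R) tuples, one per time step.
    """
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        ds = -beta * i * s + phi * r
        di = beta * i * s - nu * i
        dr = nu * i - phi * r
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        trajectory.append((s, i, r))
    return trajectory


# Illustrative parameters only: 1% of the population is infectious at the start.
trajectory = simulate_sirs(beta=0.8, nu=0.2, phi=0.05,
                           s0=0.99, i0=0.01, r0=0.0, steps=500)
infectious = [i for _, i, _ in trajectory]
```

Because the three derivatives sum to zero, the total S + I + R stays constant, which is a quick sanity check for any implementation.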

  18. Compartment Models • Simpler models (of type SI, SIS, SIR) have been used to study information dissemination within web-based communities (Kubo et al. 2007; Myers and Leskovec 2010) and were reported to give a good account of the interaction dynamics in social networks. • General assumption: meme-related time series available from Google Insights correspond to the infectious rates I(t) of epidemic processes.

  19. Compartment Models • However, the differential equations are nonlinear, so model fitting is complicated. The author used Markov Chain Monte Carlo methods and found SIRS-type models to provide the best explanations of meme activity data. • SIRS models reproduce the general behavior of memes but tend to underestimate the early contagious stages of a meme. This indicates that stochastic compartment models with constant parameters lack the flexibility required to accurately describe the temporal dynamics of Internet memes.

  20. Log-Normal Models • Log-normal distributions have been successfully used to model frequency distributions (Wu and Huberman 2007; Leskovec, Adamic, and Huberman 2007; Crane and Sornette 2008). • A random variable $x$ is log-normally distributed if $\log(x)$ has a normal distribution. Accordingly, the probability density function of such a random variable is $f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\log x - \mu)^2}{2\sigma^2}\right)$. • The distribution is only defined for positive values, is skewed with its mass concentrated on the left, and is often long-tailed. The mean $\mu$ and standard deviation $\sigma$ of $\log(x)$ define the curve. • Log-normal distributions arise from multiplicative growth processes governed by a time-dependent random variable $\gamma_t$ such that $x_t = \gamma_t x_{t-1}$.
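The density and the multiplicative process can be written down in a few lines. The sketch below uses the standard log-normal density and assumes the update rule $x_t = \gamma_t x_{t-1}$; the function names are illustrative and not taken from the paper.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Density of a log-normal variable; zero for non-positive x."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def multiplicative_process(x0, gammas):
    """Apply x_t = gamma_t * x_{t-1} for each factor and return the whole path."""
    path = [x0]
    for gamma in gammas:
        path.append(gamma * path[-1])
    return path


# The density peaks at the mode exp(mu - sigma^2) and has a long right tail.
mu, sigma = 1.0, 0.5
mode = math.exp(mu - sigma ** 2)
```

Taking logarithms turns the product of growth factors into a sum, $\log x_T = \log x_0 + \sum_t \log \gamma_t$, which is why many independent factors tend to produce an approximately log-normal outcome.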

  21. Log-Normal Models • Such processes are commonly applied to describe growth and decline in biological or economic systems and provide accurate models for a variety of Internet-related phenomena, but had not been applied to Internet memes. • The author determined the best-fitting log-normal distribution using least squares optimization for each of the 150 time series. • Log-normal distributions provide a highly accurate account of the temporal dynamics of the memes.
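A least-squares fit of a log-normal curve to one time series might look like the sketch below. The slides name only "least squares optimization", so the coarse grid search used here is a stand-in chosen for brevity, not the author's optimizer. The time index is assumed to start at 1 at the onset, and the synthetic data and grid ranges are purely illustrative.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Log-normal density evaluated at a positive time index t."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def squared_error(x, mu, sigma):
    """Sum of squared residuals between the series x and the model curve."""
    return sum((xt - lognormal_pdf(t, mu, sigma)) ** 2
               for t, xt in enumerate(x, start=1))


def fit_lognormal_grid(x, mus, sigmas):
    """Return the (mu, sigma) pair on the grid with the smallest squared error."""
    candidates = ((mu, sigma) for mu in mus for sigma in sigmas)
    return min(candidates, key=lambda p: squared_error(x, p[0], p[1]))


# Synthetic series generated from known parameters, then recovered by the fit.
true_mu, true_sigma = 3.0, 0.5
x = [lognormal_pdf(t, true_mu, true_sigma) for t in range(1, 85)]
mus = [2.0 + 0.1 * k for k in range(21)]     # 2.0, 2.1, ..., 4.0
sigmas = [0.1 * k for k in range(1, 11)]     # 0.1, 0.2, ..., 1.0
mu_hat, sigma_hat = fit_lognormal_grid(x, mus, sigmas)
```

In practice a gradient-based or simplex optimizer would replace the grid, but the objective being minimized stays the same.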

  22. Log-Normal Models • In order to quantify this impression, the author performed goodness-of-fit tests with all 150 memes and determined the p-values of the SIRS and log-normal fits. Log-normal models exceeded a confidence threshold of 0.9 in about 70% of the cases. • In 83% of the cases, the p-values obtained for log-normal fits exceeded those of the corresponding SIRS fits. • Kullback-Leibler divergence between an observed series $x$ and a fitted model $\hat{x}$: $D_{KL}(x \,\|\, \hat{x}) = \sum_t x_t \log \frac{x_t}{\hat{x}_t}$.
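The Kullback-Leibler divergence between an observed probability vector and a fitted curve can be computed as sketched below. The argument order (observed series first, model second) and the small `eps` guard against division by zero are assumptions for illustration; the slides do not preserve these details.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_t p_t * log(p_t / q_t).

    Terms with p_t == 0 contribute nothing; q_t is floored at eps
    (a hypothetical safeguard) so the logarithm stays finite.
    """
    return sum(pt * math.log(pt / max(qt, eps))
               for pt, qt in zip(p, q) if pt > 0)


observed = [0.5, 0.5]
model = [0.25, 0.75]
d = kl_divergence(observed, model)
```

A value of 0.0 means the model reproduces the observed distribution exactly; larger values indicate a worse fit.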

  23. Log-Normal Models • Table 1 lists the resulting DKL measures (closer to 0.0 is better) for SIRS and log-normal fits. In 55% of the cases, the log-normal fits yield better DKL measures than the SIRS model. • Log-normal distributions provide accurate descriptions.

  24. Implications and Application to Prediction • Log-normal distributions do not model the processes and mechanisms of meme spread but summarize the corresponding time series. • Work by Dover, Goldberg, and Shapira (2010) has shown that temporally log-normal diffusion rates indicate networks of log-normal link distributions. • Because the data for the majority of Internet memes are well represented by log-normal distributions, the author conjectures that they spread through rather homogeneous communities of similar interests and preferences instead of through the Internet at large.

  25. Implications and Application to Prediction • The resulting descriptions are applied to produce a compressed representation of memes in the space spanned by the shape parameters μ and σ. • The majority of memes are found in a cluster represented by the “salad fingers” meme. • Memes at the top right indicate a pattern of still increasing popularity.

  26. Implications and Application to Prediction • Forecast future evolution according to the corresponding log-normal model. • 10-year forecasts for a collection of six memes.

  27. Conclusion • The term Internet meme is used to describe evolving content that rapidly gains popularity on the Internet. • Users of the Digg social news service react to recent memes, and users of StumbleUpon appear to be interested mostly in sophisticated memes. • Compartment models give a good account of the growth and decline patterns of memes yet lack the flexibility to characterize short-lived bursts of meme-related activity. • Log-normal distributions account for time-dependent growth and decline rates.

  28. Conclusion • Taking into account the fact that log-normal diffusion processes indicate networks of log-normal link distributions (Dover, Goldberg, and Shapira 2010) and the observation that the globally scale-free Internet graph appears to contain many log-normal sub-graphs (Pennock et al. 2002), the author concludes that the majority of currently famous Internet memes spread through homogeneous communities and social networks rather than through the Internet at large.

  29. Thanks for your attention!!
