
Patterns of Cascading Behavior in Large Blog Graphs

Patterns of Cascading Behavior in Large Blog Graphs. Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst. SDM 2007. Date: 2008/8/21. Advisor: Dr. Koh, Jia Ling. Speaker: Li, Huei Jyun.


Presentation Transcript


  1. Patterns of Cascading Behavior in Large Blog Graphs • Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst • SDM 2007 • Date: 2008/8/21 • Advisor: Dr. Koh, Jia Ling • Speaker: Li, Huei Jyun

  2. Outline • Introduction • Preliminaries • Experimental setup • Observations, patterns and laws • Temporal dynamics of posts and links • Blog network and Post network topology • Patterns in the cascades • Conclusions

  3. Introduction • By examining linking patterns from one blog post to another, we can infer the way information spreads through a social network over the web • Does traffic in the network exhibit bursty and/or periodic behavior? • After a topic becomes popular, how does interest die off: linearly or exponentially?

  4. Introduction • We would also like to discover topological patterns in information propagation graphs (cascades) • Do graphs of information cascades have common shapes? • What are their properties? • What are the characteristic in-link patterns for different nodes in a cascade? • What can we say about the size distribution of cascades?

  5. Preliminaries • Blogs (weblogs) are web sites that are updated on a regular basis • Blogs are composed of posts that typically have room for comments by readers • Blogs and posts typically link to each other, as well as to other resources on the web

  6. Preliminaries

  7. Preliminaries • From the Post network, we extract information cascades

  8. Preliminaries • Cascades form two main shapes, which we refer to as stars and chains • A star occurs when a single center post is linked by several other posts, but the links do not propagate further • This produces a wide, shallow tree

  9. Preliminaries • A chain occurs when a root is linked by a single post, which in turn is linked by another post • This creates a deep tree that has little breadth
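
The star and chain definitions on slides 8 and 9 can be turned into a small classifier. The following is a minimal sketch, not the authors' code: it assumes a single connected cascade given as (citing_post, cited_post) pairs, and the function name `classify_cascade` and the "other" label are made up for illustration.

```python
from collections import defaultdict


def classify_cascade(links):
    """Label one cascade as "star", "chain", or "other".

    links: iterable of (citing_post, cited_post) pairs belonging to a
    single connected cascade (an assumption of this sketch).
    """
    in_links = defaultdict(set)   # cited post -> posts that link to it
    out_links = defaultdict(set)  # citing post -> posts it links to
    nodes = set()
    for citing, cited in links:
        in_links[cited].add(citing)
        out_links[citing].add(cited)
        nodes.update((citing, cited))

    # The root (cascade initiator) is the post that links to nothing.
    roots = [n for n in nodes if not out_links[n]]
    if len(roots) != 1 or len(nodes) < 3:
        # With fewer than 3 posts a star and a chain are indistinguishable.
        return "other"
    root = roots[0]
    others = [n for n in nodes if n != root]

    # Star: every other post links only to the root and is not linked further.
    if all(out_links[n] == {root} and not in_links[n] for n in others):
        return "star"
    # Chain: each post is linked by at most one post and links to at most one.
    if all(len(in_links[n]) <= 1 and len(out_links[n]) <= 1 for n in nodes):
        return "chain"
    return "other"
```

Cascades that mix both shapes (for example a star whose leaf is itself linked) fall into "other" here; a finer taxonomy would need more rules.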

  10. Experimental setup • We are interested in blogs and posts that actively participate in discussions, so we biased our dataset towards the more active part of the blogosphere • Focused on the most-cited blogs and traced forward and backward the conversation trees containing these blogs • This process produced a dataset of 2,422,704 posts from 44,362 blogs gathered over a two-month period (August and September 2005) • There are 245,404 links among the posts of our dataset

  11. Experimental setup • Reduced time resolution to one day • Removed edges pointing to webpages outside the dataset and to posts supposedly written in the future • Removed links where a post pointed to itself (although a link to a previous post in the same blog was allowed)
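
The three cleaning steps on slide 11 could look roughly like the sketch below. It is an illustration under assumptions, not the original pipeline: the input shapes (a dict of post timestamps and a list of (citing_post, cited_post) pairs) and the name `clean_links` are hypothetical, and same-day links are kept because their order cannot be determined at one-day resolution.

```python
from datetime import datetime


def clean_links(post_times, links):
    """Apply the slide-11 filters to a list of post-to-post links.

    post_times: dict mapping post id -> datetime of publication
    links: iterable of (citing_post, cited_post) pairs
    Returns (days, kept) where days maps post id -> date.
    """
    # Reduce time resolution to one day.
    days = {post: ts.date() for post, ts in post_times.items()}

    kept = []
    for citing, cited in links:
        # Drop edges that point outside the dataset.
        if citing not in days or cited not in days:
            continue
        # Drop links where a post points to itself.
        if citing == cited:
            continue
        # Drop links to posts supposedly written in the future.
        if days[cited] > days[citing]:
            continue
        kept.append((citing, cited))
    return days, kept


# Made-up example data (not from the study).
post_times = {
    "p1": datetime(2005, 8, 1, 9, 30),
    "p2": datetime(2005, 8, 2, 14, 0),
    "p3": datetime(2005, 8, 3, 8, 15),
}
links = [
    ("p2", "p1"),        # valid: later post cites earlier post
    ("p2", "p2"),        # self-link
    ("p1", "p3"),        # cites a post from the future
    ("p3", "external"),  # target outside the dataset
]
days, kept = clean_links(post_times, links)
```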

  12. Observations, patterns, and laws • Temporal dynamics of posts and links • Traffic in the blogosphere is not uniform • Posting and blog-to-blog linking patterns tend to have a weekend effect, with frequency dropping off sharply at weekends

  13. Observations, patterns, and laws • Examine how a post’s popularity grows and declines over time • Collect all in-links to a post and plot the number of links arriving on each day after the post was published

  14. Observations, patterns, and laws • The weekend effect creates abnormalities • Smooth the in-link plots by applying a weighting parameter to the plots separated by day of week • For each delay Δ on the horizontal axis, we estimate the corresponding day of week d, and we prorate the count for Δ by dividing it by l(d) • l(d) is the percentage of blog links occurring on day of week d
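
The proration step on slide 14 can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the function names are hypothetical, l(d) is computed as a fraction rather than a percentage, and the sketch assumes every day of week that occurs among a post's in-links also occurs in the global link data.

```python
from collections import Counter
from datetime import date, timedelta


def weekday_shares(link_dates):
    """Return l(d): the fraction of all links on each day of week d (0 = Monday)."""
    counts = Counter(d.weekday() for d in link_dates)
    total = sum(counts.values())
    return {weekday: n / total for weekday, n in counts.items()}


def prorated_inlinks(post_date, inlink_dates, shares):
    """Count in-links per delay (in days) and divide each count by l(d)."""
    raw = Counter((d - post_date).days for d in inlink_dates)
    smoothed = {}
    for delay, n in sorted(raw.items()):
        # Day of week on which a link with this delay was created.
        weekday = (post_date + timedelta(days=delay)).weekday()
        smoothed[delay] = n / shares[weekday]
    return smoothed


# Made-up dates for illustration (not data from the study).
all_links = [date(2005, 8, 1)] * 3 + [date(2005, 8, 2)]
shares = weekday_shares(all_links)
smoothed = prorated_inlinks(
    date(2005, 8, 1), [date(2005, 8, 1), date(2005, 8, 2)], shares
)
```

Dividing by l(d) inflates counts on quiet days and deflates counts on busy days, which removes the weekly dips from the popularity curve.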

  15. Observations, patterns, and laws • Fit the power-law distribution with a cut-off in the tail • Fit on 30 days of data, as most posts in the graph have complete in-link patterns for the 30 days following publication • Found a stable power-law exponent of around -1.5
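
One simple way to estimate such an exponent is a least-squares line on log-log axes, sketched below. Note that this is a swapped-in technique: the slide does not say how the fit was done, and this sketch ignores the cut-off in the tail. The function name `fit_power_law_exponent` is hypothetical.

```python
import math


def fit_power_law_exponent(delays, counts):
    """Estimate the slope of log(count) versus log(delay) by least squares.

    Pairs with a non-positive delay or count are skipped because their
    logarithm is undefined.
    """
    points = [
        (math.log(d), math.log(c))
        for d, c in zip(delays, counts)
        if d > 0 and c > 0
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Synthetic check: counts generated from an exact power law with exponent -1.5.
synthetic_delays = list(range(1, 31))
synthetic_counts = [1000.0 * d ** -1.5 for d in synthetic_delays]
slope = fit_power_law_exponent(synthetic_delays, synthetic_counts)
```

On real, noisy in-link counts a log-log regression can be biased, so the result should be treated as a rough estimate.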

  16. Observations, patterns, and laws • Blog network and Post network topology • Blog network’s topology

  17. Observations, patterns, and laws • Blog network and Post network topology • Blog network’s topology

  18. Observations, patterns, and laws • Blog network and Post network topology • Post network’s topology

  19. Observations, patterns, and laws • Patterns in the cascades • Found all cascade initiator nodes, i.e., nodes that have zero out-degree, and started following their in-links • This process gives us a directed acyclic graph with a single root node
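
The extraction procedure on slide 19 amounts to a breadth-first traversal along in-links. A minimal sketch, assuming the Post network is given as (citing_post, cited_post) pairs; the name `extract_cascades` is hypothetical, and posts with no links at all never appear in the input, so they are not reported as cascades.

```python
from collections import defaultdict, deque


def extract_cascades(links):
    """Map each cascade initiator to the set of posts in its cascade."""
    in_links = defaultdict(list)  # cited post -> posts that link to it
    has_out_link = set()
    nodes = set()
    for citing, cited in links:
        in_links[cited].append(citing)
        has_out_link.add(citing)
        nodes.update((citing, cited))

    cascades = {}
    # Cascade initiators are the posts with zero out-degree.
    for root in nodes - has_out_link:
        seen = {root}
        queue = deque([root])
        while queue:
            post = queue.popleft()
            # Follow in-links: every post that cites this one joins the cascade.
            for citing in in_links[post]:
                if citing not in seen:
                    seen.add(citing)
                    queue.append(citing)
        cascades[root] = seen
    return cascades
```

A post that links to two different initiators shows up in both cascades, so the cascades returned here may overlap.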

  20. Observations, patterns, and laws • Common cascade shapes • A node represents a post and the influence flows from the top to the bottom • Cascades tend to be wide and not too deep – stars and shallow bursty cascades are the most common type of cascades

  21. Observations, patterns, and laws • Cascade topological properties • What is the common topological pattern in the cascades? • From the Post network we extract all the cascades and measure the overall degree distribution

  22. Observations, patterns, and laws • The in-degree exponent is stable and does not change much with the level L in the cascade • A node is at level L if it is L hops away from the cascade initiator • Posts still attract attention even if they are somewhat late in the cascade and appear towards the bottom of it
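
To compare in-degrees across levels, each post first needs a level. The sketch below assigns levels by breadth-first search from the initiator and groups in-degrees by level. It is an illustration only: when a post can be reached along several paths it uses the shortest hop count, which is an assumption the slide does not spell out, and the name `in_degree_by_level` is hypothetical.

```python
from collections import defaultdict, deque


def in_degree_by_level(root, links):
    """Return {level: [in-degree of each post at that level]} for one cascade.

    root: the cascade initiator
    links: iterable of (citing_post, cited_post) pairs
    """
    in_links = defaultdict(list)  # cited post -> posts that link to it
    for citing, cited in links:
        in_links[cited].append(citing)

    # Breadth-first search along in-links gives the hop distance from the root.
    level = {root: 0}
    queue = deque([root])
    while queue:
        post = queue.popleft()
        for citing in in_links[post]:
            if citing not in level:
                level[citing] = level[post] + 1
                queue.append(citing)

    by_level = defaultdict(list)
    for post, hops in level.items():
        by_level[hops].append(len(in_links[post]))
    return dict(by_level)
```

The per-level lists can then be turned into degree distributions and fitted separately to check whether the exponent stays stable.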

  23. Observations, patterns, and laws • What distribution do cascade sizes follow? • Does the probability of observing a cascade on n nodes decrease exponentially with n? • Examine the cascade size distributions over the bag of cascades extracted from the Post network
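
An empirical size distribution over a bag of cascades can be computed as in the sketch below. It assumes each cascade is available as a collection of its posts; the function name is hypothetical and the example data is made up.

```python
from collections import Counter


def cascade_size_distribution(cascades):
    """Return {n: fraction of cascades that contain exactly n posts}."""
    sizes = Counter(len(nodes) for nodes in cascades)
    total = sum(sizes.values())
    return {n: count / total for n, count in sorted(sizes.items())}


# Made-up cascades for illustration (not data from the study).
example = [{"a", "b"}, {"c", "d"}, {"e", "f", "g"}, {"h", "i", "j", "k"}]
distribution = cascade_size_distribution(example)
```

Plotting these fractions against n on log-log axes shows whether the decay looks like a straight line (power law) or bends downward (exponential).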

  24. Observations, patterns, and laws • All follow a heavy-tailed distribution, with slopes ≈ -2 overall • The probability of observing a cascade on n nodes follows a Zipf distribution: p(n) ∝ n^-2 • Stars have a power-law exponent ≈ -3.1 • Chains are small and rare, and decay with exponent ≈ -8.5

  25. Conclusion • We set out to find how blogs behave and how information propagates through the blogosphere • Temporal patterns: • The decline of a post’s popularity follows a power law, rather than an exponential drop-off as might be expected

  26. Conclusion • Topological patterns: • Almost any metric we examined follows a power law: size of cascades, size of blogs, in- and out-degrees • Stars and chains are basic components of cascades, with stars being more common
