Exploring Blog Networks

Exploring Blog Networks: Patterns and a Model for Information Propagation (as seen at SIAM Data Mining 2007). Mary McGlohon, in collaboration with Jure Leskovec, Christos Faloutsos, Natalie Glance, and Matthew Hurst. Sandia National Labs, July 6, 2007.

Presentation Transcript


  1. Exploring Blog Networks: Patterns and a Model for Information Propagation (as seen at SIAM Data Mining 2007) • Mary McGlohon, in collaboration with Jure Leskovec, Christos Faloutsos, Natalie Glance, and Matthew Hurst • Sandia National Labs, July 6, 2007

  2. Long-term Goals • How does information on the Web propagate? • With what pattern do ideas catch on, diffuse, and decrease in popularity? • Can we build a model for this propagation?

  3. Why blogs? • Blogs are a widely used medium of information for many topics and have become an important mode of communication. • Blogs cite one another, creating a record of how information and ideas spread through a social network. • This record is publicly available.

  4. Why do we care? • Understanding how the blog network works is important for: • Social issues: Political mapping, social trends and change, reactions to mass media. • Economic issues: Marketing, predicting commercial success, discovering links between companies. Example: blogs in the 2004 election. [Adamic, Glance 2005]

  5. Immediate Goals • Temporal questions: Does popularity have a half-life? Is there periodicity? • Topological questions: What topological patterns do posts and blogs follow? What shapes do cascades take on? Stars? Chains? Something else? • Generative model: Can we build a generative model that mimics properties of cascades?

  6. Outline • Motivation • Preliminaries • Concepts and terminology • Data • Temporal Observations • Topological Observations • Cascade Generation Model • Discussion & Conclusions

  7. What is a blog? • A blog is a frequently-updated webpage. • A blog’s author updates the blog using posts. • Each post has a permanent hyperlink, and may contain links to other blog posts. [Figure: two example blogs, slashdot and boingboing.]

  10. What is a blog? • A blog is a frequently-updated webpage. • A blog’s author updates the blog using posts. • Each post has a permanent hyperlink, and may contain links to other blog posts. [Figure: example posts on the blogs slashdot and boingboing.] • First post, on slashdot: “The iPhone is here, hooray!” • Second post, on boingboing: “At this link, Slashdot says the iPhone has arrived. But I’m not buying one, because …” • Third post, linking to the second: “Here Boingboing says they’re not buying an iPhone. They’re just jealous.”

  11. From blogs to networks [Figure: the blogosphere network (slashdot, boingboing, MichelleMalkin, Dlisted) and the blog network, post network, and cascades derived from it.] • Non-trivial vs. trivial cascades • Stars vs. chains • Nodes a, b, c, d are cascade initiators; e is a connector.

  12. From networks to cascades [Figure: the blogosphere network (slashdot, boingboing, MichelleMalkin, Dlisted) and its cascades.] • Non-trivial vs. trivial cascades

  13. From networks to cascades [Figure: the blogosphere network (slashdot, boingboing, MichelleMalkin, Dlisted) and its cascades.] • Non-trivial vs. trivial cascades • Cascade initiators are the first sources of information. • We also have stars and chains.
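One way the step from posts to blogs might look in code is sketched below. This is an illustration only: the input format, the names `post_blog`, `post_links`, and `build_blog_network`, and the choice to use the number of post-to-post links as the edge weight are all assumptions, not details given in the slides.

```python
from collections import Counter

def build_blog_network(post_blog, post_links):
    """Collapse post-to-post links into weighted blog-to-blog edges.

    post_blog:  dict mapping post id -> blog name (hypothetical input format)
    post_links: iterable of (source_post, target_post) pairs, meaning the
                source post contains a hyperlink to the target post
    Returns a dict mapping (source_blog, target_blog) -> number of links
    (using the link count as the weight is an assumption).
    """
    weights = Counter()
    for src, dst in post_links:
        weights[(post_blog[src], post_blog[dst])] += 1
    return dict(weights)

# Toy data loosely mirroring the slashdot/boingboing example.
post_blog = {"s1": "slashdot", "b1": "boingboing", "s2": "slashdot"}
post_links = [("b1", "s1"), ("s2", "b1")]
blog_net = build_blog_network(post_blog, post_links)
```

The post network is just `post_links` itself; the blog network is the same set of links with every post replaced by the blog that published it.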

  14. Dataset (Nielsen Buzzmetrics) • Gathered from August-September 2005* • Used set of 44,362 blogs, traced cascades • 2.4 million posts, ~5 million out-links, 245,404 blog-to-blog links [Figure: number of posts vs. time, in 1-day bins.]

  15. Outline • Motivation • Preliminaries • Concepts and terminology • Data • Temporal Observations • Does blog traffic behave periodically? • How does popularity change over time? • Topological Observations • Cascade Generation Model • Discussion & Conclusions • Future Work

  16. Temporal Observations Does blog traffic behave periodically? • Posts show a “weekend effect”: there is less traffic on Saturday and Sunday.

  17. Temporal Observations Does blog traffic behave periodically? • Monday appears to compensate for the weekend drop, but this is not actually the case. • We normalize the data: $\mathrm{count}_{\mathrm{norm}} = \mathrm{count} / p_d$, where $p_d$ is the percentage of links on that day. [Figure: number of in-links (log) vs. days after post for Monday posts, shown raw and normalized.]
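The normalization above can be sketched as follows. This is a minimal illustration that assumes $p_d$ is the fraction of all links falling on a given day of the week; the function names and the toy data are hypothetical.

```python
from collections import Counter

def day_fractions(link_days):
    """Fraction p_d of all links that fall on each day label."""
    totals = Counter(link_days)
    n = sum(totals.values())
    return {day: c / n for day, c in totals.items()}

def normalize(count, day, p):
    """count_norm = count / p_d for the day on which the links arrived."""
    return count / p[day]

# Toy data: one day-of-week label per observed link (not from the dataset).
link_days = ["Mon"] * 4 + ["Sat"] * 1 + ["Tue"] * 5
p = day_fractions(link_days)

# A quiet Saturday and a busy Monday can look alike once normalized.
sat_norm = normalize(2, "Sat", p)
mon_norm = normalize(8, "Mon", p)
```

Dividing by $p_d$ scales up counts on days when few links are written at all, so a drop on the weekend is not mistaken for a drop in a post's popularity.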

  18. Temporal Observations How does post popularity change over time? • Post popularity dropoff follows a power law identical to that found in communication response times in [Vazquez2006]. • Observation 1: The probability that a post written at time $t_p$ acquires a link at time $t_p + \Delta$ is $p(t_p + \Delta) \propto \Delta^{-1.5}$. [Figure: number of in-links vs. days after post.]
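A power-law exponent such as the $-1.5$ in Observation 1 can be read off as the slope of a straight line on a log-log plot. The sketch below uses ordinary least squares on the logarithms; this is a simple illustrative choice, not necessarily the fitting method used in the talk, and the data are synthetic.

```python
import math

def fit_power_law_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x).

    For y proportional to x**a, the returned slope estimates a.
    All xs and ys must be strictly positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Synthetic in-link counts that decay exactly as delta ** -1.5.
days = [1, 2, 3, 4, 5, 6, 7, 8]
links = [1000 * d ** -1.5 for d in days]
slope = fit_power_law_exponent(days, links)
```

On noisy real counts the estimate would depend on binning and on how the sparse tail is handled.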

  19. Outline • Motivation • Preliminaries • Temporal Observations • Does blog traffic behave periodically? • How does post popularity change over time? • Topological Observations • What are graph properties for blog networks? • What shapes do cascades take on? Stars, chains, or something else? • Cascade Generation Model • Discussion & Conclusions • Future Work

  21. Topological Observations What graph properties does the blog network exhibit? How connected? • 44,356 nodes, 122,153 edges • Half of the blogs belong to the largest connected component.
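The size of the largest connected component can be measured with a union-find structure, as in the sketch below. It assumes "connected" means weakly connected (edge direction ignored), which the slides do not state; the function name and toy graph are hypothetical.

```python
from collections import Counter

def largest_component_fraction(nodes, edges):
    """Fraction of nodes in the largest weakly connected component.

    Every endpoint in `edges` must also appear in `nodes`.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    sizes = Counter(find(v) for v in parent)
    return max(sizes.values()) / len(parent)

# Toy blog network: B1, B2, B4 are linked together, B3 is isolated.
nodes = ["B1", "B2", "B3", "B4"]
edges = [("B1", "B2"), ("B2", "B1"), ("B4", "B2"), ("B4", "B1")]
fraction = largest_component_fraction(nodes, edges)  # 0.75
```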

  22. Topological Observations What power laws does the blog network exhibit? • Both in- and out-degree follow power-law distributions: the in-degree exponent is -1.7 and the out-degree exponent is near -3. • This suggests a strong rich-get-richer phenomenon. [Figure: count vs. number of blog in-links and count vs. number of blog out-links, all on log scales.]

  23. Topological Observations How are blog in- and out-degree related? • In-links and out-links are essentially uncorrelated (correlation coefficient 0.16). [Figure: number of blog out-links vs. number of blog in-links, log scales.]
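A correlation coefficient like the 0.16 above can be computed directly from an edge list. The sketch below assumes the Pearson coefficient is meant (the slide does not say which) and uses a hypothetical toy graph.

```python
import math
from collections import Counter

def degree_correlation(nodes, edges):
    """Pearson correlation between in-degree and out-degree over `nodes`.

    edges: (src, dst) pairs meaning src links to dst.
    Raises ZeroDivisionError if either degree sequence is constant.
    """
    indeg = Counter(dst for _, dst in edges)
    outdeg = Counter(src for src, _ in edges)
    xs = [indeg[v] for v in nodes]   # Counter returns 0 for missing nodes
    ys = [outdeg[v] for v in nodes]
    n = len(nodes)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy graph: one blog links out to three others and receives no links.
r = degree_correlation(["a", "b", "c", "d"],
                       [("a", "b"), ("a", "c"), ("a", "d")])
```

In this toy graph the only blog with out-links has no in-links, so the coefficient is strongly negative; a value near zero, as reported for the blog network, means that linking out a lot says little about being linked to.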

  25. Topological Observations What graph properties does the post network exhibit? Very sparsely connected: 98% of posts are isolated.

  26. Topological Observations What power laws does the post network exhibit? • Both in- and out-degree follow power laws. • In-degree has PL exponent -2.15, out-degree has PL exponent -2.95. [Figure: count vs. post in-degree and count vs. post out-degree.]

  29. Topological Observations How do we measure how information flows through the network? We gather cascades using the following procedure: • Find all initiators (out-degree 0). • Follow in-links. • This produces a directed acyclic graph. [Figure: an example post network on nodes a, b, c, d, e and the cascades extracted from it.]
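The procedure above can be sketched as a reverse graph traversal. This is a minimal illustration, not the authors' implementation: the function name, the input format, and the toy post network are assumptions.

```python
from collections import defaultdict

def extract_cascades(posts, links):
    """Gather one cascade per initiator by following in-links.

    links: (src, dst) pairs meaning post src links to post dst.
    Returns {initiator: (set_of_nodes, list_of_edges)}.
    """
    out_deg = defaultdict(int)
    in_links = defaultdict(list)
    for src, dst in links:
        out_deg[src] += 1
        in_links[dst].append(src)

    cascades = {}
    for p in posts:
        if out_deg[p] != 0:
            continue  # links to another post, so it is not an initiator
        seen, stack, edges = {p}, [p], []
        while stack:
            v = stack.pop()
            for u in in_links[v]:      # posts that link to v
                edges.append((u, v))
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        cascades[p] = (seen, edges)
    return cascades

# Toy post network: b and c link to a, e links to both b and c, d is isolated.
posts = ["a", "b", "c", "d", "e"]
links = [("b", "a"), ("c", "a"), ("e", "b"), ("e", "c")]
cascades = extract_cascades(posts, links)
```

Here `a` starts a four-node cascade and the isolated post `d` forms a trivial one-node cascade. A post that links into two different initiators would appear in both cascades.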

  30. Topological Observations How do we measure how information flows through the network? Common cascade shapes are extracted using algorithms in [Leskovec2006].

  31. Topological Observations How do we measure how information flows through the network? • The number of edges increases linearly with cascade size, while the effective diameter increases logarithmically, suggesting tree-like structures. [Figure: number of edges vs. cascade size (# nodes) and effective diameter vs. cascade size.]

  32. Topological Observations How do we measure how information flows through the network? • We work with a bag of cascades: each cascade is a disconnected subgraph. • We now explore some graph properties of cascades.

  33. Topological Observations What graph properties do cascades exhibit? • As before, in- and out-degree in the bag of cascades follow power laws. [Figure: count vs. cascade node out-degree and count vs. cascade node in-degree.]

  35. Topological Observations What graph properties do cascades exhibit? • Cascade size distributions also follow a power law. • Observation 2: The probability of observing a cascade on $n$ nodes follows a Zipf distribution: $p(n) \propto n^{-2}$. [Figure: count vs. cascade size (# of nodes).]

  37. Topological Observations What graph properties do cascades exhibit? • The sizes of stars and chains also follow power laws, with different exponents (star -3.1, chain -8.5). [Figure: count vs. size of star (# nodes) and count vs. size of chain (# nodes).]
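To count stars and chains, each cascade first has to be labelled by shape. The sketch below uses working definitions that are assumptions rather than the talk's own: a star is a cascade in which every other post links directly to one central post, and a chain is a single path of posts each linking to the next. Cascades with fewer than three nodes are left unlabelled because the two shapes coincide there.

```python
from collections import Counter

def classify_cascade(nodes, edges):
    """Label a cascade as 'star', 'chain', or 'other'.

    edges: (src, dst) pairs meaning post src links to post dst.
    Assumes no self-links; duplicate edges are ignored.
    """
    edges = set(edges)
    n = len(nodes)
    if n < 3 or len(edges) != n - 1:
        return "other"
    indeg = Counter(dst for _, dst in edges)
    outdeg = Counter(src for src, _ in edges)
    if max(indeg.values()) == n - 1:
        return "star"  # all n-1 edges point at a single center
    if max(indeg.values()) == 1 and max(outdeg.values()) == 1:
        # Walk from the only post nobody links to and check that the
        # path visits every node.
        starts = [v for v in nodes if indeg[v] == 0]
        if len(starts) == 1:
            nxt = dict(edges)
            v, visited = starts[0], 1
            while v in nxt:
                v = nxt[v]
                visited += 1
            if visited == n:
                return "chain"
    return "other"

star = classify_cascade(["c", "x", "y", "z"],
                        [("x", "c"), ("y", "c"), ("z", "c")])
chain = classify_cascade(["a", "b", "c"], [("c", "b"), ("b", "a")])
tree = classify_cascade(["a", "b", "c", "d"],
                        [("b", "a"), ("c", "a"), ("d", "b")])
```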

  38. Outline • Motivation • Preliminaries • Temporal Observations • Topological Observations • What are graph properties for blog networks? • What shapes and patterns do cascades take on? • Cascade Generation Model • Epidemiological Background • Proposed Model • Experimental Validation • Discussion & Conclusions • Future Work

  39. Epidemiological models • We consider modeling cascade generation as an epidemic, with ideas as viruses. • We use the SIS model: • At any time, an entity is in one of two states: susceptible or infected. • One parameter, $\beta$, determines how easily conversations spread. • [Hethcote2000]
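A single step of an SIS-style process might look like the sketch below. The slide names only one parameter, so the sketch assumes that an infected node tries to infect each neighbor with probability `beta` and then returns to the susceptible state after one step; that recovery rule, the function name, and the toy graph are assumptions.

```python
import random

def sis_step(infected, neighbors, beta, rng):
    """One synchronous step of a simplified SIS process.

    Each currently infected node infects each of its neighbors with
    probability beta, then becomes susceptible again (assumed rule).
    Returns the set of nodes infected in the next step.
    """
    newly_infected = set()
    for v in infected:
        for u in neighbors.get(v, ()):
            if rng.random() < beta:
                newly_infected.add(u)
    return newly_infected

# Toy contact structure: node -> nodes it can infect.
neighbors = {"B1": ["B2"], "B2": ["B1", "B3"], "B3": []}
rng = random.Random(0)
step1 = sis_step({"B1"}, neighbors, 1.0, rng)  # beta = 1 always spreads
step2 = sis_step(step1, neighbors, 1.0, rng)
```

Because nodes return to the susceptible state, `B1` can be infected again in the second step. That is the property that separates SIS from models with permanent immunity.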

  47. Cascade Generation Model 0. Begin with Blog Net. [Figure: Blog Net on blogs B1, B2, B3, B4 with weighted edges.]

  48. Cascade Generation Model 0. Begin with Blog Net, but ignore edge weights. Example: B1 links to B2, B2 links to B1, and B4 links to B2 and B1 as well as to itself. B3 is isolated, linking only to itself. [Figure: Blog Net on blogs B1, B2, B3, B4 without edge weights.]

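  49. Cascade Generation Model 1. Randomly pick a blog to infect and add a node for it to the cascade. [Figure: B1 is infected in Blog Net (B1, B2, B3, B4) and the cascade contains the single node B1.]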

  50. Cascade Generation Model 2. Infect each in-linked neighbor with probability $\beta$. [Figure: Blog Net (B1, B2, B3, B4) with the cascade started at B1.]
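Steps 0 to 2 can be sketched as below. The transcript stops after step 2, so the rest is an assumption: each newly infected blog is added to the cascade with an edge to the node that infected it, and step 2 is repeated for the newly infected blogs until nothing new is infected. The function name, the `start` and `max_nodes` parameters, and the value `beta=0.3` are hypothetical.

```python
import random
from collections import defaultdict

def generate_cascade(blog_links, beta, rng, start=None, max_nodes=1000):
    """Grow one cascade on an unweighted blog network.

    blog_links: (src, dst) pairs meaning blog src links to blog dst.
    Returns (nodes, edges): nodes[i] is the blog of cascade node i, and
    an edge (j, i) means node j was infected by node i.
    """
    in_neighbors = defaultdict(list)
    blogs = set()
    for src, dst in dict.fromkeys(blog_links):  # step 0: ignore edge weights
        blogs.update((src, dst))
        in_neighbors[dst].append(src)

    if start is None:
        start = rng.choice(sorted(blogs))       # step 1: pick a random blog
    nodes, edges, frontier = [start], [], [0]
    # Assumed continuation: repeat step 2 for every newly infected node.
    # max_nodes is a safety cap, since beta near 1 may never die out.
    while frontier and len(nodes) < max_nodes:
        next_frontier = []
        for i in frontier:
            for u in in_neighbors[nodes[i]]:    # step 2: in-linked neighbors
                if len(nodes) < max_nodes and rng.random() < beta:
                    nodes.append(u)
                    edges.append((len(nodes) - 1, i))
                    next_frontier.append(len(nodes) - 1)
        frontier = next_frontier
    return nodes, edges

# The example Blog Net from the slides, including the two self-links.
blog_net = [("B1", "B2"), ("B2", "B1"), ("B4", "B2"), ("B4", "B1"),
            ("B4", "B4"), ("B3", "B3")]
nodes, edges = generate_cascade(blog_net, beta=0.3, rng=random.Random(7))
```

A blog may appear more than once in the same cascade because, as in the SIS model, it can be infected again. Each cascade node therefore plays the role of a post rather than a blog.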
