
Dynamics of Conversations

Dynamics of Conversations. ACM SIGKDD ’10. By Ravi Kumar, Mohammad Mahdian, and Mary McGlohon. Presented by Annie T. Chen on March 29, 2011. Overview: the research question is what structure online conversations have. Method: the authors proposed a simple mathematical model for the structure of conversations.


Presentation Transcript


  1. Dynamics of Conversations ACM SIGKDD ’10 By Ravi Kumar, Mohammad Mahdian, & Mary McGlohon Presented by Annie T. Chen on March 29, 2011.

  2. Overview • RQ: What is the structure of online conversations? • Method • Proposed a simple mathematical model for the structure of conversations • Extended it to account for factors such as recency and author identity that may affect conversations • Compared the predictions of these models against empirical data from three datasets: Usenet groups, Yahoo! Groups, and Twitter

  3. Properties of Conversations • Size and depth of thread • Depth: length of the longest path from the root to a leaf in a thread • Size is roughly quadratic in depth • Degree distribution p • Close to a power law: p(k) ∝ k^(−c) for some constant c > 2
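The three properties on this slide can be computed directly from a thread stored as a list of parent pointers. The following is a minimal sketch, assuming each message records the index of the message it replies to; the function name `thread_metrics` and the example thread are hypothetical and are not taken from the paper or its datasets.

```python
from collections import Counter

def thread_metrics(parent):
    """Return (size, depth, p) for one thread.

    parent[i] is the index of the message that message i replies to;
    the root has parent None. Assumes every parent appears before its
    children in the list (parent[i] < i).
    """
    depth = [0] * len(parent)
    children = Counter()
    for node, par in enumerate(parent):
        if par is not None:
            depth[node] = depth[par] + 1   # one level below its parent
            children[par] += 1             # par received one more reply
    size = len(parent)
    max_depth = max(depth)                 # longest root-to-leaf path
    # p[k] = fraction of nodes with exactly k children
    degree_counts = Counter(children.get(node, 0) for node in range(size))
    p = {k: c / size for k, c in sorted(degree_counts.items())}
    return size, max_depth, p

# Hypothetical thread: 1 and 2 reply to the root 0, 3 replies to 1, 4 replies to 3.
size, depth, p = thread_metrics([None, 0, 0, 1, 3])
print(size, depth, p)
```

Pooling `p` over all threads in a dataset gives the empirical degree distribution that the later models take as input or try to reproduce.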

  4. Branching Process Model (BP-Model) - 1 • The Galton-Watson branching process is a classic model for generating a random tree. • At the i-th step of the process, each node generates a number of children drawn from the distribution p • p(k): fraction of nodes with k children in the data • Z_i: number of nodes at the i-th level of the thread • Let μ = E[p], the mean of the distribution p

  5. Branching Process Model (BP-Model) - 2 • From the definition of a branching process, it can be shown that E[Z] = (1 − μ)^(−1), where Z is the total number of nodes in the thread and μ is the mean of the distribution p • Since μ < 1 for all datasets, the branching process dies out. (Figure panels: Empirical, Simulated)
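The expected-size formula can be checked numerically. The sketch below simulates a Galton-Watson process and compares the average thread size with 1 / (1 − mean of p). The offspring distribution used here is a made-up example with mean 0.6, not one measured from Usenet, Yahoo! Groups, or Twitter, and `simulate_thread_size` is a hypothetical helper name.

```python
import random

def simulate_thread_size(p, rng, max_nodes=100_000):
    """Total number of nodes in one Galton-Watson tree with offspring distribution p."""
    ks = list(p.keys())
    weights = list(p.values())
    alive = 1      # nodes in the current generation (the root alone at level 0)
    total = 1
    while alive > 0 and total < max_nodes:
        # every node of the current generation draws its number of children from p
        alive = sum(rng.choices(ks, weights=weights, k=alive))
        total += alive
    return total

# Hypothetical offspring distribution (assumption, for illustration only).
p = {0: 0.6, 1: 0.2, 2: 0.2}
mu = sum(k * pk for k, pk in p.items())          # mean number of children
rng = random.Random(0)
sizes = [simulate_thread_size(p, rng) for _ in range(20_000)]
print("mean of p:", mu)
print("predicted mean size:", 1 / (1 - mu))
print("simulated mean size:", sum(sizes) / len(sizes))
```

Because the mean is below 1, every simulated thread terminates, and the simulated average should land close to the predicted value.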

  6. Branching Process Model (BP-Model) - 3 • Problems with the BP-Model • Model is not generative (degree distributions are stipulated) • Model does not capture the depth distributions that are observed in reality • Number of children is determined by a single distribution • Timestamps are left out

  7. T-Model • Concept: new messages receive more attention than old ones • Probability of the decision to add a child to v is proportional to some function h(deg_v, r_v) of the degree and recency of v • Probability of death (the thread ending) is proportional to a constant δ • h(deg_v, r_v) = α·deg_v + τ^(r_v) for constants α ≥ 0 and τ ∈ (0, 1) • Thus, both degree and recency play a role in generating different types of threads
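The growth rule on this slide can be sketched as a small simulation. This is an illustration under stated assumptions, not the authors' implementation: the parameter names `alpha` (degree weight), `tau` (recency base), and `delta` (stopping weight) are labels chosen for this sketch, degree is taken to be the number of replies a message has received so far, and recency is taken to be the number of steps since the message was posted.

```python
import random

def simulate_t_model(alpha, tau, delta, rng, max_nodes=2_000):
    """Grow one thread; returns parent[], with parent[0] = None for the root."""
    parent = [None]
    degree = [0]      # replies received so far by each message (assumed meaning of degree)
    born = [0]        # step at which each message was added
    step = 0
    while len(parent) < max_nodes:
        step += 1
        n = len(parent)
        # weight of replying to v: alpha * degree + tau ** age (assumed form of h)
        weights = [alpha * degree[v] + tau ** (step - born[v]) for v in range(n)]
        # one extra option with weight delta: the thread ends
        choice = rng.choices(range(n + 1), weights=weights + [delta])[0]
        if choice == n:
            break
        parent.append(choice)
        degree.append(0)
        born.append(step)
        degree[choice] += 1
    return parent

rng = random.Random(1)
thread = simulate_t_model(alpha=0.2, tau=0.5, delta=0.5, rng=rng)
print(len(thread), thread[:10])
```

Raising `alpha` makes already-popular messages attract more replies, which tends to produce bushier threads, while the `tau` term shifts weight toward recently posted messages.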

  8. TI-Model - 1 • The TI-Model was developed to model author identity. • Concept: authors tend to respond to responses to their own earlier messages. • Based on the Pólya urn model • Original Pólya urn problem: • Initially, an urn has x balls of color 1 and y balls of color 2. At each time t, one ball is drawn at random and returned to the urn together with another ball of the same color. • “Rich get richer” process
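The urn process described on this slide is short enough to simulate directly. The sketch below covers only the classic two-color urn, not the TI-Model itself; the function name `polya_urn` and the starting counts are illustrative choices.

```python
import random

def polya_urn(x, y, steps, rng):
    """Run the two-color urn: draw a ball, return it with one more of the same color."""
    counts = [x, y]
    for _ in range(steps):
        # a color is drawn with probability equal to its current share of the urn
        share_of_color_1 = counts[0] / (counts[0] + counts[1])
        color = 0 if rng.random() < share_of_color_1 else 1
        counts[color] += 1
    return counts

# Several independent runs from the same start (1 ball of each color).
for seed in range(5):
    c1, c2 = polya_urn(1, 1, 1000, random.Random(seed))
    print(seed, c1 / (c1 + c2))
```

Different runs settle on very different final shares even though they start identically, which is the "rich get richer" effect: an early lead for one color is reinforced by every later draw.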

  9. TI-Model - 2 • New message v arrives with u=parent(v) • “Identity copying” effect: the author of v is either an author on path(parent(u)) or a random author (Figure panels: Empirical, Simulated)

  10. Examples • Usenet • Yahoo! Groups • Twitter

  11. Usenet • Empirical • Simulated

  12. Usenet • High degree weight α: higher degree of preferential attachment • The top groups tended to be politically related • High recency parameter τ: high recency effect • Lower-traffic groups had a higher recency effect

  13. Usenet • Identity copying rates • High value of the identity parameter (low copying rate): new authors tend to join in often • Low value (high copying rate): authors of posts tend to have already authored an earlier post

  14. Yahoo! Groups • Groups with “bushy” threads and high recency effects

  15. Twitter • Groups with “bushy” threads and high recency effects

  16. Conclusion • Employed various mathematical models to simulate patterns in online conversations • Strengths: • Incorporated time and author identity in the models • Were able to predict patterns that were found in actual datasets • Weaknesses / further directions: • Explanatory power: how well do these models explain differences between conversational environments and/or networks? • Could incorporate other elements of conversation: • Topics • Structural/semantic components of messages • Actor characteristics/roles • How well do these models emulate different types of communication tools, e.g. Twitter?

  17. References • Aldous, D. (2003). Lecture 2: Branching Processes. Accessed March 29, 2011 at http://www.stat.berkeley.edu/~aldous/Networks/lec2.pdf. • Kumar, R., Mahdian, M., & McGlohon, M. (2010). Dynamics of conversations. ACM SIGKDD 2010. • Zhu, T. (2009). Nonlinear Polya Urn Models and Self-Organizing Processes. Accessed March 29, 2011 at http://www.math.upenn.edu/grad/dissertations/tongzhudissertation.pdf.
