Microsoft Instant Messenger Communication Network How does the world communicate?

Jure Leskovec (jure@cs.cmu.edu), Machine Learning Department, http://www.cs.cmu.edu/~jure. Joint work with Eric Horvitz, Microsoft Research.


Presentation Transcript


  1. Microsoft Instant Messenger Communication Network: How does the world communicate? Jure Leskovec (jure@cs.cmu.edu) Machine Learning Department http://www.cs.cmu.edu/~jure Joint work with: Eric Horvitz, Microsoft Research

  2. Networks: Why? • Today: large on-line systems leave detailed records of social activity • On-line communities: MySpace, Facebook • Email, blogging, instant messaging • On-line publication repositories: arXiv, MEDLINE • Emergent behavior (needs lots of data): • Actions of individual nodes are independent, but global patterns and regularities emerge

  3. The Largest Social Network • What is the largest social network in the world (that we can obtain relatively easily)? • For the first time, we had a chance to look at the complete (anonymized) communication of the whole planet, using the Microsoft MSN instant messenger network

  4. Instant Messaging • Contact (buddy) list • Messaging window

  5. Instant Messaging as a Network [Figure labels: Buddy, Conversation]

  6. IM – Phenomena at planetary scale Observe social phenomena at planetary scale: • How does communication change with user demographics (distance, age, sex)? • How does geography affect communication? • What is the structure of the communication network?

  7. Communication data The record of communication • Presence data • user status events (login, status change) • Communication data • who talks to whom • Demographics data • user age, sex, location

  8. Data description: Presence • Events: • Login, Logout • Whether this is the user's first-ever login • Add/Remove/Block buddy • Add unregistered buddy (invite new user) • Change of status (busy, away, BRB, idle, …) • For each event: • User Id • Time

  9. Data description: Communication • For every conversation (session) we have a list of users who participated in the conversation • There can be multiple people per conversation • For each conversation and each user: • User Id • Time Joined • Time Left • Number of Messages Sent • Number of Messages Received
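The per-conversation record described on this slide can be sketched as a small data structure. This is only an illustration: the class names, field names, and the use of seconds as the time unit are assumptions, not the format of the actual logs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participation:
    """One user's part in one conversation (fields follow the slide)."""
    user_id: int
    time_joined: float      # assumed unit: seconds since session start
    time_left: float
    messages_sent: int
    messages_received: int

    def duration(self) -> float:
        # How long this user stayed in the conversation.
        return self.time_left - self.time_joined


@dataclass
class Conversation:
    """A session with one Participation entry per user who took part."""
    participants: List[Participation]

    def total_messages(self) -> int:
        # Every message is sent by exactly one participant.
        return sum(p.messages_sent for p in self.participants)


# Toy two-person session (made-up values).
conv = Conversation([
    Participation(user_id=1, time_joined=0.0, time_left=120.0,
                  messages_sent=5, messages_received=3),
    Participation(user_id=2, time_joined=10.0, time_left=120.0,
                  messages_sent=3, messages_received=5),
])
```

Keeping one entry per user per conversation, as the slide describes, allows sessions with more than two people without changing the layout.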

  10. Data description: Demographics • For every user (self-reported): • Age • Gender • Location (Country, ZIP) • Language • IP address (we can do a reverse geo-IP lookup)

  11. Data collection • Log size: 150 GB/day • Just copying over the network takes 8 to 10 h • Parsing and processing takes another 4 to 6 h • After parsing and compressing: ~45 GB/day • Collected data for 30 days of June 2006: • Total: 1.3 TB of compressed data
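A quick arithmetic check of the figures on this slide, assuming the units are gigabytes and terabytes (decimal) and that the compressed size is roughly constant per day. The variable names are illustrative only.

```python
RAW_GB_PER_DAY = 150         # raw log size from the slide
COMPRESSED_GB_PER_DAY = 45   # size after parsing and compressing
DAYS = 30                    # June 2006

# Total compressed volume over the collection period.
total_compressed_gb = COMPRESSED_GB_PER_DAY * DAYS
total_compressed_tb = total_compressed_gb / 1000   # decimal terabytes

# How much smaller the parsed, compressed data is than the raw log.
compression_ratio = RAW_GB_PER_DAY / COMPRESSED_GB_PER_DAY

print(f"{total_compressed_tb:.2f} TB total, ratio {compression_ratio:.1f}x")
```

Thirty days at about 45 GB/day gives about 1.35 TB, consistent with the 1.3 TB total quoted on the slide.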

  12. Network: Conversations [Figure label: Conversation]

  13. Data statistics Activity over June 2006 (30 days) • 245 million users logged in • 180 million users engaged in conversations • 17.5 million new accounts activated • More than 30 billion conversations

  14. Data statistics per day Activity on June 1, 2006 • 1 billion conversations • 93 million users logged in • 65 million different users talked (exchanged messages) • 1.5 million invitations for new accounts sent

  15. User characteristics: age

  16. Age pyramid: MSN vs. the world

  17. Conversation: Who talks to whom? • Cross-gender edges: • 300 million male-male and 235 million female-female edges • 640 million female-male edges

  18. Number of people per conversation • The maximum number of people talking simultaneously is 20, but a conversation can involve more people overall

  19. Conversation duration • Most conversations are short

  20. Conversations: number of messages • Sessions between fewer people run out of steam

  21. Time between conversations • Individuals are highly diverse • What is the probability of logging into the system after t minutes? • Power law with exponent 1.5 • Task queuing model [Barabasi] • My email, Darwin's and Einstein's letters follow the same pattern
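To make the power-law claim concrete, the sketch below draws synthetic waiting times from a continuous power law with exponent 1.5 and recovers the exponent with the standard maximum-likelihood estimator. The slide does not say how the exponent was fitted, so this is a generic illustration rather than the authors' procedure; the function names, the cutoff `t_min = 1.0`, and the sample size are assumptions.

```python
import math
import random


def sample_power_law(alpha, t_min, n, rng):
    """Draw n values from p(t) ~ t^(-alpha), t >= t_min, by inverse transform."""
    exponent = -1.0 / (alpha - 1.0)
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and is never zero.
    return [t_min * (1.0 - rng.random()) ** exponent for _ in range(n)]


def estimate_exponent(samples, t_min):
    """Maximum-likelihood estimate of alpha for a continuous power law."""
    n = len(samples)
    return 1.0 + n / sum(math.log(t / t_min) for t in samples)


rng = random.Random(0)   # fixed seed so the run is repeatable
gaps = sample_power_law(alpha=1.5, t_min=1.0, n=50_000, rng=rng)
alpha_hat = estimate_exponent(gaps, t_min=1.0)
print(f"estimated exponent: {alpha_hat:.3f}")
```

With an exponent this small the distribution has a very heavy tail: most gaps are short, but extremely long gaps occur often enough that the mean does not converge.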

  22. Age: Number of conversations [Figure; axes: user self-reported age; scale: Low to High]

  23. Age: Total conversation duration [Figure; axes: user self-reported age; scale: Low to High]

  24. Age: Messages per conversation [Figure; axes: user self-reported age; scale: Low to High]

  25. Age: Messages per unit time [Figure; axes: user self-reported age; scale: Low to High]

  26. Who talks to whom: Number of conversations

  27. Who talks to whom: Conversation duration

  28. Geography and communication • Count the number of users logging in from each location on Earth

  29. How is Europe talking • Logins from Europe

  30. Users per geo location Blue circles have more than 1 million logins.

  31. Users per capita • Fraction of population using MSN: • Iceland: 35% • Spain: 28% • Netherlands, Canada, Sweden, Norway: 26% • France, UK: 18% • USA, Brazil: 8%

  32. Communication heat map • For each conversation between geo points (A,B) we increase the intensity on the line between A and B
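The accumulation step described on this slide can be sketched on a small integer grid. The grid representation, the `add_line` helper, and the use of straight segments in grid coordinates (rather than paths on a map projection) are assumptions made for illustration.

```python
def add_line(grid, a, b):
    """Add 1 to every grid cell on the straight segment from cell a to cell b.

    a and b are (row, col) pairs of integers inside the grid.
    """
    (r0, c0), (r1, c1) = a, b
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps == 0:
        grid[r0][c0] += 1   # both endpoints fall in the same cell
        return
    for i in range(steps + 1):
        # Step along the longer axis one cell at a time, so no cell is hit twice.
        r = round(r0 + (r1 - r0) * i / steps)
        c = round(c0 + (c1 - c0) * i / steps)
        grid[r][c] += 1


# Toy 5x5 map with two conversations whose lines cross in the middle.
grid = [[0] * 5 for _ in range(5)]
add_line(grid, (0, 0), (4, 4))
add_line(grid, (0, 4), (4, 0))
```

Cells crossed by many conversation lines end up with high counts, which is what makes heavily used communication corridors stand out in the rendered map.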

  33. Homophily (birds of a feather flock together) • Correlation: • Probability: Age vs. Age
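One common way to quantify age homophily is the Pearson correlation between the ages at the two ends of each edge. The slide does not state which correlation measure was used, so this is a generic sketch; the `pearson` helper and the toy ages and edges are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data: self-reported ages and undirected communication edges.
age = {"a": 20, "b": 22, "c": 40, "d": 43}
edges = [("a", "b"), ("c", "d"), ("a", "c")]

# Count each undirected edge in both directions so the measure is symmetric.
xs = [age[u] for u, v in edges] + [age[v] for u, v in edges]
ys = [age[v] for u, v in edges] + [age[u] for u, v in edges]
r = pearson(xs, ys)
print(f"age correlation across edges: {r:.2f}")
```

A value near 1 means people mostly talk to others of similar age; a value near 0 means age says little about who talks to whom.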

  34. Per country statistics • On a particular typical day… Note that global usage and market share statistics are higher if we accumulate data over longer time periods.

  35. Per typical user per country • On a typical day, an MSN user from a given country … Note that global usage and market share numbers are higher if we accumulate data over longer time periods.

  36. What about Slovenia (per capita)?

  37. Who is Slovenia talking to?

  38. Instant Messaging as a Network [Figure label: Buddy]

  39. IM Communication Network • Buddy graph: • 240 million people (people who logged in during June '06) • 9.1 billion edges (friendship links) • Communication graph: • There is an edge if the users exchanged at least one message in June 2006 • 180 million people • 1.3 billion edges • 30 billion conversations
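The communication-graph rule on this slide (an edge if two users exchanged at least one message) can be sketched as follows. How multi-party conversations were turned into pairwise edges is not stated, so the rule used here, linking every pair of participants in a conversation where at least one of the two sent a message, is an assumption, as are the input layout and names.

```python
from itertools import combinations


def communication_graph(conversations):
    """Build the set of undirected edges from per-conversation message counts.

    Each conversation is a dict mapping user id -> number of messages sent.
    """
    edges = set()
    for sent_by_user in conversations:
        # Sort the users so each undirected pair has one canonical form.
        for u, v in combinations(sorted(sent_by_user), 2):
            if sent_by_user[u] + sent_by_user[v] > 0:
                edges.add((u, v))
    return edges


# Toy input (made-up users and counts).
conversations = [
    {"ann": 3, "bob": 2},
    {"ann": 1, "bob": 0},            # same pair again: still one edge
    {"bob": 0, "cat": 0},            # no messages: no edge
    {"ann": 1, "cat": 1, "dan": 0},  # three-party conversation
]
edges = communication_graph(conversations)
```

Storing edges in a set collapses repeated conversations between the same pair, which is why 30 billion conversations reduce to 1.3 billion edges.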

  40. Buddy network: Number of buddies • Buddy graph: 240 million nodes, 9.1 billion edges (~40 buddies per user)

  41. Network: Small-world • 6 degrees of separation [Milgram ’60s] • Average distance 5.5 • 90% of nodes can be reached in < 8 hops
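The two statistics on this slide, average distance and the fraction of nodes reachable within a given number of hops, can be computed with breadth-first search. The sketch below does this exactly on a toy graph; on a graph with 180 million nodes one would presumably run BFS from a sample of start nodes instead, which is an assumption about practice and not something the slide states. Function names are illustrative.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_stats(adj, max_hops):
    """Average distance and fraction of reachable pairs closer than max_hops."""
    total = pairs = within = 0
    for s in adj:
        for t, d in hop_distances(adj, s).items():
            if t == s:
                continue
            total += d
            pairs += 1
            if d < max_hops:
                within += 1
    return total / pairs, within / pairs


# Toy graph: a path 0 - 1 - 2 - 3 - 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
avg_distance, frac_within = distance_stats(adj, max_hops=3)
```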

  42. Network: Searchability • Milgram's experiment showed: • (1) short paths exist in networks • (2) humans are able to find them • Assume the following setting: • Nodes are scattered on a plane • Given a starting node u, we want to reach a target node v • Algorithm: always navigate to the neighbor that is geographically closest to the target node v • Surprise: Geo-routing finds the short paths (for an appropriate distance measure) [Figure labels: u, v]
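The greedy rule on this slide can be sketched as below, assuming nodes have 2D coordinates and distance is plain Euclidean distance. The slide only says an "appropriate distance measure" is needed, so the Euclidean choice, the stop-when-stuck behavior, and all names here are assumptions.

```python
import math


def greedy_route(adj, pos, u, v, max_steps=1000):
    """Walk from u toward v, always moving to the neighbor closest to v.

    Returns the path as a list of nodes, or None if the walk gets stuck.
    """
    path = [u]
    current = u
    while current != v and len(path) <= max_steps:
        nxt = min(adj[current], key=lambda w: math.dist(pos[w], pos[v]))
        if math.dist(pos[nxt], pos[v]) >= math.dist(pos[current], pos[v]):
            return None   # no neighbor is closer to the target: local minimum
        path.append(nxt)
        current = nxt
    return path if current == v else None


# Toy network (made-up coordinates and links).
pos = {"u": (0, 0), "a": (1, 0), "b": (0, 1), "c": (2, 1), "v": (3, 1)}
adj = {"u": ["a", "b"], "a": ["u", "c"], "b": ["u"], "c": ["a", "v"], "v": ["c"]}
route = greedy_route(adj, pos, "u", "v")
```

Each node uses only its own neighbors' positions and the target's position, so no global map of the network is needed. The walk can fail when every neighbor is farther from the target than the current node, as happens when starting from `"b"` in the toy network.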

  43. Communication network: Clustering • How many triangles are closed? • Clustering normally decays as k^(-1) • Communication network is highly clustered: k^(-0.37) [Figure labels: High clustering, Low clustering]
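The quantity behind this slide is the clustering coefficient as a function of node degree k. A minimal sketch follows, using the usual local definition (the fraction of a node's neighbor pairs that are themselves linked); the function names and the toy graph are illustrative, and the slide does not specify the exact variant used.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of node's neighbors that are connected to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0   # no neighbor pairs, so no triangles can be closed
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def clustering_by_degree(adj):
    """Average local clustering coefficient for each degree k."""
    buckets = defaultdict(list)
    for node, nbrs in adj.items():
        buckets[len(nbrs)].append(local_clustering(adj, node))
    return {k: sum(cs) / len(cs) for k, cs in sorted(buckets.items())}


# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
by_degree = clustering_by_degree(adj)
```

Plotting the per-degree averages against k on log-log axes and reading off the slope would give the kind of decay exponent the slide quotes.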

  44. Communication Network Connectivity

  45. k-Cores decomposition • What is the structure of the core of the network?

  46. k-Cores: core of the network • People with k<20 are the periphery • The core is composed of 79 people, each having 68 edges to others in the core
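A k-core is the largest subgraph in which every node has at least k neighbors inside the subgraph. It can be found by repeatedly peeling off nodes of degree below k. The sketch below is a straightforward version of that peeling on a toy graph, not the implementation used for the MSN data; names are illustrative.

```python
def k_core(adj, k):
    """Return the node set of the k-core of an undirected graph.

    adj maps each node to the set of its neighbors.
    """
    alive = {u: set(vs) for u, vs in adj.items()}   # working copy
    changed = True
    while changed:
        changed = False
        low_degree = [u for u, vs in alive.items() if len(vs) < k]
        for u in low_degree:
            # Remove u and delete it from its surviving neighbors' sets.
            for v in alive.pop(u):
                if v in alive:
                    alive[v].discard(u)
            changed = True
    return set(alive)


# Toy graph: a 4-clique {0, 1, 2, 3} with a tail 3 - 4 - 5.
adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
    3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4},
}
core3 = k_core(adj, 3)
```

Removing one node can push its neighbors below the threshold, which is why the peeling repeats until nothing changes.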

  47. Network robustness • We delete nodes (in some order) and observe how the network falls apart: • Number of edges deleted • Size of the largest connected component
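The deletion experiment on this slide can be sketched as follows: remove nodes one at a time in a chosen order and record the size of the largest connected component after each removal. Recomputing the components from scratch each time is only practical for small graphs; the function names and the toy star graph are assumptions for illustration.

```python
from collections import deque


def largest_component(adj, removed):
    """Size of the largest connected component after ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def robustness_curve(adj, order):
    """Largest-component size before any deletion and after each deletion."""
    removed = set()
    sizes = [largest_component(adj, removed)]
    for u in order:
        removed.add(u)
        sizes.append(largest_component(adj, removed))
    return sizes


# Toy star graph: hub 0 connected to leaves 1..4.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
hub_first = robustness_curve(adj, [0])
leaf_first = robustness_curve(adj, [1])
```

In the star example, deleting the hub shatters the graph at once, while deleting a leaf barely matters. Comparing deletion orders (for example random versus highest degree first) is the kind of contrast such an experiment is meant to show.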

  48. Robustness: Nodes vs. Edges

  49. Robustness: Connectivity

  50. Conclusion • A first look at a planetary-scale social network • The largest social network analyzed • Strong presence of homophily: people who communicate share attributes • Well connected: in only a few hops one can reach most of the network • Very robust: many (random) people can be removed and the network is still connected
