Measurement and Analysis of Online Social Networks

Professor: Dr. SheykhEsmaili • Presenters: Pourya Aliabadi, Boshra Ardallani, Paria Rakhshani

INTRODUCTION • The Internet has spawned different types of information sharing systems, including the Web. • MySpace (over 190 million users) • Orkut (over 62 million) • LinkedIn (over 11 million) • LiveJournal (over 5.5 million) • Unlike the Web, which is largely organized around content, online social networks are organized around users.

INTRODUCTION(cont.) • Users join a network, publish their profile, and create links to any other users with whom they associate. • The resulting social network provides a basis for maintaining social relationships, for finding users with similar interests, and for locating content. • An understanding of the graph structure of online social networks is necessary to evaluate current systems and to design future online social network based systems.

INTRODUCTION(cont.) • Recent work has proposed the use of social networks to mitigate email spam, to improve Internet search, and to defend against Sybil attacks. • We obtained our data by crawling publicly accessible information on Flickr, YouTube, LiveJournal, and Orkut.

INTRODUCTION(cont.) • In online social networks, users with many incoming links also tend to have many outgoing links. This differs from content graphs like the graph formed by Web hyperlinks, where the popular pages (authorities) and the pages with many references (hubs) are distinct. • We find that online social networks contain a large, strongly connected core of high-degree nodes, surrounded by many small clusters of low-degree nodes. • These structural properties affect the flow of information in these networks.

Online social networking sites • Online social networking sites are usually run by individual corporations. • Users. Users must register with a site, possibly under a pseudonym. Some sites allow browsing of public data without explicit sign-on. • Links. The social network is composed of user accounts and links between users. Some sites (e.g., Flickr, LiveJournal) allow users to link to any other user, without consent from the link target.

Online social networking sites(cont.) • Groups. Most sites enable users to create and join special interest groups. • Users can post messages to groups and upload shared content to the group. • Certain groups are moderated: admission to such a group and postings to it are controlled by a user designated as the group’s moderator. • Other groups are unrestricted, allowing any member to join and post messages or content.

Is the social network used in locating content? • Only Orkut is a “pure” social networking site, in the sense that the primary purpose of the site is finding and connecting to new users. • Flickr, YouTube, and LiveJournal are used for sharing photographs, videos, and blogs, respectively.

Why study social networks? • They are already at the heart of some very popular Web sites. • They will play an important role in future personal and commercial online interaction. • Studying them helps us understand the impact of online social networks on the future Internet. • We speculate how our data might be of interest to researchers in other disciplines.

Shared interest and trust • Adjacent users in a social network tend to trust each other. • A number of research systems have been proposed to exploit this trust. • Adjacent users in a social network also tend to have common interests. • Users browse neighboring regions of their social network because they are likely to find content that is of interest to them.

Impact on future Internet • Impact on other disciplines • Sociologists can examine our data to test existing theories. • Studying the structure of online social networks may help improve the understanding of online campaigning and viral marketing. • Political campaigns have realized the importance of blogs in elections.

How to get datasets? • Sites reluctant to give out data • Cannot enumerate user list • Performed crawls of user graph • Crawled using cluster of 58 machines • Used APIs where available • Otherwise, used HTML screen scraping

Challenges in crawling large graphs • Need to crawl quickly • Underlying social networks change rapidly • Need to crawl completely • Social networks aren’t necessarily connected: some users have no links, and others belong only to small clusters • Need to estimate the crawl coverage

How to verify samples • Obtain a random user sample • Conduct a crawl using these random users as seeds • See if these random nodes connect to the original WCC (weakly connected component)
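The coverage check on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `links` edge list (standing in for whatever a crawl from each seed would discover), and the toy data are all hypothetical and not taken from the study.

```python
from collections import deque


def weakly_connected_component(links, seed):
    """Return the set of users reachable from seed, ignoring link direction."""
    # Build an undirected adjacency map from the directed links.
    neighbors = {}
    for src, dst in links:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    seen = {seed}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        for other in neighbors.get(user, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


def coverage_estimate(crawled_wcc, links, random_users):
    """Fraction of random seeds whose own component overlaps the crawled WCC."""
    hits = 0
    for user in random_users:
        component = weakly_connected_component(links, user)
        if component & crawled_wcc:
            hits += 1
    return hits / len(random_users)


# Hypothetical toy data: a, b, c form the crawled component; x, y are a
# separate small cluster; z has no links at all.
toy_links = [("a", "b"), ("b", "c"), ("x", "y")]
crawled = weakly_connected_component(toy_links, "a")
estimate = coverage_estimate(crawled, toy_links, ["b", "c", "x", "z"])
```

In this toy case two of the four random users fall inside the crawled component, so the estimate is 0.5.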

Dataset from Flickr • Used API to conduct the crawl • Obtained random users by guessing usernames (########@N00) to evaluate coverage • Covered 27% of user population, but remaining users have very few links

Dataset from LiveJournal • Used API to conduct the crawl • Obtained random users using special URL • http://www.livejournal.com/random.bml • Crawl covered 95% of user population

Dataset from Orkut • Used HTML screen-scraping to conduct the crawl

Dataset from YouTube • Used API to conduct the crawl • Could not obtain random users • Usernames are user-specified strings • Unable to estimate fraction of users covered

High-level data characteristics • Metrics vary by orders of magnitude • However, networks share many key properties

Analysis of network structure • Characterize the structural properties of the four networks and compare them • Link symmetry • Power-law node degrees • Correlation of indegree and outdegree • Path lengths and diameter • Link degree correlations • Densely connected core • Tightly clustered fringe

How are the links distributed? • Distribution of indegree and outdegree is similar • Underlying cause is link symmetry

Link symmetry • Possibly encouraged by sites informing users of new incoming links • Unlike other complex networks, such as the Web • Sites like cnn.com receive many more links than they give • Symmetry makes it harder to identify reputable sources
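One simple way to quantify link symmetry is the fraction of directed links whose reverse link also exists. The sketch below assumes the graph is given as a list of `(source, target)` pairs; the function name and toy data are hypothetical.

```python
def link_symmetry(links):
    """Fraction of directed links (u, v) for which (v, u) also exists."""
    link_set = set(links)
    if not link_set:
        return 0.0
    symmetric = sum(1 for (u, v) in link_set if (v, u) in link_set)
    return symmetric / len(link_set)


# Hypothetical toy graph: a<->b is reciprocated, a->c and c->d are not.
toy_links = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
symmetry = link_symmetry(toy_links)
```

Here two of the four links are reciprocated, so the symmetry is 0.5.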

Power-law node degrees • All social networks show properties consistent with power-law networks. • The majority of the nodes have small degree, and a few nodes have significantly higher degree
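A rough way to check whether a degree sequence looks power-law is to compute its complementary cumulative distribution and fit an exponent. The sketch below uses the standard continuous-approximation maximum likelihood estimate, alpha = 1 + n / sum(ln(d / d_min)); the choice of estimator, the function names, and the toy degrees are assumptions made for illustration, and the estimate is only approximate for small integer degrees.

```python
import math


def powerlaw_alpha_mle(degrees, d_min=1):
    """Continuous-approximation MLE of the exponent for degrees >= d_min."""
    tail = [d for d in degrees if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree above d_min")
    return 1.0 + len(tail) / log_sum


def ccdf(degrees):
    """Map each observed degree d to the fraction of nodes with degree >= d."""
    n = len(degrees)
    return {d: sum(1 for x in degrees if x >= d) / n for d in sorted(set(degrees))}


# Hypothetical toy degree sequence: mostly small degrees, one large one.
toy_degrees = [1, 1, 1, 1, 2, 2, 3, 8]
alpha = powerlaw_alpha_mle(toy_degrees)
tail_fractions = ccdf(toy_degrees)
```

Plotting `tail_fractions` on log-log axes should give a roughly straight line when the degrees are consistent with a power law.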

Correlation of indegree and outdegree • Outdegree vs. indegree in the Web • Outdegree vs. indegree in social networks • (Figure labels: PW, CNN, OSN)

Path lengths and diameter • All four networks have short average path lengths, ranging from 4.25 to 5.88
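Average path length and diameter can be estimated with breadth-first search from a set of source nodes. The sketch below assumes an undirected adjacency map; the names and toy graph are hypothetical, and the reported longest distance equals the true diameter only when every node is used as a source.

```python
from collections import deque


def bfs_distances(neighbors, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for other in neighbors.get(node, ()):
            if other not in dist:
                dist[other] = dist[node] + 1
                queue.append(other)
    return dist


def path_stats(neighbors, sources):
    """Average shortest-path length and longest distance seen from the sources."""
    total, pairs, longest = 0, 0, 0
    for s in sources:
        for target, d in bfs_distances(neighbors, s).items():
            if target != s:
                total += d
                pairs += 1
                longest = max(longest, d)
    if pairs == 0:
        raise ValueError("no reachable pairs from the given sources")
    return total / pairs, longest


# Hypothetical toy graph: a simple chain a - b - c - d.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
avg_len, diameter = path_stats(toy, list(toy))
```

On large crawled graphs one would typically pass only a random sample of nodes as `sources`, since running a search from every node is expensive.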

Link degree correlations • Examine which users tend to connect to each other • Focus on: • Joint degree distribution • How often nodes of different degrees connect to each other • Scale free behavior • A value calculated directly from the joint degree distribution of graph • Assortativity • A measure of the likelihood for nodes to connect to other nodes with similar degrees
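Assortativity can be illustrated as the Pearson correlation between the degrees found at the two ends of each link. The sketch below assumes an undirected edge list without duplicate edges or self-loops; the function name and toy graph are hypothetical.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each undirected edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("all edge endpoints have the same degree")
    return cov / var


# Hypothetical toy graph: a star, where one hub links to three leaves.
star = [("hub", "l1"), ("hub", "l2"), ("hub", "l3")]
r = degree_assortativity(star)
```

A star is the extreme disassortative case (high-degree node linked only to low-degree nodes), so `r` is -1; positive values indicate that nodes tend to connect to others of similar degree.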

Joint degree distribution and Scale-free behaviour

Densely connected core • Comprises between 1% and 10% of the highest-degree nodes • Removing 10% of core nodes breaks the graph into millions of very small SCCs (strongly connected components) • Figures: effect of removing nodes starting with the highest-degree nodes (left); path length as the graph is constructed beginning with the highest-degree nodes (right) • Sub-logarithmic growth of path length
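The node-removal experiment can be sketched as follows. This is a simplification under stated assumptions: it treats links as undirected and measures ordinary connected components rather than SCCs, and the function name and toy graph are hypothetical.

```python
def largest_component_after_removal(edges, fraction):
    """Remove the top `fraction` of nodes by degree; return the largest remaining component size."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    # Rank nodes from highest to lowest degree (ties broken by name).
    ranked = sorted(neighbors, key=lambda n: (-len(neighbors[n]), str(n)))
    removed = set(ranked[:int(len(ranked) * fraction)])
    seen, best = set(), 0
    for start in neighbors:
        if start in removed or start in seen:
            continue
        # Depth-first search over the nodes that survived removal.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for other in neighbors[node]:
                if other not in removed and other not in seen:
                    seen.add(other)
                    stack.append(other)
        best = max(best, size)
    return best


# Hypothetical toy graph: a hub linked to a, b, c, d, plus one fringe link a-b.
toy_edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
before = largest_component_after_removal(toy_edges, 0.0)
after = largest_component_after_removal(toy_edges, 0.2)
```

Removing the single highest-degree node shrinks the largest component from 5 nodes to 2, which mirrors on a tiny scale how removing the core fragments the graph.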

Clustering coefficient • Clustering coefficient C is a metric of cliquishness • Online social networks are tightly clustered • 10,000 times more clustered than random graphs • 5-50 times more clustered than random power-law graphs
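As an illustration of the metric, the clustering coefficient of a node can be computed as the fraction of pairs of its neighbors that are themselves linked. The sketch below uses the undirected form of the definition, which is an assumption made for simplicity; the names and toy graph are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(neighbors, node):
    """Fraction of pairs of a node's neighbors that are linked to each other."""
    nbrs = neighbors[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return linked / (k * (k - 1) / 2)


def average_clustering(neighbors):
    """Mean clustering coefficient over all nodes."""
    return sum(clustering_coefficient(neighbors, n) for n in neighbors) / len(neighbors)


# Hypothetical toy graph: triangle a-b-c with one extra node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
c_of_c = clustering_coefficient(toy, "c")
avg_c = average_clustering(toy)
```

Nodes a and b sit in a closed triangle and score 1, node c scores 1/3, and node d scores 0, giving an average of 7/12.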

Tightly clustered fringe • Low-degree users show high degree of clustering • Social network graphs show stronger clustering

Groups

What does the structure look like? • The networks contain a densely connected core of high-degree nodes. • This core links small groups of strongly clustered, low-degree nodes at the fringes of the network. • Overall shape: octopus-like

Conclusions • Structure of OSNs is significantly different from the Web • Higher degree of link symmetry in OSNs • Much higher levels of local clustering in OSNs • Privacy controls make graph crawling very difficult • Pure social networks differ from content sharing networks

Thanks
