MIS 644 Social Network Analysis 2017/2018 Spring

MIS 644 Social Network Analysis 2017/2018 Spring Chapter 4 The Large Scale Structure of Networks

Outline • Components

Components • Component sizes • In an undirected network • one large component typically fills most of the network • more than half, often over 90% • the rest of the network • a large number of small components • Figure 8.1 N-N • E.g., network of film actors • in the May 2000 version, • 440,971 out of 449,913 vertices are in the largest component – 98%

Figure 8.1 of N-N Components in an undirected network. In most undirected networks there is a single large component occupying a majority, or at least a significant fraction, of the network, along with a number of small components, typically consisting of only a handful of vertices each.
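The component-size idea above can be checked on a small example. The following is a minimal sketch, assuming the network is given as a vertex count and an edge list; the function name `largest_component_fraction` and the toy edge list are made up for illustration and are not from the slides.

```python
from collections import deque

def largest_component_fraction(n, edges):
    """Return S, the fraction of the n vertices that lie in the largest component."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    best = 0
    for s in range(n):
        if seen[s]:
            continue
        # Breadth-first search collects the whole component containing s.
        seen[s] = True
        queue = deque([s])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        best = max(best, size)
    return best / n if n else 0.0

# Hypothetical toy network: one 5-vertex component, one pair, one isolated vertex.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (5, 6)]
S = largest_component_fraction(8, toy_edges)

# The film-actor figure quoted on the slide, as a fraction.
actor_S = 440971 / 449913
```

Here `S` is 5/8 for the toy network, and `actor_S` rounds to the 98% quoted for the actor network.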

Table 8.1 • S: size of the largest component as a fraction of total network size • for some networks S = 1 • The Internet • must provide communication between its nodes • in other cases, because of the way the network is measured • first web network – single web crawl, so a single component • a network may have more than one component • AltaVista network – crawls from several different starting points

Table 8.1 of N-N Basic statistics for a number of networks. The properties measured are: type of network, directed or undirected; total number of vertices n; total number of edges m; mean degree c; fraction of vertices in the largest component S (or the largest weakly connected component in the case of a directed network); mean geodesic distance between connected vertex pairs ℓ; exponent α of the degree distribution if the distribution follows a power law (or “-” if not; in/out-degree exponents are given for directed graphs); clustering coefficient C from Eq. (7.41); clustering coefficient CWS from the alternative definition of Eq. (7.44); and the degree correlation

Can there be two or more large components? • usually no • A network of n vertices • n/2 in each component • $n^2/4$ possible pairs of vertices – one in one component, the other in the other component • most likely there is an edge between at least one such pair • Networks with no large component? • made up only of small components • e.g., network of immediate family ties • under the same roof – are these networks at all?

Components in Directed Networks • weakly connected components • usually one large weakly connected component – as in undirected networks • strongly connected components • one large, many small ones • web – largest strongly connected component: about a quarter of the network • out-component, in-component • supersets of the strongly connected comp. • e.g., web in- and out-comp. – about a quarter of the network

small strongly connected components may have • large out- or in-components • “bow tie” diagram – Figure 8.2 N-N, for the web • acyclic directed networks – no strongly connected comp. of more than one vertex • e.g., citation networks – almost acyclic • only small comp. of two or three vertices

Figure 8.2 of N-N

The “bow tie” diagram of components in a directed network. The typical directed network consists of one large strongly connected component and many small ones, each with an in-component and an out-component. Note that by definition each in-component includes the corresponding strongly connected component as a subset, as does each out-component. The largest strongly connected component and its in- and out-components typically occupy a significant fraction of the whole network. The percentages shown here indicate how much of the network is taken up by each part of the bow tie in the case of the World Wide Web. After Broder et al. [56]
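The relation between a strongly connected component and its in- and out-components can be illustrated with two reachability searches. This is a minimal sketch, assuming a directed edge list; the helper names `reachable` and `bow_tie` and the toy graph are hypothetical.

```python
from collections import deque

def reachable(adj, start):
    """Return the set of vertices reachable from start by following adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def bow_tie(edges, v):
    """Return (strongly connected component, in-component, out-component) of v."""
    fwd, rev = {}, {}
    for a, b in edges:
        fwd.setdefault(a, []).append(b)   # edge a -> b
        rev.setdefault(b, []).append(a)   # same edge, reversed
    out_comp = reachable(fwd, v)          # everything v can reach
    in_comp = reachable(rev, v)           # everything that can reach v
    # The strongly connected component is exactly the overlap of the two.
    return in_comp & out_comp, in_comp, out_comp

# Hypothetical toy graph: a 3-cycle a->b->c->a, x feeding in, y fed from c.
toy = [("a", "b"), ("b", "c"), ("c", "a"), ("x", "a"), ("c", "y")]
scc, in_comp, out_comp = bow_tie(toy, "a")
```

In the toy graph the strongly connected component is {a, b, c}, and both the in-component and the out-component contain it as a subset, as the caption states.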

Shortest Paths and the Small-World Effect • small-world effect • in most networks, the distances between vertices are surprisingly small • Milgram’s letter-passing experiment • Mean distance between vertices • Internet, diseases, information • Path lengths scale as log n • Directed networks • Harmonic mean • Diameter – small – scales as log n – unstable

Scale-free – degree distributions follow power laws • one large component • mean geodesic distance – scales as log log n • Small components • of order log n, so diameter – log n • Difficult to test: log log n changes very slowly with n
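The mean distance between vertices discussed above can be computed with one breadth-first search per vertex. This is a minimal sketch, assuming an undirected edge list and averaging only over distinct pairs of vertices that are connected by some path; other conventions (for example including a vertex's zero distance to itself) give slightly different values. The function name is hypothetical.

```python
from collections import deque

def mean_geodesic_distance(n, edges):
    """Mean shortest-path length over ordered pairs of distinct, connected vertices."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total = 0
    pairs = 0
    for s in range(n):
        dist = [-1] * n          # -1 marks "not reached yet"
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for t in range(n):
            if dist[t] > 0:      # skips s itself and unreachable vertices
                total += dist[t]
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical toy network: a path 0-1-2-3.
ell = mean_geodesic_distance(4, [(0, 1), (1, 2), (2, 3)])
```

For the four-vertex path the six distinct pairs have distances 1, 2, 3, 1, 2, 1, so `ell` is 10/6.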

Funneling • Milgram • most of the letters destined for a given target person passed through just one or two acquaintances of the target • one or two of your acquaintances are especially well connected • measure what fraction of the shortest paths between a vertex i and every other reachable vertex go through each of i’s neighbors in the network • Collaboration network of physicists: 48% – one or two • Internet: 49% – one

Degree Distributions • the frequency distribution of vertex degrees • defining characteristic of network structure • undirected networks: • $p_k$: fraction of vertices that have degree k • n = 10 vertices • one has degree 0, 2 have degree 1, 4 have degree 2, • 2 have degree 3, 1 has degree 4 • so • $p_0 = 1/10$, $p_1 = 2/10$, $p_2 = 4/10$, • $p_3 = 2/10$, $p_4 = 1/10$

$p_k$ as a probability • the probability that a randomly selected vertex has degree k • total number of vertices with a given degree • $n p_k$, • degree sequence: • the set of degrees for all the vertices • $\{k_1, k_2, k_3, \ldots\}$ • e.g., {0,1,1,2,2,2,2,3,3,4} • knowing the degree distribution or sequence • does not tell us the complete structure of a network
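The ten-vertex example above can be reproduced directly from its degree sequence. This is a minimal sketch; the function name `degree_distribution` is hypothetical, and exact fractions are used so that the values match the slide without rounding.

```python
from collections import Counter
from fractions import Fraction

def degree_distribution(degrees):
    """Return {k: p_k}, the fraction of vertices having each degree k."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: Fraction(c, n) for k, c in sorted(counts.items())}

# Degree sequence from the slide example.
seq = [0, 1, 1, 2, 2, 2, 2, 3, 3, 4]
p = degree_distribution(seq)

# n * p_k recovers the number of vertices with degree k.
n_with_degree_2 = len(seq) * p[2]

# The mean degree is the first moment of the distribution.
mean_degree = sum(k * pk for k, pk in p.items())
```

The fractions sum to 1, `n_with_degree_2` is 4, and the mean degree of this sequence is 2.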

plots of frequency vs. k • e.g., Figure 8.3 N-N Internet at the autonomous system level • hubs – well-connected vertices • right-skewed • directed networks • in-degree and out-degree distributions • joint distribution • $p_{jk}$: fraction of vertices simultaneously having in-degree j and out-degree k • two-dimensional plot • in- and out-degrees might be correlated • e.g., vertices with high in-degree tend to have high out-degree

The degree distribution of the Internet. A histogram of the degree distribution of the vertices of the Internet graph at the level of autonomous systems.

The degree distributions of the WWW. Histograms of the distribution of in- and out-degrees of pages on the WWW.

Power Laws and Scale-Free Networks • log scale plots $\ln p_k = -\alpha \ln k + c$, taking the exponential of both sides $p_k = C k^{-\alpha}$, • $\alpha$ and $c$ are constants, $C = e^c$, • power laws • The degree distribution of the Internet follows a power law • other examples: in- and out-degree of the web, in-degree of citation networks (not out-degree) • $\alpha$: exponent of the power law, typically $2 \le \alpha \le 3$ • C: normalization constant

Figure 8.5 of N-N The power-law degree distribution of the Internet. Another histogram of the degree distribution of the Internet graph, plotted this time on logarithmic scales. The approximate straight-line form of the histogram indicates that the degree distribution roughly follows a power law of the form (8.3)

networks with power-law degree distributions – scale-free • not all networks obey power laws • for small k – usually not • scale-free – interesting characteristics • examine • log-log plots of the degree distribution • linear plots

Detecting and Visualizing Power Laws • Problems with histograms • right side – large k – bins contain few nodes, so noisy • use larger bins – more observations per bin • bin sizes may differ • tail – larger bins, less noise but loss of information • bins of different size • e.g., bins of width 1 for low degrees and 5 for high • logarithmic binning • each bin wider than its predecessor by a constant factor

vertices with degree zero are not shown • as log 0 is undefined (it diverges to $-\infty$) • Cumulative distribution function • $P_k = \sum_{k'=k}^{\infty} p_{k'}$: fraction of vertices with degree k or greater • probability that a randomly selected vertex has degree k or greater • the degree distribution follows a power law for $k \ge k_{\min}$

Figure 8.6 of N-N Histogram of the degree distribution of the Internet, created using logarithmic binning. In this histogram the widths of the bins are constant on a logarithmic scale, meaning that on a linear scale each bin is wider by a constant factor than the one to its left. The counts in the bins are normalized by dividing by bin width to make counts in different bins comparable.
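Logarithmic binning as described above can be sketched in a few lines. This is an illustrative sketch, assuming bins of the form [1, a), [a, a^2), ... with a constant factor `a` (2 here) and counts divided by both the number of vertices and the bin width; the function name and the toy degree list are hypothetical.

```python
def log_binned_histogram(degrees, a=2.0):
    """Return a list of (low, high, density) for bins [1, a), [a, a^2), ...

    density = (vertices in bin) / (n * bin width), so that bins of
    different width are comparable. Degree-zero vertices fall in no bin.
    """
    positive = [k for k in degrees if k > 0]
    if not positive:
        return []
    n = len(degrees)
    kmax = max(positive)
    bins = []
    low = 1.0
    while low <= kmax:
        high = low * a                      # each bin is a times wider than the last
        count = sum(1 for k in positive if low <= k < high)
        bins.append((low, high, count / (n * (high - low))))
        low = high
    return bins

# Hypothetical toy degree list, including one degree-zero vertex.
bins = log_binned_histogram([0, 1, 1, 2, 3, 4, 5, 8])
```

With `a = 2` the bins are [1, 2), [2, 4), [4, 8), [8, 16); each holds one or two vertices here, but the normalized densities fall off because the bins widen.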

plot the cumulative distribution on log-log scales • a straight line with exponent $\alpha - 1$, one less than $\alpha$ • no binning – no loss of information • Easy to calculate • $P_k = r/n$, • sort the vertices in descending order by degree • give ranks from 1 to n • ranks $r_i$ of the vertices • plot $r_i/n$ vs. degree $k_i$, • see example on page 254 N-N
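The rank-based recipe above can be sketched as follows. This is a minimal sketch; the function name is hypothetical, and the handling of ties (keeping the largest rank among vertices of equal degree, so that each degree appears once with the fraction of vertices of that degree or greater) is one possible choice rather than something the slide specifies.

```python
def cumulative_distribution(degrees):
    """Return sorted (k, P_k) pairs, P_k = fraction of vertices with degree >= k."""
    n = len(degrees)
    ordered = sorted(degrees, reverse=True)   # descending order by degree
    points = {}
    for rank, k in enumerate(ordered, start=1):
        # Ties get consecutive ranks; the last (largest) one overwrites the rest,
        # which is exactly the number of vertices with degree >= k.
        points[k] = rank / n
    return sorted(points.items())

# Degree sequence from the earlier ten-vertex example.
cdf = cumulative_distribution([0, 1, 1, 2, 2, 2, 2, 3, 3, 4])
```

For this sequence the result pairs degree 4 with 0.1, degree 3 with 0.3, degree 2 with 0.7, degree 1 with 0.9 and degree 0 with 1.0; plotting these pairs on log-log axes (omitting degree 0) gives the cumulative plot.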

Figure 8.7 of N-N Cumulative distribution function for the degrees of vertices on the Internet. For a distribution with a power-law tail, as is approximately the case for the degree distribution of the Internet, the cumulative distribution function, Eq. (8.4), also follows a power law, but with a slope 1 less than that of the original distribution.

Figure 8.8 of N-N Cumulative distribution functions for in- and out-degrees in three directed networks. (a) The in-degree distribution of the World Wide Web. (b) The out-degree distribution for the same Web data set. (c) The in-degree distribution of a citation network, from the data of Redner [280]. The distributions follow approximate power-law forms in each case.

Cumulative distributions – disadvantages • harder to interpret than histograms • successive points are correlated • the value changes only a little from one point to the next

Estimation of $\alpha$ • An approximate maximum likelihood estimate of $\alpha$: $\alpha = 1 + N \left[ \sum_i \ln \frac{k_i}{k_{\min} - \frac{1}{2}} \right]^{-1}$, • here • $k_{\min}$: minimum degree for which the power law holds • N: number of vertices with degree greater than or equal to $k_{\min}$ • the sum is performed over the vertices with $k \ge k_{\min}$ • statistical error on $\alpha$: $\sigma = \sqrt{N} \left[ \sum_i \ln \frac{k_i}{k_{\min} - \frac{1}{2}} \right]^{-1} = \frac{\alpha - 1}{\sqrt{N}}$
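The estimator and its error above translate almost directly into code. This is a minimal sketch, assuming integer degrees and a value of k_min of at least 1 supplied by the user; the function name `estimate_alpha` and the toy data are hypothetical.

```python
import math

def estimate_alpha(degrees, kmin):
    """Approximate maximum likelihood estimate of the power-law exponent.

    Returns (alpha, sigma), where
      alpha = 1 + N / sum_i ln(k_i / (kmin - 1/2))
      sigma = (alpha - 1) / sqrt(N)
    and the sum runs over the N vertices with degree >= kmin.
    """
    tail = [k for k in degrees if k >= kmin]
    N = len(tail)
    if N == 0:
        raise ValueError("no vertices with degree >= kmin")
    log_sum = sum(math.log(k / (kmin - 0.5)) for k in tail)
    alpha = 1.0 + N / log_sum
    sigma = (alpha - 1.0) / math.sqrt(N)
    return alpha, sigma

# Hypothetical toy data: four vertices of degree 1 and one of degree 0, kmin = 1.
alpha, sigma = estimate_alpha([0, 1, 1, 1, 1], kmin=1)
```

In the toy case every tail term is ln 2, so `alpha` is 1 + 1/ln 2 and `sigma` is half of `alpha - 1`; real data would need far more vertices for the estimate to be meaningful.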

Properties of Power-Law Distributions • power laws appear in many places: • sizes of city populations, earthquakes, frequency of use of words, ... • Normalization • Moments

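Normalization • C is fixed by $\sum_{k=0}^{\infty} p_k = 1$, • for a pure power-law distribution $p_0$ would be infinite, so start the sum at k = 1: $C \sum_{k=1}^{\infty} k^{-\alpha} = 1$, $C = \frac{1}{\sum_{k=1}^{\infty} k^{-\alpha}} = \frac{1}{\zeta(\alpha)}$, • where $\zeta(\alpha)$ is the Riemann zeta function, $p_k = \frac{k^{-\alpha}}{\zeta(\alpha)}$, • with $k > 0$ and $p_0 = 0$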

Normalization (cont.) • power law for $k \ge k_{\min}$ • normalize only the tail: $p_k = \frac{k^{-\alpha}}{\sum_{k=k_{\min}}^{\infty} k^{-\alpha}} = \frac{k^{-\alpha}}{\zeta(\alpha, k_{\min})}$, • $\zeta(\alpha, k_{\min})$: generalized zeta function • approximated by an integral • normalization constant • cumulative distribution: Eq. (8.14) of N-N

Cumulative distribution function • $P_k$: fraction of vertices with degree k or greater • probability that a randomly selected vertex has degree k or greater • the degree distribution follows a power law for $k \ge k_{\min}$
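The integral approximation mentioned above can be written out step by step. The following is a sketch of the standard calculation, assuming $\alpha > 1$ and treating k as a continuous variable in the tail; it is a reconstruction from the definitions on the slides, not a copy of the original equations.

```latex
% Normalization constant: replace the sum over the tail by an integral
C \simeq \frac{1}{\int_{k_{\min}}^{\infty} k^{-\alpha}\, dk}
  = (\alpha - 1)\, k_{\min}^{\alpha - 1}

% Degree distribution in the tail
p_k \simeq \frac{\alpha - 1}{k_{\min}} \left( \frac{k}{k_{\min}} \right)^{-\alpha}

% Cumulative distribution: also a power law, with exponent \alpha - 1
P_k = \sum_{k'=k}^{\infty} p_{k'}
    \simeq C \int_{k}^{\infty} k'^{-\alpha}\, dk'
    = \frac{C}{\alpha - 1}\, k^{-(\alpha - 1)}
    = \left( \frac{k}{k_{\min}} \right)^{-(\alpha - 1)}
```

The last line is why a log-log plot of the cumulative distribution is a straight line whose slope is one less (in magnitude) than that of the degree distribution itself.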

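Moments • the first moment – mean: $\langle k \rangle = \sum_{k=0}^{\infty} k\, p_k$, • the second moment – mean square: $\langle k^2 \rangle = \sum_{k=0}^{\infty} k^2 p_k$, • the mth moment: $\langle k^m \rangle = \sum_{k=0}^{\infty} k^m p_k$, • power law for $k \ge k_{\min}$: $\langle k^m \rangle = \sum_{k=0}^{k_{\min}-1} k^m p_k + C \sum_{k=k_{\min}}^{\infty} k^{m-\alpha}$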

Moments (cont.) • power law slowly varying with k – the sum over the tail is approximated by an integral • $m - \alpha + 1 < 0$: the second term is finite • $m - \alpha + 1 \ge 0$: infinite • the mth moment is finite if $\alpha > m + 1$ • all moments diverge for $m \ge \alpha - 1$
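The finiteness condition above follows from the integral approximation of the tail sum. The lines below are a sketch of that step, reconstructed from the definitions on the slides and assuming the power law holds for all $k \ge k_{\min}$.

```latex
\langle k^m \rangle
  \simeq \sum_{k=0}^{k_{\min}-1} k^m p_k
       + C \int_{k_{\min}}^{\infty} k^{m-\alpha}\, dk
  = \sum_{k=0}^{k_{\min}-1} k^m p_k
       + \frac{C}{m - \alpha + 1} \Big[ k^{m-\alpha+1} \Big]_{k_{\min}}^{\infty}
```

The first term is a finite sum. The bracket in the second term vanishes at the upper limit only when $m - \alpha + 1 < 0$, that is, when $\alpha > m + 1$; for $m - \alpha + 1 = 0$ the integral is logarithmic and also diverges.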

Moments (cont.) • a finite second moment requires $\alpha > 3$ • for real networks $2 \le \alpha \le 3$ • so the second moment is not finite in theory • for real networks moments are finite in practice: $\langle k^m \rangle = \frac{1}{n} \sum_{i=1}^{n} k_i^m$, • max degree: $n - 1$ • cutting off the upper bound at n gives $\langle k^m \rangle \sim n^{m-\alpha+1}$ for $m > \alpha - 1$: finite, but • large numbers for real nets

Moments (cont.) • for a network with $\alpha = 5/2$ • the second moment diverges as $n^{1/2}$, • moments diverge in theory • in practical networks • divergence means large numbers • E.g., Internet, $n \approx 20{,}000$ nodes • second or higher moments • not infinite but very large • $\langle k^2 \rangle = 1156$ – very large
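The growth of the second moment with network size can be checked numerically. This is a minimal sketch, assuming a pure power law with the degree cut off at k = n and leaving out the normalization constant C (which does not depend on n); the function name is hypothetical.

```python
def truncated_moment(m, alpha, n, kmin=1):
    """Sum of k^(m - alpha) for k = kmin..n: the tail contribution to <k^m>,
    up to the constant factor C, with the degree cut off at k = n."""
    return sum(k ** (m - alpha) for k in range(kmin, n + 1))

# Second moment (m = 2) for alpha = 5/2 at two network sizes.
small = truncated_moment(2, 2.5, 10_000)
large = truncated_moment(2, 2.5, 40_000)

# If the moment scales as n^(1/2), quadrupling n should roughly double it.
ratio = large / small
```

The ratio comes out close to 2, consistent with the $n^{1/2}$ scaling: the moment is finite for any finite n but keeps growing as the network gets larger.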

Top-Heavy Distributions • fraction of edges connected to the vertices with the highest degrees • pure power law • W: fraction of ends of edges • attached to the highest-degree vertices • P: fraction of vertices with the highest degrees • Lorenz curves • concave • for $\alpha$ a little above 2, a very fast initial increase • a large fraction of edges is connected to a small fraction of the highest-degree nodes

Figure 8.9 of N-N Lorenz curves for scale-free networks. The curves show the fraction W of the total number of ends of edges in a scale-free network that are attached to the fraction P of vertices with the highest degrees, for various values of the power-law exponent α.

Example: WWW • power law with $k_{\min} = 20$ • $\alpha = 2.2$ • P = 1/2 • W is found to be 0.89 • 89% of hyperlinks go to the top-ranked half of the pages • conversely, setting • W = 0.5 results in P = 0.015 • 50% of the links go to less than 2% of the “richest” pages
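The two numbers in this example can be reproduced from the Lorenz-curve relation for a pure power law. The sketch below assumes the relation $W = P^{(\alpha-2)/(\alpha-1)}$, which is not stated on the slide itself but is the standard result for a power-law tail with $\alpha > 2$ under the integral approximation; the function names are hypothetical.

```python
def edge_fraction(P, alpha):
    """Fraction W of edge ends attached to the top fraction P of vertices.

    Assumes a pure power law with exponent alpha > 2:
    W = P ** ((alpha - 2) / (alpha - 1)).
    """
    return P ** ((alpha - 2.0) / (alpha - 1.0))

def vertex_fraction(W, alpha):
    """Inverse relation: fraction P of top vertices holding a fraction W of edge ends."""
    return W ** ((alpha - 1.0) / (alpha - 2.0))

# Values from the WWW example: alpha = 2.2.
W_half = edge_fraction(0.5, 2.2)     # top half of the pages
P_half = vertex_fraction(0.5, 2.2)   # pages receiving half of the links
```

With $\alpha = 2.2$ the exponent is 1/6, so `W_half` is about 0.89 and `P_half` is 0.5 to the sixth power, about 0.0156, in line with the figures quoted above.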
