MIS 644 Social Network Analysis 2017/2018 Spring

Chapter 3 Measures and Metrics

Outline • Centrality Measures • Structural Balance • Similarity • Homophily and Assortative Mixing

Structure of a network • calculate various useful quantities or measures • that capture features of the network's topology

Centrality Measures • Degree • Eigenvector • Katz • Closeness • Betweenness

Centrality Measures • Which are the most important or central vertices in a network? • There are many possible definitions of importance

Degree Centrality • undirected networks: degree • directed networks: in-degree and out-degree • e.g.: • social networks: individuals with many connections have more prestige and more access to information and resources • citation networks: papers with high in-degree are cited more often and are more influential
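A minimal NumPy sketch of degree centrality. It assumes the convention used for directed networks later in these notes, $A_{ij} = 1$ when there is an edge from j to i, so row sums give in-degrees and column sums give out-degrees. The three-paper citation network and the function name `in_out_degree` are hypothetical. For an undirected network A is symmetric and both sums give the ordinary degree.

```python
import numpy as np

def in_out_degree(A):
    """Return (in-degree, out-degree) for a directed adjacency matrix.

    Assumes A[i, j] = 1 when there is an edge from j to i, so row sums
    count ingoing links and column sums count outgoing links.
    """
    A = np.asarray(A)
    k_in = A.sum(axis=1)   # row i: edges pointing to i
    k_out = A.sum(axis=0)  # column j: edges leaving j
    return k_in, k_out

# Hypothetical citation network: paper 0 cites papers 1 and 2, paper 1 cites paper 2.
A = np.array([[0, 0, 0],
              [1, 0, 0],
              [1, 1, 0]])
k_in, k_out = in_out_degree(A)
print(k_in)   # paper 2 has the highest in-degree (cited most)
print(k_out)
```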

Eigenvector Centrality • not all neighbors are equivalent • a vertex's importance increases when it is connected to other vertices that are themselves important • instead of treating each neighboring vertex equally, • give each vertex a score $x_i$ reflecting its importance • the score of i is proportional to the sum of the scores of its neighbors: $x_i \propto \sum_j A_{ij} x_j$, i.e. $x_i = \lambda \sum_j A_{ij} x_j$

Or in matrix form: $X = \lambda A X$, so $AX = \lambda^{-1} X$; letting $\lambda^{-1} = \kappa$: $AX = \kappa X$ • X is a right eigenvector of A and κ the corresponding eigenvalue • a symmetric n × n matrix has n real eigenvalues and n eigenvectors • but which eigenvector (which eigenvalue) should we use?

$AX = \kappa X$ • $(A - \kappa I)X = 0$ • non-trivial solutions of this equation exist only if $A - \kappa I$ is singular, • i.e. $\det(A - \kappa I) = 0$ • solve this equation for the values of κ that make the determinant 0
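A small worked case, using a hypothetical two-vertex network joined by a single edge:

```latex
A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\det(A - \kappa I) = \det\begin{pmatrix} -\kappa & 1 \\ 1 & -\kappa \end{pmatrix} = \kappa^2 - 1 = 0
\;\Rightarrow\; \kappa \in \{1, -1\}, \qquad
(A - 1 \cdot I)X = 0 \;\Rightarrow\; x_1 = x_2 .
```

The leading eigenvalue is κ = 1, and its eigenvector gives both vertices the same score, as the symmetry of the network requires.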

Start: an initial guess of the centrality of each vertex i, • e.g. 1, or the degree centrality, for each $x_i$ • update $x'_i = \sum_j A_{ij} x_j$ • in matrix notation $X' = AX$, • where A is the adjacency matrix and X the vector of scores • repeating this for t steps gives • $X(t) = A^t X(0)$ • write $X(0) = \sum_i c_i v_i$, a linear combination of the eigenvectors $v_i$ of A

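Eigendecomposition: $A = V \Lambda V^{-1}$, where the columns of V are the eigenvectors $v_i$ and $\Lambda = \mathrm{diag}(\kappa_1, \ldots, \kappa_n)$ • for symmetric matrices the eigenvectors are orthogonal, $v_i^T v_j = 0$ for distinct i and j, so (with unit-length eigenvectors) $V^{-1} = V^T$ and $A = V \Lambda V^T$ • $A^t = V \Lambda^t V^{-1}$ • for symmetric matrices $A^t = V \Lambda^t V^T$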

$X(t) = A^t \sum_i c_i v_i = \sum_i \kappa_i^t c_i v_i = \kappa_1^t \sum_i (\kappa_i / \kappa_1)^t c_i v_i$ • where κ1 is the leading eigenvalue (largest in magnitude), so $|\kappa_i / \kappa_1| < 1$ for all i other than 1 • hence $(\kappa_i / \kappa_1)^t \to 0$ as $t \to \infty$ • and $X(t) \to \kappa_1^t c_1 v_1$ • the converged score vector is proportional to the leading eigenvector • eigenvectors are only defined up to multiplication by a constant: if $v_i$ is an eigenvector, so is $c v_i$
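The update rule above can be sketched in NumPy as a power iteration. The function name, tolerance, and iteration cap are illustrative choices, not from the notes; the scores are rescaled to average 1 at each step, the normalization used in the examples. The sketch assumes the leading eigenvalue strictly dominates all others in magnitude (e.g. a connected, non-bipartite undirected network). On the 7-vertex example network that follows, it reproduces the listed leading eigenvector (0.894, 0.894, 1.200, 1.025, 1.200, 0.894, 0.894).

```python
import numpy as np

def eigenvector_centrality(A, tol=1e-10, max_iter=10_000):
    """Power iteration X' = A X, rescaled each step so the scores average 1.

    Assumes A is a non-negative adjacency matrix whose leading eigenvalue
    strictly dominates all others in magnitude; otherwise it may oscillate.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    x = np.ones(n)                  # initial guess X(0): all ones
    for _ in range(max_iter):
        x_new = A @ x               # x'_i = sum_j A_ij x_j
        x_new *= n / x_new.sum()    # normalize: average centrality 1
        if np.abs(x_new - x).max() < tol:
            return x_new
        x = x_new
    raise RuntimeError("power iteration did not converge")


# The 7-vertex undirected example network from these notes (vertices 1..7)
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 7)]
A = np.zeros((7, 7))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1

print(np.round(eigenvector_centrality(A), 3))
```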

$AX = \kappa_1 X$ • where κ1 is the leading eigenvalue • normalization of the eigenvector X: • e.g., normalize so that the scores sum to n, i.e. the average centrality is 1 • better suited to undirected networks • directed networks: • A is asymmetric, so its left and right eigenvectors differ, giving two leading eigenvectors • the right eigenvector is based on ingoing links

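Example (figure: undirected network on vertices 1 to 7, two triangles joined through vertex 4). Adjacency matrix, rows and columns ordered 1 to 7:
0,1,1,0,0,0,0
1,0,1,0,0,0,0
1,1,0,1,0,0,0
0,0,1,0,1,0,0
0,0,0,1,0,1,1
0,0,0,0,1,0,1
0,0,0,0,1,1,0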

Leading Eigenvector of A
vertex: 1 2 3 4 5 6 7
score: 0.894 0.894 1.200 1.025 1.200 0.894 0.894
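The listed values can be checked by hand. A sketch using the symmetry of the example: vertices 1, 2, 6, 7 share one score a, vertices 3 and 5 share b, and vertex 4 has c; the eigenvalue equations for vertices 1, 3 and 4 then read:

```latex
\begin{aligned}
\kappa a &= a + b, \qquad \kappa b = 2a + c, \qquad \kappa c = 2b \\
\Rightarrow\; a &= \frac{b}{\kappa - 1}, \quad c = \frac{2b}{\kappa}, \quad
\kappa = \frac{2}{\kappa - 1} + \frac{2}{\kappa}
\;\Longleftrightarrow\; \kappa^3 - \kappa^2 - 4\kappa + 2 = 0 \\
\kappa_1 &\approx 2.343, \qquad a : b : c \approx 0.745 : 1 : 0.854
\end{aligned}
```

Rescaling so that 4a + 2b + c = 7 (average centrality 1) gives a ≈ 0.894, b ≈ 1.200 and c ≈ 1.025, the values listed for the example.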

Illustrative Iterations (power iteration from X(0) = 1, scores rescaled to average 1; columns are vertices 1 to 7)
0.875 0.875 1.312 0.875 1.312 0.875 0.875
0.921 0.921 1.105 1.105 1.105 0.921 0.921
0.875 0.875 1.273 0.955 1.273 0.875 0.875
0.909 0.909 1.144 1.077 1.144 0.909 0.909
0.882 0.882 1.244 0.983 1.244 0.882 0.882
0.903 0.903 1.167 1.056 1.167 0.903 0.903
0.887 0.887 1.226 1.000 1.226 0.887 0.887
0.899 0.899 1.180 1.044 1.180 0.899 0.899
0.890 0.890 1.216 1.010 1.216 0.890 0.890
0.897 0.897 1.188 1.036 1.188 0.897 0.897
0.891 0.891 1.210 1.016 1.210 0.891 0.891
0.896 ...

Directed Networks • $x_i = \kappa_1^{-1} \sum_j A_{ij} x_j$ • or $AX = \kappa_1 X$, with X the right eigenvector • each row of A is multiplied by X • $A_{ij} = 1$ for ingoing links (an edge from j to i) • if vertex i has no ingoing links, $A_{ij} = 0$ for all j • hence $x_i = 0$ for that vertex • so its outgoing links pass on a weight of 0 as well • only vertices in strongly connected components, or in their out-components, have non-zero centrality • acyclic networks, e.g. citation networks: • have no strongly connected components (other than single vertices) • so the centrality of every vertex is 0

A portion of a directed network. Vertex A in this network has only outgoing edges and hence will have eigenvector centrality zero. Vertex B has outgoing edges and one ingoing edge, but the ingoing one originates at A, and hence vertex B will also have centrality zero.

vertex: 1 2 3 4 5 6
0.000 0.000 0.000 2.400 3.600 0.000
0.000 0.000 0.000 3.600 2.400 0.000
0.000 0.000 0.000 2.400 3.600 0.000
Eigenvector centrality: 0.000 0.000 0.000 2.400 3.600 0.000

(Figure: vertices A and B.) A has no ingoing links, so its centrality is zero. B has links from A only, so its centrality is also zero.
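A quick numerical check of the acyclic case. In an acyclic network the vertices can be ordered so that A is strictly triangular, which makes A nilpotent ($A^n = 0$): all its eigenvalues are 0, and the iteration $X(t) = A^t X(0)$ dies out after at most n steps. The 4-paper network below is hypothetical and uses the convention $A_{ij} = 1$ for an edge from j to i.

```python
import numpy as np

# Hypothetical acyclic citation network of 4 papers, with A[i, j] = 1 when
# there is an edge j -> i (paper j cites paper i), i.e. an ingoing link of i.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]])

x = np.ones(4)                       # start every paper with score 1
for t in range(1, 5):
    x = A @ x                        # X(t) = A X(t-1), no normalization
    print(t, x)

# Strictly triangular, hence nilpotent: A^4 = 0, so every eigenvalue is 0
print(np.linalg.matrix_power(A, 4))
```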

Strongly connected component and its out-components • adjacency matrix (8 vertices):
0,0,1,0,0,0,0,0
1,0,0,0,0,0,0,0
0,1,0,0,0,0,1,0
1,0,0,0,0,1,0,0
0,0,0,1,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,1,0

Acyclical network

Eigenvector centrality: 2.667 1.333 1.333 1.333 1.333 0.000 0.000 0.000
Largest α: 2.0; with α = 1.5:
Katz centrality: 1.806 1.484 1.613 1.484 1.613 0.000 0.000 0.000
A few iterations:
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
1.000 1.000 1.600 1.600 1.000 0.400 0.400 1.000
1.566 1.063 1.399 1.399 1.566 0.224 0.224 0.559
1.398 1.549 1.297 1.752 1.398 0.135 0.135 0.337
PageRank centrality: 1.539 1.017 1.383 1.326 1.491 0.363 0.363 0.518

Transpose $A^T$:
0,1,0,1,0,0,0,0
0,0,1,0,0,0,0,0
1,0,0,0,0,0,0,0
0,0,0,0,1,0,0,0
0,0,0,0,0,0,0,0
0,0,0,1,0,0,0,0
0,0,1,0,0,0,0,1
0,0,0,0,0,0,0,0
Cocitation matrix:
1,0,0,0,0,0,0,0
0,1,0,1,0,0,0,0
0,0,2,0,0,0,0,1
0,1,0,2,0,0,0,0
0,0,0,0,1,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,1,0,0,0,0,1
Bibliographic coupling matrix:
1,0,0,0,0,0,0,0
0,1,0,1,0,0,0,0
0,0,2,0,0,0,0,1
0,1,0,2,0,0,0,0
0,0,0,0,1,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,1,0,0,0,0,1

Authority centrality: 0.000 1.528 2.472 2.472 0.000 0.000 0.000 1.528
Hub centrality: 2.472 1.528 0.000 0.000 0.000 1.528 2.472 0.000
Hub centrality 2 ($A^T x$): 4.000 2.472 0.000 0.000 0.000 2.472 4.000 0.000
Authority eigenvalue: 1.0
Hub eigenvalue: 2.618033988749895

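Katz Centrality • give every vertex a small amount of centrality for free: $x_i = \alpha \sum_j A_{ij} x_j + \beta$ • in matrix form $X = \alpha A X + \beta \mathbf{1}$, where $\mathbf{1} = (1, 1, \ldots, 1)$ • $X = \beta (I - \alpha A)^{-1} \mathbf{1}$ • setting β = 1: $X = (I - \alpha A)^{-1} \mathbf{1}$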

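Free parameter α: controls the balance between the eigenvector term and the constant term • as α increases from 0, $(I - \alpha A)^{-1}$ first diverges when $\det(I - \alpha A) = 0$, i.e. $\det(A - \alpha^{-1} I) = 0$ • the roots $\alpha^{-1}$ of this characteristic equation are the eigenvalues of A, so this first happens at $\alpha^{-1} = \kappa_1$, i.e. $\alpha = 1/\kappa_1$ • hence choose $\alpha < 1/\kappa_1$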

How to compute • use $X = \alpha A X + \beta \mathbf{1}$ as an update rule • start with an initial estimate of X, e.g. $X_0 = 0$ • $X_1 = \alpha A X_0 + \beta \mathbf{1}$, and so on • stop when X converges • can be applied to undirected networks as well • gives every vertex some centrality by virtue of its existence
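The iterative scheme above, together with the closed form $X = \beta (I - \alpha A)^{-1} \mathbf{1}$, can be sketched in NumPy. The value α = 0.4 is an assumption for illustration: it lies below $1/\kappa_1 \approx 0.427$ for the 7-vertex example network, and its first iterations match the Katz table for that example. Function names and tolerances are likewise illustrative.

```python
import numpy as np

def katz_iterative(A, alpha, beta=1.0, tol=1e-12, max_iter=100_000):
    """Iterate X <- alpha*A*X + beta*1 from X0 = 0 until the change is below tol.

    Converges only when alpha < 1/kappa_1 (kappa_1: leading eigenvalue of A).
    """
    A = np.asarray(A, dtype=float)
    x = np.zeros(A.shape[0])                 # initial estimate X0 = 0
    for _ in range(max_iter):
        x_new = alpha * (A @ x) + beta       # X_{t+1} = alpha*A*X_t + beta*1
        if np.abs(x_new - x).max() < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence; check that alpha < 1/kappa_1")


def katz_direct(A, alpha, beta=1.0):
    """Closed form X = beta*(I - alpha*A)^{-1} 1, computed with a linear solve."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return beta * np.linalg.solve(np.eye(n) - alpha * A, np.ones(n))


# 7-vertex undirected example network; alpha = 0.4 is an assumption
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 7)]
A = np.zeros((7, 7))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1

kappa1 = np.linalg.eigvalsh(A).max()          # leading eigenvalue, about 2.343
alpha = 0.4
assert alpha < 1 / kappa1
x = katz_iterative(A, alpha)
print(np.allclose(x, katz_direct(A, alpha)))  # iteration agrees with the closed form
print(np.round(x * len(x) / x.sum(), 3))      # rescaled to average 1
```

The linear solve avoids forming the inverse explicitly; the iteration is what scales to large sparse networks.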

Katz Centrality of the Example (columns are vertices 1 to 7)
1.000 1.000 1.000 1.000 1.000 1.000 1.000
0.940 0.940 1.149 0.940 1.149 0.940 0.940
0.934 0.934 1.136 0.992 1.136 0.934 0.934
0.921 0.921 1.166 0.983 1.166 0.921 0.921
...
0.902 0.902 1.189 1.015 1.189 0.902 0.902
0.902 0.902 1.189 1.015 1.189 0.902 0.902
Katz centrality: 0.902 0.902 1.189 1.015 1.189 0.902 0.902

Extension • make β differ between vertices: $X = \alpha A X + \boldsymbol{\beta}$ • solution: $X = (I - \alpha A)^{-1} \boldsymbol{\beta}$
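A sketch of the generalized solution, solving the linear system $(I - \alpha A) X = \boldsymbol{\beta}$ rather than forming the inverse. The path network, α = 0.25 (below $1/\kappa_1 = 1/\sqrt{2}$ for this network), the choice of β, and the name `katz_general` are hypothetical.

```python
import numpy as np

def katz_general(A, alpha, beta):
    """Solve (I - alpha*A) X = beta for a vector beta of per-vertex free terms."""
    A = np.asarray(A, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return np.linalg.solve(np.eye(A.shape[0]) - alpha * A, beta)


# Hypothetical path network 0 - 1 - 2 in which only vertex 0 gets free centrality
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
x = katz_general(A, alpha=0.25, beta=[1.0, 0.0, 0.0])
print(x)   # approx [15/14, 2/7, 1/14]
```

Vertex 2 receives no free centrality of its own, yet ends up with a non-zero score passed along the path from vertex 0.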

PageRank • a problem with Katz centrality: • if a vertex with high Katz centrality points to many other vertices, • all of them get high centrality • e.g., Yahoo has high centrality • if it points to my page, should my page get high centrality as well?

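PageRank • the centrality a vertex passes on is divided by its out-degree: $x_i = \alpha \sum_j A_{ij} \frac{x_j}{k_j^{\text{out}}} + \beta$ • this is a problem when $k_j^{\text{out}} = 0$ • in matrix form $X = \alpha A D^{-1} X + \beta \mathbf{1}$, where $\mathbf{1} = (1, 1, \ldots, 1)$ and D is the diagonal matrix with $D_{ii} = \max(k_i^{\text{out}}, 1)$ • $X = \beta (I - \alpha A D^{-1})^{-1} \mathbf{1}$ • setting β = 1: $X = (I - \alpha A D^{-1})^{-1} \mathbf{1} = D (D - \alpha A)^{-1} \mathbf{1}$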

Free parameter α • α must be less than the inverse of the largest eigenvalue of $AD^{-1}$ • each column j of $AD^{-1}$ sums to 1 (or to 0 if vertex j has no outgoing links) • for a non-negative matrix whose columns sum to 1, 1 is an eigenvalue (Perron-Frobenius theorem) and no eigenvalue exceeds 1 in magnitude • so the largest eigenvalue is at most 1, and any α < 1 works • Google sets α = 0.85
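Both the closed form and the iteration can be sketched in NumPy, assuming the convention $A_{ij} = 1$ for an edge from j to i and Google's α = 0.85. The 6-vertex matrix reused here is the directed example listed later in these notes; the α behind the PageRank figures in the notes is not stated, so this output need not match them. Function names and tolerances are illustrative.

```python
import numpy as np

def pagerank_direct(A, alpha=0.85, beta=1.0):
    """X = beta * D (D - alpha*A)^{-1} 1 with D_ii = max(k_out_i, 1).

    Assumes A[i, j] = 1 for an edge j -> i, so column sums are out-degrees.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    D = np.diag(np.maximum(A.sum(axis=0), 1.0))   # max(k_out, 1) avoids 1/0
    return beta * D @ np.linalg.solve(D - alpha * A, np.ones(n))


def pagerank_iterative(A, alpha=0.85, beta=1.0, tol=1e-12, max_iter=10_000):
    """Iterate x_i <- alpha * sum_j A_ij x_j / k_out_j + beta, starting from X = 0."""
    A = np.asarray(A, dtype=float)
    k_out = np.maximum(A.sum(axis=0), 1.0)
    x = np.zeros(A.shape[0])
    for _ in range(max_iter):
        x_new = alpha * (A @ (x / k_out)) + beta
        if np.abs(x_new - x).max() < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence; check that alpha < 1")


# 6-vertex directed example from these notes (row i lists the ingoing links of i)
A = np.array([[0, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 0]])
x = pagerank_direct(A)
print(np.allclose(x, pagerank_iterative(A)))   # both forms agree
print(np.round(x * len(x) / x.sum(), 3))       # rescaled to average 1
```

Because no vertex in this example lacks outgoing links, every column of $AD^{-1}$ sums to 1 and the unnormalized scores sum to $n\beta/(1-\alpha) = 40$.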

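Extensions • make β differ between vertices: $x_i = \alpha \sum_j A_{ij} \frac{x_j}{k_j^{\text{out}}} + \beta_i$, i.e. $X = \alpha A D^{-1} X + \boldsymbol{\beta}$ • solution: $X = D (D - \alpha A)^{-1} \boldsymbol{\beta}$ • or set β to zero: $x_i = \sum_j A_{ij} \frac{x_j}{k_j^{\text{out}}}$, similar to eigenvector centrality • for undirected networks this gives $x_i = k_i$, i.e. degree centrality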
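A quick numerical check of the last claim, using the 7-vertex undirected example network from these notes: with β = 0 and $D_{ii} = k_i$, the degree vector satisfies $X = A D^{-1} X$, because $\sum_j A_{ij} k_j / k_j = k_i$.

```python
import numpy as np

# 7-vertex undirected example network from these notes
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 7)]
A = np.zeros((7, 7))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1

k = A.sum(axis=1)                          # degrees k_i (equal to out-degrees here)
D_inv = np.diag(1.0 / np.maximum(k, 1.0))
print(np.allclose(A @ D_inv @ k, k))       # True: X = k solves X = A D^{-1} X
```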

Iterations and PageRank Centrality (columns are vertices 1 to 6)
1.000 1.000 1.000 1.000 1.000 1.000
0.585 0.751 0.834 1.498 1.083 1.249
0.465 0.597 0.719 1.582 1.477 1.160
0.404 0.519 0.625 1.830 1.573 1.050
0.366 0.469 0.565 1.878 1.773 0.950
...
PageRank centrality: 0.239 0.306 0.369 2.287 2.179 0.620

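Leading eigenvector of the 6-vertex example below, scores rescaled to average 1 (three successive iterations):
0.558 1.135 1.439 0.707 1.265 0.896
0.558 1.135 1.439 0.707 1.265 0.896
0.558 1.135 1.439 0.707 1.265 0.896
• v_old: 0.558 1.135 1.439 0.707 1.265 0.896
• v_new = A v_old: 0.707 1.439 1.823 0.896 1.603 1.135
• v_new / v_old ≈ 1.267 = κ1, the leading eigenvalue
• largest α = 1/κ1 ≈ 0.789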

Adjacency matrix of the 6-vertex example (rows and columns 1 to 6):
0,0,0,1,0,0
0,0,1,0,0,0
1,0,0,0,1,0
0,0,0,0,0,1
0,0,0,1,0,1
0,1,0,0,0,0
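The numbers on the previous slide can be checked with a direct eigendecomposition. This sketch assumes the convention used for directed networks in these notes, $A_{ij} = 1$ for an edge from j to i, so that $v_{\text{new}} = A v_{\text{old}}$. Under that reading it gives κ1 ≈ 1.267 (so 1/κ1 ≈ 0.789, the "largest α" figure) and, after rescaling to average 1, the eigenvector 0.558, 1.135, 1.439, 0.707, 1.265, 0.896.

```python
import numpy as np

# 6-vertex directed example; row i lists the ingoing links of vertex i (A[i, j] = 1 for j -> i)
A = np.array([[0, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 0]], dtype=float)

vals, vecs = np.linalg.eig(A)        # A is not symmetric, so use eig, not eigh
i = np.argmax(vals.real)             # Perron eigenvalue: real and largest
kappa1 = vals[i].real
x = np.abs(vecs[:, i].real)          # the Perron vector can be taken non-negative
x *= len(x) / x.sum()                # average centrality 1
print(round(kappa1, 3), round(1 / kappa1, 3))
print(np.round(x, 3))
```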

With β = 1 and a suggested α of 0.45: • Katz centrality (vertices 1 to 6): 0.623 1.104 1.417 0.731 1.242 0.884 • PageRank centrality (vertices 1 to 6): 0.440 1.299 1.351 0.683 0.973 1.254

Hubs and Authorities • a vertex has high centrality if it is pointed to by vertices with high centrality • authorities: contain useful information on a topic • hubs: tell where the best authorities can be found • authority vs. hub: e.g., review articles act as hubs • centrality for directed networks: • authority centrality and hub centrality • hyperlink-induced topic search (HITS), by Kleinberg

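HITS • authority centrality is derived from hub centrality and vice versa: • $x_i = \alpha \sum_j A_{ij} y_j$, where $x_i$ is the authority centrality of vertex i • $y_i = \beta \sum_j A_{ji} x_j$, where $y_i$ is the hub centrality of vertex i • in matrix notation $x = \alpha A y$, $y = \beta A^T x$ • combining both: $A A^T x = \lambda x$, $A^T A y = \lambda y$ • where $\lambda = (\alpha\beta)^{-1}$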

Solution • the authority and hub centralities are given by the eigenvectors of $AA^T$ and $A^TA$ respectively • with the same eigenvalue, the leading eigenvalue λ • $AA^T$ and $A^TA$ have the same leading eigenvalue: • multiplying $AA^T x = \lambda x$ by $A^T$ gives $A^T(AA^T x) = \lambda A^T x$, • i.e. $(A^TA)(A^T x) = \lambda (A^T x)$, and since $y = \beta A^T x$, $(A^TA) y = \lambda y$ • $AA^T$: cocitation matrix • $A^TA$: bibliographic coupling matrix
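A NumPy sketch of the solution above, assuming $A_{ij} = 1$ for an edge from j to i and a non-degenerate leading eigenvalue (with a repeated leading eigenvalue, `eigh` may return any vector in that eigenspace). Scores are rescaled to average 1; the function name `hits` is illustrative. For the 6-vertex example listed in these notes it gives authority scores 1.5, 0, 0, 1.5, 3, 0 and hub scores 0, 0, 0, 3, 0, 3, with leading eigenvalue 3 for both $AA^T$ and $A^TA$.

```python
import numpy as np

def hits(A):
    """Authority (x) and hub (y) scores as leading eigenvectors of A A^T and A^T A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    lam_x, vx = np.linalg.eigh(A @ A.T)   # cocitation matrix; eigenvalues ascending
    lam_y, vy = np.linalg.eigh(A.T @ A)   # bibliographic coupling matrix
    x = np.abs(vx[:, -1])                 # leading eigenvector, sign fixed
    y = np.abs(vy[:, -1])
    return lam_x[-1], x * n / x.sum(), lam_y[-1], y * n / y.sum()


# 6-vertex directed example from these notes (row i lists the ingoing links of i)
A = np.array([[0, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 0]])
lam_x, x, lam_y, y = hits(A)
print(round(lam_x, 6), np.round(x, 3))
print(round(lam_y, 6), np.round(y, 3))
```

Both products are symmetric, which is why `eigh` (real eigenvalues, ascending order) can be used here even though A itself is not symmetric.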

Solves the problem with eigenvector centrality • vertices that are not cited have authority centrality zero • but they can have non-zero hub centrality • and the vertices they cite can then have non-zero authority centrality

Transpose $A^T$:
0,0,1,0,0,0
0,0,0,0,0,1
0,1,0,0,0,0
1,0,0,0,1,0
0,0,1,0,0,0
0,0,0,1,1,0
Cocitation matrix:
1,0,0,0,1,0
0,1,0,0,0,0
0,0,2,0,0,0
0,0,0,1,1,0
1,0,0,1,2,0
0,0,0,0,0,1
Bibliographic coupling matrix:
1,0,0,0,1,0
0,1,0,0,0,0
0,0,2,0,0,0
0,0,0,1,1,0
1,0,0,1,2,0
0,0,0,0,0,1

Authority centrality: 1.500 0.000 0.000 1.500 3.000 0.000
Hub centrality: 0.000 0.000 0.000 3.000 0.000 3.000
Hub centrality 2 ($A^T x$): 0.000 0.000 0.000 4.500 0.000 4.500
Authority eigenvalue: 3.0
Hub eigenvalue: 2.0

Closeness Centrality • measures the mean distance from a vertex to all other vertices: $\ell_i = \frac{1}{n} \sum_j d_{ij}$ • where • $d_{ij}$: length of a geodesic path from i to j • $\ell_i$: mean geodesic distance from i, averaged over all vertices j • either $\ell_i = \frac{1}{n} \sum_j d_{ij}$ or, excluding i itself, $\ell_i = \frac{1}{n-1} \sum_{j \neq i} d_{ij}$ • $d_{ii} = 0$ by definition

Closeness centrality: $C_i = \frac{1}{\ell_i} = \frac{n}{\sum_j d_{ij}}$
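A plain-Python sketch using breadth-first search to obtain the geodesic distances $d_{ij}$ in an unweighted network, with $C_i = n / \sum_j d_{ij}$ as above. It assumes a connected network given as adjacency lists; the function names are illustrative. On the 7-vertex example (renumbered 0 to 6), vertex 4 of the notes, which joins the two triangles, has the highest closeness, 7/10 = 0.7.

```python
from collections import deque

def bfs_distances(adj, source):
    """Geodesic distances d_ij from source in an unweighted graph (adjacency lists)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:             # first visit = shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj):
    """C_i = n / sum_j d_ij (d_ii = 0 included); assumes the graph is connected."""
    n = len(adj)
    return [n / sum(bfs_distances(adj, i).values()) for i in range(n)]


# 7-vertex example network, vertices renumbered 0..6
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3, 5, 6], [4, 6], [4, 5]]
print([round(c, 3) for c in closeness(adj)])
```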

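Problems • 1. small range of values: • the $d_{ij}$ tend to be small, growing only like log n • the smallest possible mean distance is 1 and the largest is of order log n, • with the average in between • e.g., actor network: n = …, mean distances only between 2.41 and 8.66 • 2. $d_{ij} = \infty$ if i and j are in different components, • so $C_i$ becomes 0 • averaging only over the component that i is in instead • gives vertices in small components high C values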

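Harmonic mean distance between vertices: $C'_i = \frac{1}{n-1} \sum_{j \neq i} \frac{1}{d_{ij}}$ • desirable: • when $d_{ij} = \infty$ the corresponding term is zero and drops out • gives more weight to vertices close to i • mean geodesic distance for the whole network: $\ell = \frac{1}{n^2} \sum_{ij} d_{ij} = \frac{1}{n} \sum_i \ell_i$ • problems: • infinite distances force averaging over components • instead, use the harmonic mean distance: $\frac{1}{\ell_h} = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{1}{d_{ij}} = \frac{1}{n} \sum_i C'_i$, or $\ell_h = \frac{n}{\sum_i C'_i}$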
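A plain-Python sketch of the harmonic version, which stays well defined when the network has several components: vertices that BFS never reaches have $d_{ij} = \infty$ and simply contribute 0. The two-component network and the function name are hypothetical.

```python
from collections import deque

def harmonic_closeness(adj):
    """C'_i = (1/(n-1)) * sum_{j != i} 1/d_ij; unreachable j (d_ij = inf) add 0."""
    n = len(adj)
    scores = []
    for i in range(n):
        dist = {i: 0}                     # BFS distances from i
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # vertices absent from dist lie in other components and contribute 0
        scores.append(sum(1 / d for j, d in dist.items() if j != i) / (n - 1))
    return scores


# Hypothetical network with two components: path 0 - 1 - 2 and edge 3 - 4
adj = [[1], [0, 2], [1], [4], [3]]
C = harmonic_closeness(adj)
l_h = len(adj) / sum(C)                   # harmonic mean distance l_h = n / sum_i C'_i
print(C, round(l_h, 3))
```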

Betweenness Centrality • the extent to which a vertex lies on paths between other vertices • counts the geodesic paths the vertex lies on • called betweenness centrality, or simply betweenness • high betweenness means high influence: • control over information passing between others • removing such vertices disrupts communication the most
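The slide defines betweenness by counting geodesic paths. A common way to compute it is Brandes' algorithm, sketched below for an unweighted network given as adjacency lists; this is a swapped-in standard method, not one given in the notes. The counting conventions are assumptions: ordered pairs (s, t) with s ≠ t and v ≠ s, t are counted, and when several geodesics join s and t, each contributes the fraction of them that passes through v. On the 7-vertex example (renumbered 0 to 6), vertex 4 of the notes lies on every geodesic between {1, 2, 3} and {5, 6, 7}, i.e. 9 unordered pairs, so it scores 18.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted graph given as adjacency lists.

    x_v = sum over ordered pairs (s, t), s != v != t, of the fraction of
    geodesic paths from s to t that pass through v.
    """
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        # BFS from s, counting shortest paths (sigma) and recording predecessors
        stack = []
        pred = [[] for _ in range(n)]
        sigma = [0] * n
        sigma[s] = 1
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    pred[w].append(u)
        # accumulate dependencies in order of non-increasing distance from s
        delta = [0.0] * n
        while stack:
            w = stack.pop()
            for u in pred[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# 7-vertex example network, vertices renumbered 0..6
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3, 5, 6], [4, 6], [4, 5]]
print(betweenness(adj))
```

For an undirected network each unordered pair is counted twice; halve the scores to count unordered pairs instead.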
