V2: Measures and Metrics (II)

- Betweenness Centrality
- Groups of Vertices
- Transitivity
- Reciprocity
- Signed Edges and Structural Balance
- Similarity
- Homophily and Assortative Mixing
Mathematics of Biological Networks

Betweenness
Betweenness measures the extent to which a vertex lies on paths between other vertices.
The idea is usually attributed to Freeman (1977), but it was already proposed in an unpublished technical report by Anthonisse a few years earlier.
Vertices with high betweenness centrality may have considerable influence within a network, e.g. by virtue of their control over information passing between others (think of data packets in a message-passing scenario).
The vertices with the highest betweenness are the ones whose removal from the network will disrupt the most communication between other vertices.

Betweenness Centrality
Let us assume an undirected network for simplicity. Let $n^i_{st}$ be 1 if vertex $i$ lies on the geodesic path from $s$ to $t$, and 0 if it does not or if there is no such path. Then the betweenness centrality $x_i$ is given by
$$x_i = \sum_{st} n^i_{st}$$
This definition counts the geodesic paths in either direction between each vertex pair separately. The sum also includes paths from each vertex to itself. Excluding those would not change the ranking of the vertices in terms of betweenness. We also assume that vertices $s$ and $t$ themselves belong to the paths between $s$ and $t$.
If there are several geodesic paths of the same length between 2 vertices, each path gets a weight equal to the inverse of the number of paths.
Figure: vertices A and B are connected by 2 geodesic paths. Vertex C lies on both paths.

Betweenness Centrality
We may redefine $n^i_{st}$ to be the number of geodesic paths from $s$ to $t$ that pass through vertex $i$, and define $g_{st}$ to be the total number of geodesic paths from $s$ to $t$. Then the betweenness centrality of vertex $i$ is
$$x_i = \sum_{st} \frac{n^i_{st}}{g_{st}}$$
where we adopt the convention that $n^i_{st}/g_{st} = 0$ if both $n^i_{st}$ and $g_{st}$ are zero.
Figure: in this sketch of a network, vertex A lies on a bridge joining two groups of other vertices. All paths between the groups must pass through A, so it has a high betweenness even though its degree is low.
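
The path-counting definition above can be sketched in a few lines of Python. This is a minimal illustration, not an optimized algorithm: it assumes an undirected, unweighted network stored as a dictionary of neighbor sets, and the names `geodesic_counts` and `betweenness` are hypothetical. As in the definition, ordered pairs, the endpoints s and t, and the trivial paths with s = t are all counted.

```python
from collections import deque

def geodesic_counts(adj, s):
    """BFS from s: return (dist, paths); paths[v] = number of geodesic paths s -> v."""
    dist, paths = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first visit fixes the geodesic distance
                dist[v] = dist[u] + 1
                paths[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on some geodesic from s
                paths[v] += paths[u]
    return dist, paths

def betweenness(adj):
    """x_i = sum over ordered pairs (s, t), including s == t, of n_st^i / g_st."""
    bfs = {s: geodesic_counts(adj, s) for s in adj}
    x = {i: 0.0 for i in adj}
    for s in adj:
        dist_s, paths_s = bfs[s]
        for t, g_st in paths_s.items():  # only reachable t contribute
            for i in dist_s:
                dist_i, paths_i = bfs[i]
                # i lies on a geodesic s -> t iff d(s,i) + d(i,t) == d(s,t)
                if dist_s[i] + dist_i[t] == dist_s[t]:
                    x[i] += paths_s[i] * paths_i[t] / g_st
    return x

# Hypothetical example: a star with hub 0 and four leaves (n = 5).
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(betweenness(star))
```

For this star the hub obtains 21 and each leaf obtains 9. The triple loop costs on the order of n^3 operations, so the sketch is only meant for small networks.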

Range of Betweenness Centrality
The betweenness values are typically distributed over a wide range.
The maximum possible value for the betweenness of a vertex occurs when the vertex lies on the shortest path between every other pair of vertices. This occurs for the central vertex in a star graph, which is attached to all other $n - 1$ vertices by single edges.
The central vertex lies on all $n^2$ shortest paths between vertex pairs except for the $n - 1$ paths from the peripheral vertices to themselves.
Thus, the betweenness centrality of the central vertex is $n^2 - n + 1$.

Range of Betweenness Centrality
In contrast, the smallest possible value of betweenness in a network with a single component is $2n - 1$, since at a minimum each vertex lies on every path that starts or ends with itself.
There are $n - 1$ paths from others to the vertex, $n - 1$ paths from the vertex to others, and one path from the vertex to itself. In total, this gives $2n - 1$.
This situation occurs, for instance, when a network has a "leaf" attached to it, which is a vertex connected to the rest of the network by just a single edge.
The ratio of the largest and smallest possible betweenness values is thus
$$\frac{n^2 - n + 1}{2n - 1} \approx \frac{n}{2}$$
For large networks, this range of values can become very large.

Betweenness Centrality for the film actor network
Taking again the example of the network of film actors, the individual with the highest betweenness centrality in the largest component of the actor network is the Spanish actor Fernando Rey, who appeared e.g. with Gene Hackman in The French Connection.
Rey has a betweenness of $7.47 \times 10^8$.
The lowest score of any actor in the large component is just $8.91 \times 10^5$.
Thus, the betweenness values span a range of almost one thousand.
The second-highest betweenness belongs to Christopher Lee with $6.46 \times 10^8$.

Groups of Vertices
Many networks divide naturally into groups or communities.
In lectures V3 and V4, we will discuss some sophisticated computer methods that have been developed for detecting groups, such as hierarchical clustering and spectral clustering.
Today, we will start with cliques, plexes, cores, and components.
A clique is a subset of the vertices in an undirected network such that every member of the set is connected by an edge to every other member.
Newman defines cliques as maximal cliques, so that there is no other vertex in the network that can be added to the subset while preserving the property that every vertex is connected to every other.
Other scientists distinguish between cliques and maximal cliques.

Cliques
Figure: a clique of 4 vertices within a network.
Cliques can also overlap. In this network, vertices A and B belong to 2 cliques of 4 vertices.
In a social network, a clique may be formed e.g. by the co-workers in an office or a group of classmates in school.

k-plex
Often, circles of friends form only near-cliques rather than perfect cliques.
One way to relax the stringent requirement that every member is connected to every other member is the k-plex.
A k-plex of size n is a maximal subset of n vertices within a network such that each vertex is connected to at least n - k of the others.
A 1-plex is the same as a clique.
In a 2-plex, each vertex must be connected to all or all-but-one of the others. And so forth.
One can also specify that each member should be connected to a certain fraction (75% or 50%) of the other vertices.
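
The degree condition in the k-plex definition can be checked directly. The following is a minimal sketch, assuming an undirected network without self-loops stored as a dictionary of neighbor sets; the function name `satisfies_k_plex_condition` is hypothetical. It tests only the "at least n - k of the others" condition for a given subset and does not test maximality.

```python
def satisfies_k_plex_condition(adj, members, k):
    """True if every vertex in `members` is linked to at least n - k other members."""
    members = set(members)
    n = len(members)
    # adj[v] & members counts the neighbors of v inside the subset (no self-loops assumed)
    return all(len(adj[v] & members) >= n - k for v in members)

# Hypothetical example: 4 vertices, all pairs connected except 0 and 1.
adj = {0: {2, 3}, 1: {2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
print(satisfies_k_plex_condition(adj, {0, 1, 2, 3}, 1))  # clique condition
print(satisfies_k_plex_condition(adj, {0, 1, 2, 3}, 2))  # 2-plex condition
```

In this example the four vertices fail the clique (1-plex) condition because of the missing edge, but they satisfy the 2-plex condition.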

k-core
The k-core is another concept that is closely related to the k-plex.
A k-core is a maximal subset of vertices such that each is connected to at least k others in the subset.
Obviously, a k-core of n vertices is also an (n - k)-plex.
However, the set of all k-cores for a given value of k is not the same as the set of all k-plexes for any value of k, since n, the size of the group, can vary from one k-core to another.
Also, unlike k-plexes and cliques, k-cores cannot overlap.
The k-core is of particular interest in network analysis for the practical reason that it is very easy to find the set of all k-cores in a network.

k-core: simple algorithm
A simple algorithm is to start with the whole network and remove from it any vertices that have degree less than k. Such vertices cannot under any circumstances be members of a k-core.
In doing so, one will normally also reduce the degrees of some other vertices in the network that were connected to the vertices just removed.
So we go through the network again to see if there are any more vertices that now have degree less than k and, if there are, we remove those too.
We repeatedly prune the network until no vertices remain with degree < k.
What is left is by definition a k-core or a set of k-cores.
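
The pruning procedure described above can be sketched as follows. This is a minimal illustration, assuming an undirected network stored as a dictionary of neighbor sets; the function name `k_core` is hypothetical. It returns all surviving vertices, which may form one or several k-cores.

```python
def k_core(adj, k):
    """Repeatedly remove vertices of degree < k; return the surviving vertex set."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    while True:
        low = [v for v, nbrs in remaining.items() if len(nbrs) < k]
        if not low:
            break                                # no vertex with degree < k is left
        for v in low:
            for u in remaining.pop(v):           # remove v ...
                if u in remaining:
                    remaining[u].discard(v)      # ... and lower its neighbors' degrees
    return set(remaining)

# Hypothetical example: a triangle 0-1-2 with a chain 2-3-4 attached.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(k_core(adj, 2))
```

For k = 2 the chain is pruned in two passes (first vertex 4, then vertex 3) and the triangle remains. For k = 3 nothing survives.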

Components and k-components
A component in an undirected network is a maximal subset of vertices such that each is reachable by some path from each of the others.
A k-component (sometimes also called k-connected component) is a maximal subset of vertices such that each is reachable from each of the others by at least k vertex-independent paths.
The idea of k-components is connected with the idea of network robustness.
A pair of vertices connected by 2 independent paths cannot be disconnected by the failure of a single router.

Transitivity
A property that is very important in social networks, and useful to a lesser degree in other networks too, is transitivity.
If the "connected by an edge" relation were transitive, it would mean that if vertex u is connected to vertex v, and v is connected to w, then u is also connected to w.
Perfect transitivity only occurs in networks where each component is a fully connected subgraph or clique. Perfect transitivity is therefore pretty much a useless concept for understanding networks.
However, partial transitivity can be very useful. In many networks, particularly social networks, the fact that u knows v and v knows w does not guarantee that u knows w, but makes it much more likely.

Transitivity
We quantify the level of transitivity in a network as follows: if u knows v, and v knows w, then we have a path uvw of two edges in the network.
If u also knows w, we say that the path is closed. It forms a loop of length 3, or a triangle.
We define the clustering coefficient to be the fraction of paths of length 2 in the network that are closed:
$$C = \frac{\text{number of closed paths of length 2}}{\text{number of paths of length 2}}, \qquad C \in [0,1]$$
C = 1 implies perfect transitivity; C = 0 implies no closed triads (this happens e.g. for a tree topology or a square lattice).
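
The clustering coefficient defined above can be computed by enumerating all paths of length 2 through each vertex. This is a minimal sketch, assuming an undirected simple network stored as a dictionary of neighbor sets; the function name `clustering_coefficient` is hypothetical, and returning 0 for a network without any path of length 2 is an assumption of this sketch.

```python
from itertools import combinations

def clustering_coefficient(adj):
    """Fraction of paths u-v-w of length 2 whose end vertices u and w are also connected."""
    closed = total = 0
    for v, nbrs in adj.items():
        for u, w in combinations(nbrs, 2):   # each unordered pair of neighbors of v
            total += 1                       # u-v-w is a path of length 2
            if w in adj[u]:
                closed += 1                  # the path is closed (triangle)
    return closed / total if total else 0.0  # assumption: C = 0 if no such path exists

# Hypothetical example: a triangle 0-1-2 with one extra vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficient(adj))
```

In this example there are 5 paths of length 2, of which 3 are closed (the triangle seen from each of its three corners), so C = 3/5.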

Transitivity
Social networks tend to have quite high values of the clustering coefficient. E.g.:
- the network of film actor collaborations has C = 0.20
- a network of collaborations between biologists has C = 0.09
- a network of who sends email to whom in a large university has C = 0.16
These are typical values for social networks.

Reciprocity
The clustering coefficient measures the frequency with which loops of length 3 (triangles) appear in a network.
A triangle is the shortest loop in an undirected graph.
However, in a directed network, we can also have shorter loops of length 2. What is the frequency of occurrence of such loops?
This is measured by the reciprocity, which tells us how likely it is that a vertex that you point to also points to you.
E.g. on the World Wide Web, if my web page links to your web page, how likely is it, on average, that yours links back to mine?

Reciprocity
The reciprocity r is defined as the fraction of edges that are reciprocated.
The product of adjacency matrix elements $A_{ij}A_{ji}$ is 1 if and only if there is an edge from i to j and an edge from j to i, and is zero otherwise.
Thus, we can sum over all vertex pairs i, j to get an expression for the reciprocity
$$r = \frac{1}{m}\sum_{ij} A_{ij}A_{ji} = \frac{1}{m}\,\mathrm{Tr}\,\mathbf{A}^2$$
where m is, as usual, the total number of directed edges in the network. ("Tr" stands for "trace" = sum of the diagonal elements of a matrix.)
Figure: in the network on the right with 7 directed edges, 4 are reciprocated, thus r = 4/7 ≈ 0.57.
The WWW also has r = 0.57.
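
The reciprocity can be computed directly from a list of directed edges. This is a minimal sketch; the function name `reciprocity` and the example edge list are hypothetical (the example is not the network from the slide figure, it merely also has 7 directed edges of which 4 are reciprocated).

```python
def reciprocity(edges):
    """Fraction of directed edges (i, j) for which the reverse edge (j, i) also exists."""
    edge_set = set(edges)                    # A_ij = 1 exactly for the pairs in this set
    m = len(edge_set)
    reciprocated = sum(1 for (i, j) in edge_set if (j, i) in edge_set)
    return reciprocated / m if m else 0.0    # assumption: r = 0 for a network without edges

# Hypothetical example: 7 directed edges, 4 of them reciprocated (1<->2 and 2<->3).
edges = [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 1), (1, 3)]
print(reciprocity(edges))
```

Summing over the edge set is equivalent to summing $A_{ij}A_{ji}$ over all vertex pairs, because the product vanishes whenever the edge from i to j is absent.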

Signed edges and structural balance
In some social networks, and occasionally in other networks, edges are allowed to be either "positive" or "negative".
E.g. in an acquaintance network we could denote friendship by a positive edge and animosity by a negative edge.
Such networks are called signed networks and their edges signed edges.
A negative edge is not the same as the absence of an edge. A negative edge indicates, for example, 2 people who interact regularly but dislike each other. The absence of an edge represents 2 people who do not interact.

Signed edges and structural balance
The 3 edges of a triangle in a signed network have the following possible configurations (figure, panels (a) to (d)).
Configurations (a) and (b) are balanced and hence relatively stable.
Configurations (c) and (d) are unbalanced and liable to break apart when interpreted as an acquaintance network.
We all know the "rule" that "the enemy of my enemy is my friend". In (d), this rule is not obeyed. Most likely these 3 enemies will simply break up and go separate ways.

Signed edges and structural balance
The feature that distinguishes the 2 stable configurations from the unstable ones is that they have an even number of minus signs around the loop.
One can enumerate similar configurations for longer loops, e.g. with n = 4.
Surveys found that social networks contain far fewer unstable configurations than stable configurations with even numbers of minus signs.
Networks containing only loops with even numbers of minus signs are said to show structural balance or are simply termed balanced.

Structural balance
Frank Harary (see figure, right) proved: a balanced network can be divided into connected groups of vertices such that all connections between members of the same group are positive and all connections between members of different groups are negative.
Shown on the right is a balanced, clusterable network. Every loop in this network contains an even number of minus signs.
The dotted lines indicate the division of the network into clusters such that all acquaintances within clusters have positive connections and all acquaintances in different clusters have negative connections.
Networks that can be divided into groups like this are termed clusterable.
http://www.cs.nmsu.edu/fnh/

Proof of Harary's theorem
Let us start by considering connected networks (they have one component).
We start with an arbitrary vertex and color it in one of two colors. Then we color the other vertices according to the following algorithm:
1. A vertex v connected by a positive edge to another vertex u that has already been colored gets the same color as u.
2. A vertex v connected by a negative edge to another vertex u that has already been colored gets the opposite color from u.
For most networks we may come upon a vertex whose color has already been assigned. Then a conflict may arise between this already assigned color and the new color that we would like to assign to it.

Proof of Harary's theorem
We will now show that this conflict can only arise if the network as a whole is unbalanced.
If, while coloring a network, we arrive at an already colored vertex, there must be another path by which this vertex can be reached from our starting point.
→ There is at least one loop in the network to which this vertex belongs.
If the network is balanced, every loop to which our vertex belongs must have an even number of negative edges around it.
Let us suppose that the color already assigned to the vertex is in conflict with the one we would like to assign to it now. There are 2 ways in which this could happen.

Proof of Harary's theorem
In case (a), vertex u is colored black and we move on to its neighbor v, which is connected by a positive edge but already colored white.
If u and v have opposite colors, then around any loop containing them both there must be an odd number of minus signs, so that the color changes an odd number of times.
If there is an odd number of minus signs, the network is not balanced.

Proof of Harary's theorem
In the other case (b), vertices u and v have the same color but the edge between them is negative.
If u and v have the same color, then there must be an even number of minus signs around the rest of the loop connecting them.
Together with the negative edge between u and v, this again gives an odd total number of negative edges around the entire loop.
Hence, the network is again not balanced.

Proof of Harary's theorem
Either way, if we ever encounter a conflict about what color a vertex should have, then the network must be unbalanced.
Turned around, balanced networks will not lead to such conflicts, so that we can color the entire network with just two colors obeying the rules.
When this is completed, we simply divide the network into contiguous clusters of vertices that have the same color.
Within every cluster, since all vertices have the same color, they must be joined by positive edges.
Conversely, all edges that connect different clusters must be negative, since the clusters have different colors.
This concludes the proof.
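
The coloring procedure used in the proof doubles as a test for balance. The following is a minimal sketch, assuming an undirected signed network given as a list of (u, v, sign) triples with sign +1 or -1; the names `build_signed_adjacency` and `two_color` are hypothetical. Each component is colored separately, and `None` is returned as soon as a color conflict appears, i.e. when the network is unbalanced.

```python
from collections import deque

def build_signed_adjacency(edges):
    """Turn (u, v, sign) triples into a symmetric dict: adj[u][v] = sign."""
    adj = {}
    for u, v, sign in edges:
        adj.setdefault(u, {})[v] = sign
        adj.setdefault(v, {})[u] = sign
    return adj

def two_color(adj):
    """Return a dict vertex -> color (0 or 1), or None if a conflict shows imbalance."""
    color = {}
    for start in adj:                    # restart in every uncolored component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in adj[u].items():
                # rule 1: positive edge -> same color; rule 2: negative edge -> opposite
                wanted = color[u] if sign > 0 else 1 - color[u]
                if v not in color:
                    color[v] = wanted
                    queue.append(v)
                elif color[v] != wanted:
                    return None          # conflict: some loop has an odd number of minus signs
    return color

# Hypothetical examples: a triangle with two negative edges, and one with three.
balanced = build_signed_adjacency([("a", "b", +1), ("b", "c", -1), ("a", "c", -1)])
unbalanced = build_signed_adjacency([("a", "b", -1), ("b", "c", -1), ("a", "c", -1)])
print(two_color(balanced))
print(two_color(unbalanced))
```

In the first triangle, a and b receive one color and c the other. In the second triangle, the third negative edge produces a conflict, so the function returns `None`.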

Proof of Harary's theorem
Our proof of Harary's theorem also led to a method for constructing the clusters.
The proof can easily be extended to networks with more than one component, because we can simply repeat the proof for each component separately.
The practical importance of Harary's theorem is based on the observation that many real social networks are found naturally to be in a balanced or mostly balanced state.
Is the converse of this theorem also true? Are clusterable networks necessarily balanced?

Clusterable networks
No.
Figure: in this network, all 3 vertices dislike each other, so there is an odd number of minus signs around the loop.
But there is no problem dividing the network into 3 clusters of one vertex each, such that everyone dislikes the members of the other clusters.
This network is clusterable but not balanced.

Similarity
Another central concept in social networks is that of similarity between vertices.
E.g. commercial dating services try to match people with others based on the presumed similarity of their interests, backgrounds, likes and dislikes. For this, one would likely use attributes of the vertices.
Here, we will restrict ourselves to determining similarity between the vertices of a network using the information contained in the network structure.
There are two fundamental approaches for constructing measures of network similarity, called structural equivalence and regular equivalence.

Structural equivalence
Two vertices in a network are structurally equivalent if they share many of the same network neighbors.
Figure: the two vertices i and j share 3 common neighbors (two black vertices and one white vertex). Both also have other neighbors that are not shared.

Regular equivalence
2 regularly equivalent vertices do not necessarily share the same neighbors, but they have neighbors who are themselves similar.
E.g. two history students at different universities may not have any friends in common, but they can still be similar in the sense that they both know a lot of other history students, history instructors, and so forth.
We will focus here on mathematical measures to quantify structural equivalence, because these measures are considerably better developed than those for regular equivalence.

Cosine similarity
We will now introduce measures of structural equivalence and concentrate on undirected networks.
The simplest and most obvious measure of structural equivalence is a count of the common neighbors $n_{ij}$ of 2 vertices i and j:
$$n_{ij} = \sum_k A_{ik}A_{kj}$$
However, this number is difficult to interpret. If 2 vertices have 3 common neighbors, is that a lot or a little?
→ We need some form of normalization.
One strategy would be to divide this by the total number of vertices n in the network, because this is the maximal possible number of common neighbors.

Cosine similarity
However, this would penalize vertices with low degree. A better measure would allow for the varying degree of vertices. Such a measure is the cosine similarity.
In geometry, the inner or dot product of 2 vectors x and y is given by
$$\mathbf{x} \cdot \mathbf{y} = |\mathbf{x}|\,|\mathbf{y}| \cos\theta$$
where $|\mathbf{x}|$ is the magnitude of x and $\theta$ is the angle between the 2 vectors.
Rearranging, we can write the cosine of the angle as
$$\cos\theta = \frac{\mathbf{x} \cdot \mathbf{y}}{|\mathbf{x}|\,|\mathbf{y}|}$$

Cosine similarity
Salton proposed that we regard the i-th and j-th rows (or columns) of the adjacency matrix as two vectors and use the cosine of the angle between them as a similarity measure.
Noting that the dot product of the two rows is $\sum_k A_{ik}A_{kj}$, this gives us the following similarity:
$$\sigma_{ij} = \cos\theta = \frac{\sum_k A_{ik}A_{kj}}{\sqrt{\sum_k A_{ik}^2}\,\sqrt{\sum_k A_{jk}^2}}$$
Assuming our network is an unweighted simple graph, the elements of the adjacency matrix take on only the values 0 and 1, so that $A_{ij}^2 = A_{ij}$ for all i, j.

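Cosine similarity
Then
$$\sum_k A_{ik}^2 = \sum_k A_{ik} = k_i$$
where $k_i$ is the degree of vertex i. Thus
$$\sigma_{ij} = \frac{\sum_k A_{ik}A_{kj}}{\sqrt{k_i k_j}} = \frac{n_{ij}}{\sqrt{k_i k_j}}$$
The cosine similarity of i and j is therefore the number of common neighbors $n_{ij}$ of the two vertices divided by the geometric mean of their degrees.
In the example figure, the cosine similarity of i and j follows by inserting $n_{ij} = 3$ and the degrees of the two vertices.
If one or both vertices have degree zero, we set $\sigma_{ij} = 0$.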
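
The cosine similarity can be evaluated from neighbor sets alone. This is a minimal sketch, assuming an undirected simple network stored as a dictionary of neighbor sets; the function name `cosine_similarity` is hypothetical, and the degrees 4 and 5 in the example are assumed values chosen for illustration, not read off the slide figure.

```python
from math import sqrt

def cosine_similarity(adj, i, j):
    """Common neighbors of i and j divided by the geometric mean of their degrees."""
    k_i, k_j = len(adj[i]), len(adj[j])
    if k_i == 0 or k_j == 0:
        return 0.0                        # convention: zero if either degree is zero
    n_ij = len(adj[i] & adj[j])           # number of common neighbors
    return n_ij / sqrt(k_i * k_j)

# Hypothetical example: i and j share the neighbors a, b, c;
# i has one further neighbor (d) and j has two (e, f).
adj = {
    "i": {"a", "b", "c", "d"}, "j": {"a", "b", "c", "e", "f"},
    "a": {"i", "j"}, "b": {"i", "j"}, "c": {"i", "j"},
    "d": {"i"}, "e": {"j"}, "f": {"j"},
}
print(cosine_similarity(adj, "i", "j"))   # 3 / sqrt(4 * 5)
```

With these assumed degrees the similarity is 3/√20, i.e. about 0.67.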

Pearson coefficients
An alternative way to normalize the count of common neighbors is to compare it to the expected value when vertices choose their neighbors at random.
Suppose vertices i and j have degrees $k_i$ and $k_j$, respectively. How many common neighbors should we expect them to have?
Imagine that vertex i chooses $k_i$ neighbors uniformly at random from n possible ones (n - 1 if self-loops are not allowed).
In the same manner, vertex j chooses $k_j$ random neighbors.

Pearson coefficients
For the first neighbor that j chooses, there is a probability of $k_i/n$ that j chooses a neighbor of i.
The same probability applies to the next choices. (We neglect the small probability of choosing the same neighbor twice.)
→ The expected number of common neighbors is $k_i k_j / n$.
A reasonable measure of similarity between two vertices is the actual number of common neighbors minus the expected number they would have if they chose their neighbors at random.

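Pearson coefficients
Writing the number of common neighbors as $\sum_k A_{ik}A_{jk}$ (the adjacency matrix of an undirected network is symmetric), this difference is
$$\sum_k A_{ik}A_{jk} - \frac{k_i k_j}{n} = \sum_k A_{ik}A_{jk} - \frac{1}{n}\sum_k A_{ik}\sum_l A_{jl} = \sum_k A_{ik}A_{jk} - n\langle A_i\rangle\langle A_j\rangle = \sum_k \left(A_{ik} - \langle A_i\rangle\right)\left(A_{jk} - \langle A_j\rangle\right)$$
Here $\langle A_i\rangle = \frac{1}{n}\sum_k A_{ik}$ denotes the mean of the elements of the i-th row of the adjacency matrix.
This expression is n times the covariance $\mathrm{cov}(A_i, A_j)$ of the two rows of the adjacency matrix.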

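Pearson coefficients
Normalizing this as we did with the cosine similarity, so that it ranges between -1 and 1, gives the standard Pearson correlation coefficient
$$r_{ij} = \frac{\mathrm{cov}(A_i, A_j)}{\sigma_i \sigma_j} = \frac{\sum_k \left(A_{ik} - \langle A_i\rangle\right)\left(A_{jk} - \langle A_j\rangle\right)}{\sqrt{\sum_k \left(A_{ik} - \langle A_i\rangle\right)^2}\,\sqrt{\sum_k \left(A_{jk} - \langle A_j\rangle\right)^2}}$$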
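
The Pearson coefficient of two rows of the adjacency matrix can be sketched as follows. This is a minimal illustration, assuming the adjacency matrix is given as a list of 0/1 rows; the function names are hypothetical, and returning 0 when a row has zero variance is an assumption of this sketch. The helper `excess_common_neighbors` evaluates the unnormalized quantity, i.e. common neighbors minus the expected number $k_i k_j / n$.

```python
from math import sqrt

def excess_common_neighbors(A, i, j):
    """Actual minus expected number of common neighbors: n_ij - k_i * k_j / n."""
    n = len(A)
    n_ij = sum(A[i][k] * A[j][k] for k in range(n))
    return n_ij - sum(A[i]) * sum(A[j]) / n

def pearson_similarity(A, i, j):
    """Pearson correlation coefficient between rows i and j of the adjacency matrix."""
    n = len(A)
    mean_i, mean_j = sum(A[i]) / n, sum(A[j]) / n
    cov = sum((A[i][k] - mean_i) * (A[j][k] - mean_j) for k in range(n))
    var_i = sum((A[i][k] - mean_i) ** 2 for k in range(n))
    var_j = sum((A[j][k] - mean_j) ** 2 for k in range(n))
    if var_i == 0 or var_j == 0:
        return 0.0                       # assumption: undefined cases are reported as 0
    return cov / sqrt(var_i * var_j)

# Hypothetical example: the path 0 - 1 - 2 - 3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(excess_common_neighbors(A, 0, 2))  # 1 common neighbor minus 1 * 2 / 4
print(pearson_similarity(A, 0, 2))
```

For vertices 0 and 2 of this path the excess is 1 - 2/4 = 0.5, and dividing by the two row standard deviations gives a coefficient of about 0.58.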

Homophily or Assortative Mixing
Shown below is a friendship network among 470 students in a US high school (ages 14-18 years).
The vertices are colored by race.
The figure shows that the students are sharply divided between a group mainly consisting of white children and one that contains mainly black children.

Homophily or Assortative Mixing
This observation is very common in social networks. People apparently have a strong tendency to associate with others whom they perceive as being similar to themselves in some way.
This tendency is termed homophily or assortative mixing.
Assortative mixing is also seen in some nonsocial networks, e.g. in citation networks where papers from one field cite papers from the same field.
How can one quantify assortative mixing?

Quantify assortative mixing
Find the fraction of edges that run between vertices of the same type and subtract from this the fraction of edges we would expect if edges were positioned at random without regard for vertex type.
$c_i$: class or type of vertex i, $c_i \in [1 \ldots n_c]$
$n_c$: total number of classes
The total number of edges between vertices of the same type is
$$\sum_{\text{edges }(i,j)} \delta(c_i, c_j) = \frac{1}{2}\sum_{ij} A_{ij}\,\delta(c_i, c_j)$$
Here $\delta(m,n)$ is the Kronecker delta ($\delta$ is 1 if m = n and 0 otherwise).
The factor ½ accounts for the fact that every vertex pair i, j is counted twice in the sum.

Quantify assortative mixing
Now we turn to the expected number of edges between vertices if the edges are placed randomly.
Consider a particular edge attached to vertex i, which has degree $k_i$.
By definition, there are 2m ends of edges in the entire network, where m is the total number of edges.
If connections are made randomly, the chance that the other end of our particular edge is one of the $k_j$ ends attached to vertex j is thus $k_j / 2m$.
Counting all $k_i$ edges attached to i, the total expected number of edges between vertices i and j is then $k_i k_j / 2m$.

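Quantify assortative mixing
The expected number of edges between all pairs of vertices of the same type is
$$\frac{1}{2}\sum_{ij} \frac{k_i k_j}{2m}\,\delta(c_i, c_j)$$
where the factor ½ avoids double-counting vertex pairs.
Taking the difference between the actual and expected number of edges gives
$$\frac{1}{2}\sum_{ij} A_{ij}\,\delta(c_i, c_j) - \frac{1}{2}\sum_{ij} \frac{k_i k_j}{2m}\,\delta(c_i, c_j) = \frac{1}{2}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j)$$
Typically one does not calculate the number of such edges but the fraction, which is obtained by dividing this by m:
$$Q = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j)$$
This quantity Q is called the modularity.
In the student network, Q = 0.305.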
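
The modularity can be evaluated with a direct double sum over vertex pairs. This is a minimal sketch, assuming an undirected simple network stored as a dictionary of neighbor sets and a dictionary `classes` that maps each vertex to its type; both names and the example network are hypothetical.

```python
def modularity(adj, classes):
    """Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # sum of degrees = 2m
    total = 0.0
    for i in adj:
        for j in adj:
            if classes[i] != classes[j]:
                continue                             # Kronecker delta is zero
            a_ij = 1 if j in adj[i] else 0
            total += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return total / two_m

# Hypothetical example: two triangles joined by the single edge 2 - 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
classes = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(adj, classes))                      # 5/14, about 0.357
```

Here m = 7. The same-type pairs contribute 12 to the sum over $A_{ij}$ and 2 × 49/14 = 7 to the expected term, so Q = (12 - 7)/14 = 5/14. If all vertices are given the same type, Q is 0.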

Assortative mixing by scalar characteristics
When considering networks with associated scalar characteristics like age, income, or gene expression, we could study whether network vertices with similar values of this scalar characteristic tend to be connected together more often than those with different values.
This figure is for the same group of US school children, but now as a function of age.
Each dot corresponds to a pair of friends sorted by their grades (school years).
Younger children form a very homogeneous cluster in the bottom left corner; older children also have friends in the other grades.

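Assortative mixing by scalar characteristics
A good way of quantifying this tendency is by the covariance (no derivation here):
$$\mathrm{cov}(x_i, x_j) = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right) x_i x_j$$
where $x_i$ is the value of the scalar characteristic at vertex i, and the product $x_i x_j$ now replaces the Kronecker symbol.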
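
The covariance for a scalar characteristic can be sketched in the same way as the modularity. This minimal sketch assumes the covariance takes the form $\frac{1}{2m}\sum_{ij}(A_{ij} - k_i k_j/2m)\,x_i x_j$, with $x_i$ the scalar value at vertex i. The network is an undirected simple network stored as a dictionary of neighbor sets; the function name `scalar_covariance` and the example values are hypothetical.

```python
def scalar_covariance(adj, x):
    """cov = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * x_i * x_j."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # sum of degrees = 2m
    total = 0.0
    for i in adj:
        for j in adj:
            a_ij = 1 if j in adj[i] else 0
            total += (a_ij - len(adj[i]) * len(adj[j]) / two_m) * x[i] * x[j]
    return total / two_m

# Hypothetical example: the path 0 - 1 - 2 - 3 with two different value assignments.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
similar_neighbors = {0: 1, 1: 1, 2: 5, 3: 5}         # like values sit next to each other
alternating = {0: 1, 1: 5, 2: 1, 3: 5}               # unlike values sit next to each other
print(scalar_covariance(path, similar_neighbors))    # positive (4/3)
print(scalar_covariance(path, alternating))          # negative (-4)
```

A positive value indicates that connected vertices tend to have similar values, a negative value that they tend to differ.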
