This paper introduces a multi-index evaluation algorithm based on Locally Linear Embedding (LLE) for determining the importance of nodes in complex networks. The algorithm utilizes characteristic indices such as degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and mutual information to evaluate the importance of nodes. The LLE technique helps in mapping high-dimensional data into a lower-dimensional manifold space, preserving the neighbor relationships. Experimental simulations and analyses are conducted to validate the effectiveness of the proposed algorithm.
Multi-index Evaluation Algorithm Based on Locally Linear Embedding for the Node Importance in Complex Networks
Fang Hu, Yuhua Liu, Jianzhi Jin
Department of Computer Science, Central China Normal University, Wuhan 430079, Hubei, China
Email: naomifang@mails.ccnu.edu.cn Website: www.ccnulyh.com
Outline • Introduction • LLE introduction and index concept definitions • Multi-index evaluation algorithm based on LLE for the node importance • Simulation and Analysis • Conclusions
Introduction The single-index analysis method evaluates node importance in complex networks by analyzing one characteristic index of the nodes, such as degree centrality, betweenness centrality, closeness centrality, eigenvector centrality or mutual-information. Researchers have also proposed many multi-index evaluation algorithms to identify important nodes in complex networks.
LLE Introduction (1/5) Scientists interested in exploratory analysis or visualization of multivariate data face a common problem: dimensionality reduction. Locally linear embedding (LLE) is a nonlinear dimensionality reduction technique introduced by Roweis and Saul.
LLE Introduction (2/5) LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima. LLE is a manifold learning technique that maps high-dimensional data into a low-dimensional manifold space while preserving neighbor relationships. By exploiting the local symmetries of linear reconstructions, LLE is able to learn the global structure of nonlinear manifolds, such as those arising in face recognition, image processing, fault diagnosis and so on.
LLE Introduction (3/5) The problem involves mapping high-dimensional inputs into a low-dimensional "description" space with as many coordinates as observed modes of variability. The LLE algorithm eliminates the need to estimate pairwise distances between widely separated data points, and recovers global nonlinear structure from locally linear fits.
LLE Introduction (4/5) Fig. 1. Example of Locally Linear Embedding
LLE Introduction (5/5) The problem of nonlinear dimensionality reduction, as illustrated for three-dimensional data (B) sampled from a two-dimensional manifold (A). An unsupervised learning algorithm must discover the global internal coordinates of the manifold without signals that explicitly indicate how the data should be embedded in two dimensions. The color coding illustrates the neighborhood-preserving mapping discovered by LLE; black outlines in (B) and (C) show the neighborhood of a single point.
Index Concept Definitions A large number of centrality measures have been proposed to identify important nodes within a graph or a complex network. Typical examples are degree centrality, closeness centrality, betweenness centrality, eigenvector centrality and mutual-information. Many studies and experiments have shown that these indices reflect node importance effectively from different perspectives. These indices are chosen as the parameters for multi-index evaluation in this paper.
Index Definition (1/5) 1. Degree Centrality The degree centrality of node $v_i$, denoted as $C_D(v_i)$, is defined as
$$C_D(v_i) = \frac{k_i}{N-1} \qquad (1)$$
where $k_i$ is the degree of node $v_i$, which is defined as the number of ties that node $v_i$ has. The value $N-1$ is used to normalize the degree centrality value, and $N$ is the total number of nodes.
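A minimal sketch of degree centrality, assuming the common normalization by N − 1 and an undirected graph stored as a dictionary of neighbour sets. The function name and the toy star graph are illustrative and not taken from the paper.

```python
def degree_centrality(adj):
    """Normalized degree centrality k_i / (N - 1) for an undirected graph.

    adj maps each node to the set of its neighbours (assumed input format).
    """
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


# Toy star graph: node 0 is tied to nodes 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
dc = degree_centrality(star)
```

In this toy graph the hub is tied to every other node, so its normalized degree is 1, while each leaf has a single tie out of three possible.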
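Index Definition (2/5) 2. Betweenness Centrality The betweenness centrality of node $v_i$, denoted as $C_B(v_i)$, is defined as
$$C_B(v_i) = \frac{2}{(N-1)(N-2)} \sum_{s<t;\ s,t \neq i} \frac{\sigma_{st}(i)}{\sigma_{st}} \qquad (2)$$
where $\sigma_{st}$ is the number of shortest paths between node $v_s$ and node $v_t$, and $\sigma_{st}(i)$ is the number of those paths that go through node $v_i$. The value $(N-1)(N-2)/2$ is used to normalize the betweenness centrality value, and $N$ is the total number of nodes.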
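The counting behind betweenness centrality can be sketched as follows. This assumes an undirected, unweighted graph with at least three nodes and the usual normalization by the number of node pairs, (N − 1)(N − 2)/2. The helper names are hypothetical, and the pair-by-pair loop is chosen for readability rather than speed.

```python
from collections import deque


def bfs_counts(adj, s):
    """Return (dist, sigma): shortest-path lengths from s and shortest-path counts."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness_centrality(adj):
    """Fraction of shortest paths between other node pairs that pass through each node."""
    nodes = list(adj)
    n = len(nodes)
    info = {s: bfs_counts(adj, s) for s in nodes}
    scale = (n - 1) * (n - 2) / 2  # number of unordered pairs not containing v
    result = {}
    for v in nodes:
        dist_v, sigma_v = info[v]
        total = 0.0
        for a in range(n):
            for b in range(a + 1, n):
                s, t = nodes[a], nodes[b]
                if v in (s, t):
                    continue
                dist_s, sigma_s = info[s]
                if t not in dist_s or v not in dist_s or t not in dist_v:
                    continue  # pair or node unreachable
                # v lies on a shortest s-t path iff the distances add up
                if dist_s[v] + dist_v[t] == dist_s[t]:
                    total += sigma_s[v] * sigma_v[t] / sigma_s[t]
        result[v] = total / scale
    return result


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
bc_star = betweenness_centrality(star)
bc_path = betweenness_centrality(path)
```

In the star graph every shortest path between two leaves passes through the hub, so the hub scores 1 and the leaves score 0.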
Index Definition (3/5) 3. Closeness Centrality The closeness centrality of node $v_i$, denoted as $C_C(v_i)$, is defined as
$$C_C(v_i) = \frac{N-1}{\sum_{j \neq i} d_{ij}} \qquad (3)$$
where $d_{ij}$ is the length of the shortest path between node $v_i$ and node $v_j$. The value $N-1$ is used to normalize the closeness centrality value, and $N$ is the total number of nodes.
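A small sketch of closeness centrality, assuming a connected, unweighted, undirected graph so that shortest-path lengths can be obtained by breadth-first search, and assuming the common form (N − 1) divided by the sum of distances. The function name and the example graph are illustrative.

```python
from collections import deque


def closeness_centrality(adj):
    """(N - 1) / sum of shortest-path lengths from each node (connected graph assumed)."""
    n = len(adj)
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        result[s] = (n - 1) / sum(dist.values())
    return result


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
cc = closeness_centrality(star)
```

The hub is one step from every other node and so attains the maximum value; each leaf is one step from the hub and two steps from the other leaves.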
Index Definition (4/5) 4. Eigenvector Centrality For node $v_i$, the eigenvector centrality score is proportional to the sum of the scores of all nodes which are connected to it, i.e.
$$x_i = \frac{1}{\lambda} \sum_{j=1}^{N} a_{ij} x_j \qquad (4)$$
where $x_j$ denotes the score of node $v_j$, $A = (a_{ij})$ is the adjacency matrix of the network, $N$ is the total number of nodes, and $\lambda$ is a constant. In vector notation, this can be rewritten as $x = \frac{1}{\lambda} A x$, or as the eigenvector equation $A x = \lambda x$.
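The eigenvector equation can be solved approximately by power iteration. The sketch below is one possible approach and not necessarily the one used by the authors. It assumes a connected undirected graph and iterates with A + I instead of A, an implementation choice that leaves the eigenvectors unchanged but avoids oscillation on bipartite graphs. The names and the toy graph are illustrative.

```python
def eigenvector_centrality(adj, iterations=200):
    """Approximate the leading eigenvector of the adjacency matrix by power iteration."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Multiply by (A + I): own score plus the scores of all neighbours.
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = max(nxt.values())
        x = {v: val / norm for v, val in nxt.items()}  # rescale so the largest score is 1
    return x


# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 0.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
scores = eigenvector_centrality(g)
# Recover the eigenvalue from one component of A x = lambda x.
lam = sum(scores[w] for w in g[0]) / scores[0]
```

Node 0 has the most neighbours and well-connected ones, so it receives the highest score; the pendant node 3 receives the lowest.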
Index Definition (5/5) 5. Mutual-information The mutual-information of node $v_i$ is the sum of the mutual information between node $v_i$ and the other nodes which are connected to it, as given in Eq. (5), where $k_i$ is the degree of node $v_i$. Mutual-information uses information theory to assess the importance of nodes; it represents the amount of information each node contains.
Multi-index evaluation algorithm based on LLE for the node importance • Most current algorithms for evaluating node importance in complex networks rely on a single index. • Because a single index is one-sided and unstable, it can hardly reflect the whole situation in a complex network. • By synthesizing multiple indices of node importance and applying the idea of multi-objective optimization, a new multi-index evaluation algorithm based on LLE for node importance in complex networks is proposed. • In this algorithm, high-dimensional data is mapped into a low-dimensional space while preserving neighbours.
Steps of the Algorithm(1/6) The principle of the LLE algorithm is as follows: given a set of $N$ points $X = \{x_1, x_2, \dots, x_N\}$ in a high-dimensional space $\mathbb{R}^D$, LLE finds a new set of coordinates $Y = \{y_1, y_2, \dots, y_N\}$ in a low-dimensional space $\mathbb{R}^d$ ($d < D$) that satisfies the same neighbor relations as the original points.
Steps of the Algorithm(2/6) Step 1 According to the index definitions above, the value of each index is calculated for every node in the complex network, and the index vectors are assembled into the matrix $X = (x_{ij})_{N \times D}$, where $N$ is the number of nodes and $D$ is the number of evaluation indices in the complex network.
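Step 1 can be sketched as stacking the per-node index values into a matrix with one row per node and one column per index. The function name is hypothetical, and the numbers are illustrative values for a three-node path b-a-c, not results from the paper.

```python
def build_index_matrix(nodes, index_vectors):
    """Stack per-node index values into an N x D matrix (one row per node)."""
    return [[vec[v] for vec in index_vectors] for v in nodes]


nodes = ["a", "b", "c"]
# Illustrative values only: normalized degree and closeness on the path b-a-c.
degree = {"a": 1.0, "b": 0.5, "c": 0.5}
closeness = {"a": 1.0, "b": 0.67, "c": 0.67}
X = build_index_matrix(nodes, [degree, closeness])
```

Each row of `X` is then the high-dimensional data point that LLE takes as input for the corresponding node.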
Steps of the Algorithm(3/6) Step 2 For each data point $x_i$, find its $K$ nearest neighbors by using Euclidean distance, which is the length of the line segment connecting two points. Step 3 Compute the weights $w_{ij}$ that best linearly reconstruct each data point $x_i$ from its neighbors $x_j$, minimizing the following cost function,
$$\varepsilon(W) = \sum_{i=1}^{N} \Big\| x_i - \sum_{j} w_{ij} x_j \Big\|^2 \qquad (6)$$
under the constraint that each vector of weights sums to unity, $\sum_j w_{ij} = 1$.
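Steps 2 and 3 can be sketched in plain Python as follows. The weights are obtained in the standard way, by solving the local Gram system C w = 1 and rescaling so that the weights sum to one. The small ridge term `reg` added to the diagonal is an assumption of this sketch (a common safeguard when the Gram matrix is singular), not something stated in the slides, and all function names are hypothetical.

```python
import math


def nearest_neighbors(X, i, k):
    """Indices of the k points closest to X[i] in Euclidean distance."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for q in range(c, n + 1):
                M[r][q] -= f * M[c][q]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][q] * z[q] for q in range(r + 1, n))) / M[r][r]
    return z


def reconstruction_weights(X, i, nbrs, reg=1e-3):
    """Weights that best reconstruct X[i] from its neighbours, summing to one."""
    diffs = [[a - b for a, b in zip(X[j], X[i])] for j in nbrs]
    k = len(nbrs)
    # Local Gram matrix of the neighbour offsets from x_i.
    C = [[sum(p * q for p, q in zip(diffs[a], diffs[b])) for b in range(k)]
         for a in range(k)]
    trace = sum(C[a][a] for a in range(k))
    for a in range(k):
        C[a][a] += reg * trace  # ridge term: an assumption of this sketch
    w = solve(C, [1.0] * k)
    total = sum(w)
    return [val / total for val in w]


# Three collinear points and one outlier; point 1 sits midway between 0 and 2.
X = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [5.0, 5.0]]
nbrs = nearest_neighbors(X, 1, 2)
w = reconstruction_weights(X, 1, nbrs)
```

Because point 1 lies exactly halfway between its two neighbours, the recovered weights are equal, which matches the intuition that it is reconstructed as their average.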
Steps of the Algorithm(4/6) Step 4 Construct the optimal low-dimensional embedding $Y$ for $X$, in which the local linear geometry of the high-dimensional data is best preserved by the reconstruction weights of the data in $X$. This step is accomplished by minimizing the following cost function for the fixed weights $w_{ij}$,
$$\Phi(Y) = \sum_{i=1}^{N} \Big\| y_i - \sum_{j} w_{ij} y_j \Big\|^2 \qquad (7)$$
Steps of the Algorithm(5/6) The minimization is subject to the following constraints,
$$\sum_{i=1}^{N} y_i = 0, \qquad \frac{1}{N} \sum_{i=1}^{N} y_i y_i^{\mathsf T} = I \qquad (8)$$
To optimize the embedding error, we can rewrite it in the following quadratic form, based on inner products of the outputs,
$$\Phi(Y) = \sum_{i,j} M_{ij} \, (y_i \cdot y_j) \qquad (9)$$
The square matrix $M$ is sparse, symmetric and positive semi-definite.
Steps of the Algorithm(6/6) The matrix $M$ is given by
$$M_{ij} = \delta_{ij} - w_{ij} - w_{ji} + \sum_{k} w_{ki} w_{kj} \qquad (10)$$
for which $\delta_{ij}$ is an element of the identity matrix. The constrained minimization problem can be converted to solving an eigen-decomposition of the matrix $M$, calculated as
$$M = (I - W)^{\mathsf T} (I - W) \qquad (11)$$
for which the eigenvectors associated with the bottom $d$ nonzero eigenvalues constitute the final embedding outputs $Y$.
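To make the role of the matrix in Step 4 concrete, the sketch below builds M = (I − W)ᵀ(I − W) from a small weight matrix and compares it with the element-wise expression δ_ij − w_ij − w_ji + Σ_k w_ki w_kj. The weights are toy values chosen so that each row sums to one, and the function name is hypothetical. Because each row of W sums to one, every row of M sums to zero, so the constant vector is an eigenvector with eigenvalue zero. That is why only eigenvectors with nonzero eigenvalues are kept for the embedding.

```python
def embedding_matrix(W):
    """Return M = (I - W)^T (I - W) for an N x N weight matrix W given as a list of rows."""
    n = len(W)
    # B = I - W
    B = [[(1.0 if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    # M[i][j] = sum over k of B[k][i] * B[k][j]
    return [[sum(B[k][i] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Toy reconstruction weights for three points; each row sums to one.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
M = embedding_matrix(W)
row_sums = [sum(row) for row in M]

# Element-wise form of the same matrix.
n = len(W)
M_elementwise = [[(1.0 if i == j else 0.0) - W[i][j] - W[j][i]
                  + sum(W[k][i] * W[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
```

In practice the bottom eigenvectors of M would be computed with a numerical eigen-solver; that step is omitted here to keep the sketch dependency-free.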
Computational Complexity • The computational complexity of calculating the indices is expressed in terms of $N$, the number of nodes in the complex network, in three groups: degree centrality and mutual-information; eigenvector centrality; and betweenness centrality and closeness centrality. • In the LLE algorithm, choosing neighbors needs $O(DN^2)$, where $D$ is the dimension of the high-dimensional sample; calculating the reconstruction weights is $O(DNK^3)$, where $K$ is the number of neighbours; getting the $d$-dimensional embedding is $O(dN^2)$; so the computational complexity of the LLE algorithm is $O(DN^2 + DNK^3 + dN^2)$. • Finally, based on the analysis above, the overall computational complexity combines the cost of calculating the indices with that of the LLE algorithm.
Simulation Example(1/7) • Software: • Matlab, R, VC++ • Pajek, Gephi, NodeXL, SigmaPlot, UCINET • SPSS, SAS, ORIGINS • Windows, Linux • Data Set: • Real-world Networks • Computer-Generated Networks • Clinical data, Medical data • TCM(Traditional Chinese Medicine) data
Simulation Example(2/7) a. The Zachary’s Karate Club network b. The Bottlenose Dolphin network
Simulation Example(3/7) c、The American College Football network
Simulation Example(4/7) d、The network of interactions between major characters in the novel Les Mis´erables by Victor Hug
Simulation Example(5/7) E、Krebs’ network of books on American politics
Simulation Example(6/7) ARPA NETWORK Fig. 2. ARPA's topology In this paper, the ARPA network topology is used to analyze and illustrate the multi-index evaluation algorithm based on LLE. ARPA is a backbone network topology in North America, composed of 21 nodes and 26 edges. Even if an arbitrary node is removed from ARPA, the network remains connected.
Simulation Example(7/7) KARATE CLUB Fig. 3. Karate's topology Wayne Zachary observed social interactions between the members of a karate club at an American university. After a long study, he built a network with the 34 members of the karate club as nodes and 78 edges representing friendships between the members of the club.
Analysis (1/7) Fig. 4(a) Line graph comparing node importance according to mutual-information, closeness, and LLE in ARPA
Analysis (2/7) Fig. 4(b) Line graph comparing node importance according to degree, betweenness, PageRank, and LLE in ARPA
Analysis (3/7) • The most important nodes identified by the proposed algorithm are identical to those found by many single-index algorithms, such as degree centrality, eigenvector centrality and mutual-information. • Betweenness centrality and closeness centrality also correctly identify the single most important node. • The second most important nodes likewise match the results of eigenvector centrality, degree centrality and mutual-information. • The betweenness centrality ranking shows that many nodes communicate with others via its top-ranked node; the closeness centrality ranking shows that its top-ranked node is closest to the center of the network.
Analysis (4/7) Fig. 5 Line graph comparing node importance according to PCA and LLE in ARPA
Analysis (5/7) Fig. 6(a) Line graph comparing node importance according to mutual-information, closeness, and LLE in Karate club
Analysis (6/7) Fig. 6(b) Line graph comparing node importance according to degree, betweenness, PageRank, and LLE in Karate club
Analysis (7/7) • The proposed algorithm identifies the five most important nodes, two of which are generally considered the two most important nodes and are usually taken as the core nodes when detecting communities in complex networks. • The result of this new algorithm is identical to the results of degree centrality, betweenness centrality, eigenvector centrality and mutual-information. • Closeness centrality cannot identify the two most important nodes accurately, because one of the nodes it places in the top two is a misidentification.
Simulation and Analysis The simulation and analysis above show that the results of the proposed algorithm are basically identical to, and to some extent more refined and reasonable than, those of the single-index algorithms and the multi-index evaluation algorithm based on PCA.
Conclusions • In this paper, a new multi-index evaluation algorithm based on Locally Linear Embedding for node importance in complex networks is proposed, which is simple and effective, and synthesizes the statistical characteristics of nodes in complex networks. • The proposed algorithm maps high-dimensional data into a low-dimensional space while preserving neighbors. • Example analysis and simulation experiments show that this algorithm can effectively reflect the differences in node importance, and can accurately and efficiently find the important nodes in complex networks. • This method can be extended to directed and weighted networks in future work.