An Exploration of Power-law in Use-relation of Java Software Systems
Makoto Ichii, Makoto Matsushita, Katsuro Inoue
Osaka University
ASWEC 2008
Software Component Graph
• A software system is composed of software components.
  • Software component (component): the building unit of a software system
  • Components form complex use-relations with one another
• A software component graph (component graph) represents the use-relations between components
  • node: component / edge: use-relation
• Many studies use component graphs to analyze software systems
  • It is therefore important to understand the nature of component graphs
Power-law distribution
• A graph is characterized by its degree distribution
• Graphs whose degree distribution follows the power-law distribution $p(x) = Cx^{-\alpha}$ attract attention in various research domains
  • Link structure of WWW pages
  • Hosts on the Internet
• Such graphs tend to have interesting characteristics
  • Self-similarity
  • Fault tolerance
• Goal: explore component graphs to determine whether their degree distributions follow the power law
Questions [1-2/4]
Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?
Q. 2 Do the in- and out-degree distributions of a component graph of multiple software systems follow the power law?
Questions [3-4/4]
Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph follow the power law?
Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
Definitions [1/2]
Component: a Java class (including interfaces)
Use-relation: any of the following six relation types, obtained by static analysis of the component source files
• A class or an interface extends another class or interface, respectively.
• A class implements an interface.
• A class or an interface declares a variable of a class or an interface.
• A class instantiates a class object.
• A class calls a method of a class or an interface.
• A class or an interface references a field of a class or an interface.
Definitions [2/2]
Component graph: directed simple graph
• node: component
• edge: use-relation between components
In-(out-)degree: the number of incoming (outgoing) edges of a node
Example:
    class A { void exec() { … } }
    class B { … A.exec(); … }
    class C { … A a = new A(); … }
• A: in-degree 2, out-degree 0
• B: in-degree 0, out-degree 1
• C: in-degree 0, out-degree 1
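The degree counts in the example can be reproduced with a few lines of code. This is a minimal sketch, not the authors' analysis tool: the function name `degrees` and the `(user, used)` edge-list representation are assumptions made for illustration, and dropping duplicate edges and self-loops is one reading of "simple graph".

```python
from collections import defaultdict

def degrees(edges):
    """Return (in_degree, out_degree) dicts for a directed simple graph.

    `edges` is an iterable of (user, used) pairs. Duplicate edges and
    self-loops are dropped (an assumption) so that the graph stays simple.
    Components that take part in no edge do not appear in the result.
    """
    simple_edges = {(src, dst) for src, dst in edges if src != dst}
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in simple_edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = {n for edge in simple_edges for n in edge}
    return ({n: in_deg[n] for n in nodes}, {n: out_deg[n] for n in nodes})

# Use-relations from the example: B calls A.exec(), C instantiates A.
example_edges = [("B", "A"), ("C", "A")]
in_deg, out_deg = degrees(example_edges)
print(in_deg["A"], out_deg["A"])  # prints: 2 0
```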
Observing the power law
• Plot the cumulative frequency on log-log axes
• The data form a straight line if the distribution follows the power law $p(x) = Cx^{-\alpha}$
  • Frequency plot: gradient $-\alpha$
  • Cumulative frequency plot: gradient $-(\alpha-1)$
• Horizontal axis: in-(or out-)degree
Reference: M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law", Contemporary Physics 46, 323-351 (2005)
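The gradient of the cumulative plot follows from integrating the power law. The derivation below is a sketch that treats the degree $x$ as continuous and assumes $\alpha > 1$; the symbol $P(x)$ for the cumulative distribution is introduced here for illustration.

```latex
P(x) = \int_{x}^{\infty} C\,x'^{-\alpha}\,dx'
     = \frac{C}{\alpha-1}\,x^{-(\alpha-1)},
\qquad
\log P(x) = \log\frac{C}{\alpha-1} - (\alpha-1)\,\log x .
```

On log-log axes this is a straight line in $\log x$ with gradient $-(\alpha-1)$, one less steep than the gradient $-\alpha$ of $\log p(x)$.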
Values shown in the experiments
• α: the exponent of $p(x) = Cx^{-\alpha}$
  • Derived from the gradient of the regression line on the cumulative plot, which is $-(\alpha-1)$
• $R^{*2}$: the coefficient of determination adjusted for the degrees of freedom
  • Measures how well the regression model fits the data
  • Range: [0..1]
  • A large value means a good fit
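Both values can be computed from a list of degrees with an ordinary least-squares fit. The sketch below is an assumption about the procedure, not the authors' code: the function names are hypothetical, zero degrees are skipped because they cannot be placed on a log axis, and the adjusted coefficient uses the standard one-predictor formula $1 - (1 - R^2)(n-1)/(n-2)$.

```python
import math
from collections import Counter

def cumulative_frequency(degrees):
    """Map each positive degree x to the number of nodes with degree >= x."""
    counts = Counter(d for d in degrees if d > 0)  # log(0) is undefined
    points, running = [], 0
    for x in sorted(counts, reverse=True):
        running += counts[x]
        points.append((x, running))
    return sorted(points)

def fit_power_law(degrees):
    """Least-squares line through the log-log cumulative plot.

    Returns (alpha, adjusted_r2), using gradient = -(alpha - 1).
    Needs at least three distinct positive degree values.
    """
    pts = cumulative_frequency(degrees)
    xs = [math.log10(x) for x, _ in pts]
    ys = [math.log10(c) for _, c in pts]
    n = len(pts)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # one predictor
    return 1 - slope, adjusted_r2

# Synthetic degrees (not data from the study) whose cumulative counts are 64/x.
toy_degrees = [1] * 32 + [2] * 16 + [4] * 8 + [8] * 8
alpha, adj_r2 = fit_power_law(toy_degrees)
```

For the synthetic list, the cumulative counts fall exactly on a line of gradient -1, so the fit gives an exponent very close to 2 and an adjusted coefficient very close to 1.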
Experiment 1
• Set up component sets
  • Each set contains a single software system
• Analyze the component sets to create component graphs.
• Plot the cumulative frequency of the degrees on log-log axes.
Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?
Result of experiment 1 / JDK
• The in-degree follows the power law
• The out-degree does not follow the power law
Result of experiment 1 / ECLIPSE
• Similar characteristics to JDK
  • The in-degree follows the power law
  • The out-degree does not follow the power law
Experiment 2
• Set up component sets
  • Each set contains multiple software systems
  • Use-relations exist across the systems
• Analyze the component sets to create component graphs.
• Plot the cumulative frequency of the degrees on log-log axes.
Q. 2 Do the in- and out-degree distributions of a component graph for multiple software systems follow the power law?
Result of experiment 2 / ASF
• Similar characteristics to Exp. 1
  • The in-degree follows the power law
  • The out-degree does not follow the power law
Result of experiment 2 / SPARS_DB
• Similar characteristics to Exp. 1
  • The in-degree follows the power law
  • The out-degree does not completely follow the power law
• The in-degree distribution fits the power-law straight line almost ideally.
Experiment 3
• Construct subsets of SPARS_DB
  • Keyword: the components that contain a specified keyword in their source code
    • The keywords are randomly selected so that the number of resulting components is about 1,000 or about 10,000
  • Random: 1,000 or 10,000 randomly selected components
• Analyze the component sets to create component graphs.
• Plot the cumulative frequency of the degrees on log-log axes.
Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph for software systems follow the power law?
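The two ways of building a subset can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the subgraph is taken to be the induced subgraph (an edge survives only if both endpoints are selected), keyword matching is a plain substring test, and all function names are hypothetical.

```python
import random

def induced_subgraph(edges, selected):
    """Keep only the edges whose two endpoints are both selected."""
    selected = set(selected)
    return [(src, dst) for src, dst in edges
            if src in selected and dst in selected]

def random_subset(components, size, seed=0):
    """Pick `size` components uniformly at random (seeded for repeatability)."""
    return random.Random(seed).sample(sorted(components), size)

def keyword_subset(sources, keyword):
    """Pick the components whose source text contains `keyword`.

    `sources` maps a component name to its source code as a string.
    """
    return [name for name, text in sources.items() if keyword in text]

# Toy data, made up for illustration only.
toy_edges = [("B", "A"), ("C", "A"), ("C", "B")]
toy_sources = {"A": "class A { List x; }", "B": "class B { }", "C": "class C { List y; }"}
by_keyword = induced_subgraph(toy_edges, keyword_subset(toy_sources, "List"))
by_chance = induced_subgraph(toy_edges, random_subset(toy_sources, 2))
```

Under the induced-subgraph assumption, a uniformly random sample of a large graph keeps an edge only when both of its endpoints happen to be drawn, which is one way to see why a small random subset can end up with few edges.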
Result of experiment 3 / KWD1K
• Similar characteristics to SPARS_DB
  • The in-degree follows the power law
  • The out-degree does not follow the power law
Result of experiment 3 / KWD10K
• Similar characteristics to SPARS_DB
  • The in-degree follows the power law
  • The out-degree does not follow the power law
Result of experiment 3 / RND1K
• The original characteristics are almost lost
Result of experiment 3 / RND10K
• Similar characteristics to SPARS_DB; however, the number of edges is small
Experiment 4
• List the top-ten components by in-degree and by out-degree
• Calculate the correlation between the degrees and metric values.
  • Spearman's rank correlation coefficient
• Target: SPARS_DB
Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
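Spearman's rank correlation coefficient is the Pearson correlation of the ranks of the two variables. The sketch below is a plain implementation with average ranks for ties; the function names are hypothetical and the sample values are made up, not measurements from SPARS_DB.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Made-up degree and metric values for four hypothetical components.
toy_out_degree = [3, 10, 25, 7]
toy_metric = [40, 300, 900, 120]
rho = spearman(toy_out_degree, toy_metric)
```

Because only ranks matter, any monotonically increasing relation between the degree and the metric gives a coefficient of 1, regardless of whether the relation is linear.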
Result of experiment 4 / In-degree
• Top-ten components
  • Components that have a fundamental/general role
• Correlation with metrics
  • The in-degree has low correlation with the metrics
• The in-degree relates to the role of a component
Result of experiment 4 / Out-degree
• Top-ten components
  • Simply large/complex classes
• Correlation with metrics
  • High correlation with LOC and WMC
• The out-degree relates to the size/complexity of a component
Answers: summary of experiments [1/4]
Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?
• The in-degree follows the power law
• The out-degree does not follow the power law
  • It is a mixture of the power-law distribution and the lognormal distribution
Answers: summary of experiments [2/4]
Q. 2 Do the in- and out-degree distributions of a component graph for multiple software systems follow the power law?
• The in-degree follows the power law
• The out-degree does not follow the power law
• The results are similar to those for single software systems
Answers: summary of experiments [3/4]
Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph for software systems follow the power law?
• It depends on how the subgraph is created.
• A keyword-based subgraph has characteristics similar to those of the superset
  • Related components are likely to share words
• A random-selection-based subgraph with a small number of nodes has different characteristics
  • Few edges exist.
Answers: summary of experiments [4/4]
Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
• The in-degree relates to the roles of components
  • Most components are used only in a specific part
  • Components with a fundamental/general role are used from everywhere
  • The larger the component set grows, the larger the in-degree values become.
• The out-degree relates to the size/complexity of components
  • Many components have a reasonable size/complexity
  • Some components may have a relatively large size/complexity
  • Extremely large components are unreasonable
Summary
• Component graphs were investigated to determine whether the in- and out-degree distributions follow the power law
• The results revealed the following characteristics.
  • The in-degree distribution follows the power law
    • The in-degree of a component relates to the role of the component
  • The out-degree distribution does not follow the power law
    • The out-degree of a component relates to the size/complexity of the component
  • Some kinds of subgraph of a component graph have the same degree-distribution characteristics as the whole graph.
• Future work
  • Explore other types of component graph
Discussion
• Generative models of a power-law graph
  • When a node is added to a graph, existing nodes with a large degree tend to be the ones linked to the new node.
  • “rich get richer”
• Meaning for component graphs
  • When a new component is added to (developed for) a software system, the new component uses components that are already used by many components
  • The set of frequently used components hardly changes even as software development proceeds
  • If that set changes, it means that the fundamental structure (design, architecture) of the software has changed
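The "rich get richer" mechanism can be simulated in a few lines. This is a toy sketch of one generative model, not a model fitted to the systems studied: it assumes each new node uses exactly one existing node, chosen with probability proportional to (in-degree + 1) so that unused nodes can still be picked. The function name `grow_graph` is hypothetical.

```python
import random

def grow_graph(num_nodes, seed=0):
    """Grow a directed graph one node at a time.

    Each new node uses one existing node, chosen with probability
    proportional to (in-degree + 1), so heavily used nodes attract more use.
    Returns (edges, in_degree), where edges are (new_node, used_node) pairs.
    """
    rng = random.Random(seed)
    in_degree = [0]  # node 0 starts alone
    edges = []
    for new_node in range(1, num_nodes):
        weights = [d + 1 for d in in_degree]
        target = rng.choices(range(new_node), weights=weights, k=1)[0]
        edges.append((new_node, target))
        in_degree[target] += 1
        in_degree.append(0)
    return edges, in_degree

edges, in_degree = grow_graph(200, seed=1)
# The most-used nodes are typically early ones, mirroring the "fundamental" components.
most_used = sorted(range(len(in_degree)), key=lambda n: in_degree[n], reverse=True)[:10]
```

In this sketch the out-degree of every node is fixed at one by construction, so it only illustrates how a skewed in-degree distribution can arise; it says nothing about the out-degree behaviour observed in the experiments.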