INFO 631, Prof. Glenn Booker

Week 3 – Complexity Metrics and Models

Origin
• Complexity metrics were developed by computer scientists and software engineers
• Strongly based on empirical (real-world) measurement, with little theory
• Primarily divided into internal and external measures

Internal versus External
• Internal measures describe the complexity within a module (number of decisions, loops, calculations, etc.)
• External measures describe relationships among modules (program or function calls, external file activity, input/output, etc.)

Internal Measures

Internal Product Attributes
• Size measures serve as:
  • Input to prediction models
  • A normalizing factor for cost, productivity, etc.
  • An indicator of progress during development
• Typically measured in lines of code (LOC) or function point counts
• LOC is a better measure for predicting cost and schedule

Lines of Code
• Simple complexity metric, often based on the number of executable statements or instruction statements
• The highest defect rates often occur in small modules
• Larger modules have a lower defect rate (if they exist at all), until they become too cumbersome
• Optimum module size is about 250 lines
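A minimal sketch of a LOC count and the defect rate it normalizes. The counting rule here (non-blank lines that are not whole-line comments, with `#` as the comment marker) is an assumption for illustration; real LOC standards differ on logical versus physical lines, so compare module sizes only when they were counted the same way. `count_loc` and `defect_density` are hypothetical helper names.

```python
def count_loc(source: str, comment_prefix: str = "#") -> int:
    """Count non-blank lines that are not whole-line comments (assumed rule)."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines holding only a comment.
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


def defect_density(defects: int, loc: int) -> float:
    """Defects per thousand lines of code (KLOC)."""
    return defects / (loc / 1000)


sample = """
# compute a total
total = 0
for x in range(10):
    total += x  # running sum

print(total)
"""
print(count_loc(sample))       # 4
print(defect_density(3, 250))  # 12.0 defects/KLOC for a 250-line module
```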

Function Points
• Function points help avoid biases due to the programming language(s) used
• Provide a “fairer” basis for comparing different environments
• Focus on how much work the program accomplishes, not how concisely it is expressed

Halstead Metrics
• Also known as Software Science (1977)
• Examine a program as compilable “tokens”
• Tokens are either operators (+, -) or operands (variables)
• Derive metrics such as Vocabulary, Length, Volume, Difficulty, etc.
• Not widely used

Data Structure (Halstead)
• Halstead’s η2: the number of distinct operands in a module
• η2 includes the number of variables, the number of unique constants, and the number of labels
• Operand usage (OU):
  • OU = η2 / N2, where N2 is the total number of operand references
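The Halstead measures above can be sketched from token counts. This assumes the program has already been split into operator and operand lists by hand; deciding what counts as an operator (for example, whether `=` does) is a convention that varies between tools. The formulas used are the standard Software Science ones (vocabulary η1 + η2, length N1 + N2, volume N·log2(η), difficulty (η1/2)(N2/η2)) plus the OU ratio from the slide; the function name `halstead` is hypothetical.

```python
import math


def halstead(operators: list[str], operands: list[str]) -> dict[str, float]:
    """Basic Halstead (Software Science) measures from pre-tokenized lists."""
    eta1, eta2 = len(set(operators)), len(set(operands))  # distinct operators / operands
    n1_total, n2_total = len(operators), len(operands)    # total occurrences N1, N2
    vocabulary = eta1 + eta2
    length = n1_total + n2_total
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": length * math.log2(vocabulary),
        "difficulty": (eta1 * n2_total) / (2 * eta2),
        "operand_usage": eta2 / n2_total,  # OU = eta2 / N2
    }


# Hand-tokenized statement: x = a + b * a
m = halstead(operators=["=", "+", "*"], operands=["x", "a", "b", "a"])
print(m["vocabulary"], m["length"], round(m["volume"], 2))  # 6 7 18.09
print(m["difficulty"], m["operand_usage"])                  # 2.0 0.75
```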

Software Complexity
• A characteristic of software that influences the resources needed to build and maintain it
• Many different characteristics of software relate to complexity
• These complexity characteristics revolve around the structure of the software

Types of Structural Measures
• Control flow
  • Addresses the sequence in which instructions are executed
  • Iteration and looping
• Data flow
  • Follows the trail of data as it is created and handled
  • Depicts the behavior of data as it interacts with the program

Types of Structural Measures
• Data structure
  • Concerned with the organization of the data itself
  • Provides information about difficulties in handling data and in defining test cases

Control Flow
• Modeled by directed graphs (control flow graphs)
• Each node corresponds to a single program statement
• Arcs (directed edges) indicate the flow of control from one statement to another

Control Flow
• Control flow graphs are useful for:
  • Analysis (estimating the number of defects)
  • Expressing complexity as a single value
  • Assessing testability and test coverage

Basic Control Constructs
[Figure: control flow graphs for “If A then X”, “If A then X else Y”, “Case A of a1: X1 ... an: Xn”, “While A do X”, and “Repeat X until A”. Note: t = true, f = false.]

Cyclomatic Complexity
• McCabe, 1976
• Based on a program’s control flow graph
• Equal to the number of regions in the graph, or the number of linearly independent paths through the program
• Complexity MC = edges - nodes + 2 * (number of connected components)

Cyclomatic Complexity
• Complexity under 10 is generally desired
• M can also be found as the number of binary (yes/no) decisions plus one
• A multiple-choice decision with n choices counts as (n - 1) binary decisions
• Ignores differences among specific types of control structures
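Both ways of computing McCabe’s number can be sketched on a small control flow graph stored as adjacency lists. The program modeled here (“if A then X else Y” followed by “while B do Z”) and the node names are hypothetical. The decision-count shortcut assumes a single connected graph (one component) with one entry and one exit.

```python
def cyclomatic_complexity(cfg: dict[str, list[str]], components: int = 1) -> int:
    """McCabe: M = E - N + 2P for a control flow graph given as adjacency lists."""
    nodes = set(cfg) | {succ for succs in cfg.values() for succ in succs}
    edges = sum(len(succs) for succs in cfg.values())
    return edges - len(nodes) + 2 * components


def complexity_from_decisions(outcomes_per_decision: list[int]) -> int:
    """Shortcut: a decision with n outcomes counts as n - 1 binary decisions; add 1."""
    return sum(n - 1 for n in outcomes_per_decision) + 1


# Hypothetical program: "if A then X else Y", followed by "while B do Z".
cfg = {
    "A": ["X", "Y"],     # decision A: true -> X, false -> Y
    "X": ["B"],
    "Y": ["B"],
    "B": ["Z", "exit"],  # loop test B: true -> Z, false -> exit
    "Z": ["B"],          # back edge to the loop test
    "exit": [],
}
print(cyclomatic_complexity(cfg))         # 3 (7 edges - 6 nodes + 2)
print(complexity_from_decisions([2, 2]))  # 3 (two binary decisions + 1)
print(complexity_from_decisions([4]))     # 4 (a 4-way case counts as 3 binary decisions)
```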

Cyclomatic Complexity
• Uses of the complexity metric:
  • Identify complex modules needing detailed inspection or redesign
  • Identify simple modules needing minimal inspection and/or testing
  • Estimate programming, testing, and maintenance effort
  • Identify potentially troublesome code

Control Flow Representation of Programs
• Software programs can be represented by linear directed segments combined with the basic control flow constructs
• Control flow constructs may be nested, e.g., an IF statement can be inside a WHILE loop

Control Flow Representation of Programs
• Example:
McCabe cyclomatic complexity (MC) counts the number of linearly independent paths through a program: MC = # of edges - # of nodes + 2
[Figure: example control flow graph labeled 1 to 14]
Linearly independent paths for the example: <2, 11>, <2, 10, 12, 14>, <2, 10, 12, 13, 12, 14>, <1, 3, 5, 6, 9>, <1, 4, 6, 9>, <1, 4, 6, 7, 8, 9>

Control Flow: Linearly Independent Paths
[Figure: control flow graph with nodes a to g and edges numbered 1 to 10]
MC = edges - nodes + 2 = 10 - 7 + 2 = 5
Set of linearly independent paths:
• b1: abcg
• b2: abcbcg
• b3: abefg
• b4: adefg
• b5: adfg
Any path through the graph is equal to a linear combination of the linearly independent paths listed above. For example, path abcbefg is equal to b2 + b3 - b1.
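The linear-combination claim can be checked by treating each path as a vector of edge-traversal counts. Because the figure’s edge numbering is lost, this sketch assumes the graph’s edges are exactly those traversed by the five listed paths (ab, bc, cb, cg, be, ef, fg, ad, de, df); that gives 10 edges and 7 nodes, matching the slide’s MC = 5. The helper names `edge_vector` and `combine` are hypothetical.

```python
from collections import Counter


def edge_vector(path: str) -> Counter:
    """Count how often each edge (pair of consecutive nodes) appears in a path."""
    return Counter(zip(path, path[1:]))


basis = {"b1": "abcg", "b2": "abcbcg", "b3": "abefg", "b4": "adefg", "b5": "adfg"}
vectors = {name: edge_vector(path) for name, path in basis.items()}

# Assumption: the graph's edges are exactly the ones these paths traverse.
edges = sorted(set().union(*vectors.values()))
nodes = {node for edge in edges for node in edge}
mc = len(edges) - len(nodes) + 2
print(len(edges), len(nodes), mc)  # 10 7 5


def combine(coeffs: dict[str, int]) -> dict[tuple[str, str], int]:
    """Weighted sum of basis-path edge vectors, one entry per graph edge."""
    total = {edge: 0 for edge in edges}
    for name, k in coeffs.items():
        for edge, count in vectors[name].items():
            total[edge] += k * count
    return total


target = edge_vector("abcbefg")
print(combine({"b2": 1, "b3": 1, "b1": -1}) == {e: target[e] for e in edges})  # True
```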

Knots: Control Flow Crossovers
• Knot measure: the total number of points at which control flow lines cross
• How many are there in the following code?

      IF (TIME) 30,30,10
   10 CALL TEMP1
      IF (X1) 20,20,40
   20 Y1=Y+1
      Y2=0
      CALL TEMP2
      GO TO 50
   30 Z1=1
   40 CALL TEMP3
      Z2=Z2+1
   50 CALL TEMP4
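One way to count knots mechanically: two jumps cross when their line spans interleave (one starts strictly inside the other and ends strictly outside it). This sketch assumes the statements above are numbered 1 to 11 from the top, that every branch target of an arithmetic IF or GO TO counts as a jump, and that jumps sharing an endpoint do not cross; other conventions can give a different count. `count_knots` is a hypothetical name.

```python
def count_knots(jumps: list[tuple[int, int]]) -> int:
    """Count pairs of jumps whose line spans interleave (their arrows must cross)."""
    spans = [tuple(sorted(jump)) for jump in jumps]  # direction does not matter
    knots = 0
    for i in range(len(spans)):
        for k in range(i + 1, len(spans)):
            (a, b), (c, d) = spans[i], spans[k]
            if a < c < b < d or c < a < d < b:
                knots += 1
    return knots


# (from_statement, to_statement), statements numbered 1..11 top to bottom:
jumps = [
    (1, 8), (1, 2),  # IF (TIME) 30,30,10
    (3, 4), (3, 9),  # IF (X1) 20,20,40
    (7, 11),         # GO TO 50
]
print(count_knots(jumps))  # 3 with this convention
```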

Syntactic Constructs
• Examine the effect of using specific control structures on the defect rate
• Are, by definition, language-specific
• Can result in statistically significant relationships
  • e.g., Lo showed that DO WHILE should be avoided in COBOL

External Measures

Computational Complexity
• Examines algorithmic efficiency and the use of machine resources (memory, I/O, storage)
• Studies quantitative aspects of solutions to computational problems
• Examples include sorting efficiency for a database, managing I/O constraints across a large-scale network, etc.

Psychological Complexity
• Concerned with the characteristics of software that affect human performance:
  • Injection of defects (when and why does a programmer make errors?)
  • Ease of building the software (effort required)
  • Ease of maintenance (effort required)

Data Structure (Database)
• Database size per program size (DBSPPS)
  • DBSPPS = DBS / PS
  • where DBS is the database size in bytes or characters
  • and PS is the program size in source instructions
• Used in the COCOMO model as a cost driver
  • An ordinal-scale measure derived from DBSPPS
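A minimal sketch of DBSPPS and the ordinal rating derived from it. The rating thresholds below (ratio under 10 is Low, 10 to under 100 Nominal, 100 to under 1000 High, 1000 and up Very High) are recalled from the COCOMO 81 DATA cost driver and should be checked against Boehm’s tables before use; the example sizes are hypothetical.

```python
def dbspps(database_bytes: float, source_instructions: float) -> float:
    """Database size per program size: DBS / PS."""
    return database_bytes / source_instructions


def cocomo_data_rating(ratio: float) -> str:
    """Map DBS/PS onto an ordinal rating (assumed COCOMO 81 DATA thresholds)."""
    if ratio < 10:
        return "Low"
    if ratio < 100:
        return "Nominal"
    if ratio < 1000:
        return "High"
    return "Very High"


# Hypothetical system: 2,000,000-byte database, 40,000 source instructions.
r = dbspps(2_000_000, 40_000)
print(r, cocomo_data_rating(r))  # 50.0 Nominal
```

Collapsing the ratio into a few ordered categories is what makes it usable as a cost-driver lookup rather than a continuous input.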

Fan-in and Fan-out
• Focus is the interaction among code modules
• Fan-in = number of modules that call a given module
• Fan-out = number of modules that are called by a given module
• Or, more formally...

Fan-in and Fan-out
• The fan-in of a module is the number of local flows terminating at the module, plus the number of data structures from which information is retrieved by the module
• The fan-out of a module is the number of local flows that emanate from the module, plus the number of data structures (tables, arrays) that are updated by the module
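Both definitions can be sketched from a call map plus lists of data structures each module reads or updates. As a simplification, each distinct caller or callee counts as one local flow; a full Henry and Kafura count would also include data returned to the caller. The call structure loosely follows this week’s Module Call Graph example, and the global table `scores_table` is hypothetical.

```python
from typing import Optional


def fan_in_out(
    calls: dict[str, list[str]],
    reads: Optional[dict[str, list[str]]] = None,
    writes: Optional[dict[str, list[str]]] = None,
) -> dict[str, tuple[int, int]]:
    """Return {module: (fan_in, fan_out)}.

    Each caller/callee counts as one local flow (a simplification). Data
    structures a module reads add to its fan-in; ones it updates add to its fan-out.
    """
    reads = reads or {}
    writes = writes or {}
    modules = set(calls) | {callee for targets in calls.values() for callee in targets}
    result = {}
    for m in sorted(modules):
        callers = {c for c, targets in calls.items() if m in targets}
        called = set(calls.get(m, []))
        fan_in = len(callers) + len(set(reads.get(m, [])))
        fan_out = len(called) + len(set(writes.get(m, [])))
        result[m] = (fan_in, fan_out)
    return result


# Hypothetical call structure modeled on the Module Call Graph example.
calls = {"Main": ["Read_Scores", "Find_Ave", "Print_Ave"]}
# Hypothetical global table: Read_Scores updates it, Find_Ave reads it.
writes = {"Read_Scores": ["scores_table"]}
reads = {"Find_Ave": ["scores_table"]}

for module, (fi, fo) in fan_in_out(calls, reads, writes).items():
    print(f"{module:12} fan-in={fi} fan-out={fo}")
```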

Fan-in and Fan-out
• Do fan-in and fan-out affect software quality?
• Modules with large fan-in may be interpolation or look-up routines, with no defect correlation
• Large fan-out often relates to a high defect rate (a high defect correlation)
• Are large fan-in and fan-out bad?

Fan-in and Fan-out
• Information flow complexity
  • Henry and Kafura: Size * (fan-in * fan-out)^2
  • Shepperd: (fan-in * fan-out)^2
• The Henry and Kafura measure helps predict the number of software maintenance problems
Henry, S. and D. Kafura. IEEE Transactions on Software Engineering, 1981, SE-7(5), pp. 510-518.
Shepperd, M. Software Engineering Journal, 1990, 5(1) (January), pp. 3-10.

Structure Metrics
• The Shepperd measure correlates with software development time
• Information flow metric (Henry & Selig): HC = C * (fan-in * fan-out)^2
  • where C is the cyclomatic complexity
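The three information flow formulas differ only in what multiplies (fan-in * fan-out)^2, which a side-by-side sketch makes plain. The module values are hypothetical, and “size” is assumed to be measured in LOC.

```python
def henry_kafura(size: int, fan_in: int, fan_out: int) -> int:
    """Henry and Kafura: size * (fan-in * fan-out)^2."""
    return size * (fan_in * fan_out) ** 2


def shepperd(fan_in: int, fan_out: int) -> int:
    """Shepperd: (fan-in * fan-out)^2, i.e. Henry and Kafura without the size term."""
    return (fan_in * fan_out) ** 2


def henry_selig(cyclomatic: int, fan_in: int, fan_out: int) -> int:
    """Henry and Selig: HC = C * (fan-in * fan-out)^2, C = cyclomatic complexity."""
    return cyclomatic * (fan_in * fan_out) ** 2


# Hypothetical module: 120 LOC, fan-in 2, fan-out 3, cyclomatic complexity 7.
print(henry_kafura(120, 2, 3))  # 4320
print(shepperd(2, 3))           # 36
print(henry_selig(7, 2, 3))     # 252
```

Because the fan-in/fan-out product is squared, doubling either count quadruples all three measures, which is why heavily connected modules dominate these rankings.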

Structure Metrics
• System complexity (Card & Glass)
  • Based on structural complexity (average fan-out squared) and data complexity (based on the number of I/O variables and fan-out)
  • Quantified the effect of complexity on error rate
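A sketch of the Card and Glass system complexity, assuming the commonly cited form: structural complexity S is the average of fan-out squared, each module’s data complexity is its I/O variable count divided by (fan-out + 1), D is the average of those, and system complexity is S + D. The three-module input is hypothetical.

```python
def card_glass(fan_outs: list[int], io_vars: list[int]) -> tuple[float, float, float]:
    """Return (structural, data, system) complexity averaged over n modules."""
    if len(fan_outs) != len(io_vars):
        raise ValueError("need one fan-out and one I/O count per module")
    n = len(fan_outs)
    structural = sum(f ** 2 for f in fan_outs) / n                    # mean fan-out^2
    data = sum(v / (f + 1) for v, f in zip(io_vars, fan_outs)) / n   # mean V/(f+1)
    return structural, data, structural + data


s, d, c = card_glass(fan_outs=[3, 0, 1], io_vars=[4, 2, 6])
print(round(s, 2), round(d, 2), round(c, 2))  # 3.33 2.0 5.33
```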

Module Call Graph
• Module: a contiguous sequence of program statements, bounded by boundary elements, having an aggregate identifier
• Or, a distinct, named group of LOC
• The module call graph shows which modules call each other, and what key information is passed among them

Module Call Graph Example
[Figure: module call graph in which Main calls Read_Scores, Find_Ave, and Print_Ave; the flows are labeled scores, eof, and average]

Module Coupling Measures
• Average number of calls per module (ANCPM)
  • ANCPM = Number of Interconnections / Number of Modules
• Fraction of modules that make calls (FMC)
  • FMC = Number of Modules that Call / Number of Modules
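Both coupling ratios follow directly from a call map. This sketch assumes an “interconnection” is one distinct caller-to-callee pair (some counts use individual call sites instead) and reuses the hypothetical Main / Read_Scores / Find_Ave / Print_Ave structure.

```python
def ancpm(calls: dict[str, list[str]], modules: set[str]) -> float:
    """Average number of calls per module: interconnections / modules."""
    interconnections = sum(len(set(callees)) for callees in calls.values())
    return interconnections / len(modules)


def fmc(calls: dict[str, list[str]], modules: set[str]) -> float:
    """Fraction of modules that make at least one call."""
    callers = {m for m, callees in calls.items() if callees}
    return len(callers) / len(modules)


modules = {"Main", "Read_Scores", "Find_Ave", "Print_Ave"}
calls = {"Main": ["Read_Scores", "Find_Ave", "Print_Ave"]}
print(ancpm(calls, modules))  # 0.75
print(fmc(calls, modules))    # 0.25
```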

Information Flow Measures
• Types of information flows:
  • Local direct flow
    • A module invokes a second module and passes information to it
    • The invoked module returns a result to the caller
  • Local indirect flow
    • An invoked module returns information that is subsequently passed to a second invoked module
  • Global flow
    • Information flows from one module to another via a global data structure

IEEE-STD-982
• Number of Entries and Exits per Module, m
  • Like fan-in and fan-out: m = entries + exits
• Software Science measures

IEEE-STD-982
• Graph-Theoretic Complexity
  • Static complexity: C = Edges - Nodes + 1
  • Generalized static complexity: based on summing the resources needed for each module (e.g., storage, access time)
  • Dynamic complexity: complexity as it changes over time across a network

IEEE-STD-982
• Cyclomatic complexity
• Minimal Unit Test Case Determination
  • Determine the number of independent paths through a module, to get the minimum number of test cases for unit testing
• Data or information flow complexity
  • Fan-in and fan-out of variables

IEEE-STD-982
• Design Structure is a weighted (%) average of six parameters:
  • Whether designed top down (Y/N)
  • Module inter-dependence
  • Module dependence on prior processing
  • Database size (# of elements)
  • Database compartmentalization
  • Module single entrance and exit (Y/N)
• The weighting is chosen to meet project needs
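A minimal sketch of the weighted combination, assuming each of the six parameters has already been reduced to a value between 0 and 1 (with 0 as the favorable end, and each Y/N parameter mapped to 0 for the favorable answer) and that the weights are fractions summing to 1. The parameter keys, values, and weights are all hypothetical; IEEE Std 982.1 should be consulted for how each parameter is actually derived.

```python
def design_structure(derived: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of the derived measures; weights are fractions that sum to 1."""
    if set(derived) != set(weights):
        raise ValueError("each derived measure needs exactly one weight")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1 (i.e. 100%)")
    return sum(weights[name] * value for name, value in derived.items())


# Hypothetical values in [0, 1]; 0 is taken as the favorable end of each scale.
derived = {
    "not_top_down": 0.0,           # Y/N: designed top down -> 0
    "module_interdependence": 0.2,
    "prior_processing_dependence": 0.1,
    "database_size": 0.5,
    "database_compartmentalization": 0.4,
    "not_single_entry_exit": 0.0,  # Y/N: single entrance and exit -> 0
}
weights = {
    "not_top_down": 0.1,
    "module_interdependence": 0.2,
    "prior_processing_dependence": 0.2,
    "database_size": 0.2,
    "database_compartmentalization": 0.2,
    "not_single_entry_exit": 0.1,
}
print(round(design_structure(derived, weights), 3))  # 0.24
```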

Other Measures
• Compiler measures
  • Size (bytes of compiled code)
  • Number of symbols and variables
  • Cross-reference of all labels
  • Statement count

Other Measures
• Configuration Management Library Measures
  • Number of code modules
  • Number of versions of each module
  • History of change dates of each module
  • Module size
  • Number of related documents for each module

Availability Metrics
• Most information systems are critical to day-to-day operations
• Witness how Google or BlackBerry being offline for mere minutes makes the news
• Availability depends on (1) how often the system goes down, and (2) how long it takes to restore it after a crash

Availability Metrics
• Perfect availability (100%) is nice to dream of, but realistically, higher reliability is more expensive
• Availability is often expressed as the number of 9’s in the desired level of availability
• Two nines is 99%, three nines is 99.9%, four nines is 99.99%, etc.
• How many nines can you afford?
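The two factors named above (how often the system fails and how long repair takes) combine in the standard steady-state formula A = MTTF / (MTTF + MTTR). Each additional nine cuts the allowed downtime by a factor of ten. This sketch assumes a 365-day year and uses hypothetical MTTF/MTTR values.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime at `nines` nines of availability (e.g. 3 -> 99.9%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


# Hypothetical system: fails on average every 999 hours, takes 1 hour to restore.
print(availability(999, 1))  # 0.999, i.e. three nines

for n in range(2, 6):
    print(f"{n} nines: {downtime_minutes_per_year(n):,.1f} minutes of downtime per year")
```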

Availability Metrics

Achieving High Availability
• Many techniques are used to help ensure that high levels of availability are possible:
  • Duplicate systems (clustering)
  • RAID data duplication
  • Duplicate power supplies
  • Independent power supplies
  • Uninterruptible power supplies (UPSs)

Availability and Code Quality
• Capers Jones demonstrated a clear connection between code quality (defect rate) and the corresponding mean time to failure (MTTF), which is a key aspect of availability
• Consistent measurement methods and definitions of terms are needed for further refinement

Customer Outage Data
• To determine availability, the actual customer-visible system outage time needs to be collected
• To get this data, the customer must place a very high priority on availability
• This data could be used to identify the software components that most reduce availability

Availability
• We also expect availability for a new system to increase over the first couple of years of its use
• Defect causal analysis can help eliminate the root causes of defects, thereby improving availability
