1 / 50

INFO 631 Prof . Glenn Booker

INFO 631 Prof . Glenn Booker. Week 3 – Complexity Metrics and Models. Origin. Complexity metrics were developed by computer scientists and software engineers Strongly based on empirical (real world) measurement, with little theory Primarily broken into internal and external measures.

talisa
Télécharger la présentation

INFO 631 Prof . Glenn Booker

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. INFO 631Prof. Glenn Booker Week 3 – Complexity Metrics and Models INFO631 Week 3

  2. Origin • Complexity metrics were developed by computer scientists and software engineers • Strongly based on empirical (real world) measurement, with little theory • Primarily broken into internal and external measures INFO631 Week 3

  3. Internal versus External • Internal measures describe the complexity within a module (number of decisions, loops, calculations, etc.) • External measures describe relationships among modules (program or function calls, external file activities, input/output, etc.) INFO631 Week 3

  4. Internal Measures INFO631 Week 3

  5. Internal Product Attributes • Size measures • Input to prediction models • Normalizing factor for cost, productivity, etc. • Progress during development • Typically use lines of code (LOC) or function point counts; • LOC is a better measure for predicting cost and schedule INFO631 Week 3

  6. Lines of Code • Simple complexity metric, often based on number of executable statements or instruction statements • Highest defect rates often occurs in small modules • Larger modules have a smaller defect rate (if they exist at all) - until too cumbersome • Optimum module size ~ 250 lines INFO631 Week 3

  7. Function Points • Function points help avoid biases due to the programming language(s) used • Provide a more “fair” basis for comparing different environments • Focuses on how much work the program accomplishes, not how concisely it is expressed INFO631 Week 3

  8. Halstead Metrics • Also known as Software Science, 1977 • Examine program as compilable “tokens” • Tokens are either operators (+, -) or operands (variables) • Derive metrics such as Vocabulary, Length, Volume, Difficulty, etc. • Not widely used INFO631 Week 3

  9. Data Structure (Halstead) • Halstead’s 2 - number of distinct operands in a module • Operands include: number of variables, number unique constants, and number of labels • Operand usage (OU) • OU = 2/N2 where N2 is the total number of operand references INFO631 Week 3

  10. Software Complexity • Is a characteristic that influences the resources needed to build and maintain it • Many different characteristics of software relate to complexity • These complexity characteristics revolve around the structure of the software INFO631 Week 3

  11. Types of Structural Measures • Control flow • Addresses sequence in which instructions are executed • Iteration and looping • Data flow • Follows trail of data as it is created and handled • Depicts behavior of data as it interacts with the program INFO631 Week 3

  12. Types of Structural Measures • Data structure • Concerned with organization of data itself • Provides information about difficulties in handling data and in defining test cases INFO631 Week 3

  13. Control Flow • Modeled by directed graphs (control flow graphs) • Each node corresponds to a single program statement • Arcs (directed edges) indicate flow of control from one statement to another INFO631 Week 3

  14. Control Flow • Control flow graphs are useful for: • Analysis (estimating number of defects) • Expressing complexity by a single value • Assessing testability and test coverage INFO631 Week 3

  15. Note: t=true f=false A t f If A then X If A then X else Y X Y A t Case A of a1 : X1 . . an : Xn f X A a1 an a2 ... X1 Xn X2 While A do X Repeat X until A X A f f t A t X Basic Control Constructs INFO631 Week 3

  16. Cyclomatic Complexity • McCabe, 1976 • Based on a program’s control flow chart • Related to number of separate graphable areas, or number of linearly independent paths in the program • Complexity MC = edges - nodes + 2*(# of unconnected paths) INFO631 Week 3

  17. Cyclomatic Complexity • Complexity under 10 generally desired • Can also find M as number of binary decisions (yes/no) minus one • Multiple choice decisions with ‘n’ choices count as (n-1) binary decisions • Ignores differences among specific types of control structures INFO631 Week 3

  18. Cyclomatic Complexity • Uses of complexity metric: • Identify complex modules needing detailed inspection or redesign • Identify simple modules needing minimal inspection and/or testing • Estimate programming, testing and maintenance effort • Identify potentially troublesome code INFO631 Week 3

  19. Control Flow Representation of Programs • Software programs can be represented by linear directed segments combined with the basic control flow constructs • Control flow constructs may be nested, e.g. an IF statement can be inside of a WHILE loop INFO631 Week 3

  20. McCabe cyclomatic complexity (MC) - counts the number of linearly independent paths through a program 1 2 3 4 10 MC = # of edges - # of nodes +2 5 13 6 11 Linearly independent paths for example <2, 11> <2, 10, 12, 14> <2, 10, 12, 13, 12, 14> <1, 3, 5, 6, 9> <1, 4, 6,9> <1, 4, 6, 7, 8, 9> 12 7 8 14 9 Control Flow Representation of Programs • Example: INFO631 Week 3

  21. a 1 2 d b 6 5 3 4 7 e 8 c f 9 10 g MC = edges - nodes + 2 = 10 - 7 + 2 = 5 Control Flow--Linearly Independent Paths Set of linearly independent paths: b1: abcg b2: abcbcg b3: abefg b4: adefg b5: adfg Any arbitrary path is equal to a linear combination of the linearly independent paths listed above For example, path abcbefg is equal to: b2 + b3 - b1 INFO631 Week 3

  22. IF (TIME) 30,30,10 10 CALL TEMP1 IF (X1) 20,20,40 20 Y1=Y+1 Y2=0 CALL TEMP2 GO TO 50 30 Z1=1 40 CALL TEMP3 Z2=Z2+1 50 CALL TEMP4 How many are here? Knots - Control Flow Crossovers • Knot measure -- total number of points at which control flow lines cross INFO631 Week 3

  23. Syntactic Constructs • Examine effect of using specific control structures on defect rate • Is, by definition, language-specific • Can result in statistically significant relationships • e.g. Lo used to show that DO WHILE should be avoided in COBOL INFO631 Week 3

  24. External Measures INFO631 Week 3

  25. Computational Complexity • Examines algorithmic efficiency and use of machine resources (memory, I/O, storage) • Studies quantitative aspects of solutions to computational problems • Examples may include sorting efficiency for a database, managing I/O constraints across a large scale network, etc. INFO631 Week 3

  26. Psychological Complexity • Concerned with characteristics of software that affect human performance • Injection of defects (when and why does a programmer make errors?) • Ease of building the software (effort required) • Ease of maintenance (effort required) INFO631 Week 3

  27. Data Structure (Database) • Database size per program size (DBSPPS) • DBSPPS = DBS/PS • Where DBS is database size in bytes or characters • PS is program size in source instructions • Used in COCOMO model as a cost driver • Ordinal scale measure derived from DBSPPS INFO631 Week 3

  28. Fan-in and Fan-out • Focus is the interaction among code modules • Fan-in = # of modules which call a given module • Fan-out = # of modules which are called by a given module • Or, more formally... INFO631 Week 3

  29. Fan-in and Fan-out • Fan-in of a module is the number of local flows terminating at the module, plus the number of data structures from which info is retrieved by the module • Fan-out of a module is the number of local flows that emanate from the module, plus the number of data structures (tables, arrays) that are updated by the module INFO631 Week 3

  30. Fan-in and Fan-out • Do fan-in and fan-out affect software quality? • Large fan-in modules may be interpolation or look-up routines - no defect correlation • Large fan-out often relates to high defect rate - has a high defect correlation • Is large fan-in and fan-out bad? INFO631 Week 3

  31. Fan-in and Fan-out • Information flow complexity • Henry and Kafura: Size*(fan-in * fan-out)2 • Shepperd: (fan-in * fan-out)2 • Henry and Kafura measure helps predict the number of software maintenance problems Henry, S. and D. Kafura, IEEE Transactions on Software Engineering, 1981. SE-7(5): p. 510-518 Shepperd, M. 1990. Software Engineering Journal 5, 1 (January), pp. 3-10. INFO631 Week 3

  32. Structure Metrics • Shepperd measure correlates with software development time • Information flow metric (Henry & Selig) HC = C * (fan-in * fan-out)^2 • where C is the cyclometric complexity INFO631 Week 3

  33. Structure Metrics • System complexity (Card & Glass) • Based on structural complexity (average fan-out squared) and data complexity (based on number of I/O variables and fan-out) • Quantified effect of complexity on error rate INFO631 Week 3

  34. Module Call Graph • Module - a contiguous sequence of program statements, bounded by boundary elements, having an aggregate identifier • Or, a distinct, named group of LOC • The module call graph shows which modules call each other, and what key information is passed among them INFO631 Week 3

  35. Main scores scores eof Read_Scores Average scores average average Find_Ave Print_Ave Module Call Graph example INFO631 Week 3

  36. Number of Interconnections ANCPM = Number of Modules Number of Modules that call FMC = Number of Modules Module Coupling Measures • Average number of calls per module (ANCPM) • Fraction of modules that make calls (FMC) INFO631 Week 3

  37. Information Flow Measures • Types of information flows • Local direct flow • Module invokes a 2nd module & passes info to it • Invoked module returns result to the caller • Local indirect flow • Invoked module returns info that is subsequently passed to a second invoked module • Global flow • Info flows from one module to another via a global data structure INFO631 Week 3

  38. IEEE-STD-982 • Number of Entries and Exits per Module, ‘m’ • Like fan-in and fan-out m = entries + exits • Software Science measures INFO631 Week 3

  39. IEEE-STD-982 • Graph-Theoretic Complexity • Static ComplexityC = Edges - Nodes + 1 • Generalized Static ComplexityBased on summing resources needed for each module (e.g. storage, access time, etc.) • Dynamic complexityComplexity as it changes over time across a network INFO631 Week 3

  40. IEEE-STD-982 • Cyclomatic complexity • Minimal Unit Test Case Determination • Determine number of independent paths through a module, to get minimum number of test cases for unit testing • Data or information flow complexity • Fan-in and fan-out of variables INFO631 Week 3

  41. IEEE-STD-982 • Design Structure adds weighted (%) average of six parameters: • Whether designed top down (Y/N) • Module inter-dependence • Module dependence on prior processing • Database size (# of elements) • Database compartmentalization • Module single entrance and exit (Y/N) • Weighting is chosen to meet project needs INFO631 Week 3

  42. Other Measures • Compiler measures • Size (bytes of compiled code) • Number of symbols and variables • Cross-reference of all labels • Statement count INFO631 Week 3

  43. Other Measures • Configuration Management Library Measures • Number of code modules • Number of versions of each module • History of change dates of each module • Module size • Number of related documents for each module INFO631 Week 3

  44. Availability Metrics • Most information systems are critical to day-to-day operations • Witness Google or Blackberry being offline for mere minutes is news • Availability depends on 1) how often the system goes down, and 2) how long it takes to restore it after a crash INFO631 Week 3

  45. Availability Metrics • Perfect availability (100%) is nice to dream of, but realistically, higher reliability is more expensive • Often measure availability by the number of 9’s in the desired level of availability • Two nines is 99%, three nines is 99.9%, four nines is 99.99%, etc. • How many nines can you afford? INFO631 Week 3

  46. Availability Metrics INFO631 Week 3

  47. Achieving High Availability • Many techniques are used to help ensure that high levels of availability are possible • Duplicate systems (clustering) • RAID data duplication • Duplicate power supplies • Independent power supplies • Uninterruptible power supplies (UPS’) INFO631 Week 3

  48. Availability and Code Quality • Capers Jones demonstrated a clear connection between code quality (defect rate) and the corresponding mean time to failure (MTTF), which is a key aspect of availability • Consistent methods for measurement and definitions of terms are needed for further refinement INFO631 Week 3

  49. Customer Outage Data • In order to determine availability, the actual customer-visible system outage time needs to be collected • In order to get this data, the customer must place a very high priority on availability • This data could be used to identify software components which most reduce availability INFO631 Week 3

  50. Availability • We also expect that availability for a new system should increase over the first couple years of its use • Defect causal analysis can help reduce the root cause of defects, thereby improving availability INFO631 Week 3

More Related