Lecture topics


  1. Lecture topics • Software process • Software project metrics • Software project management

  2. Does software have a life? • Software lifecycle is the sequence of stages the software goes through during its “lifetime” • Software is born • Requirements, design, coding, testing • Software lives • Maintenance • Software dies • Software retirement • Software process governs software lifecycle

  3. What is a software process? • A framework for a set of key areas necessary for successful production of software • General • Applicable to most software projects • Outlines major tasks • Requirements, specifications, … • Defines activities for each task • Quality assurance • Measurement of progress • Document preparation

  4. Why do we need a software process? • A look back at mechanical engineering • In the 1890s, the mechanical engineer Frederick W. Taylor invented “scientific management” • The idea was that the way in which things are done is the key to better results • Improvements like using harder steel for the cutting tools • The labor component is important • Only good operators can take advantage of the better cutting tools • Extensive opposition movement • Many engineers thought that Taylor’s method wasn’t really engineering, but rather some non-technical hybrid

  5. Why do we need a software process? (cont.) • What about development of software? • In many cases, it’s a pretty chaotic process, similar to mechanical engineering in the 1800s • Opinion of many managers: software engineering is a bag of tricks to keep programmers in line • So, the process of software development should be studied, formalized, and controlled by engineering techniques • “Software processes are software too” -- Leon J. Osterweil • There is a split between technical and management software people on the process issue • “Process vs. product” controversy: what’s more important, organizing people or organizing products?

  6. Clash of issues: technical vs. managerial (or nerds vs. suits) • Very different concerns • The problem of running a large multi-person project is different from doing the work itself • This course didn’t really touch the managerial side • Engineers need managers! • And vice versa, of course • Very few people are good at both technical and managerial jobs

  7. Capability maturity model (CMM) • How do we measure the quality of a software process? • We need to do it to compare organizations, or to know how to improve software practices in a given organization • The Software Engineering Institute introduced the CMM model • Assigns a software development organization a maturity level • 1 to 5, low to high maturity • There is no simple formula • Careful evaluation of the organization is needed • Mostly about how its software projects are conducted (established practices) • Introducing predictability into software development is a primary goal of the higher CMM levels • A high quality software process is not a guarantee of a high quality software product • But the likelihood of improving software quality is high

  8. CMM levels • Initial • Ad hoc software development • Repeatable • Cost, schedule, functionality tracking • Defined • The process is standardized • Managed • Measurements of progress and quality are used • Optimizing • The process is being constantly improved

  9. Initial level • Might be better to call it “level 0” • An organization may use many of the ideas from CMM, but not in the order or manner described in the formal levels • Thus, it will be placed on this initial level

  10. Repeatable level • Refers more to the ability to track cost, schedule, and functionality than to the routine exercise of this ability • The only technical reference in the formal definition of this level is configuration management • The requirements might seem modest, but this level is quite hard to achieve

  11. Defined level • The management practices of level 2 are formally defined and recorded • Followed throughout the organization even when things go wrong • There must be a Software Engineering Process group within the organization that codifies practices

  12. Managed level • The central concept is measurement of the development process and the software product • The product here includes requirements, design, code, documentation, test plans etc.

  13. Optimizing level • Introduces feedback into the process from the measurements of level 4 • E.g., if a project is behind schedule in its design phase, • A manager at level 4 will have measurements to show this and then will try to correct matters (e.g. by adjusting schedule) • A manager at level 5 will use data from the delinquent project to try to discover the root cause of the problem and change the development process itself • So that the problem does not occur in future projects

  14. Critique of CMM levels • Most descriptions of CMM levels are full of hype • Descriptions of different levels are not specific • The basis of CMM is mostly managerial (not technical) • The step from level 1 to level 2 is based on management alone • In general, effort should not be spent on process at the expense of effort on product • Unless there’s a clear indication that the product will benefit from that

  15. Using CMM to evaluate a potential employer • Knowing the CMM level of a potential employer is valuable information for an engineer • E.g. level 4 means that there is considerably more regimentation than at (say) level 2 • Many employees at a level 4 organization will have rigid job descriptions • Likely little scope for advancement • Exciting technical risks are not taken • But managerial personnel have more opportunities for advancement

  16. Process management is not for every organization • First off, there are two extremes • For a project involving a handful of people, process is often a waste of time • A project involving hundreds of people will not succeed without process • What about the non-extreme cases? • E.g., suppose that development time for a project is about 2 years, involving about 200,000 LOC • Technical model - hire about eight senior engineers who work essentially without any management hierarchy • Productivity about 1200 LOC/person-month • Managed model - hire 2 line managers and 16 junior engineers • Productivity about 500 LOC/person-month
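A back-of-the-envelope sketch of the two staffing options for the 200,000 LOC example above (assuming the quoted productivity figures apply uniformly; ramp-up and communication overhead are ignored, so the numbers are illustrative only):

    # Rough comparison of the two staffing models for a 200,000 LOC project.
    # Assumes the quoted LOC/person-month rates hold for the whole project.
    PROJECT_SIZE_LOC = 200_000

    def staffing_model(name, engineers, loc_per_person_month, overhead_staff=0):
        """Calendar months and total person-months for one staffing model."""
        calendar_months = PROJECT_SIZE_LOC / (engineers * loc_per_person_month)
        person_months = calendar_months * (engineers + overhead_staff)
        return name, calendar_months, person_months

    models = [
        staffing_model("technical: 8 senior engineers", 8, 1200),
        staffing_model("managed: 2 line managers + 16 junior engineers", 16, 500, overhead_staff=2),
    ]
    for name, months, person_months in models:
        print(f"{name}: ~{months:.0f} calendar months, ~{person_months:.0f} person-months")
    # technical: ~21 calendar months, ~167 person-months
    # managed: ~25 calendar months, ~450 person-months

Under these assumptions both models land in roughly the same 2-year range, but the managed model consumes far more person-months.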

  17. Are all software processes born equal? • There are many different ways to organize software production • Different process models • The choice of a process model is based on • The nature of the project • The methods and tools that the organization wants to use • The controls over software production • The product

  18. Waterfall process model • Does not represent the practice well • Too rigid • Parallel production is limited • All requirements must be specified fully in the beginning • [Diagram: requirements → HL design → LL design → coding → testing]

  19. Prototyping process model (evolutionary development model in Sommerville) • This model is often practical! • Customers may get a wrong impression of the final product from the prototype • Customers may ask for deployment before the product is ready • Often prototype flaws are not fixed in the final product • [Diagram: requirements → prototype development → prototype → test-drive]

  20. Rapid application development (RAD) process model • A number of software teams, each • Developing a well-defined part of the product • Using the waterfall model • Benefits: • Very rapid development • Component-based (reusable) products • Drawbacks: • Requirements have to be well-understood • Product decomposition is not always possible • Sensitive to lack of commitment

  21. Incremental process model • Useful when the deadline cannot be achieved directly • May require significant human resources if a large number of teams is involved • [Diagram: several staggered increments, each running requirements → HL design → LL design → coding → testing, overlapping in time]

  22. Spiral process model • Natural for large software systems • Customers are “stuck” with the development organization • [Diagram: spiral starting at the center and repeatedly passing through the requirements, design, code, and test sectors]

  23. Concurrent development process model • May reduce development time by exploiting concurrency • [Diagram: activity states: none, under development, awaiting changes, under review, under revision, baselined, done]

  24. Formal development process model • Similar to the waterfall model in its structure • Formal processes are used at each stage • Formal specifications at the requirements stage, including formal verification • Formal process of transforming requirements into design and implementation • Standard testing of the code

  25. So, which process model is the best? • Depends on many parameters (nature of the product, availability of resources, organization, etc.) • The spiral model should probably be the choice in most cases • Driven by risk: in the first turn of the spiral, the developers decide whether building the system is feasible

  26. Software metrics • What are they? • Formulas for computing quantitative characteristics of software development, deployment, and maintenance • Why do we need them? • Consider the following scenario (Hamlet, Maybee): • Someone in the organization makes what is called a “business case” for a new product by estimating the revenue that will be lost day by day if it is not available. Then they guess how long the business can stand the loss and come up with a schedule for developing the product - a schedule that bears no relation to what is actually required to develop it. Engineers are then told: meet this schedule. • Software developers may think that the schedule is unrealistic, but how can they prove it? • E.g. through statistical measurements available for projects of comparable complexity

  27. Primary way to measure software • The size of the project • Lines of code (LOC) • Functional points • Historical data provides a link between LOC for a project and the resources needed: • People • Number of personnel and length of the period they are needed • Time • The whole process of development • Individual phases of development • Capital goods • Computers, desks, work rooms, pizzas, cups of coffee, …

  28. But how would we know the size of a system before it is built? • Historical data • We did something like that in the past… • Estimation models • Not many people have personal experience with software projects of different sizes • Models summarize experience in equations that relate project size, schedule, and effort • More sophisticated than a linear scale: a 200,000 LOC project takes more than twice the resources of a 100,000 LOC project

  29. Can we do better than using LOC? • The functional points (FP) metric was proposed • Based on counting: • External input and output points • User interaction points • External interfaces • Files used by the system • Each characteristic is evaluated based on its complexity (importance for the system) and assigned a weight • A word of caution: the metric was developed a long time ago • Before OO programming • Before databases became widespread • Biased toward data processing systems

  30. Functional points metric • Unadjusted function-point count formula: UFC = Σ (number of elements of a given type) × (weight of that type) • E.g., let • The number of inputs and outputs be 3, with assigned weight 10 • The number of user interactions be 2, with assigned weight 5 • The number of external interfaces be 5, with assigned weight 3 • The number of files used by the system be 2, with assigned weight 2 • Then the UFC for this system is 3*10 + 2*5 + 5*3 + 2*2 = 59
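A minimal sketch of the UFC computation, using the counts and weights from the example above:

    # Unadjusted function-point count: sum over element types of count * weight.
    # (count, weight) pairs taken from the example on this slide.
    element_types = {
        "inputs and outputs":   (3, 10),
        "user interactions":    (2, 5),
        "external interfaces":  (5, 3),
        "files used":           (2, 2),
    }

    ufc = sum(count * weight for count, weight in element_types.values())
    print(ufc)  # 3*10 + 2*5 + 5*3 + 2*2 = 59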

  31. COCOMO estimation model • COnstructive COst MOdel • Developed by B. Boehm in the 1980s • Recognizes 3 classes of projects: • Organic mode • Small, simple projects; democratically configured teams • Semi-detached mode • Intermediate projects, a mix of rigid and non-rigid requirements • Embedded mode • Large projects, tight constraints • Defines 3 different levels • Basic, intermediate, advanced

  32. Levels of the COCOMO model • Basic • Needs only the size in LOC • Intermediate • Needs LOC and a set of cost drivers • Advanced • Needs LOC and cost drivers • Applies cost drivers to each activity of the software process

  33. Example: output from COCOMO for a 100,000 LOC project (Hamlet, Maybee) • Model mode: semidetached • Model size: large (100,000 lines of code) • Total effort: 521.3 man-months, 152 man-hours/man-month • Total schedule: 22.3 months
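The totals above can be reproduced with the published Basic COCOMO equations (effort = a·KLOC^b person-months, schedule = c·effort^d calendar months). A minimal sketch using Boehm's standard coefficients; the 152 man-hours/man-month figure is a calibration reported by the tool, not something these equations compute:

    # Basic COCOMO with Boehm's published coefficients.
    BASIC_COCOMO = {
        # mode:         (a,   b,    c,   d)
        "organic":      (2.4, 1.05, 2.5, 0.38),
        "semidetached": (3.0, 1.12, 2.5, 0.35),
        "embedded":     (3.6, 1.20, 2.5, 0.32),
    }

    def basic_cocomo(kloc, mode="semidetached"):
        a, b, c, d = BASIC_COCOMO[mode]
        effort = a * kloc ** b        # person-months
        schedule = c * effort ** d    # calendar months
        return effort, schedule

    effort, schedule = basic_cocomo(100, "semidetached")
    print(f"effort = {effort:.1f} man-months, schedule = {schedule:.1f} months")
    # effort = 521.3 man-months, schedule = 22.3 months (matches the slide)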

  34. Rule-of-thumb facts from the COCOMO model • Projects in the range of 100,000 LOC take about 2 years • Required effort is • 20% for requirements/specification • 50% for design/coding • 30% for the rest • Staffing and its distribution depend on the type of the project, but are generally • About 500 man-months • Distributed roughly 30-40-30% among the phases

  35. Software quality metrics • Correctness • The degree to which software performs the intended function • Metric: number of defects per KLOC • Maintainability • The ease with which software can be corrected, adapted, or enhanced • Metric: mean time to change • Integrity • The degree to which software is protected against attacks • Metric: the success ratio of (known) attacks • Usability • The degree of user-friendliness • Metric: the time period required to become efficient in the use of the system

  36. Defect-related quality metrics • Defect removal efficiency (DRE) • DRE = E/(E+D) • E is the number of errors found before delivery • D is the number of defects found after delivery • Can be used to estimate the defect removal efficiency of individual process steps: • DREi = Ei/(Ei + Ei+1) • Ei is the number of errors discovered on step i • Ei+1 is the number of errors discovered on step i+1
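A minimal sketch of both DRE formulas; the counts below are purely hypothetical and only illustrate the arithmetic:

    # Defect removal efficiency: E = errors found before delivery,
    # D = defects found after delivery.
    def dre(errors_before_delivery, defects_after_delivery):
        return errors_before_delivery / (errors_before_delivery + defects_after_delivery)

    # Per-step variant: DRE_i = E_i / (E_i + E_{i+1})
    def dre_for_step(errors_on_step_i, errors_on_next_step):
        return errors_on_step_i / (errors_on_step_i + errors_on_next_step)

    # Hypothetical counts, for illustration only:
    print(dre(90, 10))           # 0.9  (90 errors removed before delivery, 10 escaped)
    print(dre_for_step(40, 10))  # 0.8  (40 errors found at step i, 10 not found until step i+1)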

  37. Using quality metrics in management • Performance of individuals and teams can be compared • Team A found 112 errors in their software component; team B found 240 errors in their component • Which team is better? • After the deployment of the system, 5 defects were traced to software produced by team A and 2 defects were traced to software produced by team B • Which team is better? • The DRE metric for teams A and B is .9 and .8 • Which team is better? • Using quality metrics for management is not easy and can be misleading

  38. So, if managed software process is so great, how come Open Source is so successful? • Background • Enthusiasts write software that is often quite good • Often done in collaboration by large groups of people • Informally! • Open Source Foundation and Free Software Foundation are organizations that support the notion of open source software • High profile open source projects • Linux • Apache

  39. A case study of open source software development: the Apache server • A. Mockus, R. Fielding, and J. Herbsleb • Appeared in ICSE’2000 • An attempt to investigate the claim that open source software development can successfully compete with traditional commercial development methods

  40. Characteristics of open source software (OSS) development • Built by potentially large numbers (hundreds and even thousands) of volunteers • Extremely geographically distributed • Participants rarely or never meet face to face • Work is not assigned • People undertake the work they choose to undertake • There is no explicit system-level design, or even detailed design • There is no project plan, schedule, or list of deliverables

  41. The Apache Web server • Began in February 1995 • An effort to coordinate existing fixes to the httpd program • New architecture design by R. Thau in July 1995 • Apache httpd 1.0 released in January 1996 • According to the Netcraft survey, the most widely deployed server • Over 50% of the 7 million sites queried • The developer email list is used for communication among developers • The problem reporting database is used for communication between users and developers • A CVS archive is used for version control

  42. The Apache development process • The Apache Group (AG) is an informal organization of developers • Only volunteers, with day jobs • Each member can vote on the inclusion of any code change and has write access to CVS • Members • People who have contributed for an extended period of time (usually >6 months) • 25 as of April 2000 • Core developers (about 15 at any given time) • Only a subset of the AG is active at any time (usually 4-6)

  43. The Apache development process (cont.) • Each developer iterates through a common sequence of actions • Discovering that a problem exists • Determining whether a volunteer will work on it • Identifying a solution • Developing and testing the code within their local copy of the source • Presenting the code changes to the AG for review • Committing the code and documentation to the repository

  44. The size of the Apache development community • Almost 400 different people contributed code • 182 people contributed to 695 problem report related changes • 249 people contributed to 6092 non-PR changes • 3060 different people submitted 3975 problem reports • 458 individuals submitted 591 reports that caused a change to the code or documentation

  45. Distribution of the work within the development community • The top 15 developers contributed more than 88% of added lines and 91% of deleted lines of code • A single person did about 20% of these • 66% of the PR related changes were produced by the top 15 contributors

  46. Code ownership • Hypothesis: a single person would write the vast majority of the code for a module • This didn’t happen! • Of 42 .c files with >30 changes, 40 had at least two (and 20 had at least 4) developers making more than 10% of the changes

  47. What is the defect density of Apache code? • It was higher than in the four other large systems (undisclosed) it was compared to • The role of code bloat is unclear, though • Apache did better than the others in the number of defects found in the pre-test state • There is no provision for systematic system test in OSS • Is code inspection better under OSS?

  48. Hypotheses based on this study • OSS projects will have a core of developers who control the code base • A group larger by an order of magnitude than the core will repair defects and an even larger group will report problems • Projects with a small number of developers besides the core will fail because of a large number of defects • In successful OSS projects, developers are also users • OSS developments exhibit very rapid responses to customer problems • Defect density in OSS project releases will generally be lower than in commercial code that has only been feature tested
