“Concept and Theory” of Measurement
Why Measurement?
• Characterization: allows us to gather information and “talk” intelligently about certain “relevant” characteristics (of software product or process).
• Tracking: allows us to gather information about some characteristics and follow them to see if things are “under control.”
• Evaluation: lets us judge some characteristics of a software product or process.
• Prediction: helps us identify correlations and/or cause-and-effect relationships among characteristics of a software product or process.
• Improvement: uses the relationships among the characteristics of interest to identify areas of change that will improve the chosen characteristics.
• Measurement is NOT free, so we do not measure just for the sake of measurement.
• In other science/engineering fields, the area of measurement is considerably ahead.
Software Measurement
Note that science and engineering progress is made through: a) observations => b) generalization => c) formulation of a theory, all based on “empirical” data.
• We will cover:
  • What is Software Measurement
  • Basic Types of Measurement
  • Scales of Measurement
  • Data Distribution
  • Information reliability and validity
  • Correlation
  • Cause and Effect (covered in the next lesson, 3.5)
Software Measurement
• Software measurement is the “mapping” of attributes of software entities to some numeric or symbolic entities:
  • Conceptualize the software entity: product, process, people, tool, etc.
  • Define the area (attribute) of interest to measure: product quality, product usability, product size, programmer productivity, etc.
  • Define the metrics (operational definition) to be used: defects per UML design entity, lines of code, lines of code developed per month, test cases per kloc, function points, etc.
  • Perform the act of measuring and capture the measurement: counting and grouping inspection errors (e.g. by performing the design inspection), counting the source lines of code, etc.
Measurement Example
[Figure: marbles (entity) are mapped by their color (attribute) to the numbers 1 through 8. The mapping is the measurement; the numbers are the numerical representation of the attribute “color”.]
GQM Approach to Measurements
Note that a measurement that is valid and useful for one goal may not make sense in another context or for another goal.
• GQM (Goal/Question/Metric), proposed by Basili and Weiss, is an approach to developing a metric:
  • Define the Goal: e.g. improve the time to locate (debug) software code problems.
  • Pose the Question: e.g. how does program complexity influence the “debugging” time?
  • Develop Metrics, e.g.:
    • Program control flow complexity: number of loops
    • Program dataflow complexity: distance (number of statements) between the DEF and USE of a variable
    • Debugging effort: person-hours from error detection to error correction
    • Debugging index: complexity / debugging effort (be careful with this!)
What if we changed the “Question” to how programmer skill influences ... ?
A Side Note
• In many engineering and science fields, the measures that we take on some attribute are usually fairly intuitive and already agreed upon:
  • Length of a table
  • Weight of a person
  • Volume of a “round” ball
• There are also “difficult” measures such as “quality of software” (we have already talked about this one in class in terms of TQM, but ...)
• Unfortunately, in software engineering, the problem of measurement is accentuated:
  • Volume of a program?
  • Length of a program?
  • Quality of a program?
Types of Measurement
• Direct:
  • We can “observe” the entity and then apply the measurement to the entity to gauge the attribute of interest, e.g.:
    • Directly measure the height of a chair
    • Directly measure the number of program statements
• Indirect:
  • We cannot directly “observe” or apply the measure to the entity, so we apply the measure to another entity or use another measurement as a surrogate. Then we use the “known” relationship to obtain the desired measure, e.g.:
    • Indirectly measure the distance to the moon
    • Indirectly measure the “process maturity of an organization”, e.g. via CMM or CMMI
    • Indirectly measure the “complexity of code” attribute via elements of the actual code that can be directly observed, such as the number of lines of code or the number of branches
Note: Prediction (projection) vs Measurement
• Prediction is neither directly nor indirectly measuring an entity.
• It is estimating the measure of an entity that does not yet exist at the time of the desired measurement.
  • e.g. estimating the “size” of a computer program, in kilobytes of occupied memory space, prior to the completion of the program.
Utility of Measurement
• What meaningful statements can we make about an attribute?
  • “My program is 10% worse in quality than yours.”
    • It had 10% more bugs found within the first year of release?
  • “Yesterday was twice as hot as today.”
    • The temperature was 80 degrees (F) yesterday, and it is 40 degrees (F) today? Or could it be “-1 degree yesterday and -2 degrees today”?
• What operations can we perform on the measurements?
  • The average of a 1-to-5 point survey on some customer reaction.
    • The average of customers’ responses on our software was 3.2 on a questionnaire that asked: 1 - delightful, 2 - good, 3 - average, 4 - poor, 5 - abominable. Does this “average” make sense?
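The “twice as hot” question can be checked numerically. The sketch below is an illustration, not part of the original slides: it converts the two Fahrenheit readings to Kelvin (a scale with a true zero) and compares the ratios. The helper name `fahrenheit_to_kelvin` is hypothetical.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert Fahrenheit (arbitrary zero) to Kelvin (true zero)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


yesterday_f, today_f = 80.0, 40.0  # readings from the slide

# Ratio taken directly on the Fahrenheit numbers.
naive_ratio = yesterday_f / today_f

# Ratio taken after moving to a scale with a fixed zero.
kelvin_ratio = fahrenheit_to_kelvin(yesterday_f) / fahrenheit_to_kelvin(today_f)

print(f"Fahrenheit ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio:     {kelvin_ratio:.2f}")
```

The Fahrenheit ratio is 2, but on the Kelvin scale yesterday was only about 8% warmer, so “twice as hot” is an artifact of where the Fahrenheit zero happens to sit.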
Scales (Levels) of Measurement • Nominal • Ordinal • Interval • Ratio
Scales (Levels) of Measurement (cont.)
• 1) Nominal Level: “distinctiveness”
  • Simplest form of measurement
  • Classification by categories (similar to a mathematical equivalence class):
    • Exhaustive: the categories provide total coverage of all the elements with that attribute
    • Mutually exclusive: every element belongs to only one category
  • NO relationship may be assumed among the categories themselves
  • Examples (are these nominal?):
    • Software development activities by: requirements, design, code, test, integration
    • Software process models by: waterfall, spiral, iterative, prototyping, and other
    • Product defect source by: design, coding, testing, integration, and packaging
    • Students in a college by: major field (including “undeclared”, but not allowing double majors)
    • Students in a class by: birthplace
    • Survey result answers by: excellent, good, neutral, poor, bad
  • Arithmetic operations or comparisons may or may not be valid:
    • What does it mean to talk about the “best” place of birth?
    • What does it mean to talk about the “average” defect source?
    • What does it mean to talk about the “average” student major?
    • What does it mean to say software engineering is “better” than music?
Scales (Levels) of Measurement (cont.)
• 2) Ordinal Level: “ordering”
  • Classified categories are ordered
  • The ordinal scale is transitive:
    • If A > B and B > C, then A > C
  • The ordinal scale is asymmetric:
    • If A > B, then B > A is not true
  • Ordinal categories do not imply that arithmetic operations are valid
  • Examples:
    • Customer survey scale on UI: very easy: 10; easy: 7; somewhat easy: 5; clumsy: 3; difficult: 0
      • (Note: we are often tempted to take averages and perform other arithmetic operations, which may be meaningless.)
    • Johnny’s weight is more than Tom’s
    • Software engineering students’ average UI design survey result is higher than that of art students
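To see why averaging ordinal codes can mislead, here is a small sketch. It scores the same survey answers with the slide's coding (10, 7, 5, 3, 0) and with a second coding (4, 3, 2, 1, 0) that preserves the same order. The second coding and both response groups are hypothetical, made up for illustration.

```python
from statistics import mean, median

# Coding taken from the slide.
slide_coding = {"very easy": 10, "easy": 7, "somewhat easy": 5,
                "clumsy": 3, "difficult": 0}
# Hypothetical alternative coding with the same ordering of categories.
rank_coding = {"very easy": 4, "easy": 3, "somewhat easy": 2,
               "clumsy": 1, "difficult": 0}

# Hypothetical survey responses from two groups of five users.
group_x = ["clumsy"] * 4 + ["very easy"]
group_y = ["easy"] * 3 + ["difficult"] * 2


def summarize(responses, coding):
    """Return (mean, median) of the responses under the given coding."""
    scores = [coding[r] for r in responses]
    return mean(scores), median(scores)


for name, coding in (("slide coding", slide_coding), ("rank coding", rank_coding)):
    x_mean, x_median = summarize(group_x, coding)
    y_mean, y_median = summarize(group_y, coding)
    print(f"{name}: mean X={x_mean:.1f} Y={y_mean:.1f}; "
          f"median X={x_median} Y={y_median}")
```

Under the slide's coding group X has the higher mean; under the rank coding group Y does. The median category (clumsy for X, easy for Y) is the same under both codings, which is why order-based summaries are safer on an ordinal scale.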
Scales (Levels) of Measurement (cont.)
• 3) Interval Level: “equal difference”
  • Categories are ordered and “equal” in size
  • “Points” of measurement are equidistant
  • Allows the arithmetic operations of addition and subtraction
  • Example:
    • On a product “usability” attribute scale of 1 to 10, product A is measured at 8 and product B is measured at 4; then we can say that A is 4 points higher, but not necessarily 2 times better. (This is because there is no absolute or meaningful base such as a fixed “zero” or fixed “null”: the “bottom” is made up as 1, and it could have been 2.)
Scales (Levels) of Measurement (cont.)
• 4) Ratio Level: “existence of a fixed bottom, 0”
  • It is the interval level with a fixed “bottom” of zero.
  • It allows the arithmetic operations of addition, subtraction, multiplication, and division
  • Ratio example:
    • Programmer A codes at 500 loc per day, and programmer B codes at 1250 loc per day. We can say not only that the difference is 750 loc per day, but also that programmer B produces 2.5 times as many loc per day, because there is a legitimate “0” loc per day.
Pictorial Representations of Scaling Theory
[Figure: Nominal level. The measurement (mapping) takes entities to a partitioned data set: M(entity) = A or B or C or D or E.]
[Figure: Ordinal level. The measurement (mapping) takes entities to specific, ordered integers: M(entity) = Z or Z+2 or Z+5 or Z+9, where Z+9 > Z+5 > Z+2 > Z.]
Pictorial Representations of Scaling Theory
[Figure: Interval level. The measurement (mapping) takes entities to integers starting at “some” Z: M(entity) = Z or Z+n, where n = 1 through infinity. Number line: ..., Z-1, Z, Z+1, Z+2, Z+3, ...]
Note: [(Z+5) - (Z+1)] = 4 = [(Z+22) - (Z+18)] regardless of the value of Z.
BUT: (Z+2)/(Z+1) = 2/1 = (Z+6)/(Z+3) if Z = 0, whereas (Z+2)/(Z+1) = 3/2 and (Z+6)/(Z+3) = 7/4 if Z = 1.
[Figure: Ratio level. The measurement (mapping) takes entities to integers starting at Z = 0: M(entity) = 0, 1, 2, ... Number line: ..., -1, 0, 1, 2, 3, ...]
Note: with Z = 0 fixed, 6/3 = 2/1 = ...
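The note on differences and ratios can be reproduced directly. This is a minimal sketch in which the function names `difference` and `ratio` are illustrative only: it evaluates the same pairs of scale points for different choices of the base Z.

```python
def difference(z: int, a: int, b: int) -> int:
    """Difference between scale points Z+a and Z+b."""
    return (z + a) - (z + b)


def ratio(z: int, a: int, b: int) -> float:
    """Ratio between scale points Z+a and Z+b."""
    return (z + a) / (z + b)


for z in (0, 1, 7):
    print(f"Z={z}: "
          f"(Z+5)-(Z+1)={difference(z, 5, 1)}, "
          f"(Z+22)-(Z+18)={difference(z, 22, 18)}, "
          f"(Z+2)/(Z+1)={ratio(z, 2, 1):.3f}, "
          f"(Z+6)/(Z+3)={ratio(z, 6, 3):.3f}")
```

The differences stay at 4 for every Z, while the two ratios agree only when Z = 0. That is the sense in which an interval scale supports subtraction but needs a fixed zero before division is meaningful.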
Different Measures for Analysis
• Ratio (different from the ratio scale): A/B, where A and B are mutually exclusive
  • Example: # of testers / # of coders (number of testers per coder), or # of students / # of teachers; this can give us an estimating rule
• Proportion: A/(A+B+C+D), where A, B, C, D are mutually exclusive
  • Example: design errors / (requirements + design + coding + testing + integration) errors, i.e. design errors in relation to all errors
  • Sometimes called “relative frequency” when the numerator and denominator are both integers
• Percentage: (A/(A+B+C)) * 100, where A, B, C are mutually exclusive
• Rate: A/B, where A and B may have a “dependency”
  • Example: 3 defects / 10 statements
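The four measures above differ only in what goes in the denominator. The following sketch computes each one; the staffing and error counts are hypothetical, and only the rate (3 defects per 10 statements) comes from the slide.

```python
# Hypothetical staffing counts (mutually exclusive groups).
testers, coders = 4, 10
testers_per_coder = testers / coders  # ratio: A/B

# Hypothetical error counts by source (mutually exclusive categories).
errors = {"requirements": 5, "design": 10, "coding": 20,
          "testing": 10, "integration": 5}
total_errors = sum(errors.values())

design_proportion = errors["design"] / total_errors  # proportion: A/(A+B+...)
design_percentage = design_proportion * 100          # percentage

# Rate from the slide: defects occur within the statements counted below.
defects, statements = 3, 10
defect_rate = defects / statements

print(f"testers per coder: {testers_per_coder:.2f}")
print(f"design proportion: {design_proportion:.2f} ({design_percentage:.0f}%)")
print(f"defect rate:       {defect_rate:.2f} defects per statement")
```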
Expressing Measures of Single Attribute
• Raw distribution of the measure of a single attribute (e.g. defects)
  • A list of frequency counts by some scale, displaying the “spread” or “distribution”: the scale is a set of nominal categories, each an interval of # of defects, and the distribution shows how many modules of our product fall in each category
  • Example:
    • A (1-5 defects): 1 module
    • B (6-10 defects): 4 modules
    • C (11-15 defects): 13 modules
    • D (16-20 defects): 6 modules
    • E (21-25 defects): 2 modules
• What are we looking at with a distribution?
  • Pattern? Any outliers? Trend?
• Can we compare the defect distributions of 2 products and say something? Your thoughts?
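A frequency distribution like the one above can be built by binning per-module defect counts. In the sketch below the per-module counts are hypothetical, invented so that the bins reproduce the slide's example, and `defect_bin` is an illustrative helper that assumes every module has at least one defect.

```python
from collections import Counter

BIN_WIDTH = 5  # each category spans 5 defect counts: 1-5, 6-10, ...


def defect_bin(defects: int) -> tuple[int, int]:
    """Return the (low, high) interval that a defect count falls into."""
    if defects < 1:
        raise ValueError("this sketch only bins modules with at least 1 defect")
    low = ((defects - 1) // BIN_WIDTH) * BIN_WIDTH + 1
    return (low, low + BIN_WIDTH - 1)


# Hypothetical defect counts for 26 modules.
defects_per_module = [
    3,
    6, 7, 9, 10,
    11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15,
    16, 17, 17, 18, 19, 20,
    22, 25,
]

distribution = Counter(defect_bin(d) for d in defects_per_module)

for (low, high), modules in sorted(distribution.items()):
    print(f"{low}-{high} defects: {modules} module(s)")
```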
Describing the Distribution
• Looking at the “raw” distribution may not always be practical because of the number of data points. We may use the following to describe the distribution:
• Mean value: (1/n)(x1 + x2 + ... + xn); characterizes the “center” of the distribution.
  • it is easier to compare 2 means
  • prone to heavy influence from a few outliers
• Median value: the value at position (n+1)/2 from the bottom of an ordered list; another characterization of the “center” of a distribution.
  • note that the median is always the middle point of the distribution (in position, not in value); thus it is “resistant” to outlier data.
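A quick way to see the outlier sensitivity described above is to add one extreme value to a small data set. The numbers here are hypothetical defect counts chosen for illustration.

```python
from statistics import mean, median

# Hypothetical defect counts for five modules.
typical = [2, 3, 3, 4, 5]
# The same modules plus one outlier module.
with_outlier = typical + [40]

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(f"{label}: mean={mean(data):.1f}, median={median(data)}")
```

The single outlier moves the mean from 3.4 to 9.5, while the median only moves from 3 to 3.5. For an even number of points, position (n+1)/2 falls between two values, and `statistics.median` averages them.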
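Looking at Mean and Median
[Figure: two distributions with the mean and median marked. Symmetrical spread: the mean and median coincide. Skewed to the right: the mean is influenced by the skew and is pulled toward the long tail, away from the median.]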
Understanding the Spread
• Variance is a common measure of the spread of a distribution about its mean.
• Variance of n observations (let xm = mean of the x’s):
  S**2 = (1/(n-1)) * [(x1-xm)**2 + (x2-xm)**2 + ... + (xn-xm)**2]
  • each deviation is squared to keep the value positive
  • the variance will be large if the distribution is widely spread out from the mean
• An easier computational formula is (xi is each data point):
  S**2 = (1/(n-1)) * [Sum(xi**2) - (1/n)*(Sum(xi))**2]
  • (Try two sets of numbers: 1,2,3,4,5 and 1,3,5,7,9, with one data point in each category)
• To bring the measurement back to its original unit, rather than squares, we take the square root
• Standard Deviation, S, is the square root of the Variance
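The suggested exercise can be worked through in a few lines. This sketch implements both variance formulas from the slide (the function names are illustrative) and applies them to the two suggested data sets.

```python
from math import sqrt


def variance_definition(xs):
    """Sample variance: sum of squared deviations from the mean, over n-1."""
    n = len(xs)
    xm = sum(xs) / n
    return sum((x - xm) ** 2 for x in xs) / (n - 1)


def variance_computational(xs):
    """Sample variance via the computational formula from the slide."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


set_a = [1, 2, 3, 4, 5]
set_b = [1, 3, 5, 7, 9]

for name, xs in (("1,2,3,4,5", set_a), ("1,3,5,7,9", set_b)):
    v1 = variance_definition(xs)
    v2 = variance_computational(xs)
    print(f"{name}: variance={v1:.2f} (computational: {v2:.2f}), "
          f"std dev={sqrt(v1):.2f}")
```

Both formulas give a variance of 2.5 for the first set and 10 for the second. Doubling the spacing between points quadruples the variance and doubles the standard deviation.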
The Normal Distribution
• Symmetric, single-peaked, bell-shaped distribution
• The mean, u, and the standard deviation, s, of a normal curve describe the shape of the curve.
• The height of the normal curve at point x may be expressed as:
  y = [1/(s*SQRT(2*pi))] * [e**(-.5*(((x-u)/s)**2))]
• About 68% of the area under the normal curve lies within 1 standard deviation of the mean, which can be interpreted as: about 68% of the observations in a normal distribution fall within 1 standard deviation of the mean
• 2 standard deviations cover about 95%, and 3 standard deviations cover about 99.7%
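The coverage figures can be verified with the standard library alone. The sketch below evaluates the height formula and uses the error function `math.erf` to get the area within k standard deviations of the mean. The function names are illustrative.

```python
from math import erf, exp, pi, sqrt


def normal_height(x: float, u: float, s: float) -> float:
    """Height of the normal curve with mean u and std deviation s at x."""
    return (1.0 / (s * sqrt(2.0 * pi))) * exp(-0.5 * ((x - u) / s) ** 2)


def coverage_within(k: float) -> float:
    """Fraction of a normal distribution within k std deviations of the mean."""
    return erf(k / sqrt(2.0))


print(f"peak height of the standard normal curve: {normal_height(0, 0, 1):.4f}")
for k in (1, 2, 3):
    print(f"within {k} std deviation(s): {coverage_within(k) * 100:.2f}%")
```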
Standard Normal Distribution
• All normal distributions become the “same” if measurements are made in standard deviation units (they can be viewed in relative terms).
• If a variable X has a normal distribution with mean u and standard deviation s, then the standardized variable is: Z = (X - u)/s
• Example: suppose the software product defect rate in the U.S. is normally distributed, with a mean of 12 errors per kloc and a standard deviation of 2. Your product, with a mean of 14 defects per kloc, would be at
  • Z = (14-12)/2 = 1, i.e. 1 standard deviation above the mean, at the upper edge of the band within 1 standard deviation of the mean that contains about 68% of U.S. software products.
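To place the example product within the population, the Z value can be turned into a cumulative fraction. This is a sketch under the slide's assumption that defect rates are normally distributed; `fraction_below` is an illustrative name for the standard normal cumulative distribution function.

```python
from math import erf, sqrt


def z_score(x: float, u: float, s: float) -> float:
    """Standardize x against a distribution with mean u and std deviation s."""
    return (x - u) / s


def fraction_below(z: float) -> float:
    """Fraction of a standard normal distribution lying below z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Values from the slide: mean 12, std deviation 2, product at 14 per kloc.
z = z_score(14, 12, 2)
print(f"Z = {z:.1f}")
print(f"fraction of products with a lower defect rate: {fraction_below(z):.1%}")
```

With Z = 1, roughly 84% of products would have a lower defect rate than the example product, and roughly 16% would have a higher one.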
6 Sigma
• So what’s a sigma?
  • A sigma is the same as 1 standard deviation
  • The band within 1 sigma of the mean covers about 68.27% of the normal curve.
  • 3 sigma covers about 99.73% of the normal curve
• So 6 sigma covers 6 standard deviations on either side of the mean, or 99.9999998% of the curve.
• So if the normal distribution represents the percentage of programs without bugs, then 6 sigma says 99.9999998% of the programs have no bugs. (So if you had a million programs, how many have bugs? What about a billion programs?)
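The closing question can be answered numerically. This sketch follows the slide's reading, where “6 sigma” means the area within 6 standard deviations of the mean. It uses `math.erfc` for the tail so that the tiny probability is not lost to floating-point rounding. The function names are illustrative.

```python
from math import erfc, sqrt


def fraction_outside(k: float) -> float:
    """Fraction of a normal distribution beyond k std deviations (both tails)."""
    return erfc(k / sqrt(2.0))


def expected_outside(population: float, k: float = 6.0) -> float:
    """Expected number of items beyond k std deviations in a population."""
    return population * fraction_outside(k)


print(f"outside 6 sigma: {fraction_outside(6.0):.3e}")
print(f"per million programs: {expected_outside(1e6):.4f}")
print(f"per billion programs: {expected_outside(1e9):.2f}")
```

Under this reading, a million programs would be expected to contain about 0.002 with bugs, and a billion programs about 2.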
Information Reliability and Validity
• Reliability of Measurement
  • Refers to the consistency of the measurements
  • Is a measure of the defectiveness of the measurements
  • Example: if the measurements taken consistently return the same result, then the measurement is viewed as reliable
  • One way to measure this consistency is to look at the ratio of the standard deviation to the mean of the measurements, s/u (where s is 1 standard deviation and u is the mean).
    • The smaller this ratio is, the more “reliable” the measurement is
  • Another way is to look at the equation m = t + e, where m is the actual measurement, t is the “true” measurement, and e is the random error.
    • Consider the variance of all the pieces: var(m) = var(t) + var(e) (this holds when the random error e is uncorrelated with t)
    • A reliable measurement should have var(e) = 0, or var(t)/var(m) = 1
    • Set: reliability = var(t)/var(m) = [var(m) - var(e)]/var(m) = 1 - var(e)/var(m)
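Both reliability indicators above are simple to compute. In the sketch below the function names and the repeated-measurement data are hypothetical. It computes s/u for two sets of repeated measurements and evaluates reliability = 1 - var(e)/var(m) for assumed variance values.

```python
from statistics import mean, stdev


def index_of_variation(measurements):
    """Ratio s/u: sample std deviation divided by the mean."""
    return stdev(measurements) / mean(measurements)


def reliability(var_m: float, var_e: float) -> float:
    """Reliability as defined on the slide: 1 - var(e)/var(m)."""
    return 1.0 - var_e / var_m


# Hypothetical repeated measurements of the same attribute.
steady = [100, 101, 99, 100, 100]
noisy = [100, 120, 80, 110, 90]

print(f"s/u, steady measurements: {index_of_variation(steady):.4f}")
print(f"s/u, noisy measurements:  {index_of_variation(noisy):.4f}")

# Assumed variances: no random error vs. a quarter of the variance from error.
print(f"reliability with var(e)=0:   {reliability(10.0, 0.0):.2f}")
print(f"reliability with var(e)=2.5: {reliability(10.0, 2.5):.2f}")
```

In practice var(e) is not observed directly and has to be estimated, for example from repeated measurements of the same item. The sketch takes it as given.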
Information Reliability and Validity
• Validity of Measurement
  • Refers to the applicability of the measurement
  • How do we know we have really measured the attribute that we want to measure?
  • This addresses the actual metric
  • Example: measuring the number of defects may not really measure the customers’ satisfaction with the product. What is it a valid measure of? Your thoughts?
• There are 3 kinds of validity:
  • Construct validity: direct applicability of the metric
    • Number of defects to measure product quality
  • Predictive validity: relational applicability (indirect measure?)
    • Pages of documentation to quality of product, as a relationship
  • Content validity: coverage applicability
    • Using “UI” defects to represent the defects of the “product” is only partially true