Data Envelopment Analysis

Data Envelopment Analysis Robert M. Hayes

Overview • Introduction • Data Envelopment Analysis • DEA Models • Extensions to include a priori Valuations • Strengths and Weaknesses of DEA • Implementation of DEA • The Example of Libraries • Annals of Operations Research 66 • Annals of Operations Research 73

Introduction • Utility Functions • Cost/Effectiveness • Interpretation for Libraries

Utility Functions • A fundamental requirement in applying operations research models is the identification of a "utility function" which combines all variables relevant to a decision problem into a single variable which is to be optimized. Underlying the concept of a utility function is the view that it should represent the decision-maker's perceptions of the relative importance of the variables involved rather than being regarded as uniform across all decision-makers or externally imposed. • The problem, of course, is that the resulting utility functions may bear no relationship to each other and it is therefore difficult to make comparisons from one decision context to another. Indeed, not only may it not be possible to compare two different decision-makers but it may not be possible to compare the utility functions of a single decision-maker from one context to another.

Cost/Effectiveness • A traditional way to combine variables in a utility function is to use a cost/effectiveness ratio, called an "efficiency" measure. It measures utility by the "cost per unit produced". On the surface, that would appear to make comparison between two contexts possible by comparing the two cost/effectiveness ratios. The problem, though, is that two different decision-makers may place different emphases on the two variables.

Cost/Effectiveness • It must also be recognized that effectiveness will usually entail consideration of a number of products and services, and cost a number of sources of cost. Cost/effectiveness measurement requires combining the sources of cost into a single measure of cost and the products and services into a single measure of effectiveness. • Again, the problem of different emphases of importance must be recognized. This is especially the case for the several measures of effectiveness. But it may also be the case with the several measures of cost, since some costs may be regarded as more important than others even though they may all be measured in dollars. When some costs cannot be measured in dollars, the problem is compounded.

Cost/Effectiveness • More generally, instead of costs and effectiveness, the variables may be identified as "input" and "output". The efficiency ratio is then no longer characterized as cost/effectiveness but as "output/input", but the issues identified above are the same.

Interpretation for Libraries • This issue can be illustrated by evaluating library performance. Effectiveness here is the extent to which library services meet the expectations or goals set by the organization served. It is likely to be measured by several services which are the outputs of library operations—making a collection available for use, circulation or other uses of materials, answering of information questions, instructing and consulting. • Inputs are represented by acquisitions, staff, and space, to which evident costs can be assigned, but they are also represented by measures of the populations served.

Interpretation for Libraries • Efficiency measures the library’s ability to transform its inputs (resources and demands) into production of outputs (services). The objective in doing so is to optimize the balance between the level of outputs and the level of inputs. The success of the library, like that of other organizations, depends on its ability to behave both effectively and efficiently. • The issue at hand, then, is how to combine the several measures of input and output into a single measure of efficiency. The method we will examine is called "data envelopment analysis".

Data Envelopment Analysis • Data Envelopment Analysis (DEA) measures the relative efficiencies of organizations with multiple inputs and multiple outputs. The organizations are called the decision-making units, or DMUs. • DEA assigns weights to the inputs and outputs of a DMU that give it the best possible efficiency. It thus arrives at a weighting of the relative importance of the input and output variables that reflects the emphasis that appears to have been placed on them for that particular DMU. • At the same time, though, DEA then gives all the other DMUs the same weights and compares the resulting efficiencies with that for the DMU of focus.

Data Envelopment Analysis • If the focus DMU looks at least as good as any other DMU, it receives a maximum efficiency score. But if some other DMU looks better than the focus DMU, the weights having been calculated to be most favorable to the focus DMU, then it will receive an efficiency score less than maximum.

Graphical Illustration • To illustrate, consider seven DMUs which each have one input and one output, written as (input, output): L1 = (2,2), L2 = (3,5), L3 = (6,7), L4 = (9,8), L5 = (5,3), L6 = (4,1), L7 = (10,7). • [Figure: plot of the seven DMUs, L1 through L7.]

Graphical Illustration • DEA identifies the units in the comparison set which lie at the top and to the left, as represented by L1, L2, L3, and L4. These units are called the efficient units, and the line connecting them is called the "envelopment surface" because it envelops all the cases. • DMUs L5 through L7 are not on the envelopment surface and thus are evaluated as inefficient by the DEA analysis. There are two ways to explain their weakness. One is to say that, for example, L5 could perhaps produce as much output as it does, but with less input (comparing with L1 and L2); the other is to say it could produce more output with the same input (comparing with L2 and L3).
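The envelopment surface described above can be sketched for the one-input, one-output example as the upper-left boundary of the plotted points. The function name `envelopment_surface` and the helper `cross` are illustrative, not part of the presentation, and the sketch assumes the surface is the piecewise-linear line joining the efficient units, stopping at the unit with the largest output.

```python
# DMUs from the example, written as (input, output).
DMUS = {
    "L1": (2, 2), "L2": (3, 5), "L3": (6, 7), "L4": (9, 8),
    "L5": (5, 3), "L6": (4, 1), "L7": (10, 7),
}


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def envelopment_surface(dmus):
    """Names of the units on the upper-left frontier, ordered by input."""
    # Sort by input; among equal inputs, put the larger output first.
    names = sorted(dmus, key=lambda n: (dmus[n][0], -dmus[n][1]))
    hull = []
    for name in names:
        # Drop a unit that lies on or below the segment joining its neighbours.
        while len(hull) >= 2 and cross(dmus[hull[-2]], dmus[hull[-1]], dmus[name]) >= 0:
            hull.pop()
        hull.append(name)
    # Past the unit with the largest output the frontier is flat, so stop there.
    top = max(range(len(hull)), key=lambda i: dmus[hull[i]][1])
    return hull[: top + 1]


print(envelopment_surface(DMUS))  # ['L1', 'L2', 'L3', 'L4']
```

The remaining units (L5, L6, L7) lie below or to the right of this line, which is why they are evaluated as inefficient.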

Graphical Illustration • Thus, there are two possible definitions of efficiency depending on the purpose of the evaluation. One might be interested in possible reduction of inputs (in DEA this is called the input orientation) or augmentation of outputs (the output orientation) in achieving technical efficiency. Depending on the purpose of the evaluation, the analysis provides different sets of peer groups to which to compare. • However, there are times when reduction of inputs or augmentation of outputs is not sufficient. In our example, even when L6 reduces its input from 4 units to 2, there is still a gap between it and its peer L1 in the amount of one unit of output. In DEA, this is called the "slack" which means excess input or missing output that exists even after the proportional change in the input or the outputs.
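The slack for L6 can be written out as a short worked calculation. It takes L1 as the peer, as the slide does, and uses $\theta$ as an illustrative symbol (not from the presentation) for the proportional input reduction.

```latex
\begin{align*}
\theta &= \frac{X_{1}}{X_{6}} = \frac{2}{4} = 0.5
  && \text{(proportional reduction of input)} \\
(\theta X_{6},\, Y_{6}) &= (2,\, 1)
  && \text{(L6 after the reduction)} \\
\text{slack} &= Y_{1} - Y_{6} = 2 - 1 = 1
  && \text{(output still missing relative to L1)}
\end{align*}
```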

Graphical Illustration • This example will be used to illustrate the several forms that the DEA model can take. • In each case, the results presented are based on the implementation of DEA that will be discussed later in this presentation. It is an Excel spreadsheet using the add-in Solver capability. • The spreadsheet is identical for all of the forms, but the application of Solver differs in the target optimized and in the values to be varied, so for each form the target and the values to be varied will be identified.

DEA Models • The Basic DEA Concept • Variations of DEA Formulation • Formulation: Primal or Dual • Orientation: Input or Output • Returns to Scale: Fixed or Variable

The Basic DEA Concept • Assume that each DMU has values for a set of inputs and a set of outputs. • Choose non-negative weights to be applied to the inputs and outputs for a focus DMU so as to maximize the ratio of weighted outputs divided by weighted inputs. • But do so subject to the condition that, if the same weights are applied to each of the DMUs (including the focus DMU), the corresponding ratios are not greater than 1. • Do that for each DMU. • The resulting value of the ratio for each DMU is its DEA efficiency. It is 1 if the DMU is efficient and less than 1 if it is not.

Formulation • Let $(Y_k, X_k)$, $k = 1, \dots, n$, be the outputs and inputs of DMU $k$, with outputs $Y_{ki}$, $i = 1, \dots, s$, and inputs $X_{ki}$, $i = 1, \dots, m$. • Maximize $\mu Y_k / \nu X_k$ for each value of $k$ from 1 to $n$, subject in each case to $\mu Y_j / \nu X_j \le 1$, $j = 1, \dots, n$, where • $\mu Y_k$ means $\sum_{i=1}^{s} \mu_i Y_{ki}$, • $\nu X_k$ means $\sum_{i=1}^{m} \nu_i X_{ki}$, • $\mu Y_j$ and $\nu X_j$ are the same sums for DMU $j$, $j = 1, \dots, n$, • $\mu_i, \nu_i \ge 0$. • The solution is the set of maximum values of $\mu Y_k / \nu X_k$ and the associated values of the output weights $\mu$ and input weights $\nu$.
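For the one-input, one-output example the ratio problem has a closed form: the constraints force the ratio of output weight to input weight to be no larger than the reciprocal of the best output/input ratio in the set, so each DMU's efficiency is its own ratio divided by the best ratio. The sketch below assumes that special case only (it does not generalize to several inputs or outputs), and `ratio_efficiency` is an illustrative name.

```python
from fractions import Fraction

# DMUs from the example, written as (input, output).
DMUS = {
    "L1": (2, 2), "L2": (3, 5), "L3": (6, 7), "L4": (9, 8),
    "L5": (5, 3), "L6": (4, 1), "L7": (10, 7),
}


def ratio_efficiency(dmus):
    """Efficiency of each DMU when there is one input and one output."""
    # Output per unit of input for every DMU, kept exact with Fraction.
    ratios = {name: Fraction(y, x) for name, (x, y) in dmus.items()}
    best = max(ratios.values())
    # With the most favorable weights, each DMU scores its ratio over the best one.
    return {name: r / best for name, r in ratios.items()}


scores = ratio_efficiency(DMUS)
for name, score in scores.items():
    print(name, round(float(score), 4))
```

Only L2 reaches an efficiency of 1; for instance L5 scores (3/5)/(5/3) = 9/25 = 0.36.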

Basic Linear Programming Model • For solution, this optimization problem is transformed into a linear programming problem, schematically displayed as follows: • In a moment, we will interpret this display as it is translated into alternative formulations of the optimization target and conditional inequalities.

Variations of DEA Formulation • But first, it is necessary to identify several sources of variation in the basic DEA formulation, leading to a variety of different models for implementation: • (1) Formulation: Primal or Dual • (2) Orientation: Input or Output • (3) Returns to Scale: Fixed or Variable • We will now examine and illustrate each of those sources of variation.

(1) Formulation: Primal or Dual • The first source of variation is interpretation of the display for the linear programming model. • One interpretation, called the Primal, treats the rows of the display as representing the model. • The other interpretation, called the Dual, treats the columns as representing the model. • Let’s examine each of those in turn.

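Primal Formulation • The rows of this display are interpreted as follows, with $\mu$ the output weights and $\nu$ the input weights: • (M) Maximize $W = \mu Y_k - \nu X_k$ subject to • (1) $\mu Y_j - \nu X_j \le 0$, $j = 1, \dots, n$ • (2) $-\mu \le -1$, or $\mu \ge 1$ • (3) $-\nu \le -1$, or $\nu \ge 1$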
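With one input and one output the primal form has only two unknowns, the output weight and the input weight, so it can be solved without a linear programming package by checking every corner of the feasible region. The sketch below does that for the seven-DMU example. `solve_primal` is an illustrative helper, not the Excel Solver implementation used in the presentation, and corner enumeration is valid only when the optimum is bounded, as it is here.

```python
from fractions import Fraction
from itertools import combinations

# DMUs from the example, written as (input, output).
DMUS = {
    "L1": (2, 2), "L2": (3, 5), "L3": (6, 7), "L4": (9, 8),
    "L5": (5, 3), "L6": (4, 1), "L7": (10, 7),
}


def solve_primal(k, dmus):
    """Maximize W = mu*Y_k - nu*X_k; returns (W, mu, nu) at the best corner."""
    xk, yk = dmus[k]
    # Each constraint is (c_mu, c_nu, rhs), meaning c_mu*mu + c_nu*nu <= rhs.
    cons = [(Fraction(y), Fraction(-x), Fraction(0)) for x, y in dmus.values()]
    cons.append((Fraction(-1), Fraction(0), Fraction(-1)))  # condition (2): mu >= 1
    cons.append((Fraction(0), Fraction(-1), Fraction(-1)))  # condition (3): nu >= 1
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries do not meet in a single point
        # Cramer's rule for the point where the two boundaries cross.
        mu = (r1 * b2 - r2 * b1) / det
        nu = (a1 * r2 - a2 * r1) / det
        if all(a * mu + b * nu <= r for a, b, r in cons):
            w = mu * yk - nu * xk
            if best is None or w > best[0]:
                best = (w, mu, nu)
    return best


for name in DMUS:
    w, mu, nu = solve_primal(name, DMUS)
    print(name, "W =", w, "mu =", mu, "nu =", nu)
```

In this example every DMU ends up with weights 1 and 5/3, so W equals its output minus 5/3 times its input, which is 0 only for L2.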

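The Dual Formulation • The columns of this display are interpreted as follows: • (m) Minimize $W = -a - b$ subject to • (1) $\lambda Y_j - a \ge Y_k$ • (2) $-\lambda X_j - b \ge -X_k$ • Here $\lambda Y_j$ denotes $\sum_{j} \lambda_j Y_j$, and likewise $\lambda X_j$ denotes $\sum_{j} \lambda_j X_j$.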
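As a hedged check of the dual form on the seven-DMU example, one feasible solution for a focus DMU $k$ puts all the weight on L2. The sketch assumes $\lambda_j \ge 0$, $a \ge 0$ and $b \ge 0$, which the slide does not state explicitly.

```latex
\begin{align*}
\lambda_2 &= \frac{X_k}{3}, \qquad \lambda_j = 0 \ (j \ne 2), \qquad
  a = \tfrac{5}{3} X_k - Y_k, \qquad b = 0 \\
\text{(1)}\quad & \lambda_2 Y_2 - a
  = \tfrac{5}{3} X_k - \left(\tfrac{5}{3} X_k - Y_k\right) = Y_k \ \ge\ Y_k \\
\text{(2)}\quad & -\lambda_2 X_2 - b = -X_k \ \ge\ -X_k \\
W &= -a - b = Y_k - \tfrac{5}{3} X_k
\end{align*}
```

For L5 = (5,3) this gives $W = 3 - 25/3 = -16/3$, and for L2 it gives $W = 0$, the same values as $\mu Y_k - \nu X_k$ in the primal form with $\mu = 1$ and $\nu = 5/3$.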

The Choice of Formulation • Since the results from the two formulations are equal, though expressed differently, the choice between them is based on computational efficiency or, perhaps, ease of interpretation. • The Dual form is more efficient in computation if the number of DMUs is large compared to the number of input and output variables. Note that the Primal form entails n conditions (n being the number of DMUs) which, in the Dual form, are replaced by just m + s conditions (m being the number of input variables and s the number of output variables).

Illustration • To illustrate, consider the example previously presented. The target to be minimized in the Dual form is $W = -a - b$. The values to be varied are $(\lambda, a, b)$, or $(\mu, \nu)$. • The following table shows the solution for both forms:

Illustration • Graphically, the results are as follows: • The maximum value for W, over all cases, is at L2, where W = 0 and the ratio of Y/X is a maximum. The slack for each other case is the vertical distance to the line which goes from the origin (0,0) through L2 (3,5).

(2) Orientation: Input or Output • The second source of variation, orientation, provides the means for focusing on minimizing input or on maximizing output. • These represent two quite different objectives in making assessments of efficiency. Is the objective to be minimally expensive (e.g., to save money) or is it to be maximally effective?

Orientation to Input • The linear programming display for the input orientation is as follows: • It adds one additional condition, $\nu X_k \le 1$, to the display.

Orientation to Input • The resulting Dual formulation is as follows: • (m) Minimize $W = c - 1$ subject to • (1) $\lambda Y_j - a \ge Y_k$ • (2) $-\lambda X_j - b + (c - 1)X_k \ge -X_k$, or $\lambda X_j + b \le cX_k$
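A worked instance for L5 = (5,3) may help in reading the two conditions. It assumes $\lambda_j \ge 0$, takes $a = b = 0$, and uses L2 = (3,5) as the only peer, since L2 has the best output/input ratio in the example.

```latex
\begin{align*}
\text{(1)}\quad & \lambda_2 Y_2 \ge Y_5
  \;\Rightarrow\; 5\lambda_2 \ge 3
  \;\Rightarrow\; \lambda_2 = \tfrac{3}{5} \\
\text{(2)}\quad & \lambda_2 X_2 \le c X_5
  \;\Rightarrow\; 3 \cdot \tfrac{3}{5} \le 5c
  \;\Rightarrow\; c = \tfrac{9}{25} = 0.36 \\
W &= c - 1 = -\tfrac{16}{25}
\end{align*}
```

Read this way, $c$ is the fraction of its current input with which L5 could produce its current output if it performed like L2.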

Orientation to Input • Continuing with the same example, the following table shows the solutions in both formulations. The target is $W = c - 1$. Values to be varied are now $(\lambda, a, b, c)$ or $(\mu, \nu)$. • Note that L2 still dominates the solution, but the graph is now quite different.

Orientation to Input

Orientation to Output • The linear programming display for the output orientation is as follows: • It adds one additional condition, $\mu Y_k \le 1$, to the display.

Orientation to Output • The resulting Dual formulation is as follows: • (m) Minimize $W = 1 - c$ subject to • (1) $\lambda Y_j - a \ge cY_k$ • (2) $-\lambda X_j - b \ge -X_k$, or $\lambda X_j + b \le X_k$

Orientation to Output • Continuing with the same example, the following table shows the solutions in both formulations. The target is $W = 1 - c$. Values to be varied are still $(\lambda, a, b, c)$ or $(\mu, \nu)$. • Note that L2 still dominates the solution, but the graph is now quite different.
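For the one-input, one-output example under constant returns to scale, the output-oriented factor c can be computed directly: the largest output a DMU could reach with its own input is that input times the best output/input ratio in the set. The sketch below assumes that special case and non-negative lambda values; `output_expansion` is an illustrative name, not part of the presentation.

```python
from fractions import Fraction

# DMUs from the example, written as (input, output).
DMUS = {
    "L1": (2, 2), "L2": (3, 5), "L3": (6, 7), "L4": (9, 8),
    "L5": (5, 3), "L6": (4, 1), "L7": (10, 7),
}


def output_expansion(dmus):
    """Factor c by which each DMU's output could grow at its current input."""
    best = max(Fraction(y, x) for x, y in dmus.values())
    # best * x is the output of the best performer scaled to input x.
    return {name: best * x / y for name, (x, y) in dmus.items()}


factors = output_expansion(DMUS)
for name, c in factors.items():
    print(name, "c =", c, "W =", 1 - c)
```

L2 has c = 1 and W = 0. L5 has c = 25/9, the reciprocal of the 9/25 obtained for it by dividing its output/input ratio by the best ratio, as expected when returns to scale are constant.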

Orientation to Output • Note that the graphical display is identical to that for the general form, though the interpretation is somewhat different (replacing efficiencies by slacks).

(3) Returns to Scale: Fixed or Variable • The third basis for variation among DEA models is “returns to scale”. • The examples presented to this point have each involved “constant returns to scale”. That is, the ratio condition $\mu Y / \nu X \le 1$ can be replaced by the inequality $\mu Y - \nu X \le 0$. • These variations of the DEA model are called CCR models and reflect the requirement of constant returns to scale. • But if there are “variable returns to scale”, the ratio condition must now be replaced by $\mu Y - \nu X + u \le 0$, where $u$ can vary to reflect the variable returns to scale. • The results from that change are dramatic and make the DEA model much more interesting. The resulting models are called BCC models.

Variable Returns to Scale, Basic Model • The linear programming display for the basic DEA model is as follows: • It adds the variable u to the display.

Variable Returns: Orientation to Input • The linear programming display for the variable returns to scale, input orientation is as follows: • It adds one additional condition, $\nu X_k \le 1$, to the display.

Orientation to Input • The resulting Dual formulation is as follows: • (m) Minimize $W = c - 1$ subject to • (1) $\lambda Y_j - a \ge Y_k$ • (2) $-\lambda X_j - b + (c - 1)X_k \ge -X_k$, or $\lambda X_j + b \le cX_k$ • (3) $\lambda \ge 1$ • The new, third condition makes things interesting.
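The effect of a condition on the lambda values can be sketched for the one-input, one-output example. The sketch makes an assumption that differs from the slide as written: it uses the standard BCC convexity condition, in which the lambda values sum to exactly 1, rather than the inequality shown as condition (3). Under that assumption the reference point for a DMU is either a single unit or a blend of two units, so the smallest input that still yields its output can be found by enumeration. `vrs_input_efficiency` is an illustrative name.

```python
from fractions import Fraction
from itertools import permutations

# DMUs from the example, written as (input, output).
DMUS = {
    "L1": (2, 2), "L2": (3, 5), "L3": (6, 7), "L4": (9, 8),
    "L5": (5, 3), "L6": (4, 1), "L7": (10, 7),
}


def vrs_input_efficiency(k, dmus):
    """Input-oriented score c, assuming the lambda values sum to exactly 1."""
    xk, yk = dmus[k]
    # Case 1: a single unit that already produces at least Y_k.
    candidates = [Fraction(x) for x, y in dmus.values() if y >= yk]
    # Case 2: a blend of two units that produces exactly Y_k.
    for (xi, yi), (xj, yj) in permutations(dmus.values(), 2):
        if yi < yk < yj:
            t = Fraction(yk - yi, yj - yi)  # weight placed on the second unit
            candidates.append((1 - t) * xi + t * xj)
    # Smallest reference input, as a fraction of the DMU's own input.
    return min(candidates) / xk


for name in DMUS:
    print(name, vrs_input_efficiency(name, DMUS))
```

Under this assumption L1, L2, L3 and L4 all score 1, matching the four units on the envelopment surface in the graphical illustration, while L5, L6 and L7 score 7/15, 1/2 and 3/5. If condition (3) is applied as the inequality shown on the slide, the scores can differ.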

Orientation to Input • Continuing with the same example, the following table shows the solutions in both formulations. The target is $W = c - 1$. Values to be varied are now $(\lambda, a, b, c)$ or $(\mu, \nu, u)$.

Orientation to Input

Variable Returns: Orientation to Output • The linear programming display for the variable returns to scale, output orientation is as follows: • It adds one additional condition, $\mu Y_k \le 1$, to the display.

Orientation to Output • The resulting Dual formulation is as follows: • (m) Minimize $W = 1 - c$ subject to • (1) $\lambda Y_j - a \ge cY_k$ • (2) $-\lambda X_j - b \ge -X_k$, or $\lambda X_j + b \le X_k$ • (3) $\lambda \ge 1$

Orientation to Output • Continuing with the same example, the following table shows the solutions in both formulations. The target is $W = 1 - c$. Values to be varied are still $(\lambda, a, b, c)$ or $(\mu, \nu, u)$. • Note that L2 still dominates the solution, but the graph is now quite different.

Orientation to Output • Note that the graphical display is identical to that for the general form, though the interpretation is somewhat different (replacing efficiencies by slacks).

Extensions to include a priori Valuations • To this point, DEA has been essentially a mathematical process in which the data for input and output are taken as given, without further interpretation with respect to the reality of operations. • But reality needs to be recognized, so there are several extensions that can be made to the basic DEA model, applicable to any of the variations. • They fall into seven categories: • (1) Discretionary and Non-discretionary Variables • (2) Categorical Variables • (3) A priori restrictions on Weights • (4) Relationships between Weights on Variables • (5) A priori assessments of Efficient Units • (6) Substitutability of Variables • (7) Discrimination among Efficient Units

Discretionary & Non-discretionary • In identifying input and output variables, one wants to include all that are relevant to the operation. For example, the level of output is determined not only by what the unit itself does but by the size of the market to which the output is delivered. • The result, though, is that some relevant variables, such as the size of the market, are not under the control of management. Such variables, called non-discretionary, are in contrast to those that are under management control, called discretionary. • In assessing efficiency, all variables are considered, but in determining the criterion function to be maximized or minimized, only the discretionary variables are included.

Categorical Variables • In the DEA model as so far presented, the variables are treated as essentially quantitative, but sometimes one would like to identify non-quantitative variables, such as ordinal or nominal variables. • For example, one might like to compare institutions of the same type, such as public or private universities. • This is accomplished by introducing categorical variables containing numbers for order or identifiers for names.

A priori Restrictions on Weights • In the model as presented, the weights are limited only by the requirement that they be non-negative. • However, there may be reason to require that weights be further limited. • For example, it may be felt that a given variable must be included in the assessment, so its weight must have at least a minimal value greater than zero. This might represent an output that is essential in assessment. • As another example, a variable may be such that a large weight would over-emphasize it relative to its a priori importance, so there should be an upper limit on the weight. This might represent an output variable that is counter-productive.

Relationships between Weights • Sometimes, a priori knowledge may imply that there is a necessary relationship among variables. For example, an output variable may absolutely require some level of an input variable. • Such a priori knowledge may be expressed as a ratio between the weights assigned to the related variables.
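One way such a ratio restriction can enter the linear programming model is sketched below. The bounds $\alpha$ and $\beta$ and the choice of the first two output weights are illustrative assumptions, not values from the presentation.

```latex
\begin{align*}
\alpha \;\le\; \frac{\mu_1}{\mu_2} \;\le\; \beta
\qquad\Longleftrightarrow\qquad
\alpha\,\mu_2 - \mu_1 \;\le\; 0
\quad\text{and}\quad
\mu_1 - \beta\,\mu_2 \;\le\; 0
\qquad (\mu_2 > 0)
\end{align*}
```

Because both resulting conditions are linear in the weights, they can be added as extra rows to the primal form without changing the solution method.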
