DEA: Data Envelopment Analysis

Version: D1.1

Last Update: January 27, 1996

Subject Index:

One Liner: Technique for determining relative productivity

Body:

Data Envelopment Analysis (DEA) is becoming an increasingly popular management tool. Questions about it have cropped up recently in the sci.op-research newsgroup, but nothing underlines its popularity more than the fact that it was featured in a recent issue of Fortune magazine (10/31/94, p.38). This is a very basic introduction to DEA. Because of the varying backgrounds of the readers, it focuses more on gaining an intuitive understanding of DEA than on mathematical rigor.

This section is an introduction to Data Envelopment Analysis (DEA) for people unfamiliar with the technique. For a more in-depth discussion of DEA, the interested reader is referred to the seminal work by Charnes, Cooper, and Rhodes [1978] or the listing in the DEA Home Page.

DEA is commonly used to evaluate the relative efficiency of a number of producers. A typical statistical approach is characterized as a central tendency approach and it evaluates producers relative to an average producer. In contrast, DEA is an extreme point method and compares each producer with only the "best" producers. By the way, in the DEA literature, a producer is usually referred to as a decision making unit or DMU. Extreme point methods are not always the right tool for a problem but are appropriate in certain cases. (See Strengths and Limitations of DEA.)

A fundamental assumption behind an extreme point method is that if a given producer, A, is capable of producing Y(A) units of output with X(A) inputs, then other producers should also be able to do the same if they were to operate efficiently. Similarly, if producer B is capable of producing Y(B) units of output with X(B) inputs, then other producers should also be capable of the same production schedule. Producers A, B, and others can then be combined to form a composite producer with composite inputs and composite outputs. Since this composite producer does not necessarily exist, it is typically called a virtual producer.

The heart of the analysis lies in finding the "best" virtual producer for each real producer. If the virtual producer is better than the original producer, either by making more output with the same input or by making the same output with less input, then the original producer is inefficient. The subtleties of DEA are introduced in the various ways that producers A and B can be scaled up or down and combined.

The procedure of finding the best virtual producer can be formulated as a linear program. Analyzing the efficiency of n producers is then a set of n linear programming problems. The following formulation is one of the standard forms for DEA. lambda is a vector describing the percentages of other producers used to construct the virtual producer. X and Y hold the inputs and outputs of the producers being combined, so lambda X and lambda Y describe the virtual inputs and outputs respectively. X(0) and Y(0) are the input and output vectors for the analyzed producer. The value of theta is the producer's efficiency.

DEA Input-Oriented Primal Formulation
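
Written in the notation of this section, an input-oriented formulation consistent with the description here would look roughly as follows. This is a sketch, assuming lambda is a row vector with one nonnegative weight per producer, X and Y have one row per producer, and X_0 and Y_0 stand for X(0) and Y(0), the inputs and outputs of the producer being analyzed.

```latex
\begin{aligned}
\min_{\theta,\,\lambda} \quad & \theta \\
\text{s.t.} \quad & \lambda Y \ge Y_0 \\
                  & \theta X_0 - \lambda X \ge 0 \\
                  & \lambda \ge 0
\end{aligned}
```

Read this way, the first constraint is the output requirement and the second is the input constraint that theta scales back.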

It should be emphasized that an LP of this form must be solved for each of the DMUs. There are other ways to formulate this problem, such as the ratio approach or the dual problem, but I find this formulation to be the most straightforward. The first constraint forces the virtual DMU to produce at least as many outputs as the studied DMU. The second constraint finds out how much less input the virtual DMU would need. Hence, it is called input-oriented. The factor used to scale back the inputs is theta and this value is the efficiency of the DMU.

Simple Numerical Example

A simple numerical example might help show what is going on. Assume that there are three players (DMUs), A, B, and C, with the following batting statistics. Player A is a good contact hitter, player C is a long ball hitter and player B is somewhere in between.

Player   At-bats   Singles   Home runs
A        100       40        0
B        100       20        5
C        100       10        20

Now, as a DEA analyst, we play the role of Dr. Frankenstein by combining parts of different players. First let us analyze player A. Clearly no combination of players B and C can produce 40 singles with the constraint of only 100 at-bats. Therefore player A is efficient at hitting singles and receives an efficiency of 1.0.

Now we move on to analyze player B. Suppose we try a 50-50 mixture of players A and C. This means that lambda=[0.5, 0.5]. The virtual output vector is now,

lambda Y = [0.5 * 40 + 0.5 * 10, 0.5 * 0 + 0.5 * 20] = [25, 10]

Note that lambda X = 100 = X(0), where X(0) is the input(s) for the DMU being analyzed. Since lambda Y > Y(0) = [20, 5], there is room to scale down the inputs, lambda X, and still produce a virtual output vector at least equal to the original output. This scaling down factor puts an upper bound on that player's efficiency. The 50-50 ratio of A and C may not necessarily be the optimal virtual producer. The efficiency, theta, can then be found by solving the corresponding linear program described earlier.
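
The 50-50 mixture can be checked with a few lines of Python. This is only a sketch: the function names `virtual_player` and `efficiency_upper_bound` are made up for illustration, and the data are the batting figures used in the calculation above (100 at-bats each; singles and home runs of 40 and 0 for A, 20 and 5 for B, 10 and 20 for C).

```python
# (singles, home runs) and at-bats per player, as used in the example.
outputs = {"A": (40, 0), "B": (20, 5), "C": (10, 20)}
at_bats = {"A": 100, "B": 100, "C": 100}


def virtual_player(weights):
    """Combine real players with the given weights (the lambda vector).

    Returns the virtual input (at-bats) and the virtual output vector.
    """
    x = sum(w * at_bats[p] for p, w in weights.items())
    y = tuple(
        sum(w * outputs[p][r] for p, w in weights.items()) for r in range(2)
    )
    return x, y


def efficiency_upper_bound(target, weights):
    """Upper bound on the target's efficiency implied by one mixture.

    The mixture is scaled down as far as possible while still matching
    every output of the target. Returns None if the mixture can never
    match the target (it has zero of an output the target produces).
    """
    x, y = virtual_player(weights)
    scale = 0.0
    for target_out, virtual_out in zip(outputs[target], y):
        if target_out > 0:
            if virtual_out == 0:
                return None
            scale = max(scale, target_out / virtual_out)
    return scale * (x / at_bats[target])


mix = {"A": 0.5, "C": 0.5}
print(virtual_player(mix))               # virtual at-bats and outputs
print(efficiency_upper_bound("B", mix))  # bound on player B's efficiency
```

For this mixture the binding output is singles (20 needed out of 25 produced), so the mixture can be scaled down to 80% of its at-bats and player B's efficiency can be no higher than 0.8. A better mixture may lower the bound further.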

It can be seen by inspection that player C is efficient because no combination of players A and B can produce his total of 20 home runs in only 100 at bats. Player C is fulfilling the role of hitting home runs more efficiently than any other player just as player A is hitting singles more efficiently than anyone else. Player C is probably taking a big swing while player A is slapping out singles. Player B would have been more productive if he had spent half his time swinging for the fences like player C and half his time slapping out singles like player A. Since player B was not that productive, he must not be as skilled as either player A or player C and his efficiency score would be below 1.0 to reflect this.
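
The inspection argument can also be checked numerically. The sketch below does not call an LP solver. Instead it relies on the fact that, with one input and two outputs, an optimal virtual producer needs at most two real producers, and it simply tries every single player and every pair. That shortcut is an assumption that holds only for this small case; a general DEA model would need a real LP solver. All function names are hypothetical.

```python
from itertools import combinations

# (at-bats, singles, home runs), as used in the example above.
players = {"A": (100, 40, 0), "B": (100, 20, 5), "C": (100, 10, 20)}


def single_peer_weight(peer, target):
    """Smallest weight so that weight * peer covers both target outputs."""
    lam = 0.0
    for peer_out, target_out in zip(peer[1:], target[1:]):
        if target_out > 0:
            if peer_out == 0:
                return None  # this peer alone can never match the target
            lam = max(lam, target_out / peer_out)
    return lam


def pair_peer_weights(p, q, target):
    """Weights that make lam_p * p + lam_q * q match the target outputs
    exactly (Cramer's rule). Returns None if no nonnegative solution."""
    det = p[1] * q[2] - p[2] * q[1]
    if det == 0:
        return None
    lam_p = (target[1] * q[2] - target[2] * q[1]) / det
    lam_q = (p[1] * target[2] - p[2] * target[1]) / det
    if lam_p < 0 or lam_q < 0:
        return None
    return lam_p, lam_q


def efficiency(name):
    """Input-oriented efficiency: least virtual input over actual input."""
    target = players[name]
    candidates = []
    for peer in players.values():
        lam = single_peer_weight(peer, target)
        if lam is not None:
            candidates.append(lam * peer[0])
    for p, q in combinations(players.values(), 2):
        lams = pair_peer_weights(p, q, target)
        if lams is not None:
            candidates.append(lams[0] * p[0] + lams[1] * q[0])
    return min(candidates) / target[0]


for name in players:
    print(name, efficiency(name))
```

With these data, players A and C come out at 1.0 and player B at 0.6875, using weights of 0.4375 on player A and 0.25 on player C.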

This example can be made more complicated by looking at unequal values of inputs instead of the constant 100 at-bats, by making it a multiple input problem, or by adding more data points but the basic principles still hold.

Graphical Example

The single-input, two-output and two-input, single-output problems are easy to analyze graphically. The previous numerical example is now solved graphically. (Constant returns-to-scale is implied. Explanations of this and other important issues can be found on the DEA Home Page.) The analysis of the efficiency for player B looks like the following:

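[Figure: Graphical Example of DEA for Player B. Axes: singles and home runs (HR); points A, B, and C, the virtual player V, and the origin O.]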

If it is assumed that convex combinations of players are allowed, then the line segment connecting players A and C shows the possibilities of virtual outputs that can be formed from these two players. Similar segments can be drawn between A and B along with B and C. Since the segment AC lies beyond the segments AB and BC, this means that a convex combination of A and C will create the most outputs for a given set of inputs.

This line is called the efficiency frontier. The efficiency frontier defines the maximum combinations of outputs that can be produced for a given set of inputs. The segment connecting point C to the HR axis is drawn because of disposability of output. It is assumed that if player C can hit 20 home runs and 10 singles, he could also hit 20 home runs without any singles. We have no knowledge, though, of whether avoiding singles altogether would allow him to raise his home run total, so we must assume that it remains constant.

Since player B lies below the efficiency frontier, he is inefficient. His efficiency can be determined by comparing him to a virtual player formed from player A and player C. The virtual player, called V, is approximately 36% of player C and 64% of player A. (This can be determined by an application of the lever law. Pull out a ruler and measure the lengths of AV, CV, and AC. The percentage of player C is then AV/AC and the percentage of player A is CV/AC.)

The efficiency of player B is then calculated by finding the fraction of inputs that player V would need to produce as many outputs as player B. This is easily calculated by looking at the line from the origin, O, to V. The efficiency of player B is OB/OV which is approximately 68%. This figure also shows that players A and C are efficient since they lie on the efficiency frontier. In other words, any virtual player formed for analyzing players A and C will lie on players A and C respectively. Therefore, since the efficiency is calculated as the ratio OA/OV or OC/OV, players A and C will have efficiency scores equal to 1.0.
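
The ruler measurements can be imitated in code. The sketch below finds V as the point where the ray from the origin through B crosses segment AC, then applies the lever law and the OB/OV ratio. The helper name `ray_segment_intersection` is hypothetical, the coordinates are (singles, home runs) from the numerical example, and `math.dist` requires Python 3.8 or later.

```python
import math

# (singles, home runs) per 100 at-bats for each player.
A, B, C = (40.0, 0.0), (20.0, 5.0), (10.0, 20.0)
O = (0.0, 0.0)


def ray_segment_intersection(b, p, q):
    """Point where the ray from the origin through b meets the line pq.

    Solves t * b = p + s * (q - p) for t and s; the point is on the
    segment itself when 0 <= s <= 1.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    det = dx * b[1] - b[0] * dy
    if det == 0:
        raise ValueError("ray is parallel to the segment")
    t = (dx * p[1] - p[0] * dy) / det
    return (t * b[0], t * b[1])


V = ray_segment_intersection(B, A, C)

# Lever law: the share of C is AV/AC and the share of A is CV/AC.
share_c = math.dist(A, V) / math.dist(A, C)
share_a = math.dist(C, V) / math.dist(A, C)

# Efficiency of player B is OB/OV.
efficiency_b = math.dist(O, B) / math.dist(O, V)

print(V, share_a, share_c, efficiency_b)
```

Computed this way, V lies at roughly (29.1, 7.3), the shares are about 0.64 for player A and 0.36 for player C, and OB/OV works out to 0.6875, in line with the value of roughly 68% read off the figure.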

The graphical method is useful in this simple two dimensional example but gets much harder in higher dimensions. The normal method of evaluating the efficiency of player B is by using an LP formulation of DEA.

Applications

The simple baseball example described earlier may not convey the full view on the usefulness of DEA. It is most useful when a comparison is sought against "best practices" where the analyst doesn't want the frequency of poorly run operations to affect the analysis. DEA has been applied in many situations such as hospitals, schools, banks, manufacturing, benchmarking, management evaluation, fast food restaurants, and retail stores.

By the way, the analyzed data sets vary in size. Some analysts work on problems with as few as 15 or 20 DMUs while others are tackling problems with over 10,000 DMUs. (The largest that I know of is Barr and Durchholz with 25,000 DMUs analyzed on a Sequent parallel computer with custom software.)

Strengths of DEA

As the earlier list of applications suggests, DEA can be a powerful tool when used wisely. A few of the characteristics that make it powerful are:

Limitations of DEA

The same characteristics that make DEA a powerful tool can also create problems. An analyst should keep these limitations in mind when choosing whether or not to use DEA.

There are a lot of other items that I haven't covered in this brief introduction to DEA such as returns-to-scale and input vs. output orientation. Pointers to these issues can be found in more detailed bibliographies such as the DEA Home Page or Dr. Seiford's DEA bibliography.

Bibliography:

DEA has become a popular subject since it was first described in 1978. There have been hundreds of papers and technical reports published along with a few books. Technical articles about DEA have been published in a wide variety of places making it hard to find a good starting point. Here are a few suggestions as to starting points in the literature. (The author also maintains a more detailed hyperlinked bibliography with over 250 entries.)

The first paper was the original paper describing DEA and results in the abbreviation CCR for the basic constant returns-to-scale model. The Seiford and Thrall paper is a good overview of the literature. The BCC paper gives a formulation that allows for variable returns-to-scale.

One of the best books covering the entire field of productivity analysis is The Measurement of Productive Efficiency edited by Fried, Lovell, and Schmidt, 1993, from Oxford University Press. This is the closest thing to a handbook that I've run across covering both the parametric and nonparametric analysis techniques.

There is a new book from Kluwer Publishers, Data Envelopment Analysis: Theory, Methodology, and Applications, edited by Charnes, Cooper, Lewin, and Seiford, that has excellent coverage of DEA. One of the editors, Dr. Larry Seiford, also maintains the authoritative DEA bibliography.

This introduction to DEA is a constantly evolving document and feedback is appreciated.

Contributor: Dr. Tim Anderson, Engineering Management Program, Portland State University.

Referenced by:

Refers to: DEA Home Page

Status: Work in progress.
