Using funnel plots to compare performance indicators

by Daljit Dhadwal


In the opening chapter of Picturing the Uncertain World: How to Understand, Communicate, and Control Uncertainty through Graphical Display, Howard Wainer writes about what he calls The Most Dangerous Equation: the equation for the standard error of the mean, which he refers to as De Moivre’s equation. The equation says that the standard error of the mean is the population standard deviation divided by the square root of the sample size, so Wainer’s general point is that statistics computed from small samples show greater variability. He gives several examples demonstrating this phenomenon: U.S. counties with the lowest and highest age-adjusted kidney cancer rates “tend to be very rural, midwestern, southern, and western counties”; the highest- and lowest-performing schools tend to be smaller schools; and the safest and most dangerous cities in the U.S. tend to be smaller cities. Failing to take the effect of sample size on a statistic into account is a cognitive bias called insensitivity to sample size. From Wikipedia:

Insensitivity to sample size is a cognitive bias that occurs when people judge the probability of obtaining a sample statistic without respect to the sample size. For example, in one study subjects assigned the same probability to the likelihood of obtaining a mean height of above six feet [183 cm] in samples of 10, 100, and 1,000 men. In other words, variation is more likely in smaller samples, but people may not expect this.
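To put numbers on this, here’s a minimal simulation sketch in Python. The population parameters (mean 175 cm, standard deviation 7 cm) are assumptions chosen to roughly match adult male heights, not figures from the study; because the population is modeled as normal, the sample mean is itself normal with standard deviation sigma divided by the square root of n, which is De Moivre’s equation at work:

    import numpy as np

    rng = np.random.default_rng(42)
    mean_cm, sd_cm, cutoff_cm = 175, 7, 183  # assumed population; 183 cm is six feet
    trials = 1_000_000

    for n in (10, 100, 1000):
        se = sd_cm / np.sqrt(n)  # De Moivre's equation: SE = sigma / sqrt(n)
        sample_means = rng.normal(mean_cm, se, trials)  # simulate the sample means directly
        share = (sample_means > cutoff_cm).mean()
        print(f"n={n:4d}: SE = {se:.2f} cm, P(mean > 183 cm) ~ {share:.5f}")

Only the samples of 10 leave any realistic room for a mean above six feet; at 100 or 1,000 men the event is, for practical purposes, impossible.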

In another example, Amos Tversky and Daniel Kahneman asked subjects:

A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50%, sometimes lower. For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days?

1. The larger hospital

2. The smaller hospital

3. About the same (that is, within 5% of each other)

56% of subjects chose option 3, while 22% each chose options 1 and 2. According to sampling theory, however, the correct answer is the smaller hospital: the larger hospital is much more likely to report a sex ratio close to 50% on any given day, so days on which more than 60% of the babies are boys are considerably rarer there.
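A quick exact calculation bears this out. Here’s a minimal sketch in Python (the daily birth counts of 15 and 45 come from the problem itself; the rest is standard binomial arithmetic under the assumed 50/50 sex ratio):

    from math import comb

    def p_more_than_60pct_boys(n_births):
        # P(X > 0.6 * n) where X ~ Binomial(n, 0.5): with a 50/50 sex
        # ratio, each of the 2^n birth sequences is equally likely
        return sum(comb(n_births, k) for k in range(n_births + 1)
                   if k > 0.6 * n_births) / 2 ** n_births

    for n_births in (15, 45):  # the smaller and the larger hospital
        p = p_more_than_60pct_boys(n_births)
        print(f"{n_births} births/day: P(>60% boys) = {p:.3f}, "
              f"about {365 * p:.0f} such days per year")

The smaller hospital should record such a day roughly twice as often (about 55 days a year versus about 25), precisely because its daily samples are small enough to stray far from 50%.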

This phenomenon, in which smaller units (hospitals, cities, schools, etc.) show more random variability, is important to take into account when ranking or comparing performance indicators as part of a performance measurement system. David Spiegelhalter, a statistician at the University of Cambridge, introduced the use of funnel plots to compare performance indicators in his 2005 article, Funnel plots for comparing institutional performance. A funnel plot is a scatter plot of each unit’s actual value of a particular performance indicator (for each city, school, hospital, etc.) on the y-axis against its sample size on the x-axis, together with a horizontal line at the overall value of the indicator and control limits around that line. The limits narrow as the sample size grows, producing the funnel shape that gives the plot its name; units lying outside them have performance significantly different from the overall value.

Recently on his blog, Understanding Uncertainty, Spiegelhalter worked through an example of using a funnel plot to compare the performance of local governments in England on the following performance indicator: the proportion of children whose adoption placement occurred within 12 months. He found that the performance of the majority of local governments fell within the control limits, “so their variation [was] essentially indistinguishable from chance.”
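As a rough illustration of how such a plot is built, here’s a minimal sketch in Python with simulated data; the unit counts, the underlying proportion of 0.55, and the choice of roughly 95% and 99.8% limits are all assumptions for the example, using the usual normal approximation for a proportion:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)

    # Simulated data: one row per unit (hospital, school, city, ...)
    n = rng.integers(20, 500, size=60)   # each unit's sample size
    events = rng.binomial(n, 0.55)       # each unit's observed "successes"
    p = events / n                       # each unit's indicator value

    p_overall = events.sum() / n.sum()   # overall value of the indicator

    # Control limits from the normal approximation to the binomial:
    # p_overall +/- z * sqrt(p_overall * (1 - p_overall) / n)
    ns = np.linspace(n.min(), n.max(), 200)
    se = np.sqrt(p_overall * (1 - p_overall) / ns)

    plt.scatter(n, p, s=12, color="black")
    plt.axhline(p_overall, color="grey")
    for z, style in ((1.96, "--"), (3.09, ":")):  # ~95% and ~99.8% limits
        plt.plot(ns, p_overall + z * se, "b" + style)
        plt.plot(ns, p_overall - z * se, "b" + style)
    plt.xlabel("Sample size")
    plt.ylabel("Proportion")
    plt.title("Funnel plot (simulated data)")
    plt.show()

Since every unit here is drawn from the same underlying rate, any points falling outside the limits are just the false alarms the limits are designed to bound; with real data, such points would be the units worth investigating.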

Funnel plots are a very useful tool for measuring differences in performance. From the article Statistical Process Control Methods in Public Health Intelligence:

Methods based on ranking, such as league tables or percentiles, have a number of flaws. The main problem with ranking is the implicit assumption that apparent differences between organisations are the results of better or poorer performance. Simply because institutions may produce different values for an indicator, and we naturally tend to rank these values, does not mean that we are observing variation in performance. All systems within which institutions operate, no matter how stable, will produce variable outcomes.

The questions we need to answer are: ‘Is the observed variation more or less than we would normally expect?’; ‘Are there genuine outliers?’; ‘Are there exceptionally good performers?’; ‘What reasons might there be for excess variation’, and so on. Alternative methods [to ranking] based on understanding variation [such as funnel plots] may be more appropriate…

Fortunately, it’s fairly straightforward to make funnel plots: here’s how in Excel, Stata, and SAS.