The CUSUM chart is used to detect small and persistent changes, and is based on the cumulative sum of differences
between sampled measurements and the mean (hence CUSUM). The assumption is that, in the "in control" state, the CUSUM
hovers around the expected mean level, as deviations around the mean cancel each other out. In the "out of control"
state, there is a bias away from the expected mean, and the CUSUM drifts away from the expected mean level.
The CUSUM programs on this site follow the approach outlined in the textbook by Hawkins and Olwell (see references),
summarized as follows.
- The user defines the "in control" state. The central tendency and variance are defined according to the nature of the data.
For normally distributed measurements, these are the mean and the standard deviation.
- Using specific algorithms, the level of departure from the central tendency at which the "out of control"
alarm should be triggered is calculated. This level is abbreviated as h.
- h is calculated from the amount of departure (k) the system is designed to detect, and the sensitivity
of the detection, expressed as the average run length (ARL). The ARL is the expected number of observations
between false alarms when the situation is "in control". Conceptually this represents the probability of Type I
error (α): an ARL of 20 is equivalent to α=0.05, and an ARL of 100 to α=0.01.
- Once the chart and the ARL are defined, sampling takes place at regular intervals. The departure from the expected
value is corrected by k, then added to the CUSUM. If the CUSUM regresses to 0 or beyond, as it often does when the situation
is "in control", the CUSUM recommences at 0 (see the sketch after this list).
- Two CUSUMs can therefore be plotted in most cases, one for an excessive increase in value, and one for an excessive
decrease in value (two tails). In most quality control situations, however, only one of the tails is of interest.
- In the programs of this site, three levels of h are offered by default, for ARLs of 20 (α=0.05) for a yellow alert,
50 (α=0.02) for an orange alert, and 100 (α=0.01) for a red alert. The idea is that a yellow alert should trigger
heightened vigilance, an orange alert an investigation, and a red alert an immediate response. However,
these are merely recommendations, and users should define their own levels of sensitivity.
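To make the running calculation concrete, the following is a minimal Python sketch of a one sided (upward) CUSUM for normally distributed measurements. The function name and parameters are illustrative only, not those of the StatTools program.

```python
def upward_cusum(values, mean, k):
    """One sided (upward) CUSUM: the departure from the expected mean
    is corrected by k and added to the running sum, and the sum
    restarts at 0 whenever it regresses to 0 or below."""
    cusum, series = 0.0, []
    for x in values:
        cusum = max(0.0, cusum + (x - mean) - k)
        series.append(cusum)
    return series

# An alarm would be raised at the first point where the series
# crosses the decision interval h:
# alarm = any(c > h for c in upward_cusum(data, mean, k))
```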
Exponentially distributed measurements are data that follow the inverse of the Poisson distribution, and arise from
what is sometimes referred to as the Poisson process.
In the Poisson distribution, the number of events occurring over an interval (usually a period of time) has an average count (λ).
The inverse of this (β = 1 / λ) is the average interval (time) between events, which follows the exponential distribution.
Many measurements follow the exponential distribution, but the two most common types are time to event and ratios.
As time to event is both intuitively easy to understand and the most common type of data used in quality control,
this page will use time to event as the example to explain CUSUM for this type of data.
Time to event is the reverse of events per unit of time: λ events per time unit translates to an average of
β = 1/λ time units between events. Examples of time to event measurements are the time interval between successive
patient falls in an aged care facility, time to burn out in light bulbs, time required to run or swim a set distance, or
the time interval between soldiers being accidentally kicked to death by horses in the Prussian army.
Exponentially distributed measurements, such as time, can sometimes be treated as normally distributed measurements.
This is particularly so when the range of the data is narrow, and the central tendency is far away from the basic
value of 1. An example is maternal age, usually with a range of 20-35 years, which is usually handled statistically
as a normally distributed measurement. However, neonatal age, with a range from 1 to 30 days, has to be handled as an
exponentially distributed measurement.
CUSUM for Exponentially Distributed Measurements
A common model for CUSUM is that described in the CUSUM and Shewhart Charts for Poisson Distributed Counts Explained Page:
counts in a defined environment over a defined period of time.
Examples are the number of complaints per month in a hospital, the number of falls in an aged care facility per week,
or the number of adverse events in a surgical unit per year.
The disadvantage of using Poisson counts is that no information is available until the end of the defined period,
when the number of events is compared with the expected number.
The use of exponentially distributed measurements overcomes this disadvantage. The measurements are the number of
hours between complaints, or the number of days between falls or adverse events. Decision points occur whenever an event
occurs, rather than after a pre-determined period of time.
The symbol beta (β) is usually used to represent time to event. The in control measurement is usually represented by
β0, and the out of control measurement a particular CUSUM is designed to detect is represented by β1.
All calculations are based on the ratio to the in control measurement, regardless of whether the measurement is in
seconds, minutes or hours. The CUSUM therefore uses the value 1 for in control, and all time measurements are expressed
as the ratio measurement / β0.
The parameters of a CUSUM chart are therefore as follows
- The change to be detected is β = β1 / β0
- The adjustment constant is k = β ln(β) / (β - 1)
- Where β1 > β0, the CUSUM is designed to detect an upward trend, or an increase in
time to event intervals. Where β1 < β0, the CUSUM is designed to detect a
downward trend, or a decrease in time to event intervals. Given that the up and down sides are not symmetrical in
exponentially distributed data, a CUSUM is usually designed to detect changes on one side only (an increase or a decrease)
- If
- Ct is the t-th CUSUM value
- Ct-1 is the preceding CUSUM value
- Xt is the ratio between the time to event measurement and β0
- Then
- The upward CUSUM is Ct = maximum(0, Ct-1 + Xt - k)
- The downward CUSUM is Ct = minimum(0, Ct-1 + Xt - k)
- The CUSUM value is calculated whenever an event occurs and the time interval becomes available. This is plotted
on the CUSUM chart, where the x axis is the number of events since monitoring began, and the y axis is the CUSUM value.
- The decision interval h is the value that, when crossed by the CUSUM, should result in an alarm being raised.
The value h is calculated by iteration, using the adjustment constant (k) and the required level of sensitivity, the
Average Number of Observations to Signal (ANOS), which is comparable to the ARL of other CUSUMs. This is the average
number of events between alarms: the in control ANOS represents the probability of false alarms (Type I error), and the
out of control ANOS the speed of true alarms (sensitivity).
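These update rules translate directly into code. The following is a minimal Python sketch under the definitions above; the function names are illustrative, not those of the StatTools program. The later examples on this page refer back to this sketch.

```python
import math

def cusum_k(beta):
    """Adjustment constant k = beta * ln(beta) / (beta - 1),
    where beta = beta1 / beta0 is the change to be detected."""
    return beta * math.log(beta) / (beta - 1.0)

def exponential_cusum(times, beta0, beta, upward=True):
    """One sided CUSUM for time to event measurements.
    times  : observed time to event values
    beta0  : in control mean time to event
    beta   : beta1 / beta0, the change to be detected
    upward : True detects an increase in intervals (beta > 1),
             False a decrease (beta < 1)."""
    k = cusum_k(beta)
    bound = max if upward else min
    c, series = 0.0, []
    for t in times:
        x = t / beta0              # ratio to the in control mean
        c = bound(0.0, c + x - k)  # Ct = max/min(0, Ct-1 + Xt - k)
        series.append(c)
    return series
```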
The exponential distribution is asymmetrical, with a longer tail at higher values. Calculations for the ANOS and h values are
therefore complex. The memory requirements and the size of the numbers involved increase greatly at extreme values,
and the computer program needs to be adjusted to cope with this.
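The iterative calculation of h is beyond the scope of this page, but the meaning of the ANOS can be illustrated by simulation. The sketch below is a rough Monte Carlo estimate of the ANOS of a downward CUSUM for a given decision interval h (h < 0); it is an illustration only, not the integral equation method used to produce the published tables.

```python
import math, random

def estimate_anos(beta, h, true_beta=1.0, runs=5000, seed=1):
    """Monte Carlo estimate of the average number of events to signal
    for a downward exponential CUSUM with decision interval h < 0.
    true_beta is the actual mean in effect, as a ratio to the in
    control mean (1.0 = in control, so the result approximates the
    in control ANOS)."""
    rng = random.Random(seed)
    k = beta * math.log(beta) / (beta - 1.0)
    total = 0
    for _ in range(runs):
        c, n = 0.0, 0
        while c > h:                              # signal when CUSUM crosses h
            x = rng.expovariate(1.0 / true_beta)  # simulated time to event
            c = min(0.0, c + x - k)
            n += 1
        total += n
    return total / runs
```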
The results from calculations in StatTools, presented in the CUSUM for Exponentially Distributed Measurements Program Page,
have been checked against the charts produced by Gan (see references) and appear correct. However, rounding errors increase
in the program for β values less than 0.02, and for the lower ANOS values when β values are very high. h values for β between
0.99 and 1.01 have not been published, so they have not been tested; in any case these are unlikely values in quality control.
Gan and Choi presented algorithms for calculating the ANOS for two sided CUSUM charts, but these are difficult to
convert into estimating two decision lines based on a single ANOS, as there are infinitely many combinations of the
two decision values for each ANOS. The two tailed model is therefore not provided by StatTools.
Example 1
We will use the default example in the CUSUM for Exponentially Distributed Measurements Program Page. The data was computer
generated to demonstrate the methodology and is not real. We are monitoring the quality of light bulbs produced by our factory:
we sample them regularly and test how many hours of use pass before a light bulb burns out. We expect the standard, when
everything is in control, to average 200 hours of life (β0 = 200), and we would consider a reduction to
125 hours (β1 = 125) to represent the out of control situation. In other words, we
want to know when the life of our light bulbs is reduced to 62.5% of that expected
(β = β1/β0 = 125/200 = 0.625).
We calculate the adjustment constant k, where k = β ln(β) / (β - 1)
= 0.625 × ln(0.625) / (0.625 - 1) = 0.625 × -0.470 / -0.375 = 0.78
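As a quick check, this matches the cusum_k sketch shown earlier:

```python
import math
beta = 125 / 200                        # beta1 / beta0 = 0.625
k = beta * math.log(beta) / (beta - 1)  # 0.7833..., rounded to 0.78
```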
The common convention is to nominate a decision interval (h) that provides the in and out of control ANOS that
suit the user. The alternative, used in these pages, is to nominate the in control ANOS, calculate the decision interval,
then calculate the out of control ANOS of that decision interval. Also, given the bio-medical background of this web site
(even though the example is about light bulbs), three different alert levels are provided.
The example nominates the following
- Yellow alert (take notice but no action): ANOS nominated (average number of events before a false alert) = 50,
decision value (h) = -2.78, and an alarm will be raised at an average of 14 events after the situation
becomes out of control.
- Similarly, the orange alert (requires investigation) has values of 100, -3.67 and 19; the red alert
(requires immediate response) 200, -4.65 and 25.
The data in the default example is as follows.
209,168,130,197,171,220,242,183,169,208,92,164,195,152,183,115,
139,181,158,153,114,153,145,110,94,153,192,171,133,106,192,144,
82,110,183,186,35,146,90,93,95,190,81,152,158,150,117,116,175,103
The computer generated the first 10 values with an average of 190, and the remaining 40 with an average of 138.
The figure to the right is the CUSUM plot. Starting with a CUSUM value of 0,
- the first light bulb had a life of 209 hours. Ratio = 209/200 = 1.05,
C1 = minimum(0, 0 + 1.05 - 0.78) = minimum(0, 0.27) = 0
- the second measurement, a life of 168 hours, ratio = 168/200 = 0.84,
C2 = minimum(0, 0 + 0.84 - 0.78) = minimum(0, 0.06) = 0
- the third measurement, a life of 130 hours, ratio = 130/200 = 0.65,
C3 = minimum(0, 0 + 0.65 - 0.78) = minimum(0, -0.13) = -0.13
- the fourth measurement, a life of 197 hours, ratio = 197/200 = 0.99,
C4 = minimum(0, -0.13 + 0.99 - 0.78) = minimum(0, 0.08) = 0
From the 11th sample onwards, consistently lower values resulted in a progressive decline of the CUSUM values, until
the 41st sample, when it crossed the yellow alert line, and the 50th sample, when it crossed the orange alert line.
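For readers who want to reproduce the walk-through, the series and crossing points can be computed with the exponential_cusum sketch given earlier (an illustration, not the StatTools program itself), using the h values from the alert list above. The printed crossing points should match the walk-through.

```python
data = [209, 168, 130, 197, 171, 220, 242, 183, 169, 208, 92, 164,
        195, 152, 183, 115, 139, 181, 158, 153, 114, 153, 145, 110,
        94, 153, 192, 171, 133, 106, 192, 144, 82, 110, 183, 186,
        35, 146, 90, 93, 95, 190, 81, 152, 158, 150, 117, 116, 175, 103]

series = exponential_cusum(data, beta0=200, beta=0.625, upward=False)
for level, h in [("yellow", -2.78), ("orange", -3.67), ("red", -4.65)]:
    n = next((i + 1 for i, c in enumerate(series) if c <= h), None)
    print(level, "alert first crossed at sample", n)
```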
Example 2
Similar to Example 1, the data here is computer generated to demonstrate the calculations
and does not reflect any reality.
We are running an airline, and wish to maintain efficiency in aircraft turn-around time between landing
and take off. We expect the standard, when everything is in control, to be an average turn-around time of 120 minutes
(β0 = 120), and we would consider an increase to 180 minutes (β1 = 180)
to represent the out of control situation. In other words, we want to know when the average turn-around time increases
to 1.5 times that expected (β = β1/β0 = 180/120 = 1.5).
We calculate the adjustment constant k, where k = β ln(β) / (β - 1) = 1.5 × ln(1.5) / (1.5 - 1)
= 1.5 × 0.405 / 0.5 = 1.22
We nominate the following parameters for our monitoring
- Yellow alert: ANOS nominated (average number of flights before a false alert) = 50 flights,
decision value (h) = 3.95, and an alarm will be raised at an average of 12 flights after the situation
becomes out of control.
- Similarly, the orange alert has values of 100, 5.43 and 17; the red alert 200, 7.09 and 22.
The data, the turn-around time in minutes for successive flights, is as follows.
147,196,214,197,62,179,146,171,46,223,174,231,192,126,234,97,192,
256,145,136,120,152,193,215,149,118,160,176,162,126,157,213,138,
211,282,153,86,256,93,274
The computer generated the first 10 values with an average of 158, and the remaining 30 with an average of 174.
The figure to the right is the CUSUM plot. Starting with a CUSUM value of 0,
- The first flight had a turn-around time of 147 minutes. Ratio = 147/120 = 1.23,
C1 = maximum(0, 0 + 1.23 - 1.22) = maximum(0, 0.01) = 0.01
- The second measurement, a turn-around time of 196 minutes, ratio = 196/120 = 1.63,
C2 = maximum(0, 0.01 + 1.63 - 1.22) = maximum(0, 0.42) = 0.42
- The CUSUM continued to increase because of long turn-around times, but the fifth and ninth flights had very short
turn-around times, so the CUSUM returned towards the baseline.
From the 11th flight onwards, consistently longer turn-around times resulted in a progressive increase of the CUSUM values,
until the 24th flight, when it crossed the yellow alert line, the 34th flight, when it crossed the orange alert line,
and the 38th flight, when it crossed the red alert line.
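As with Example 1, this walk-through can be reproduced with the same exponential_cusum sketch, this time with the upward option:

```python
data = [147, 196, 214, 197, 62, 179, 146, 171, 46, 223, 174, 231,
        192, 126, 234, 97, 192, 256, 145, 136, 120, 152, 193, 215,
        149, 118, 160, 176, 162, 126, 157, 213, 138, 211, 282, 153,
        86, 256, 93, 274]

series = exponential_cusum(data, beta0=120, beta=1.5, upward=True)
for level, h in [("yellow", 3.95), ("orange", 5.43), ("red", 7.09)]:
    n = next((i + 1 for i, c in enumerate(series) if c >= h), None)
    print(level, "alert first crossed at flight", n)
```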
References
Hawkins DM, Olwell DH (1997) Cumulative Sum Charts and Charting for Quality Improvement. Springer-Verlag, New York.
ISBN 0-387-98365-1. pp. 47-74
Gan FF (1994) Design of Optimal Exponential CUSUM Charts. Journal of Quality Technology 26:2 pp. 109-124. Program
code in Fortran available at http://lib.stat.cmu.edu/jqt/26-2
Gan FF, Choi KP (1994) Computing Average Run Lengths for Exponential CUSUM Schemes. Journal of Quality Technology
26:2 pp. 134-143