Introduction
CUSUM is a set of statistical procedures used in quality control. CUSUM stands for Cumulative Sum of Deviations.
In any ongoing process, be it the manufacture or delivery of products and services, once the process is established and running, the outcome should be stable and within defined limits near a benchmark. The situation is then said to be In Control.
When things go wrong, the outcomes depart from the defined benchmark. The situation is then said to be Out of Control.
In some cases, things go catastrophically wrong, and the outcomes depart from the benchmark in a dramatic and obvious manner, so that investigation and remedy follow. For example, a gear in an engine may fracture, causing the machine to seize. An example in health care is the employment of an unqualified fraud as a surgeon, followed by a sudden and massive increase in mortality and morbidity.
The detection of catastrophic departure from the benchmark is usually by the Shewhart Chart, not covered on this site. Usually, some statistically improbable outcome, such as two consecutive measurements outside 3 Standard Deviations, or 3 consecutive measurements outside 2 Standard Deviations, is used to trigger an alarm that all is not well.
In many instances, however, the departures from the outcome benchmark are gradual and small in scale, and these are difficult to detect. Examples are changes in the size and shape of products caused by progressive wearing out of machinery parts, reduced success rates over time as experienced staff are gradually replaced by novices in a work team, or increases in client complaints to a service department following a loss of adequate supervision.
CUSUM is a statistical process of sampling outcomes and summing departures from benchmarks. When the situation is in control, departures caused by random variations cancel each other numerically. In the out of control situation, departures from the benchmark tend to be unidirectional, so the sum of departures accumulates until it becomes statistically identifiable.
All CUSUM methods conform to the same conceptual model, although how the coefficients and parameters are calculated depends on the nature of the numbers being used. This section briefly describes this conceptual model. Greater detail is provided in the sections for each probability distribution.
- Background constants
  - Model constants are parameters that are assumed to remain the same throughout the process.
    - For Normally distributed means, the in control Standard Error is expected to remain the same even when the situation is out of control. If the CUSUM is calculated using single measurements, the standard error is the same as the standard deviation. If it uses means of multiple (n) samples, then the standard error is the standard deviation / sqrt(n)
    - For Normally distributed Standard Deviations, the same sample size used to calculate each Standard Deviation is used throughout the exercise
    - For Binomially distributed proportions, the same sample size is used to estimate the proportion throughout. For Bernoulli distributed proportions, the sample size is 1
    - For Negative Binomial proportions, the number of positive cases referenced is determined from the in control proportion and sample size; this number of positives remains the same throughout
    - For the Inverse Gaussian distribution, the λ value of the data is expected to remain the same
  - Benchmark values are either defined by the quality controller, or taken as the constant value produced when the situation is in control
    - For Normally distributed means and Inverse Gaussian distributed means, the in control mean value
    - For Normally distributed Standard Deviations, the in control Standard Deviation
    - For Poisson distributed counts, the in control count
    - For both Binomially and Bernoulli distributed proportions, and for Negative Binomial distributed counts, the in control proportion
  - Average Run Length (ARL) is the expected average number of consecutive samples required to produce one false positive alarm while the situation is still in control. It can be considered equivalent to the false positive rate, or the probability of a Type I Error. For example, ARL=100 corresponds to a 1% False Positive Rate, or p<=0.01. The ARL defines the sensitivity of the CUSUM process being planned.
    In the manufacturing industry, with thousands of items produced every hour, the ARL is often set to thousands. In health care, depending on the competing priorities of risks and costs, the ARL can be set from 20 (FPR=5%) to 1000 (FPR=0.1%). Often this depends on the amount of data being monitored, and the ARL is set so that a false alarm occurs only once a month to once a year.
- Statistical Parameters: two statistical parameters are calculated using the background constants
  - The Reference Value k is calculated from the in control and out of control values. The mathematics differs according to the probability distribution of the numbers being used. The reference value is used to modify the departure from the benchmark before it is used to create the CUSUM
  - The Decision Interval h is calculated using k and the ARL. The mathematics varies according to the probability distribution of the numbers being used. During monitoring, an alarm is triggered when the CUSUM value crosses the decision interval
- Measurement of Departure from Benchmark is represented by the symbol X, the comparison between the current measurement and the expected value if the situation is in control. In nearly all cases, the departure is the difference (current value - expected in control value). In CUSUM for Normally distributed variance, however, it is the ratio of variances ((current SD)^2 / (expected SD)^2)
- Winsorization is a statistical process whereby unexpected outliers with extreme values are modified before they are used to calculate the CUSUM. The algorithm for winsorization provided by Hawkins is not included in the StatTools programs, and users will need to modify extreme outlier values manually if they choose to do so
- Calculation and Plotting of CUSUM: during auditing, the value of the CUSUM is updated and plotted with each sampling, with the following conventions (a code sketch of this loop follows below)
  - The value of the CUSUM at the start can be
    - zero (0), assuming that all is well and there is no departure
    - a proportion of the decision interval h, usually half (h/2). The argument is that, if things are already out of control, the CUSUM will reach the decision interval sooner; if things are in control, the CUSUM will drift towards the baseline value of 0
  - With each new measurement, the departure (X) is calculated, and the CUSUM is amended using the formula CUSUM(t) = CUSUM(t-1) + X - k
  - If the CUSUM crosses the 0 value, it is reset to 0
  - If the CUSUM crosses the decision interval (h), an alarm is triggered, and the auditor then has two options
    - In a real, live audit situation, the CUSUM ends. Investigations and remedies are applied, and a new CUSUM then starts
    - In a research situation, the CUSUM continues without resetting, showing the movements of the CUSUM throughout the data space
The diagram to the left demonstrates the essential features of the CUSUM plot
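The loop described above is the same for every distribution; only X, k and h change. Below is a minimal sketch of this generic one-sided CUSUM loop in Python (the StatTools programs themselves are written in PHP; the function name here is illustrative, and k and h must come from the distribution-specific calculations described in the panels below).

```python
def run_cusum(departures, k, h, start_at_half_h=True):
    """Generic one-sided (upward) CUSUM loop.

    departures -- the departures X from the benchmark, one per sample
    k          -- reference value (distribution specific)
    h          -- decision interval (distribution specific)
    """
    c = h / 2 if start_at_half_h else 0.0  # starting value: h/2 or 0
    path, alarms = [], []
    for t, x in enumerate(departures, 1):
        c = c + x - k         # CUSUM(t) = CUSUM(t-1) + X - k
        if c < 0:
            c = 0.0           # crossing 0 defaults to 0
        path.append(c)
        if c >= h:            # crossing the decision interval triggers an alarm
            alarms.append(t)
            c = h / 2         # restart (a live audit would stop and investigate;
                              # research use would continue without resetting)
    return path, alarms
```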
Computer programs
The CUSUM programs in StatTools are written in PHP. The programs are adapted from the following sources:
- CRAN, the statistical resource archive for the R language, contains R and FORTRAN code for Hawkins' general programs for calculating k and h for Normally distributed location and variation, Inverse Gaussian, Binomial and Negative Binomial, and Poisson distributed data, as described in Hawkins DM (1992) Evaluation of average run lengths of cumulative sum charts for an arbitrary data distribution. Communications in Statistics - Simulation and Computation 21:4 p. 1001-1020.
  This code was converted to PHP for use in the StatTools programs. The k and h calculations were checked against the GetAnyH program downloaded from Hawkins and Olwell's web site http://www.stat.umn.edu/cusum/software.htm, and the StatTools results were found to be correct.
- Hawkins did not provide an algorithm for CUSUM when the data are proportions based on the Bernoulli distribution. The StatTools program was constructed following the algorithm described in Reynolds Jr. MR and Stoumbos ZG (1999) A CUSUM Chart for Monitoring a Proportion when Inspecting Continuously. Journal of Quality Technology 31:1 p. 87-108. No worked examples were provided in that paper, so the results of computation could not be checked for accuracy; however, the results appear plausible and reflect changes in proportions.
- Hawkins provided a CUSUM algorithm for the Inverse Gaussian distribution, which measures time to completion in individual cases, but not for the Exponential distribution. Gan provided a CUSUM algorithm for the Exponential distribution, which concerns the time between occurrences of an event (the Inverse Poisson distribution). The StatTools program was constructed following the algorithm described in Gan FF and Choi KP (1994) Computing Average Run Lengths for Exponential CUSUM Schemes. Journal of Quality Technology 26:2 p. 134-143. The results of calculations were compared with graphs published in that paper and found to be similar (close but not exact, as calculated numbers were compared with lines on a graph).
One Tail Tests
All CUSUM programs in StatTools use the one-tailed model to calculate k and h, so the CUSUM plots are always one sided, detecting either an increase or a decrease in measurements when the situation is out of control.
This means that, should users require CUSUM monitoring for departures from the benchmark in both directions, two CUSUM charts need to be constructed, one for an upward shift and another for a downward shift.
Different models of CUSUM are available in StatTools, depending on the nature of the data. On this page, a panel is provided to describe each of these models, but here is a summary
- There are 2 models using data that are Normally distributed
- CUSUM for means monitors changes to the mean value
- CUSUM for Standard Deviations monitors changes in variability of the data
- There are 2 models using data that are Proportions
- CUSUM for proportions (Bernoulli distribution) monitors changes to proportions of 1s using the
frequencies of 0/1 values
- CUSUM for proportions (Binomial distribution) monitors changes to proportions using the number of 1s in
sets of defined sample size
- There is 1 model using data that are Poisson distributed counts of events
- There are 2 models using data that have a positively skewed distribution, usually with a long tail to the right
  - CUSUM for Inverse Gaussian monitors changes to harmonic means. It is a useful method when the data are related to time as a measurement of duration, such as the time to run 100 m, the time to burn-out for light bulbs, the duration of labour, and so on.
    Ratios also have Inverse Gaussian distributions. As nearly all scalar measurements are ratios to a standard, it can be said that all measurements are Inverse Gaussian. When the range of interest and the variance are small and not close to zero, the λ value is large and the measurements are approximately Normal. If the variance is large compared to the range of interest, or the values are close to 0, then λ is small and the distribution is skewed. For example, body weight in adults is accepted as Normally distributed, while the mass of a recently fertilized embryo is Inverse Gaussian. Similarly, human height is accepted as Normally distributed, but distances between stars are Inverse Gaussian. In obstetrics, maternal age is considered Normally distributed, but the duration of the first stage of labour Inverse Gaussian, even though both are measurements of duration.
  - CUSUM for the Exponential distribution (also called the Inverse Poisson distribution) monitors changes to the Poisson process.
    In the Poisson distribution, the number of events occurring over an interval (usually a period of time) is the averaged count (λ). The inverse of this (β = 1 / λ) is the average interval (time) between events, which follows the Exponential distribution.
    Please note that this distribution is for the time between events, different from the Inverse Gaussian, which is for time as a measurement of duration, the time to complete a process in an individual.
    The Exponential distribution does not apply only to time, but to other environmental measurements as well. For example, the number of cells in a set volume of fluid follows the Poisson distribution, so the volume of fluid required to contain 1 cell follows the Exponential distribution.
    Other examples are the time interval between successive patient falls in an aged care facility, the time between traffic accidents at an intersection, and the time between births of abnormal babies in a particular population.
This panel explains and supports the calculations from CUSUMMean_Pgm.php. This is CUSUM for Normally distributed means (location), which is probably the most commonly used CUSUM model in quality assessment
- The first reason is that most measurements are Normally distributed, or approximately so
- The second reason is that data from many other distributions can be transformed to an approximately Normal distribution, which can then take advantage of this model. Numerical transformations are explained in Transformation_Exp.php, which also provides a link to algorithms for transformations.
- Finally, the Normal distribution mean model is visually simple and conceptually easy to understand.
The default parameters and data are an example using computer generated data. They represent a CUSUM process set up to monitor the amount of sugar being placed in packages for sale by an automated dispenser
- Mean (in control) is the benchmark, the mean value expected when the system is in control.
In our example,
we expect that each packet contains 100g of sugar if the process is in control and working properly
- Standard Error (SE) in control is the expected Standard Error when the situation is in control, and it is assumed to remain the same even when the system is out of control. Please note that the term Standard Error is used because the CUSUM sometimes uses the mean of a number of samples as input, so the Standard Deviation (SD) needs to be adjusted as SE = SD / sqrt(n). In most cases individual samples are monitored, so n=1 and SE = SD.
  In our example, we use individual samples for our CUSUM, and we expect the Standard Deviation (and therefore the Standard Error) to be 10g
- Mean (out of control) is the mean value when the system is out of control. It defines the departure from the benchmark that the CUSUM is designed to detect.
  In our example, if the dispenser wears out, we expect it to gradually increase the amount it dispenses. We would like to be notified if the dispenser regularly dispenses 102g or more, an increase of 2g, or one fifth of a Standard Error
- From the in control and out of control parameters, the reference value is calculated. In our example, k = 1 (for Normally distributed means, k lies half way between the in control and out of control departures)
- Average Run Length (ARL) is the expected average number of samples required to trigger one false alarm when the situation is actually in control. Statistically it represents the False Positive Rate, or α, the probability of a Type I Error.
  In our example, we set our ARL to 100, FPR=0.01, α=0.01
- From k and ARL, the decision interval is calculated to be 63.5
- The data represent the weights of the sugar packets as they are dispensed. These are computer generated random numbers with a Standard Deviation of 10; the first 30 values have a mean of 100, and the rest a mean of 102
The resulting CUSUM plot is as shown to the left. Please note the following
- The CUSUM starts at a value that is half of the decision interval
- With each new measurement, the departure X is calculated, and the CUSUM is amended so that C(t) = C(t-1) + X - k
- When the CUSUM crosses the value 0, it defaults to 0
- When the CUSUM reaches or exceeds the decision interval (h), an alarm is triggered, and the CUSUM restarts at h/2
- CUSUM values remain under the decision interval h while the process is in control, then increase after the shift in mean and repeatedly exceed the decision interval
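A sketch of this panel's monitoring loop follows, using the parameters above (in control mean 100g, SD 10g, k = 1, h = 63.5). The weights are simulated with numpy here, so alarm positions will differ from the program's example data; the 50/50 split between in control and shifted values is an assumption for illustration.

```python
import numpy as np

mu0, k, h = 100.0, 1.0, 63.5                 # example parameters from this panel

rng = np.random.default_rng(1)
weights = np.concatenate([rng.normal(100.0, 10.0, 50),   # in control
                          rng.normal(102.0, 10.0, 50)])  # shifted up by 2g

c = h / 2                                    # start at half the decision interval
for t, w in enumerate(weights, 1):
    x = w - mu0                              # departure X = measurement - benchmark
    c = max(0.0, c + x - k)                  # C(t) = C(t-1) + X - k, floored at 0
    if c >= h:
        print(f"alarm at sample {t}")
        c = h / 2                            # restart after the alarm
```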
This panel explains and supports the calculations from CUSUMSD_Pgm.php. This is a CUSUM for detecting changes in variability, expressed in terms of the Standard Deviation. Although it is not as commonly carried out as CUSUM for means, it is nevertheless important
- Firstly, one of the earliest changes when a system becomes out of control is that the output becomes more variable. When this happens, the Standard Deviation of the measurements may well increase before the mean value of the output does.
- Secondly, a change in the variability of the output usually alters the mean values as well. When a change in mean is detected, it is important to check the Standard Deviation, so that the faults in the system can be better identified.
The mathematics actually involves the ratio of variances, X = (current SD)^2 / (in control SD)^2, rather than a difference from the in control Standard Deviation. However, the parameters and the data are usually stated in terms of the Standard Deviation. The default parameters and data are an example using computer generated data. They represent a CUSUM process set up to monitor the amount of sugar being placed in packages for sale by an automated dispenser. Instead of monitoring the mean values, however, this CUSUM monitors the variability in terms of the Standard Deviation
- Standard Deviation (in control) is the benchmark, the Standard Deviation of the measurement expected when the system is in control.
  In our example, we expect that the Standard Deviation is 10g if the process is in control and working properly
- Standard Deviation (out of control) defines the departure from the in control value that the CUSUM is designed to detect.
  In our example, the CUSUM is designed to detect an increase to 10.5 (0.5/10 = a 5% increase in the in control Standard Deviation)
- Sample size (m) is the sample size that will be used to estimate the Standard Deviation as monitoring proceeds.
  In our example this is 10. This means that the Standard Deviation will be calculated on every 10 cases, and this is used for the CUSUM
- From in control and out of control parameters, and the sample size, the reference value k is calculated. In
our example, k = 1.05
- Average Run Length (ARL) is the expected average number of samples required to trigger one false alarm when the situation is actually in control. Statistically it represents the False Positive Rate, or α, the probability of a Type I Error.
  In our example, we set our ARL to 100, FPR=0.01, α=0.01
- From k and ARL, the decision interval is calculated to be 3.05
- The data represent the Standard Deviations of the sugar weights, each Standard Deviation calculated from 10 consecutive measurements. The 100 values therefore represent the data from 1000 samples of sugar. The data are computer generated random numbers, the first 50 centred around 10 and the rest around 10.5
The resulting CUSUM plot is as shown to the left. Please note the following
- The CUSUM starts at a value that is half of the decision interval
- When the CUSUM is less than 0, it defaults to 0
- When the CUSUM reaches or exceeds the decision interval (h), an alarm is triggered, and the CUSUM restarts at h/2
- CUSUM values remain under the decision interval h in the first 50 values, then increase in the second 50 and repeatedly exceed the decision interval
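A sketch of this panel follows, assuming the parameters above (in control SD 10, m = 10, k = 1.05, h = 3.05) and using the variance ratio as the departure, as described; the exact StatTools calculation may differ in detail, and the groups are simulated here for illustration.

```python
import numpy as np

sd0, m, k, h = 10.0, 10, 1.05, 3.05          # example parameters from this panel

rng = np.random.default_rng(2)
groups = [rng.normal(100.0, 10.0, m) for _ in range(50)] + \
         [rng.normal(100.0, 10.5, m) for _ in range(50)]
sds = [g.std(ddof=1) for g in groups]        # one SD per group of m = 10 cases

c = h / 2
for t, s in enumerate(sds, 1):
    x = s**2 / sd0**2                        # departure is the variance ratio
    c = max(0.0, c + x - k)
    if c >= h:
        print(f"alarm at group {t}")
        c = h / 2
```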
Two CUSUM procedures related to changes in proportions are provided by StatTools, the Binomial distribution and the Bernoulli distribution. This panel discusses the Binomial distribution, and supports the calculations in
CUSUMBinomial_Pgm.php.
The Binomial distribution examines the probability of having the number of positive cases (N+) in a group with sample size m.
The sensitivity and the stability of the CUSUM depend on the sample size m: the smaller the group, the more sensitive and the less stable it is.
The default and example is an audit of Caesarean Section in an obstetric unit
- The Proportion in control p0 is our benchmark. We expect that 20% (0.2) of our deliveries would be
by Caesarean Section
- The Proportion out of control p1 is 25% (0.25). We design the CUSUM to detect an increase from 20% to 25%
- The CUSUM is carried out by counting the number of positive cases (N+) in each group. This number is a function of the proportion and the sample size. In our example, we examine data in groups of 10 (m=10), so the expected number of positives is
- When in control, N+ = p0 * m = 0.2 * 10 = 2
- When out of control, N+ = p1 * m = 0.25 * 10 = 2.5
- The Average Run Length is the average number of groups observed before an alarm is triggered even though the situation is in control. This is equivalent to the False Positive Rate. In our example we assigned ARL=100, a false positive rate of 1%, or α=0.01
- From the proportions in and out of control, the group sample size, and the ARL, two parameters are calculated. From our example parameters, the Reference Value k = 2, and the Decision Interval h = 11.5
- The data is a single column, each row containing the number of positives (N+) in a group of size m (in our example m=10). The data in our example are 20 randomly generated integers, the first 10 centred around 2 and the second 10 centred around 2.5. These data represent a total of 200 deliveries, divided into 20 consecutive groups of 10, and each number represents the number of Caesarean Sections in that group.
- The CUSUM starts with a value of half the decision interval (h/2). With each new group
  - If the CUSUM crosses the 0 value, it is reset to 0
  - If the CUSUM crosses the Decision Interval h, the alarm is triggered, and the CUSUM starts again with the value of h/2
The resulting CUSUM plot is as shown to the left. Please note the following
- The CUSUM starts at a value that is half of the decision interval
- With each new group, where the number of positives is N+, the CUSUM is amended so that C(t) = C(t-1) + N+ - k
- When the CUSUM crosses the value 0, it defaults to 0
- When the CUSUM reaches or exceeds the decision interval (h), an alarm is triggered, and the CUSUM restarts at h/2
- CUSUM values remain under the decision interval h in the first 10 values (groups), then increase in the second 10 and eventually exceed the decision interval
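A sketch of this panel follows, assuming the parameters above (m = 10, p0 = 0.2, p1 = 0.25, k = 2, h = 11.5); the group counts are simulated with numpy for illustration, so alarm positions will differ from the program's example data.

```python
import numpy as np

m, k, h = 10, 2.0, 11.5                      # example parameters from this panel

rng = np.random.default_rng(3)
n_pos = np.concatenate([rng.binomial(m, 0.20, 10),   # 10 in control groups
                        rng.binomial(m, 0.25, 10)])  # 10 out of control groups

c = h / 2
for t, npos in enumerate(n_pos, 1):
    c = max(0.0, c + npos - k)               # C(t) = C(t-1) + N+ - k
    if c >= h:
        print(f"alarm at group {t}")
        c = h / 2
```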
Two CUSUM procedures related to changes in proportions are provided by StatTools, the Binomial distribution and the Bernoulli distribution. This panel discusses the Bernoulli distribution, and supports the calculations in
CUSUMBernoulli_Pgm.php.
The Bernoulli distribution is a special case of the Binomial Distribution. It examines the probability of each individual case having a value of true (1, +) or false (-, 0).
The advantage of using the Bernoulli distribution for CUSUM is that the CUSUM value can be updated with every case, based on whether the case is true or false. It is therefore more responsive to changes, as it does not have to wait for the collection of a group of cases before a proportion can be calculated.
The disadvantage of doing CUSUM using the Bernoulli distribution is that the model is highly sensitive to any change, so short term variations may cause marked changes to the CUSUM and trigger false alarms. An example is monitoring adverse surgical outcomes when most of the dangerous operations are carried out on a particular day by a senior surgeon, so that adverse outcomes peak one day a week rather than being averaged over the whole week, causing a false alarm to be triggered
The default and example is an audit of Caesarean Section in an obstetric unit
- The Proportion in control is our benchmark. We expect that 20% (0.2) of our deliveries would be
by Caesarean Section
- The Proportion out of control is 25% (0.25). We design the CUSUM to detect an increase from 20% to 25%
- The Average Number of Observations to Signal (ANOS) is similar in concept to the Average Run Length (ARL) in other CUSUMs, and is the average number of observations before an alarm is triggered even though the situation is in control. This is equivalent to the False Positive Rate. In our example we assigned ANOS=100, a false positive rate of 1%, or α=0.01
- From the proportions in and out of control, and the ANOS, two parameters are calculated
  - The Reference Value, γB, is equivalent to k in other CUSUM programs, except that it is used differently to construct the CUSUM. From our example, γB=0.2243
  - The Decision Interval, h, is similar to that of other CUSUM models, and is used to trigger an alarm
- The data is a single column of 0s and 1s, representing false/true, -/+, no/yes. The data in our example are 60 randomly generated values, the first 30 with a 20% frequency of 1s, and the second 30 with 25% 1s.
- The CUSUM starts with a value of half the decision interval (h/2). With each new case
  - If the case is 1 (true, +, yes), CUSUM(t) = CUSUM(t-1) + 1/γB - 1
  - If the case is 0 (false, -, no), CUSUM(t) = CUSUM(t-1) - 1
  - If the CUSUM crosses the 0 value, it is reset to 0
  - If the CUSUM crosses the Decision Interval h, the alarm is triggered, and the CUSUM starts again with the value of h/2
The resulting CUSUM plot is as shown to the left. Please note the following
- The CUSUM starts at the value h/2
- When the CUSUM is less than 0, it defaults to 0
- When the CUSUM reaches or exceeds the decision interval (h), an alarm is triggered, and the CUSUM restarts at h/2
- The CUSUM has a saw tooth appearance, as it either increases with a 1 or decreases with a 0 with every case
- CUSUM values increase in the second 30 cases and eventually exceed the decision interval
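The reference value γB can be obtained from the likelihood-ratio formulation of the Bernoulli CUSUM; the formula below reproduces the γB = 0.2243 quoted above for p0 = 0.2 and p1 = 0.25. This is a sketch assuming that formulation (following Reynolds and Stoumbos) rather than the StatTools implementation itself, and since the decision interval is not stated numerically in this panel, the h used here is purely illustrative.

```python
import math
import random

p0, p1 = 0.20, 0.25
h = 5.0                                      # illustrative only; not given in this panel

# gamma_B from the likelihood ratio; evaluates to 0.2243 for these proportions
gamma_b = math.log((1 - p0) / (1 - p1)) / \
          math.log(p1 * (1 - p0) / (p0 * (1 - p1)))

random.seed(4)
cases = [1 if random.random() < p0 else 0 for _ in range(30)] + \
        [1 if random.random() < p1 else 0 for _ in range(30)]

c = h / 2
for t, x in enumerate(cases, 1):
    c += (1 / gamma_b - 1) if x == 1 else -1.0   # saw tooth: up on 1, down on 0
    c = max(0.0, c)
    if c >= h:
        print(f"alarm at case {t}")
        c = h / 2
```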
The Poisson distribution concerns events occurring within a defined environment, such as the number of cells in a volume of fluid, or the number of tadpoles in a pond.
The most common environment, however, is time, so most counts are in terms of a unit of time. In health care, commonly used counts are the number of complaints received by a hospital in a month, the number of falls in an aged care facility per month, the number of adverse incidents in an Intensive Care Unit per week, and so on.
The classical example is von Bortkiewicz's data on the number of soldiers in the Prussian army who were accidentally kicked to death by horses.
When monitoring Poisson distributed counts, it is important that the environment is clearly defined and remains constant throughout. For this reason, evaluations can only take place at set intervals, as the intervals must be long enough for some events to occur.
Consequently, the newer method in the CUSUM for Exponentially Distributed Data Explained Page, which uses the interval between events (the inverse of the Poisson count), is increasingly preferred, as evaluation can take place after each event.
This panel, however, supports the algorithm of CUSUM for Poisson distributed counts, in the program CUSUM and Shewhart Charts for Poisson Distributed Counts Program Page.
The default and example is an audit of the number of complaints per month from patients and relatives in a hospital
- The Mean Count in control λ0 is our benchmark, derived from experience. We expect 3 complaints a month. However, if supervision and care deteriorate, complaints may increase
- The Mean Count out of control λ1 is set at 5. Our CUSUM is designed to trigger an alarm if the number of complaints increases to 5 per month or beyond
- The Average Run Length is the average number of episodes, in this example the number of months, observed before an alarm is triggered even though the situation is in control. This is equivalent to the False Positive Rate. In our example we assigned ARL=100, a false positive rate of 1%, or α=0.01
- From the mean counts in and out of control, and the ARL, the Reference Value k = 4 and the Decision Interval h = 5.6 are calculated
- The data is a single column, each row containing the number of events in the period. The data in our example are 20 randomly generated integers, the first 10 centred around 2 and the second 10 centred around 5. These represent the number of complaints per month over 20 months of monitoring
- The CUSUM starts with a value of half the decision interval (h/2). With each new month
  - If the CUSUM crosses the 0 value, it is reset to 0
  - If the CUSUM crosses the Decision Interval h, the alarm is triggered, and the CUSUM starts again with the value of h/2
The resulting CUSUM plot is as shown to the left. Please note the following
- The CUSUM starts at a value that is half of the decision interval
- With each month, the CUSUM is amended so that C(t) = C(t-1) + count - k
- When the CUSUM crosses the value 0, it defaults to 0
- When the CUSUM reaches or exceeds the decision interval (h), an alarm is triggered, and the CUSUM restarts at h/2
- CUSUM values remain under the decision interval h in the first 10 values (months), then increase in the second 10 and eventually exceed the decision interval. An alarm is triggered and the CUSUM starts again, reaching the decision interval again, and a second alarm is triggered
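A sketch of this panel follows, assuming the parameters above (λ0 = 3, λ1 = 5, k = 4, h = 5.6); the monthly counts are simulated with numpy for illustration.

```python
import numpy as np

k, h = 4.0, 5.6                              # example parameters from this panel

rng = np.random.default_rng(5)
counts = np.concatenate([rng.poisson(3, 10),    # 10 in control months
                         rng.poisson(5, 10)])   # 10 out of control months

c = h / 2
for month, n in enumerate(counts, 1):
    c = max(0.0, c + n - k)                  # C(t) = C(t-1) + count - k
    if c >= h:
        print(f"alarm in month {month}")
        c = h / 2
```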
This panel supports the algorithm of CUSUM for Inverse Gaussian distributed measurements, in the program CUSUM for Inverse Gaussian Distribution.
The following paragraph is a summary of the description of the Inverse Gaussian distribution provided by Wikipedia.
Formally, the distribution is the probability distribution of the harmonic mean, 1/x, when x is Normally distributed. The distribution is skewed with a long tail to the right, and its characteristics are defined by its mean, μ, and its level of skewness, λ. Both μ and λ are greater than 0. Skewness is most extreme with low values of λ, and the distribution becomes Normal (Gaussian) when λ=∞.
λ is difficult to compute formally, but an approximation can be obtained by rearranging the skewness formula
skewness = 3 * sqrt(μ / λ)
μ / λ = (skewness / 3)^2
λ = μ / (skewness / 3)^2
For any set of data, the mean (μ) and skewness can be determined, so λ can also be determined. StatTools provides calculations for mean and skewness at Data Testing for Normal Distribution Program Page
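As a worked instance of this rearrangement (with illustrative numbers rather than StatTools output):

```python
def estimate_lambda(mean, skewness):
    """Approximate the Inverse Gaussian lambda by rearranging
    skewness = 3 * sqrt(mu / lambda)."""
    return mean / (skewness / 3) ** 2

# a sample with mean 1.0 and skewness 3 gives lambda = 1.0
print(estimate_lambda(1.0, 3.0))
```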
Although CUSUM for the Inverse Gaussian distribution can be a useful method to audit any measurement that has a positive skew (long right tail), it is particularly appropriate for measurements of time as duration, or time to complete a task: in manufacturing, the time required to produce a product or complete planning; in health care, the duration on a waiting list; in labour, the durations of the first and second stages.
Ratios also have an Inverse Gaussian distribution. Given that nearly all measurements are ratios to a standard unit, all measurements can be considered Inverse Gaussian. However, where the variance is small compared with the range of interest, most measurements are approximately Normally distributed. For example, the weight of a newborn is usually accepted as Normally distributed, but the mass of stars has an Inverse Gaussian distribution
In many circumstances, the use of the complex mathematics involved with Inverse Gaussian distribution may be unnecessary, as skewed data can be transformed by logarithm, square root, cube root, or the Box Cox transformation to become approximately Normal. StatTools provides algorithms for transformation in Numerical Transformation Program Page
Should users still wish to proceed with CUSUM for the Inverse Gaussian distribution, the following discussion uses the example parameters and data from CUSUM for Inverse Gaussian Distribution. This example evaluates the waiting time of patients in a clinic waiting to be seen.
- Mean in control μ0 is our benchmark. We expect that patients wait about an hour before they are seen
- Lambda in control λ0 is the level of our skewness. From experience, we decided that λ0=1
- Mean out of control μ1 is set at 1.5. Our CUSUM is designed to trigger an alarm if the waiting time increases to 1.5 hours or more
- Average Run Length is the average number of patients reviewed before an alarm is triggered, even though the situation is in control. This is equivalent to the False Positive Rate. In our example we assigned ARL=100, a false positive rate of 1%, or α=0.01
- From the means in and out of control, λ0, and the ARL, the Reference Value k = 1.2 and the Decision Interval h = 5.6 are calculated
- The data is a single column, each row containing the waiting time of a patient. The data in our example are 100 randomly generated values with a positive skew, the first 50 centred around 1.0 and the second 50 centred around 1.5.
- The CUSUM starts with a value of half the decision interval (h/2). With each new patient
  - If the CUSUM crosses the 0 value, it is reset to 0
  - If the CUSUM crosses the Decision Interval h, the alarm is triggered, and the CUSUM starts again with the value of h/2
The resulting CUSUM plot is as shown to the left. Please note the following
- The CUSUM starts at a value that is half of the decision interval
- With each patient, the CUSUM is amended so that C(t) = C(t-1) + X - k (where X is the waiting time)
- When the CUSUM crosses the value 0, it defaults to 0
- When the CUSUM reaches or exceeds the decision interval (h), an alarm is triggered, and the CUSUM restarts at h/2
- CUSUM values remain under the decision interval h in the first 50 values, then increase in the second 50 and eventually exceed the decision interval. An alarm is triggered and the CUSUM starts again, reaching the decision interval again, and a second alarm is triggered
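A sketch of this panel follows, assuming the parameters above (μ0 = 1, λ0 = 1, μ1 = 1.5, k = 1.2, h = 5.6); numpy's Wald sampler (the Inverse Gaussian distribution) is used to simulate the waiting times for illustration.

```python
import numpy as np

k, h = 1.2, 5.6                              # example parameters from this panel

rng = np.random.default_rng(6)
waits = np.concatenate([rng.wald(1.0, 1.0, 50),    # in control: mu = 1, lambda = 1
                        rng.wald(1.5, 1.0, 50)])   # out of control: mu = 1.5

c = h / 2
for t, x in enumerate(waits, 1):             # X is the waiting time itself
    c = max(0.0, c + x - k)                  # C(t) = C(t-1) + X - k
    if c >= h:
        print(f"alarm at patient {t}")
        c = h / 2
```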
Exponentially distributed measurements are data that follow the inverse of the Poisson distribution, sometimes referred to as the Poisson process. The distribution is positively skewed with a long right tail.
In the Poisson distribution, the number of events occurring over an interval (usually a period of time) is the averaged count (λ). The inverse of this (β = 1 / λ) is the average interval (time) between events, which follows the Exponential distribution.
Many measurements follow the Exponential distribution. Examples are
- Time to events, the time interval between occurrences. For example, the time between adverse events in a health facility.
- Other environmental qualifiers. For example, instead of the number of cells in a volume of fluid, the volume of fluid required to find one cell.
In many circumstances, the use of the complex mathematics involved with Exponential distribution may be unnecessary, as skewed data can be transformed by logarithm, square root, cube root, or the Box Cox transformation to become approximately Normal. StatTools provides algorithms for transformation in Numerical Transformation Program Page
Should users still wish to proceed with CUSUM for the Exponential distribution, the following discussion uses the example parameters and data from the CUSUM for Exponentially Distributed Measurements Program Page. This example evaluates the quality of an aged care facility. Instead of using the number of falls per month as a measurement of quality of care (a Poisson distributed count), the time intervals between successive falls are used (the Exponential or Inverse Poisson distribution).
- Measurement in control β0 is our benchmark, and in our example β0 = 100. We expect that, in our nursing home, patients do not fall more frequently than once every 100 days
- Measurement out of control β1 is, in our example, β1 = 80. If the quality of care deteriorates, falls become more frequent and the intervals between falls decrease. Our CUSUM is designed to trigger an alarm if the interval between falls decreases to 80 days or less
- Average Run Length is the average number of falls observed before an alarm is triggered, even though the situation is in control. This is equivalent to the False Positive Rate. In our example we assigned ARL=100, a false positive rate of 1%, or α=0.01
- From the measurements in and out of control, and the ARL, the Reference Value k = 0.89 and the Decision Interval h = -5.55 are calculated
- The data is a single column, each row containing the number of days between successive falls. The data in our example are 100 randomly generated values with a positive skew, the first 50 centred around 100 and the second 50 centred around 80.
- The CUSUM starts with a value of half the decision interval (h/2). With each fall, where n is the number of days since the previous fall
  X = n / β0
  C(t) = C(t-1) + X - k
  - If the CUSUM crosses the 0 value, it is reset to 0
  - If the CUSUM crosses the Decision Interval h, the alarm is triggered, and the CUSUM starts again with the value of h/2
The resulting CUSUM plot is as shown to the left. Please note the following
- To detect a decrease, h is a negative value
- The CUSUM starts at a value that is half of the decision interval
- When the CUSUM crosses the value 0, it defaults to 0
- When the CUSUM reaches or falls below the decision interval (C <= h), an alarm is triggered, and the CUSUM restarts at h/2
- CUSUM values remain above the decision interval h in the first 50 values, then decrease in the second 50 and eventually cross the decision interval. An alarm is triggered and the CUSUM starts again
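A sketch of this panel follows, assuming the parameters above (β0 = 100, β1 = 80, k = 0.89, h = -5.55). Because a decrease is being detected, the CUSUM is capped above at 0 and the alarm fires when it falls to h or below; the intervals are simulated with numpy for illustration.

```python
import numpy as np

beta0, k, h = 100.0, 0.89, -5.55             # example parameters; h is negative

rng = np.random.default_rng(7)
gaps = np.concatenate([rng.exponential(100.0, 50),   # in control: ~100 days apart
                       rng.exponential(80.0, 50)])   # out of control: ~80 days

c = h / 2                                    # start halfway to the decision interval
for t, n in enumerate(gaps, 1):
    x = n / beta0                            # scaled interval X = n / beta0
    c = min(0.0, c + x - k)                  # capped above at 0, drifts downwards
    if c <= h:
        print(f"alarm at fall {t}")
        c = h / 2
```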
CUSUM for Normal, Binomial, Poisson, and Inverse Gaussian Distributions
- Hawkins DM, Olwell DH (1997) Cumulative Sum Charts and Charting for Quality Improvement. Springer-Verlag, New York. ISBN 0-387-98365-1. p. 47-74
- A computer program to calculate CUSUM decision limits can be downloaded from http://www.stat.umn.edu/cusum/software.htm
- Hawkins DM (1992) A fast accurate approximation for average run lengths of CUSUM control charts. Journal of Quality Technology 24:1 (Jan) p. 37-43 (this is the algorithm used by StatTools in the CUSUM for Normally Distributed Means Program Page)
CUSUM for Proportions (Bernoulli) Distribution
- Reynolds Jr. MR and Stoumbos ZG (1999) A CUSUM Chart for Monitoring a Proportion when Inspecting Continuously. Journal of Quality Technology 31:1 p. 87-108
- Reynolds MR and Stoumbos ZG (2000) A general approach to modeling CUSUM charts for a proportion. IIE Transactions 32:6 p. 515-535
CUSUM for Exponential (Inverse Poisson) Distribution
- Gan FF (1994) Design of Optimal Exponential CUSUM Charts. Journal of Quality Technology 26:2 p. 109-124. Program code in FORTRAN available at http://lib.stat.cmu.edu/jqt/26-2
- Gan FF and Choi KP (1994) Computing Average Run Lengths for Exponential CUSUM Schemes. Journal of Quality Technology
26:2 p. 134-143