Seq unpaired exp

Methodological Considerations Terminology Settings

General discussions on sequential analysis are presented in Sequential Analysis Explained and will not be repeated here.

This page briefly explains the Triangular Test, which was developed by Whitehead in the late 1980s and 1990s. For those interested in a full understanding of the theory and methodology of the Triangular Test, Whitehead's textbook (see reference) is highly recommended.

The Triangular Test is a sequential statistical method for comparing two groups, based on the relationship between Fisher's information V (an expression of the quantity of data) and the efficient score Z (an expression of the effect size). The calculation of V and Z depends on the nature of the measurements concerned, but the interpretation of their relationship is the same. V and Z can be calculated at any time during the study.

Statistical borders are drawn that allow the researcher to make one of 3 decisions whenever the data are reviewed: to continue with the experiment, to reject the null hypothesis and stop the experiment, or to accept the null hypothesis and stop the experiment.

The baseline and the two borders form a triangle. While the plot of Z against V remains within the triangle, no decision should be made other than to collect more data. If the plot crosses the outer border, then the null hypothesis can be rejected (a significant difference exists). If the plot crosses the inner border, then the null hypothesis can be accepted (no significant difference).
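
The three-way decision can be sketched as follows. This is a minimal illustration for one direction of the test only. The function name and its arguments are hypothetical, and the border values z_sig and z_nsig are assumed to have been calculated already for the current V.

```python
def triangular_decision(z, z_sig, z_nsig):
    """Return the action suggested by the current Z relative to the two borders.

    z      : efficient score calculated from the data so far
    z_sig  : outer border at the current V (crossing it rejects the null)
    z_nsig : inner border at the current V (crossing it accepts the null)
    Assumes the test looks for a positive difference, so z_sig >= z_nsig.
    """
    if z >= z_sig:
        return "reject null and stop"
    if z <= z_nsig:
        return "accept null and stop"
    return "continue"
```

For a 2 tailed test the same check would be repeated with the signs reversed for the opposite direction.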

The primary straight line borders are calculated on the assumption that the data will be reviewed after each case, as it becomes available. These borders are narrowed to become more powerful if the data are reviewed less frequently, the extent of the narrowing depending on the sample size between reviews. The final borders, with periodic narrowing, look like a Christmas tree.
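
The narrowing can be illustrated with a short sketch. It assumes the form of the triangular test commonly given in the published literature, with straight borders Z = a + cV (outer) and Z = -a + 3cV (inner), each pulled inward by 0.583 x sqrt(V now - V at previous review). The intercept a and slope c are taken as given, the function name is hypothetical, and the formulae should be checked against Whitehead's book before any real use.

```python
import math

def christmas_tree_borders(v_values, a, c, correction=0.583):
    """Adjusted (outer, inner) borders at each review.

    v_values : cumulative Fisher's information V at successive reviews,
               assumed to be strictly increasing
    a, c     : intercept and slope of the straight borders (assumed given)
    Straight borders assumed here: outer = a + c*V, inner = -a + 3*c*V.
    Each border is moved inward by correction * sqrt(V_i - V_{i-1}).
    """
    borders = []
    previous_v = 0.0
    for v in v_values:
        shift = correction * math.sqrt(v - previous_v)
        outer = a + c * v - shift
        inner = -a + 3.0 * c * v + shift
        borders.append((outer, inner))
        previous_v = v
    return borders
```

The larger the gap in V between two reviews, the larger the inward shift, which is what gives the plotted borders their stepped outline.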

The methods of calculating the effect size for different types of data, and how the borders are defined, will not be explained in detail in these pages. These are well described in Whitehead's book, and the algorithms can be easily obtained from published papers (see reference).

Please note that the coordinates defined in these pages are based on a 2 tailed test (detecting a difference in either direction). If a 1 tailed test is to be used (one group more than the other, with no interest in a difference the other way around), then the Type I Error (α) used should be doubled (e.g. 0.1 instead of 0.05), and the stopping borders not related to the null hypothesis deleted from the plot.

Please also note that the stopping borders are not affected by the ratio of sample sizes between the two groups, as they are calculated from α, β (1-power), and the effect size θ. A discrepancy between the two sample sizes, however, affects the calculation of V and Z from the data, and so will alter the predicted sample size requirement at the planning stage.

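At the end of the analysis, a termination test can be done by calculating the standardized statistic T=Z/sqrt(V). Under the null hypothesis, Z has an expected value of 0 and a Standard Deviation of sqrt(V), so T follows approximately the standard normal distribution. The normalized z test can therefore be used to test whether T deviates from null. The unstandardized estimate of the final effect size θ is Z/V.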
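
The termination test can be sketched as below, assuming that Z is approximately normally distributed with variance V under the null hypothesis. The function name is hypothetical, and the calculation makes no adjustment for the sequential stopping rule.

```python
import math

def termination_test(z, v):
    """Return (T, two-tailed p) for the final Z and V.

    T = Z / sqrt(V) is compared with the standard normal distribution.
    The two-tailed p value is obtained from the complementary error function.
    """
    t = z / math.sqrt(v)
    p_two_tailed = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p_two_tailed
```

For a 1 tailed test the p value would be halved when T lies in the direction of interest.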

In his book, Whitehead presented 5 models: for normally distributed means, Poisson distributed counts, binomially distributed proportions, survival rates, and non-parametric ordinal arrays. These are discussed individually in the following sections.

Data Input / Output Example

Input : In addition to α, power, and ratio that are common inputs to all models, the following inputs are used

Lambda (λ) : is the average event rate (λ=k/n), where k is the observed number of events (pregnancies, asthma attacks, car crashes, falls, number of cells), and n is the number of units of observation (number of women years, number of children months, per thousand cars per year, per hundred beds per month in a hospital, number of microlitres of body fluid). λ₁ and λ₂ are the event rates of group 1 and group 2 that the study is designed to detect.
Theta (θ) is the effect size, calculated from λ₁ and λ₂ according to equation 3.12, p. 37
The data : are in 4 columns, separated by spaces or tabs. They are the number of units of observation and number of events observed in group 1, followed by that in group 2 (n₁ k₁ n₂ k₂). Each row represents the accumulated numbers found at each review, in temporal order.

Output : V, Z, and the Z values to reject or accept the null hypothesis are common to all data types, and in this model are calculated from the following observations

n1 and n2 are the number of observation units in the two groups
k1 and k2 are the number of observed events in the two groups
Z and V are calculated according to equations 3.10 (p.35) and 3.11 (p.36)
Z(sig) and Z(nsig) are adjusted according to equation 4.8 p. 82
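
As an illustration of how Z and V can be obtained from one row of data (n₁ k₁ n₂ k₂), the sketch below uses the log rate ratio as θ and the standard score statistic for comparing two Poisson rates. Both are assumptions about the form of equations 3.10 to 3.12, which are not reproduced on this page, and the function names are hypothetical.

```python
import math

def poisson_theta(lambda1, lambda2):
    """Effect size assumed here to be the log of the ratio of event rates."""
    return math.log(lambda1 / lambda2)

def poisson_z_v(n1, k1, n2, k2):
    """Score statistic Z and information V from accumulated counts.

    n1, n2 : units of observation in groups 1 and 2
    k1, k2 : events observed in groups 1 and 2
    Z is positive when group 1 has the higher observed event rate.
    """
    n = n1 + n2
    k = k1 + k2
    z = (n2 * k1 - n1 * k2) / n
    v = k * n1 * n2 / (n * n)
    return z, v

def parse_rows(text):
    """Split space or tab separated text into one tuple of numbers per review."""
    return [tuple(float(x) for x in line.split())
            for line in text.splitlines() if line.strip()]
```

Each accumulated row from parse_rows can be passed to poisson_z_v to give one point on the plot of Z against V.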

Data Input / Output Example

Input : In addition to α, power, and ratio that are common inputs to all models, the following inputs are used

Difference Between means = mean(group 1) - mean(group 2). The study is then designed to detect this difference.
Within Group Standard Deviation (SD) is the expected background or population Standard Deviation. This is usually not available at the planning stage, and some sort of guesstimate is commonly used.
Theta (θ) = difference between means / SD
The data are in 6 columns, separated by spaces or tabs. They are sample size, mean, and Standard Deviation found in group 1, followed by the same for group 2 (n₁ mean₁ SD₁ n₂ mean₂ SD₂). Each row represents the accumulated numbers found at each review, in temporal order.

Output : V, Z, and the Z values to reject or accept the null hypothesis are common to all data types, and in this model are calculated from the following observations

n1 and n2 are the number of cases in the two groups
mean1 and mean2 are the mean values found in the two groups
sd1 and sd2 are the Standard Deviations found in the two groups
Z and V are calculated according to equations 3.36 and 3.37 (p.50)
Z(sig) and Z(nsig) are adjusted according to equation 4.8 p. 82
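
The calculation of Z and V from one row of summary data (n₁ mean₁ SD₁ n₂ mean₂ SD₂) can be sketched as follows. The formulae are the form commonly quoted for normally distributed data with unknown variance, Z = n₁n₂(mean₁ - mean₂)/(nD) and V = n₁n₂/n - Z²/(2n), where D is the Standard Deviation of all observations pooled together. That this matches equations 3.36 and 3.37 exactly is an assumption, as is the use of the n-1 denominator for SD₁ and SD₂. The function name is hypothetical.

```python
import math

def normal_z_v(n1, mean1, sd1, n2, mean2, sd2):
    """Z and V from the accumulated summary statistics of two groups.

    The overall sum and sum of squares are rebuilt from the group summaries,
    assuming each SD was calculated with the n-1 denominator.
    """
    n = n1 + n2
    total_sum = n1 * mean1 + n2 * mean2
    total_sq = ((n1 - 1) * sd1 ** 2 + n1 * mean1 ** 2
                + (n2 - 1) * sd2 ** 2 + n2 * mean2 ** 2)
    # D: Standard Deviation of all observations, ignoring group membership
    d = math.sqrt(total_sq / n - (total_sum / n) ** 2)
    z = n1 * n2 * (mean1 - mean2) / (n * d)
    v = n1 * n2 / n - z * z / (2.0 * n)
    return z, v
```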

Data Input / Output Example

Input : In addition to α, power, and ratio that are common inputs to all models, the following inputs are used

Proportion + is the number of positives in the index of interest divided by the total number, p=n(pos) / (n(pos)+n(neg)). In the example, the study is designed to detect an effect size where group 1 has 50% positives and group 2 has 20%.
Theta (θ) is the effect size, calculated according to equation 3.3 p. 32
The data : are in 4 columns, separated by spaces or tabs. They are numbers positive in group 1 and group 2, then numbers negative in group 1 and group 2 (n₁(pos) n₂(pos) n₁(neg) n₂(neg)). Each row represents the numbers found at each review, in temporal order.

Output : V, Z, and the Z values to reject or accept the null hypothesis are common to all data types, and in this model are calculated from the following observations

Pos1, Pos2, Neg1, and Neg2 are the number of cases where the attribute of interest is positive or negative in the two groups
Prop1 and Prop2 are the proportions positive in the two groups, where prop = Pos/(Pos+Neg)
Z and V are calculated according to equations 3.8 and 3.7 (p.33)
Z(sig) and Z(nsig) are adjusted according to equation 4.8 p. 82
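
A minimal sketch of the calculations for one row of data (n₁(pos) n₂(pos) n₁(neg) n₂(neg)) follows. It assumes that θ is the log odds ratio, and that Z and V are the usual score statistic and information for two proportions. These are assumptions about equations 3.3, 3.7 and 3.8, which are not reproduced on this page, and the function names are hypothetical.

```python
import math

def binomial_theta(p1, p2):
    """Effect size assumed here to be the log odds ratio of the two proportions."""
    return math.log((p1 * (1.0 - p2)) / (p2 * (1.0 - p1)))

def binomial_z_v(pos1, pos2, neg1, neg2):
    """Z and V from accumulated counts, in the column order of the data."""
    n1 = pos1 + neg1
    n2 = pos2 + neg2
    n = n1 + n2
    total_pos = pos1 + pos2
    total_neg = neg1 + neg2
    # Z is positive when group 1 has the higher proportion positive
    z = (n2 * pos1 - n1 * pos2) / n
    v = n1 * n2 * total_pos * total_neg / n ** 3
    return z, v
```

With the planning values in the example (50% and 20%), binomial_theta(0.5, 0.2) is the natural logarithm of 4, about 1.386.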

Data Input / Output Example

Input : In addition to α, power, and ratio that are common inputs to all models, the following inputs are used

Number of divisions is the number of cells in the arrays to be compared. Examples are the 3 divisions of some pain scales (0=no pain, 1=some pain, 2=severe pain) or the 5 divisions of the Likert Scale (0=Strongly Disagree, 1=Disagree, 2=Neutral, 3=Agree, and 4=Strongly Agree).
Expected Proportions is the table of expected proportions, with two columns for the two groups and one row for each division. Each cell contains the proportion or probability for that division in that group, so that each column sums to 1. This defines what the study is designed to detect.
Theta (θ) is the effect size, calculated from the table of proportions according to equation 3.25 p. 46
The data : are in 2 columns, separated by spaces or tabs. The number of rows is the product of the number of reviews and the number of divisions (rows = reviews x divisions). Each cell contains the number of cases in that group for that division in that review.
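
The layout of these inputs can be checked with a short sketch. It only verifies that each column of the expected proportions sums to 1 and cuts the data rows into one block per review. It does not calculate Z or V, and the function names are hypothetical.

```python
def columns_sum_to_one(table, tol=1e-9):
    """True if both columns of a two-column table of proportions sum to 1."""
    return all(abs(sum(row[j] for row in table) - 1.0) <= tol
               for j in range(2))

def split_reviews(rows, divisions):
    """Cut the data rows into consecutive blocks, one block per review.

    rows      : list of (group 1 count, group 2 count), in the order entered
    divisions : number of divisions in the ordinal scale
    """
    if divisions <= 0 or len(rows) % divisions != 0:
        raise ValueError("number of rows must be a multiple of divisions")
    return [rows[i:i + divisions] for i in range(0, len(rows), divisions)]
```

For a 3 division scale reviewed twice, 6 rows of data would give 2 blocks of 3 rows each.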

Output : V, Z, and the Z values to reject or accept the null hypothesis are common to all data types, and in this model are calculated from the following observations

n1 and n2 are the number of observation units in the two groups
mean1 and mean2 are the mean values observed in the two groups
Z and V are calculated according to equations 3.27 and 3.28 (p.47)
Z(sig) and Z(nsig) are adjusted according to equation 4.8 p. 82

Data Input / Output Example

Input : Survival rates were initially used to describe survival from cancer at a specified time after diagnosis or commencement of treatment. The related statistics, however, are applicable to any time related events.

Number of time intervals is the number of intervals of time that have elapsed at the time the survival rate is evaluated. For example, in cancer, the 5 year survival rate uses a year as the time interval, and the rate is evaluated after 5 intervals of time (5 years). In testing the quality of light bulbs, the 100 hour survival rate (where the light bulb has not yet burnt out) uses an hour as the time interval, and the rate is evaluated 100 hours after the light bulb is switched on. In testing a contraceptive, the survival rate (not pregnant) uses a month or menstrual cycle as the time interval, and the rate (proportion not pregnant) is evaluated after two years (24 months or cycles). The statistical analysis takes into consideration that, at a particular evaluation, not all cases have been in the study for the same period of time, and that the event of interest (death, burn out, pregnancy) occurs at different times. The data are therefore said to be censored.
Theta (θ) is the effect size, calculated according to equation 3.9 p. 35
The data : are in 4 columns, separated by spaces or tabs. The number of rows is the product of the number of reviews and the number of time intervals (rows = reviews x time intervals).
The columns are the number who died (the event of interest has occurred) and the number who survived in group 1, followed by the same for group 2, for that time interval in that review.

Output :
V, Z, and the Z values to reject or accept the null hypothesis are common to all data types, and in this model are calculated from the following observations

TObs1 and TObs2 are the number of cases in the two groups that are in the study at that review
TEvs1 and TEvs2 are the number of cases that survived in the two groups at that review
Surv1 and Surv2 are the adjusted survival rates in the two groups at that review (taking into account the numbers lost to follow up, the numbers not in the study long enough to be included, and the different times at which events occurred)
Z and V are calculated according to equations 3.20 and 3.21 (p.42)
Z(sig) and Z(nsig) are adjusted according to equation 4.8 p. 82

Whitehead, John (1992). The Design and Analysis of Sequential Clinical Trials (Revised 2nd Edition). John Wiley & Sons Ltd., Chichester. ISBN 0 47197550 8.