Example 1 : Development of the reference pattern
Italian | -+--+ |
Italian | +--+- |
Italian | -++-- |
Italian | -++-- |
Italian | +-+-- |
Italian | +-+-- |
Italian | +--+- |
Italian | +-+-- |
Italian | +-+-- |
Italian | +--+- |
French | -+-+- |
French | -+--+ |
French | -++-- |
French | -+-+- |
French | -++-- |
French | +--+- |
French | +---+ |
French | +--+- |
French | +-+-- |
French | +--+- |
German | -+--+ |
German | -+--+ |
German | -+--+ |
German | -+-+- |
German | -+--+ |
German | -+--+ |
German | -++-- |
German | +--+- |
German | +---+ |
German | +-+-- |
The default example data from the Pattern Probability Analysis (Analysis Using Reference Data) Program Page
are used in this example. The data are computer generated to demonstrate the procedures, and are not based on real observations. On this page the three groups are given the ethnic identities of Italian, French, and German.
We wish to develop a system of classification to distinguish Europeans into Italian, French, and German. We hope to develop a model which we can use in the future to classify anyone we see into one of these 3 ethnic groups. The model will use hair and eye colors.
- Hair Color: - (false) or + (true) for dark color hair, and - or + for light color hair. Three patterns are therefore available
- +- for dark color hair
- -+ for light color hair
- -- for no information on hair color
- Eye Color: - or + for brown eyes, - or + for other color eyes, and - or + for blue eyes. Four patterns of eye color are therefore available
- +-- for brown eyes
- --+ for blue eyes
- -+- for eyes of any color other than brown and blue
- --- for no information on eye color
- Putting these together, we have a set of attributes or variables in 5 characters: the first two characters represent the hair color, and the last three the eye color
Step 1 : We collected a reference sample of individuals to build our classification model, consisting of 10 each of Italians, French, and Germans. We noted the color of their hair and eyes and used these observations to build the model.
| Italian | French | German |
---|
Ch 1 (dark hair +) | 7 | 5 | 3 |
Ch 2 (light hair +) | 3 | 5 | 7 |
Ch 3 (Brown eyes +) | 6 | 3 | 2 |
Ch 4 (Other color eyes +) | 3 | 5 | 2 |
Ch 5 (Blue eyes +) | 1 | 2 | 6 |
Step 2 : The program counts the number of trues (in this case +) for every attribute in every group across all the cases of the reference population. The results are as shown in the table above.
Of the 10 Italians, 7 had dark color hair and 3 had light color hair. There were 6 with brown eyes, 1 with blue eyes, and 3 with other color eyes.
Of the 10 French, there were 5 each with dark and light color hair, 3 with brown eyes, 2 blue, and 5 other color eyes.
Of the 10 Germans, 3 had dark and 7 light color hair, 2 with brown eyes, 6 with blue eyes, and 2 with other color eyes.
| Italian | French | German |
---|
Ch 1 (dark hair +) | 0.7 | 0.5 | 0.3 |
Ch 2 (light hair +) | 0.3 | 0.5 | 0.7 |
Ch 3 (Brown eyes +) | 0.6 | 0.3 | 0.2 |
Ch 4 (Other color eyes +) | 0.3 | 0.5 | 0.2 |
Ch 5 (Blue eyes +) | 0.1 | 0.2 | 0.6 |
Step 3 : The program then calculates the relative frequencies of these attributes in each group, formally the probability of each attribute in each group, P(Attribute|Group), commonly denoted P(x|j). The results are shown in the table above. Each probability is calculated by dividing the count by the number of cases in that group (here 10).
The P(x|j) table represents the relationship between the groups and the attributes, and is used to model subsequent classifications on independent sets of individuals. This table, without the first column of labels, is used by the
Pattern Probability (Use of Reference Pattern on New Data) Program Page
to allocate individuals into groups.
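As a minimal sketch of Steps 2 and 3 (the variable names are hypothetical, and this is an illustration rather than the StatTools implementation), the reference pattern amounts to counting and dividing:

```python
# Hypothetical sketch of Steps 2 and 3, not the actual StatTools code.
# The counts come from the Step 2 table; dividing each count by the
# number of cases in its group (10) gives the P(x|j) reference pattern.

counts = {
    "Italian": [7, 3, 6, 3, 1],   # Ch 1..Ch 5 counts of '+'
    "French":  [5, 5, 3, 5, 2],
    "German":  [3, 7, 2, 2, 6],
}
group_size = 10                   # 10 cases per group in the example

p_attr_given_group = {
    group: [count / group_size for count in group_counts]
    for group, group_counts in counts.items()
}

print(p_attr_given_group["Italian"])   # [0.7, 0.3, 0.6, 0.3, 0.1]
```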
Step 4 : The model, in terms of the P(x|j) table, is validated using the same data that created it. As the process is the same as for Maximum Likelihood in Example 2, it is discussed there.
Example 2 : Use of established reference patterns on a set of data
Italian | French | German |
---|
0.7 | 0.5 | 0.3 |
0.3 | 0.5 | 0.7 |
0.6 | 0.3 | 0.2 |
0.3 | 0.5 | 0.2 |
0.1 | 0.2 | 0.6 |
This example demonstrates how the pattern established using a set of reference data, such as in Example 1, is used to classify a new set of data, assigning individuals to a group according to the relevant attributes. The calculations are as carried out in the
Pattern Probability (Use of Reference Pattern on New Data) Program Page.
The table above shows the default example reference pattern, which is the P(Attribute|Group), commonly denoted P(x|j), table obtained from the reference data in Example 1.
+-+-- |
+--+- |
+---+ |
-++-- |
-+-+- |
-+--+ |
----+ |
The default example data to be allocated into groups are as shown in the table above. The 7 individuals are
- +-+-- Dark hair brown eyes
- +--+- Dark hair other color eyes
- +---+ Dark hair blue eyes
- -++-- Light hair brown eyes
- -+-+- Light hair other color eyes
- -+--+ Light hair blue eyes
- ----+ Bald (hair color unknown), blue eyes
Row | Pattern | Groups P(x|j) |
| | Italian | French | German |
---|
1 | +-+-- | 0.42 | 0.15 | 0.06 |
2 | +--+- | 0.21 | 0.25 | 0.06 |
3 | +---+ | 0.07 | 0.1 | 0.18 |
4 | -++-- | 0.18 | 0.15 | 0.14 |
5 | -+-+- | 0.09 | 0.25 | 0.14 |
6 | -+--+ | 0.03 | 0.1 | 0.42 |
7 | ----+ | 0.1 | 0.2 | 0.6 |
Step 1 : The establishment of a table of the probability of each pattern for each group, P(pattern|Group), commonly denoted as the P(x|j) table, where x represents the pattern and j the group. This is calculated from the reference pattern table by multiplying together, within each group, the probabilities of every attribute marked positive (+). The results are as shown in the table above.
This means that Italians have a 42%, the French a 15%, and Germans a 6% probability of containing someone like the person in row 1, with dark hair and brown eyes (+-+--), and Italians have a 10%, the French a 20%, and Germans a 60% probability of containing someone who is bald and has blue eyes (----+, row 7).
This table is important, as it is used for all subsequent calculations.
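The Step 1 multiplication can be sketched as follows (a hypothetical illustration with made-up names, not the StatTools code itself): for each group, the probabilities at the '+' positions of the pattern are multiplied together.

```python
# Hypothetical sketch of Step 1, not the actual StatTools code.
# P(pattern|group) is the product of the group's attribute
# probabilities at every position where the pattern holds a '+'.

p_ref = {                          # the P(x|j) reference pattern from Example 1
    "Italian": [0.7, 0.3, 0.6, 0.3, 0.1],
    "French":  [0.5, 0.5, 0.3, 0.5, 0.2],
    "German":  [0.3, 0.7, 0.2, 0.2, 0.6],
}

def pattern_likelihood(pattern, probs):
    result = 1.0
    for flag, p in zip(pattern, probs):
        if flag == "+":            # only positive attributes contribute
            result *= p
    return result

# Row 1, dark hair and brown eyes, for Italians: 0.7 x 0.6
print(round(pattern_likelihood("+-+--", p_ref["Italian"]), 2))   # 0.42
```

A missing attribute (all '-') simply contributes nothing to the product, which is why the bald man's pattern ----+ can still be evaluated.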
Row | Pattern | Groups P(j|x) |
| | Italian | French | German |
---|
1 | +-+-- | 0.67 | 0.24 | 0.10 |
2 | +--+- | 0.40 | 0.48 | 0.12 |
3 | +---+ | 0.20 | 0.29 | 0.51 |
4 | -++-- | 0.38 | 0.32 | 0.30 |
5 | -+-+- | 0.19 | 0.52 | 0.29 |
6 | -+--+ | 0.05 | 0.18 | 0.76 |
7 | ----+ | 0.11 | 0.22 | 0.67 |
Step 2 : The construction of the Maximum Likelihood table, also called the P(Group|Pattern) or more commonly the P(j|x) table, which represents the probability of belonging to each of the groups for the pattern in each row. It is calculated by dividing each probability in the P(pattern|Group) table by the total across all the groups.
The table is as shown above. It means that someone with dark hair and brown eyes (+-+--, row 1) has a 67% probability of being Italian, a 24% probability of being French, and a 10% probability of being German. He is then classified to the most likely group, Italian at 67%.
Likewise, someone with light color hair and blue eyes (-+--+, row 6) has a 5% probability of being Italian, an 18% probability of being French, and a 76% probability of being German. He is classified to the most likely group, German.
Where data are incomplete, as in the case of the bald man with blue eyes (no information on hair color, ----+, row 7), the algorithm is still able to make a decision based on what is available, assigning him to the German group.
The Maximum Likelihood is the first of the Bayesian Probability calculations, based only on the observed attributes, without taking anything else into consideration.
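Sketched in the same style (hypothetical names, not the StatTools source), the Maximum Likelihood step simply normalizes each row of the P(pattern|Group) table and picks the largest entry:

```python
# Hypothetical sketch of Step 2, not the actual StatTools code.
# Each row of P(pattern|group) is divided by its total across the
# groups, giving P(j|x); the case is assigned to the largest entry.

def maximum_likelihood(row):
    total = sum(row)
    return [value / total for value in row]

groups = ["Italian", "French", "German"]
row1 = [0.42, 0.15, 0.06]                 # P(pattern|group) for +-+--

p_j_given_x = maximum_likelihood(row1)
print([round(p, 2) for p in p_j_given_x])           # [0.67, 0.24, 0.1]
print(groups[p_j_given_x.index(max(p_j_given_x))])  # Italian
```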
Step 3 : The construction of the Bayesian Probability table, taking into consideration the apriori probability of belonging to each of the groups. The table is also called the P(group|pattern,π) or P(j|x,π) table.
The Maximum Likelihood is based on the assumption that, apart from the observed attributes, the probability of being in any of the groups is the same. This is seldom the case in reality. If we were to take our model to Rome, to Paris, or to Dresden, the probability of someone being Italian, French, or German would be very different even before we observed the attributes. Such a probability, the apriori probability (π), needs to be taken into account.
The program takes π into consideration by using an array of apriori indicators, which contains the relative probabilities of belonging to each group. The values entered by the user can be in any measurement (number of cases, probabilities, ratios), and the program normalizes these values into probabilities before calculation.
The default example in the Pattern Probability (Use of Reference Pattern on New Data) Program Page
is "1 1 1", indicating that the apriori probabilities in the 3 groups are the same (normalized to 0.33 each). The results of the calculations will then be the same as those from the Maximum Likelihood table.
If we were to use the reference patterns in, say, Zurich, a predominantly German speaking part of Switzerland, we may find that, for each Italian in town, there are 2 Frenchmen and 4 Germans. The apriori array is then "1 2 4": the probability of being German is twice that of being French and four times that of being Italian.
Row | Pattern | Groups P(j|x,π) |
| | Italian | French | German |
---|
1 | +-+-- | 0.44 | 0.31 | 0.25 |
2 | +--+- | 0.22 | 0.53 | 0.25 |
3 | +---+ | 0.07 | 0.20 | 0.73 |
4 | -++-- | 0.17 | 0.29 | 0.54 |
5 | -+-+- | 0.08 | 0.43 | 0.49 |
6 | -+--+ | 0.02 | 0.10 | 0.88 |
7 | ----+ | 0.03 | 0.14 | 0.83 |
If we add such an apriori array to the calculation, the program will first normalize "1 2 4" to proportions of "0.14 0.29 0.57", meaning the apriori probabilities are 14% Italian, 29% French, and 57% German. The Bayesian Probability table taking the apriori probabilities into consideration is as shown above.
It can be seen that, in Zurich, someone with dark hair and brown eyes (+-+--, row 1) is still most probably Italian at 44%,
someone with dark hair and eyes of any color other than blue or brown (+--+-, row 2) is most probably French at 53%, and all other combinations of attributes are most probably German.
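A sketch of how the apriori array enters the calculation (hypothetical names; a simplification for illustration, not the program's code):

```python
# Hypothetical sketch of Step 3, not the actual StatTools code.
# The apriori indicators are normalized into probabilities, each
# group's likelihood is weighted by its prior, and the row is
# renormalized across the groups.

def bayes_with_prior(likelihood_row, prior_indicators):
    total_prior = sum(prior_indicators)
    priors = [p / total_prior for p in prior_indicators]   # "1 2 4" -> 0.14 0.29 0.57
    weighted = [l * p for l, p in zip(likelihood_row, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

row1 = [0.42, 0.15, 0.06]    # P(pattern|group) for +-+--
print([round(p, 2) for p in bayes_with_prior(row1, [1, 2, 4])])   # [0.44, 0.31, 0.25]
```

With equal indicators such as "1 1 1", the weighting cancels out and the result reduces to the Maximum Likelihood row, as the text above notes.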
Step 4 : The construction of the Bayesian Probability table, taking into consideration the apriori probability of belonging to each of the groups, and also including a cost function for error. The table is also called the P(group|pattern,π,cost) or P(j|x,π,cost) table.
The cost function for a group conceptually represents a measurement of cost or loss if a case erroneously fails to be assigned to that group. An obvious example is the diagnosis of a swelling on the face: it can be a bruise, an infection, or a cancer. To miss a cancer when there is one would be much more serious (a greater cost) than to miss an infection, and that in turn more serious than to miss a bruise.
Common practice is to include the cost function after including the apriori probabilities. If cost is to be considered without apriori, the apriori array can be assigned equal values for all groups. The unit for cost can be any measurement, in money, time, or arbitrary units of judgement. The program normalizes the array into fractions before use.
Row | Pattern | Groups P(j|x,π,cost) |
| | Italian | French | German |
---|
1 | +-+-- | 0.70 | 0.17 | 0.13 |
2 | +--+- | 0.46 | 0.37 | 0.18 |
3 | +---+ | 0.19 | 0.18 | 0.64 |
4 | -++-- | 0.39 | 0.21 | 0.40 |
5 | -+-+- | 0.20 | 0.38 | 0.42 |
6 | -+--+ | 0.05 | 0.10 | 0.85 |
7 | ----+ | 0.10 | 0.13 | 0.77 |
The default example for costs in the Pattern Probability (Use of Reference Pattern on New Data) Program Page
is "1 1 1", indicating that there is no cost difference between the groups. However, if we are desperately looking for an Italian interpreter in Zurich for an important function, missing an Italian may cost 3 times as much as missing a Frenchman or a German, and we may use a cost array such as "3 1 1", which the program will normalize to "0.6 0.2 0.2". The results are as shown in the table above.
We can see now that we would assign anyone with dark hair and eyes that are not blue as Italian (the first 2 rows), and everyone else as German, because most people in Zurich are German, and failing to identify an Italian costs 3 times as much as the same mistake for the other groups. We would not assign anyone as French at all.
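The cost step mirrors the apriori step; a sketch (hypothetical names, for illustration only):

```python
# Hypothetical sketch of Step 4, not the actual StatTools code.
# The cost array is normalized into fractions, the prior-adjusted
# probabilities are weighted by those fractions, and the row is
# renormalized across the groups.

def apply_cost(prob_row, cost_indicators):
    total_cost = sum(cost_indicators)
    costs = [c / total_cost for c in cost_indicators]      # "3 1 1" -> 0.6 0.2 0.2
    weighted = [p * c for p, c in zip(prob_row, costs)]
    total = sum(weighted)
    return [w / total for w in weighted]

row1 = [0.4375, 0.3125, 0.25]   # unrounded P(j|x,π) for +-+-- in Zurich
print([round(p, 2) for p in apply_cost(row1, [3, 1, 1])])   # [0.7, 0.17, 0.13]
```

Note that the unrounded P(j|x,π) values are used here; feeding in the 2-decimal table values would reproduce the small rounding discrepancies discussed in the next section.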
Apparent inconsistencies in tables on this page
The PHP pages on the server use 32 bit arithmetic, so calculations are precise to many decimal places. StatTools by default presents numbers to 4 decimal places of precision. However, on this explanation page, results are presented only to 2 decimal places to conserve space. Rounding may therefore make some results appear inconsistent. For example, 0.1249 may be rounded to 0.12, and 0.1251 to 0.13. Multiplying both by 2 produces expected results of 2 x 0.12 = 0.24 and 2 x 0.13 = 0.26, when in the program they are actually 2 x 0.1249 = 0.2498 and 2 x 0.1251 = 0.2502, both rounding to 0.25.
As most probability results are presented to 2 decimal places of precision, the results produced are adequate for use. The confusion arises only in translating these results to this explanation page, where all the interim results are also presented to 2 decimal places. Users should therefore be aware of this and not be confused or alarmed by the apparent inconsistencies.
Formatting data, both input and results
The Pattern Probability model allows the use of multiple variables or attributes to assign cases to groups. The attributes are commonly binary parameters, but sometimes can also contain multiple mutually exclusive categories. The examples on this page use
2 hair colors (binary) and 3 eye colors (3 mutually exclusive categories). The program will also produce results when confronted with missing information. Although the concept is straightforward, formatting the data input and combining the different types of attributes is a challenging problem.
Warner et al., in 1961 (see reference), first described the use of this model, to diagnose congenital heart disease in children. For a binary variable he used the frequency, or 1 − frequency, depending on whether the attribute was true or false for that variable. For multiple categories, he listed the categories and the frequency for each, and chose the appropriate category. This allowed the creation of a simple two dimensional table of frequencies, but each attribute required its own unique management before it could be entered into the calculations. In 1961, when most computation was manual, such a solution was time consuming but workable.
Overall and Klett, in 1972, used this model to classify psychiatric disorders. They used a large number of diagnostic criteria, each having up to 10 categories, so the data had to be presented as large multi-dimensional tables. This was possible because the data were part of a Fortran program, stored on punch cards, with a format unique to each run of the program. However, such a format cannot be visualized or manipulated easily in the interactive web based environment of StatTools.
The data format used in these pages is therefore unique to StatTools, modifying the formats of these two predecessors to retain the advantages and overcome the disadvantages. The principles are as follows
- The attributes are presented as a continuous sequence of + for present or - for absent.
- A binary attribute is presented in 2 columns: +- for true or yes, -+ for false or no, and -- for missing information
- An attribute with 3 categories is presented in 3 columns: +-- for the first, -+- for the second, --+ for the third, and --- for missing information
- An attribute with 4 categories is presented in 4 columns: +--- for the first, -+-- for the second, --+- for the third, ---+ for the fourth, and ---- for missing information
- And so on for 5, 6, or more categories.
- All the Attributes are then strung together in a single string for use.
Such an approach allows a large number of attributes with heterogeneous categories to be handled flexibly on the web page. It also allows the inclusion of missing information without disrupting data processing.
The disadvantage is that such a string is not easily interpretable, and is error-prone if assembled manually. However, the data can easily be prepared in an Excel spreadsheet or with a small Javascript program by anyone experienced with these tools.
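As an illustration of the encoding rules above (a hypothetical helper written for this page, not part of StatTools), such a string can be assembled programmatically:

```python
# Hypothetical encoder for the '+'/'-' string format described above,
# written for illustration only. A category index becomes a run of
# '-' with a single '+'; None (missing information) becomes all '-'.

def encode(category, n_categories):
    if category is None:
        return "-" * n_categories                  # missing information
    return "-" * category + "+" + "-" * (n_categories - 1 - category)

def encode_case(hair, eyes):
    # 2 hair columns (dark, light) + 3 eye columns (brown, other, blue)
    return encode(hair, 2) + encode(eyes, 3)

print(encode_case(0, 0))      # dark hair, brown eyes: +-+--
print(encode_case(None, 2))   # bald (hair unknown), blue eyes: ----+
```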