The Modern Application of Statistical Studies
French mathematician and astronomer Pierre-Simon de Laplace (1749–1827) said, ‘The most important questions of life are indeed, for the most part, really only problems of probability.’ De Laplace was a leading figure in the development of probability theory. In Théorie Analytique des Probabilités (1812), he wrote, ‘The theory of probabilities is at bottom nothing but common sense reduced to calculus.’
Probability measures how likely a random event is to happen: the more likely the event, the higher its probability. When you toss a fair coin, there is a 50% chance that it will land heads up and a 50% chance that it will land tails up, so the probability of each outcome is one half. Probability theory is the branch of mathematics that studies the patterns of random events quantitatively. Originating in 17th-century calculations of the risks involved in gambling and seafaring, as well as of errors in measurement, it developed rapidly alongside advances in science and technology, both drawing on and contributing to other disciplines. It is widely applied in modern technology, industrial production, finance and insurance, and it forms the basis of statistics.
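A short simulation makes the coin example concrete: over many tosses of a simulated fair coin, the fraction of heads settles near one half. This is a minimal sketch that assumes Python's standard `random` module as the coin; the function name `heads_frequency` and the seed are illustrative choices, not anything from the original article.

```python
import random


def heads_frequency(n_tosses: int, seed: int = 2024) -> float:
    """Toss a simulated fair coin n_tosses times and return the fraction of heads."""
    rng = random.Random(seed)  # seeded so each run is reproducible
    # rng.random() is uniform on [0, 1), so "< 0.5" is heads with probability 1/2
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


for n in (10, 1_000, 100_000):
    print(n, heads_frequency(n))
```

Short runs can stray well away from 0.5, while long runs stay close to it, which is the long-run sense in which the probability of heads is one half.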
Probability theory is about ‘applying common sense to inference’, as de Laplace put it simply. The story of ‘The Boy Who Cried Wolf’, which Prof. Chan Ping-shing of CUHK’s Department of Statistics told students at the CUHK Orientation Camp to explain statistics, is a case in point. The first time the shepherd boy cried wolf, the villagers believed him, since such cries were usually genuine. When he did it again, they still believed him. Each false alarm gave the villagers more evidence about how likely his cries were to be lies. Once they had gathered enough data, their decision switched and they stopped believing his cries for help. Professor Chan remarked, ‘This fable shows a classic application of statistical data.’
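The villagers’ change of mind can be read as Bayesian updating; that framing is an assumption of this sketch, not something stated in the fable or by Professor Chan. All numbers are hypothetical: a 5% prior belief that the boy is a liar, a 50% chance that a liar’s cry is a false alarm, a 10% chance that an honest boy’s cry is a false alarm, and a rule that the villagers stop believing him once the probability that he is a liar exceeds one half. The function names are illustrative.

```python
def update_after_false_alarm(p_liar: float, p_false_if_liar: float,
                             p_false_if_honest: float) -> float:
    """Bayes' rule: P(liar | false alarm) given the current P(liar)."""
    numerator = p_false_if_liar * p_liar
    evidence = numerator + p_false_if_honest * (1.0 - p_liar)
    return numerator / evidence


def false_alarms_until_disbelief(prior: float, p_false_if_liar: float,
                                 p_false_if_honest: float,
                                 threshold: float = 0.5, max_alarms: int = 100):
    """Count false alarms until P(liar) exceeds threshold; also return the belief history."""
    p = prior
    history = []
    for alarms in range(1, max_alarms + 1):
        p = update_after_false_alarm(p, p_false_if_liar, p_false_if_honest)
        history.append(p)
        if p > threshold:
            return alarms, history
    return None, history  # threshold never reached within max_alarms


# Hypothetical parameters, chosen only for illustration
alarms, history = false_alarms_until_disbelief(prior=0.05, p_false_if_liar=0.5,
                                               p_false_if_honest=0.1)
print(alarms, [round(p, 3) for p in history])
```

With these numbers the villagers’ belief that the boy is lying rises to about 0.21 after the first false alarm and about 0.57 after the second, so under this rule they would ignore his third cry.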
The area of statistics that interests Professor Chan most is censored data, that is, observations whose exact values are unknown because they fall outside the range that a measurement can capture. ‘For example, suppose we want to measure the heights of a class of Primary 6 students, but the ruler is only 150 cm long, so only 25 of the 30 students in the class can have their heights measured. For the remaining five, we only know that they are taller than 1.5 m. The heights of these five are censored data.’
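One standard way to use the five censored heights is maximum likelihood for right-censored data: each measured student contributes the probability density of their height, and each censored student contributes the probability of being taller than 150 cm. The sketch below assumes heights are normally distributed, uses 25 made-up measurements in place of real data, and finds the estimate by a coarse grid search for simplicity; a real analysis would use a proper numerical optimizer. Function and variable names are hypothetical.

```python
import math

# Hypothetical heights (cm) of the 25 students the ruler could measure
MEASURED = [138.2, 141.5, 143.0, 144.8, 139.7, 146.1, 147.3, 142.4, 145.5,
            148.9, 140.6, 143.8, 146.7, 149.2, 137.9, 144.1, 145.9, 141.1,
            147.8, 142.9, 148.3, 139.4, 146.4, 143.3, 149.6]
LIMIT_CM = 150.0  # length of the ruler
N_CENSORED = 5    # students known only to be taller than LIMIT_CM


def censored_normal_loglik(mu, sigma, measured, limit, n_censored):
    """Log-likelihood of a Normal(mu, sigma) model with right-censoring at `limit`."""
    # Measured students: sum of log normal densities
    ss = sum((x - mu) ** 2 for x in measured)
    ll = -len(measured) * math.log(sigma * math.sqrt(2 * math.pi)) - ss / (2 * sigma ** 2)
    # Censored students: log P(X > limit), with the survival function computed via erfc
    z = (limit - mu) / sigma
    ll += n_censored * math.log(0.5 * math.erfc(z / math.sqrt(2)))
    return ll


def fit_by_grid(measured, limit, n_censored):
    """Coarse grid search: mu in [135, 160], sigma in [2, 15], step 0.1."""
    best = None
    for i in range(251):
        mu = 135.0 + 0.1 * i
        for j in range(131):
            sigma = 2.0 + 0.1 * j
            ll = censored_normal_loglik(mu, sigma, measured, limit, n_censored)
            if best is None or ll > best[0]:
                best = (ll, mu, sigma)
    return best[1], best[2]


naive_mean = sum(MEASURED) / len(MEASURED)  # ignores the five tallest students
mu_hat, sigma_hat = fit_by_grid(MEASURED, LIMIT_CM, N_CENSORED)
print(f"naive mean of measured heights: {naive_mean:.1f} cm")
print(f"censored-data MLE: mu = {mu_hat:.1f} cm, sigma = {sigma_hat:.1f} cm")
```

Averaging only the measured heights leaves out the tallest students and so understates the class mean; the censored likelihood pulls the estimate upward using only the fact that five students exceed 150 cm.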
Censored data arise in many kinds of statistical analysis. A more serious problem is missing data, which occurs when, say, only eight out of 10 questions in a questionnaire are answered, or when a question that should have been asked was left out by oversight. Statisticians do not simply ignore censored or missing data. Using various mathematical models, they combine this incomplete information with the rest of the data collected in order to arrive at more comprehensive and accurate statistical results. ‘The most commonly used method is based on conditional probability: it infers the censored or missing values and feeds them into the calculation. The results are then compared with those obtained without those values. If the discrepancy is small, the inferred values are considered reliable.’
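The infer-then-compare procedure Professor Chan describes can be sketched for the ruler example. Under an assumed normal model, a student known only to be taller than the limit c is assigned the conditional mean E[X | X > c] = μ + σφ(a)/(1 − Φ(a)), where a = (c − μ)/σ and φ, Φ are the standard normal density and distribution function; the class mean is then compared with and without the imputed values. The parameters μ = 146 cm and σ = 5 cm, the small class of ten, and the function name are hypothetical, and this is only one of several possible conditional-probability imputation schemes, not necessarily the one Professor Chan uses.

```python
from statistics import NormalDist, mean


def conditional_mean_above(limit: float, mu: float, sigma: float) -> float:
    """E[X | X > limit] for X ~ Normal(mu, sigma)."""
    a = (limit - mu) / sigma
    std = NormalDist()  # standard normal
    return mu + sigma * std.pdf(a) / (1.0 - std.cdf(a))


# Hypothetical class of 10: eight measured heights (cm), two above the ruler
measured = [139.5, 142.0, 144.3, 145.1, 146.8, 147.2, 148.5, 149.0]
n_censored = 2
limit = 150.0
mu, sigma = 146.0, 5.0  # assumed model parameters, for illustration only

imputed = conditional_mean_above(limit, mu, sigma)
mean_without = mean(measured)                         # censored students left out
mean_with = mean(measured + [imputed] * n_censored)   # censored students imputed
print(f"imputed height for each censored student: {imputed:.2f} cm")
print(f"mean without imputation: {mean_without:.2f} cm")
print(f"mean with imputation:    {mean_with:.2f} cm")
print(f"discrepancy:             {mean_with - mean_without:.2f} cm")
```

How small a discrepancy must be before the inferred values are accepted is a judgment call that this sketch leaves open.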
Censored data are used in gauging the reliability and durability of industrial products, from furniture and electrical appliances to cars, airplanes, and even nuclear power plants. Here censored data help estimate a product’s lifetime distribution, which describes how long it is likely to last under different operating conditions. This is also Professor Chan’s present research focus. He explained that it is not easy to gather data for assessing a product’s life-span under normal conditions, since a well-made product may take a very long time to fail. Industry therefore uses accelerated life tests, which subject products to severe environmental stress to induce rapid deterioration, and uses the resulting data to estimate their life-span. Units that are still working when the test ends are known as ‘censored observations’: their exact lifetimes are unknown, only that they exceed the test duration.
Professor Chan’s research involves making maximum likelihood inferences from the data obtained in these tests, and thereby identifying the most effective accelerated life test for different products, to help industry design tests that are efficient, accurate and cost-effective. ‘This shortens the time it takes for products to reach the market,’ he said. ‘But the question of how to design a product so that it is considered reliable and durable is at times a commercial decision. We only provide scientific data for reference.’
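As an illustration of maximum likelihood with censored observations from a life test, the sketch below assumes, purely for simplicity, that lifetimes under the test stress follow an exponential distribution and that the test stops at a fixed time; the article does not specify either. With r failures and total time on test T (the failure times plus the full test duration for every surviving unit), the log-likelihood is r log λ − λT, which is maximized at λ̂ = r/T. The test data and function names are hypothetical: 10 units run for 500 hours, five of which fail.

```python
import math


def total_time_on_test(failure_times, n_survivors, test_duration):
    """Failure times plus the full test duration for each censored (surviving) unit."""
    return sum(failure_times) + n_survivors * test_duration


def exponential_loglik(rate, failure_times, n_survivors, test_duration):
    """Log-likelihood r*log(rate) - rate*T for an exponential model with right-censoring."""
    total = total_time_on_test(failure_times, n_survivors, test_duration)
    return len(failure_times) * math.log(rate) - rate * total


def exponential_mle(failure_times, n_survivors, test_duration):
    """Closed-form MLE of the failure rate: number of failures / total time on test."""
    return len(failure_times) / total_time_on_test(failure_times, n_survivors, test_duration)


failures = [120.0, 210.0, 305.0, 340.0, 460.0]  # hypothetical failure times (hours)
survivors = 5                                   # hypothetical units still working at the end
duration = 500.0                                # hypothetical test length (hours)

rate = exponential_mle(failures, survivors, duration)
print(f"MLE mean life under test stress: {1 / rate:.0f} h")
print(f"Naive mean of failures only:     {sum(failures) / len(failures):.0f} h")
```

Ignoring the survivors would suggest a mean life of 287 hours, far below the censored-data estimate of 787 hours under this model. Carrying the estimate from test stress to normal operating conditions would additionally require an acceleration model, which this sketch does not attempt.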