Newsletter No. 389

No. 389, 19.12.2011

洞明集
The Modern Application of Statistical Studies

Possibly, Probably, Maybe

French mathematician and astronomer Pierre Simon de Laplace (1749–1827) said, 'The most important questions of life are indeed, for the most part, really only problems of probability.' Laplace was a giant of probability theory. In Théorie Analytique des Probabilités (1812), he went further: 'The theory of probabilities is at bottom nothing but common sense reduced to calculus.'

Probability (in Chinese 或然率, also 機率, and now usually 概率 on the Mainland) rests on a simple idea. Under the same conditions, a random event may or may not happen, and the number that measures how likely it is to happen is its probability. When you toss a coin, there is a 50% chance that it lands heads up and a 50% chance that it lands tails up, so the probability of each outcome is one half. Probability theory is the branch of mathematics that studies the regularities of random phenomena from a quantitative angle. It originated in 17th-century studies of gambling, the risks of seafaring and errors in measurement. It then developed rapidly as science and technology advanced, and it has both drawn on and enriched other disciplines. Today it is widely applied in modern technology, industrial production, and economic activities such as finance and insurance, and it is the theoretical foundation of mathematical statistics.

Laplace put it simply: probability theory is just common sense reduced to calculation. The fable of the Boy Who Cried Wolf is a case in point. Prof. Chan Ping-shing of the Department of Statistics used it at the CUHK Orientation Camp to explain the logic of statistics to new students. The first time the shepherd boy cried wolf, common sense told the villagers to believe him. The second time, common sense still gave them no reason to suspect a trick. Each time he cried wolf, the probability that he was lying became part of the villagers' reference data. Once they had collected enough data, their decision changed, and common sense told them to stop believing his cries. Professor Chan remarked, 'This fable is a classic application of statistics.'

The area of statistics that most interests Professor Chan is censored data. These are observations whose exact values lie outside the range that the data collection could capture. 'For example, suppose we want to measure the heights of a class of Primary 6 students, but the ruler is only 150 cm long. Of the 30 students in the class, 25 can have their actual heights measured. For the remaining five, we only know that they are taller than 1.5 m. The heights of these five students are censored data.'

Any data set can suffer from censoring, and in more serious cases the values become missing data. The most obvious examples are a questionnaire on which only eight of ten questions were answered, or one designed so poorly that questions that should have been asked were left out. Statisticians do not give up on censored or missing data, however. They use a variety of mathematical models to estimate the unobserved values together with all the data actually collected, in order to obtain more complete and accurate statistics. 'The most common method is to use conditional probability to infer the censored or missing values, feed them into the estimation, and then compare the result with the estimate obtained without them. If the two are not far apart, the inferred values are considered valid.'

Censored data are used mainly to assess the reliability and durability of industrial products. Everything from household items such as furniture and electrical appliances to cars, aircraft and even nuclear power plants relies on censored data to estimate lifetime distributions. These describe how a product performs under different conditions and how long it can be used. This is also the current focus of Professor Chan's research. He explained that it is hard to collect enough data to assess a product's lifetime under normal conditions, so industry now commonly uses accelerated life testing. Items are placed under extreme stress to speed up failure, and the resulting data are used to extrapolate the product's expected lifetime. Items that are still working when the test ends are called censored observations.

Professor Chan's research uses experimental test data to carry out maximum likelihood inference. From this inference, he identifies the most efficient stress test for each type of item under different criteria, helping industry design test plans that are time-saving, accurate and cost-effective. 'This can shorten the time it takes to bring a new product to market,' he said. 'But how a product should be designed to count as reliable and durable is sometimes a business decision. We only provide scientific data for reference.'
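The Boy Who Cried Wolf reasoning described above can be made concrete with a small Bayesian sketch. This is one possible formalization, not Professor Chan's own model. The villagers' belief about p, the chance that a given cry is false, is assumed to follow a Beta(1, 4) prior. This is a hypothetical choice meaning that they start out fairly trusting. Each false alarm updates that belief by Beta-Bernoulli conjugacy. The villagers are assumed to stop responding once the posterior mean of p exceeds one half, which is also a hypothetical threshold. With a uniform Beta(1, 1) prior, the same posterior mean reduces to Laplace's rule of succession, (s + 1)/(n + 2).

```python
from fractions import Fraction

# Hypothetical prior Beta(a, b) on p = P(a given cry is a lie).
# a = 1, b = 4 gives a prior mean of 1/5: the villagers start out trusting him.
PRIOR_A, PRIOR_B = 1, 4
# Hypothetical decision rule: stop believing once P(lie) exceeds one half.
THRESHOLD = Fraction(1, 2)


def posterior_mean(lies: int, truths: int, a: int = PRIOR_A, b: int = PRIOR_B) -> Fraction:
    """Posterior mean of p after observing `lies` false and `truths` genuine cries.

    Beta-Bernoulli conjugacy: the posterior is Beta(a + lies, b + truths),
    whose mean is (a + lies) / (a + b + lies + truths).
    """
    return Fraction(a + lies, a + b + lies + truths)


def cries_until_disbelief(threshold: Fraction = THRESHOLD) -> list[tuple[int, Fraction, bool]]:
    """Replay false alarms one by one; record (cry number, P(lie), still believed?)."""
    history = []
    lies = 0  # false alarms seen before the current cry
    while True:
        p_lie = posterior_mean(lies, truths=0)
        believe = p_lie <= threshold
        history.append((lies + 1, p_lie, believe))
        if not believe:
            return history
        lies += 1  # the villagers run up the hill and find no wolf


if __name__ == "__main__":
    for cry, p_lie, believe in cries_until_disbelief():
        print(cry, p_lie, believe)
    # 1 1/5 True
    # 2 1/3 True
    # 3 3/7 True
    # 4 1/2 True
    # 5 5/9 False
```

Under these assumed numbers, the villagers still respond to the fourth cry but ignore the fifth. A more sceptical prior or a lower threshold makes them give up sooner.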
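The following sketch illustrates the 150 cm ruler example and the quoted check of inferring the censored values, feeding them in, and comparing the results. The assumptions are ours, not the article's. Heights are assumed to be normally distributed, and the 25 measured heights are made-up illustrative numbers. The censored values are inferred with the expectation-maximization (EM) algorithm, which replaces each censored height by its conditional expectation E[H | H > 150] given the current parameter estimates. The article does not say which model or algorithm Professor Chan uses, or how close two estimates must be to count as 'not far apart', so the sketch only prints both.

```python
import math
from statistics import fmean, pstdev

LIMIT = 150.0  # ruler length in cm; taller pupils are right-censored at this value

# Made-up illustrative heights (cm) of the 25 pupils the ruler could measure.
MEASURED = [
    138.2, 141.5, 143.0, 144.8, 145.1, 139.7, 146.3, 142.2, 147.9, 148.5,
    136.4, 144.0, 149.2, 140.8, 145.6, 143.7, 147.1, 141.0, 146.8, 148.0,
    137.9, 144.4, 142.9, 149.6, 145.3,
]
N_CENSORED = 5  # pupils only known to be taller than LIMIT


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_sf(z: float) -> float:
    """Survival function P(Z > z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def censored_moments(mu: float, sigma: float, c: float) -> tuple[float, float]:
    """Return E[X | X > c] and E[X^2 | X > c] for X ~ Normal(mu, sigma^2)."""
    a = (c - mu) / sigma
    mills = normal_pdf(a) / normal_sf(a)  # inverse Mills ratio
    m1 = mu + sigma * mills
    m2 = mu * mu + sigma * sigma + sigma * (c + mu) * mills
    return m1, m2


def em_censored_normal(obs, k, c, tol=1e-10, max_iter=1000):
    """EM estimates of (mu, sigma) when k extra values are only known to exceed c."""
    n = len(obs) + k
    sum_x = sum(obs)
    sum_x2 = sum(x * x for x in obs)
    mu, sigma = fmean(obs), pstdev(obs)  # crude start from the measured values only
    for _ in range(max_iter):
        m1, m2 = censored_moments(mu, sigma, c)  # E-step: expected censored values
        new_mu = (sum_x + k * m1) / n  # M-step: complete-data normal MLE
        new_sigma = math.sqrt((sum_x2 + k * m2) / n - new_mu ** 2)
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


def compare_estimates():
    """Mean height without and with the inferred (imputed) censored heights."""
    mu, sigma = em_censored_normal(MEASURED, N_CENSORED, LIMIT)
    imputed, _ = censored_moments(mu, sigma, LIMIT)
    mean_without = fmean(MEASURED)
    mean_with = fmean(MEASURED + [imputed] * N_CENSORED)
    return mean_without, mean_with, imputed


if __name__ == "__main__":
    without, with_imputed, value = compare_estimates()
    print(f"measured pupils only: {without:.1f} cm")
    print(f"with 5 imputed heights of {value:.1f} cm: {with_imputed:.1f} cm")
```

Averaging only the 25 measured pupils can only understate the class mean, because every censored pupil is taller than every measured one. The imputed version uses the extra information that five pupils exceed 150 cm.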
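The censored observations from an accelerated life test, the items still working when the test ends, can enter a maximum likelihood estimate instead of being thrown away. For illustration only, the sketch below assumes that lifetimes at the test stress follow an exponential distribution. It also assumes that the test stops at a fixed time τ, which is known as Type I censoring. The ten-unit test and its failure times are hypothetical. Under this model, each failure at time t contributes λ·exp(-λt) to the likelihood, and each survivor contributes exp(-λτ). The log-likelihood is therefore r·log λ - λT, where r is the number of failures and T is the total time on test. Its maximum is at λ = r / T.

```python
def exponential_rate_mle(failure_times, n_survivors, test_end):
    """Maximum likelihood failure rate for exponential lifetimes with Type I censoring.

    log L(lam) = r*log(lam) - lam*T, where r is the number of failures and
    T = sum(failure_times) + n_survivors*test_end is the total time on test,
    so the maximum is at lam = r / T.
    """
    r = len(failure_times)
    if r == 0:
        raise ValueError("no failures observed: the likelihood has no interior maximum")
    if any(t > test_end for t in failure_times):
        raise ValueError("a failure time lies beyond the end of the test")
    total_time_on_test = sum(failure_times) + n_survivors * test_end
    return r / total_time_on_test


# Hypothetical test: 10 units at high stress, stopped after 100 hours.
FAILURES = [12.0, 27.0, 35.0, 51.0, 68.0, 90.0]  # hours; 6 units failed
SURVIVORS = 4  # still working at 100 hours: censored observations
TEST_END = 100.0

if __name__ == "__main__":
    rate = exponential_rate_mle(FAILURES, SURVIVORS, TEST_END)
    print(f"MLE mean life: {1 / rate:.1f} h")  # 683 / 6 -> 113.8 h
    print(f"failures only: {sum(FAILURES) / len(FAILURES):.1f} h")  # 283 / 6 -> 47.2 h
```

Ignoring the four survivors would put the mean life at 47.2 hours, whereas the estimate that uses the censored observations gives 113.8 hours. To extrapolate from the test stress to normal use, a separate stress-life model is needed, such as an Arrhenius or inverse power law relationship. That model is not shown here.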
