Newsletter No. 540 | 19.06.2019

Natural Language, Artificial Tongue

When we talk about speech technology, Stephen Hawking, the late physicist, may come to mind. Tracing his eyeball movements, his speech synthesizer picked up letters one by one and read out, rather mechanically, the words and sentences formed.

Prof. Helen Meng of the Faculty of Engineering spoke on 'Artificial Intelligence in Speaking and Listening for Learning and Well-Being' in the fifth lecture of 'The Pursuit of Wisdom' Public Lecture Series on 3 June. She shared with the 200 audience members present how artificial intelligence may be applied to enhance speech technology in aid of communication and language learning.

The speech recognition technology developed by Microsoft can transcribe human speech with an error rate of just 5%, matching human performance. When it is unsure about a certain sound, it takes the context of the utterance into account to make a better guess, as humans do.

Applying speech recognition technology, Professor Meng developed a language learning platform which can not only identify words accurately, but also detect and diagnose mispronunciations. 'Take, for example, the interdental fricative, a sound absent in both Cantonese and Putonghua. Speakers whose mother tongue is Cantonese may mispronounce "thick" [θɪk] as [fɪk], while Putonghua speakers may pronounce the same word as "sick" [sɪk],' said Professor Meng. The platform detects these discrepancies and generates corrective feedback: besides reading out the correct pronunciation, it uses animation to show how the sound is articulated.
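To make the approach concrete, here is a minimal sketch of phoneme-level mispronunciation detection: align the phonemes a recognizer heard against the canonical pronunciation and report the substitutions. It illustrates the general idea only, not Professor Meng's actual system; the phoneme strings and the alignment method are this editor's assumptions.

    # Toy mispronunciation detector: aligns a learner's recognized
    # phonemes against the canonical pronunciation and reports where
    # they diverge, e.g. the /θ/ -> /f/ substitution in "thick".
    from difflib import SequenceMatcher

    def detect_mispronunciations(canonical, recognized):
        """Return (expected, heard) phoneme pairs where they differ."""
        errors = []
        for op, i1, i2, j1, j2 in SequenceMatcher(
                a=canonical, b=recognized).get_opcodes():
            if op == "replace":
                errors.append((canonical[i1:i2], recognized[j1:j2]))
        return errors

    # "thick" is canonically [θ ɪ k]; a Cantonese speaker may say [f ɪ k].
    print(detect_mispronunciations(["θ", "ɪ", "k"], ["f", "ɪ", "k"]))
    # -> [(['θ'], ['f'])], i.e. feedback should target the fricative /θ/

A real system works from audio rather than ready-made phoneme strings, but the diagnosis step reduces to this kind of comparison.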
The challenges are amplified going from single words to sentences. The meaning of a sentence derives not only from the order of its words but also from whether the speaker enunciates parts of it in a special way, say by lengthening a syllable, stressing it or raising its pitch, to achieve a particular effect. AI-synthesized speech has to mimic this if it is to sound perfectly human. By studying how the same words manifest differently in natural speech, Professor Meng's team is able to deliver accurate and full meanings in synthesized speech.

Further, voice conversion technology can transport the characteristics of a human voice from one language into another. If the characteristics of Hawking's voice in his spoken English are captured and analysed, it is possible to re-present his voice speaking Chinese to a Chinese audience.

In Hong Kong, around 50,000 people suffer from speech impairment, and 40% of them are unable to communicate orally. The Hospital Authority has published a special book which enables those with speech impairment to communicate by pointing at images. Professor Meng has gone a step further and developed a customizable electronic version of the book, the e-Commu-Book. When the user taps an icon, the e-Commu-Book reads out the corresponding word or phrase. The user can edit the content, say by adding the picture and name of a family member, and the e-Commu-Book converts the text into speech. Collaborating with Microsoft, Professor Meng's team has produced the e-Commu-Book in 13 languages covering over 20 vernaculars. In recent years, Professor Meng has also been developing Cantonese smart speech systems to help patients with stroke or cerebral palsy speak.
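The e-Commu-Book's interaction model, an icon board mapped to phrases that are spoken aloud on demand, can be sketched in a few lines. Everything below is an illustrative stand-in: the board entries are invented, and the open-source pyttsx3 library merely substitutes for whatever speech engine the real product uses.

    # Toy icon-board reader in the spirit of the e-Commu-Book: tapping
    # an icon looks up its phrase and sends it to a text-to-speech engine.
    import pyttsx3

    board = {
        "water": "I would like some water, please.",
        "pain": "I am in pain.",
        # Personalized entry, e.g. a family member added by the user.
        "mary": "Please call Mary.",
    }

    def tap(icon: str) -> None:
        """Simulate tapping an icon: read its phrase aloud."""
        engine = pyttsx3.init()
        engine.say(board[icon])
        engine.runAndWait()

    tap("water")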
Like other emerging technologies, speech technology comes with security problems. 'Some security systems use speech for identification, making speech synthesis a convenient tool for sabotage. Our attention is also turned to creating "shields" to keep synthesized speech distinct and distinguishable from human speech,' said Professor Meng.
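Such a 'shield' is, at heart, a binary classifier trained to separate synthetic from natural recordings. The sketch below is deliberately naive and entirely hypothetical: real anti-spoofing systems use far richer acoustic features and models, and the random vectors here are placeholders for features extracted from labelled recordings.

    # Naive anti-spoofing sketch: logistic regression over placeholder
    # acoustic feature vectors (0 = human speech, 1 = synthesized).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    human = rng.normal(0.0, 1.0, size=(100, 8))       # stand-in features
    synthetic = rng.normal(0.5, 1.0, size=(100, 8))   # stand-in features
    X = np.vstack([human, synthetic])
    y = np.array([0] * 100 + [1] * 100)

    shield = LogisticRegression().fit(X, y)
    clip = rng.normal(0.5, 1.0, size=(1, 8))          # an unseen clip
    print("probability synthetic:", shield.predict_proba(clip)[0, 1])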
M. Mak