New AI Approach by CUHK Engineering Investigates Multiple Gene Regulatory Mechanisms Concurrently for the Advancement of Biomedical Research

The following article was originally published in CUHK Press Releases on 30 August 2020.

Kevin Yip

A research team from the Department of Computer Science and Engineering (CSE) at The Chinese University of Hong Kong (CUHK) has developed a new Gene Expression Embedding frameworK (GEEK), which applies artificial intelligence techniques from machine learning and natural language processing to study the regulation of gene expression. Whereas previous work focused on one or a few regulatory mechanisms at a time, the new framework can study the joint effects of many mechanisms simultaneously. A research article describing the study has been published in the renowned international science journal Nature Machine Intelligence. The framework may help researchers investigate the causes of cancer and develop treatments.

Each human body contains tens of trillions of cells. Although they mostly share the same DNA sequences, their gene activities can be markedly different. Such activities, referred to as “gene expression”, are affected by many regulatory mechanisms, such as transcription factor binding and protein interactions. In 2017, Prof. Kevin Yip from CUHK CSE and his research team studied one of these mechanisms, which involves regulatory elements called enhancers. They investigated how enhancers are related to gene expression and applied the results to discover three genes potentially related to liver cancer. This and other similar studies considered only individual gene regulatory mechanisms, and therefore could not fully capture the complex interplay between different mechanisms.

Prof. Yip used a metaphor to explain the intricate relationships among gene regulatory mechanisms. He said, “If you fail to turn on an electronic appliance using a remote controller, it seems like there is a problem with the controller, but the problem may also lie with the receiver or compatibility issues between the two. If we have a tool that can analyse the different components at the same time, it would be much easier to identify the root cause of the problem.”

The GEEK framework proposed by Prof. Yip’s team makes use of machine learning and natural language processing methods, treating genes as “words” and capturing their relationships in “sentences”. In the published study, GEEK was applied to several diverse gene regulatory mechanisms, including contacts in three-dimensional genome architecture, protein interactions, genomic neighborhoods and broad chromatin accessibility domains. The results showed that gene expression could be better explained when these mechanisms were modeled together than when they were considered separately.
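The “genes as words” idea can be illustrated with a toy sketch. This is not GEEK’s published implementation: the gene names and edges are hypothetical, and the choices to build “sentences” from random walks over a graph that merges edges from several mechanisms, and to represent each gene by co-occurrence counts (a simple count-based stand-in for learned word embeddings), are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy data: gene names and links are invented for illustration,
# one edge list per (assumed) regulatory mechanism.
MECHANISM_EDGES = {
    "3d_contact": [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")],
    "protein_interaction": [("GENE_B", "GENE_C"), ("GENE_E", "GENE_F")],
    "genomic_neighborhood": [("GENE_D", "GENE_E")],
}


def build_graph(edge_sets):
    """Merge edges from every mechanism into one undirected adjacency map."""
    graph = defaultdict(set)
    for edges in edge_sets.values():
        for u, v in edges:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def random_walks(graph, walks_per_gene=20, walk_length=6, seed=0):
    """Generate gene 'sentences' via uniform random walks (seeded for reproducibility)."""
    rng = random.Random(seed)
    sentences = []
    for start in sorted(graph):
        for _ in range(walks_per_gene):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(sorted(graph[walk[-1]])))
            sentences.append(walk)
    return sentences


def cooccurrence_vectors(sentences, window=2):
    """Count how often each gene appears within `window` positions of each other gene."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, gene in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[gene][sentence[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    merged = cooccurrence_vectors(random_walks(build_graph(MECHANISM_EDGES)))
    contact_only = cooccurrence_vectors(
        random_walks(build_graph({"3d_contact": MECHANISM_EDGES["3d_contact"]}))
    )
    print("A~C, all mechanisms:", round(cosine(merged["GENE_A"], merged["GENE_C"]), 3))
    print("A~C, 3D contacts only:", cosine(contact_only["GENE_A"], contact_only["GENE_C"]))
```

Because the walks cross edges contributed by different mechanisms, GENE_A and GENE_C share context (GENE_B) in the merged graph even though no single mechanism links them directly; with the 3D-contact edges alone, their vectors do not overlap at all. The sketch only conveys why modeling mechanisms jointly can surface relationships that separate analyses miss.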

Cancer is caused by mutations that lead to abnormal cell proliferation. “GEEK represents a novel way to study gene expression in different types of cells, including cancer cells,” said Prof. Yip. “We will work closely with medical experts to try to explain some causes of liver cancer using GEEK. In the long run, we hope to extend our research to other cancer types and contribute to the development of new prevention and treatment methods.”

Among cancer treatments, immunotherapies have attracted considerable attention because they are far more effective than conventional therapies for some cancer types. Yet the treatment outcome varies from patient to patient. Prof. Yip hopes that artificial intelligence can be used in the future to predict patients’ responses to immunotherapies, which would improve treatment precision and reduce the burden on patients.

The research project was supported by the General Research Fund of the University Grants Committee. Prof. Yip’s team took one and a half years to produce the results. Prof. Yip has more than ten years of experience in gene regulation research, and he was one of the first to use machine learning and natural language processing to study gene regulation.


This article is reproduced from The Chinese University of Hong Kong press release of 30 August 2020.

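A research team from the Department of Computer Science and Engineering at The Chinese University of Hong Kong (CUHK) has applied artificial intelligence techniques such as machine learning and natural language processing to the study of gene expression regulation, developing the novel Gene Expression Embedding frameworK (GEEK). GEEK can study the effects of multiple regulatory mechanisms on gene expression at the same time, breaking away from the traditional research approach of considering only one or a few mechanisms. The paper has been published in the authoritative international science journal Nature Machine Intelligence, and the findings may be extended to exploring the causes and treatment of cancer, advancing medical research.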

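The human body has tens of trillions of cells. Although they all carry the same chromosomes, their gene expression can be strikingly different. Gene expression is regulated by many mechanisms, including transcription factors and interactions between proteins. In 2017, Prof. Kevin Yip (葉旭立) of the CUHK Department of Computer Science and Engineering led his team in studying data on enhancers and genes in the genome. They uncovered specific patterns linking the two and applied them to liver cancer research, identifying three groups of genes that may trigger liver cancer. However, this and other related studies considered only a small number of gene regulatory mechanisms and could not fully capture the complex interactions among them.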

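Prof. Yip said: “Most past studies focused on individual gene regulatory mechanisms, but in reality the mechanisms influence one another, and the relationships among them are very subtle. It is like an electrical appliance that cannot be switched on with its remote control: on the surface the remote seems to be at fault, but the problem could also lie with the receiver, or the remote and the appliance may be incompatible. If one tool can process and analyse multiple mechanisms together, it becomes much easier to find the root of the problem.”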

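Prof. Yip’s team proposed using machine learning and natural language processing techniques from artificial intelligence to treat genes as words and then select important data for analysis. The GEEK framework built by the team can study the relationships between gene expression and multiple regulatory mechanisms at once, including contacts between different DNA regions in the three-dimensional genome architecture, interactions between proteins, genomic neighborhoods and broad chromatin accessibility. The results showed that when all the data were analysed together under the GEEK framework, integrating multiple mechanisms, the gene regulatory patterns revealed were clearly more comprehensive and complete than those obtained using a single mechanism or only a few.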

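Genetic mutations cause abnormal cell proliferation, which leads to cancer. Prof. Yip said: “The GEEK framework is a brand-new tool for exploring gene regulation. We will continue our research together with medical experts, applying GEEK to understand the causes of liver cancer and later extending it to other cancers. We hope to help the medical community identify the causes of cancer and thereby develop more effective prevention and treatment methods.”

New cancer treatments keep emerging, and immunotherapy is currently the “new favourite”. Although it is far more effective than conventional therapies for certain cancers, its outcome varies from person to person, and it does not work for every patient. Prof. Yip said: “Looking ahead, scientists can make good use of artificial intelligence to accurately predict each patient’s response to immunotherapy, improving the precision of treatment and reducing patients’ suffering and the burden of trial treatments.”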

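The research was supported by the General Research Fund (GRF) of the University Grants Committee, and the team achieved its breakthrough results in just one and a half years. Prof. Yip has accumulated more than ten years of research experience in gene expression regulation, and his team was among the first to draw on machine learning and natural language processing models for this research.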

A schematic showing how machine learning and natural language processing can be used to investigate multiple gene regulatory mechanisms.