AI Tool Pinpoints Genetic Mutations That Cause Disease

2023-10-07 02:13:24

Google DeepMind has wielded its revolutionary protein-structure-prediction AI in the hunt for genetic mutations that cause disease.

A new tool based on the AlphaFold network can accurately predict which mutations in proteins are likely to cause health conditions — a challenge that limits the use of genomics in healthcare.

The AI network — called AlphaMissense — is a step forward, say researchers who are developing similar tools, but not necessarily a sea change. It is one of many techniques in development that aim to help researchers, and ultimately physicians, to ‘interpret’ people’s genomes to find the cause of a disease. But tools such as AlphaMissense — which is described in a 19 September paper in Science — will need to undergo thorough testing before they are used in the clinic.

Many of the genetic mutations that directly cause a condition, such as those responsible for cystic fibrosis and sickle-cell disease, tend to change the amino acid sequence of the protein they encode. But researchers have observed only a few million of these single-letter ‘missense mutations’. Of the more than 70 million possible in the human genome, only a sliver have been conclusively linked to disease, and most seem to have no ill effect on health.

So when researchers and doctors find a missense mutation they’ve never seen before, it can be difficult to know what to make of it. To help interpret such ‘variants of unknown significance’, researchers have developed dozens of different computational tools that can predict whether a variant is likely to cause disease. AlphaMissense incorporates existing approaches to the problem, which is increasingly being addressed with machine learning.

Locating mutations

The network is based on AlphaFold, which predicts a protein structure from an amino-acid sequence. But instead of determining the structural effects of a mutation — an open challenge in biology — AlphaMissense uses AlphaFold’s ‘intuition’ about structure to identify where disease-causing mutations are likely to occur within a protein, Pushmeet Kohli, DeepMind’s vice-president of Research and a study author, said at a press briefing.

AlphaMissense also incorporates a protein language model: a type of neural network inspired by large language models like ChatGPT, but trained on millions of protein sequences instead of words. Such models have proven adept at predicting protein structures and designing new proteins. They are useful for variant prediction because they have learned which sequences are plausible and which are not, Žiga Avsec, the DeepMind research scientist who co-led the study, told journalists.

DeepMind’s network seems to outperform other computational tools at discerning variants known to cause disease from those that don’t. It also does well at spotting problem variants identified in laboratory experiments that measure the effects of thousands of mutations at once. The researchers also used AlphaMissense to create a catalogue of every possible missense mutation in the human genome, determining that 57% are likely to be benign and that 32% may cause disease.
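The two percentages above do not add up to 100%, which implies that roughly 11% of possible missense mutations were left without a confident call. The short Python sketch below works through that arithmetic. It assumes a round total of 70 million, taken as a lower bound from the "more than 70 million" figure quoted earlier, so the counts are approximate; the function and label names are illustrative and do not come from the study.

```python
# Rough arithmetic for the catalogue figures quoted in the article.
# Assumption: 70 million is a round lower bound ("more than 70 million").
TOTAL_MISSENSE = 70_000_000
PERCENT_LIKELY_BENIGN = 57
PERCENT_LIKELY_PATHOGENIC = 32


def split_catalogue(total, pct_benign, pct_pathogenic):
    """Return approximate counts per class, plus the unclassified remainder."""
    pct_unclassified = 100 - pct_benign - pct_pathogenic
    return {
        "likely_benign": total * pct_benign // 100,
        "likely_pathogenic": total * pct_pathogenic // 100,
        "unclassified": total * pct_unclassified // 100,
    }


counts = split_catalogue(
    TOTAL_MISSENSE, PERCENT_LIKELY_BENIGN, PERCENT_LIKELY_PATHOGENIC
)
for label, n in counts.items():
    print(f"{label}: about {n / 1e6:.1f} million")
```

On these assumptions, about 40 million variants would fall in the likely benign group, about 22 million in the likely disease-causing group, and close to 8 million would remain unclassified.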

Clinical support

AlphaMissense is an advance over existing tools for predicting the effects of mutations, “but not a gigantic leap forward,” says Arne Elofsson, a computational biologist at Stockholm University.

Its impact won’t be as significant as AlphaFold, which ushered in a new era in computational biology, agrees Joseph Marsh, a computational biologist at the MRC Human Genetics Unit in Edinburgh, UK. “It’s exciting. It’s probably the best predictor we have right now. But will it be the best predictor in two or three years? There’s a good chance it won’t be.”

Computational predictions currently have a minimal role in diagnosing genetic diseases, says Marsh, and recommendations from physicians’ groups say that these tools should provide only supporting evidence in linking a mutation to a disease. AlphaMissense confidently classified a much larger proportion of missense mutations than have previous methods, says Avsec. “As these models get better, then I think people will be more inclined to trust them.”

Yana Bromberg, a bioinformatician at Emory University in Atlanta, Georgia, emphasizes that tools such as AlphaMissense must be rigorously evaluated, using good performance metrics, before ever being applied in the real world.

For example, an exercise called the Critical Assessment of Genome Interpretation (CAGI) has benchmarked the performance of such prediction methods for years against experimental data that has not yet been released. “It’s my worst nightmare to think of a doctor taking a prediction and running with it, as if it’s a real thing, without evaluation by entities such as CAGI,” Bromberg adds.

This article is reproduced with permission and was first published on September 19, 2023.
