Meta’s new Sphere AI tool will check the accuracy of Wikipedia entries

2022-07-19

Researchers at Meta have created a new artificial intelligence tool, Sphere, that can access 134 million web pages and use them as a knowledge base for building AI systems. The first organisation to use the tool, which is being made available on an open-source licence, is Wikipedia, which will deploy it to scan hundreds of thousands of citations on the online encyclopedia to check they support the corresponding claims.

Meta’s new Sphere AI tool is being used by Wikipedia to check citations. (Photo by brightstars/iStock)

The dataset that can be accessed through Sphere is an order of magnitude larger than any previously released for AI research, Meta claims. For Wikipedia, it will flag questionable citations, highlighting those that human editors need to evaluate and change. If a citation proves irrelevant, the model can also suggest alternative sources that do back up the claim in the text.

How will Meta Sphere work with Wikipedia?

Wikipedia relies heavily on citations written in the footnotes of its articles to verify claims made within the text. Its 6.5 million articles are updated regularly by volunteers, and sometimes the citations don’t back up the claims being made.

Using Sphere, Meta says the goal is to provide a platform for Wikipedia editors that can “systematically spot citation issues” and correct the citation or the content of the corresponding article at scale, rather than requiring editors to trawl through articles manually, one by one.


The tool is built on the back of an existing Meta AI model that integrates information retrieval and verification, for which neural networks were trained to learn more nuanced representations of language so they could pinpoint source material. The latest iteration significantly increases the size of the pool of data the model can draw from.

This new version of the model, Sphere, references up to 134 million web pages. For Wikipedia, Meta fed it four million claims from the online encyclopedia, teaching it to zero in on a single source from the vast pool to validate each statement.
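Meta has open-sourced Sphere, but for readers unfamiliar with this kind of system, the retrieval step described here resembles standard dense-passage retrieval: encode every candidate passage as a vector, then find the nearest neighbours of an encoded claim. The following is a minimal sketch of that idea using off-the-shelf components (a sentence-transformers encoder and a FAISS index); the model name and passages are illustrative assumptions, not Sphere’s actual stack.

```python
# Minimal dense-retrieval sketch: index web passages, then look up
# candidate sources for a claim. Illustrative only -- not Sphere's code.
import faiss
from sentence_transformers import SentenceTransformer

# Any sentence-embedding model works here; this one is small and public.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "Mount Everest is the highest mountain above sea level.",
    # ... in Sphere's case, passages drawn from 134 million web pages
]

# Encode passages (normalised, so inner product = cosine similarity)
# and build a flat inner-product index over the vectors.
vectors = encoder.encode(passages, normalize_embeddings=True)
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)

# Retrieve the top-k candidate sources for a Wikipedia-style claim.
claim = "The Eiffel Tower opened in 1889."
query = encoder.encode([claim], normalize_embeddings=True)
scores, ids = index.search(query, 2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {passages[i]}")
```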

The index produced by this retrieval stage passes potential sources for a Wikipedia article to an evidence-ranking model, which compares the text with the citation and determines whether the citation matches the claim and is a viable option for inclusion in the footnotes.
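The evidence-ranking step can be approximated with an off-the-shelf natural-language-inference cross-encoder, which scores whether a passage entails (supports) a claim. The sketch below, continuing the example above, is again an illustration under that assumption, not the verification model Meta actually trained.

```python
# Minimal evidence-ranking sketch: score each retrieved passage on how
# well it supports the claim using a public NLI cross-encoder.
# Illustrative only -- not Meta's trained model.
from sentence_transformers import CrossEncoder

ranker = CrossEncoder("cross-encoder/nli-deberta-v3-base")

claim = "The Eiffel Tower opened in 1889."
candidates = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "Mount Everest is the highest mountain above sea level.",
]

# Per this model's card, it outputs logits in the order
# (contradiction, entailment, neutral); a dominant entailment score
# suggests the passage backs the claim.
scores = ranker.predict([(passage, claim) for passage in candidates])
for passage, logits in zip(candidates, scores):
    contradiction, entailment, neutral = logits
    verdict = "supports" if entailment == max(logits) else "does not support"
    print(f"{verdict}: {passage}")
```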


“Usually, to develop models like this, the input might be just a sentence or two,” a Meta statement said. “We trained our models with complicated statements from Wikipedia, accompanied by full websites that may or may not support the claims. As a result, our models have achieved a leap in performance in terms of detecting the accuracy of citations.”

The implications of Sphere for Meta and Wikimedia Enterprise

All of this work improves Sphere and could, in turn, enable new AI systems that make sense of the real world, according to Meta.

“Open source projects like these, which teach algorithms to understand dense material with an ever-higher degree of sophistication, help AI make sense of the real world,” a Meta blog post said. “While we can’t yet design a computer system that has a human-level comprehension of language, our research creates smarter, more flexible algorithms. This improvement will only become more important as we rely on computers to interpret the surging volume of text citations generated each day.”

Indeed, the company also argues that the breadth of sources used by Sphere means it provides more accurate results than other comparable systems. “Because Sphere can access far more public information than today’s standard models, it could provide useful information that they cannot,” the blog post added.


For the Wikimedia Foundation, the non-profit organisation which oversees Wikipedia, ensuring accuracy is more important than ever before. Last month it launched Wikimedia Enterprise, a commercial product for businesses that require a high level of access to its databases.

Shani Evenstein Sigalov, a lecturer and researcher at Tel Aviv University and vice-chair of the Wikimedia Foundation’s board of trustees, described the new technique as a “powerful example of machine learning tools that can help scale the work of volunteers”.

“Improving these processes will allow us to attract new editors to Wikipedia and provide better, more reliable information to billions of people around the world,” she says.

Read more: Knowledge management is the fastest-growing area of AI spend

