‘ChatGPT Detector’ Catches AI-Generated Papers with Unprecedented Accuracy

A machine-learning tool can easily spot when chemistry papers are written using the chatbot ChatGPT, according to a study published on 6 November in Cell Reports Physical Science. The specialized classifier, which outperformed two existing artificial intelligence (AI) detectors, could help academic publishers to identify papers created by AI text generators.

“Most of the field of text analysis wants a really general detector that will work on anything,” says co-author Heather Desaire, a chemist at the University of Kansas in Lawrence. But by making a tool that focuses on a particular type of paper, “we were really going after accuracy.”

The findings suggest that efforts to develop AI detectors could be boosted by tailoring software to specific types of writing, Desaire says. “If you can build something quickly and easily, then it’s not that hard to build something for different domains.”

The elements of style

Desaire and her colleagues first described their ChatGPT detector in June, when they applied it to Perspective articles from the journal Science. Using machine learning, the detector examines 20 features of writing style, including variation in sentence lengths and the frequency of certain words and punctuation marks, to determine whether an academic scientist or ChatGPT wrote a piece of text. The findings show that “you could use a small set of features to get a high level of accuracy,” Desaire says.
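
The paper’s exact 20-feature set is not reproduced here, but a minimal sketch of this kind of stylometric feature extraction, using a handful of illustrative features in the same spirit (sentence-length statistics plus word and punctuation rates), might look like this in Python:

```python
import re
import statistics

def stylometric_features(text: str) -> list[float]:
    """Reduce a passage to a small vector of style features.

    Illustrative only: the published detector uses 20 features,
    and its exact list is not reproduced in this article.
    """
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    words = text.lower().split()
    n_words = max(len(words), 1)

    return [
        statistics.mean(lengths) if lengths else 0.0,            # mean sentence length
        statistics.stdev(lengths) if len(lengths) > 1 else 0.0,  # variation in sentence length
        text.count(",") / n_words,                               # comma rate
        text.count(";") / n_words,                               # semicolon rate
        text.count("(") / n_words,                               # parenthesis rate
        sum(w.strip(".,;()") in {"however", "although", "but"}
            for w in words) / n_words,                           # contrastive-connective rate
    ]
```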

In the latest study, the detector was trained on the introductory sections of papers from ten chemistry journals published by the American Chemical Society (ACS). The team chose the introduction because this section of a paper is fairly easy for ChatGPT to write if it has access to background literature, Desaire says. The researchers trained their tool on 100 published introductions to serve as human-written text, and then asked ChatGPT-3.5 to write 200 introductions in ACS journal style. For 100 of these, the chatbot was provided with the papers’ titles, and for the other 100, it was given their abstracts.
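
Given such feature vectors, the train-and-evaluate setup could be mocked up as below. This is a sketch under stated assumptions, not the authors’ pipeline: the placeholder strings stand in for the real corpora, `stylometric_features` is the illustrative function from the sketch above, and scikit-learn’s LogisticRegression stands in for the study’s model, which this article does not specify.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder corpora (replace with real texts): the study used 100
# human-written introductions and 200 ChatGPT-generated ones
# (100 prompted with titles, 100 with abstracts).
human_intros = ["An example human-written introduction. It varies in rhythm and length."] * 100
chatgpt_intros = ["An example generated introduction. It is uniform."] * 200

X = [stylometric_features(t) for t in human_intros + chatgpt_intros]
y = [0] * len(human_intros) + [1] * len(chatgpt_intros)  # 0 = human, 1 = AI

# Hold out a stratified test set so both classes appear in evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

A small feature vector like this is cheap to compute, which is consistent with Desaire’s point that a narrowly targeted detector can be built quickly for a new domain.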

When tested on introductions written by people and those generated by AI from the same journals, the tool identified ChatGPT-3.5-written sections based on titles with 100% accuracy. For the ChatGPT-generated introductions based on abstracts, the accuracy was slightly lower, at 98%. The tool worked just as well with text written by ChatGPT-4, the latest version of the chatbot. By contrast, the AI detector ZeroGPT identified AI-written introductions with an accuracy of only about 35–65%, depending on the version of ChatGPT used and whether the introduction had been generated from the title or the abstract of the paper. A text-classifier tool produced by OpenAI, the maker of ChatGPT, also performed poorly — it was able to spot AI-written introductions with an accuracy of around 10–55%.

The new ChatGPT catcher even performed well with introductions from journals it wasn’t trained on, and it caught AI text that was created from a variety of prompts, including one designed to confuse AI detectors. However, the system is highly specialized for scientific journal articles. When presented with real articles from university newspapers, it failed to recognize them as being written by humans.

Wider issues

What the authors are doing is “something fascinating,” says Debora Weber-Wulff, a computer scientist who studies academic plagiarism at the HTW Berlin University of Applied Sciences. Many existing tools try to determine authorship by searching for the predictive text patterns of AI-generated writing rather than by looking at features of writing style, she says. “I’d never thought of using stylometrics on ChatGPT.”

But Weber-Wulff points out that there are other issues driving the use of ChatGPT in academia. Many researchers are under pressure to quickly churn out papers, she notes, or they might not see the process of writing a paper as an important part of science. AI-detection tools will not address these issues, and should not be seen as “a magic software solution to a social problem.”

This article is reproduced with permission and was first published on November 6, 2023.
