Research Summaries Written by AI Fool Scientists

2023-01-15

An artificial-intelligence (AI) chatbot can write such convincing fake research-paper abstracts that scientists are often unable to spot them, according to a preprint posted on the bioRxiv server in late December [1]. Researchers are divided over the implications for science.

“I am very worried,” says Sandra Wachter, who studies technology and regulation at the University of Oxford, UK, and was not involved in the research. “If we’re now in a situation where the experts are not able to determine what’s true or not, we lose the middleman that we desperately need to guide us through complicated topics,” she adds.

The chatbot, ChatGPT, creates realistic and intelligent-sounding text in response to user prompts. It is a ‘large language model’, a system based on neural networks that learn to perform a task by digesting huge amounts of existing human-generated text. Software company OpenAI, based in San Francisco, California, released the tool on 30 November, and it is free to use.

Since its release, researchers have been grappling with the ethical issues surrounding its use, because much of its output can be difficult to distinguish from human-written text. Scientists have published a preprint [2] and an editorial [3] written by ChatGPT. Now, a group led by Catherine Gao at Northwestern University in Chicago, Illinois, has used ChatGPT to generate artificial research-paper abstracts to test whether scientists can spot them.

The researchers asked the chatbot to write 50 medical-research abstracts based on a selection published in JAMA, The New England Journal of Medicine, The BMJ, The Lancet and Nature Medicine. They then compared these with the original abstracts by running them through a plagiarism detector and an AI-output detector, and they asked a group of medical researchers to spot the fabricated abstracts.

Under the radar

The ChatGPT-generated abstracts sailed through the plagiarism checker: the median originality score was 100%, which indicates that no plagiarism was detected. The AI-output detector spotted 66% of the generated abstracts. But the human reviewers didn't do much better: they correctly identified only 68% of the generated abstracts and 86% of the genuine abstracts. They incorrectly identified 32% of the generated abstracts as being real and 14% of the genuine abstracts as being generated.
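The error rates reported above are just the complements of the reviewers' hit rates. A minimal sketch (not code from the preprint; the figures are the study's reported percentages) makes the relationship explicit:

```python
# Reviewer performance figures reported in the Gao et al. preprint.
correct_generated = 0.68  # fraction of generated abstracts correctly flagged as fake
correct_genuine = 0.86    # fraction of genuine abstracts correctly recognized as real

# Misclassification rates are the complements of the hit rates.
missed_fakes = 1 - correct_generated   # generated abstracts judged to be real
false_alarms = 1 - correct_genuine     # genuine abstracts judged to be generated

print(f"Generated abstracts mistaken for real: {missed_fakes:.0%}")
print(f"Genuine abstracts mistaken for fake: {false_alarms:.0%}")
```

Running this reproduces the 32% and 14% figures quoted in the article.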

“ChatGPT writes believable scientific abstracts,” say Gao and colleagues in the preprint. “The boundaries of ethical and acceptable use of large language models to help scientific writing remain to be determined.”

Wachter says that, if scientists can’t determine whether research is true, there could be “dire consequences”. As well as being problematic for researchers, who could be pulled down flawed routes of investigation, because the research they are reading has been fabricated, there are “implications for society at large because scientific research plays such a huge role in our society”. For example, it could mean that research-informed policy decisions are incorrect, she adds.

But Arvind Narayanan, a computer scientist at Princeton University in New Jersey, says: “It is unlikely that any serious scientist will use ChatGPT to generate abstracts.” He adds that whether generated abstracts can be detected is “irrelevant”. “The question is whether the tool can generate an abstract that is accurate and compelling. It can’t, and so the upside of using ChatGPT is minuscule, and the downside is significant,” he says.

Irene Solaiman, who researches the social impact of AI at Hugging Face, an AI company with headquarters in New York and Paris, has fears about any reliance on large language models for scientific thinking. “These models are trained on past information and social and scientific progress can often come from thinking, or being open to thinking, differently from the past,” she adds.

The authors suggest that those evaluating scientific communications, such as research papers and conference proceedings, should put policies in place to stamp out the use of AI-generated texts. If institutions choose to allow use of the technology in certain cases, they should establish clear rules around disclosure. Earlier this month, the Fortieth International Conference on Machine Learning, a large AI conference that will be held in Honolulu, Hawaii, in July, announced that it has banned papers written by ChatGPT and other AI language tools.

Solaiman adds that in fields where fake information can endanger people’s safety, such as medicine, journals may have to take a more rigorous approach to verifying information as accurate.

Narayanan says that the solutions to these issues should not focus on the chatbot itself, “but rather the perverse incentives that lead to this behaviour, such as universities conducting hiring and promotion reviews by counting papers with no regard to their quality or impact”.

This article is reproduced with permission and was first published on January 12, 2023.
