‘We feel awful about this’ – OpenAI fixes ChatGPT bug that may have breached GDPR

2023-03-26

OpenAI could be in breach of GDPR legislation after the titles assigned to users’ ChatGPT conversations were randomly exposed to other users without consent. The company described it as a “significant issue” with a third-party open-source library that has since been fixed. A legal expert said any regulatory action would depend on the level of harm caused by a user’s titles appearing in another user’s account, and on what information those titles contained.

ChatGPT generates titles automatically for each chat session that can be adapted by the user. (Photo: Ascannio/Shutterstock)

Co-founder and CEO Sam Altman disclosed the problem on Twitter, saying: “we feel awful about this”.

we had a significant issue in ChatGPT due to a bug in an open source library, for which a fix has now been released and we have just finished validating.

a small percentage of users were able to see the titles of other users’ conversation history.

we feel awful about this.

— Sam Altman (@sama) March 22, 2023

In ChatGPT, when a new conversation with the chatbot is started, an entry is created in the sidebar and, as the conversation continues, it is given an AI-generated title. The user can edit the text or delete the entry. A small group of users were mistakenly shown other users’ titles.

Since its launch in November 2022, ChatGPT has become one of the fastest-growing consumer apps in history, hitting 100 million monthly users in January. It has sparked a flurry of activity, with companies such as Microsoft, a major investor in OpenAI, and Google launching their own chatbots and integrating generative AI tools into their products.

It has also sparked calls for regulation, and for clarity on where the technology falls within legislation such as GDPR and the upcoming EU AI Act. ChatGPT is built on top of OpenAI’s GPT-4 multi-modal large language model, which was trained on data scraped from the internet, massive datasets from the likes of Wikipedia and law libraries, and other information not disclosed by the company.

Altman says there will be a “technical postmortem” into what caused the glitch. Information used in prompts and responses may be used to train the model, but only after personally identifiable information has been removed.

Need for regulation of AI

Countries around the world are actively exploring the impact of this type of technology and how to regulate it while ensuring user data is protected. The UK is also setting up a new task force to examine the impact of large language models on society, the economy and individuals.


Lilian Edwards, professor of law at Newcastle University, says the Information Commissioner’s Office (ICO) may examine the type of breach experienced by OpenAI to see if UK data was exposed. In the event of a breach, the regulator would most likely ask the company to ensure it doesn’t happen again rather than take enforcement action. Tech Monitor has asked the ICO for comment.


Caroline Carruthers, CEO and co-founder of Carruthers and Jackson, says protecting user data is a core requirement of any organisation, particularly a data-rich one like OpenAI, and that breaches such as this could erode confidence in its business. Worse, she adds, it also highlights the potential data pitfalls of AI.

“Platforms like ChatGPT rely on user data to function, but acquiring that data means users have to be able to trust that their information will be secure,” Carruthers says. “This should serve as a lesson to be learned to other businesses looking to utilise AI: you need to get your data governance basics right before you can graduate on to AI and ML.”

Ali Vaziri, legal director in the data and privacy team at Lewis Silkin, said whether the AI-generated titles being shared with other users constitutes a data protection issue depends on whether the original user can be identified from the titles alone. “If the only information available to those other users are the conversation history titles, unless the titles themselves contain information from which the original user can be identified, it probably won’t be a personal data breach as far as a loss of confidentiality is concerned.”

Even if the titles were to contain personally identifiable information, whether it becomes a regulatory issue would depend on the level of harm. “If harm to users is likely, then that will be the trigger for any regulatory notifications which might need to be made,” said Vaziri.

“However, data protection law also requires controllers to ensure the accuracy of personal data they process, so displaying the wrong conversation history titles to a user might amount to a breach of that principle; and since doing so may have affected the integrity of personal data in that user’s account, the incident might constitute a personal data breach on that basis,” he added.

Data privacy and control

Vlad Tushkanov, lead data scientist at Kaspersky, told Tech Monitor that users should have had “zero expectation of privacy”, as OpenAI warns that any conversation could be viewed by AI trainers and urges users not to share sensitive information in conversations. He urged users to “treat any interaction with a chatbot (or any other service, for that matter) as a conversation with a complete stranger: you don’t know where the content will end up, so refrain from revealing any personal or sensitive information about yourself or other people.”

Despite the warnings, some users have responded to Altman on Twitter claiming they had titles that included personal and “highly sensitive” information. The bigger issue, says Edwards, is the potential for sensitive information scraped from the internet to leak out in responses.

“It is well known these models leak personal data like sieves,” she warned, adding that “their training datasets contained infinite amounts of personal and often sensitive data and it may emerge randomly in response to a prompt at any point.”

Read more: These companies are creating ChatGPT alternatives

Topics in this article: AI, ChatGPT, OpenAI
