Databricks open-sources its Dolly large language AI model

2023-03-29

In an attempt to open up its technology to a wider audience, enterprise software company Databricks has released Dolly, a large language model, along with its associated training code, under an open-source licence. Despite being based on a much smaller underlying model, Dolly offers ChatGPT-like functionality, the company says, and can be run “in-house”.

Databricks says it was able to achieve similar chat-like functionality from an older, smaller language model. (Photo: rarrarorro/Shutterstock)

The move was inspired by the success of OpenAI’s natural language platform ChatGPT, which became one of the fastest-growing consumer apps within a couple of months of its release in November last year. It has since caused some of the world’s largest companies including Microsoft and Google to pivot and release generative and natural language AI tools.

“We show that anyone can take a dated off-the-shelf open source LLM and give it magical ChatGPT-like instruction-following ability by training it in 30 minutes on one machine, using high-quality training data,” Databricks wrote in a blog post explaining the decision.

It found that the type of instruction-following used in ChatGPT “does not seem to require the latest or largest models”, and claims that from just six billion parameters, compared to 175 billion in GPT-3 and many more in GPT-4 or Google’s PaLM, it was able to recreate the functionality of ChatGPT.

“We believe models like Dolly will help democratise LLMs, transforming them from something very few companies can afford into a commodity every company can own and customise to improve their products,” the company said.

Large language models: from LLaMA to Alpaca to Dolly

Developers such as OpenAI, Anthropic and AI21 Labs, as well as Microsoft, Google and IBM, charge end-users for access to their large language models through API calls. This can become expensive very quickly for anyone making a large number of calls on a regular basis. The alternative, training a comparable model from scratch, is an expensive endeavour that takes tens of thousands of GPU hours and trillions of words of training data.

Then Meta released the weights for its high-quality language model, LLaMA, to researchers. It had been trained using more than 80,000 GPU hours. Stanford University subsequently built Alpaca on top of LLaMA, fine-tuning it on a set of 50,000 human-like questions and answers, which led to it exhibiting ChatGPT-like behaviour despite the relatively small training dataset.

Dolly, from Databricks, delivers what the company describes as a “surprising degree of instruction-following capabilities”, but from a much smaller model. Where the Alpaca team demonstrated that a state-of-the-art model could be used as a chatbot engine, Databricks says even years-old models can be made to exhibit those same behaviours if fine-tuned on a small corpus of instruction training data.


“Dolly works by taking an existing open-source six-billion-parameter model from EleutherAI and modifying it ever so slightly to elicit instruction following capabilities such as brainstorming and text generation not present in the original model, using data from Alpaca,” the company explained.
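The fine-tuning recipe described above can be sketched at the data-preparation stage. The sketch below, which assumes the published Alpaca prompt template (the `### Instruction:` / `### Input:` / `### Response:` format), shows how {instruction, input, output} records would be formatted into training text before being fed to a fine-tuning run; the field names and template are from the Alpaca project, not from Databricks’ own training script, which may differ in detail.

```python
# Sketch: formatting Alpaca-style instruction records into training prompts.
# Assumes the Alpaca project's published template; Databricks' actual
# training code is a separate release and may not match exactly.

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{output}"
)

def format_record(record: dict) -> str:
    """Turn one {instruction, input, output} record into a training prompt."""
    if record.get("input"):
        return PROMPT_WITH_INPUT.format(**record)
    return PROMPT_NO_INPUT.format(
        instruction=record["instruction"], output=record["output"]
    )

example = {
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "Eat well, exercise regularly, and sleep enough.",
}
print(format_record(example))
```

A corpus of such prompts is small enough that, as Databricks claims, fine-tuning a 6-billion-parameter model on it fits within roughly 30 minutes on a single machine.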


The team was surprised it worked so well, given how much older and smaller the underlying model is than those provided by OpenAI or Google. “This suggests that much of the qualitative gains in state-of-the-art models like ChatGPT may owe to focused corpuses of instruction-following training data, rather than larger or better-tuned base models.”

“We’re calling the model Dolly — after Dolly the sheep, the first cloned mammal — because it’s an open-source clone of an Alpaca, inspired by a LLaMA. We’re in the earliest days of the democratisation of AI for the enterprise, and much work remains to be done, but we believe the technology underlying Dolly represents an exciting new opportunity for companies that want to cheaply build their own instruction-following models,” said Databricks in a blog post.

Using an open model rather than sending data to a centralised LLM makes sense for companies with highly sensitive and proprietary data. Handing that data over to a third party may be unpalatable, so the trade-off between model quality and cost on the one hand, and the security of an in-house model on the other, has to be weighed.

Dolly will be available on Databricks, with the trained weights available to anyone who wants to experiment with the model. This is the first in a series of announcements from the company, which is switching its focus to helping organisations harness large language models. “We believe in the incredible power of artificial intelligence to transform the productivity of every organisation and individual, and welcome you to join us on this journey. Stay tuned for more in this area in the coming weeks.”

Read more: UK AI regulation white paper dodges ChatGPT questions

Topics in this article: AI, Cloud, Databricks

