Compute power is becoming a bottleneck for developing AI. Here’s how you clear it.

2022-12-16 09:06:40

In less than a week, OpenAI’s first chatbot tool, ChatGPT, went viral, with billions of requests made to put the much-hyped system through its paces. Interest was so high that the company had to implement traffic management tools, including a queuing system and slowing down queries, in order to cope with demand. The incident highlights the vast amounts of compute power required to sustain large language models such as GPT-3, the system on which ChatGPT is built.

OpenAI has been forced to introduce a queuing system and other traffic shaping measures due to demand for ChatGPT

As this and other types of advanced AI become more commonplace and are put to use by businesses and consumers, the challenge will be to maintain sufficient compute capacity to support them. But this is easier said than done: one expert told Tech Monitor that a lack of compute power is already creating a bottleneck holding back AI development. Turning to supercomputers, or to entirely new hardware architectures, could offer potential solutions.

The scale of compute power required to run ChatGPT

A large language model such as GPT-3 requires a significant amount of energy and computing power for its initial training. This is partly because even the largest GPUs used to train such systems have limited memory capacity, so multiple processors must run in parallel.

Even querying a model through ChatGPT requires multi-core CPUs if it is to respond in real time. Processing power has therefore become a major barrier limiting how advanced an AI model can become.


GPT-3 is one of the largest language models ever created, with 175bn parameters. According to a research paper by Nvidia and Microsoft Research, “even if we are able to fit the model in a single GPU, the high number of compute operations required can result in unrealistically long training times”, with GPT-3 estimated to take 288 years to train on a single Nvidia V100 GPU.
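A rough back-of-the-envelope calculation shows how an estimate of that order arises. The figures below for training tokens, FLOPs per parameter per token and sustained V100 throughput are illustrative assumptions, not numbers taken from the article:

```python
# Rough sketch of why single-GPU training of GPT-3 is unrealistic.
# Assumptions (illustrative, not from the article): the common ~6 FLOPs per
# parameter per token rule of thumb, ~300bn training tokens, and a sustained
# throughput of ~35 TFLOPS on a single V100 (well below its peak).

params = 175e9             # GPT-3 parameter count
tokens = 300e9             # assumed number of training tokens
flops_per_param_token = 6  # assumed forward + backward cost rule of thumb

total_flops = params * tokens * flops_per_param_token  # ~3.2e23 FLOPs

sustained_flops = 35e12    # assumed sustained V100 throughput (FLOPs/s)
seconds = total_flops / sustained_flops
years = seconds / (3600 * 24 * 365)

print(f"Estimated single-GPU training time: {years:.0f} years")
# Prints roughly 285 years, the same order as the 288-year figure quoted above.
```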

Running processors in parallel is the most common way to speed things up, but it has its limitations: beyond a certain number of GPUs, the per-GPU batch size becomes too small, so adding more GPUs yields diminishing returns while increasing costs.
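To make the batch-size point concrete, here is a minimal sketch; the global batch of 2,048 sequences is an assumed figure chosen purely for illustration:

```python
# Illustrative only: under pure data parallelism a fixed global batch is
# split evenly across GPUs, so the per-GPU batch shrinks as GPUs are added.
global_batch = 2048  # assumed global batch size, in sequences

for num_gpus in (64, 256, 1024, 4096):
    per_gpu = global_batch / num_gpus
    print(f"{num_gpus:>5} GPUs -> {per_gpu:g} sequences per GPU per step")

# Beyond roughly 1,000 GPUs each device sees only a couple of sequences per
# step: the arithmetic units sit increasingly idle while the cost of
# synchronising the GPUs keeps growing.
```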

Hardware has already become a bottleneck for AI

Professor Mark Parsons, director of EPCC, the supercomputing centre at the University of Edinburgh, told Tech Monitor that a realistic limit is about 1,000 GPUs, and that the most viable way to handle that is through a dedicated AI supercomputer. The problem, he said, is that even if GPUs become faster, the bottleneck will still exist because the interconnects between GPUs, and between systems, are not fast enough.

“Hardware has already become a bottleneck for AI,” he declared. “After you have trained a subset of data on one of the GPUs you have to bring the data back, share it out and do another training session on all GPUs which takes huge amounts of network bandwidth and work off GPUs.”
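The cycle Parsons describes is essentially data-parallel training: each GPU computes gradients on its own slice of the batch, those gradients are averaged across all GPUs, and every GPU then applies the same update before the next step. Below is a minimal single-process simulation of that pattern, using NumPy and a toy regression model purely for illustration:

```python
# Minimal simulation of the cycle Parsons describes under data parallelism:
# each "worker" computes a gradient on its own shard of the batch, the
# gradients are averaged across workers (the exchange that consumes network
# bandwidth on real clusters), and every worker applies the same update.
# NumPy and a toy least-squares model stand in for GPUs and a language model.
import numpy as np

rng = np.random.default_rng(0)
num_workers = 4
true_weights = np.ones(8)
data = rng.normal(size=(64, 8))           # toy global batch
targets = data @ true_weights             # noise-free regression targets

weights = np.zeros(8)                     # shared model, replicated on every worker
shards = np.array_split(np.arange(64), num_workers)

def local_gradient(w, idx):
    """Least-squares gradient computed from one worker's shard only."""
    x, y = data[idx], targets[idx]
    return 2.0 * x.T @ (x @ w - y) / len(idx)

for step in range(200):
    grads = [local_gradient(weights, idx) for idx in shards]  # per-GPU compute
    avg_grad = np.mean(grads, axis=0)     # the averaging step: communication-bound
    weights -= 0.05 * avg_grad            # identical update on every replica

print("learned weights:", np.round(weights, 2))  # converges towards all ones
```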


“GPT and other large language models are being continuously developed and some of the shortcomings in training in parallel are being solved,” Parsons adds. “I think the big challenge is a supercomputing challenge which is how we improve data transfer between GPU servers. This isn’t a new problem and one that we’ve had in supercomputing for some time, but now AI developers are turning to supercomputers they are realising this issue.”


He isn’t sure how quickly interconnect speeds will be able to catch up, as the fastest currently in the works have a throughput of about 800Gbps, which “is not fast enough today”.

“Computer networking speeds are improving but they are not increasing at the speed AI people want them to as the models are growing at a faster rate than the speed is increasing,” he says. “All people selling high-performance interconnects have roadmaps, have done designs and know where we are going in next five years – but I don’t know if the proposed 800Gbps will be enough to solve this problem as the models are coming with trillions upon trillions of parameters.”
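To put the 800Gbps figure in context, here is a rough estimate of how long a single exchange of GPT-3-scale gradients would take over one such link. The FP16 gradient size and the single-link framing are simplifying assumptions; real clusters aggregate many links and overlap communication with compute:

```python
# Back-of-the-envelope only: time to push one full set of GPT-3-scale
# gradients through a single 800Gbps link. Assumes FP16 gradients
# (2 bytes per parameter) and ignores the multi-link topologies and
# communication/compute overlap that real systems use to hide this cost.
params = 175e9                            # GPT-3 parameter count
gradient_bytes = params * 2               # FP16: ~350 GB of gradients per step

link_bits_per_s = 800e9                   # an 800Gbps interconnect
link_bytes_per_s = link_bits_per_s / 8    # ~100 GB/s

seconds = gradient_bytes / link_bytes_per_s
print(f"~{seconds:.1f} s to move one full set of gradients")  # ~3.5 s
```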

He said it won’t be a major problem as long as AI developers continue to improve the efficiency of their algorithms. If they don’t manage to do that, there “will be a serious problem” and delays until the hardware can catch up with the demands of the software.

Will new architectures be needed to cope with AI?

OpenAI’s upcoming large language model, GPT-4, is the next due for release. While rumoured to be an order of magnitude more powerful than GPT-3, it is also thought to be aiming to deliver this increased capability for the same server load.

Mirco Musolesi, professor of computer science at University College London, told Tech Monitor that developing large language models further will require improved software and better infrastructure. A combination of the two, plus hardware not yet developed, will end the bottleneck, he believes.

“The revolution is also architectural, since the key problem is the distribution of computation in clusters and farms of computational units in the most efficient way,” Professor Musolesi says. “This should also be cost-effective in terms of power consumption and maintenance as well.

“With the current models, the need for large-scale architectures will stay there. We will need some algorithmic breakthroughs, possibly around model approximation and compression for very large models. I believe there is some serious work to be done there.”

The problem, he explained, is that AI is not well served by current computing architectures: AI workloads rely on certain types of computation, including tensor operations, that call for specialist systems, while current supercomputers tend to be more general purpose.

New AI supercomputers, such as the ones in development by Meta, Microsoft and Nvidia, will solve some of these problems, “but this is only one aspect of the problem,” said Musolesi. “Since the models do not fit on a single computing unit, there is the need of building parallel architectures supporting this type of specialised operations in a distributed and fault-tolerant way. The future will be probably about scaling the models further and, probably, the ‘Holy Grail’ will be about ‘lossless compression’ of these very large models.”

This will come at a huge cost, and to reach the “millisecond” speeds at which a search engine can deliver thousands of results, AI hardware and software will “require substantial further investment”.
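Musolesi’s point that these models “do not fit on a single computing unit” is easy to quantify. Below is a sketch of the raw weight memory for a 175bn-parameter model at different numeric precisions, compared against a single 80GB accelerator; the precision list and the 80GB figure are illustrative assumptions, and optimiser state, activations and caches would add substantially more:

```python
# Illustrative only: memory needed just to hold the weights of a
# 175bn-parameter model at common numeric precisions, versus a single
# accelerator with 80 GB of memory (an assumed figure). Optimiser state
# and activations are ignored and would widen the gap further.
params = 175e9
bytes_per_weight = {"FP32": 4, "FP16": 2, "INT8": 1, "INT4": 0.5}
device_memory_gb = 80

for precision, nbytes in bytes_per_weight.items():
    total_gb = params * nbytes / 1e9
    ratio = total_gb / device_memory_gb
    print(f"{precision}: {total_gb:,.0f} GB of weights (~{ratio:.1f}x one device)")
```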

He says new approaches will emerge, including new mathematical models requiring additional types of operations not yet known, although Musolesi adds that “current investments will also steer the development of these future models, which might be designed in order to maximise the utilisation of the computational infrastructures currently under development – at least in the short term”.

Read more: Will ChatGPT be used to write malware?

Topics in this article: AI, ChatGPT, GPT-3, OpenAI
