In a past installment of its JiWei Interview series, 爱集微 (JiWei) had the opportunity to interview Fabrizio Del Maffeo, CEO and co-founder of Axelera AI. The conversation covered a range of topics, including open-source RISC-V technology, the development of AI chips, in-memory computing, start-up business models, and access to data, and yielded a series of very thought-provoking answers.
Q: My first question is: what are the characteristics of in-memory computing compared with the von Neumann architecture, and what advantages does it have over traditional architectures?
A: Thanks for asking. The advantage of in-memory computing is that you can parallelize computation in a particular way, because essentially you take a memory array, which is typically large, anywhere from 260,000 elements up to a million elements, and use it as a compute engine. The benefit is a very high degree of parallelism, which means high throughput and little data movement. Because you compute inside the memory, it also means low power consumption and low cost, since the memory area and the compute elements are merged.
The overall area is therefore smaller, which lowers the cost of the chip. In-memory computing comes in two kinds: analog in-memory computing and digital in-memory computing. In analog in-memory computing, you exploit the relationship between current and voltage in the transistors of the memory cells to perform the computations of a neural network, the vector-matrix multiplications.
That is one approach, right? But when you process in the analog domain, the data comes in as a digital signal: you first have to convert it to analog, do the computation, and then convert the result back to digital. The problem with analog in-memory computing is that there is noise in the analog domain, and that noise changes the result of the calculation. A typical analog in-memory computing chip therefore does not have high accuracy; you have to fine-tune the network and the silicon to get a decent accuracy back. At Axelera we have analog in-memory computing technology, but we do not use it. We use digital in-memory computing, which is different, because then we need no digital-to-analog or analog-to-digital conversion and we do not compute in the analog domain. We simply take the SRAM cells and, next to each cell, embed a small compute element that performs the multiplication; an adder tree then performs the accumulation. That lets us compute in the digital domain and pack the memory and the computation into a small area, and the same structure also lets us parallelize the computation.
That means high throughput; low cost, because the chip is smaller; less data movement, which means low power consumption; and high accuracy, because we compute in the digital domain.
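As an illustration of the scheme described above (a minimal sketch in plain NumPy, not Axelera's actual circuit; all names are invented for the example): every stored weight is multiplied by the incoming activation right next to its memory cell, and an adder tree accumulates the partial products, which is exactly the multiply-and-accumulate pattern of a vector-matrix product.

```python
import numpy as np

def vector_matrix_multiply(x, W):
    """Toy model of one digital in-memory computing tile.

    x: input activation vector, shape (n,)
    W: weight matrix held in the memory array, shape (n, m)
    Each column acts like one output: multiplications happen "next to"
    every stored weight, then an adder tree sums the products.
    """
    outputs = np.empty(W.shape[1])
    for j in range(W.shape[1]):
        partial_products = x * W[:, j]       # one multiplier per memory cell
        outputs[j] = partial_products.sum()  # adder tree: digital accumulation
    return outputs

# Example: a 4-element input against a 4x3 weight array
x = np.array([1.0, 2.0, 3.0, 4.0])
W = np.arange(12, dtype=float).reshape(4, 3)
print(vector_matrix_multiply(x, W))  # matches x @ W
```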
Q: Do you think in-memory computing will extend to more general-purpose computing, or is it better suited to special-purpose computing?
A: Right, you use in-memory computing for exactly one thing: multiplying vectors by matrices. And if you look inside neural networks, recurrent neural networks, convolutional neural networks, LSTM networks, Transformer networks, 70% to 90% of the computation is just vector-matrix multiplication.
And you can do all of those vector-matrix multiplications with in-memory computing; it handles all of that. What it cannot run is the activation functions; you do not do those inside in-memory computing. In-memory computing only does multiplication and accumulation, that is, summing up numbers, and nothing more. But those operations account for 70% to 90% of the computation in a neural network, and that is a very important reason why we use it for deep learning in AI and machine learning.
In any other domain, though, you would not use in-memory computing, unless what you need is precisely vector-matrix multiplication. A short sketch after this answer makes the split concrete.
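To make the 70% to 90% figure concrete, here is a hedged sketch (illustrative names, plain NumPy, not any vendor's API) of a single fully connected layer: the vector-matrix multiply and accumulate is the part an in-memory computing array can execute, while the activation function has to run on a separate, conventional compute unit.

```python
import numpy as np

def dense_layer(x, W, b):
    # Part 1: vector-matrix multiply + accumulate -> the in-memory computing part
    pre_activation = x @ W + b
    # Part 2: activation function (here ReLU) -> done outside the memory array
    return np.maximum(pre_activation, 0.0)

x = np.random.rand(128)       # input activations
W = np.random.rand(128, 64)   # weights stored in the memory array
b = np.zeros(64)
y = dense_layer(x, W, b)
```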
Q: Do you think in-memory computing is a solution for breaking through the memory wall?
A: In-memory computing is a solution for vector-matrix multiplication, nothing more. To break the memory wall there are other approaches, such as near-memory computing, which is slightly different: you have a more generic compute element, very small, placed close to the memory.
That way, instead of one huge CPU and one huge memory, you have thousands of small CPUs with thousands of small memories next to them. I think that is the best solution to the memory wall, but it is not really in-memory computing, it is near-memory computing. The difference is that in near-memory computing you still have a memory array plus a compute element, whereas in in-memory computing you break the memory array apart and put compute elements inside the array. And in-memory computing can only be used for multiply-accumulate; it is of no use for anything else.
Q: Can an open-source technology like RISC-V be part of the vision of "democratizing artificial intelligence"?
A: Yes. In general, on the accelerator, we keep our software stack as open as we can. We use open-source code, we use TVM in the back end of the compiler, and in the firmware we use Zephyr, an open-source project backed by Intel. We have also tried oneAPI, and we try to use as much open source as possible while giving back to the community.
In the accelerator field, many people are very active in the RISC-V community, and we want to give back to that community. We want to develop things, to create our own architecture and products, but still based on open source. And I think that when I say we want to democratize AI, it also means we want a product that is powerful, usable, and low cost.
For example, the solution we designed delivers more than 200 TOPS of compute, and we are positioning it on a card priced at $149, because we want people to use it. We want to give people access to a powerful solution. There is always time to make money later; the first thing is to let people create great things with our technology. If they succeed, we succeed. So we think it is important to have something that is easy to use, high performance, and low cost, that you can buy online and get anywhere in the world, so that great products can be built everywhere. We want to unleash innovation.
Q: For AI accelerator chips, what are the advantages of adopting RISC-V?
A: The advantage is that we can control it. Because it is open source, we can design it and we can control it; we do not have to go back to anyone to ask for permission or for the source code of a compiler. Whichever vendor IP you use, whether it comes from Cadence, Synopsys, or Arm, it is the same: you never get access to everything, and you start to depend on them, which can become a liability. So in the long run, with RISC-V, you have complete control over your architecture. It is also a platform that has been tested by a large community, which is a good thing, and you can extend it and build on it. For example, we are developing a specific vector instruction set unit that will be integrated into our next-generation product. We can do it ourselves because we have the knowledge, and since it is an open platform we do not have to negotiate with a supplier to solve the problem.
Q: Do you think AI applications will become an important driving force for the RISC-V ecosystem?
A: I think it is easier to use RISC-V in application-specific chips than in general-purpose ones, because in an application-specific chip you can take RISC-V and optimize it for exactly what you want to do.
Then you only have to verify it against your own requirements. But if you want to use RISC-V as a general-purpose processor, to compete with the most advanced CPUs from Intel or AMD, that is a different story. It is much harder to do with RISC-V and requires far more resources and time, because it is a new architecture that has not yet been validated that extensively by everyone. Once a chip reaches that level of complexity, you need a whole ecosystem around it: you need drivers, and you need support from the community, from Microsoft, from Ubuntu, from Linux. In general it becomes much more difficult. So I think that, thanks to AI, RISC-V will grow, but it will take another 5 to 10 years to become a real general-purpose alternative to today's products. We will also have to give RISC-V some time before we see it in devices such as mobile phones.
Q: In terms of computing efficiency, data centers arguably have better infrastructure and more constant computing power. So why do we need edge AI?
A: As you said, you do not need edge AI for efficiency; that point is correct. Data centers are far more efficient than the edge because everything is concentrated, and utilization in particular is even higher there than the efficiency advantage alone, right? But you need edge AI because of privacy, data security, safety, and economics. Think about it: you cannot have a car that asks the cloud whether it should turn right or left. The car needs enough computing power to react in time, with essentially no latency and almost no round trip to the cloud, to whatever happens. Not to mention that in some areas there is no network coverage at all.
Second, it does not even make economic sense to send everything to the cloud. Think about surveillance, a sophisticated surveillance system with many high-resolution cameras. Sending all of that data to the cloud is extremely expensive, because 95% or 98% of it is useless. What matters is only the thing you want to detect or identify, say a bag someone dropped in a railway station, or a specific person on the move whom the police are looking for. For everything you do not need to know, why send all the data to the cloud? You can extract the right information at the edge, and it is cheaper to do so. Besides, in many areas the cloud has no coverage; in fact there may not even be a good network connection. So there is an infrastructure problem: you cannot solve everything by sending data to the cloud, and that is where edge computing makes sense. It is necessary for many different applications: drones, robotics, automotive, and even surveillance.
Q: What problems do we still need to overcome for edge AI solutions? You have already mentioned power consumption, since the platform may not have much power available. What about operating conditions, lighting, latency, cost, or maintainability?
A: Yes, for me the obstacles are different. In the cloud there are only a few players: in China, for example, there are perhaps two, three, or four cloud providers, and the same is true in the United States and in Europe. In terms of providers, China and the United States lead in cloud computing, with a handful of companies building data centers and offering services.
So it is easy to design a technology and offer it to them, because you face one big customer with one list of features they need, and so on. But when it comes to edge AI, you have a thousand or several thousand customers, each asking for something different, and many of them do not have the background to understand your technology or to tune it to their needs.
The problems you need to solve at the edge are therefore different. If you want edge computing to succeed, you clearly need cost-effective hardware, because edge customers are more price-sensitive than cloud customers. You need power efficiency, because you have constraints; a data center has essentially no energy constraint, it usually has a power plant nearby, whereas at the edge you do have limits. So you have to focus on efficiency, but also on usability: you need to offer something that is plug and play. Ninety percent of edge customers cannot have the kind of engineers Baidu can have; the situation is different, right, because they are small and medium-sized companies.
So you need to give them the whole software stack and all the tools, so that they can use your solution efficiently in an easy, simple way. That is why usability matters. Today at the edge, for example, NVIDIA's AI hardware is very strong in terms of performance and platform, but it is too expensive to scale. You cannot put a $1,000 chip inside a robot you intend to sell for $500, right? That simply does not work.
So I see solutions that are good but expensive, and solutions that are cheap but hard to use. Finding a good compromise is what matters.
Q: Can you tell us more about maintainability? And how important is it for an edge AI chip, and the edge AI solution built around it, to stay easy to use?
A: First of all, customers use the cloud for everything, even to train their algorithms. If you are a small or medium-sized enterprise and you want to do something in AI, you have to connect to Amazon, Baidu, or whoever. Whichever provider you choose, you end up going back to a cloud system and using the typical tools that live in the cloud. What you take out of the cloud is a network: a trained network and an application.
The question then is how to use that at the edge. At Axelera AI, we have to give customers a simple software stack that lets what they did in the cloud also run at the edge. You have to assume the customer does not know what quantization is; a cloud customer might, but at the edge 90% or 95% of customers neither know nor care what the difference is between single-precision floating point and integer formats. That gap is ours to fill. Whatever they did in the cloud, we have to give them the tools to use the same application or the same network at the edge. An edge provider therefore needs to build a software stack that lets customers keep using what they use today, but deploy it at the edge. We should be responsible for the deployment, not for making customers redevelop anything, because customers do not want to learn something new.
If you go to a customer and say, listen, I have great hardware, but you have to learn my software, they will say no, I have no time for that, I do not need it, why should I start from scratch? You have to go to them and say, listen, I have a great hardware and software stack; all you have to do is keep what you already have, press a button, and it runs, or keep what you have, follow these few steps, and it runs. Deployment should be made extremely simple. That is the key point that I think many companies overlook. They think efficiency matters more, and efficiency does matter, but it is not everything: you need the combination of efficiency, throughput, cost, and the software stack. Customers also care about total cost of ownership. If you tell a customer, listen, with my product you save 300,000 euros a year, but switching software would cost them a million, you can easily guess they will not buy. You have to think about the implications at the level of the whole picture.
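The quantization step mentioned above, converting a float32 network to an integer format so it runs efficiently at the edge, is exactly the kind of detail the tool chain has to hide from the customer. A minimal, hedged sketch of symmetric per-tensor weight quantization follows, written in generic NumPy rather than any vendor SDK; the function names are invented for illustration.

```python
import numpy as np

def quantize_int8(weights_fp32):
    """Symmetric per-tensor quantization: float32 weights -> int8 values plus a scale."""
    scale = np.abs(weights_fp32).max() / 127.0
    q = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights for comparison."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
w_q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(w_q, s)).max())
```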
Q: Do the varied operating conditions of edge AI mean that its chips have different priorities from the chips used in data centers, with edge AI leaning more toward purpose-built silicon?
A: When it comes to consumer edge devices, they are highly customized. Take a television, which counts as an edge device: it uses a proprietary solution and has many AI-driven features, so its SoC has to be extremely customized. It has to be designed in a very specific way and it has to be low power, because a television must stay low power; you cannot put a fan or a computer inside it. So a television is highly customized. The same goes for phones, where the degree of customization is even higher because a phone runs on a battery. There you probably do not run floating-point networks but binary networks, because for most users a binary network is good enough and they are not very sensitive to the difference. When you get to automation, you have to find a reasonable compromise: there are still power constraints at times, but you cannot simply say, I will use a binary network, because automation use cases depend on highly accurate results.
You need a good balance between efficiency, throughput, and accuracy. Even with the constraints of an edge device, you still try to reach cloud-level accuracy, so edge devices are customized too, but in a more diverse way, with more programmable solutions.
When you move to the cloud, as you said, the cloud has everything. But if you look closely, there is more and more specialization there as well. The difference is that in the data center you start to have more and more specialized machines designed for specific workloads. The efficiency requirements in the cloud are not as strict as at the edge, but they still matter. At the edge you try to reach 15, 20, or 30 TOPS per watt; in today's cloud, workloads often run at 0.1 TOPS per watt or even less, because general-purpose computing platforms are very inefficient. So even in the data center you see growing use of specialized hardware such as tensor processing units (TPUs), GPUs, CPUs, and ASICs. It is a trend: depending on the workload, tasks are allocated to different hardware. I see the same trend in the data center.
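A back-of-the-envelope illustration of the efficiency gap quoted above (the 10 TOPS workload is an invented figure for the example; the efficiency values are the ones mentioned in the interview):

```python
workload_tops = 10  # hypothetical sustained neural-network workload, in TOPS
for efficiency in (0.1, 15, 20, 30):  # TOPS per watt, figures cited above
    print(f"{efficiency:>5} TOPS/W -> about {workload_tops / efficiency:.2f} W")
```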
Q: Does the fragmented product demand of edge AI mean it is less likely to be monopolized by large companies, and that small companies will have more opportunities?
A: Yes, absolutely. Traditionally that is how it has been. If you think about cloud computing, over the past 20 to 30 years it has always been Intel, AMD, and more recently NVIDIA; in practice two or three companies have dominated 98% of the cloud market, with only a very small share left for startups and other players.
But when we turn to the edge, there have historically always been many players: Intel, AMD, NVIDIA, Qualcomm, NXP, Texas Instruments, Renesas, STMicroelectronics, Infineon, MediaTek, Cirrus Logic, Ambarella, and more. The edge market is more specialized, with a great many different application areas. That fragments the market, and large companies do not like that. Customers often need application processors tailored to their particular edge use case, which is why the edge market has more room for more participants. I expect consolidation at the edge as well, but not like in the cloud: I expect to see far more companies at the edge, smaller semiconductor companies but many more of them, whereas the cloud will have relatively few players.
Q: How do you view CUDA?
A: It is NVIDIA's success. I mean, NVIDIA owes today's success to CUDA, which was developed around 2003 or 2004, if I remember correctly. At first people were skeptical, because NVIDIA built CUDA for scientific research and parallel computing rather than specifically for AI, yet it gradually became the reference standard in the field. The ecosystem also matters enormously: you see NVIDIA's CUDA, but you also see open platforms such as PyTorch and TensorFlow.
Then there are tools that are very widely used. So I believe a company should always integrate its own architecture into the ecosystem. In our case we cannot get inside NVIDIA, because we are competitors, but we need to find a way for NVIDIA's customers to use our hardware easily. So we need to plug our architecture into the back end of all those components, because the ecosystem is everything. The way we see it, the chip is like the engine of a car, the car itself is the system or the board, the software is the driver, and the data is the fuel that powers the car. You always have to consider the whole system.
If you design an engine, you have to know which car it will go into, who the driver is, and what fuel the car uses. Whenever you design a specific product, you must keep the overall picture in mind; otherwise what you design will be unusable, and if you build the wrong engine and try to fit it into the wrong car, it obviously will not work, right? So you always have to think at the ecosystem level. You cannot assume every car comes with thousands of drivers just because your chip could run thousands of pieces of software and work with different kinds of data, for instance images for convolutional neural networks or sampled audio for LSTMs (long short-term memory networks). You have to keep this in mind from the very beginning.
Q: Do you think edge AI and centralized AI in data centers will converge and complement each other in the future? Where do we stand today?
A: Cloud and edge will always remain integrated. In fact they are integrated already, because networks will always be trained in the cloud. I mean, there is no real need to train a network on an edge device: training demands a huge amount of compute within a limited time, so the most efficient approach will always be the cloud.
Then you fine-tune at the edge, because the models at the edge and in the cloud are already tightly linked. You train in the cloud, fine-tune on the edge device, and send back to the cloud only the relevant data needed to update and improve the network. Edge and cloud models are already coupled in this way, and that coupling will always exist, because there will always be workloads that make no sense to run at the edge, for example workloads that depend on the time of day. Take peak loads: an edge device struggles with peak load because its compute capacity cannot scale on demand. If I launch 20 different applications at once, the system collapses because it cannot handle that much load.
If instead I connect to the cloud and run those 20 applications there, it is no problem at all, because the cloud can allocate enough compute for me. When workloads need to be allocated dynamically, the cloud is the best solution; whatever the application, websites, research, or anything else, you run it in the cloud. My expectation is more decentralized computing: extract as much information as possible at the edge and send only the relevant data to the cloud. Edge computing acts like a filter. We generate a lot of garbage alongside the real information, and edge devices should filter out the garbage and send the genuine information to the cloud; the edge should be the first filter. In the end, the cloud should generate results and distribute resources back to the edge devices. So cloud and edge computing are both indispensable.
Q: What are the main difficulties in realizing that vision (for example, privacy protection or interoperable protocols)?
A: The main integration problem today is the mismatch between the resources of the cloud and the constraints of the edge: you cannot simply move a cloud application onto an edge device as it is, it has to be adapted. That is the main challenge of edge-cloud integration. You raise an important point about security and privacy, especially in a region like Europe. We take this very seriously in Europe; we have the General Data Protection Regulation (GDPR) and a great deal of concern about sharing too much data. So for some applications the cloud raises more and more questions, and we need edge computing to address privacy and security.
But that is not easy to solve; it requires implementing the right software solutions. Beyond that, I think the real core problem is what I said before: much of what runs in the cloud cannot run on edge devices without being reprogrammed, retrained, and re-adapted, because the hardware is different, right?
Q: What approaches do you see for overcoming these problems (for example, federated learning or industry standards)?
A: The problem is usually the lack of standards; there is no standard protocol. Federated learning, which you mention, is a good thing, but every participant has their own implementation, constrained by the hardware they use. So you need standardization to make devices interoperable; devices have to be standardized before different devices can connect to one another. Today the big companies do not like standardization: they prefer to keep closed ecosystems that protect their profits, so many players have no interest in opening their source code or opening up connectivity to other devices. That is the problem, the problem of interconnection. Federated learning is a good idea, but there is no real standard yet; it is still tentative. On the security side there are already some solutions, and some companies are now trying homomorphic encryption to address data privacy and security. Solving these big problems requires getting all the players around a table to agree on standards. Just as we all agreed on buses like PCIe and USB to connect devices, we should also agree at the protocol level and at the application level. But people do not like doing that, because they would rather compete at the application layer; nobody cares about competing at the low level of PCIe or USB, whereas competing at the higher layers matters a lot to the big companies, which makes the situation complicated and very difficult. I think governments can play a role by mandating rules, by saying this is how it must be done or it cannot be used. That may be the only way to get there, if we really want federated learning to succeed.
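Since federated learning comes up here, a minimal, hedged sketch of its core idea follows (federated averaging over locally trained models; all names and the toy objective are illustrative, and no particular framework or standard is implied).

```python
import numpy as np

def local_update(weights, local_data, lr=0.01):
    """Stand-in for one device's local training: a single gradient step
    on a toy least-squares objective. The raw data never leaves the device."""
    X, y = local_data
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_average(weight_list):
    """Server-side aggregation: simple averaging of the device models (FedAvg)."""
    return np.mean(weight_list, axis=0)

# Toy setup: three edge devices, each holding its own private data
rng = np.random.default_rng(0)
global_w = np.zeros(5)
devices = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]

for _ in range(10):
    local_models = [local_update(global_w.copy(), data) for data in devices]
    global_w = federated_average(local_models)  # only model updates are shared
```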
Q: Axelera AI is headquartered in Eindhoven, a small city with formidable high-tech strength. What kind of alchemy allowed Eindhoven to build such a great semiconductor cluster?
A: Eindhoven exists because of Philips: the Philips brand comes from Eindhoven, and Mr. Philips came from this city. He contributed enormously to it. If you look back at history, Philips was in fact one of the co-founding companies of TSMC, which was founded jointly by Philips Semiconductors and the Taiwanese government.
Philips transferred intellectual property to TSMC, and it spun off its lithography division to create ASML, the world's most important maker of lithography machines. Philips also spun off its semiconductor division, known today as NXP; NXP is what used to be Philips Semiconductors. NXP later acquired Freescale and is now a very large company, which also traces back to Philips. In addition, Philips plays a major role in healthcare as one of the largest healthcare companies in the world. In Eindhoven, everything revolves around Philips: ASML, NXP, and Philips form an extremely strong ecosystem. Eindhoven also sits in an ideal spot, with imec, the Interuniversity Microelectronics Centre and a huge engine of nanotechnology research, just across the border in Leuven, Belgium.
So the Eindhoven region is absolutely an ideal place. At Axelera AI, as you know, we have professionals from Intel based in Eindhoven and people from ETH Zurich; we have a large office in Switzerland, and we also have people from the IBM Zurich lab, plus an in-memory computing team drawn from five manufacturers. Axelera AI has employees all over Europe. Today we have 140 employees, more than 50 of whom hold PhDs, spread across different parts of Europe. We are committed to hiring the best talent we can find in Europe.
Q: What kind of support do semiconductor startups need most (upstream and downstream supply chain, funding, talent ecosystem, and so on)?
A: For a company like ours, I do not think hiring is the hard part, because we do cool work: we develop cutting-edge AI chips in Europe. Not many companies in Europe do what we do, because the large European players are in other businesses; NXP, STMicroelectronics, Infineon, and Bosch are very strong in automotive and industrial, but not as strong in AI. We give people the chance to join a company building AI, and they are very happy to join us. We have very talented people because they love what we do. For a startup like ours the hardest part is fundraising, because we are in neither the United States nor China, and raising money for a hardware project in Europe is difficult. It is easy to raise money for software, but very hard for anything else, because people are skeptical; Europe has not invested much in hardware over the past few decades.
Now that we have the European Chips Act and governments recognize the importance of hardware, things have improved, but it is still extremely difficult; that is one point. Another is that access to data is very hard. Because of privacy concerns and other reasons, small companies struggle to obtain data: there is nowhere to get it, no database they can access. For us that is not much of a problem right now, because we only design chips, but in the long run you should have data, and access to data, to train networks, to learn, and so on. At the moment only the large companies can access broad data sets. The third challenge is the political situation. The tension between the United States and China affects Europe as well, even though Europe has historically had a good relationship with China. That political tension creates uncertainty and worry, even for a company like ours, because people fear the uncertainty, and that is a problem: politics brings risks and limits to how companies can expand and build relationships. I think it is a very serious issue that needs to be addressed, because it has a negative effect on everyone and on economic growth.
I understand your point, because to me that law is somewhat strict. I mean, if I create something and share it and you use it, I should get something back. It is like online news in Europe: in the end the EU penalized the platforms and told large companies, including Facebook, that if they want to use news content they have to pay for it. On the one hand, we should work out a mechanism and pay for what we use.
On the other hand, we should also get access to the data. I think we should look for a compromise. I understand that Japan has taken a drastic approach, but that is the only way: without data you cannot build AI. People complain about data, yet in China the large companies can easily obtain data and train on it, and the situation in the United States is very similar, while in Europe, because of GDPR and similar rules, it is much harder, which is dangerous. We have to find a way to get access to data, even encrypted data, because we do not care what the content is. I mean, you can encrypt the data, strip out the sensitive information, and provide what remains. And if the data is copyrighted, I fully support a levy or paying the original data provider directly. I completely agree with that, but we must be able to get the data.
Below is the original interview transcript (in English):
Q:what are the characteristics of the in-memory computing compared to the von Neumann architectures? And what advantages does it have over the traditional architectures?
A:Yes. Thanks for asking. In-memory computing as the advantage that you can parallelize computations in a unique way, because essentially, you are transforming a memory array, which is typically large, can be 260,000 elements or a million elements and use this as a computational engine.
Then the advantage is that you have high parallelization, which means high throughput, low data movement, because you compute the calculation in the memory, which means low power consumption and low cost, because you merged the memory area with the computing element.
And then the area is smaller and means low cost for the chip. And there are two kinds of in-memory computing. One is analog in-memory computing. The other is digital in-memory computing. In analog in-memory computing, you use the relationship between the current and the tension that you have in transistors, that you have in the memory cell, to do the computations, to do the vector matrix multiplication, which you have in all neural network.
And this is one way, right? But when you do in analog domain, you means that you have a data coming in digital data. You convert in analog. You do the computation. Then you convert it back in digital. The problem of analog in-memory computing is that there is noise in the analog domain. And then you have noise, and the noise changes the result of the calculations. Then a typically analog in-memory computing chips. They don't have high accuracy and high positions. You have to fine tune the network, fine tune the silicon to get back a decent accuracy. In we accelerate, we have this technology, but we don't use this. We use digital in-memory computing, which is different, because what we do,We don't convert. We don't do calculation in analog. We just take the estram sale. And close to us is each cells. We embedded an element, a computing element to do the multiplication. And then we have an adder tree that make the accumulations. This allow us to make calculations in the digital domain, allow us to put together the memory and the computation in a small area. This allows us also to parallelize the computation.
And then we will have a very high throughput, low cost, because it's the cheapest, small low data movement, which means low power consumptions and high precision because we stay in digital.
Q:Is in-memory computing more suitable for special-purpose computing for specific algorithms rather than general-purpose computing?
A:Definitely, you use in-memory computing. You use only to do one thing, multiplication between vector and matrix. And if you look inside the neural networks, recursive neural networks, convolutional neural networks, classic networks, transformer networks, 70 to 90 % of the calculations are just vector matrix multiplication.
And you do in-memory code you use in-memory computing to do all it is. In-memory computing can do all of this. You cannot do activation functions. You don't do this within memory computing. You just do the multiplication and the accumulations, the sums when you have to sum up the numbers. That's it. But this these calculations represent 70% to 90% of what you have in any neural network. And this is the reason why it's important to use it in AI and machine learning in deep learning.
But you don't use in-memory computing in any other domain. Because unless you have to do vector matrix multiplication.
Q:Do you think in-memory computing is a solution to break through the memory wall?
A:In-memory computing is the solution for the vector matrix multiplication, not more than this.
To break the memory wall, there are other approaches, which is near-memory computing, which is slightly different, where you have a more generic computing element, very small, and you put the memory close by.
Then instead of having a larger CPU and a larger memory, you have thousands small CPU with thousands small memories close by. I think this is the best solution to solve the memory wall, but it's not really in-memory computing, but there is near-memory computing. Because the difference is that in near-memory computing, you still have an array of memory and a computing element. While in in-memory computing, you break down the array of memory, and you put inside the array of the computing elements. You can do it all if you do multiplication accumulation. Otherwise, it's useless.
Q:Can RISC-V be part of vision of“democratization of Artificial Intelligence”?
A:Yes, it is.RISC-V is one element. In general, in accelerate, we try to keep open as much as we can, our software stack. We are using open source code. We are using TVM in the back end of the compiler, we are using the fair in the firmware, which is an open source for supported by Intel. We tried to use also one API, and we are trying to use as much as possible open source, and also to give back the community.
In accelerate, I most of the other many other guys are very active in the RISC-V communities. And then we want to give back the community. We want to develop things, create our own architecture and our own product. But still, based on open sources. But I think that when I say that we want to democratize the AI, it's also mean that we want to have a product which is powerful, usable, and low cost.
For example, if you take our solution that we design, which is a cheap of more than 200 tops, we are positioning this in already in a card at $149, because we want people to use it. We want to give the access to a powerful solution to people. There is always time to make money. But the first things is to have people to create great things using our technology. If they succeed, we succeed. Then we think that it's important to have something that's easy to be used, high performance, low cost that you can buy online, that you can get it everywhere in the world. You can just do great product around. We want to unleash innovation.
Q:As for AI accelerators, what are the advantages of using RSIC-V?
A:Well, it's the advantages that we can control it, because it's open source, we can design, we can control it. We don't have to go back to anyone and ask permission to or ask a source code of the compiler. If you use whatever IP from CAD and Synopsis, doesn't matter. You cannot access to everything you start to rely on them to. And this is can be a problem. In the long run, therefore, with RISC-V, you can just control completely your architecture. And it's a platform which is tested by a large community, which is good. And you can extend and develop it. For example, we are developing a vector instruction, a specific veterans instruction set units, which will be integrated in next generation. And we can do it by ourselves because we have the knowledge. And it's an open source platform, then we don't have to negotiate with supplier to solve the problem.
Q:Do you think AI applications will become an important driving force for the RISC-V ecosystem?
A:I think It's easier to use RISC-V in an application specific shape than in a general purpose. Because in application specific chip, you can use the RISC-V and optimize the RISC-V for what you want to do.
And then you have to verify it only for what you want to do. But if you want to use RISC-V in it as a general purpose processor, and you want to use it for to compete with a cutting edge, Intel, CPU or cutting edge, AMD CPU, then is a different story. It's a way more difficult, and it requires way more resources, way more time, because it's a new architecture. And it is not so highly verified by everybody. In the sense, when you go to complex things, you need an ecosystem around, you need the drivers, you need support from the community, from Microsoft, from you go into from Linux. In general, then it becomes more difficult. Then I think that RISC-V will grow. Now thanks to AI. And it will take still 5 to 10 years to become a real general purpose solution alternative to what you have today is still take time. It will take time to have at least five running a mobile phone.
Q:In terms of the computing efficiency, maybe the data centers has better infrastructure because they have better infrastructures. They have more constant computing power. So why do we need the Edge AI?
A:You don't need Edge AI for efficiency, as you said, is correct. the center, it's way better because you concentrate everything you can eat, especially for utilization, more than the efficiency itself. Utilization is way higher, right? In the center of sun. But you need Edge AI because of privacy, security of data, safety, economics. Think about it, if your car, you cannot have a car that is asking to the cloud, should I turn right or left? Your car needs to have the computing power to react on time without a latency, almost to whatever happening without checking with cloud. Even because in some area, you don't have even coverage.
Second of all, it doesn't make sense even from economics to send everything to cloud, think about surveillance, sophisticated surveillance system, where you have plenty of camera, high resolution camera. It's extremely expensive to think to take all these data and send to cloud, because 95% or 98% of this data is useless. Because you want to understand that you want to identify the things like, I don't know the baggage that someone drop in a railway station or the specific person that is running. And the police is looking for. For these things you don't have to know, why should you send all the data to the cloud? You can extract the right information at the edge, then it's even cheaper to do it. And still, there are plenty of in many area. There is not even coverage. Actually, you don't even have a good connection. Then there is still an infrastructure problem where you can't solve everything, sending data to the cloud, then it makes sense edge computing. It's necessary for many different application, drones, robotics, car, automotive. It makes it up and even surveillance, actually.
Q: What's the problem we need to overcome for the Edge AI solutions? You have already mentioned the power consumption, the platform maybe didn't have so much powers, maybe. And the operating conditions, the light, the latency, the cost, or the maintain abilities?
A:Yeah, I think that the obstacle, for me, it's different is that in the cloud, if you in the cloud, you have few players in China, for example, you have 234 cloud providers, the same in the United States and in Europe. Chinese and Americans are leading the cloud in terms of providers. And there are a few company building up largely the center and providing services.
Therefore, it's easy to design a technology and to provide to them, because it's you have one big customer with one set off a list of features that they need requisite and so on. But when it comes to the edge, you have 1,000 or several thousand of customers, each of them with asking different things. And many of this customer didn't have the background to understand your technology and to twist it in the way they need.
Then the problem of the edges that you need to differ. If you want edge to succeed, you need to have clearly cost effective hardware, because you need to cost effective solution, because the edge customer is more sensitive than a cloud customer. In terms of this, you need to have power efficiency, because you have constraints. You don't in the center of no constraint, you have a power plants close by. But at the edge, you have some constraints. You have to have efficient, but also usability. You need to have something that is plug and play. Customers, they don't have…… 90% of the customer of the edge. They cannot have the engineers so that Baidu can have. It's different, right? It's because they are medium, small companies.
And then you need to give them all the software stack, all the instrument to use very efficiently. But in an easy, simple way, your solution. And there is……Today, for example, at the edge, you have greater envy as a great hardware in terms of performance and platform, but it's too expensive to scale. You can't use $1,000 hardware, a probably in a small robot that you want to sell at $500, right? You can't simply.
And then I think there are solutions which are good, but expensive or there are solution that are cheap, but it is difficult to be used. And it's important to find a good compromise.
Q:So can you tell us more about the maintain ability? And how easy to use for the how important to make the Edge AI chip stay Edge AI solution is easy to use, because we know customers.
A:I can tell you, first of all, customers, they use the cloud to do everything even to train the algorithm. If you are a small medium enterprise and you want to do something in AI, you have to connect to Amazon or bite or whatever. It doesn't matter which kind of player you have to go back to the cloud system and use the the typical tools that you have in the cloud. Where do you get out from it? It’s a network, a training network and applications.
Then the problem is how to use this in the edge. Then if we as excel and we have to give customer a simple, softer stack, which allowed them to take what they did in the cloud and run it at the edge. You have to be sure that customer didn't know what is quantization in the cloud, but in the edge, 90% of 95% of the customer, they don't know what was the difference between floating . 32 and intake. They don't care. We have to solve that problem. They should do whatever they want in the cloud. And then we have to give them the tools to use the same application or the same network of the edge. Then an edge provider needs to build up a softer stack which allow customer to use what they are using today, but deployed at the edge. Company like Ccash, we should be responsible of the deployment, not of the development because customer, they don't want to learn a new things.
If you go to customer and you say, listen, I have a great hardware, but you have to learn my software. They will say, no, I don't have time. I don't want. What should I do it? You have to go to them and say, listen, I have a great hardware and software stack. What you have to do is just take what you have. Push button. I any runs or take what you have, do this abc any runs. It should be very simple. And this is the key aspect that a lot of companies, I think they don't think about it. They think that it's important to be efficient. Yes, the efficiency is important, but it's not only that you need a mix of things, efficiency, throughput cost, and the software stack, even because customer cares about the total cost of ownership, if you go to customer and say, listen, with my cheap, you save, I don't know, 300,000 euro per year, but the customer to change the software need to spend 1 million, then they will not do it simply. Then you have to think at the picture at a big level the implication.
Q:They so is data center more client to use the general purpose, ai computing power. While the edge AI chips may be going the other ways they will design for the pacific use case. And all the customize is something like that.
A:If you go to consumer edge, it’s super customized, because in a television, your television is edge. You have a super solution, and you have a lot of features that are AI-generated, and then it’s super customized. SOC has to do all a few things in a very specific way. It has to be low power consumption, because the television must be at low power consumption. It cannot have a fan or a computer running inside. Then it's highly customized. In the phone it’s the same. In the phone, if super customized is battery power, then probably you don't run floating point network, you run binary networks, and it's good enough, because the customers are not really sensible. When you go to automation, you have to go to find a good compromise, because you have still limitation in power sometimes, but you can compromise and say, I use this net for binary, because probably if using automation and network, you should have high accuracy.
And then you have to find a good compromise between efficiency, throughput, accuracy, then having some limitation of the edge, but still try to look for the precision that you have in cloud computing, then it’s still customized, but it's different, it’s more programmable solution.
Then when you go to cloud, as you said, in a cloud, you have everything. But in a cloud, if you see, there is more and more specialization. The difference is that essentially the crowd in data center, you start to have more and more specialized machine for specialized workload. Because even there is a need of efficiency, not like at the edge, but still it's necessary. At the edge, you try to get 15 Tops per watt, 20, 30, whatever. In the cloud today, the workloads are running at the 0.1 Tops per watt even less. Because if you take a general computing a platform, it's very low deficiency. And then even in the data center, you see the trend to have the trends to have the tensor processing unit, GPU, CPUs, etc. It's a kind of trend and based on the workload, they start to allocate to the different hardware. Then I see trends in the data center to do it.
Q:Does the fragment to the product demand of the ai indicated that it's less likely be model class by large companies, such as the Nvidia, AMD, centralized that. And the small company might have more opportunities in this area.
A:Yes, absolutely. It's traditionally, it's like this. If you think about cloud computing, we always, in the last 20, 30 years, you had always Intel, AMD, Nvidia more recently, then you have and you still have actually 2,3 players that are dominating 98% of the cloud, and a very small portion for new startup or other players.
But if you got at the edge, historically, you have plenty of players, because you have still Intel, AMD, Nvidia, Qualcomm, NXP, Texas Instrument, Renesas, ST Microelectronics, Infineon, and MediaTek, Cirrus Logic, Umbrella Silicone. I can go on, right? You have a lot of players, because as you said, the edge is more specialized, you have plenty of applications. It's very fragmented, and the big players, they don