An FPGA platform for Reconfigurable Heterogeneous HPC and Cloud Computing

Bernard Metzler, IBM Research Zurich
Dr. Bernard Metzler is a Principal Research Staff Member and Technical Leader at the IBM Zurich Research Laboratory. His main research interests are in enhancing the network and storage IO of distributed systems and in integrating modern high-performance IO hardware with distributed applications. He contributes to the design, standardization, and implementation of IO subsystem components such as network protocols, storage stacks, and APIs. In this role he has authored several publications in major scientific conferences and journals, as well as an IETF RFC. Bernard contributed substantially to the BlueGene Active Storage project, providing an RDMA-based communication subsystem and integrating non-volatile memory into a novel heterogeneous memory hierarchy on BG/Q supercomputers. Bernard is an open source evangelist who contributes to several open source projects, including Apache Crail (incubating) and the Linux kernel, where he contributed the SoftiWarp RDMA driver and now actively maintains it. Bernard represents IBM on the Board of Directors of the OpenFabrics Alliance. He received his PhD from the University of Braunschweig, Germany.

Energy- and compute-efficient devices such as GPUs and Field-Programmable Gate Arrays (FPGAs) are entering modern data centers (DCs) and HPC clusters to address the end of Dennard scaling and the wind-down of Moore's law. While these heterogeneous devices seem to be a natural extension of current compute servers, their traditional bus attachment limits the number of accelerators that can be deployed per server. With cloudFPGA, we introduce a new platform that decouples the reconfigurable accelerators from the CPU of the server by connecting the FPGAs directly to the DC network. This approach turns the FPGAs into disaggregated, standalone computing resources that can be deployed at large scale in DCs. Each FPGA implements an integrated 10GbE network interface controller (NIC) with a TCP/IP offload engine. This deployment of network-attached FPGAs is particularly cost- and energy-efficient because the number of spread-out FPGAs becomes independent of the number of servers. During the presentation, we describe the cloudFPGA platform, which integrates up to 1024 FPGAs per DC rack. We focus on the networking aspects of the architecture and the development toolchain required to seamlessly migrate today's workloads, and we discuss the potential applicability of standalone network-attached FPGAs to different HPC and Cloud Computing workloads.