Network interconnects play an enabling role in HPC systems – and this will be even truer for the coming Exascale systems that will rely on higher node counts and increased use of parallelism and communication. Moreover, next-generation HPC and data-driven systems will be powered by heterogeneous computing devices, including low-power Arm and RISC-V processors, high-end CPUs, vector acceleration units and GPUs suitable for massive single-instruction multiple-data (SIMD) workloads, as well as FPGA and ASIC designs tailored for extremely power-efficient custom codes.
These compute units will be surrounded by distributed, heterogeneous (often deep) memory hierarchies, including high-bandwidth memories and fast devices offering microsecond-level access times. At the same time, modern data-parallel processing units such as GPUs and vector accelerators can process data at very high rates (tens of TFLOPS). In this landscape, the network may well become the next major bottleneck, much as memory is in single-node systems.
RED-SEA prepares a new-generation network interconnect to power the future EU Exascale systems
RED-SEA will build upon the European interconnect BXI (BullSequana eXascale Interconnect), together with standard and mature technology (Ethernet) and previous EU-funded initiatives to provide a competitive and efficient network solution for the exascale era and beyond. This involves developing the key IPs and the software environment that will deliver:
- scalability, while maintaining an acceptable total cost of ownership and power efficiency;
- virtualization and security, to allow various applications to efficiently and safely share an HPC system;
- quality-of-service and congestion management, to make it possible to share the platform among users and applications with different demands;
- reliability at scale, because fault tolerance is a key concern in a system with a very large number of components;
- support for high-bandwidth, low-latency HPC Ethernet, as HPC systems increasingly need to interact securely with the outside world, including public clouds, edge servers or third-party HPC systems;
- support for heterogeneous programming models and runtimes, to facilitate the convergence of HPC and HPDA;
- support for low-power processors and accelerators.
RED-SEA in the Modular Supercomputing Architecture
RED-SEA supports the Modular Supercomputing Architecture (MSA) that underpins all of the SEA projects. In the MSA, BXI is the HPC fabric within each compute module, delivering low latency, high bandwidth and all required HPC features, whereas Ethernet is the high-performance federative network that provides the interface to storage and to other compute modules. RED-SEA will design a seamless interface between BXI and Ethernet via a new Gateway solution.
Enable the design of a new generation of high-performance network interconnect
- Leveraging existing European technology (BXI, Exanest …)
- Able to power the future EU Exascale systems
Explore innovative solutions
- End-to-end network services – from programming models to reliability, security, low latency, and new processors
Develop the ecosystem and create a broader community of users and developers
- Leveraging open standards and compatible APIs to develop innovative reusable libraries and fabric management solutions
The RED-SEA network architecture
The RED-SEA project is coordinated by Atos and brings together top European academic centres and key European industrial players in the domain of interconnect networks. The RED-SEA consortium gathers 12 partners from 6 countries.
| Project name | Network Solution for Exascale Architectures |
| Coordinator | Atos (Bull SAS) |
| Total budget | € 7 993 710 |
| EU funding | € 3 996 855,01 |
- D2.3 RoCE and IPoverBXI Evaluation Report. Edited by Nikolaos D. Kallimanis (FORTH), Grégoire Pichon (Atos). Authors: Giorgos Saloustros (FORTH), Nikolaos D. Kallimanis (FORTH), Nikolaos Chrysos (FORTH), Jonathan Espié Caullet (Atos), Sylvain Goudeau (Atos), Grégoire Pichon (Atos). Executive summary: In the RED-SEA […]
- D4.3 Planned MPI-related optimizations. Edited by Hugo Taboada (CEA). Authors: Gilles Moreau (CEA), Hugo Taboada (CEA), Marc Pérache (CEA), Simon Pickartz (ParTec), Carsten Clauss (ParTec). Executive summary: This document presents the two contributions from Commissariat à l’énergie atomique et […]
- RED-SEA: Network Solution for Exascale Architectures. This paper was accepted for the special session “European Projects in Digital Systems Design (EPDSD)” at the 25th Euromicro Conference on Digital System Design (DSD) […]
- Influence of Network Performance Variability on Application Scalability. This research paper prepared by ETH Zürich was published in the journal “Proceedings of the ACM on Measurement and Analysis of Computing Systems”. Abstract: Cloud […]
- Building blocks for network-accelerated distributed file systems. Research paper by our partner ETHZ, accepted at the SC22 Conference that took place online from 14 to 18 November 2022 in Dallas, TX, USA. This paper […]
- NeVerMore: Exploiting RDMA Mistakes in NVMe-oF Storage Applications. Research paper by our partner ETHZ, accepted at the ACM Conference on Computer and Communications Security (CCS) that took place from 7 to 11 November […]
- Lifting C semantics for dataflow optimization. Research paper by our partner ETHZ, accepted at the ICS ’22 International Conference on Supercomputing that took place online from 28 to 30 June 2022. […]
- KafkaDirect: Zero-copy Data Access for Apache Kafka over RDMA Networks. Research paper presented at the ACM SIGMOD/PODS Conference that took place in Philadelphia from 12 to 17 June 2022. Abstract: Apache Kafka is an open-source […]
- Optimized Page Fault Handling During RDMA. This research paper was accepted by the journal IEEE Transactions on Parallel and Distributed Systems and will be published in vol. 33, no. 12, pp. […]
- A RDMA Interface for Ultra-Fast Ultrasound Data-Streaming over an Optical Link. Research paper presented at the DATE 2022 conference (Design, Automation and Test in Europe conference) that took place online from 14 to 23 March 2022. […]
- Flare: flexible in-network allreduce. Research paper presented at the SC21 conference (St Louis, USA and online). Abstract: The allreduce operation is one of the most commonly used communication routines in distributed […]
- A RISC-V in-network accelerator for flexible high-performance low-power packet processing. Research paper presented at the ISCA 2021 conference (online). Abstract: The capacity of offloading data and control tasks to the network is becoming increasingly important, […]