Edited by Claire Chen (BULL/Eviden) and Pascale Bernier-Bruna (BULL/Eviden)
Executive summary
The upcoming generation of Exascale systems depends heavily on an efficient network infrastructure. This network must accommodate massively parallel processing systems consisting of hundreds of thousands of nodes and millions of cores. It should offer the functionality that applications need to scale effectively at Exascale and beyond, while also adapting to power-efficient accelerators and computing units. Furthermore, it should support a wide array of prevalent and emerging data-centric and AI-driven applications.
To enable Exascale computing, next-generation interconnection networks must therefore scale to hundreds of thousands of nodes and provide features that allow HPC, HPDA, and AI applications to reach Exascale while benefiting from new hardware and software trends.
The RED-SEA consortium has been dedicated to addressing this goal by leveraging key European expertise and background. This includes BullSequana eXascale Interconnect (BXI), the key production-proven European interconnect, as well as results from a number of EU-funded projects on interconnects and HPC systems.
The RED-SEA project has worked on several aspects of European Exascale interconnect technologies to pave the way for the next generation of European Exascale interconnects, including preparatory work on the BXI technology. Specifically, the project has addressed the following aspects:
- Specification of the new architecture through hardware-software co-design, focusing on a set of representative applications from the realms of HPC, HPDA, and AI.
- Testing, evaluation, and implementation of new architectural features at multiple levels, including mathematical analysis, modelling, simulation, and FPGA-based implementations.
- Development of a high-performance, low-latency Ethernet gateway to facilitate seamless communication within and between resource clusters.
- Implementation of efficient network resource management to enhance congestion control, virtualisation, adaptive routing, and collective operations.
- Exploration of the BXI ecosystem to accommodate a variety of applications and hardware, with improvements in end-to-end network services such as programming models, reliability, security, low latency, and support for new processors.
- Utilisation of open standards and compatible APIs to create innovative, reusable libraries and solutions for fabric management.
After a 36-month lifespan, RED-SEA has successfully attained all objectives outlined in the Description of Action (DoA). This report outlines the main scientific and technological results of the project. One significant outcome is the advancement of the European interconnect BXI, in particular the improvement of the current version (BXIv2) and the preparation of its next generation (BXIv3). Another achievement is the project's contribution to new, efficient network resource management schemes, which enhance congestion control, virtualisation, adaptive routing, and collective operations. Additionally, the project has extended the BXI ecosystem, broadening the range of applications and hardware the interconnect can serve. The enhancement of the tools and simulators selected by the RED-SEA project was also crucial to these achievements. Finally, several additional developments have been undertaken to strengthen European Intellectual Property (IP) in the field of interconnect networks. These results are detailed in paragraph 3 of this document.
This document is one of the final deliverables of the RED-SEA project. Its aim is to present the project's main outcomes in a form accessible to the general public, with a focus on exploitation and impact. For a comprehensive view of the project's overall results, readers are encouraged to consult deliverable D1.4, "Report on holistic evaluation of RED-SEA network technologies".
The document begins with a summary of the project objectives in paragraph 2 and then describes the main results in paragraph 3. Finally, paragraph 4 discusses the impact generated by the project, along with general findings and lessons learned.