This paper was accepted at IPDPS 2023, the 37th IEEE International Parallel and Distributed Processing Symposium, which was held from 15 to 19 May 2023 in St. Petersburg, Florida, USA. The paper was prepared by ETH Zürich in collaboration with Microsoft.
Abstract
High performance is needed in many computing systems, from batch-managed supercomputers to general-purpose cloud platforms. However, scientific clusters lack elastic parallelism, while clouds cannot offer competitive costs for high-performance applications. In this work, we investigate how modern cloud programming paradigms can bring the elasticity needed to allocate idle resources, decreasing computation costs and improving overall data center efficiency. Function-as-a-Service (FaaS) brings the pay-as-you-go execution of stateless functions, but its performance characteristics cannot match coarse-grained cloud and cluster allocations. To make serverless computing viable for high-performance and latency-sensitive applications, we present rFaaS, an RDMA-accelerated FaaS platform. We identify critical limitations of serverless – centralized scheduling and inefficient network transport – and improve the FaaS architecture with allocation leases and microsecond invocations. We show that our remote functions add only negligible overhead on top of the fastest available networks, and we decrease the execution latency by orders of magnitude compared to contemporary FaaS systems. Furthermore, we demonstrate the performance of rFaaS by evaluating real-world FaaS benchmarks and parallel applications. Overall, our results show that new allocation policies and remote memory access help FaaS applications achieve high performance and bring serverless computing to HPC.
Authors
Marcin Copik (ETH Zürich), Konstantin Taranov (Microsoft), Alexandru Calotoiu (ETH Zürich), Torsten Hoefler (ETH Zürich)
DOI: 10.1109/IPDPS54959.2023.00094
The software prototype, data, analysis scripts, and replication scripts for this paper are available from Zenodo.