Research paper presented at the SC21 conference (St Louis, USA and online) Abstract The allreduce operation is one of the most commonly used communication routines in distributed applications. To improve its bandwidth and to reduce network traffic, this operation can be accelerated by offloading it to network switches, that aggregate the data received from the hosts, and […]
Flare: flexible in-network allreduce
