Performance Benchmark on Disaggregated Networks
Disaggregated networking, where hardware and software are separated, provides significant economies of scale and an open architecture. But is it too good to be true? What are the pros and cons and potential trade-offs?
Traditionally, network switch software came bundled with the hardware, quality-tested and supported by a single vendor. Because disaggregated switch hardware and operating software can come from different vendors, organizations implementing disaggregated networking need to run performance benchmarks to ensure it all works together as it should.
In this blog, I will apply benchmark test methodologies to qualify the performance of the IP routing information base (RIB) and forwarding information base (FIB) on a white box switch running a couple of different network operating systems (NOSs).
I asked our system QA group to build an IP CLOS data center network with white box switches and a couple of popular NOSs so we could benchmark the differences.
We designed the network and test topology as follows:
- 4 spine switches (white box + NOS)
- 1 leaf switch L1 (white box + NOS) with 4 x 100G Ethernet links to the spine switches
- An IxNetwork chassis emulating a second leaf switch L2 (with 4 x 100G ports connected to the spines), a rack of simulated servers R2 behind L2, and a rack of servers R1 behind L1, connected to L1 at 100G
- Each spine and leaf was configured with EBGP for IP routing and 4-path equal-cost multipath (ECMP)
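The topology above can be written down as a small data model to sanity-check the expected ECMP fan-out before any traffic is run. This is only a sketch: the spine names S1 to S4 are hypothetical labels (the post does not name the spines), and the model ignores BGP entirely and just counts two-hop leaf-spine-leaf paths.

```python
# Hypothetical labels for the four spines; L1 is the leaf under test and
# L2 is the leaf emulated by the IxNetwork chassis.
SPINES = ["S1", "S2", "S3", "S4"]
LEAVES = ["L1", "L2"]

# In a CLOS fabric every leaf has one link to every spine.
LINKS = {(leaf, spine) for leaf in LEAVES for spine in SPINES}


def ecmp_paths(src_leaf, dst_leaf):
    """Return every equal-cost leaf -> spine -> leaf path between two leaves."""
    return [
        (src_leaf, spine, dst_leaf)
        for spine in SPINES
        if (src_leaf, spine) in LINKS and (dst_leaf, spine) in LINKS
    ]


if __name__ == "__main__":
    for path in ecmp_paths("L1", "L2"):
        print(" -> ".join(path))
```

With four spines, the model yields four equal-cost paths between L1 and L2, which matches the 4-path ECMP configuration described above.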
I would typically run through the RFC 2544/2889 tests to qualify switch fabric performance, but I decided to stress the IP CLOS RIB/FIB performance first. I picked the RIB-IN convergence test methodology defined in IETF RFC 7747, section 5.1. This test case is designed to characterize how quickly BGP routers install routes in the RIB and push them down to the forwarding fabric FIB.
The test starts by pumping traffic (from P1 to L1) toward a set of BGP routes advertised by L2 (emulated by IxNetwork). The time at which the BGP routes were advertised was recorded, and the time at which traffic converged at the destination ports was recorded by the IxNetwork test ports.
With these two timestamps, we can understand how efficiently an IP CLOS network converges on new routes and whether the switch fabric accurately forwards traffic to the destination with no packet loss.
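The arithmetic behind the two timestamps is a simple difference. The sketch below assumes both timestamps are expressed in seconds on the same clock; the class and function names are hypothetical and are not part of any IxNetwork API.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceSample:
    """One test iteration; both fields are assumed to share one clock."""
    advertise_time_s: float   # when the emulated L2 advertised the BGP routes
    converged_time_s: float   # when traffic had converged at the destination ports


def rib_in_convergence_s(sample):
    """Return the convergence time: converged timestamp minus advertise timestamp."""
    delta = sample.converged_time_s - sample.advertise_time_s
    if delta < 0:
        # Traffic cannot converge on routes before they are advertised.
        raise ValueError("converged_time_s is earlier than advertise_time_s")
    return delta


if __name__ == "__main__":
    sample = ConvergenceSample(advertise_time_s=10.0, converged_time_s=12.5)
    print(f"RIB-IN convergence: {rib_in_convergence_s(sample):.3f} s")
```

The sample values are made up for illustration and are not measurements from this test.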
Since this is a CLOS topology (spine and leaf) with ECMP, we need to consider that the source traffic (100G from P1) will be load-balanced across 4 different paths. A successful convergence should be confirmed by the total throughput across all ECMP ports, with no packets dropped.
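One way to express that check is to sum the frames received on every ECMP port and compare the total with what the source transmitted. This is a sketch under assumptions: the port names and frame counters are hypothetical inputs, not fields taken from an IxNetwork statistics view.

```python
def ecmp_result(tx_frames, rx_frames_per_port):
    """Aggregate per-port receive counters and compare them with the transmit count.

    tx_frames: frames sent by the source port.
    rx_frames_per_port: mapping of ECMP destination port name -> frames received.
    """
    total_rx = sum(rx_frames_per_port.values())
    lost = tx_frames - total_rx
    # Share of the total carried by each path; load balancing need not be even.
    shares = {
        port: (rx / total_rx if total_rx else 0.0)
        for port, rx in rx_frames_per_port.items()
    }
    return {
        "total_rx": total_rx,
        "lost": lost,
        "shares": shares,
        "converged": lost == 0,  # success means no frames dropped across all paths
    }


if __name__ == "__main__":
    # Illustrative counters only.
    result = ecmp_result(1000, {"port1": 250, "port2": 260, "port3": 240, "port4": 250})
    print(result)
```

Judging success on the aggregate rather than on any single port matters here, because hashing rarely splits traffic into exactly equal quarters.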
Because this was a performance benchmark test, we now needed to define a comparable baseline:
- We tested with 4,000 and 8,000 IPv4 routes with a /31 mask.
- Each test ran 3 iterations to obtain an average and to ensure the SUT (system under test) could recover reliably after each iteration.
- Lastly, we installed different NOSs in L1.
Conclusion for RIB-IN Convergence Test
Disaggregated networks provide flexibility in feature add-ons and updates and avoid vendor lock-in. However, performance can vary depending on the combination of hardware and software selected. Benchmark test methodology is crucial for identifying deviations in performance and providing comparable metrics for end users. Please stay tuned for my upcoming blogs on fabric forwarding performance testing (RFC 2544/2889) and BGP ECMP failover testing (RFC 7747 section 5.2).
This black book explains the test methodology required to fully qualify hyperscale network infrastructure based on disaggregated design.