Validate Hyperscale Data Center Network Designs
Hyperscale, or warehouse-scale, data centers have unique requirements because they must support a very large number of servers. As RFC 7938 concludes, BGP combined with a Layer 3 (IP) Clos topology (also known as a spine-and-leaf network) is a preferred way to scale "horizontally" with a uniform design that meets these requirements. This paper highlights how to use the IxNetwork test solution to validate key network designs used to build hyperscale data centers.
We designed the network and test topology as follows:
- 4 spine switches, S1 through S4 (white-box switches running a NOS, or branded switches)
- 1 leaf switch, L1 (a white-box switch running a NOS, or a branded switch), with 4 x 100G Ethernet links to the spine switches
- An IxNetwork test system that emulates a second leaf switch, L2 (with 4 x 100G ports connected to the spines), a rack of simulated servers, R2, behind L2, and a rack of servers, R1, connected to L1 over 100G
- eBGP configured on each spine and leaf for IP routing, with 4-path equal-cost multipath (ECMP)
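To make the 4-path ECMP expectation concrete, the topology can be modeled as a small graph and the equal-cost paths between the two leaves enumerated. This is only an illustrative sketch: the link list and the helper names (`build_graph`, `equal_cost_paths`) are assumptions made for this example and are not part of IxNetwork or any switch NOS.

```python
from collections import deque

# Assumed link list for the test topology: each leaf connects to every spine.
LINKS = [(leaf, spine)
         for leaf in ("L1", "L2")
         for spine in ("S1", "S2", "S3", "S4")]


def build_graph(links):
    """Build an undirected adjacency map from a list of (node, node) links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def equal_cost_paths(graph, src, dst):
    """Return every shortest (equal hop count) loop-free path from src to dst."""
    best_len = None
    paths = []
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        # Breadth-first order: once a path is longer than the best, stop.
        if best_len is not None and len(path) > best_len:
            break
        node = path[-1]
        if node == dst:
            best_len = len(path)
            paths.append(path)
            continue
        for nxt in sorted(graph[node]):
            if nxt not in path:
                queue.append(path + [nxt])
    return paths


GRAPH = build_graph(LINKS)
ECMP_PATHS = equal_cost_paths(GRAPH, "L1", "L2")
for p in ECMP_PATHS:
    print(" -> ".join(p))
```

With the assumed links, the sketch lists one two-hop path through each spine, which is the set of next hops L1 is expected to install for routes advertised by L2.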
BGP RIB/FIB Convergence Test per RFC 7747
This test measures the time it takes for the system under test (SUT), which consists of L1 and S1 through S4, to install the BGP routes advertised by L2 (routes to thousands of IPv4 and IPv6 hosts, generated by Ixia IxNetwork BGP emulation) and to start forwarding packets from L1 to L2. The methodology is recommended by the IETF Benchmarking Methodology Working Group (BMWG) in RFC 7747, section 5.1.1, for benchmarking BGP RIB and FIB performance. Because IxNetwork integrates BGP emulation with a traffic generator/analyzer, it can readily emulate a leaf switch with ECMP links and thousands of IP hosts on server racks.
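The measurement described above reduces to simple arithmetic once the timestamps are available: convergence is complete when the last advertised prefix starts forwarding traffic. The sketch below assumes the timestamps have already been exported from the test tool; the function name, the input layout, and the sample values are hypothetical and do not represent the IxNetwork API.

```python
def rib_fib_convergence_seconds(advert_start, first_rx_by_prefix):
    """Return the time from the start of route advertisement until
    every advertised prefix has forwarded its first packet.

    advert_start       -- timestamp (seconds) when L2 began advertising
    first_rx_by_prefix -- dict mapping prefix -> timestamp of the first
                          packet received for it, or None if never seen
    """
    if not first_rx_by_prefix:
        raise ValueError("no prefixes observed")
    missing = [p for p, t in first_rx_by_prefix.items() if t is None]
    if missing:
        raise ValueError(f"{len(missing)} prefixes never forwarded traffic")
    # The slowest prefix determines the overall convergence time.
    return max(first_rx_by_prefix.values()) - advert_start


# Hypothetical sample values, for illustration only.
sample = {
    "10.2.0.1/32": 10.75,
    "10.2.0.2/32": 11.25,
    "2001:db8:2::1/128": 11.5,
}
convergence = rib_fib_convergence_seconds(10.0, sample)
print(f"RIB/FIB convergence: {convergence:.3f} s")
```

Raising an error when a prefix never forwards traffic is a deliberate choice in this sketch: a partially converged run should not be reported as a valid benchmark result.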
ECMP Failover Performance Test
Using the same test topology, we can force one of the IxNetwork test ports (P2 to P5) to simulate a link disconnect. This causes the L1 switch to rebalance traffic over the remaining active ECMP ports. IxNetwork can measure the interval between the link disconnect and the moment traffic has converged on the remaining ECMP ports. This interval, the failover convergence time, is a vital performance indicator for benchmarking ECMP implementations.
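One common way to estimate failover convergence time, when traffic is offered at a constant rate, is to derive it from the number of frames lost during the event. The sketch below illustrates that calculation; it assumes all loss in the run is caused by the failover, and the function name and sample counters are hypothetical rather than taken from IxNetwork.

```python
def failover_convergence_ms(frames_tx, frames_rx, offered_rate_fps):
    """Estimate convergence time in milliseconds from frame loss.

    Assumes a constant offered rate (frames per second) and that every
    lost frame was dropped while the SUT was reconverging.
    """
    if offered_rate_fps <= 0:
        raise ValueError("offered rate must be positive")
    lost = frames_tx - frames_rx
    if lost < 0:
        raise ValueError("received more frames than were sent")
    # lost frames / (frames per second) = seconds of outage
    return lost * 1000.0 / offered_rate_fps


# Hypothetical counters: 15,000 frames lost at 1,000,000 frames/s.
estimate = failover_convergence_ms(10_000_000, 9_985_000, 1_000_000)
print(f"Estimated failover convergence: {estimate:.1f} ms")
```

The estimate is only as good as its assumptions: loss caused by congestion on the surviving links would inflate the result, so the offered load should fit within the remaining ECMP capacity.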
ECMP Failover Resiliency Analysis
After the link failure, IxNetwork can also analyze how the affected traffic flows were redistributed across the remaining ECMP links. Once the link is restored, IxNetwork can further analyze how the flows were redistributed again. Ideally, only the minimum number of flows should move during an ECMP failover event, namely the flows that were carried by the failed link. The less churn in flow distribution, the higher the resiliency and the faster the convergence of the ECMP implementation.
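Flow churn can be quantified by comparing the flow-to-link mapping before and after the failure. The following sketch counts flows that moved even though they were not on the failed link. The data layout, the `redistribution_report` helper, and the sample flows are assumptions made for illustration; they are not an IxNetwork report format.

```python
def redistribution_report(before, after, failed_link):
    """Compare flow-to-link mappings captured before and after a failure.

    before, after -- dicts mapping flow id -> ECMP link name
    failed_link   -- name of the link that was disconnected
    """
    # Flows on the failed link have no choice but to move.
    must_move = {f for f, link in before.items() if link == failed_link}
    # Every flow whose link changed between the two snapshots.
    moved = {f for f in before if after.get(f) != before[f]}
    # Flows that moved although their original link stayed up.
    unnecessary = moved - must_move
    unaffected = len(before) - len(must_move)
    return {
        "flows_on_failed_link": len(must_move),
        "flows_moved": len(moved),
        "unnecessary_moves": len(unnecessary),
        "churn_ratio": len(unnecessary) / unaffected if unaffected else 0.0,
    }


# Hypothetical snapshots: link P2 fails; flow f6 moves without needing to.
before = {"f1": "P2", "f2": "P3", "f3": "P4", "f4": "P5", "f5": "P2", "f6": "P3"}
after = {"f1": "P3", "f2": "P3", "f3": "P4", "f4": "P5", "f5": "P4", "f6": "P5"}
report = redistribution_report(before, after, "P2")
print(report)
```

A churn ratio of zero would indicate that only the flows on the failed link were rehashed, which is the ideal outcome described above.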
Explicit Congestion Notification (ECN) Performance Test
ECN, as defined in RFC 3168, is a key enabler for avoiding packet drops and the resulting TCP retransmissions: it signals congestion to the traffic source when the queue of a switching device exceeds its threshold. With IxNetwork’s unique ingress/egress traffic flow analysis, we can design a test topology as depicted above. The traffic sources on test ports P1 and P2 send a constant load plus a single controlled burst of packets. Egress port 3 becomes overloaded, which results in congestion. IxNetwork ingress/egress flow analysis can be applied on port 3 to observe the ECN bit marking when congestion occurs at the device under test (DUT). By controlling how many packets burst above the full line rate of egress port 3, we can evaluate the mechanism and effectiveness of the DUT’s queue management and ECN marking.
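For reference, RFC 3168 places the ECN field in the two least significant bits of the IPv4 TOS byte (or IPv6 Traffic Class byte), with codepoints 00 (Not-ECT), 01 (ECT(1)), 10 (ECT(0)), and 11 (CE, congestion experienced). The sketch below decodes those bits and computes the share of ECN-capable packets that were marked CE. The helper names and the sample bytes are hypothetical; the input is assumed to be a list of TOS byte values already extracted from captured egress packets.

```python
from collections import Counter

# ECN codepoints from RFC 3168 (low two bits of the TOS / Traffic Class byte).
ECN_CODEPOINTS = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def ecn_codepoint(tos_byte):
    """Return the ECN codepoint name for a TOS / Traffic Class byte."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("TOS byte must be in the range 0..255")
    return ECN_CODEPOINTS[tos_byte & 0b11]


def ce_marking_ratio(tos_bytes):
    """Return the fraction of ECN-capable packets that carry the CE mark."""
    counts = Counter(ecn_codepoint(b) for b in tos_bytes)
    capable = counts["ECT(0)"] + counts["ECT(1)"] + counts["CE"]
    return counts["CE"] / capable if capable else 0.0


# Hypothetical egress capture: packets were sent as ECT(0); two came back CE.
egress_tos = [0x02, 0x02, 0x03, 0x03]
print(ce_marking_ratio(egress_tos))
```

Comparing this ratio across runs with different burst sizes gives a simple view of when the DUT starts marking and how aggressively it marks as the queue builds.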