In the first part of this post (http://www.definethecloud.net/inter-fabric-traffic-in-ucs) I discussed server traffic flows within a UCS system, focusing on End-Host mode (EH mode). EH mode is the default and recommended mode for the majority of UCS implementations, but the system can also be used in ‘Switch mode’, which causes the Fabric Interconnects to operate as standard L2 switches. This post will focus on server-to-server communication in switch mode. For more information on when and where to use switch mode and the recommended upstream connectivity options, see Brad Hedlund’s post in HD video: http://bradhedlund.com/2010/06/22/cisco-ucs-networking-best-practices/ and the white paper he co-authored on the subject: http://bradhedlund.com/2010/12/01/cisco-nexus-7000-connectivity-solutions-for-cisco-ucs/. Both are must-reads for anyone designing UCS solutions, and they also contain great traffic flow information on the Nexus 7000 as a whole.
Note: Remember that switch mode is rarely recommended and has become less important with the 12/2010 release of UCSM 1.4, which allows for ‘Appliance’ and ‘Storage’ ports in EH mode. For more information on the new port types see Dave Alexander’s post on the new feature: http://www.unifiedcomputingblog.com/?p=187.
The only time I would recommend using switch mode is when locally switched traffic is required within the UCS system itself. Let’s take a quick look at the typical UCS connectivity diagram:
In the diagram above we see the basic connectivity for a UCS system. In the default EH mode, the only connections supported between the Fabric Interconnects are the cluster links shown, which do not carry data traffic. This means that all switching from Fabric A to Fabric B must traverse the uplinks and be handled by an upstream device.
When the Fabric Interconnects are moved into switch mode, data links are supported between Fabric Interconnect A and B. Let’s take a look at how this works.
The only change in the above drawing is that I’ve replaced the cluster links with a 10GE port-channel carrying data traffic. In this diagram and the following ones I have removed the cluster links for visual clarity, but they would still be required and follow the same rules as in EH mode. With the port-channel in place it is possible for data to be switched between the Fabric Interconnects. There are, however, some considerations.
When the Fabric Interconnects are placed in switch mode they begin to operate as traditional switches. This includes participating in Spanning Tree Protocol (STP) for loop avoidance. In the case of UCS the STP variant used is Per-VLAN Rapid Spanning Tree+ (PVRST+). PVRST+ is a faster-converging version of traditional spanning tree that operates independently on a per-VLAN basis. PVRST+ is commonly used, standards based and backward compatible with other STP versions. With switch mode running, the Fabric Interconnects will send and receive Bridge Protocol Data Units (BPDUs) and block ports based on the network topology information in those BPDUs. Now let’s take a closer look at how this looks within UCS.
In the diagram above we can see that by moving to switch mode and connecting the Fabric Interconnects together via data links we’ve created loops, which will have to be broken by STP. STP utilizes a loop avoidance algorithm based on a root bridge, which acts as the base of the network topology. From that root a loop-free branch topology is built, giving each ‘leaf’ exactly one path to the root by blocking redundant links.
Best practices dictate that, for performance and stability reasons, the root bridge be manually configured on a highly available switch, typically in the aggregation or core layer. With this in mind we can assume that the UCS Fabric Interconnects will not be the STP root bridge. This means that our UCS network diagram will look similar to the following diagram.
In the above diagram we can see an example where the top left upstream switch is the root for a given VLAN. In order to avoid a loop, STP will block the links between the two Fabric Interconnects. This means that no traffic will pass between the Fabric Interconnects for that VLAN, and all Fabric A to B communication will be passed upstream the same way it would in EH mode. This will occur for any VLAN that exists in both the upstream network and the Fabric Interconnects. The behavior would be the same regardless of which upstream switch was acting as the root bridge for that VLAN. This means that even in switch mode there is no Fabric A to Fabric B switching handled locally for common VLANs. ‘Common VLANs’ is the key phrase in that sentence. Let’s take a look at how we get Fabric A to B switching handled locally.
In the above diagram we see that VLAN 10 is common upstream and on the Fabric Interconnects. Assuming best practices are in place, VLAN 10’s root bridge will be upstream and therefore VLAN 10 will be blocked on the link between the Fabric Interconnects. VLAN 20, on the other hand, only exists within the UCS system, so there is no loop in place and no requirement for blocked links. For VLAN 20 one of the Fabric Interconnects will operate as the root bridge and traffic will be forwarded across the links. This UCS-only VLAN can be used for server-to-server communication; one example is depicted in the following diagram.
In the example above we see web access on VLAN 10 incoming from the upstream network. This VLAN is blocked across the connections between the Fabric Interconnects because it is common upstream and within UCS. VLAN 20 is used for the web servers to access the database servers and is only needed within the UCS system. Because this VLAN only exists internally, traffic is forwarded across the links between the Fabric Interconnects and is switched locally. Local VLANs can be used to take advantage of the high bandwidth and low latency of the UCS switching system for server-to-server communication.
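To make the per-VLAN behavior a little more concrete, here is a rough Python sketch. It is not how PVRST+ is actually implemented: it treats every link as equal cost and simply builds a breadth-first tree from the root bridge, reporting any link left out of the tree as blocked. The switch names (SW1, SW2, FI-A, FI-B) and the cabling are my own assumptions for illustration, based on a typical layout where each Fabric Interconnect uplinks to both upstream switches.

```python
from collections import deque

def forwarding_links(links, root):
    """Build a simplified spanning tree (equal-cost BFS) rooted at `root`."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    visited = {root}
    tree = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # sorted() is a stand-in for STP's bridge-ID tie-breaking
        for peer in sorted(neighbors.get(node, [])):
            if peer not in visited:
                visited.add(peer)
                tree.add(frozenset((node, peer)))
                queue.append(peer)
    return tree

def blocked_links(links, root):
    """Links that are not part of the tree are treated as blocked."""
    return {frozenset(link) for link in links} - forwarding_links(links, root)

# Hypothetical topology: VLAN 10 exists upstream and on both Fabric Interconnects.
VLAN10_LINKS = [
    ("SW1", "SW2"),
    ("SW1", "FI-A"), ("SW1", "FI-B"),
    ("SW2", "FI-A"), ("SW2", "FI-B"),
    ("FI-A", "FI-B"),
]
# VLAN 20 exists only inside UCS, so only the inter-FI link carries it.
VLAN20_LINKS = [("FI-A", "FI-B")]

INTER_FI = frozenset(("FI-A", "FI-B"))

vlan10_blocked = blocked_links(VLAN10_LINKS, root="SW1")   # root bridge upstream
vlan20_blocked = blocked_links(VLAN20_LINKS, root="FI-A")  # an FI becomes root

print("VLAN 10 inter-FI link blocked:", INTER_FI in vlan10_blocked)
print("VLAN 20 inter-FI link blocked:", INTER_FI in vlan20_blocked)
```

In this simplified model the inter-FI link drops out of the tree for VLAN 10 whichever upstream switch is chosen as root, while VLAN 20 has no redundant path and so nothing to block. Real STP also weighs port cost and bridge priority, so treat this as a picture of the idea rather than a prediction of exactly which port will block.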
Utilizing switch mode enables inter-fabric traffic to be switched locally, but there are design considerations that must be addressed. When deciding between modes for these purposes remember that UCS is a very low-latency system with sub-7us switching latency. With the appropriate switch hardware upstream, total round-trip latency for inter-fabric traffic will still be below 40-50us, or less depending on the hardware. Also remember that intra-fabric traffic (A to A or B to B) is always switched locally at sub-7us regardless of mode. In most cases it is best to design your applications to utilize the same fabric if they communicate frequently rather than designing a switch mode solution.
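As a quick recap of the rules covered in both parts of this post, the small Python function below captures where server-to-server traffic gets switched. The function name, the mode strings and the return labels are all made up for illustration, and it assumes the best-practice case where the root bridge for any common VLAN lives upstream.

```python
def switching_point(mode, src_fabric, dst_fabric, vlan_exists_upstream):
    """Return where traffic between two servers is switched.

    mode: "end-host" or "switch" (hypothetical labels)
    src_fabric / dst_fabric: "A" or "B"
    vlan_exists_upstream: True if the VLAN is common to UCS and the upstream network
    """
    if src_fabric == dst_fabric:
        # Intra-fabric traffic is always switched locally, regardless of mode.
        return "local-fabric-interconnect"
    if mode == "switch" and not vlan_exists_upstream:
        # UCS-only VLAN: the inter-FI data links forward for this VLAN.
        return "inter-fi-link"
    # EH mode, or a common VLAN whose inter-FI link is blocked by STP.
    return "upstream-switch"

print(switching_point("end-host", "A", "B", True))
print(switching_point("switch", "A", "B", True))
print(switching_point("switch", "A", "B", False))
print(switching_point("switch", "A", "A", True))
```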