SDN – Centralized Network Command and Control

Software Defined Networking (SDN) is a hot topic in the data center and cloud community. The geniuses <sarcasm> over at IDC predict a $2 billion market by 2016 (expect this number to change often between now and then, and look closely at what they count in the cost). The concept has the potential to shake up the networking business as a whole (http://www.networkcomputing.com/next-gen-network-tech-center/240001372), and both commercial and open source products are being developed and shipped. But what is it, and why?

Let’s start with the why by taking a look at how traditional networking occurs.

 

Traditional Network Architecture:

 

image

 

The most important thing to notice in the graphic above is the separate control and data planes. Each plane has separate tasks that together provide the overall switching/routing functionality. The control plane is responsible for configuration of the device and for programming the paths that will be used for data flows. When you are managing a switch you are interacting with the control plane. Things like route tables and Spanning-Tree Protocol (STP) are calculated in the control plane. This is done by accepting information frames such as BPDUs or Hello messages and processing them to determine available paths. Once these paths have been determined they are pushed down to the data plane and typically stored in hardware. The data plane then typically makes path decisions in hardware based on the latest information provided by the control plane. This has traditionally been a very effective method: the hardware decision-making process is very fast, reducing overall latency, while the control plane handles the heavier processing and configuration requirements.
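To make that split concrete, here is a minimal sketch in Python, purely illustrative and not any vendor's implementation: a control plane routine processes protocol information and programs a simple forwarding table that stands in for the hardware data plane.

```python
# Minimal sketch (not any vendor's implementation): the control plane processes
# protocol information and programs a forwarding table standing in for hardware.

forwarding_table = {}  # destination MAC -> egress port (the "hardware" table)

def control_plane_update(advertisements):
    """Process BPDU/Hello-style advertisements and push results to the data plane."""
    for dest_mac, best_port in advertisements:
        forwarding_table[dest_mac] = best_port  # programmed "down into hardware"

def data_plane_forward(dest_mac):
    """Fast path: a single table lookup per frame, no protocol processing."""
    return forwarding_table.get(dest_mac, "flood")

control_plane_update([("00:1b:21:aa:bb:cc", "eth1"), ("00:1b:21:dd:ee:ff", "eth2")])
print(data_plane_forward("00:1b:21:aa:bb:cc"))  # -> eth1
```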

This method is not without problems; the one we will focus on is scalability. To demonstrate the scalability issue I find it easiest to use Quality of Service (QoS) as an example. QoS allows forwarding priority to be given to specific frames for scheduling purposes based on characteristics in those frames. This allows network traffic to receive appropriate treatment in times of congestion. For instance, latency-sensitive voice and video traffic is typically given high priority to ensure the best user experience. Traffic prioritization is typically based on tags in the frame known as Class of Service (CoS) and/or Differentiated Services Code Point (DSCP). These tags must be marked consistently for frames entering the network, and rules must then be applied consistently for their treatment on the network. This becomes cumbersome in a traditional multi-switch network because the configuration must be duplicated in some fashion on each individual switching device.

For an even simpler example of the administrative challenge, consider that each port in the network is a management point, meaning each port must be individually configured. This is both time consuming and cumbersome.
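As a hedged illustration of that duplication (the switch names and the apply_policy helper are hypothetical, not a real device API), the same QoS policy ends up being pushed box by box and port by port:

```python
# Illustrative sketch of the administrative burden, not a real switch API:
# the same QoS marking policy has to be pushed to every switch, and often to
# every port, one management point at a time.

qos_policy = {
    "voice": {"dscp": 46, "queue": "priority"},   # EF for latency-sensitive voice
    "video": {"dscp": 34, "queue": "high"},       # AF41
    "iscsi": {"dscp": 26, "queue": "bandwidth"},  # AF31 (example marking)
}

switches = ["access-1", "access-2", "dist-1", "dist-2"]  # hypothetical devices

def apply_policy(switch, port, policy):
    # Stand-in for per-box CLI/SNMP configuration of marking and queuing rules.
    print(f"{switch} port {port}: marking/queuing rules applied for {sorted(policy)}")

for switch in switches:
    for port in range(1, 49):   # every port is an individual management point
        apply_policy(switch, port, qos_policy)
```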

Additional challenges exist in properly classifying data and routing traffic. A fantastic example of this is two different traffic types, iSCSI and voice. iSCSI is storage traffic and typically uses full-size packets or even jumbo frames, while voice data is typically transmitted in very small packets. They also have different requirements: voice is very latency sensitive in order to maintain call quality, while iSCSI is less latency sensitive but benefits from more bandwidth. Traditional networks have few if any tools to differentiate these traffic types and send them down separate paths that suit each of them.

These types of issues are what SDN looks to solve.

The Three Key Elements of SDN:

  • Ability to manage the forwarding of frames/packets and apply policy
  • Ability to perform this at scale in a dynamic fashion
  • Ability to be programmed

Note: To qualify as SDN, an architecture does not have to be open, standards-based, interoperable, etc. A proprietary architecture can meet the definition and provide the same benefits. This blog does not argue for or against either open or proprietary architectures.

An SDN architecture must be able to manipulate frame and packet flows through the network at large scale, and do so in a programmable fashion. The hardware plumbing of an SDN will typically be designed as a converged (capable of carrying all data types, including desired forms of storage traffic) mesh of large, low-latency pipes commonly called a fabric. The SDN architecture itself will in turn provide a network-wide view and the ability to manage the network and network flows centrally.

This architecture is accomplished by separating the control plane from the data plane devices and providing a programmable interface for that separated control plane. The data plane devices receive forwarding rules from the separated control plane and apply those rules in hardware ASICs. These ASICs can be either commodity switching ASICs or custom silicon, depending on the functionality and performance required. The diagram below depicts this relationship:

image

In this model the SDN controller provides the control plane and the data plane is comprised of hardware switching devices. These devices can be either new hardware devices or existing hardware devices with specialized firmware, depending on vendor and deployment model. One major advantage clearly shown in this example is the visibility provided to the control plane. Rather than each individual data plane device relying on advertisements from other devices to build its view of the network topology, a single control plane device has a view of the entire network. This provides a platform from which advanced routing, security, and quality-of-service decisions can be made, hence the need for programmability. Another major capability that flows from this centralized control is visibility into the traffic itself. With a centralized controller it is much easier to gain usable data about real-time flows on the network and make decisions (automated or manual) based on that data.
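A generalized sketch of that controller-to-device relationship might look like the following. It is loosely modeled on OpenFlow-style match/action rules; the device names and push_flow helper are assumptions for illustration, not a specific controller's API.

```python
# Loosely OpenFlow-style match/action rules pushed from one controller to
# every data plane device. Names and the push mechanism are illustrative.

flow_rules = [
    {"match": {"ip_proto": "tcp", "tcp_dst": 3260}, "action": {"output": "path-a"}},  # iSCSI
    {"match": {"ip_proto": "udp", "dscp": 46},      "action": {"output": "path-b"}},  # voice (EF)
]

data_plane_devices = ["leaf-1", "leaf-2", "spine-1"]  # hypothetical switches

def push_flow(device, rule):
    # In a real deployment this would be an OpenFlow message or vendor API call;
    # here it simply shows one controller programming every device centrally.
    print(f"{device}: install {rule['match']} -> {rule['action']}")

for device in data_plane_devices:
    for rule in flow_rules:
        push_flow(device, rule)
```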

This diagram shows only a portion of the picture, as it is focused on physical infrastructure and servers. Another major benefit is the integration of virtual server environments into SDN networks. This allows centralized management of consistent policies for both virtual and physical resources. Integrating a virtual network is done by having a Virtual Ethernet Bridge (VEB) in the hypervisor that can be controlled by an SDN controller. The diagram below depicts this:

image

This diagram more clearly depicts the integration between virtual and physical networking systems in order to have cohesive, consistent control of the network. This plays an even more important role as virtual workloads migrate. Because both the virtual and physical data planes are managed centrally by the control plane, when a VM migration happens its network configuration can move with it regardless of destination in the fabric. This is a key benefit for policy enforcement in virtualized environments because more granular controls can be placed on the VM itself as an individual port, and those controls stick with the VM throughout the environment.
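A minimal sketch of that idea, with names that are purely illustrative: policy is keyed to the VM rather than to a physical port, so a migration event simply re-applies the same profile at the new location.

```python
# Sketch of policy following the VM: network policy is keyed to the VM rather
# than to a physical switch port, so a migration re-applies the same profile
# at the destination. All names here are illustrative.

vm_port_profiles = {
    "vm-web-01": {"vlan": 100, "qos": "gold",   "acl": "web-tier"},
    "vm-db-01":  {"vlan": 200, "qos": "silver", "acl": "db-tier"},
}

def on_vm_migration(vm_name, new_host, new_virtual_port):
    profile = vm_port_profiles[vm_name]
    # The controller programs the destination port with the exact policy the
    # VM carried before the move, regardless of where it lands in the fabric.
    print(f"{new_host}:{new_virtual_port} <- {vm_name} profile {profile}")

on_vm_migration("vm-web-01", "host-07", "veth12")
```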

Note: These diagrams are a generalized depiction of an SDN architecture.  Methods other than a single separated controller could be used, but this is the more common concept.

With centralized command and control of the network in place through SDN, along with a programmable interface, more intelligent processes can be added to handle complex systems. Real-time decisions can be made for the purposes of traffic optimization, security, outages, or maintenance. Separate traffic types can run side by side while receiving different paths and forwarding behavior that responds dynamically to network changes.
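As a hedged sketch of that kind of dynamic decision (the path names, health data, and selection logic are assumptions for illustration, not a product feature), a controller holding network-wide state could keep traffic classes on separate paths and re-program them when a path degrades:

```python
# Illustrative only: traffic classes pinned to separate paths, re-routed
# centrally when the controller sees a path fail.

paths = {"iscsi": "path-a", "voice": "path-b"}               # class -> current path
healthy = {"path-a": False, "path-b": True, "path-c": True}  # path-a just went down

def reroute_on_change():
    for traffic_class, path in list(paths.items()):
        if not healthy[path]:
            in_use = set(paths.values())
            # Prefer a healthy path no other class is already pinned to.
            backup = next((p for p, ok in healthy.items() if ok and p not in in_use),
                          next(p for p, ok in healthy.items() if ok))
            paths[traffic_class] = backup
            print(f"re-routing {traffic_class}: {path} -> {backup}")

reroute_on_change()
print(paths)  # iSCSI now rides path-c while voice stays on path-b
```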

Summary:

Software Defined Networking has the potential to disrupt the networking market and move us past the days of the switch/router jockey. This shift will provide significant benefits in the form of flexibility, scalability and traffic performance for data center networks. While not every aspect is yet defined, SDN projects such as OpenFlow (www.openflow.org) provide the tools to begin testing and developing SDN architectures on supported hardware. Expect to see lots of changes in this ecosystem and many flavors among the vendor offerings.


Inter-Fabric Traffic in UCS

It’s been a while since my last post; time sure flies when you’re bouncing all over the place busy as hell. I’ve been invited to Tech Field Day next week and need to get back in the swing of things, so here goes.

In order for Cisco’s Unified Computing System (UCS) to provide the benefits, interoperability and management simplicity it does, the networking infrastructure is handled in a unique fashion. This post will take a look at that unique setup and point out some considerations to focus on when designing UCS application systems. Because Fibre Channel traffic is designed to utilize separate physical fabrics, exactly as UCS does, this post will focus on Ethernet traffic only. This post focuses on End Host mode; for the second part of this post, focusing on switch mode, use this link: http://www.definethecloud.net/inter-fabric-traffic-in-ucspart-ii. Let’s start by taking a look at how this is accomplished:

UCS Connectivity

image

In the diagram above we see both UCS rack-mount and blade servers connected to a pair of UCS Fabric Interconnects, which handle the switching and management of UCS systems. The rack-mount servers are shown connected to Nexus 2232s, which are nothing more than remote line cards of the Fabric Interconnects, known as Fabric Extenders. Fabric Extenders provide a localized connectivity point (10GE/FCoE in this case) without expanding management points by adding a switch. Not shown in this diagram are the I/O Modules (IOM) in the back of each UCS chassis. These devices act in the same way as the Nexus 2232, meaning they extend the Fabric Interconnects without adding management points or switches. Next let’s look at a logical diagram of the connectivity within UCS.

UCS Logical Connectivity

image

In the diagram above we see several important things to note about UCS Ethernet networking:

  • UCS is a Layer 2 system meaning only Ethernet switching is provided within UCS.  This means that any routing (L3 decisions) must occur upstream.
  • All switching occurs at the Fabric Interconnect level.  This means that all frame forwarding decisions are made on the Fabric Interconnect and no intra-chassis switching occurs.
  • The only connectivity between Fabric Interconnects is the cluster links. Both Interconnects are active from a switching perspective, but the management system known as UCS Manager (UCSM) is an Active/Standby clustered application. This clustering occurs across these links. These links do not carry data traffic, which means there is no inter-fabric communication within the UCS system, and A-to-B traffic must be handled upstream.

At first glance, handling all switching at the Fabric Interconnect level looks as though it would add latency (inter-blade traffic must be forwarded up to the Fabric Interconnects and then back to the blade chassis). While this is true, UCS hardware is designed for low-latency environments such as High Performance Computing (HPC). Because of this design goal all components operate at very low latency. The Fabric Interconnects themselves operate at approximately 3.2us (microseconds), and the Fabric Extenders operate at about 1.5us. This means total round-trip time blade to blade is approximately 6.2us (1.5us through the ingress Fabric Extender + 3.2us at the Fabric Interconnect + 1.5us through the egress Fabric Extender), right in line with or lower than most access-layer solutions. Equally important, with this design switching between any two blades/servers in the system occurs at the same speed regardless of location (consistent, predictable latency).

The question then becomes: how is traffic between fabrics handled? The answer is that traffic between fabrics must be handled upstream (by the next-hop device(s) shown in the diagrams as the LAN cloud). This is an important consideration when designing UCS implementations and selecting a redundancy/load-balancing behavior for server NICs.
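As a toy sketch of why this matters (the hop names are illustrative shorthand, not UCS terminology), traffic between blades pinned to the same fabric turns around at one Fabric Interconnect, while traffic between blades pinned to different fabrics must also cross the upstream LAN:

```python
# Toy sketch: compare the Ethernet path for same-fabric vs cross-fabric traffic.

def ethernet_path(src_fabric, dst_fabric):
    if src_fabric == dst_fabric:
        return ["IOM", f"FI-{src_fabric}", "IOM"]
    return ["IOM", f"FI-{src_fabric}", "upstream LAN", f"FI-{dst_fabric}", "IOM"]

print(ethernet_path("A", "A"))  # stays inside UCS
print(ethernet_path("A", "B"))  # hairpins through the upstream network
```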

Let’s take a look at two examples: first a bare-metal OS (Windows, Linux, etc.), then a VMware server.

Bare-Metal Operating System

image

In the diagram above we see two blades which have been configured in an active/passive NIC teaming configuration using separate fabrics (within UCS this is done within the service profile). This means that blade 1 is using Fabric A as a primary path with B available for failover and blade 2 is doing the opposite. In this scenario any traffic sent from blade 1 to blade 2 would have to be handled by the upstream device depicted by the LAN cloud. This is not necessarily an issue for the occasional frame but will impact performance for servers that communicate frequently.

Recommendation:

For bare-metal operating systems, analyze the blade-to-blade communication requirements and ensure chatty server-to-server applications are utilizing the same fabric as a primary (a quick sanity check along these lines is sketched after the list):

  • When using a card that supports hardware failover provide only one vNIC (made redundant through HW failover) and place its primary path on the same fabric as any other servers that communicate frequently.
  • When using cards that don’t support HW failover use active/passive NIC teaming and ensure that the active side is set to the same fabric for servers that communicate frequently.
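A minimal sketch of that check, assuming you already know which blades talk heavily to each other and which fabric each blade's active vNIC uses (the data structures are illustrative, not pulled from UCSM):

```python
# Illustrative data only: flag any chatty pair split across fabrics, since
# that traffic must hairpin through the upstream LAN.

primary_fabric = {"blade-1": "A", "blade-2": "B", "blade-3": "A"}
chatty_pairs = [("blade-1", "blade-2"), ("blade-1", "blade-3")]

for left, right in chatty_pairs:
    if primary_fabric[left] != primary_fabric[right]:
        print(f"{left} (fabric {primary_fabric[left]}) and {right} "
              f"(fabric {primary_fabric[right]}) communicate heavily "
              "but use different primary fabrics")
```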

VMware Servers

image

In the above diagram we see that the connectivity is the same from a physical perspective, but in this case we are using VMware as the operating system. Here a vSwitch, vDS, or Cisco Nexus 1000v will be used to connect the VMs within the hypervisor. Regardless of the VMware switching option the considerations are the same: it is necessary to properly design the virtual switching environment to ensure that server-to-server communication is handled in the most efficient way possible.

Recommendation:

  • For half-width blades requiring 10GE or less total throughput, or full-width blades requiring 20GE or less total throughput, provide a single vNIC with hardware failover if available, or use an active/passive NIC configuration for the VMware switching.
  • For blades requiring the total active/active throughput of the available NICs, determine application profiles and utilize port-groups (port-profiles with the Nexus 1000v) to ensure active paths are the same for application groups which communicate heavily (see the sketch below).
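As a hedged example of that port-group approach (the port-group and uplink names are purely illustrative; this is a mapping sketch, not a vSphere API call), each application group gets a port-group whose active uplink pins it to one fabric so VMs that talk heavily stay on the same side:

```python
# Mapping sketch only: port-groups pinned to an active uplink per fabric.

port_groups = {
    "app-tier": {"active": "vmnic0 (Fabric A)", "standby": "vmnic1 (Fabric B)"},
    "db-tier":  {"active": "vmnic0 (Fabric A)", "standby": "vmnic1 (Fabric B)"},
    "backup":   {"active": "vmnic1 (Fabric B)", "standby": "vmnic0 (Fabric A)"},
}

for name, teaming in port_groups.items():
    print(f"{name}: active uplink {teaming['active']}, standby {teaming['standby']}")
```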

Summary:

UCS utilizes a unique switching design in order to provide high-bandwidth, low-latency switching with a greatly reduced management architecture compared to competing solutions. The networking requires a thorough understanding in order to ensure architectural designs provide the greatest available performance. Ensuring that application groups which generate high levels of server-to-server traffic are placed on the same fabric path will provide maximum performance and minimal additional overhead on upstream networking equipment.
