Much Ado About Something: Brocade’s Tech Day

Yesterday I had the privilege of attending Brocade’s Tech Day for Analysts and Press.  Brocade announced the new VDX 8770, discussed some VMware announcements, as well as discussed strategy, vision and direction.  I’m going to dig in to a few of the topics that interested me, this is no way a complete recap.

First in regards to the event itself.  My kudos to the staff that put the event together it was excellent from both a pre-event coordination and event staff perspective.  The Brocade corporate campus is beautiful and the EBC building was extremely well suited to such an event.  The sessions went on smoothly, the food was excellent and overall it was a great experience.  I also want to thank Lisa Caywood (@thereallisac) for pointing out that my tweets during the event were more inflammatory then productive and outside the lines of ‘guest etiquette.’  She’s definitely correct and hopefully I can clear up some of my skepticism here in a format left open for debate, and avoid the same mistake in the future.  That being said I had thought I was quite clear going in on who I was and how I write.  To clear up any future confusion from anyone:  if you’re not interested in my unfiltered, typically cynical, honest opinion don’t invite me, I won’t take offense.  Even if you’re a vendor with products I like I’ve probably got a box full of cynicism for your other product lines.

During the opening sessions I observed several things that struck me negatively:

On the positive side Brocade has some vision that’s quite interesting as well as some areas where they are leading by filling gaps in industry offerings.

On the financial side Brocade has been looking good and climbed over $6.00 a share.  There are plenty of conversations stating some of this may be due to upcoming shifts at the CEO level.  They’ve reported two great quarters and are applying some new focus towards federal government and other areas lacking in recent past. I didn’t dig further into this discussion.

During lunch I was introduced to one of the most interesting Brocade offerings I’d never heard of: ‘Brocade Network Subscription”: http://www.brocade.com/company/how-to-buy/capital-solutions/index.page.  Basically you can lease your on-prem network from Brocade Capitol.  This is a great idea for customers looking to shift CapEx to OpEx which can be extremely useful.  I also received a great explanation for the value of a fabric underneath an SDN network from Jason Nolet (VP of Data Center Networking Group.)  Jason’s position (summarized) is that implementing SDN adds a network management layer, rather than removing one.  With that in mind the more complexity we remove from the physical network the better off we are.  What we’ll want for our SDN networks is fast, plug-and-play functionality with max usable links and minimal management.  Brocade VCS fabric fits this nicely.  While I agree with that completely I ‘d also say it’s not the only way to skin that particular cat.  More to come on that.

For the last few years I’ve looked at Brocade as a company lacking innovation and direction.  They clung furiously to FC while the market began shifting to Ethernet, ignored cloud for quite a while, etc.  Meanwhile they burned down deals to purchase them and ended up where they’ve been.  The overall messaging, while nothing new, did have undertones of change as a whole and new direction.  That’s refreshing to hear.  Brocade is embracing virtualization and cloud architectures without tying their cart to a single hypervisor horse.  They are positioning well for SDN and the network market shifts.  Most impressively they are identifying gaps in the spaces they operate and executing on them both from a business and technology perspective.  Examples of this are Brocade Network Subscription and the VXLAN gateway functionality respectively.

Things are looking up and there is definitely something good happening at Brocade.  That being said they aren’t out of the woods yet.  For them, as a company, purchase is far fetched as the vendors that would buy them already have networking plays and would lose half of Brocade’s value by burning OEM relationships with the purchase.  The only real option from a sale perspective is for investors looking to carve them up and sell off pieces individually.  A scenario like this wouldn’t bode well for customers.  Brocade has some work to do but they’ve got a solid set of products and great direction.  We’ll see how it pans out.  Execution is paramount for them at this point.

Final Note:  This blog was intended to stop there but this morning I received an angry accusatory email from Brocade’s head of corporate communications who was unhappy with my tweets.  I thought about posting the email in full, but have decided against it for the sake of professionalism.  Overall his email was an attack based on my tweets.  As stated my tweets were not professional, but this type of email from someone in charge of corporate communications is well over the top in response.  I forwarded the email to several analyst and blogger colleagues, a handful of whom had similar issues with this individual.  One common theme in social media is that lashing out at bad press never does any good, a senior director in this position should know such, but instead continues to slander and attack.  His team and colleagues seem to understand social media use as they’ve engaged in healthy debate with me in regards to my tweets, it’s a shame they are not lead from the front.

Thoughts From a Tech Leadership Summit

This week I attended a tech leadership Summit in Vail Colorado for the second time.  The event is always a fantastic series of discussions and brings some of the top minds in the technology industry.  Here are some thoughts on the trends and thinking that were common at the event.

Virtualization and VDI:

There was a lot less talk of VDI and virtualization then in 2011.  These conversations were replaced with more conversations about cloud and app delivery.  Overall the consensus felt to be that getting the application to the right native environment on a given device was a far better approach then getting the desktop there.

Hypervisors were barely mentioned except in a recurring theme that the hypervisor itself has hit commodity.  This means that management and upper layer feature set are the differentiators.  Parallel to this thought was that VMware no longer has the best hypervisor yet their management system is still far superior to the competition (KVM was touted as the best hypervisor several times.)

The last piece of the virtualization discussion was around VMware’s acquisition of Nicira.  Some bullet points on that:

Storage:

There was a lot of talk about both the vision and execution of EMC over the past year or more.  I personally used ‘execution machine’ more than once to describe them (coming from a typically non-EMC Kool-Aid guy.)  Some key points that resonated over past few days:

I also participated in several discussions around flash and flash storage.  Some highlights:

The last point that struck me was a potential move from shared storage as a whole.  Microsoft would rather have you use local storage, clusters and big data apps like Hadoop thrive on local storage and one last big shared storage draw is going away: vMotion.  Once shared storage is no longer need for live virtual machine migration there will be far less draw for expensive systems.

Cloud:

The major cloud discussion I was a part of (mainly observer) involved OpenStack.  Overall OpenStack has a ton of buzz, and a plethora of developers.  What it’s lacking is customers, leadership and someone driving it who can lead a revolution.  Additionally it’s suffering from politics and bureaucracy.  It was described as impossible to support by one individual who would definitely know one way or another.  My thinking is that if you have CloudStack sitting there with real customers, an easily deployed system, support and leadership why waste cycles continuing down the OpenStack path?  The best answer I heard for that: Ego.  Everyone wants to build the next Amazon and CloudStack is too baked to make as much of a mark.

Overall it’s an interesting topic but my thought is: with limited developers the industry should be getting behind the best horse and working together.

Big Data:

Big Data was obviously another fun topic.  The quote of the week was ‘There are ten people, not companies, that understand Big Data.  6 of them are at Cloudera and the other 4 are locked in Google writing their own checks.’  Basically Big Data knowledge is rare and hiring consultants is not typically a viable option because you need people holding three things: Knowledge of big data processing, knowledge of your data, and knowledge of your business.  These data scientists aren’t easy to come by.  Additionally contrary to popular hype, Hadoop is not the end-all be-all of big data, it’s a tool in a large tool chest.  Especially when talking about real-time you’ll need to look elsewhere.  The consensus was that we are with big data where we were with cloud 2-3 years ago.  That being said CIO’s may still need to show big data initiatives (read: spend) so you should see $$ thrown at well packaged big data solutions geared toward plug-n-play in the enterprise.

All in all it was an excellent event and I was humbled as usual to participate in great conversations with so many smart people who are out there driving the future of technology.  What I’ve written here is a a summary from my perspective on the one summit portion I had time to participate in.  There is always a good chance I misquoted/misunderstood something so feel free to call me out.  As always I’d love your feedback, contradictions or hate mail comments.

SDN – Centralized Network Command and Control

Software Defined Networking (SDN) is a hot topic in the data center and cloud community.  The geniuses <sarcasm> over at IDC predict a $2 billion market by 2016 (expect this number to change often between now and then, and look closely at what they count in the cost.) The concept has the potential to shake up the networking business as a whole (http://www.networkcomputing.com/next-gen-network-tech-center/240001372) and has both commercial and open source products being developed and shipping, but what is it, and why?

Let’s start with the why by taking a look at how traditional networking occurs.

 

Traditional Network Architecture:

 

image

 

The most important thing to notice in the graphic above is the separate control and data planes.  Each plane has separate tasks that provide the overall switching/routing functionality.  The control plane is responsible for configuration of the device and programming the paths that will be used for data flows.  When you are managing a switch you are interacting with the control plane.  Things like route tables and Spanning-Tree Protocol (STP) are calculated in the control plane.   This is done by accepting information frames such as BPDUs or Hello messages and processing them to determine available paths.  Once these paths have been determined they are pushed down to the data plane and typically stored in hardware.  The data lane then typically makes path decisions in hardware based on the latest information provided by the control plane.  This has traditionally been a very effective method.  The hardware decision making process is very fast, reducing overall latency while the control plane itself can handle the heavier processing and configuration requirements.

This method is not without problems, the one we will focus on is scalability.  In order to demonstrate the scalability issue I find it easiest to use Quality of Service (QoS) as an example.  QoS allows forwarding priority to be given to specific frames for scheduling purposes based on characteristics in those frames.  This allows network traffic to receive appropriate treatment in times of congestion.  For instance latency sensitive voice and video traffic is typically engineered for high priority to ensure the best user experience.  Traffic prioritization is typically based on tags in the frame known as Class of Service (CoS) and or Differentiated Services Code Point (DSCP.)  These tags must be marked consistently for frames entering the network and rules must then be applied consistently for their treatment on the network. This becomes cumbersome in a traditional multi-switch network because the configuration must be duplicated in some fashion on each individual switching device.

An easier example of the current administrative challenges consider each port in the network a management point, meaning each port must be individually configured.  This is both time consuming and cumbersome.

Additional challenges exist in properly classifying data and routing traffic.  A fantastic example of this would be two different traffic types, iSCSI and voice.  iSCSI is storage traffic and typically a full size packet or even jumbo frame while voice data is typically transmitted in a very small packet.  Additionally they have different requirements, voice is very latency sensitive in order to maintain call quality, while iSCSI is less latency sensitive but will benefit from more bandwidth.  Traditional networks have few if any tools to differentiate these traffic types and send them down separate paths which are beneficial to both types.

These types of issues are what SDN looks to solve.

The Three Key Elements of SDN:

Note: In order to qualify as SDN an architecture does not have to be Open, standard, interoperable, etc.  A proprietary architecture can meet the definition and provide the same benefits.  This blog does not argue for or against either open or proprietary architectures.

An SDN architecture must be able to manipulate frame and packet flows through the network at large scale, and do so in a programmable fashion.  The hardware plumbing of an SDN will typically be designed as a converged (capable of carrying all data types including desired forms of storage traffic) mesh of large lower latency pipes commonly called a fabric.  The SDN architecture itself will in turn provide a network wide view and the ability to manage the network and network flows centrally.

This architecture is accomplished by separating the control plane from the data plane devices and providing a programmable interface for that separated control plane.  The data plane devices receive forwarding rules from the separated control plane and apply those rules in hardware ASICs.  These ASICs can be either commodity switching ASICs or customized silicone depending on the functionality and performance aspects required.  The diagram below depicts this relationship:

image

In this model the SDN controller provides the control plane and the data plane is comprised of hardware switching devices.  These devices can either be new hardware devices or existing hardware devices with specialized firmware.  This will depend on vendor, and deployment model.  One major advantage that is clearly shown in this example is the visibility provided to the control plane.  Rather than each individual data plane device relying on advertisements from other devices to build it’s view of the network topology a single control plane device has a view of the entire network.  This provides a platform from which advanced routing, security, and quality decisions can be made, hence the need for programmability.  Another major capability that can be drawn from this centralized control is visibility.  With a centralized controller device it is much easier to gain usable data about real time flows on the network, and make decisions (automated or manual) based on that data.

This diagram only shows a portion of the picture as it is focused on physical infrastructure and serves.  Another major benefit is the integration of virtual server environments into SDN networks.  This allows centralized management of consistent policies for both virtual and physical resources.  Integrating a virtual network is done by having a Virtual Ethernet Bridge (VEB) in the hypervisor that can be controlled by an SDN controller.  The diagram below depicts this:

image

This diagram more clearly depicts the integration between virtual networking systems and physical networking systems in order to have cohesive consistent control of the network.  This plays a more important role as virtual workloads migrate.  Because both the virtual and physical data planes are managed centrally by the control plane when a VM migration happens it’s network configuration can move with it regardless of destination in the fabric.  This is a key benefit for policy enforcement in virtualized environments because more granular controls can be placed on the VM itself as an individual port and those controls stick with the VM throughout the environment.

Note: These diagrams are a generalized depiction of an SDN architecture.  Methods other than a single separated controller could be used, but this is the more common concept.

With the system in place to have centralized command and control of the network through SDN and a programmable interface more intelligent processes can now be added to handle complex systems.  Real time decisions can be made for the purposes of traffic optimization, security, outage, or maintenance.  Separate traffic types can be run side by side while receiving different paths and forwarding that can respond dynamically to network changes.

Summary:

Software Defined Networking has the potential to disrupt the networking market and move us past the days of the switch/router jockey.  This shift will provide extreme benefits in the form of flexibility, scalability and traffic performance for datacenter networks.  While all of the aspects are not yet defined SDN projects such as OpenFlow (www.openflow.org) provide the tools to begin testing and developing SDN architectures on supported hardware.  Expect to see lots of changes in this eco system and many flavors in the vendor offerings.

Why Software-Defined Networking Could Revolutionize Networking

Software-defined networking (SDN) is a hot topic in the network community. Vendors such as Big Switch, Brocade, Cisco and HP are getting into the mix, and just about anyone in networking is making announcements. Additionally, OpenFlow is an open-source option for building an SDN control plane. To read the full article visit: http://www.networkcomputing.com/next-gen-network-tech-center/240001372

Server Networking With gen 2 UCS Hardware

** this post has been slightly edited thanks to feedback from Sean McGee**

In previous posts I’ve outlined:

If you’re not familiar with UCS networking I suggest you start with those for background.  This post is an update to those focused on UCS B-Series server to Fabric Interconnect communication using the new hardware options announced at Cisco Live 2011.  First a recap of the new hardware:

The UCS 6248UP Fabric Interconnect

The 6248 is a 1RU model that provides 48 universal ports (1G/10G Ethernet or 1/2/4/8G FC.)  This provides 20 additional ports over the 6120 in the same 1RU form factor.  Additionally the 6248 is lower latency at 2.0us from 3.2us previously.

The UCS 2208XP I/O Module

The 2208 doubles the total uplink bandwidth per I/O module providing a total of 160Gbps total throughput per 8 blade chassis.  It quadruples the number of internal 10G connections to the blades allowing for 80Gbps per half-width blade.

UCS 1280 VIC

The 1280 VIC provides 8x10GE ports total, 4x to each IOM for a total of 80Gbps per half-width slot (160 Gbs with 2x in a full-width blade.)  It also double the VIF numbers of the previous VIC allowing for 256 (theoretical)  vNICs or vHBAs.  The new VIC also supports port-channeling to the UCS 2208 IOM and iSCSI boot.

The other addition that affects this conversation is the ability to port-channel the uplinks from the 2208 IOM which could not be done before (each link on a 2104 IOM operated independently.)  All of the new hardware is backward compatible with all existing UCS hardware.  For more detailed information on the hardware and software announcements visit Sean McGee’s blog where I stole these graphics: http://www.mseanmcgee.com/2011/07/ucs-2-0-cisco-stacks-the-deck-in-las-vegas/.

Let’s start by discussing the connectivity options from the Fabric Interconnects to the IOMs in the chassis focusing on all gen 2 hardware.

There are two modes of operation for the IOM: Discrete and Port-Channel.  in both modes it is possible to configure 1, 2 , 4, or 8 uplinks from each IOM in either Discrete mode (non-bundled) or port-channel mode (bundled.)

UCS 2208 fabric Interconnect Failover

image

Discrete Mode:

In discrete mode a static pinning mechanism is used mapping each blade to a given port dependent on number of uplinks used.  This means that each blade will have an assigned uplink on each IOM for inbound and outbound traffic.  In this mode if a link failure occurs the blade will not ‘re-pin’ on the side of the failure but instead rely on NIC-Teaming/bonding or Fabric Failover for failover to the redundant IOM/Fabric.  The pinning behavior is as follows with the exception of 1-Uplink (not-shown) in which all blades use the only available Port:

2 Uplinks

Blade

Port 1

Port 2

Port 3

Port 4

Port 5

Port 6

Port 7

Port 8

1

image

2

image

3

image

4

image

5

image

6

image

7

image

8

image

4 Uplinks

Blade

Port 1

Port  2

Port 3

Port 4

Port 5

Port 6

Port 7

Port 8

1

image

2

image

3

image

4

image

5

image

6

image

7

image

8

image

8 Uplinks

Blade

Port 1

Port2

Port 3

Port 4

Port 5

Port 6

Port 7

Port 8

1

image

2

image

3

image

4

image

5

image

6

image

7

image

8

image

The same port-pinning will be used on both IOMs, therefore in a redundant configuration each blade will be uplinked via the same port on separate IOMs to redundant fabrics.  The draw of discrete mode is that bandwidth is predictable in link failure scenarios.  If a link fails on one IOM that server will fail to the other fabric rather than adding additional bandwidth draws on the active links for the failure side.  In summary it forces NIC-teaming/bonding or Fabric Failover to handle failure events rather than network based load-balancing.  The following diagram depicts the failover behavior for server three in an 8 uplink scenario.

Discrete Mode Failover

image

In the previous diagram port 3 on IOM A has failed.  With the system in discrete mode NIC-teaming/bonding or Fabric Failover handles failover to the secondary path on IOM B (which is the same port (3) based on static-pinning.)

Port-Channel Mode:

In Port-Channel mode all available links are bonded and a port-channel hashing algorithm (TCP/UDP + Port VLAN, non-configurable) is used for load-balancing server traffic.  In this mode all server links are still ‘pinned’ but they are pinned to the logical bundle rather than individual IOM uplinks.  The following diagram depicts this mode.

Port-Channel Mode

image

In this scenario when a port fails on an IOM port-channel load-balancing algorithms handle failing the server traffic flow to another available port in the channel.  This failover will typically be faster than NIC-teaming/bonding failover.  This will decrease the potential throughput for all flows on the side with a failure, but will only effect performance if the links are saturated.  The following diagram depicts this behavior.

image

In the diagram above Blade 3 was pinned to Port 1 on the A side.  When port 1 failed port 4 was selected (depicted in green) while fabric B port 6 is still active leaving a potential of 20 Gbps.

Note: Actual used ports will vary dependent on port-channel load-balancing.  These are used for example purposes only.

As you can see the port-channel mode enables additional redundancy and potential per-server bandwidth as it leaves two paths open.  In high utilization situations where the links are fully saturated this will degrade throughput of all blades on the side experiencing the failure.  This is not necessarily a bad thing (happens with all port-channel mechanisms), but it is a design consideration.  Additionally port-channeling in all forms can only provide the bandwidth of a single link per flow (think of a flow as a conversation.)  This means that each flow can only utilize 10Gbps max even though 8x10Gbps links are bundled.  For example a single FTP transfer would max at 10Gbps bandwidth, while 8xFTP transfers could potentially use 80Gbps (10 per link) dependent on load-balancing.

Next lets discuss server to IOM connectivity (yes I use discuss to describe me monologuing in print, get over it, and yes I know monologuing isn’t a word) I’ll focus on the new UCS 1280 VIC because all other current cards maintain the same connectivity.  the following diagram depicts the 1280 VIC connectivity.

image

The 1280 VIC utilizes 4x10Gbps links across the mid-pane per IOM to form two 40Gbps port-channels.  This provides for 80Gbps total potential throughput per card.  This means a half-width blade has a total potential of 80Gbps using this card and a full-width blade can receive 160Gbps (of course this is dependent upon design.)  As with any port-channel, link-bonding, trunking or whatever you may call it, any flow (conversation) can only utilize the max of one physical link (or back plane trace) of bandwidth.  This means every flow from any given UCS server has a max potential bandwidth of 10Gbps, but with 8 total uplinks 8 different flows could potentially utilize 80Gbps.

This becomes very important with things like NFS-based storage within hypervisors.  Typically a virtualization hypervisor will handle storage connectivity for all VMs.  This means that only one flow (conversation) will occur between host and storage.  In these typical configurations only 10Gbps will be available for all VM NFS data traffic even though the host may have a potential 80Gbps bandwidth.  Again this is not necessarily a concern, but a design consideration as most current/near-future hosts will never use more than 10Gbps of storage I/O.

Summary:

The new UCS hardware packs a major punch when it comes to bandwidth, port-density and failover options.  That being said it’s important to understand the frame flow, port-usage and potential bandwidth in order to properly design solutions for maximum efficiency.  As always comments, complaints and corrections are quite welcome!

WWT's Geek Day 2012

A BrightTALK Channel

Why FCoE Standards Matter

Mike Fratto at Network Computing recently wrote an article titled ‘FCoE: Standards Don’t Matter; Vendor Choice Does’ (http://www.networkcomputing.com/storage-networking-management/231002706.)

I definitely differ from Mike’s opinion on the subject.  While I’m no fan of the process of making standards (puts sausage making to shame), or the idea of slowing progress to wait on standards, I do feel they are an absolutely necessary part of FCoE’s future.  It’s all about the timing at which we expect them, the way in which they’re written, and most importantly the way in which they’re adhered to.

Mike bases his opinion on Fibre Channel history and accurately describes the strangle hold the storage vendors have had on the customer.  The vendor’s Hardware Compatibility List (HCL) dictates which vendor you could connect to, and which model and which firmware you can use.  Slip off the list and you lose support.  This means that in the FC world customers typically went with the Storage Area Network (SAN) their VAR or storage vendor recommended, and stuck with it.  While not ideal this worked fine in the small network environment of SAN with the specialized and dedicated purpose of delivering block data from array to server.  These extreme restrictions based on storage vendors and protocol compatibility will not fly as we converge networks.

As worried as storage/SAN admins may be about moving their block data onto Ethernet networks, the traditional network admins may be more worried because of the interoperability concept.  For years network admins have been able to intermix disparate vendors technology to build the networks that they desired, best-of-breed or not.  A load-balancer here, firewall there, data center switch here and presto everything works.  They may have had to sacrifice some features (proprietary value add-that isn’t compatible) but they could safely connect the devices.  More importantly they didn’t have to answer to an HCL dictated by some end-point (storage disk) or another on their network.

For converged networking to work, this freedom must remain.  Adding FCoE to consolidate infrastructure cannot lock network admins into storage HCLs and extreme hardware incompatibility.  This means that the standards must exist, be agreed upon, be specific enough, and be adhered to.  While Mike is correct, you probably won’t want to build multi-vendor networks day one, you will want to have the opportunity to incorporate other services, and products, migrate from one vendor to another, etc.  You’ll want an interoperable standard that allows you to buy 3rd party FCoE appliances for things like de-duplication, compression, encryption or whatever you may need down the road.  We’re not talking about building an Ethernet network dedicated to FCoE, we’re talking about building one network to rule them all (hopefully we never have to take it to Mordor and toss it into molten lava.)  To run one network we need the standards and compatibility that provide us flexibility.

There is no reason for storage vendors to hold the keys to what you can deploy any longer.  Hardware is stable, and if standards are in place the network will properly transport the blocks.  Customers and resellers shouldn’t accept lock in and HCL dictation just because that has been the status quo.  We’re moving the technology forward move your thinking forward.  The issue in the past has been the looseness with which IEEE FCBB-5 is written on some aspects since it’s inception.  This leaves room for interpretation which is where interoperability issues arise between vendors who are both ‘standards based.’  The onus is on us as customers, resellers and an IT community to demand that the standards be well defined, and that the vendors adhere to them in an interoperable fashion. 

Do not accept incompatibility and lack of interoperability in your FCoE switching just because we made the mistake of allowing that to happen with pure FC SANs.  Next time your storage vendor wants a few hundred thousand for your next disk array tell them it isn't happening unless you can plug it into any standards compliant network without fear of their HCL and loss of support.

Additional Private Cloud Blogs

For those that are interested and unaware I’ve been blogging for Network Computing for about a month on their Private Cloud Tech Center.  You can find those blogs here: http://www.networkcomputing.com/private-cloud-tech-center.  You should see a new one there every week or so.  I will continue to publish content here as regularly as possible and I’m always seeking new contributors for guest posts or regular contributions.  Contact me via the About page if you’re interested.

Inter-Fabric Traffic in UCS

It’s been a while since my last post, time sure flies when you’re bouncing all over the place busy as hell.  I’ve been invited to Tech Field Day next week and need to get back in the swing of things so here goes.

In order for Cisco’s Unified Computing System (UCS) to provide the benefits, interoperability and management simplicity it does, the networking infrastructure is handled in a unique fashion.  This post will take a look at that unique setup and point out some considerations to focus on when designing UCS application systems.  Because Fibre Channel traffic is designed to be utilized with separate physical fabrics exactly as UCS does this post will focus on Ethernet traffic only.   This post focuses on End Host mode, for the second art of this post focusing on switch mode use this link: http://www.definethecloud.net/inter-fabric-traffic-in-ucspart-ii.  Let’s start with taking a look at how this is accomplished:

UCS Connectivity

image

In the diagram above we see both UCS rack-mount and blade servers connected to a pair of UCS Fabric Interconnects which handle the switching and management of UCS systems.  The rack-mount servers are shown connected to Nexus 2232s which are nothing more than remote line-cards of the fabric interconnects known as Fabric Extenders.  Fabric Extenders provide a localized connectivity point (10GE/FCoE in this case) without expanding management points by adding a switch.  Not shown in this diagram are the I/O Modules (IOM) in the back of the UCS chassis.  These devices act in the same way as the Nexus 2232 meaning they extend the Fabric Interconnects without adding management or switches.  Next let’s look at a logical diagram of the connectivity within UCS.

UCS Logical Connectivity

imageIn the last diagram we see several important things to note about UCS Ethernet networking:

At first glance handling all switching at the Fabric Interconnect level looks as though it would add latency (inter-blade traffic must be forwarded up to the fabric interconnects then back to the blade chassis.)  While this is true, UCS hardware is designed for low latency environments such as High Performance Computing (HPC.)  Because of this design goal all components operate at very low latency.  The Fabric Interconnects themselves operate at approximately 3.2us (micro seconds), and the Fabric Extenders operate at about 1.5us.  This means total roundtrip time blade to blade is approximately 6.2us right inline or lower than most Access Layer solutions.  Equally as important with this design switching between any two blades/servers in the system will occur at the same speed regardless of location (consistent predictable latency.)

The question then becomes how is traffic between fabrics handled?  The answer is that traffic between fabrics must be handled upstream (next hop device(s) shown in the diagrams as the LAN cloud.)  This is an important consideration when designing UCS implementations and selecting a redundancy/load-balancing behavior for server NICs.

Let’s take a look at two examples, first a bare-metal OS (Windows, Linux, etc.) next a VMware server.

Bare-Metal Operating System

image In the diagram above we see two blades which have been configured in an active/passive NIC teaming configuration using separate fabrics (within UCS this is done within the service profile.)  This means that blade 1 is using Fabric A as a primary path with B available for failover and blade 2 is doing the opposite.  In this scenario any traffic sent from blade 1 to blade 2 would have to be handled by the upstream device depicted by the LAN cloud.  This is not necessarily an issue for the occasional frame but will impact performance for servers that communicate frequently.

Recommendation:

For bare-metal operating systems analyze the blade to blade communication requirements and ensure chatty server to server applications are utilizing the same fabric as a primary:

VMware Servers

image

In the above diagram we see that the connectivity is the same from a physical perspective but in this case we are using VMware as the operating system.  In this case a vSwitch, vDS, or Cisco Nexus 1000v will be used to connect the VMs within the Hypervisor.  Regardless of VMware switching option the case will be the same.  It is necessary to properly design the the virtual switching environment to ensure that server to server communication is handled in the most efficient way possible.

Recommendation:

Summary:

UCS utilizes a unique switching design in order to provide high bandwidth, low-latency switching with a greatly reduced management architecture compared to competing solutions.  The networking requires a  thorough understanding in order to ensure architectural designs provide the greatest available performance.  Ensuring application groups that utilize high levels of server to server traffic are placed on the same path will provide maximum performance and minimal additional overhead on upstream networking equipment.

Data Center 101: Local Area Network Switching

Interestingly enough 2 years ago I couldn’t even begin to post an intelligent blog on Local Area Networking 101, funny how things change.  That being said I make no guarantees that this post will be intelligent in any way.  Without further ado let’s get into the second part of the Data Center 101 series and discuss the LAN.

I find the best way to understand a technology is to have a grasp on its history and the problems it solves, so let’s take a minute to dive into the history of the LAN.  For the sake of simplicity and real world applicability I’m going to stick to Ethernet as it is the predominant LAN technology in today’s data center environments.  Before we even go into the history we’ll define Ethernet and where it fits on the OSI model.

Ethernet:

Ethernet is a frame based networking technology which is comprised as a set of standards for Layer 1 and 2 of the OSI model.  Ethernet devices use a address called a Media-Access Control Address (MAC) for communication.  MAC addresses are a flat address space which is not routable (can only be used on a flat layer 2 network) and is composed of several components most importantly a vendor ID known as an Organizational Unique Identifier (OUI) and a unique address for the individual port.

OSI Model:

The Open-Systems Interconnection (OSI) model is a sub-division of the components of communication that is used as a tool to create interoperable network systems and is a fantastic model for learning networks.  The OSI model breaks into 7-Layers much like my favorite taco dip.

image Understanding the OSI model and where protocols and hardware fit into it will not only help you learn but also help with understanding new technologies and how they fit together.   I often revert back to placing concepts in terms of the OSI model when having highly technical discussions about new concepts and technology.  The beauty of the model is that it allows for easy interoperability and flexibility.  For instance Ethernet is still Ethernet whether you use Fiber cables or copper cables because only Layer 1 is changing.

Ethernet LAN History:

As the LAN networks we use today evolved they typically started with individual groups within an organization.  For instance a particular group would have a requirement for a database server and would purchase a device to connect that group.  Those devices were commonly a hub.

Hub:

A network hub is a device with multiple ports used to connect several devices for the purposes of network communication.  When an Ethernet hub receives a frame it replicates it to all connected ports except the one it received it on in a process called flooding.  All connected devices receive a copy of the frame and will typically only process the frame if the destination MAC address is their own (there are exceptions to this which are beyond the scope of this discussion.)

image

In the diagram above you see a single device sending a frame and that frame being flooded to all other active ports.  This works quite well for small networks consisting of a single hub and low port count, but you can easily see where problems start to arise as the network grows.

image

Once multiple hubs are connected and the network grows each hub will flood every frame, and all devices will receive these frames regardless of whether they are the intended recipient.  This causes major overhead in the network due to the unneeded frames consuming bandwidth.

Bridge:

The next step in the network evolution is called bridging and was designed to alleviate this problem and decrease the overhead of forwarding unneeded frames.  A bridge is a device that makes an intelligent decision on when and where to flood frames based on MAC addresses stored in a table.  These MAC addresses can be static (manually input) or dynamic (learned on the fly.)  Because it is more common we will focus on dynamic.  The original bridges were typically 2 or more ports (low port counts) and could separate MAC addresses using the table for those ports.

image

In the above diagram you see a hub operating normally on the left flooding the frame to all active ports.  When the frame is received by the bridge a MAC address lookup is done on the MAC table and the bridge makes a decision whether or not to flood to the other side of the network.  Because the frame in this example is destined for a MAC address existing on the left side of the network the bridge does not flood the frame.  These addresses will be learned dynamically as devices send frames.  If the destination MAC address had been a device on the right side of the network the bridge would have sent the frame to that side to be flooded by the hub.

Bridges reduced unnecessary network traffic between groups or departments while allowing resource sharing when needed.  The limitation of original bridges came from the low port counts and changing data patterns.  Because the bridges were typically only separating 2-4 networks there was still quite a bit of flooding, especially when more and more resources were shared across groups.

Switches:

Switches are the next evolution of bridges and the operation they perform is still considered bridging.  In very basic terms a switch is a high port-count bridge that is able to make decisions on a port-by-port basis.  A switch maintains a MAC table and only forwards frames to the appropriate port based on the destination MAC.  If the switch has not yet learned the destination MAC it will flood the frame.  Switches and bridges will also flood multi-cast (traffic destined for multiple recipients) and broadcast (traffic destined for all recipients) frames which are beyond the scope of this discussion.

image

In the diagram above I have added several components to clarify switching operations now that we are familiar with basic bridging. Starting in the top left of the diagram you see some of the information that is contained in the header of an Ethernet frame.  In this case it is the source and destination MAC addresses of two of the devices connected to the switch.  Each end-point in the above diagram is labeled with a MAC address starting with AF:AF:AF:AF:AF.  In the top right we see a representation of a MAC table which is stored on the switch and learned dynamically.  The MAC table contains a listing of which MAC addresses are known to be on each port.  Because the MAC table in this example is fully populated we can assume that the switch has previously seen a frame from each device in order to populate the table.  That auto population is the ‘dynamic learning’ and it is done be recording the source MAC address of incoming frames.  Lastly we see that the frame being sent by the device on port 1 is only being forwarded to the device on port 2.  In the event port 2’s MAC address had not yet been learned the switch would be forced to flood the frame to all ports except the one it received it on in order to ensure it was received by the destination device.

So far we’ve learned that bridges improved upon hubs, and switches improved upon basic bridging.  The next kink in the evolution of Ethernet LANs came as our networks grew beyond single switches and we began adding in redundancy.

The three issues that arose can all be grouped as problems with network loops (specifically Layer 2 Ethernet loops.)  These issues are:

Multiple Frame Copies:

When a device receives the same frame more than once due to replication or loop issues it is a multiple frame copy.  This can cause issues for some hardware and software and also consumes additional unnecessary bandwidth.

MAC Address Instability:

When a switch must repeatedly change its MAC table entry for a given device this is considered MAC address instability.

Broadcast Storms:

Broadcast storms are the most serious of the three issues as they can literally bring all traffic to a halt.  If you ask someone who has been doing networking for quite some time how they troubleshoot a broadcast storm you are quite likely to hear ‘Unplug everything and plug things back in one at a time until you find the offending device.’  The reason for this is that in the past the storm itself would soak up all available bandwidth leaving no means to access switching equipment in order to troubleshoot the issue.  Most major vendors now provide protection against this level of problem but storms are still a serious problem that can have a major performance impact on production data.  Broadcast storms are caused when a broadcast, multi-cast or flooded frame is repeatedly forwarded and replicated by one or more switches.

 image In the diagram above we can see a switched loop.  We can also observe several stages of frame forwarding starting with the device 1 in the top left sending a frame to the device 2 in the top right.

  1. Device 1 forwards a frame to device 2.  This one-to-one communication is known as unicast.
  2. The switch on the top left does not yet have device 2 in its MAC table therefore it is forced to flood the frame, meaning replicate the frame to all ports except the one where it was received. 
  3. In stage three we see two separate things occur:
    1. The switch in the top right delivers the frame to the intended device (for simplicities sake we are assuming the switch in the top right already has a MAC table entry for the device.)
    2. The bottom switch having received the frame forwards the frame to the switch in the top right.
  4. The switch in the top right receives the second copy and forwards it based on MAC table delivering the second copy of the same frame to device 2.

 

image

The above example has a little more going on and can become confusing quickly.  For the purposes of this example assume all three switches have blank MAC address tables with no devices known.  Also remember that they are building the MAC table dynamically based on the source MAC address they see in a frame.  To aid in understanding I will fill out the MAC tables at each step.

1. Our first stage is the easy one.  Device 1 forwards a unicast frame to device 2.  Switch A receives this frame on the top port.

image

2. When switch A receives the frame it checks its MAC table for the correct port to forward frames to device 2.  Because its MAC table is currently blank it must flood the frame (replicate it to all ports except the one where it was received.)  As it floods the frame it also records the MAC address and attached port of device 1 because it has seen this MAC as the source in the frame.

image

3. In stage 3 two switches receive the frame and must make decisions. 

  1. Switch C having a blank MAC table must flood the frame.  Because there is only one port other than the one it received it on switch C floods the frame to the only available port, at the same time it records the source MAC address as having been received on its port 1.  
  2. Switch B also receives the frame from switch A, and must make a decision.  Like switch C, switch B has no records in its MAC table and must flood the frame.  It floods the frame down to switch B, and up to device 2.  At the same time switch B records the source MAC in its MAC table.

image

4. In the fourth stage we again have several things happening. 

  1. Switch C has received the same frame for the second time, this time from port 2.  Because it still has not seen the destination device it must flood the frame.  Additionally because this is the exact same frame switch C sees the MAC address of device 1 coming from its right port, port 2, and assumes the device has moved.  This forces switch C to change it's MAC table. 
  2. At the same time Switch B receives another copy of the frame.  Switch B seeing the same source address must change its MAC table and because it still does not have the destination MAC in the table it must flood the frame again.

image

In the above diagram pay close attention to the fact that the MAC tables have been changed for switch B and C.  Because they saw the same frame come from a different port they must assume the device has moved and change the table.  Additionally because the cycle has not been completed the loop will continue and this is one way broadcast storms begin.  More and more of these endless loops hit the network until there is no bandwidth left to serve data frames. 

In this simple example it may seem that the easy solution is to not build loops like the triangle in my diagram.  This is actually the premise of the next Ethernet evolution we’ll discuss, but first let’s look at how easy it is to create loops just by adding redundancy.

image

In the diagram above we start with a non-redundant switch link.  This link is a single point of failure and in the event a component fails devices on separate switches will be unable to communicate.  The simple solution is adding a second port for redundancy, with the assumed added benefit of having more bandwidth.  In reality without another mechanism in place adding the second link turns the physical view on the bottom left into the logical view on the bottom right which is loop.  This is where the next evolution comes into play.

Spanning-Tree Protocol (STP):

STP is defined in IEEE 802.1d and provides an automated method for building loop free topologies based on a very simple algorithm.  The premise is to allow the switches to automatically configure a loop free topology by placing redundant links in a blocked state.  Like a tree this loop free topology is built up from the root (root bridge) and branches out (switches) to to the leaves (end-nodes) with only one path to get to each end-node.

imageThe way Spanning-tree does this is by detecting redundant links and placing them in a ‘blocked’ state.  This means that the ports do not send or receive frames. In the event of a primary link failure (designated port) the blocked port is brought online.  The issue with spanning-tree is two fold:

Multiple versions of STP have been implemented and standardized to improve upon the original 802.1d specification.  These include:

Per-VLAN Spanning-Tree Protocol (PVSTP):

Performs the blocking algorithm independently for each VLAN allowing greater bandwidth utilization.

Rapid Spanning-Tree Protocol (RSTP):

Uses additional port-types not in the original STP specification to allow faster convergence during failure events.

Per-VLAN Rapid Spanning-Tree (PVRSTP):

Provides rapid spanning-tree functionality on a per VLAN basis.

Other STP implementations exist and the details of STP operation in each of its flavors is beyond the scope of what I intend to cover with the 101 series.  If there is a demand these concepts may be covered in a more in-depth 202 series once this series is completed.

Summary:

Ethernet networking has evolved quite a bit over the years and is still a work in progress.  Understanding the how’s and why’s of where we are today will help in understanding the advancements that continue to come.  If you have any comments, questions, or corrections please leave them in the comments or contact me in any of the ways listed on the about page.