FCoE Multi-hop: Do You Care?

There is a lot of discussion in the industry around FCoE’s current capabilities, and specifically around the ability to perform multi-hop transmission of FCoE frames and the standards required to do so.  A recent discussion between Brad Hedlund at Cisco and Ken Henault at HP (http://bit.ly/9Kj7zP) prompted me to write this post.  Ken proposes that FCoE is not quite ready and Brad argues that it is. 

When looking at this discussion, remember that Cisco has had FCoE products shipping for about two years and has a robust product line of devices with FCoE support, including UCS, Nexus 5000, Nexus 4000, and Nexus 2000, with more products on the roadmap for launch this year.  No other switching vendor has this level of current commitment to FCoE.  For a vendor with a less robust FCoE portfolio it makes no sense to drive FCoE sales and marketing at this point, so you will typically find articles and blogs like the one mentioned above.  The one quote from that blog that sticks out in my mind is:

“Solutions like HP’s upcoming FlexFabric can take advantage of FCoE to reduce complexity at the network edge, without requiring a major network upgrades or changes to the LAN and SAN before the standards are finalized.”

If you read between the lines here, it would be easy to take this as ‘FCoE isn’t ready until we are.’  This is not unusual, and if you take a minute to search through articles about FCoE from the last 2-3 years you’ll find that Cisco has been a big endorser of the protocol throughout (because they actually had a product to sell), while other vendors become less and less anti-FCoE as they announce their own FCoE products.

It’s also important to note that Cisco isn’t the only vendor out there embracing FCoE: NetApp has been shipping native FCoE storage controllers for some time, EMC has them on the roadmap for the very near future, QLogic is shipping a second generation of Converged Network Adapter, and Emulex has fully embraced 10 Gigabit Ethernet as the way forward with its OneConnect adapter (10GE, iSCSI, and FCoE all in one card).  Additionally, FCoE switching of native Fibre Channel storage is widely supported by the storage community.

Fibre Channel over Ethernet (FCoE) is defined in the INCITS T11 FC-BB-5 standard and requires the switches it traverses to support the IEEE Data Center Bridging (DCB) standards for proper traffic treatment on the network.  For more information on FCoE or DCB see my previous posts on the subjects (FCoE: http://www.definethecloud.net/?p=80, DCB: http://www.definethecloud.net/?p=31).
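
To make the encapsulation point concrete, here is a rough Python sketch of the framing model: a complete Fibre Channel frame rides unmodified inside an Ethernet frame using the FCoE EtherType (0x8906).  This is a teaching illustration only; the field widths are simplified and the payload is a placeholder, so don’t treat it as a frame builder or parser.

```python
# Rough sketch of the FCoE framing model: a complete Fibre Channel frame is
# carried unmodified inside an Ethernet frame using the FCoE EtherType
# (0x8906).  Field widths are simplified and the payload is a placeholder.

def encapsulate_fcoe(fc_frame: bytes, src_mac: bytes, dst_mac: bytes) -> bytes:
    ethertype = (0x8906).to_bytes(2, "big")   # FCoE EtherType
    fcoe_header = bytes(14)                   # version / reserved / SOF (simplified)
    fcoe_trailer = bytes(4)                   # EOF / reserved (simplified)
    # The FC frame itself is untouched; Ethernet is just the new transport.
    return dst_mac + src_mac + ethertype + fcoe_header + fc_frame + fcoe_trailer

fc_frame = bytes(36)                          # placeholder FC frame
frame = encapsulate_fcoe(fc_frame, src_mac=b"\x00" * 6, dst_mac=b"\xff" * 6)
print(f"FCoE frame length (before FCS): {len(frame)} bytes")
```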

DCB has four major components, and the one in question in the article above is Quantized Congestion Notification (QCN), which the article states is required for multi-hop FCoE.  QCN is basically a regurgitation of FECN and BECN from Frame Relay: it allows a switch to monitor its buffers and push congestion out to the edge rather than clog the core.  In the comments Brad correctly states that QCN is not required for FCoE.  The reason is that Fibre Channel operates today without any native equivalent of QCN, so when placing it on Ethernet you do not need to add functionality that wasn’t there to begin with; Ethernet simply becomes a new layer 1-2 beneath the native FC upper layers, and the FC secret sauce remains unmodified.  Remember also that not every standard defined by a standards body has to be adhered to by every device: some are required, some are optional.  Logical SANs are a great example of an optional standard.
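
For the curious, the QCN concept itself is simple enough to sketch.  The toy Python simulation below is purely conceptual (the thresholds, feedback math, and rates are made up, and no real switch works exactly this way); it just shows the idea of a congested queue pushing rate feedback back toward the traffic source at the edge.

```python
# Toy illustration of the QCN idea (IEEE 802.1Qau): a congestion point (a
# switch queue) samples its depth and, when it is past a target, sends
# feedback toward the traffic source (the reaction point), which slows down.
# All numbers here are made up for illustration.

class ReactionPoint:
    """Traffic source at the network edge that backs off on feedback."""
    def __init__(self, rate_gbps=10.0):
        self.rate_gbps = rate_gbps

    def on_congestion_notification(self, feedback):
        # Reduce the send rate in proportion to the severity of the feedback.
        self.rate_gbps *= max(0.1, 1.0 - feedback)


class CongestionPoint:
    """Switch queue that pushes congestion back to the edge instead of
    letting it build up in the core."""
    def __init__(self, target_occupancy=0.5):
        self.target = target_occupancy

    def sample(self, occupancy, source):
        feedback = occupancy - self.target
        if feedback > 0:
            source.on_congestion_notification(feedback)


source = ReactionPoint()
queue = CongestionPoint()
for occupancy in (0.3, 0.6, 0.8):          # rising queue depth
    queue.sample(occupancy, source)
    print(f"queue at {occupancy:.0%} -> source rate {source.rate_gbps:.1f} Gbps")
```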

Rather than discuss what is or isn’t required for multi-hop FCoE, I’d like to ask a more important question that we as engineers tend to forget: do I care?  This question is key because it keeps us from arguing the technical merits of something we may never actually need, or at least don’t need today.

Do we care?

First let’s look at why we do multi-hop anything: to expand the port count of our network.  Take TCP/IP networks and the Internet, for example: we require the ability to move packets across the globe through multiple routers (hops) in order to attach devices in all corners of the world.

Now let’s look at what we do with FC today: typically one- or two-hop networks (sometimes three) used to connect several hundred devices (occasionally, but rarely, more).  It’s actually quite common to find FC implementations with fewer than 100 attached ports.  This means that if you can hit the right port count without multiple hops, you can remove complexity and decrease latency; in Storage Area Networks (SANs) we call this the collapsed-core design.

The second thing to consider is a hypothetical question: if FCoE were permanently destined for single-hop, access/edge-only deployments (it isn’t), should that actually stop you from using it?  The answer here is an emphatic no; I would still highly recommend FCoE as an access/edge architecture even if it were destined to connect back to an FC SAN and Ethernet LAN for all eternity.  Let’s jump to some diagrams to explain.  In the following diagrams I’m going to focus on Cisco architecture because, as stated above, Cisco is currently the only vendor with a full FCoE product portfolio.

[Diagram: FCoE connectivity options across the Nexus 5000, Nexus 4000, and Nexus 2000]

In the above diagram you can see a fairly flexible set of FCoE connectivity options.  The Nexus 5000 can be connected directly to servers, or to the Nexus 4000 in an IBM BladeCenter to pass FCoE.  It can also be connected to 10GE Nexus 2000s to increase its port density.

To use the Nexus 5000 + 2000 as an example, it’s possible to create a single-hop FCoE architecture of up to 384 ports (the 2000 isn’t an L2 hop; it’s an extension of the 5000) with one point of switching management per fabric.  If you take server virtualization into the picture and assume 384 servers with a very modest virtual-to-physical ratio of 10 virtual machines per physical machine, that brings you to 3,840 servers connected to a single-hop SAN.  That is major scalability with minimal management, all without the need for multi-hop.  The diagram above doesn’t include the Cisco UCS product portfolio, which architecturally supports up to 320 FCoE-connected servers/blades.
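
For anyone who wants to sanity-check that math, here is the back-of-the-envelope version in Python; the 384-port figure and the 10:1 ratio are simply the assumptions stated above, not measured values.

```python
# Back-of-the-envelope math for the scaling claim above.  The 384-port figure
# and the 10:1 virtual-to-physical ratio are the assumptions stated in the text.

fcoe_edge_ports_per_fabric = 384      # Nexus 5000 + 2000 single-hop edge
vms_per_physical_server = 10          # modest consolidation ratio

physical_servers = fcoe_edge_ports_per_fabric
virtual_servers = physical_servers * vms_per_physical_server
print(f"{physical_servers} physical hosts x {vms_per_physical_server} VMs each = "
      f"{virtual_servers} servers on a single-hop FCoE SAN")
```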

The next thing I’ve asked you to think about is whether you should implement FCoE in a hypothetical world where FCoE stays an access/edge architecture forever.  The answer would be yes.  In the following diagrams I outline the benefits of FCoE as an edge-only architecture.

[Diagram: two FC fabrics and an Ethernet LAN consolidated into a single FCoE network at the access layer]

The first benefit is reducing the networks that are purchased, managed, powered, and cooled from three to one (two FC fabrics and one Ethernet network down to one FCoE network).  Even just at the access layer this is a large reduction in overhead, and it reduces the number of refresh points as I/O demands increase.

[Diagram: per-server adapter and cable reduction with FCoE at the access layer]

The second benefit is the overall infrastructure reduction at the access layer.  Taking a typical VMware server as an example, we reduce 6x 1GE ports, 2x 4G FC ports, and the 8 cables required for them down to 2x 10GE ports carrying FCoE.  This increases the total bandwidth available while greatly reducing infrastructure.  Don’t forget the four top-of-rack switches (2x FC, 2x GE) reduced to two FCoE switches.
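
To put numbers on that reduction, here is a quick comparison in Python; the adapter counts are the ones from the example above, and the rates are nominal line rates rather than real-world throughput.

```python
# Per-server cable count and nominal bandwidth, before and after consolidation,
# using the example adapter counts above.  Rates are nominal line rates only.

before = {"1GE NIC": (6, 1.0), "4G FC HBA": (2, 4.0)}   # adapter: (count, Gbps each)
after  = {"10GE CNA": (2, 10.0)}

def totals(adapters):
    cables = sum(count for count, _ in adapters.values())
    bandwidth = sum(count * gbps for count, gbps in adapters.values())
    return cables, bandwidth

for label, adapters in (("Before", before), ("After", after)):
    cables, bw = totals(adapters)
    print(f"{label:<7}{cables} cables, {bw:.0f} Gbps nominal")
```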

Since FCoE interoperates with both existing FC and pre-DCB Ethernet, this requires no rip-and-replace of current infrastructure.  FCoE is instead used to build out new application environments or expand existing ones while minimizing infrastructure and complexity.

What if I need a larger FCoE environment?

If you require a larger environment than is currently supported, extending your SAN is quite possible without multi-hop FCoE: the FCoE edge can be extended using existing FC infrastructure.  Remember, customers that require an FCoE environment this large already have an FC infrastructure to work with.

[Diagram: extending an FCoE edge using existing FC infrastructure]

What if I need to extend my SAN between data centers?

FCoE SAN extension is handled in exactly the same way as FC SAN extension: CWDM, DWDM, dark fiber, or FCIP.  Remember, we’re still moving Fibre Channel frames.

[Diagram: SAN extension between data centers]

Summary:

FCoE multi-hop is not an argument that needs to be had for most current environments.  FCoE is a supplemental technology to current Fibre Channel implementations.  Multi-hop FCoE products will be available by the end of CY2010, allowing FCoE networks of two or more tiers with multiple switches in the path, but there is no need to wait for them to begin deploying FCoE.  The benefits of an FCoE deployment at the access layer alone are significant, and many environments will be able to scale to full FCoE roll-outs without ever going multi-hop.


Consolidated I/O

Consolidated I/O (input/output) is a hot topic and has been for the last two years, but it’s not a new concept.  We’ve already consolidated I/O once in the data center and forgotten about it; remember those phone PBXs before we replaced them with IP telephony?  The next step in consolidating I/O is carrying management traffic, backup traffic, and storage traffic between servers and centralized storage arrays on the same network that carries our IP data.  In the most general terms the concept is ‘one wire.’  A ‘cable once’ or ‘one wire’ approach allows a flexible I/O infrastructure with a greatly reduced cable count and a single network to power, cool, and administer.

Solutions for this have existed and been used for years; iSCSI (SCSI storage data over IP networks) is one tool that has been commonly used to do it.  The reason the topic has hit the mainstream over the last two years is that 10 Gigabit Ethernet was ratified, and we now have a common protocol with the bandwidth to support this type of consolidation.  Prior to 10GE we simply didn’t have enough bandwidth to effectively put everything down the same pipe.

The first thing to remember when discussing I/O consolidation is that, contrary to popular belief, I/O consolidation does not mean Fibre Channel over Ethernet (FCoE).  I/O consolidation is all about using a single infrastructure and underlying protocol to carry any and all traffic types required in the data center.  The underlying protocol of choice is 10G Ethernet because it’s lightweight and high bandwidth, and Ethernet itself is the most widely used data center protocol today.  Using 10GE and the IEEE Data Center Bridging (DCB) standards as the underlying data center network, any and all protocols can be layered on top as needed, on a per-application basis; see my post on DCB for more information (http://www.definethecloud.net/?p=31).  These protocols can be FCoE, iSCSI, UDP, TCP, NFS, CIFS, etc., or any combination of them all.
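
As a concrete illustration of what ‘layering protocols on one wire’ looks like, here is a toy Python sketch in the spirit of DCB’s Enhanced Transmission Selection (ETS), which lets traffic classes share a 10GE link with guaranteed minimum bandwidth.  The class names and percentage splits are hypothetical examples of my own, not a recommended configuration.

```python
# Toy sketch in the spirit of DCB's Enhanced Transmission Selection (ETS,
# IEEE 802.1Qaz): several traffic types share one 10GE link, each with a
# guaranteed minimum share that it can exceed when the link is otherwise idle.
# The class names and percentages below are hypothetical, not a recommendation.

link_gbps = 10.0
traffic_classes = {                 # traffic class -> guaranteed bandwidth share
    "FCoE storage":       0.40,
    "iSCSI / NFS / CIFS": 0.20,
    "LAN (TCP/UDP)":      0.30,
    "Management/backup":  0.10,
}

assert abs(sum(traffic_classes.values()) - 1.0) < 1e-9, "shares must total 100%"

for name, share in traffic_classes.items():
    print(f"{name:<20} {share:>4.0%} guaranteed (~{link_gbps * share:.0f} Gbps, "
          "can burst higher when other classes are idle)")
```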

If you look at data centers today, most are already using a combination of these protocols but typically have two or more separate infrastructures to support them.  A data center that uses Fibre Channel heavily has two Fibre Channel networks (for redundancy) and one or more LAN networks.  These ‘Fibre Channel shops’ are typically still using additional storage protocols such as NFS/CIFS for file-based storage.  The cost of administering, powering, cooling, and eventually upgrading/refreshing these separate networks continues to grow.

Consolidating onto a single infrastructure not only provides obvious cost benefits but also provides the flexibility required for a cloud infrastructure.  Having a ‘cable once’ infrastructure allows you to provide the right protocol at the right time on a per-application basis, without the need for hardware changes.

Call it what you will (I/O consolidation, network convergence, or network virtualization): a cable-once topology that can support the right protocol at the right time is one of the pillars of cloud architectures in the data center.


What’s a cloud?

So to start things off I thought I’d take a stab at defining the cloud.  This is always an interesting subject because so many people have placed very different labels and definitions on the cloud.  YouTube is filled with videos of high-dollar IT talking heads spitting out nonsensical answers as to what cloud is, or in many cases diverting the question and discussing something they understand.  So before we get into what it is, let’s talk about why it’s so hard to define.

Part of the difficulty in defining cloud comes from the fact that the term gets its power from being loosely defined.  If you put a strict definition on it, the term becomes useless.  For example, put yourself in the shoes of an IT vendor account manager (sales rep); if you are an account manager, stay in your own shoes for the next exercise and forget I said sales rep.

Now, from those shoes, imagine yourself in a meeting with a CxO discussing the data center.  A question such as ‘Have you looked into implementing a cloud strategy, and if so, what are your goals?’ can be quite powerful.  It’s an open-ended question that leaves plenty of room for discussion.  Within that discussion there is a large opportunity to identify business challenges and begin to narrow down solutions, which equates to potential product sales that meet the customer’s requirements.

If cloud had a strict definition such as ‘Providing an off-site hosted service at a subscription rate’ it would only be applicable to a handful of customers.  Any strict definition of cloud based on size, location, infrastructure requirements, etc. reduces the overall usability of the term.  This doesn’t imply that cloud is just a sales term and should be ignored, but it does make the definition more complicated.

From a sales perspective the value of cloud is the flexibility of the term, which, interestingly, is also one of its technical values to the customer (we’ll discuss that later).  From a customer perspective the real challenge is defining what it means to you.  Service providers such as BT and AT&T will have very different definitions of cloud from Amazon and Google.  Amazon and Google will define cloud differently than SalesForce.com, and they’ll all have totally different definitions than the average enterprise customer.  This is because the definition of cloud is in the eye of the beholder; it’s all about how the concept can be effectively applied to the individual business’s demands.

Part of the reason engineers tend to cringe when they hear the term cloud is that it has no real meaning to an engineer.  Engineers work with defined terms that are quantifiable: bandwidth, bus speed, port count, disk space.  If you use any of those terms in your definition, you’ve already missed the point.  To make matters worse, the more cloud gets discussed the more confusing it becomes from an engineering standpoint; we started with just cloud and now have private cloud, internal cloud, public cloud, secure cloud, hybrid cloud, and semi-private cloud.  It’s akin to LAN, MAN, WAN, CAN, PAN, and GAN describing the various ‘area networks,’ except that at least those can move data, while cloud is just a term.

Cloud is not an engineering term, cloud is a business term.

Cloud does not solve IT problems, cloud solves business problems.

So to bring a definition to the term, let’s stay at 10,000 feet and think conceptually, because that’s what it’s really all about.  Think about the last time you saw a PDF, or drew a whiteboard diagram, that discussed moving data across the Internet.  Did you draw the complex series of routers and switches, optical muxes and de-muxes, that moved your packet from London to Beijing?  Of course not.  So what did you draw instead, and why?  If you’re like most of the world you drew a cloud, and you drew that cloud because you don’t care about the underlying infrastructure or complexity.  You know that the web has already been built and it works.  You know that if you put a packet in one end, it will eventually come out the other.  You don’t care how it gets there, only that it does.  The term cloud is no more complex than that: it’s all about putting together an infrastructure that gets the job done without having to dwell on the underlying components.

The point of moving to a cloud architecture is to rethink what the data center really does and how it does it, in order to alleviate current data center issues without causing new ones.  Start by asking what the purpose of the data center is, and really take some time to think about it.  The entire data center, from the input power through the distribution system and UPS, the cooling, the storage, the network, and the servers, is there to do one thing: run applications.  The application truly is the center of the data center universe.  If you’ve ever been on a help desk team and taken a call from a user saying ‘the network is slow,’ it wasn’t because they ran a throughput test and found a bottleneck; it was because their email wouldn’t load or it took an extra 15 seconds to access a database.  The data center itself is there to support the applications that run the business.

That tends to be a hard pill to swallow for people who have spent their lives in networks, or storage, or even server hardware, because in reality the only Oscar they could ever receive is Best Supporting Actor; applications are the star of the show.  No company built a network and then decided it should find an application to use up some of that bandwidth.  We’ve built our infrastructures to support the applications we chose to run our businesses, and that’s where the problems came from.

We’ve built data center infrastructures one app at a time as our businesses grew, adding on where necessary and removing when and if possible.  What we’ve ended up with is a Frankenstein-style mess of siloed architecture and one-trick-pony infrastructure.  We’ve consolidated, scaled in and out, and virtualized, all to try to fix this, and we’ve failed.  We’ve failed because we’ve taken a view of the current problems while wearing blinders that prevented us from seeing the big picture.  The big picture is that businesses require apps, apps come and go, and apps require infrastructure.  The solution is building a flexible, available, high-performance platform that adapts to our immediate business needs: a dynamic infrastructure that ties directly to the business without the need for procurement or decommissioning when an app comes up or hits its end of life.  That applies to any type of cloud you care to talk about.

In future blogs I’ll describe the technical and business drivers behind cloud solutions, the types of clouds, the risks of cloud, and the technology that enables a move to cloud.  I’ll do my best to remain vendor neutral and keep my opinions out of it as much as possible.  I welcome any and all feedback, comments and corrections.

The cloud doesn’t have to be a pig
