What’s the deal with Quantized Congestion Notification (QCN)

For the last several months there has been a lot of chatter in the blogosphere and Twitter about FCoE and whether full scale deployment requires QCN.  There are two camps on this:

  1. FCoE does not require QCN for proper operation with scale.
  2. FCoE does require QCN for proper operation and scale.

Typically the camps break down as follows (there are exceptions):

  1. HP camp stating they’ve not yet released a suite of FCoE products because QCN is not fully ratified and they would be jumping the gun.  The flip side of this is stating that Cisco jumped the gun with their suite of products and will have issues with full-scale FCoE.
  2. Cisco camp stating that QCN is not required for proper FCoE frame flow and HP is using the QCN standard as an excuse for not having a shipping product.

For the purpose of this post I’m not camping with either side, I’m not even breaking out my tent.  What I’d like to do is discuss when and where QCN matters, what it provides and why.  The intent being that customers, architects, engineers etc. can decide for themselves when and where they may need QCN.

QCN: QCN is a form of end-to-end congestion management defined in IEEE 802.1Qau.  The purpose of end-to-end congestion management is to ensure that congestion is controlled from the sending device to the receiving device in a dynamic fashion that can deal with changing bottlenecks.  The most common end-to-end congestion management tool is TCP window sizing.

TCP Window Sizing:

With window sizing, TCP dynamically determines the number of segments to send at once without an acknowledgement.  It continuously ramps this number up as long as the pipe has capacity and acknowledgements are being received.  If a packet is dropped due to congestion and an acknowledgement is not received, TCP halves the window size and begins the ramp again.  This provides a mechanism by which the maximum available throughput can be achieved dynamically.

Below is a diagram showing the dynamic window size (total packets sent prior to acknowledgement) over the course of several round trips.  You can see the initial fast ramp-up followed by a gradual increase until a packet is lost; from there the window is reduced and the slow ramp begins again.
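That sawtooth can be modeled in a few lines.  Below is a toy sketch of the ramp-and-halve behavior described above, not a real TCP implementation (the threshold and loss points are made up for illustration):

```python
def simulate_window(round_trips, loss_at, ssthresh=32):
    """Toy model of TCP congestion window growth.

    loss_at: set of round-trip indices where a packet is dropped.
    Returns the window size after each round trip."""
    window = 1
    history = []
    for rtt in range(round_trips):
        if rtt in loss_at:
            window = max(window // 2, 1)  # loss: halve the window...
            ssthresh = window             # ...and grow slowly from here on
        elif window < ssthresh:
            window *= 2                   # fast initial ramp ("slow start")
        else:
            window += 1                   # gradual increase once near capacity
        history.append(window)
    return history

# Fast ramp, gradual climb, halving at the loss, then a slow re-climb:
history = simulate_window(20, loss_at={10})
```

Plotting `history` reproduces the sawtooth shape in the diagram: exponential growth at first, linear growth near capacity, and a drop at each loss.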

If you prefer analogies, I always refer to TCP sliding windows as a keg stand (http://en.wikipedia.org/wiki/Keg_stand).


In the photo we see several gentlemen surrounding a keg, with one upside down performing a keg stand.

To perform a keg stand:

  • Place both hands on top of the keg
  • 1-2 Friend(s) lift your feet over your head while you support your body weight on locked-out arms
  • Another friend places the keg’s nozzle in your mouth and turns it on
  • You swallow beer full speed for as long as you can

What the hell does this have to do with TCP Flow Control? I’m so glad you asked.

During a keg stand your friend is trying to push as much beer down your throat as it can handle, much like TCP increasing the window size to fill the end-to-end pipe.  Both of your hands are occupied holding your own weight and your mouth has a beer hose in it, so like TCP you have no native congestion signaling mechanism.  Just like TCP, the flow doesn’t slow until packets/beer drop: when you start to spill, they stop the flow.

So that’s an example of end-to-end congestion management.  Within Ethernet, and FCoE specifically, we don’t have any native end-to-end congestion tools (remember TCP is up on L4 and we’re hanging out with the cool kids at L2).  No problem though, because we’re talking FCoE, right?  FCoE is just an L1-L2 replacement for Fibre Channel (FC) L0-L1, so we’ll just use FC end-to-end congestion management… Not so fast, boys and girls: FC does not have a standard for end-to-end congestion management.  That’s right, our beautiful, over-engineered, lossless FC has no mechanism for handling network-wide, end-to-end congestion.  That’s because it doesn’t need it.

FC is moving SCSI data, and SCSI is sensitive to dropped frames; latency is important, but lossless delivery is more important.  To ensure a frame is never dropped, FC uses a hop-by-hop flow control known as buffer-to-buffer (B2B) credits. At a high level, each FC device knows the number of buffer spaces available on the next-hop device based on the agreed-upon frame size (typically 2148 bytes).  This means that a device will never send a frame to a next-hop device that cannot handle it.  Let’s go back to the world of analogy.

Buffer-to-buffer credits:

The B2B credit system works the same way you’d have 10 Marines offload and stack a truckload of boxes (‘fork-lift, we don’t need no stinking forklift’).  The best way to utilize 10 Marines to offload boxes is to line them up end-to-end, one in the truck and one on the other end to stack.  Marine 1 in the truck initiates the send by grabbing a box and passing it to Marine 2; the box moves down the line until it gets to the target, Marine 10, who stacks it.  Before any Marine hands another Marine a box, they look to ensure that Marine’s hands are empty, verifying they can handle the box and it won’t be dropped.  Boxes move down the line until they are all offloaded and stacked.  If anyone slows down or gets backed up, each Marine will hold their box until the congestion is relieved.

In this analogy the Marine in the truck is the initiator/server and the Marine doing the stacking is the target/storage with each Marine in between being a switch.

When two FC devices initiate a link they follow the Link Initialization Protocols (LIP).  During this process they agree on an FC frame size and exchange the available dedicated frame buffer spaces for the link.  A sender always keeps track of available buffers on the receiving side of the link.  The only real difference between this and my analogy is that each device (Marine) is typically able to handle more than one frame (box) at once.
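The credit accounting itself is simple enough to sketch.  This is an illustration of the mechanism described above; the class and method names are my own invention, not any real FC stack API:

```python
class FcLink:
    """Toy model of one side of an FC link using B2B credits."""

    def __init__(self, credits):
        self.credits = credits    # buffers advertised by the peer at login
        self.in_flight = 0        # frames sent but not yet acknowledged

    def can_send(self):
        # Never send a frame the next-hop device has no buffer for.
        return self.credits > 0

    def send_frame(self):
        if not self.can_send():
            raise RuntimeError("would overrun receiver buffer")
        self.credits -= 1         # each frame consumes one credit
        self.in_flight += 1

    def receive_r_rdy(self):
        # The receiver signals a freed buffer, restoring one credit.
        self.credits += 1
        self.in_flight -= 1
```

With two advertised credits, a sender can transmit two frames and then must wait, exactly like a Marine holding his box until the next set of hands is empty.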

So if FC networks operate just fine without end-to-end congestion management, why do we need to implement a new mechanism in FCoE?  Well, therein lies the rub.  Do we need QCN?  The answer is really yes and no, and it will depend on design.  FCoE today provides the exact same flow control as FC using two standards defined within Data Center Bridging (DCB): Enhanced Transmission Selection (ETS) and Priority Flow Control (PFC).  For more info on these see my DCB blog: http://www.definethecloud.net/?p=31.  Basically ETS provides a bandwidth guarantee without limiting, and PFC provides lossless delivery on an Ethernet network.
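If you like to see the behavior in code, here’s a toy sketch of the ETS idea of a ‘bandwidth guarantee without limiting’: each traffic class is guaranteed its share, but spare capacity is redistributed rather than wasted.  The function, class names, and percentages are my own illustration, not anything from the standard:

```python
def ets_allocate(link_gbps, guarantees, demand):
    """Toy ETS-style allocator.

    guarantees: dict of class -> guaranteed % of the link
    demand:     dict of class -> Gbps the class actually wants
    Returns dict of class -> Gbps allocated."""
    alloc = {}
    # Pass 1: everyone gets up to their guaranteed share.
    for cls, pct in guarantees.items():
        alloc[cls] = min(demand[cls], link_gbps * pct / 100)
    # Pass 2: unused bandwidth goes to classes that still want more,
    # so a guarantee is a floor, not a ceiling.
    spare = link_gbps - sum(alloc.values())
    for cls in alloc:
        extra = min(spare, demand[cls] - alloc[cls])
        alloc[cls] += extra
        spare -= extra
    return alloc

# On a 10Gbps link split 50/50, a quiet FCoE class lets LAN traffic
# grow past its guarantee without ever shrinking the FCoE floor.
alloc = ets_allocate(10, {"fcoe": 50, "lan": 50}, {"fcoe": 3, "lan": 9})
```

The point is the second pass: ETS guarantees minimums per class while letting idle bandwidth flow to whoever needs it.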

Why QCN:

The reason QCN was developed is the difference in size, scale, and design between FC and Ethernet networks.  Ethernet networks are usually large mesh or partial-mesh designs with multiple switches.  FC designs fall into one of three major categories: collapsed core (single layer), core-edge (two layer), or, in rare cases for very large networks, edge-core-edge (three layer).  This is because we typically have far fewer FC-connected devices than we do Ethernet (not every device needs consolidated storage/backup access).

If we were to design our FCoE networks where every current Ethernet device supported FCoE and FCoE frames flowed end-to-end, QCN would be a benefit to ensure point congestion didn’t clog the entire network.  On the other hand, if we maintain a similar size and design for FCoE networks as we do for FC networks, there is no need for QCN.

Let’s look at some diagrams to better explain this:


In the diagrams above we see a couple of typical network designs.  The Ethernet diagram shows the core at the top, aggregation in the middle, and the edge on the bottom where servers would connect.  The Fibre Channel design shows a core at the top with an edge at the bottom; storage would attach at the core and servers at the edge.  In both diagrams I’ve also shown typical frame flow for each traffic type.  Within Ethernet, servers commonly communicate with one another as well as with network file systems, the WAN, etc.  In an FC network the frame flow is much simpler: typically only initiator-to-target (server-to-storage) communication occurs.  In this particular FC example there is little to no chance of a single frame flow causing a central network congestion point that could affect other flows, which is where end-to-end congestion management comes into play.

What does QCN do:

QCN moves congestion from the network center to the edge to avoid centralized congestion on DCB networks.  Let’s take a look at a centralized congestion example (FC only for simplicity):

In the above example two 2Gbps hosts are sending full-rate frame flows to two storage devices.  One of the storage devices is a 2Gbps device and can handle the full speed; the other is a 1Gbps device and cannot.  If these rates are sustained, switch 3’s buffers will eventually fill and cause centralized congestion affecting frame flows to both switch 4 and switch 5.  This means that the full-rate capable devices would be affected by the single slower device.  QCN is designed to detect this type of congestion and push it to the edge, slowing the initiator on the bottom right and avoiding overall network congestion.
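The congestion-point side of QCN can be sketched roughly like this.  802.1Qau has a congestion point compute a feedback value from the queue’s offset from a target depth plus its rate of growth, and send it back to the source so the edge rate-limits.  The sketch below follows that shape, but the weight and equilibrium values are illustrative, not taken from the standard:

```python
def qcn_feedback(queue_len, prev_queue_len, q_eq=26, w=2):
    """Simplified 802.1Qau-style congestion-point feedback.

    q_eq: desired equilibrium queue depth (illustrative value)
    w:    weight on queue growth rate (illustrative value)
    Returns Fb; a negative value means 'slow down'."""
    q_off = queue_len - prev_queue_len * 0 + queue_len - queue_len  # placeholder removed below
    q_off = queue_len - q_eq                # how far above the target depth
    q_delta = queue_len - prev_queue_len    # how fast the queue is growing
    return -(q_off + w * q_delta)

# A deep, growing queue at the congested 1Gbps storage port produces
# strongly negative feedback, which throttles the offending sender at
# the network edge instead of letting switch 3's buffers fill:
fb = qcn_feedback(queue_len=50, prev_queue_len=40)
```

A shallow or draining queue produces zero or positive feedback, so unaffected flows (the 2Gbps pair in the example) are never slowed.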

This example is obviously not a good design and is only used to illustrate the concept.  In fact, in a properly designed FC network with multiple paths between end-points, central congestion is easily avoidable.

When moving to FCoE, if the network is designed such that FCoE frames pass through the entire full-mesh network shown in the common Ethernet design above, there would be a greater chance of central congestion.  If the central switches were DCB capable but not FCoE Forwarders (FCFs), QCN could play a part in pushing that congestion to the edge.

If on the other hand you design FCoE in a similar fashion to current FC networks QCN will not be necessary.  An example of this would be:

The above design incorporates FCoE into the existing LAN core, aggregation, edge design without clogging the LAN core with unneeded FCoE traffic.  Each server is dual-connected to the common Ethernet mesh and redundantly connected to FCoE SAN A and B.  This design is extremely scalable and will provide more than enough ports for most FCoE implementations.


QCN, like other congestion management tools before it such as FECN and BECN, has significant use cases.  As far as FCoE deployments go, QCN is definitely not a requirement and, depending on design, may provide no benefit for FCoE.  It’s important to remember that the DCB standards are there to enhance Ethernet as a whole, not just for FCoE.  FCoE utilizes ETS and PFC for lossless frame delivery and bandwidth control, but the FCoE standard is a separate entity from DCB.

Also remember that FCoE is an excellent tool for virtualization, which reduces physical server count.  This means that we will continue to require fewer and fewer FCoE ports overall, especially as 40Gbps and 100Gbps are adopted.  Scaling FCoE networks further than today’s FC networks will most likely not be a requirement.

GD Star Rating

The Difference Between ‘Foothold’ and ‘Lock-In’

There is always a lot of talk within IT marketing around vendor ‘lock-in’.  This is most commonly found within competitive marketing, i.e. ‘Product X from Company Y creates lock-in, causing you to purchase all future equipment from them.’  In some cases lock-in definitely exists; in other cases what you really have is better defined as ‘foothold.’  Foothold is an entirely different thing.

Any given IT vendor wants to sell as much product as possible to their customers and gain new customers as quickly as possible; that’s business.  One way to do this is to use one product offering as a way in the door (foothold) and sell additional products later on.  Another way is to sell a product that forces the sale of additional products.  There are other methods, including the ‘build a better mousetrap’ method, but these are the two I’ll discuss.


Foothold is like the beachhead at Normandy during WWII, it’s not necessarily easy to get but once held it gives a strategic position from which to gain more territory.

Great examples of foothold products exist throughout IT.  My favorite example is NetApp’s NFS/CIFS storage, which did the file based storage job so well they were able to convert their customer’s block storage to NetApp in many cases.  There are currently two major examples of the use of foothold in IT, HP and Cisco.

HP is using its leader position in servers to begin seriously pursuing the network equipment market.  They’ve had ProCurve for some time but recently started pushing it hard, and acquired 3Com to significantly boost their networking capabilities (among other advantages).  This is proper use of foothold and makes strategic sense; we’ll see how it pans out.

Cisco is using its dominant position in networking to attack the market transition to denser virtualization and cloud computing with its own server line.  From a strategic perspective this could be looked at either offensively or defensively: either Cisco is on the offense, attacking former strong vendor-partner territory to grow revenue, or Cisco is on the defense, having realized HP was leveraging its foothold in servers to take network market share.  In either event it makes a lot of strategic sense.  By placing servers in the data center they have foothold to sell more networking gear, and they also block HP’s traditional foothold.

From my perspective both are strong moves; to continue to grow revenue you eventually need to branch into adjacent markets.  You’ll hear people cry and whine about stretching too thin, trying to do too much, etc., but it’s a reality.  As a publicly traded company, a stagnant revenue stream is nearly as bad as a negative revenue stream.

If you look closely at it both companies are executing in very complementary adjacent markets.  Networks move the data in and out of HP’s core server business, so why not own them?  Servers (and Flip cameras for that matter) create the data Cisco networks move, so why not own them?


You’ll typically hear more about vendor lock-in than you will actually experience.  That’s not to say there isn’t plenty of it out there, but it usually gets more publicity than is warranted.

Lock-in is when a product you purchase and use forces you to buy another product/service from the same vendor, or replace the first.  To use my previous Cisco and HP example, both companies are using adjacent markets as foothold but neither locks you in.  For example, both HP and Cisco servers can be connected to any vendor’s switching, and their network systems interoperate as well.  Of course you may not get every feature when connecting to a 3rd party device, but that’s part of foothold and the fact that they add proprietary value.

The best real example of lock-in is blades.  Don’t be fooled: every blade system on the market has inherent vendor lock-in.  Current blade architecture couldn’t provide the advantages it does without lock-in.  To give you an example, let’s say you decide to migrate to blades and you purchase 7 IBM blades and a chassis, 4 Cisco blades and a chassis, or 8 HP blades and a chassis.  You now have a chassis half full of blades.  When you need to expand by one server, who you gonna call?  (Ghostbusters can’t help.)  You’re obviously going to buy from the chassis vendor, because blades themselves don’t interoperate and you’ve got empty chassis slots.  That is definite lock-in, up to the maximum capacity of that chassis.

When you scale past the first blade system you’ll probably purchase another from the same vendor, because you know and understand its unique architecture, that’s not lock-in, that’s foothold.


Lock-in happens, but foothold is more common.  When you hear a vendor, partner, etc. say product X will lock you in to vendor Y, make that person explain in detail what they mean.  Chances are you’re not getting locked in to anything.  If you are getting locked in, know the limits of that lock-in and make an intelligent decision on whether it is worth the advantages that made you consider the product in the first place; it very well might be.


Bacon And Eggs as a Service (BAEaaS) at VMworld

Yeah I know the ‘and’ between bacon and eggs should be lower case but that just looks silly, let’s move on 😉


BAEaaS is a recovery tweetup following the previous night’s festivities.  It was originally scheduled for Tuesday but due to popular demand has been moved to Wednesday (mainly because the number of large parties going on Monday night may negate people’s desire to get out of bed any earlier than sessions require).

Breakfast will go from 7:00am – 9:00am with the intent that people will filter in and out throughout the two-hour period.  There are no reservations and Mel’s is a diner, so it’s first come, first served; best bet is to come by 7 or at 8 (breakfast shift work).

Come eat bacon #vmworld3word



Note: BAEaaS is a BYOB event (Buy Your Own Breakfast).  That being said, with plenty of partners and vendors around, mention an interest and a budget and you may make it onto somebody’s expenses.


twtvite: http://twtvite.com/BAEaaS

Wednesday 9/1 – 7:00am –9:00am Pacific


801 Mission St San Francisco CA 94103


For any overflow there are two close by options where groups can go: Denny’s and Starbucks depending on your preferences.  See all locations on the map below:

Green Pin: Moscone Center

Red Pin: Mel’s Drive-In

Blue Pin: Denny’s

Orange Pin: Starbucks

Map: http://bit.ly/8YWTvD

Special Thanks:

@juiceLSU009 the brains behind the idea

@tscalzott for naming the location, which looks to have fantastic food

Extra Special Thanks:

@crystal_lowe Crystal provided me with wonderful suggestions and contacts for properly planning an event such as this.  Due to time constraints, budget, personal laziness, etc. I ignored all of her suggestions.  If any part of the BAEaaS is not up to snuff please ensure you retweet Crystal’s well deserved ‘I told you so.’

If you’re not familiar with Crystal and have a spouse that attends industry events with you, it’s time to familiarize yourself.  Here are links to what she’s got planned this year for the spouses.

Special note: you cannot exchange your actual VMworld pass for a Spousetivities pass, although you’ll want to.





Dell, Backing the Right Horse in the Wrong Race

With Dell’s announced acquisition of 3PAR I’ve been pondering the question of what it is they’re thinking.  I’ve been scouring the blogs looking for an answer, and there is none that resonates well with me.  Most of what I find states they picked a good horse and that the business behind buying a horse to race makes sense, but nobody asks whether they’re in the right race.  The separate races I’m talking about are private and public clouds.

Dell bid on 3PAR, a small high-end storage company with a product line positioned to compete with EMC and Hitachi for some use cases.  This complements Dell’s own storage offering, which was built upon the EqualLogic iSCSI storage acquisition and geared toward the SMB space.  Dell also has had a traditionally strong partnership with EMC and resold a great deal of EMC storage where EqualLogic was not a good fit.  The EqualLogic acquisition did not appear to damage the Dell EMC partnership significantly, but adding 3PAR to the mix may change that.  On the other hand EMC is heavily backing Cisco UCS, so this may very well be a defensive play.

So what is Dell’s play, expanding their internal storage capabilities and risking damage to a profitable partnership with EMC?  Most of the analysis I find states that Dell is looking to grow data center revenue to regain profit they are losing to HP in the desktop/laptop space.  In order to do this they are putting together more of the key hardware components of private cloud architectures, the thinking being that they will try to put together an offering to compete with vBlock, Matrix, SMT, CloudBurst, etc.

At first glance this all makes sense: Dell doesn’t want to be left without a horse in the private cloud race, so they make some moves and acquisitions and get their offering in place, late, but maybe not too late.  On the flip side they can utilize the small market share 3PAR has as an avenue for Dell server sales, and conversely use Dell server sales to boost 3PAR’s struggling sales.  With any luck Dell will have the same success with 3PAR that they did with EqualLogic.  That’s what I see at first glance; upon further thought there are more concerns:

  • Can Dell’s sales force handle the addition, especially while they’re battling a new server vendor (Cisco) with a lot of marketing dollars to spend and a strong partner/manufacturer ecosystem?  Will the Dell account reps and engineers be able to incorporate 3PAR into their offering without cannibalizing EqualLogic business or muddying the waters enough for a competitor to move in?
  • Will a customer requiring top-tier storage that they traditionally turned to Hitachi or EMC for be willing to accept a Dell solution?  If they do, will they be willing to put those top-tier applications on Dell servers?  Dell is not traditionally looked at as a high-end server vendor, and they only recently (in the last six months) started adding real innovation to their server product line.
  • Will the customers buying Dell servers now have any interest in an upper tier storage array?

The most important question in my mind: Is Dell putting their horse in the right race?

Dell is looking to attack the enterprise and federal data center, where private cloud will be a big play.  This is the home of solid, high-performance, feature-rich, innovative platforms.  It’s also a place where trust means everything, i.e. ‘Nobody gets fired for buying vendor X.’  Dell is not vendor X; they’ve typically competed solely on price.  Moving heavily into this market they will be in constant battle with HP, IBM, EMC, NetApp, Cisco and others.

I think Dell is missing an opportunity to execute on their traditional strengths and attack public cloud markets with a unique offering.  Public cloud is all about massive scale, and the intelligence, redundancy, etc. is built into the software layers.  This means that a company who can effectively deliver bulk, reliable, low-cost servers, storage and networking will have a very strong offering.  The HPs, Ciscos, IBMs, etc. will have a much harder time selling into this space due to cost.  Their products have traditionally been more about performance and usability features, which may not have as strong a message in the public cloud.


Dell solidly executed on the acquisition and integration of EqualLogic and has had great success there, providing a low-end, low-cost storage system paired perfectly with their server offering.  The 3PAR acquisition and recent Dell innovations in their server line preview a new model for Dell.  Whether this is a successful model or not is yet to be seen.  From my point of view, successful or not, Dell would be better suited pairing their traditional business to public cloud solutions and creating a new market for themselves with less competition.


SMT, Matrix and Vblock: Architectures for Private Cloud

Cloud computing environments provide enhanced scalability and flexibility to IT organizations.  Many options exist for building cloud strategies: public, private, etc.  For many companies private cloud is an attractive option because it allows them to maintain full visibility and control of their IT systems.  Private clouds can also be further enhanced by merging private cloud systems with public cloud systems in a hybrid cloud.  This allows some systems to gain the economies of scale offered by public cloud while others are maintained internally.  Some great examples of hybrid strategies would be:

  • Utilizing private cloud for mission critical applications such as SAP while relying on public cloud for email systems, web hosting, etc.
  • Maintaining all systems internally during normal periods and relying on the cloud for peaks.  This is known as Cloud Bursting and is excellent for workloads that cycle throughout the day, week, month or year.
  • Utilizing private cloud for all systems and capacity while relying on cloud based Disaster Recovery (DR) solutions.

Many more options exist and any combination of options is possible.  If private cloud is part of the cloud strategy for a company there is a common set of building blocks required to design the computing environment.


In the diagram above we see that each component builds upon one another.  Starting at the bottom we utilize consolidated hardware to minimize power, cooling and space as well as underlying managed components.  At the second tier of the private cloud model we layer on virtualization to maximize utilization of the underlying hardware while providing logical separation for individual applications. 

If we stop at this point we have what most of today’s data centers are using to some extent or moving to.  This is a virtualized data center.  Without the next two layers we do not have a cloud/utility computing model.  The next two layers provide the real operational flexibility and organizational benefits of a cloud model.

To move our virtualized data center to a cloud architecture we next layer on Automation and Monitoring.  This layer provides the management and reporting functionality for the underlying architecture.  It could include: monitoring systems, troubleshooting tools, chargeback software, hardware provisioning components, etc.  Next we add a provisioning portal to allow the end-users or IT staff to provision new applications, decommission systems no longer in use, and add/remove capacity from a single tool.  Depending on the level of automation in place below, some things like capacity management may be handled without user/staff intervention.

The last piece of the diagram above is security.  While many private cloud discussions leave security out or minimize its importance, it is actually a key component of any cloud design.  When moving to private cloud, customers are typically building a new compute environment or totally redesigning an existing environment.  This is the key time to design robust security in from end-to-end, because you’re not tied to previous mistakes (we all make them) or legacy design.  Security should be part of the initial discussion for each layer of the private cloud architecture and the solution as a whole.

Private cloud systems can be built with many different tools from various vendors.  Many of the software tools exist in both Open Source and licensed versions.  Additionally, several vendors offer end-to-end stacks upon which to design a private cloud system.  The remainder of this post will cover three of the leading private cloud offerings:

Scope: This post is an overview of three excellent solutions for private cloud.  It is not a pro/con discussion or a feature comparison.  I would personally position any of the three architectures for a given customer dependent on customer requirements, existing environment, cloud strategy, business objective and comfort level.  As always, please feel free to leave comments, concerns or corrections using the comment form at the bottom of the post.

Secure Multi-Tenancy (SMT):

Vendor positioning:  ‘This includes the industry’s first end-to-end secure multi-tenancy solution that helps transform IT silos into shared infrastructure.’


SMT is a pairing of: VMware vSphere, Cisco Nexus, UCS, MDS, and NetApp storage systems.  SMT has been jointly validated and tested by the three companies, and a Cisco Validated Design (CVD) exists as a reference architecture.  Additionally a joint support network exists for customers building or using SMT solutions.

Unlike the other two systems SMT is a reference architecture a customer can build internally or along with a trusted partner.  This provides one of the two unique benefits of this solution.

Unique Benefits:

  • Because SMT is a reference architecture it can be built in stages married to existing refresh and budget cycles.  Existing equipment can be reutilized or phased out as needed.
  • SMT is designed to provide end-to-end security for multiple tenants (customers, departments, or applications.)

HP Matrix:

Vendor positioning:  ‘The industry’s first integrated infrastructure platform that enables you to reduce capital costs and energy consumption and more efficiently utilize the talent of your server administration teams for business innovation rather than operations and maintenance.’


Matrix is an integration of HP blades, HP storage, HP networking and HP provisioning/management software.  HP has tested the interoperability of the proven components and software and integrated them into a single offering.

Unique benefits:

  • Of the three solutions Matrix is the only one that is a complete solution provided by a single vendor.
  • Matrix provides the greatest physical server scalability of any of the three solutions with architectural limits of thousands of servers.


Vblock:

Vendor positioning:  ‘The industry’s first completely integrated IT offering that combines best-in-class virtualization, networking, computing, storage, security, and management technologies with end-to-end vendor accountability.’


Vblocks are a combination of EMC software and storage, Cisco UCS, MDS and Nexus, and VMware virtualization.  Vblocks are complete infrastructure packages sold in one of three sizes based on number of virtual machines.  Vblocks offer a thoroughly tested and jointly supported infrastructure with proven performance levels based on a maximum number of VMs.

Unique Benefits:

  • Vblocks offer a tightly integrated best-of-breed solution that is purchased as a single product.  This provides very predictable scalability costs when looked at from a C-level perspective (i.e. x dollars buys y scalability, when needs increase x dollars will be required for the next block.)
  • Vblock is supported by a unique partnering between Cisco, EMC and VMware as well as their ecosystem of channel partners.  This provides robust front- and back-end support for customers before, during and after install.


Private cloud can provide a great deal of benefits when implemented properly, but like any major IT project the benefits are greatly reduced by mistakes and improper design.  Pre-designed and tested infrastructure solutions such as the ones above provide customers a proven platform on which they can build a private cloud.


Why You’re Ready to Create a Private Cloud

I’m catching up on my reading and ran into David Linthicum’s ‘Why you’re not ready to create a private cloud’ (http://www.infoworld.com/d/cloud-computing/why-youre-not-ready-create-private-cloud-458).  It’s a great article and points out a major issue with private-cloud adoption: internal expertise.  The majority of data center teams don’t have the internal expertise required to execute effectively on private-cloud architectures.  This isn’t a knock on these teams; it’s a near impossibility to have and maintain that internal expertise.  Remember back when VMware was gaining adoption: nobody had virtualization knowledge, so they learned it on the fly.  As people became experts, many times they left the nest where they learned it in search of bigger, better worms.  More importantly, because it was a learn-as-you-go process, the environments were inherently problematic and were typically redesigned several times to maximize performance and benefit.

Looking at the flip side of that coin, what is the value to the average enterprise or federal data center in retaining a private cloud architect?  If they’re good at their job, they only do it once.  Yes, there will be optimization and performance assessments to maintain it, but that’s not typically a full-time job.  The question becomes: because you don’t have the internal expertise to build a private cloud, should you ignore the idea or concept?  I would answer a firm no.

The company I work for has the ability, reseller agreements, service offerings and expertise to execute on private clouds.  We’re capable of designing and deploying these solutions from the data center walls to the provisioning portals, with experts on hand who have experience in each aspect and enough overlap to tie it all together.  To put our internal capabilities in perspective, one of my company’s offerings is private cloud containers and ruggedized deployable private cloud racks.  These aren’t ‘throw some stuff in a box’ solutions; they are custom-designed containers outfitted with shock absorption, right-sized power/cooling, custom rack rails providing full equipment serviceability, and private cloud IT architectures built on several industry-leading integrated platforms.  That’s a unique home-grown offering for a systems integrator (typically DC containers are the space of IBM, Sun, etc.)  I accepted this position for these reasons, among others.

This is not an advertisement for my company but instead an example of why you’re ready to build private cloud infrastructures.  You should not expect to have the internal expertise to architect and build a private cloud infrastructure, you should utilize industry experts to assist with your transition.  There are two major methods of utilizing experts to assess, design, and deploy a private cloud: a capable reseller/solutions provider or a capable consultant/consulting firm.  Both methods have pros and cons.

Reseller/Systems Integrator:

Utilizing a reseller and systems integrator has some major advantages in the form of what is provided at no cost and having a one-stop shop for design, purchase, and deployment.  Typically when working with a reseller much of the upfront consulting and design is provided free; this is because it is considered pre-sales, and the hardware sale is where they make their money.  With complex systems and architectural designs such as Virtual Desktop Infrastructure (VDI) and cloud architectures, don’t expect everything to be cost free, but good portions will be.  These types of deployments require in-depth assessment and planning sessions, some of which will be paid engagements but are typically low overall cost and vital to success.  For example, you won’t deploy VDI successfully without first understanding your applications in depth.  Application assessments are extended technical engagements.

Another advantage of using a reseller is that the hardware design, purchase, and installation can all be provided by the same company.  This simplifies the overall process and provides the ever-so-important ‘single-neck-to-choke.’  If something isn’t right before, during, or after the deployment, a good reseller will be able to help you coordinate actions to repair the issue without you having to call 10 separate vendors.

Lastly, a reseller of sufficient size to handle private cloud design and migration will have an extensive pool of technical resources to draw upon during the process, both internally and through vendor relationships, which means the team you’re working with has back-end support in several disciplines and product areas.

There are also some potential downsides to using a reseller that you’ll want to fully understand.  First, a reseller typically partners with a select group of vendors.  This means that the architectural design will tend to revolve around those vendors.  This is not necessarily a bad thing as long as:

  • The reseller takes an ‘independent view’ and has multiple vendors to choose from for the solutions they provide.  This allows them to pick the right fit for your environment.
  • You ensure the reseller explains to your satisfaction the reasoning behind the hardware choices they’ve positioned and the advantages of that hardware.

Obviously a reseller is in the business of making a sale, but a good reseller will leverage their industry knowledge and vendor relationships to build the right solution for the customer.  Another note: even if your reseller doesn’t partner with a specific vendor, they should be able to make appropriate arrangements to include anything you choose in your design.

Consultant/Consulting Firm:

Utilizing a consultant is another good option for designing and deploying a private cloud.  A good consultant can help assess the existing environment/organization and begin to map out an architecture and road map to build the private cloud.  One advantage of a consultant is the vendor independence you’ll get with an independent consultant or firm.  Once they’ve helped you map out the architecture and roadmap, they can typically work with you during the purchase process through vendors or resellers.

Some potential drawbacks to independent consultants will be identifying a reliable individual or team with the proper capabilities to outline a cloud strategy.  The best bet to minimize risk here will be to use references from colleagues that have made the transition, trusted vendors, etc.  Excellent cloud architecture consultants exist; you’ll just need to find the right fit.

Hybrid Strategy:

These two options are not mutually exclusive.  In many cases I’d recommend working with a trusted reseller and utilizing an independent consultant as well.  There are benefits to this approach: the consultant can help ‘keep the reseller honest’ and should be able to provide alternative opinions and design considerations.


Migrating to cloud is not an overnight process and most likely not something that can be planned for, designed, and implemented using all internal resources.  When making the decision to move to cloud, utilize the external resources available to you.  As one last word of caution: don’t even bother looking at cloud architectures until you’re ready to align your organization to the flexibility provided by cloud; a cloud architecture is of no value to a silo-driven organization (see my post ‘The Organizational Challenge’ for more detail: http://www.definethecloud.net/?p=122)


Networking Showdown: UCS vs. HP Virtual Connect (Updated)

Note: I have made updates to reflect that Virtual Connect is two words, and technical changes to explain another method of network configuration within Virtual Connect that prevents the port blocking described below.  Many thanks to Ken Henault at HP, who graciously walked me through the corrections and beat them into my head until I understood them.

I’m sitting on a flight from Honolulu to Chicago en route home after a trip to Hawaii for customer briefings.  I was hoping to be asleep at this point, but a comment Ken Henault left on my ‘FlexFabric – Small Step, Right Direction’ post is keeping me awake… that’s actually a lie.  I’m awake because I’m cramped into a coach seat for 8 hours while my fiancé, who joined me for a couple of days, enjoys the first-class luxuries of my auto upgrade; all the comfort in the world wouldn’t make up for the looks I’d get when we landed if I was the one up front.

So, being that I’m awake anyway, I thought I’d address the comment from the aforementioned post.  Before I begin I want to clarify that my last post had nothing to do with UCS.  I intentionally left UCS out because it was designed with FCoE in mind from the start, so it has native advantages in an FCoE environment.  Additionally, within UCS you can’t get away from FCoE; if you want Fibre Channel connectivity you’re using FCoE, so it’s not a choice to be made (iSCSI, NFS, and others are supported, but to connect to FC devices or storage it’s FCoE.)  The blog was intended to state exactly what it did: HP has made a real step into FCoE with FlexFabric but there is still a long way to go.  To see the original post click the link (http://www.definethecloud.net/?p=419.)

I’ve got a great deal of respect for both Ken and HP, the company he works for.  Ken knows his stuff; our views may differ occasionally, but he definitely gets the technology.  The fact that Ken knows HP blades inside, outside, backwards, and forwards and has a strong grasp on Cisco’s UCS made his comment even more intriguing to me, because it highlights weak spots in the overall understanding of both UCS and server architecture/design as it pertains to network connectivity.


This post will cover the networking comparison of HP C-Class using Virtual Connect (VC) modules and VC management as it compares to the Cisco UCS blade system.  This comparison is the closest ‘apples-to-apples’ comparison that can be done between Cisco UCS and HP C-Class.  Additionally, I will be comparing the maximum blades in a single HP VC domain, which is 64 (4 chassis x 16 blades), against 64 UCS blades, which would require 8 chassis.

Accuracy and Objectivity:

It is not my intent to use numbers favorable to one vendor or the other.  I will be as technically accurate as possible throughout, and I welcome all feedback, comments, and corrections from both sides of the house.

HP Virtual Connect:

VC is an advanced management system for HP C-Class blades that allows 4 blade chassis to be interconnected and managed/networked as a single system.  In order to provide this functionality the LAN/SAN switch modules used must be VC, and the chassis must be interconnected by what HP calls a stacking-link.  HP does not consider VC Ethernet modules to be switches, but for the purpose of this discussion they will be.  I make this decision based on two facts: they make switching decisions, and they are the same hardware as the ProCurve line of blade switches.

Note: this is a networking discussion so while VC has other features they are not discussed here.

Let’s take a graphical view of a 4-chassis VC domain.

In the above diagram we see a single VC domain cabled for LAN and SAN connectivity.  You can see that each chassis is independently connected to SAN A and B for Fibre Channel access, but Ethernet traffic can traverse the stacking-links along with the domain management traffic.  This allows a reduced number of uplinks to be used from each 4-chassis VC domain to the network.  This solution utilizes 13 total links to provide 16 Gbps of FC per chassis (assuming 8Gb uplinks) and 20 Gbps of Ethernet for the entire VC domain (with blocking considerations discussed below.)  More links could be added to provide additional bandwidth.

This method of management and port reduction does not come without its drawbacks.  In the next graphic I add loop prevention and server to server communication.


The first thing to note in the above diagram is the blocked link.  When only a single vNet is configured across the VC domain (1-4 chassis), only one link or link aggregation group may forward per VLAN.  This means that per VC domain there is only one ingress or egress point to the network per VLAN.  This is because VC is not ‘stacking’ 4 switches into one distributed switch control plane but instead ‘daisy-chaining’ four independent switches together using an internal loop prevention mechanism.  This means that to prevent loops from forming within the VC domain, only one link can be actively used for upstream connectivity per VLAN.

Because of this loop prevention system you will see multiple-hops for frames destined between servers in separate chassis, as well as frames destined upstream in the network.  In the diagram I show a worst case scenario for educational purposes where a frame from a server in the lower chassis must hop three times before leaving the VC domain.  Proper design and consideration would reduce these hops to two max per VC domain.
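The hop behavior described above can be sketched with a small, purely illustrative model (my own, not an HP tool): chassis daisy-chained by stacking-links, with a single forwarding uplink per VLAN on one chassis.

```python
def stacking_hops(server_chassis, uplink_chassis):
    """Stacking-link hops a frame takes inside the VC domain before
    reaching the chassis that holds the active (non-blocked) uplink."""
    return abs(server_chassis - uplink_chassis)

# Worst case from the diagram: uplink on chassis 0, server in chassis 3.
print(stacking_hops(3, 0))  # 3 hops before leaving the domain

# Proper design: placing the active uplink on an inner chassis caps the
# worst case at 2 hops for a 4-chassis domain.
print(max(stacking_hops(c, 1) for c in range(4)))  # 2
```

This is simply why the text above says a worst-case design yields three hops while a well-placed uplink caps the domain at two.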


This is only one of the methods available for configuring vNets within a VC domain.  The second method will allow both uplinks to be configured using separate vNets which allows each uplink to be utilized even within the same VLANs but segregates that VLAN internally.  The following diagram shows this configuration.


In this configuration server NIC pairs will be configured to each use one vNet, and NIC teaming software will provide failover.  Even though both vNets use the same VLAN, the networks remain separate internally, which prevents looping, upstream MAC address instability, etc.  For example, a server utilizing only two onboard NICs would have one NIC in vNet1 and one in vNet2.  In the event of an uplink failure for vNet1, the NIC in that vNet would have no north/south access, but NIC teaming software could be relied upon to force traffic to the NIC in vNet2.
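The failover logic described above can be sketched as follows. This is a hypothetical model of the NIC-teaming decision, not HP teaming software; the vNet names are illustrative.

```python
def active_nic(vnet_uplink_up, team):
    """Return the first NIC (identified by its vNet) that still has
    north/south connectivity, mimicking active/standby NIC teaming."""
    for nic_vnet in team:
        if vnet_uplink_up.get(nic_vnet, False):
            return nic_vnet
    return None  # no vNet has an uplink: the server is isolated north/south

team = ["vNet1", "vNet2"]  # one NIC in each vNet, both carrying the same VLAN
print(active_nic({"vNet1": True, "vNet2": True}, team))   # vNet1
print(active_nic({"vNet1": False, "vNet2": True}, team))  # vNet2 (failover)
```

The key point the model captures: failover is driven by the server's teaming software, not by the VC domain re-converging, because the two vNets are deliberately kept separate.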

While both methods have advantages and disadvantages this will typically be the preferred method to avoid link blocking and allow better bandwidth utilization.  In this configuration the center two chassis will still require an extra one or two hops to send/receive north/south traffic depending on which vNet is being used.

**End Update**

The last thing to note is that any Ethernet cable reduction will also result in lower available bandwidth for upstream/northbound traffic to the network.  For instance in the top example above only one link will be usable per VLAN.  Assuming 10GE links, that leaves 10G bandwidth upstream for 64 servers.  Whether that is acceptable or not depends on the particular I/O profile of the applications.  Additional links may need to be added to provide adequate upstream bandwidth.  That brings us to our next point:

Calculating bandwidth needs:

Before making a decision on bandwidth requirements it is important to note the actual characteristics of your applications.  Some key metrics to help in design are:

  • Peak Bandwidth
  • Average Bandwidth
  • East/West traffic
  • North/South Traffic

For instance, using the example above, if all of my server traffic is East/West within a single chassis then the upstream link constraints mentioned are moot points.  If the traffic must traverse multiple chassis the stacking-link must be considered.  Lastly, if traffic must also move between chassis as well as North/South to the network, uplink bandwidth becomes critical.  With networks it is common to under-architect and over-engineer, meaning spend less time designing and throw more bandwidth at the problem; this does not provide the best results at the right cost.
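As a rough sketch of how those metrics drive uplink counts, consider the helper below. It is purely illustrative; the oversubscription ratio, per-server averages, and North/South split are assumptions you would replace with measured application data.

```python
import math

def uplinks_needed(servers, avg_gbps_per_server, north_south_ratio,
                   link_gbps=10, oversubscription=4):
    """Estimate uplinks for the north/south share of average traffic,
    given a design oversubscription ratio (4:1 assumed here)."""
    total = servers * avg_gbps_per_server      # aggregate average load
    north_south = total * north_south_ratio    # only N/S traffic hits uplinks
    required = north_south / oversubscription  # apply design oversubscription
    return max(1, math.ceil(required / link_gbps))

# 64 servers averaging 1 Gbps each, 40% of it north/south, 4:1 oversub:
print(uplinks_needed(64, 1.0, 0.4))                      # 1 x 10GE uplink
# The same profile with no oversubscription allowance:
print(uplinks_needed(64, 1.0, 0.4, oversubscription=1))  # 3 x 10GE uplinks
```

Peak bandwidth would be checked the same way; the point is that East/West-heavy profiles barely touch the uplinks, while North/South-heavy ones dominate the design.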

Cisco Unified Computing System:

Cisco UCS takes a different approach to providing I/O to the blade chassis.  Rather than placing managed switches in the chassis, UCS uses a device called an I/O Module or Fabric Extender (IOM/FEX) which does not make switching decisions and instead passes traffic based on an internal pinning mechanism.  All switching is handled by the upstream Fabric Interconnects (UCS 6120 or 6140.)  Some will say the UCS Fabric Interconnect is ‘not-a-switch’; using the same logic as I did above for the HP VC devices, the Fabric Interconnect is definitely a switch.  In both operational modes the interconnect will make forwarding decisions based on MAC address.

One major architectural difference between UCS and the HP, Dell, IBM, and Sun blade implementations is that the switching and management components are stripped from the individual chassis and handled in the middle of the row by a redundant pair of devices (fabric interconnects.)  These devices replace the LAN access and SAN edge ports that other vendors’ blade devices connect to.  Another architectural difference is that the UCS system never blocks server links to prevent loops (all links are active from the chassis to the interconnects), and in the default mode, End Host mode, it will not block any upstream links to the network core.  For more detail on these features see my posts: ‘Why UCS is my A-Game Server Architecture’ (http://www.definethecloud.net/?p=301) and ‘UCS Server Failover’ (http://www.definethecloud.net/?p=359.)

A single UCS implementation can scale to a maximum of 40 chassis (320 servers) using a minimal bandwidth configuration, or 10 chassis (80 servers) using maximum bandwidth, depending on requirements.  There is also flexibility to mix and match bandwidth needs between chassis.  Current firmware limits a single implementation to 12 chassis (96 servers) for support, and this increases with each major release.  Let’s take a look at the 8-chassis, 64-server implementation for comparison to an equal HP VC domain.


In the diagram above we see an 8-chassis, 64-server implementation utilizing the minimum number of links per chassis to provide redundancy (the same as was done in the HP example above.)  Here we utilize 16 links for 8 chassis, providing 20 Gbps of LAN and SAN traffic to each chassis.  Because there is no blocking required for loop prevention, all links shown are active.  Additionally, because the Fabric Interconnects shown here in green are the access/edge switches for this topology, all east/west traffic between servers in a single chassis or across chassis is fully accounted for.  Depending on bandwidth requirements, additional uplinks could be added to each chassis.  Lastly, there would be no additional management cables required from the interconnects to the chassis, as all management is handled on dedicated, prioritized internal VLANs.

In the system above all traffic is aggregated upstream via the two fabric interconnects, this means that accounting for North/South traffic is handled by calculating the bandwidth needs of the entire system and designing the appropriate number of links.
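The aggregate math for this example is straightforward; the sketch below just makes it explicit. The per-server North/South figure at the end is an illustrative assumption, not a UCS specification.

```python
# UCS example from the text: 8 chassis, 2 IOM links per chassis, 10 Gbps each,
# all links active (no blocking for loop prevention).
chassis = 8
links_per_chassis = 2
link_gbps = 10

per_chassis_gbps = links_per_chassis * link_gbps  # consolidated I/O per chassis
system_edge_gbps = chassis * per_chassis_gbps     # total into the interconnects
print(per_chassis_gbps, system_edge_gbps)         # 20 160

# North/south sizing is then done once, at the fabric interconnects,
# for the whole system (0.25 Gbps per server is purely illustrative):
servers = chassis * 8
print(servers * 0.25)                             # 16.0 Gbps of uplink needed
```

Because the interconnects are the access layer, this single calculation replaces the per-VC-domain uplink planning required in the HP design.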

Side by Side Comparison:

In the diagram we see a maximum-scale VC domain compared to an 8-chassis UCS domain.  The diagram shows both domains connected up to a shared two-tier SAN design (core/edge) and a three-tier network design (access, aggregation, core.)  In the UCS domain all access layer connectivity is handled within the system.

In the next diagram we look at an alternative connectivity method for the HP VC domain utilizing the switch modules in the HP chassis as the access layer to reduce infrastructure.


In this method we have reduced the switching infrastructure by utilizing the onboard switching of the HP chassis as the access layer.  The issue here will be the bandwidth requirements and port costs at the LAN aggregation/SAN core.  Depending on application bandwidth requirements additional aggregation/core ports will be required which can be more costly/complex than access connectivity.  Additionally this will increase cabling length requirements in order to tie back to the data center aggregation/core layer. 


When comparing UCS to HP blade implementations a base UCS blade implementation is best compared against a single VC domain in order to obtain comparable feature parity.  The total port and bandwidth counts from the chassis for a minimum redundant system are:

                     HP                                        Cisco
Total uplinks        13                                        16
Gbps FC              16 per chassis                            N/A
Gbps Ethernet        10 per VLAN per VC domain / 20 total      N/A
Consolidated I/O     N/A                                       20 per chassis
Total chassis I/O    21 Gbps for 16 servers                    20 Gbps for 8 servers


This does not take into account the additional management ports required for the VC domain that will not be required by the UCS implementation.  An additional consideration will be scaling beyond 64 servers.  With this minimal configuration the Cisco UCS will scale to 40 chassis (320 servers), where the HP system scales in blocks of 4 chassis as independent VC domains.  While multiple VC domains can be managed by a Virtual Connect Enterprise Manager (VCEM) server, the network stacking is done per 4-chassis domain, requiring North/South traffic for domain-to-domain communication.

The other networking consideration in this comparison is that in the default mode all links shown for the UCS implementation will be active.  The HP implementation will have one available uplink or port aggregate uplink per VLAN for each VC domain, further restraining bandwidth and/or requiring additional ports.


10 Things to Know About Cisco UCS

Bob Olwig VP of Corporate Business Development for World Wide Technologies (www.wwt.com) asked me to provide 10 things to know about UCS for his blog http://bobolwig.wordpress.com.  See the post 10 Things to Know About Cisco UCS here: http://bobolwig.wordpress.com/2010/08/04/10-things-to-know-about-cisco-ucs/.


FlexFabric – Small Step, Right Direction

Note: I’ve added a couple of corrections below thanks to Stuart Miniman at Wikibon (http://wikibon.org/wiki/v/FCoE_Standards)  See the comments for more.

I’ve been digging a little more into the HP FlexFabric announcements in order to wrap my head around the benefits and positioning.  I’m a big endorser of a single network for all applications, LAN, SAN, IPT, HPC, etc. and FCoE is my tool of choice for that right now.  While I don’t see FCoE as the end goal, mainly due to limitations on any network use of SCSI which is the heart of FC, FCoE and iSCSI, I do see FCoE as the way to go for convergence today.  FCoE provides a seamless migration path for customers with an investment in Fibre Channel infrastructure and runs alongside other current converged models such as iSCSI, NFS, HTTP, you name it.  As such any vendor support for FCoE devices is a step in the right direction and provides options to customers looking to reduce infrastructure and cost.

FCoE is quickly moving beyond the access layer where it has been available for two years now.  That being said the access layer (server connections) is where it provides the strongest benefits for infrastructure consolidation, cabling reduction, and reduced power/cooling.  A properly designed FCoE architecture provides a large reduction in overall components required for server I/O.  Let’s take a look at a very simple example using standalone servers (rack mount or tower.)

In the diagram we see traditional Top-of-Rack (ToR) cabling on the left vs. FCoE ToR cabling on the right.  This is for the access layer connections only.  The infrastructure and cabling reduction is immediately apparent for server connectivity: 4 switches, 4 cables, and 2-4 I/O cards reduced to 2, 2, and 2.  This assumes only 2 networking ports are being used, which is not the case in many environments, including virtualized servers.  For servers connected using multiple 1GE ports the savings are even greater.
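The per-rack reduction above can be tallied in a quick sketch. The counts come straight from the example in the text (using the upper end of the 2-4 I/O card range); they are per server pair of ports, not a general rule.

```python
# Access-layer component counts: traditional LAN + SAN vs. converged FCoE ToR.
traditional = {"switches": 4, "cables_per_server": 4, "io_cards_per_server": 4}
fcoe_tor    = {"switches": 2, "cables_per_server": 2, "io_cards_per_server": 2}

for component in traditional:
    before, after = traditional[component], fcoe_tor[component]
    print(f"{component}: {before} -> {after} (saves {before - after})")
```

Scaling the cable count by servers per rack (and by port count for servers using many 1GE links) is where the power, cooling, and cabling savings compound.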

Two major vendor options exist for this type of cabling today:


  • Brocade 8000 – This is a 1RU ToR CEE/FCoE switch with 24x 10GE fixed ports and 8x 1/2/4/8G fixed FC ports.  Supports directly connected FCoE servers. 
    • This can be purchased as an HP OEM product.
  • Brocade FCoE 10-24 Blade – This is a blade for the Brocade DCX Fibre Channel chassis with 24x 10GE ports supporting CEE/FCoE.  Supports directly connected FCoE servers.

Note: Both Brocade data sheets list support for CEE, which is a proprietary pre-standard implementation of DCB; DCB is in the process of being standardized, with some parts ratified by the IEEE and some pending.  The terms do get used interchangeably, so whether this is a typo or an actual implementation will be something to discuss with your Brocade account team during the design phase.  Additionally, Brocade specifically states use for Tier 3 and ‘some Tier 2’ applications, which suggests a lack of confidence in the protocol and perhaps a lack of commitment to support and future products.  (This is what I would read from it based on the data sheets and Brocade’s overall positioning on FCoE from the start.)


  • Nexus 5000 – There are two versions of the Nexus 5000:
    • 1RU 5010 with 20 10GE ports and 1 expansion module slot which can be used to add (6x 1/2/4/8G FC, 6x 10GE, 8x 1/2/4G FC, or 4x 1/2/4G FC and 4x 10GE)
    • 2RU 5020 with 40 10GE ports and 2 expansion module slots which can be used to add (6x 1/2/4/8G FC, 6x 10GE, 8x 1/2/4G FC, or 4x 1/2/4G FC and 4x 10GE)
    • Both can be purchased as HP OEM products.
  • Nexus 7000 – There are two versions of the Nexus 7000, both core/aggregation-layer data center switches.  The latest announced 32 x 1/10GE line card supports the DCB standards, along with Cisco FabricPath, based on the pre-ratification TRILL standard.

Note: The Nexus 7000 currently only supports the DCB standard, not FCoE.  FCoE support is planned for Q3CY10 and will allow for multi-hop consolidated fabrics.

Taking the noted considerations into account any of the above options will provide the infrastructure reduction shown in the diagram above for stand alone server solutions.

When we move into blade servers the options are different.  This is because Blade Chassis have built in I/O components which are typically switches.  Let’s look at the options for IBM and Dell then take a look at what HP and FlexFabric bring to the table for HP C-Class systems.


  • BNT Virtual Fabric 10G Switch Module – This module provides 1/10GE connectivity and will support FCoE within the chassis when paired with the Qlogic switch discussed below.
  • Qlogic Virtual Fabric Extension Module – This module provides 6x 8GB FC ports and when paired with the BNT switch above will provide FCoE connectivity to CNA cards in the blades.
  • Cisco Nexus 4000 – This module is a DCB switch providing FCoE frame delivery while enforcing DCB standards for proper FCoE handling.  This device will need to be connected to an upstream Nexus 5000 for Fibre Channel Forwarder functionality.  Using the Nexus 5000 in conjunction with one or more Nexus 4000s provides multi-hop FCoE for blade server deployments.
  • IBM 10GE Pass-Through – This acts as a 1-to-1 pass-through for 10GE connectivity to IBM blades.  Rather than providing switching functionality this device provides a single 10GE direct link for each blade.  Using this device IBM blades can be connected via FCoE to any of the same solutions mentioned under standalone servers.

Note: Versions of the Nexus 4000 also exist for HP and Dell blades but have not been certified by the vendors, currently only IBM supports the device.  Additionally the Nexus 4000 is a standards compliant DCB switch without FCF capabilities, this means that it provides the lossless delivery and bandwidth management required for FCoE frames along with FIP snooping for FC security on Ethernet networks, but does not handle functions such as encapsulation and de-encapsulation.  This means that the Nexus 4000 can be used with any vendor FCoE forwarder (Nexus or Brocade currently) pending joint support from both companies.


  • Dell 10GE Pass-Through – Like the IBM pass-through the Dell pass-through will allow connectivity from a blade to any of the rack mount solutions listed above.

Both Dell and IBM offer Pass-Through technology which will allow blades to be directly connected as a rack mount server would.  IBM additionally offers two other options: using the Qlogic and BNT switches to provide FCoE capability to blades, and using the Nexus 4000 to provide FCoE to blades. 

Let’s take a look at the HP options for FCoE capability and how they fit into the blade ecosystem.


  • 10GE Pass-Through – HP also offers a 10GE pass-through providing the same functionality as both IBM and Dell.
  • HP FlexFabric – The FlexFabric switch is a Qlogic FCoE switch OEM’d by HP which provides a configurable combination of FC and 10GE ports upstream and FCoE connectivity across the chassis mid-plane.  This solution only requires two switches for redundancy, as opposed to four with separate FC and Ethernet configurations.  Additionally, this solution works with HP FlexConnect, providing 4 logical server ports for each physical 10GE link on a blade, and is part of the Virtual Connect solution, which reduces the management overhead of traditional blade systems through software.

On the surface FlexFabric sounds like the way to go with HP blades, and it very well may be, but let’s take a look at what it’s doing for our infrastructure/cable consolidation.


With the FlexFabric solution FCoE exists only within the chassis and is split to native FC and Ethernet moving up to the Access or Aggregation layer switches.  This means that while reducing the number of required chassis switch components and blade I/O cards from four to two there has been no reduction in cabling.  Additionally HP has no announced roadmap for a multi-hop FCoE device and their current offerings for ToR multi-hop are OEM Cisco or Brocade switches.  Because the HP FlexFabric switch is a Qlogic switch this means any FC or FCoE implementation using FlexFabric connected to an existing SAN will be a mixed vendor SAN which can pose challenges with compatibility, feature/firmware disparity, and separate management models.

HP’s announcement to utilize the Emulex OneConnect adapter as the LAN-on-motherboard (LOM) adapter makes FlexFabric more attractive, but the benefits of that LOM would also be realized using the 10GE Pass-Through connected to a 3rd-party FCoE switch, or a native Nexus 4000 in the chassis if HP were to approve and begin to OEM the product.


As the title states, FlexFabric is definitely a step in the right direction, but it’s only a small one.  It definitely shows FCoE commitment, which is fantastic and should reduce the FCoE FUD flinging.  The main limitation is the lack of cable reduction and the overall FCoE portfolio.  For customers using, or planning to use, Virtual Connect to reduce the management overhead of the traditional blade architecture, this is a great solution to reduce chassis infrastructure.  For other customers it would be prudent to seriously consider the benefits and drawbacks of the pass-through module connected to one of the HP OEM ToR FCoE switches.
