Server Networking With Gen 2 UCS Hardware

**This post has been slightly edited thanks to feedback from Sean McGee.**

In previous posts I’ve outlined the basics of UCS networking.  If you’re not familiar with it, I suggest starting there for background.  This post is an update to those posts, focused on UCS B-Series server to Fabric Interconnect communication using the new hardware options announced at Cisco Live 2011.  First, a recap of the new hardware:

The UCS 6248UP Fabric Interconnect

The 6248 is a 1RU model that provides 48 universal ports (1G/10G Ethernet or 1/2/4/8G FC.)  This provides 20 additional ports over the 6120 in the same 1RU form factor.  Additionally the 6248 lowers latency to 2.0us from the previous 3.2us.

The UCS 2208XP I/O Module

The 2208 doubles the uplink bandwidth per I/O module, providing a total of 160Gbps throughput per 8-blade chassis.  It quadruples the number of internal 10G connections to the blades, allowing for 80Gbps per half-width blade.

The UCS 1280 VIC

The 1280 VIC provides 8x10GE ports total, 4x to each IOM, for a total of 80Gbps per half-width slot (160Gbps with 2x in a full-width blade.)  It also doubles the VIF count of the previous VIC, allowing for 256 (theoretical) vNICs or vHBAs.  The new VIC also supports port-channeling to the UCS 2208 IOM and iSCSI boot.

The other addition that affects this conversation is the ability to port-channel the uplinks from the 2208 IOM which could not be done before (each link on a 2104 IOM operated independently.)  All of the new hardware is backward compatible with all existing UCS hardware.  For more detailed information on the hardware and software announcements visit Sean McGee’s blog where I stole these graphics: http://www.mseanmcgee.com/2011/07/ucs-2-0-cisco-stacks-the-deck-in-las-vegas/.

Let’s start by discussing the connectivity options from the Fabric Interconnects to the IOMs in the chassis focusing on all gen 2 hardware.

There are two modes of operation for the IOM: Discrete and Port-Channel.  In both modes it is possible to configure 1, 2, 4, or 8 uplinks from each IOM, either non-bundled (Discrete mode) or bundled (Port-Channel mode.)

UCS 2208 Fabric Interconnect Failover


Discrete Mode:

In discrete mode a static pinning mechanism is used, mapping each blade to a given port dependent on the number of uplinks used.  This means that each blade will have an assigned uplink on each IOM for inbound and outbound traffic.  In this mode, if a link failure occurs the blade will not ‘re-pin’ on the side of the failure but will instead rely on NIC-teaming/bonding or Fabric Failover to fail over to the redundant IOM/fabric.  The pinning behavior is as follows, with the exception of 1 uplink (not shown), in which all blades use the only available port:

2 Uplinks

  • Blades 1, 3, 5, and 7 pin to Port 1
  • Blades 2, 4, 6, and 8 pin to Port 2

4 Uplinks

  • Blades 1 and 5 pin to Port 1
  • Blades 2 and 6 pin to Port 2
  • Blades 3 and 7 pin to Port 3
  • Blades 4 and 8 pin to Port 4

8 Uplinks

  • Each blade pins to the port matching its slot: Blade 1 to Port 1, Blade 2 to Port 2, and so on through Blade 8 to Port 8
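To make the static-pinning rule concrete, here’s a minimal Python sketch (my own illustration, not anything out of UCSM) of how a blade slot maps to an IOM uplink in discrete mode:

```python
def pinned_port(blade_slot, uplinks):
    """Return the IOM uplink a blade slot pins to in discrete mode.

    Valid uplink counts are 1, 2, 4, and 8; blade slots are numbered
    1-8.  The slot number simply wraps around the available uplinks.
    """
    if uplinks not in (1, 2, 4, 8):
        raise ValueError("discrete mode uses 1, 2, 4, or 8 uplinks")
    return ((blade_slot - 1) % uplinks) + 1

# With 4 uplinks, blades 1-8 pin to ports 1, 2, 3, 4, 1, 2, 3, 4.
print([pinned_port(slot, 4) for slot in range(1, 9)])
```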

The same port-pinning will be used on both IOMs; therefore in a redundant configuration each blade will be uplinked via the same port on separate IOMs to redundant fabrics.  The draw of discrete mode is that bandwidth is predictable in link-failure scenarios.  If a link fails on one IOM, that server will fail over to the other fabric rather than placing additional bandwidth draw on the remaining active links of the failed side.  In summary, it forces NIC-teaming/bonding or Fabric Failover to handle failure events rather than network-based load-balancing.  The following diagram depicts the failover behavior for server 3 in an 8-uplink scenario.

Discrete Mode Failover


In the previous diagram port 3 on IOM A has failed.  With the system in discrete mode NIC-teaming/bonding or Fabric Failover handles failover to the secondary path on IOM B (which is the same port (3) based on static-pinning.)

Port-Channel Mode:

In Port-Channel mode all available links are bonded and a port-channel hashing algorithm (TCP/UDP port + VLAN, non-configurable) is used for load-balancing server traffic.  In this mode all server links are still ‘pinned’, but they are pinned to the logical bundle rather than to individual IOM uplinks.  The following diagram depicts this mode.

Port-Channel Mode


In this scenario, when a port fails on an IOM the port-channel load-balancing algorithm handles failing the server traffic flows over to another available port in the channel.  This failover will typically be faster than NIC-teaming/bonding failover.  It will decrease the potential throughput for all flows on the side with a failure, but will only affect performance if the links are saturated.  The following diagram depicts this behavior.

Port-Channel Mode Failover

In the diagram above, Blade 3 was pinned to Port 1 on the A side.  When Port 1 failed, Port 4 was selected (depicted in green), while Fabric B Port 6 is still active, leaving a potential of 20Gbps.

Note: Actual ports used will vary dependent on port-channel load-balancing.  These are used for example purposes only.

As you can see, port-channel mode enables additional redundancy and potential per-server bandwidth as it leaves two paths open.  In high-utilization situations where the links are fully saturated, a failure will degrade throughput of all blades on the side experiencing it.  This is not necessarily a bad thing (it happens with all port-channel mechanisms), but it is a design consideration.  Additionally, port-channeling in all forms can only provide the bandwidth of a single link per flow (think of a flow as a conversation.)  This means that each flow can only utilize 10Gbps max even though 8x10Gbps links are bundled.  For example a single FTP transfer would max out at 10Gbps, while 8 FTP transfers could potentially use 80Gbps (10 per link) dependent on load-balancing.
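To illustrate the single-link-per-flow behavior, here’s a rough Python sketch of flow-based hashing (the real algorithm and its inputs are Cisco’s and non-configurable; this only shows the concept):

```python
import zlib

def member_link(src_port, dst_port, vlan, num_links):
    """Hash a flow's identifiers onto one member link of the bundle.

    The same flow always produces the same hash, so one conversation
    rides one 10Gbps link no matter how many links are bundled.
    """
    flow_id = f"{src_port}:{dst_port}:{vlan}".encode()
    return zlib.crc32(flow_id) % num_links

# A single FTP transfer (one flow) lands on the same link every time...
print(member_link(49152, 21, 100, 8))
# ...while eight separate flows can spread across the 8x10Gbps bundle.
print(sorted({member_link(p, 21, 100, 8) for p in range(49152, 49160)}))
```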

Next let’s discuss server to IOM connectivity (yes, I use ‘discuss’ to describe my monologuing in print, get over it, and yes, I know ‘monologuing’ isn’t a word.)  I’ll focus on the new UCS 1280 VIC because all other current cards maintain the same connectivity.  The following diagram depicts the 1280 VIC connectivity.

UCS 1280 VIC Connectivity

The 1280 VIC utilizes 4x10Gbps links across the midplane per IOM to form two 40Gbps port-channels.  This provides 80Gbps total potential throughput per card, meaning a half-width blade has a total potential of 80Gbps and a full-width blade can receive 160Gbps (of course this is dependent upon design.)  As with any port-channel, link-bonding, trunking, or whatever you may call it, any flow (conversation) can only utilize the max of one physical link (or backplane trace) of bandwidth.  This means every flow from any given UCS server has a max potential bandwidth of 10Gbps, but with 8 total uplinks 8 different flows could potentially utilize 80Gbps.

This becomes very important with things like NFS-based storage within hypervisors.  Typically a virtualization hypervisor will handle storage connectivity for all VMs.  This means that only one flow (conversation) will occur between host and storage.  In these typical configurations only 10Gbps will be available for all VM NFS data traffic even though the host may have a potential 80Gbps bandwidth.  Again this is not necessarily a concern, but a design consideration as most current/near-future hosts will never use more than 10Gbps of storage I/O.

Summary:

The new UCS hardware packs a major punch when it comes to bandwidth, port-density and failover options.  That being said it’s important to understand the frame flow, port-usage and potential bandwidth in order to properly design solutions for maximum efficiency.  As always comments, complaints and corrections are quite welcome!


Technology Passion

The May 24th IDC report on server market share validated a technology I’ve been passionate about for some time: the Cisco Unified Computing System (UCS.)  For the first time since UCS’s launch two years ago Cisco reported server earnings to IDC, with amazing results: #3 in global blade server market share and 1.6% factory revenue share for servers as a whole.  Find Kevin Houston’s summary of the blade numbers here: http://bladesmadesimple.com/2011/05/q1-2011-idc-worldwide-server-market-shows-blade-server-leader-as/ and the IDC report here: http://www.idc.com/getdoc.jsp?containerId=prUS22841411

This report shows that in two years Cisco has either taken significant market share from incumbents, driven new demand, or both.  Regardless of where the numbers came from, they are impressive; as far as servers go it’s close to David and Goliath proportions, and still playing out with Cisco about 1% behind IBM for the #2 spot.  I have been a ‘cheerleader’ for UCS for nearly its entire existence but didn’t start that way.  I describe the transition here: http://www.definethecloud.net/why-cisco-ucs-is-my-a-game-server-architecture

Prior to Cisco UCS I was a passionate IBM BladeCenter advocate: great technology, reliable hardware and a go-to brand.  I was passionate about IBM.  When IBM launched the BladeCenter H they worked hard to ensure customer investment protection, and in doing so anchored the H chassis to the past.  They hindered technical enhancements and created complexity to ensure the majority of components customers purchased in BladeCenter E would be forward compatible.  At the time I liked this concept, and IBM had several great engineering concepts built in that provided real value.

In the same time frame HP released the C-Class blade chassis, which had no forward/backward compatibility with previous HP blade architectures but used that fresh slate to build a world-class platform with the right technology for the time and the scalability to move far into the future.  At that point, from a technical perspective, I had no choice but to concede HP as the technical victor, but I still whole-heartedly recommended IBM because the technical difference was minimal enough that IBM’s customer investment protection model made them the right big-picture choice in my eyes.

I always work with a default preference, or what I call an ‘A-Game’ as described in the link above, but my A-Game is constantly evolving.  As I discover a new technology that will work in the spaces I operate in, I assess it against my A-Game and decide whether it can provide better value to 80% or more of the customer base I work with.  When a technology is capable of displacing my A-Game I replace it.

Sean McGee (http://www.mseanmcgee.com/) says it better than I can, so I’ll paraphrase him: ‘I’m a technologist, I work with and promote the best technology I’m aware of and can’t support a product once I know a better one exists.’

In the same fashion I’ll support and promote Cisco UCS until a better competitor proves itself, and I’m happy to see that customers agree based on the IDC reporting.

For some added fun here are some great Twitter comments from before the IDC announcement served with a side of crow:



OTV and Vplex: Plumbing for Disaster Avoidance

High availability, disaster recovery, business continuity, etc. are all key concerns of any data center design. They all describe separate components of the big concept: ‘When something does go wrong, how do I keep doing business?’

Very public real-world disasters have taught us as an industry valuable lessons in what real business continuity requires. The concepts of off-site archives and Disaster Recovery (DR) can be at least partially attributed to the Oklahoma City bombing. Prior to that, having only local or off-site tape archives was commonly acceptable: data gets lost, I get the tape and restore it. That worked well until we saw what happens when you have all the data and no data center to restore to.

September 11th, 2001 taught us another lesson about distance. There were companies with primary data centers in one tower and the DR data center in the other. While that may seem laughable now it wasn’t unreasonable then. There were latency and locality gains from the setup, and the idea that both world class engineering marvels could come down was far-fetched.

With lessons learned we’re now all experts in the needs of DR, right up until the next unthinkable happens ;-). Sarcasm aside, we now have a better set of recommended practices for DR solutions to provide Business Continuity (BC.) It’s commonly accepted that sites be a minimum of 50KM apart. 50KM will protect from an explosion, a power outage, and several other events, but it probably won’t protect from a major natural disaster such as an earthquake or hurricane. If those are concerns the distance increases, and you may end up with more than two data centers.

There are obviously significant costs involved in running a DR data center. Due to these costs the concept of running a ‘dark’ standby data center has gone away. If we pay for compute, storage, and network, we want to be utilizing them. Running Test/Dev systems or other non-frontline mission-critical applications is one option, but ideally both data centers could be used in an active fashion for production workloads with the ability to fail over for disaster recovery or avoidance.

While solutions for this exist within the high end Unix platforms and mainframes it has been a tough cookie to crack in the x86/x64 commodity server system market. The reason for this is that we’ve designed our commodity server environments as individual application silos directly tied to the operating system and underlying hardware. This makes it extremely complex to decouple and allow the application itself to live resident in two physical locations, or at least migrate non-disruptively between the two. 

In steps VMware and server virtualization, with the ability to decouple the operating system and application from the hardware they reside on.  With the direct hardware tie removed, applications running in operating systems on virtual hardware can be migrated live (without disruption) between physical servers; this is known as vMotion.  This application mobility puts us one step closer to active/active data centers from a Disaster Avoidance (DA) perspective, but doesn’t come without some catches: bandwidth, latency, Layer 2 adjacency, and shared storage.

The first two challenges can be addressed between data centers using two tools: distance and money.  You can always spend more money to buy more WAN/MAN bandwidth, but you can’t beat physics, so latency is dependent on the speed of light and therefore distance.  Even with those two problems solved there has traditionally been no good way to solve the Layer 2 adjacency problem.  By Layer 2 adjacency I’m talking about same VLAN/Broadcast domain, i.e. MAC based forwarding.  Solutions have existed and still exist to provide this adjacency across MAN and WAN boundaries (EoMPLS and VPLS) but they are typically complex and difficult to manage with scale.  Additionally these protocols tend to be cumbersome due to L2 flooding behaviors.

Up next is Cisco with Overlay Transport Virtualization (OTV.)  OTV is a Layer 2 extension technology that utilizes MAC routing to extend Layer 2 boundaries between physically separate data centers.  OTV offers both simplicity and efficiency in an L2 extension technology by pushing this routing to Layer 2 and negating the flooding behavior of unknown unicast.  With OTV in place a VLAN can safely span a MAN or WAN boundary, providing Layer 2 adjacency to hosts in separate data centers.  This leaves us with one last problem to solve.

The last step in putting the plumbing together for Long Distance vMotion is shared storage.  In order for the magic of vMotion to work, both the server the Virtual Machine (VM) is currently running on and the server the VM will be moved to must see the same disk.  Regardless of protocol or disk type, both servers need to see the files that comprise a VM.  This can be accomplished in many ways dependent on the storage protocol you’re using, but traditionally what you end up with is one of the following two scenarios:

Shared Storage Across the MAN/WAN

In the diagram above we see that both servers can access the same disk, but the server in DC 2 must access the disk across the MAN or WAN boundary, increasing latency and decreasing performance.  The second option is:

Storage Replication

In the diagram above we see storage replication at work.  At first glance it looks like this would solve our problem, as the data would be available in both data centers; however, this is not the case.  With existing replication technologies the data is only active or primary in one location, meaning it can only be read from and written to on a single array.  The replicated copy is available only for failover scenarios.  This is depicted by the P in the diagram.  While each controller/array may own active disk as shown, it’s only accessible on a single side at a single time.  That is, until Vplex.

EMC’s Vplex provides the ability to have active/active read/write copies of the same data in two places at the same time.  This solves our problem of having to cross the MAN/WAN boundary for every disk operation.  Using Vplex the virtual machine data can be accessed locally within each data center.

Long Distance vMotion

Putting both pieces together we have the infrastructure necessary to perform a Long Distance vMotion as shown above.

Summary:

OTV and Vplex provide an excellent and unique infrastructure for enabling long-distance vMotion.  They are the best available ‘plumbing’ for use with VMware for disaster avoidance.  I use the term plumbing because they are just part of the picture, the pipes.  Many other factors come into play such as rerouting incoming traffic, backup, and disaster recovery.  When properly designed and implemented for the correct use cases OTV and Vplex provide a powerful tool for increasing the productivity of Active/Active data center designs.


Cisco Unified Computing System (UCS) High-Level Overview

I’ve been looking for tools to supplement PowerPoint, whiteboard, etc. and Brian Gracely (@bgracely) suggested I try Prezi (www.prezi.com.)  Prezi is a very slick tool for non-slide-based presentations.  I don’t think it will replace slides or the whiteboard for me, but it’s a great supplement.  It’s got a fairly quick learning curve if you watch the quick tutorials.  Additionally it works quite well for mind-mapping: I just throw all of my thoughts on the canvas and then start tying them together, whereas slides are very linear and take more planning.  My favorite feature of Prezi is the ability to break out of the flow and quickly return to it at any time during a presentation.  I love this because real-world discussions never go the way you mapped them out in advance.  To start learning the tool I created the following high-level overview of the Cisco Unified Computing System (UCS.)  This content is fully usable and recyclable, so do with it what you want!


Inter-Fabric Traffic in UCS

It’s been a while since my last post; time sure flies when you’re bouncing all over the place, busy as hell.  I’ve been invited to Tech Field Day next week and need to get back in the swing of things, so here goes.

In order for Cisco’s Unified Computing System (UCS) to provide the benefits, interoperability and management simplicity it does, the networking infrastructure is handled in a unique fashion.  This post will take a look at that unique setup and point out some considerations to focus on when designing UCS application systems.  Because Fibre Channel is designed to be utilized with separate physical fabrics, exactly as UCS does, this post will focus on Ethernet traffic only.  This post focuses on End Host mode; for the second part of this post, focusing on switch mode, use this link: http://www.definethecloud.net/inter-fabric-traffic-in-ucspart-ii.  Let’s start with taking a look at how this is accomplished:

UCS Connectivity


In the diagram above we see both UCS rack-mount and blade servers connected to a pair of UCS Fabric Interconnects, which handle the switching and management of UCS systems.  The rack-mount servers are shown connected to Nexus 2232s, known as Fabric Extenders, which are nothing more than remote line cards of the Fabric Interconnects.  Fabric Extenders provide a localized connectivity point (10GE/FCoE in this case) without expanding management points by adding a switch.  Not shown in this diagram are the I/O Modules (IOM) in the back of the UCS chassis.  These devices act in the same way as the Nexus 2232, meaning they extend the Fabric Interconnects without adding management or switches.  Next let’s look at a logical diagram of the connectivity within UCS.

UCS Logical Connectivity

In the diagram above we see several important things to note about UCS Ethernet networking:

  • UCS is a Layer 2 system meaning only Ethernet switching is provided within UCS.  This means that any routing (L3 decisions) must occur upstream.
  • All switching occurs at the Fabric Interconnect level.  This means that all frame forwarding decisions are made on the Fabric Interconnect and no intra-chassis switching occurs.
  • The only connectivity between Fabric Interconnects is the cluster links.  Both Interconnects are active from a switching perspective, but the management system known as UCS Manager (UCSM) is an Active/Standby clustered application.  This clustering occurs across these links.  These links do not carry data traffic, which means that there is no inter-fabric communication within the UCS system and A-to-B traffic must be handled upstream.

At first glance, handling all switching at the Fabric Interconnect level looks as though it would add latency (inter-blade traffic must be forwarded up to the Fabric Interconnects then back to the blade chassis.)  While this is true, UCS hardware is designed for low-latency environments such as High Performance Computing (HPC), so all components operate at very low latency.  The Fabric Interconnects themselves operate at approximately 3.2us (microseconds), and the Fabric Extenders operate at about 1.5us.  This means the total blade-to-blade path is approximately 6.2us (1.5 + 3.2 + 1.5), right in line with or lower than most Access Layer solutions.  Equally as important, with this design switching between any two blades/servers in the system will occur at the same speed regardless of location (consistent, predictable latency.)

The question then becomes: how is traffic between fabrics handled?  The answer is that traffic between fabrics must be handled upstream (next-hop device(s) shown in the diagrams as the LAN cloud.)  This is an important consideration when designing UCS implementations and selecting a redundancy/load-balancing behavior for server NICs.

Let’s take a look at two examples: first a bare-metal OS (Windows, Linux, etc.), then a VMware server.

Bare-Metal Operating System

In the diagram above we see two blades which have been configured in an active/passive NIC-teaming configuration using separate fabrics (within UCS this is done within the service profile.)  This means that blade 1 is using Fabric A as a primary path with B available for failover, and blade 2 is doing the opposite.  In this scenario any traffic sent from blade 1 to blade 2 must be handled by the upstream device depicted by the LAN cloud.  This is not necessarily an issue for the occasional frame but will impact performance for servers that communicate frequently.

Recommendation:

For bare-metal operating systems, analyze the blade-to-blade communication requirements and ensure chatty server-to-server applications are utilizing the same fabric as a primary (a quick sketch of this check follows the list below):

  • When using a card that supports hardware failover provide only one vNIC (made redundant through HW failover) and place its primary path on the same fabric as any other servers that communicate frequently.
  • When using cards that don’t support HW failover use active/passive NIC teaming and ensure that the active side is set to the same fabric for servers that communicate frequently.
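Here’s a quick hypothetical sketch of that check in Python (server names, fabric assignments and pairs are invented for illustration):

```python
# Primary fabric for each blade's active path (hypothetical values).
primary_fabric = {"app01": "A", "db01": "B", "web01": "A"}

# Server pairs known to communicate heavily (hypothetical values).
chatty_pairs = [("app01", "db01"), ("app01", "web01")]

for left, right in chatty_pairs:
    if primary_fabric[left] == primary_fabric[right]:
        print(f"{left} <-> {right}: switched locally on Fabric "
              f"{primary_fabric[left]}")
    else:
        # A-to-B traffic has no path inside UCS; it must hairpin
        # through the upstream LAN switch.
        print(f"{left} <-> {right}: crosses fabrics via upstream LAN")
```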

VMware Servers

VMware Server Connectivity

In the above diagram we see that the connectivity is the same from a physical perspective, but in this case we are using VMware as the operating system.  In this case a vSwitch, vDS, or Cisco Nexus 1000v will be used to connect the VMs within the hypervisor.  Regardless of the VMware switching option the case will be the same.  It is necessary to properly design the virtual switching environment to ensure that server-to-server communication is handled in the most efficient way possible.

Recommendation:

  • For half-width blades requiring 10GE or less total throughput, or full-width blades requiring 20GE or less total throughput, provide a single vNIC with hardware failover if available, or use an active/passive NIC configuration for the VMware switching.
  • For blades requiring the total active/active throughput of available NICs, determine application profiles and utilize port-groups (port-profiles with the Nexus 1000v) to ensure active paths are the same for application groups which communicate heavily.

Summary:

UCS utilizes a unique switching design in order to provide high-bandwidth, low-latency switching with a greatly reduced management architecture compared to competing solutions.  The networking requires a thorough understanding in order to ensure architectural designs provide the greatest available performance.  Ensuring application groups that utilize high levels of server-to-server traffic are placed on the same path will provide maximum performance and minimal additional overhead on upstream networking equipment.


My First Podcast: ‘Coffee With Thomas’

I had the pleasure of joining Thomas Jones on his new podcast ‘Coffee With Thomas.’  His podcast is always good, well put together and about 30 minutes.  It’s done in a very refreshing conversational style, as if you’re having a cup of coffee.  If you’re interested in listening to us talk technology, UCS, Apple, UFC, and other topics check it out: http://www.niketown588.com/2010/09/coffee-with-thomas-episode-5-wwts.html.

 

Thanks for the opportunity Thomas, that was a lot of fun!


What’s the deal with Quantized Congestion Notification (QCN)?

For the last several months there has been a lot of chatter in the blogosphere and Twitter about FCoE and whether full scale deployment requires QCN.  There are two camps on this:

  1. FCoE does not require QCN for proper operation with scale.
  2. FCoE does require QCN for proper operation and scale.

Typically the camps break down as follows (there are exceptions):

  1. HP camp, stating they’ve not yet released a suite of FCoE products because QCN is not fully ratified and they would be jumping the gun.  The flip side of this is stating that Cisco jumped the gun with their suite of products and will have issues with full-scale FCoE.
  2. Cisco camp stating that QCN is not required for proper FCoE frame flow and HP is using the QCN standard as an excuse for not having a shipping product.

For the purpose of this post I’m not camping with either side, I’m not even breaking out my tent.  What I’d like to do is discuss when and where QCN matters, what it provides and why.  The intent being that customers, architects, engineers etc. can decide for themselves when and where they may need QCN.

QCN: QCN is a form of end-to-end congestion management defined in IEEE 802.1Qau.  The purpose of end-to-end congestion management is to ensure that congestion is controlled from the sending device to the receiving device in a dynamic fashion that can deal with changing bottlenecks.  The most common end-to-end congestion management tool is TCP window sizing.

TCP Window Sizing:

With window sizing, TCP dynamically determines the number of packets to send at once without an acknowledgement.  It continuously ramps this number up if the pipe is empty and acknowledgements are being received.  If a packet is dropped due to congestion and an acknowledgement is not received, TCP halves the window size and starts the process over.  This provides a mechanism by which the maximum available throughput can be achieved dynamically.

The dynamic window size (total packets sent prior to acknowledgement) evolves over the course of several round trips: an initial fast ramp-up is followed by a gradual increase until a packet is lost, at which point the window is cut and the slow ramp begins again.
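As a rough sketch of that sawtooth behavior, here’s a simplified additive-increase/multiplicative-decrease model in Python (illustrative only, not a faithful TCP implementation):

```python
def window_history(round_trips, ssthresh=32, losses=(20, 45)):
    """Simulate a simplified TCP congestion window per round trip.

    The window doubles below ssthresh (slow start), grows by one
    segment per round trip above it, and halves on a lost packet.
    """
    window, history = 1, []
    for rt in range(round_trips):
        history.append(window)
        if rt in losses:            # drop detected: back off hard
            window = max(1, window // 2)
        elif window < ssthresh:     # fast initial ramp
            window *= 2
        else:                       # gradual probing for more bandwidth
            window += 1
    return history

print(window_history(60))
```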

If you prefer analogies, I always refer to TCP sliding windows as a keg stand (http://en.wikipedia.org/wiki/Keg_stand.)


In the photo we see several gentlemen surrounding a keg, with one upside down performing a keg stand.

To perform a keg stand:

  • Place both hands on top of the keg
  • 1-2 Friend(s) lift your feet over your head while you support your body weight on locked-out arms
  • Another friend places the keg’s nozzle in your mouth and turns it on
  • You swallow beer full speed for as long as you can

What the hell does this have to do with TCP Flow Control? I’m so glad you asked.

During a keg stand your friend is trying to push as much beer down your throat as you can handle, much like TCP increasing the window size to fill the end-to-end pipe.  Both of your hands are occupied holding your own weight, and your mouth has a beer hose in it, so like TCP you have no native congestion-signaling mechanism.  Just like TCP, the flow doesn’t slow until packets/beer drop; when you start to spill, they stop the flow.

So that’s an example of end-to-end congestion management.  Within Ethernet, and FCoE specifically, we don’t have any native end-to-end congestion tools (remember TCP is up on L4 and we’re hanging out with the cool kids at L2.)  No problem though, because we’re talking FCoE right?  FCoE is just an L1-L2 replacement for Fibre Channel (FC) L0-L1, so we’ll just use FC end-to-end congestion management… Not so fast boys and girls: FC does not have a standard for end-to-end congestion management.  That’s right, our beautiful over-engineered lossless FC has no mechanism for handling network-wide, end-to-end congestion.  That’s because it doesn’t need it.

FC is moving SCSI data, and SCSI is sensitive to dropped frames; latency is important but lossless delivery is more important.  To ensure a frame is never dropped, FC uses a hop-by-hop flow control known as buffer-to-buffer (B2B) credits.  At a high level, each FC device knows the amount of buffer space available on the next-hop device based on the agreed-upon frame size (typically 2148 bytes.)  This means that a device will never send a frame to a next-hop device that cannot handle the frame.  Let’s go back to the world of analogy.

Buffer-to-buffer credits:

The B2B credit system works the same way you’d have 10 Marines offload and stack a truckload of boxes (‘forklift? we don’t need no stinking forklift.’)  The best system to utilize 10 Marines to offload boxes is to line them up end-to-end, one in the truck and one on the other end to stack.  Marine 1 in the truck initiates the send by grabbing a box and passing it to Marine 2; the box moves down the line until it gets to the target, Marine 10, who stacks it.  Before any Marine hands another Marine a box they look to ensure that Marine’s hands are empty, verifying they can handle the box and it won’t be dropped.  Boxes move down the line until they are all offloaded and stacked.  If anyone slows down or gets backed up, each Marine will hold their box until the congestion is relieved.

In this analogy the Marine in the truck is the initiator/server and the Marine doing the stacking is the target/storage with each Marine in between being a switch.

When two FC devices initiate a link they follow the Link Initialization Protocol (LIP.)  During this process they agree on an FC frame size and exchange the available dedicated frame buffer spaces for the link.  A sender always keeps track of available buffers on the receiving side of the link.  The only real difference between this and my analogy is that each device (Marine) is typically able to handle more than one frame (box) at once.
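As a minimal sketch of the credit mechanism (a toy model of the concept, not real FC code), the sender spends one credit per frame and holds frames at zero; each R_RDY from the receiver returns a credit:

```python
class FCLinkHop:
    """Toy model of buffer-to-buffer credits on a single FC link."""

    def __init__(self, credits):
        self.credits = credits   # receiver buffers agreed at link init

    def send_frame(self):
        if self.credits == 0:
            return False         # no buffer on the far side: hold it
        self.credits -= 1        # one receiver buffer now in use
        return True

    def receive_r_rdy(self):
        self.credits += 1        # receiver freed a buffer

hop = FCLinkHop(credits=3)
print([hop.send_frame() for _ in range(4)])  # [True, True, True, False]
hop.receive_r_rdy()                          # a buffer frees downstream
print(hop.send_frame())                      # True: transmission resumes
```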

So if FC networks operate just fine without end-to-end congestion management, why do we need to implement a new mechanism in FCoE?  Well, therein lies the rub.  Do we need QCN?  The answer is really yes and no, and it will depend on design.  FCoE today provides the exact same flow control as FC using two standards defined within Data Center Bridging (DCB): Enhanced Transmission Selection (ETS) and Priority Flow Control (PFC.)  For more info on these see my DCB blog: http://www.definethecloud.net/?p=31.  Basically ETS provides a bandwidth guarantee without limiting, and PFC provides lossless delivery on an Ethernet network.

Why QCN:

The reason QCN was developed is the difference in size, scale, and design between FC and Ethernet networks.  Ethernet networks are usually large mesh or partial-mesh type designs with multiple switches.  FC designs fall into one of three major categories: collapsed core (single layer), core/edge (two layer) or, in rare cases for very large networks, edge-core-edge (three layer.)  This is because we typically have far fewer FC-connected devices than we do Ethernet (not every device needs consolidated storage/backup access.)

If we were to design our FCoE networks where every current Ethernet device supported FCoE and FCoE frames flowed end-to-end, QCN would be a benefit to ensure point congestion didn’t clog the entire network.  On the other hand, if we maintain similar size and design for FCoE networks as we do FC networks, there is no need for QCN.

Let’s look at some diagrams to better explain this:

Common Ethernet Design

Fibre Channel Design

In the diagrams above we see a couple of typical network designs.  The Ethernet diagram shows core at the top, aggregation in the middle, and edge on the bottom where servers would connect.  The Fibre Channel design shows a core at the top with an edge at the bottom; storage would attach to the core and servers would attach at the bottom.  In both diagrams I’ve also shown typical frame flow for each traffic type.  Within Ethernet, servers commonly communicate with one another as well as with network file systems, the WAN, etc.  In an FC network the frame flow is much more simplistic; typically only initiator-to-target (server-to-storage) communication occurs.  In this particular FC example there is little to no chance of a single frame flow causing a central network congestion point that could affect other flows, which is where end-to-end congestion management comes into play.

What does QCN do:

QCN moves congestion from the network center to the edge to avoid centralized congestion on DCB networks.  Let’s take a look at a centralized congestion example (FC only for simplicity):

In the above example two 2Gbps hosts are sending full-rate frame flows to two storage devices.  One of the storage devices is a 2Gbps device and can handle the full speed; the other is a 1Gbps device and is not able to handle the full speed.  If these rates are sustained, switch 3’s buffers will eventually fill and cause centralized congestion affecting frame flows to both switch 4 and switch 5.  This means that the full-rate-capable devices would be affected by the single slower device.  QCN is designed to detect this type of congestion and push it to the edge, therefore slowing the initiator on the bottom right and avoiding overall network congestion.

This example is obviously not a good design and is only used to illustrate the concept.  In fact in a properly designed FC network with multiple paths between end-points central congestion is easily avoidable.
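For a feel of what QCN is doing, here’s a heavily simplified Python sketch loosely modeled on 802.1Qau’s congestion point/reaction point roles (queue depths, gains and rates are made-up numbers): the congested switch samples its queue and signals the source, whose rate limiter backs off, moving the congestion to the edge:

```python
def cp_feedback(queue_depth, target=16):
    """Congestion point: positive feedback means 'slow down'."""
    return queue_depth - target

def rp_adjust(rate_gbps, feedback, gain=0.05, max_rate=2.0):
    """Reaction point at the edge: cut rate on congestion, else recover."""
    if feedback > 0:
        return rate_gbps * max(0.1, 1 - gain * feedback)
    return min(max_rate, rate_gbps + 0.1)

rate = 2.0  # the 2Gbps initiator from the example above
for depth in (10, 24, 30, 18, 12):  # sampled queue depth at switch 3
    rate = rp_adjust(rate, cp_feedback(depth))
    print(f"queue={depth} -> sender rate {rate:.2f}Gbps")
```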

When moving to FCoE, if the network is designed such that FCoE frames pass through the entire full-mesh network shown in the Common Ethernet design above, there would be greater chances of central congestion.  If the central switches were DCB-capable but not Fibre Channel Forwarders (FCF), QCN could play a part in pushing that congestion to the edge.

If on the other hand you design FCoE in a similar fashion to current FC networks, QCN will not be necessary.  Such a design incorporates FCoE into the existing LAN core, aggregation, edge design without clogging the LAN core with unneeded FCoE traffic.  Each server is dual-connected to the common Ethernet mesh, and redundantly connected to FCoE SAN A and B.  This design is extremely scalable and will provide more than enough ports for most FCoE implementations.

Summary:

QCN, like other congestion management tools before it such as FECN and BECN, has significant use cases.  As far as FCoE deployments go, QCN is definitely not a requirement and, depending on design, will provide no benefit for FCoE.  It’s important to remember that the DCB standards are there to enhance Ethernet as a whole, not just for FCoE.  FCoE utilizes ETS and PFC for lossless frame delivery and bandwidth control, but the FCoE standard is a separate entity from DCB.

Also remember that FCoE is an excellent tool for virtualization, which reduces physical server count.  This means that we will continue to require fewer and fewer FCoE ports overall, especially as 40Gbps and 100Gbps are adopted.  Scaling FCoE networks further than today’s FC networks will most likely not be a requirement.


The Difference Between ‘Foothold’ and ‘Lock-In’

There is always a lot of talk within IT marketing around vendor ‘lock-in’.  This is most commonly found within competitive marketing, i.e. ‘Product X from Company Y creates lock-in, causing you to purchase all future equipment from them.’  In some cases lock-in definitely exists; in other cases what you really have is better defined as ‘foothold.’  Foothold is an entirely different thing.

Any given IT vendor wants to sell as much product as possible to their customers and gain new customers as quickly as possible; that’s business.  One way to do this is to use one product offering as a way in the door (foothold) and sell additional products later on.  Another way is to sell a product that forces the sale of additional products.  There are other methods, including the ‘build a better mousetrap’ method, but these are the two I’ll discuss.

Foothold:

Foothold is like the beachhead at Normandy during WWII; it’s not necessarily easy to get, but once held it gives a strategic position from which to gain more territory.

Great examples of foothold products exist throughout IT.  My favorite example is NetApp’s NFS/CIFS storage, which did the file based storage job so well they were able to convert their customer’s block storage to NetApp in many cases.  There are currently two major examples of the use of foothold in IT, HP and Cisco.

HP is using its leader position in servers to begin seriously pursuing the network equipment market.  They’ve had ProCurve for some time but recently started pushing it hard, and acquired 3Com to significantly boost their networking capabilities (among other advantages.)  This is proper use of foothold and makes strategic sense; we’ll see how it pans out.

Cisco is using its dominant position in networking to attack the market transition to denser virtualization and cloud computing with its own server line.  From a strategic perspective this could be looked at either offensively or defensively.  Either Cisco is on the offense, attacking former strong vendor-partner territory to grow revenue, or Cisco is on the defense, having realized HP was leveraging its foothold in servers to take network market share.  In either event it makes a lot of strategic sense.  By placing servers in the data center they have foothold to sell more networking gear, and they also block HP’s traditional foothold.

From my perspective both are strong moves; to continue to grow revenue you eventually need to branch into adjacent markets.  You’ll hear people cry and whine about stretching too thin, trying to do too much, etc., but it’s a reality.  As a publicly traded company, a stagnant revenue stream is nearly as bad as a negative revenue stream.

If you look closely at it both companies are executing in very complementary adjacent markets.  Networks move the data in and out of HP’s core server business, so why not own them?  Servers (and Flip cameras for that matter) create the data Cisco networks move, so why not own them?

Lock-In:

You’ll typically hear more about vendor lock-in than you will actually experience it.  That’s not to say there isn’t plenty of it out there, but it usually gets more publicity than is warranted.

Lock-in is when a product you purchase and use forces you to buy another product/service from the same vendor, or replace the first.  To use my previous Cisco and HP example, both companies are using adjacent markets as foothold but neither locks you in.  For example, both HP and Cisco servers can be connected to any vendor’s switching, and their network systems interoperate as well.  Of course you may not get every feature when connecting to a 3rd-party device, but that’s part of foothold and the fact that they add proprietary value.

The best real example of lock-in is blades.  Don’t be fooled: every blade system on the market has inherent vendor lock-in.  Current blade architecture couldn’t provide the advantages it does without lock-in.  To give you an example, let’s say you decide to migrate to blades and you purchase 7 IBM blades and a chassis, 4 Cisco blades and a chassis, or 8 HP blades and a chassis.  You now have a chassis half full of blades.  When you need to expand by one server, who you gonna call (Ghostbusters can’t help)?  You’re obviously going to buy from the chassis vendor, because blades themselves don’t interoperate and you’ve got empty chassis slots.  That is definite lock-in, up to the max capacity of that chassis.

When you scale past the first blade system you’ll probably purchase another from the same vendor, because you know and understand its unique architecture; that’s not lock-in, that’s foothold.

Summary:

Lock-in happens, but foothold is more common.  When you hear a vendor, partner, etc. say product X will lock you in to vendor Y, make that person explain in detail what they mean.  Chances are you’re not getting locked in to anything.  If you are getting locked in, know the limits of that lock-in and make an intelligent decision on whether it’s worth the advantages that made you consider the product in the first place; they very well might be.


SMT, Matrix and Vblock: Architectures for Private Cloud

Cloud computing environments provide enhanced scalability and flexibility to IT organizations.  Many options exist for building cloud strategies: public, private, etc.  For many companies private cloud is an attractive option because it allows them to maintain full visibility and control of their IT systems.  Private clouds can also be further enhanced by merging private cloud systems with public cloud systems in a hybrid cloud.  This allows some systems to gain the economies of scale offered by public cloud while others are maintained internally.  Some great examples of hybrid strategies would be:

  • Utilizing private cloud for mission critical applications such as SAP while relying on public cloud for email systems, web hosting, etc.
  • Maintaining all systems internally during normal periods and relying on the cloud for peaks.  This is known as Cloud Bursting and is excellent for workloads that cycle throughout the day, week, month or year.
  • Utilizing private cloud for all systems and capacity while relying on cloud based Disaster Recovery (DR) solutions.

Many more options exist and any combination of options is possible.  If private cloud is part of the cloud strategy for a company there is a common set of building blocks required to design the computing environment.

Private Cloud Building Blocks

In the diagram above we see that the components build upon one another.  Starting at the bottom, we utilize consolidated hardware to minimize power, cooling and space as well as the underlying managed components.  At the second tier of the private cloud model we layer on virtualization to maximize utilization of the underlying hardware while providing logical separation for individual applications.

If we stop at this point we have what most of today’s data centers are using to some extent or moving to.  This is a virtualized data center.  Without the next two layers we do not have a cloud/utility computing model.  The next two layers provide the real operational flexibility and organizational benefits of a cloud model.

To move our virtualized data center to a cloud architecture we next layer on automation and monitoring.  This layer provides the management and reporting functionality for the underlying architecture.  It could include monitoring systems, troubleshooting tools, chargeback software, hardware provisioning components, etc.  Next we add a provisioning portal to allow the end-users or IT staff to provision new applications, decommission systems no longer in use, and add/remove capacity from a single tool.  Depending on the level of automation in place below, some things like capacity management may be handled without user/staff intervention.

The last piece of the diagram above is security.  While many private cloud discussions leave security out or minimize its importance, it is actually a key component of any cloud design.  When moving to private cloud, customers are typically building a new compute environment or totally redesigning an existing one.  This is the key time to design robust security in from end to end, because you’re not tied to previous mistakes (we all make them) or legacy design.  Security should be part of the initial discussion for each layer of the private cloud architecture and the solution as a whole.

Private cloud systems can be built with many different tools from various vendors.  Many of the software tools exist in both open source and licensed versions.  Additionally, several vendors offer an end-to-end stack upon which to design and build a private cloud system.  The remainder of this post will cover three of the leading private cloud offerings:

Scope: This post is an overview of three excellent solutions for private cloud.  It is not a pro/con discussion or a feature comparison.  I would personally position any of the three architectures for a given customer dependent on customer requirements, existing environment, cloud strategy, business objective and comfort level.  As always, please feel free to leave comments, concerns or corrections using the comment form at the bottom of the post.

Secure Multi-Tenancy (SMT):

Vendor positioning:  ‘This includes the industry’s first end-to-end secure multi-tenancy solution that helps transform IT silos into shared infrastructure.’


SMT is a pairing of VMware vSphere; Cisco Nexus, UCS, and MDS; and NetApp storage systems.  SMT has been jointly validated and tested by the three companies, and a Cisco Validated Design (CVD) exists as a reference architecture.  Additionally a joint support network exists for customers building or using SMT solutions.

Unlike the other two systems SMT is a reference architecture a customer can build internally or along with a trusted partner.  This provides one of the two unique benefits of this solution.

Unique Benefits:

  • Because SMT is a reference architecture it can be built in stages married to existing refresh and budget cycles.  Existing equipment can be reutilized or phased out as needed.
  • SMT is designed to provide end-to-end security for multiple tenants (customers, departments, or applications.)

HP Matrix:

Vendor positioning:  ‘The industry’s first integrated infrastructure platform that enables you to reduce capital costs and energy consumption and more efficiently utilize the talent of your server administration teams for business innovation rather than operations and maintenance.’


Matrix is an integration of HP blades, HP storage, HP networking and HP provisioning/management software.  HP has tested the interoperability of the proven components and software and integrated them into a single offering.

Unique benefits:

  • Of the three solutions Matrix is the only one that is a complete solution provided by a single vendor.
  • Matrix provides the greatest physical server scalability of any of the three solutions with architectural limits of thousands of servers.

Vblock:

Vendor positioning:  ‘The industry’s first completely integrated IT offering that combines best-in-class virtualization, networking, computing, storage, security, and management technologies with end-to-end vendor accountability.’


Vblocks are a combination of EMC software and storage, Cisco UCS, MDS and Nexus, and VMware virtualization.  Vblocks are complete infrastructure packages sold in one of three sizes based on the number of virtual machines.  Vblocks offer a thoroughly tested and jointly supported infrastructure with proven performance levels based on a maximum number of VMs.

Unique Benefits:

  • Vblocks offer a tightly integrated best-of-breed solution that is purchased as a single product.  This provides very predictable scalability costs when looked at from a C-level perspective (i.e. x dollars buys y scalability; when needs increase, x dollars will be required for the next block.)
  • Vblock is supported by a unique partnering between Cisco, EMC and VMware as well as their ecosystem of channel partners.  This provides robust front-end and back-end support for customers before, during and after install.

Summary:

Private cloud can provide a great deal of benefits when implemented properly, but like any major IT project the benefits are greatly reduced by mistakes and improper design.  Pre-designed and tested infrastructure solutions such as the ones above provide customers a proven platform on which they can build a private cloud.


10 Things to Know About Cisco UCS

Bob Olwig, VP of Corporate Business Development for World Wide Technology (www.wwt.com), asked me to provide 10 things to know about UCS for his blog http://bobolwig.wordpress.com.  See the post 10 Things to Know About Cisco UCS here: http://bobolwig.wordpress.com/2010/08/04/10-things-to-know-about-cisco-ucs/.
