FCoE Multi-hop: Do You Care?

There is a lot of discussion in the industry around FCoE’s current capabilities, and specifically around the ability to perform multi-hop transmission of FCoE frames and the standards required to do so.  A recent discussion between Brad Hedlund at Cisco and Ken Henault at HP (http://bit.ly/9Kj7zP) prompted me to write this post.  Ken proposes that FCoE is not quite ready and Brad argues that it is. 

When looking at this discussion remember that Cisco has had FCoE products shipping for about two years and has a robust product line of devices with FCoE support, including UCS, the Nexus 5000, Nexus 4000, and Nexus 2000, with more products on the road map for launch this year.  No other switching vendor has this level of current commitment to FCoE.  For any vendor with a less robust FCoE portfolio it makes little sense to drive FCoE sales and marketing at this point, so you will typically find articles and blogs like the one mentioned above.  The one quote from that blog that sticks out in my mind is:

“Solutions like HP’s upcoming FlexFabric can take advantage of FCoE to reduce complexity at the network edge, without requiring a major network upgrades or changes to the LAN and SAN before the standards are finalized.”

If you read between the lines here it would be easy to take this as 'FCoE isn't ready until we are.'  This is not unusual; if you take a minute to search through articles about FCoE over the last 2-3 years you'll find that Cisco has been a big endorser of the protocol throughout (because they actually had a product to sell), while other vendors have become less and less anti-FCoE as they announce their own FCoE products.

It's also important to note that Cisco isn't the only vendor out there embracing FCoE: NetApp has been shipping native FCoE storage controllers for some time, EMC has them road-mapped for the very near future, QLogic is shipping a second generation of Converged Network Adapters, and Emulex has fully embraced 10 Gigabit Ethernet as the way forward with its OneConnect adapter (10GE, iSCSI, and FCoE all in one card.)  Additionally, FCoE switching in front of native Fibre Channel storage is widely supported across the storage community.

Fibre Channel over Ethernet (FCoE) is defined in the INCITS T11 FC-BB-5 standard and requires the switches it traverses to support the IEEE Data Center Bridging (DCB) standards for proper traffic treatment on the network.  For more information on FCoE or DCB see my previous posts on the subjects (FCoE: http://www.definethecloud.net/?p=80, DCB: http://www.definethecloud.net/?p=31.)

DCB has four major components, and the one in question in the above article is Quantized Congestion Notification (QCN), which the article states is required for multi-hop FCoE.  QCN is basically a regurgitation of FECN and BECN from Frame Relay: it allows a switch to monitor its buffers and push congestion out toward the edge rather than clogging the core.  In the comments Brad correctly states that QCN is not required for FCoE.  The reason is that Fibre Channel operates today without any native version of QCN, so when placing it on Ethernet there is no need to add functionality that wasn't there to begin with; remember, Ethernet is just a new layer 1-2 under the native FC layers 2-4, and the FC secret sauce remains unmodified.  Also remember that not every standard defined by a standards body has to be implemented by every device; some are required, some are optional.  Logical SANs are a great example of an optional standard.

Rather than discuss what is or isn’t required for multi-hop FCoE I’d like to ask a more important question that we as engineers tend to forget: Do I care?  This question is key because it avoids having us argue the technical merits of something we may never actually need, or may not have a need for today.

Do we care?

First let's look at why we do multi-hop anything: to expand the port count of our network.  Take TCP/IP networks and the Internet for example: we require the ability to move packets across the globe through multiple routers (hops) in order to attach devices in all corners of the globe.

Now let's look at what we do with FC today: typically one or two hop networks (sometimes three) used to connect several hundred devices (occasionally, but rarely, more.)  It's actually quite common to find FC implementations with fewer than 100 attached ports.  This means that if you can hit the right port count without multiple hops you can remove complexity and decrease latency; in Storage Area Networks (SANs) we call this the collapsed core design.

The second thing to consider is a hypothetical question: if FCoE were permanently destined for single-hop access/edge-only deployments (it isn't), should that actually stop you from using it?  The answer here is an emphatic no; I would still highly recommend FCoE as an access/edge architecture even if it were destined to connect back to an FC SAN and Ethernet LAN for all eternity.  Let's jump to some diagrams to explain.  In the following diagrams I'm going to focus on Cisco architecture because, as stated above, Cisco is currently the only vendor with a full FCoE product portfolio.

 image

In the above diagram you can see a fairly flexible set of FCoE connectivity options.  The Nexus 5000 can be directly connected to servers, or to a Nexus 4000 in an IBM BladeCenter to pass FCoE.  It can also be connected to 10GE Nexus 2000s to increase its port density.

To use the Nexus 5000 + 2000 as an example, it's possible to create a single-hop (the 2000 isn't an L2 hop; it's an extension of the 5000) FCoE architecture of up to 384 ports with one point of switching management per fabric.  If you bring server virtualization into the picture and assume 384 servers with a very modest consolidation ratio of 10 virtual machines to 1 physical machine, that brings you to 3,840 servers connected to a single-hop SAN.  That is major scalability with minimal management, all without the need for multi-hop.  The diagram above doesn't include the Cisco UCS product portfolio, which architecturally supports up to 320 FCoE-connected servers/blades.
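
For anyone who wants to check the math, here is a minimal sketch (Python; the numbers are simply the illustrative figures from the paragraph above, not a sizing guide):

```python
# Illustrative only: numbers match the example above.
edge_ports_per_fabric = 384      # Nexus 5000 + 2000 single-hop port count
vms_per_physical_host = 10       # modest consolidation ratio

physical_servers = edge_ports_per_fabric                     # one server per edge port
virtual_machines = physical_servers * vms_per_physical_host

print(f"{physical_servers} physical servers -> {virtual_machines} VMs on a single-hop SAN")
# 384 physical servers -> 3840 VMs on a single-hop SAN
```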

The next thing I’ve asked you to think about is whether or not you should implement FCoE in a hypothetical world where FCoE stays an access/edge architecture forever.  The answer would be yes.  In the following diagrams I outline the benefits of FCoE as an edge only architecture.

image

The first benefit is reducing the networks that are purchased, managed, powered, and cooled from three to one (two FC fabrics and one Ethernet network down to one FCoE network.)  Even just at the access layer this is a large reduction in overhead, and it reduces the number of refresh points as I/O demands increase.

image

The second benefit is the overall infrastructure reduction at the access layer.  Taking a typical VMware server as an example, we reduce 6x 1GE ports, 2x 4G FC ports, and the eight cables required for them down to 2x 10GE ports carrying FCoE.  This increases the total bandwidth available while greatly reducing infrastructure.  Don't forget the four top-of-rack switches (2x FC, 2x GE) reduced to two FCoE switches.
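
A quick back-of-the-envelope comparison (a Python sketch using the per-server counts from the example above, with nominal link speeds) shows the scale of the consolidation:

```python
# Per-server adapter counts from the example above: name -> (count, nominal Gb each).
before_ports = {"1GE NIC": (6, 1), "4G FC HBA": (2, 4)}
after_ports  = {"10GE FCoE CNA": (2, 10)}

def totals(ports):
    cables = sum(count for count, _ in ports.values())
    bandwidth_gb = sum(count * gb for count, gb in ports.values())
    return cables, bandwidth_gb

print("Before: %d cables, %d Gb aggregate" % totals(before_ports))  # 8 cables, 14 Gb
print("After:  %d cables, %d Gb aggregate" % totals(after_ports))   # 2 cables, 20 Gb
```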

Since FCoE is fully compatible with both FC and pre-DCB Ethernet, this requires no rip-and-replace of current infrastructure.  FCoE is instead used to build out new application environments or expand existing ones while minimizing infrastructure and complexity.

What if I need a larger FCoE environment?

If you require a larger environment than is currently supported, extending your SAN is quite possible without multi-hop FCoE: FCoE can be extended using existing FC infrastructure.  Remember, customers that require an FCoE infrastructure this large already have an FC infrastructure to work with.

image 

What if I need to extend my SAN between data centers?

FCoE SAN extension is handled in exactly the same way as FC SAN extension: CWDM, DWDM, dark fiber, or FCIP.  Remember, we're still moving Fibre Channel frames.

image

Summary:

FCoE multi-hop is not an argument that needs to be had for most current environments.  FCoE is a supplemental technology to current Fibre Channel implementations.  Multi-hop FCoE products will be available by the end of CY2010, allowing 2+ tier FCoE networks with multiple switches in the path, but there is no need to wait for them to begin deploying FCoE.  The benefits of an FCoE deployment at the access layer alone are significant, and many environments will be able to scale to full FCoE roll-outs without ever going multi-hop.


Welcome to the New Site!

Today Define the Cloud officially transfers to its new hosted site.  It was time for an upgrade from the wordpress.com hosted blog.  By hosting privately I'll be able to provide additional site content and functionality as things progress.

If you find any issues, broken links, or links pointing back to wordpress.com, please use the contact info on the About the Author page to let me know.  Don't forget to sign up for email and/or RSS updates; we won't be moving again anytime soon.


Why Cisco UCS is my ‘A-Game’ Server Architecture

A-Game:

When I discuss my A-Game I mean my go-to hardware vendor for a specific data center component.  For example I have an A-Game platform for:

  • Storage
  • SAN
  • LAN (access Layer LAN specifically, you don’t want me near your aggregation, core or WAN)
  • Servers and Blades (traditionally this has been one vendor for both)

As this post is in regards to my server A-Game I’ll leave the rest undefined for now and may blog about them later.

Over the last four years I've worked in some capacity or another as an independent customer advisor or consultant with several vendor options to choose from.  This has been either with a VAR or with a strategic consulting firm such as www.fireflycom.net.  In both cases there is typically a company lean one way or another, but my role has given me the flexibility to choose the right fit for the customer rather than for my company or the vendors, which is what I personally strive to do.  I'm not willing to stake my own integrity on what a given company wants to push today.  I've written about my thoughts on objectivity in a previous blog (http://www.definethecloud.net/?p=112.)

Another rule in regards to my A-Game is that it's not a rule; it's a launching point.  I start with a specific hardware set in mind in order to visualize the customer need and analyze the best way to meet that need.  If I hit a point of contention that negates the use of my A-Game I'll fluidly adapt my thinking and proposed architecture to one that better fits the customer.  These points of contention may be technical, political, or business related:

  • Technical: My A-Game doesn't fit the customer's requirements due to some technical factor (support, features, etc.)
  • Political: My A-Game doesn’t fit the customer because they don’t want Vendor X (previous bad experience, hype, understanding, etc.)
  • Business: My A-Game isn’t on an approved vendor list, or something similar.

If I hit one of these roadblocks I'll shift my vendor strategy for the particular engagement without a second thought.  The exception is when one of these roadblocks isn't actually a roadblock and my A-Game definitely provides the best fit for the customer; in that case I'll work with the customer to analyze the actual requirements and attempt to find a way around the roadblock.

Basically, my A-Game is a product or product line that I've personally tested, worked with, and trust above the others, and it is my starting point for any consultative engagement.

A quick read through my blog or a jump through my links will show that I work closely with Cisco products, and it would be easy to assume that I am therefore inherently skewed towards Cisco.  In reality the opposite is true: over the last few years I've had the privilege of selecting my jobs and roles based on the products I want to work with.

My sordid UCS history:

As anyone who's worked with me can attest, I'm not one to pull punches, feign friendliness, or accept what you try to sell me based on a flashy slide deck or eloquent rhetoric.  If you're presenting to me, don't expect me to swallow anything without proof, don't expect easy questions, and don't show up if you can't put the hardware in my hands to cash the checks your slides write.  When I'm presenting to you, I expect and encourage the same.

Prior to my exposure to UCS I worked with both IBM and HP servers and blades.  I am an IBM Certified Blade Expert (although that certification is dated at this point.)  IBM was in fact my A-Game server and blade vendor.  This had a lot to do with the technology of the IBM systems as well as the overall product portfolio IBM brought with it.  That being said, I'd also be willing to concede that HP blades have since moved ahead of IBM's in technology and innovation, although IBM's MAX5 is one way IBM is looking to change that.

When I first heard about Cisco’s launch into the server market I thought, and hoped, it was a joke.  I expected some Frankenstein of a product where I’d place server blades in Nexus or Catalyst chassis.  At the time I was working heavily with the Cisco Nexus product line primarily 5000, 2000, and 1000v.  I was very impressed with these products, the innovation involved, and the overall benefit they’d bring to the customer.  All the love in the world for the Nexus line couldn’t overcome my feeling that there was no way Cisco could successfully move into servers.

Early in 2009 my resume was submitted, among several others from my company, to Learning at Cisco and the business unit in charge of UCS.  This was part of an application process for learning partners to be invited to the initial external Train The Trainer (TTT) and to participate in training Cisco, partners, and customers worldwide on UCS.  Two other engineer/trainers (Dave Alexander and Fabricio Grimaldi) and I were selected from my company to attend.  The first interesting thing about the process was that the three of us were selected over CCIEs, double CCIEs, and more experienced instructors from our company based on our server backgrounds.  It seemed Cisco really was looking to push servers, not some network adaptation.

During the TTT I remained very skeptical.  The product looked interesting but not ‘game-changing.’  The user interfaces were lacking and definitely showed their Alpha and Beta colors.  Hardware didn’t always behave as expected and the real business/technological benefits of the product didn’t shine through.  That being said remember that at this point the product was months away from launch and this was a very Beta version of hardware/software we were working with.  Regardless of the underlying reasons I walked away from the TTT feeling fully underwhelmed.

I spent the time on my flight back to the East Coast from San Jose looking through my notes and thinking about the system and components.  It definitely had some interesting concepts but I didn’t feel it was a platform I would stake my name to at this point.

Over the next couple of months Fabricio Grimaldi and I assisted Dave Alexander (http://theunifiedcomputingblog.com) in developing the UCS Implementation certification course.  Through this process I spent a lot of time digging into the underlying architecture, relating it back to my server admin days, and whiteboarding the concepts and connections in my home office.  Additionally, I got more and more time on the equipment to 'kick the tires.'  During this period Dave, Fabricio, and I began instructing an internal Cisco course known as UCS Bootcamp.  The course was designed for Cisco engineers in both pre-sales and post-sales roles and focused specifically on the technology as a product deep dive.

It was over these months having discussions on the product, wrapping my head around the technology, and developing training around the components that the lock cylinders in my brain started to click into place and finally the key turned: UCS changes the game for server architecture, the skeptic had become a convert.

UCS the game changer:

The term 'game changer' gets thrown around all willy-nilly in this industry.  Every minor advancement is touted by its owner as a 'game changer.'  In reality game changers are few and far between.  In order to qualify you must actually shift the status quo, not just improve upon it.  To use vacuums as an example: if your vacuum sucks harder, it just sucks harder; it doesn't change the game.  A Dyson may vacuum better than anyone else's, but the Roomba (http://www.irobot.com/uk/home_robots.cfm) is the one that changed the game.  With a Dyson I still have to push the damn thing around the living room; with a Roomba I watch it go.

In order to understand why UCS changes the game rather than improving upon it, you first need to define UCS:

UCS is NOT a blade system; it is a server architecture.

Cisco's Unified Computing System (UCS) is not all about blades; it is about rack-mount servers, blade servers, and management being used as a flexible pool of computing resources.  Because of this it has been likened to an x86-64 based mainframe system.

UCS takes a different approach from the original blade system designs.  It's not a solution for data center point problems (power, cooling, management, space, cabling) in isolation; it's a redefinition of the way we do computing.

Instead of asking 'How can I improve upon current architectures?'

Cisco/Nuova asked

'What's the purpose of the server, and what's the best way to accomplish that goal?'

Many of the ideas UCS utilizes have been tried and implemented in other products before: Unified I/O, single point of management, modular scalability, etc., but never all in one cohesive design.

There are two major features of UCS that I call ‘the cake’ and three more that are really icing.  The two cake features are the reason UCS is my A-Game and the others just further separate it.

  • Unified Management
  • Workload Portability

Unified Management:

Blade architectures are traditionally built with space savings as a primary concern.  In order to do this a blade chassis is built with shared LAN, SAN, power, and cooling infrastructure and an onboard management system to control server hardware access, fan speeds, power levels, etc.  M. Sean McGee describes this much better than I could hope to in his article The "Mini-Rack" Approach to Blade Design (http://bit.ly/bYJVJM.)  This traditional design saves space and can also save on overall power, cooling, and cabling, but it causes pain points in management, among other considerations.

UCS was built from the ground up with a different approach, and Cisco had the advantage of zero legacy server investment, which allowed them to execute on it.  The UCS approach is:

  • Top-of-Rack networking should be Top-of-Rack, not repeated in each blade chassis.
  • Management should encompass the entire architecture, not just a single chassis.
  • Blades are only 40% of the data center server picture; rack mounts should not be excluded.

The UCS Approach

image

The key difference here is that all management of the LAN, SAN, server hardware, and the chassis itself is pulled into the access layer and performed on the UCS Fabric Interconnect, which provides all of the switching and management functionality for the system.  The system was built from the ground up with this in mind, and as such it is designed into each hardware component.  Other systems that provide a single point of management do so by layering on additional hardware and software components in order to manage the underlying component managers.  Additionally, those other systems only manage blade enclosures, while UCS is designed to manage both blades and traditional rack mounts from one point.  This functionality will be available in firmware by the end of CY10.

To put this in perspective, Cisco UCS provides a rapid, repeatable physical server deployment model very similar to the virtual server deployment model VMware provides.  Through the use of granular Role Based Access Control (RBAC), UCS ensures that organizational changes are not required, while at the same time providing the flexibility to streamline people and process if desired.

Workload Portability:

Workload portability has significant benefits within the data center; the concept itself is usually described as 'statelessness.'  If you're familiar with VMware, this is the same flexibility VMware provides for virtual machines, i.e. there is no tie to the underlying hardware.  One of the key benefits of UCS is the ability to apply this type of statelessness at the hardware level.  This removes the tie of the server or workload to the blade or slot it resides in, and it provides major flexibility for maintenance and repair cycles, as well as for deployment times for new or expanding applications.

Within UCS all management is performed on the Fabric Interconnect through the UCS Manager GUI or CLI.  This includes any network configuration for blades, chassis, or rack mounts, and all server configuration including firmware, BIOS, NIC/HBA settings, and boot order, among other things.  The actual blade is configured through an object called a 'service profile.'  This profile defines the server on the network as well as the way in which the server hardware operates (BIOS/firmware, etc.)

All of the settings contained within a service profile are traditionally configured, managed, and stored in hardware on a server.  Because these are now defined in a configuration file, the underlying hardware tie is stripped away and a server workload can be quickly moved from one physical blade to another without requiring changes on the network or the storage arrays.  This decreases maintenance windows and speeds roll-outs.
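
As a conceptual sketch only (the class and attribute names below are illustrative and not the actual UCS Manager object model), you can think of a service profile as a portable identity record that gets associated with whichever blade should run the workload:

```python
from dataclasses import dataclass

@dataclass
class ServiceProfile:
    """Illustrative stand-in for a service profile: the server's identity
    lives in the profile object, not in the blade hardware."""
    name: str
    nic_macs: list        # identity values normally burned into adapters
    hba_wwpns: list
    boot_order: list      # e.g. ["san", "lan"]
    firmware_policy: str = "default"
    associated_blade: str = "unassociated"

    def associate(self, blade_slot: str) -> None:
        # Applying the profile programs MACs, WWPNs, BIOS/firmware policy
        # and boot order onto whichever blade is targeted.
        self.associated_blade = blade_slot

web01 = ServiceProfile(
    name="web-01",
    nic_macs=["00:25:b5:00:00:01"],           # illustrative values only
    hba_wwpns=["20:00:00:25:b5:00:00:01"],
    boot_order=["san"],
)

web01.associate("chassis-1/blade-3")
# Blade fails or needs maintenance: re-associate the same identity elsewhere.
# Nothing changes on the LAN switches, SAN zoning, or storage masking,
# because they only ever saw the profile's MACs and WWPNs.
web01.associate("chassis-2/blade-5")
print(web01.associated_blade)
```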

Within UCS, service profiles can be created using templates and pools, which is unique to UCS.  This further increases the benefits of service profiles and decreases the risk inherent in multiple configuration points and case-by-case deployment models.

UCS Profiles and Templates

image

These two features and their real-world applications and value are what place UCS in my A-Game slot.  They provide benefits to ANY server deployment model, and they are unique to UCS.  While subcomponents exist within other vendors' products, they are not:

  • Designed into the hardware
  • Fully integrated without the need for additional hardware, software, and licensing
  • As robust

Icing on the cake:

  • Dual-socket server memory scalability and flexibility (Cisco Extended Memory Technology)
  • Integration with VMware and advanced networking for virtual switching
  • Unified fabric (I/O consolidation)

Each of these features also offers real-world benefits, but the real heart of UCS is the unified management and server statelessness.  You can find more information on these other features through various blogs and Cisco documentation.

When is it time for my B-Game?:

By now you should have an understanding as to why I chose UCS as my A-Game (not to say you necessarily agree, but that you understand my approach.)  So what are the factors that move me towards my B-Game?  I will list three considerations and the qualifying question that would finalize a decision to position a B-Game system:

  • Infiniband: If the customer is using Infiniband for networking, UCS does not currently support it.  I would first assess whether there was an actual requirement for Infiniband or whether it was just the best option at the time of the last refresh.  If Infiniband is required I would move to another solution.
  • Non-Intel processors: A requirement for non-Intel processors would steer me towards another vendor, as UCS does not currently support them.  As above, I would first verify whether non-Intel was a requirement or a choice.
  • Requirement for chassis-based storage: If a customer has a requirement for chassis-based storage there is no current Cisco offering for this within UCS.  This is, however, very much a corner case and only a configuration I would typically recommend for single-chassis deployments with little need to scale.  In-chassis storage becomes a bottleneck rather than a benefit in multi-chassis configurations.

While there are other reasons I may have to look at another product for a given engagement, they are typically few and far between.  UCS has the right combination of entry point and scalability to fit the great majority of server deployments.  Additionally, as a newer architecture there is no concern with the architectural refresh cycle other vendors face.  As other blade solutions continue to age there will be increased risk to the customer in regards to forward compatibility.

Summary:

UCS is not the only server or blade system on the market, but it is the only complete server architecture.  Call it consolidated, unified, virtualized, whatever, but there isn't another platform that combines rack mounts and blades under a single architecture with a single management window and tools for rapid deployment.  The current offering is appropriate for the great majority of deployments and will continue to get better.

If you're considering a server refresh or a new deployment, it would be a mistake not to take a good look at the UCS architecture.  Even if it's not what you choose, it may give you some ideas as to how you want to move forward, or features to ask your chosen vendor for.

Even if you never buy a UCS server you can still thank Cisco for launching UCS.  The lower pricing you're getting today and the features being put in place across other vendors' product lines are being driven by a new server player in the market and the innovation they launched with.

Comments, concerns, complaints always appreciated!


Learning from FaceBook’s failures

In the interest of honesty I’ll start this post by saying I’m not a fan of FaceBook and never have been.  This is based on two major things:

  • If you and I don’t keep in touch through email or phone there is a reason for that.  I have no desire to waste time being FaceBook poked by you, participating in your mafia war, or hearing about your interests and daily life.
  • Constant issues with questionable or downright unacceptable privacy practices and user data/credential theft.

This post, however, is not that broad in scope; it's all about what we can learn from FaceBook's privacy failures and how we can apply that to the information and services we decide to place on the web and in the cloud.  I'm also not stating that FaceBook is in any way a cloud provider; the closest cloud definition FaceBook could be given is FaaS (Failure as a Service.)  That being said, FaceBook is a web based service providing an online tool for things you used to do offline (remember the family address book and the yearly holiday card?)

Lately there has been a lot of buzz around FaceBook's latest major privacy infringement: pushing/selling your data to 3rd-party services in the interest of 'enhancing your user experience.'  The main issue with what FaceBook has done is not the addition of services which may enhance your experience, or even the privacy sacrificed to get those enhancements; it's the way they pushed this using an 'opt-out' model rather than an 'opt-in' model.

  • Opt-in: A service or add-on that you must consciously choose to accept in order to gain its features, for example: 'Would you like to turn on enhanced personalization features for our service? (yes/no)'
  • Opt-out: A service or add-on that becomes enabled automatically and may or may not inform you.

If the advanced personalization features of FaceBook were actually a benefit to the end user, then opt-in would have been the way to push them.  FaceBook would have provided you a pop-up window detailing the benefits of the new service and the way in which it was done, and you would have happily accepted.  Because the new features are really just a pretty face on a new way for FaceBook to profit from the information you store in your profile, they chose an opt-out model and obscured the ability to disable the feature behind a complex, undocumented privacy-setting hierarchy that requires a PhD to navigate (the complexity of FaceBook's privacy policy and options system has been well documented in several other posts; if you have a good link, post it in the comments.)

Since this announcement several IT professionals, myself included, have publicly deleted their accounts to spread awareness.  The hope is that the awareness makes it to the average end user who has no clue about the privacy dangers.  From my perspective it's even more important that this information reach children and teens, and that they learn the issues with too much public data.  Several young people will have a rude awakening when they sit across the desk from a manager during an interview and she/he turns the monitor around to show the job candidate a series of highly unprofessional blogs, pictures, videos, etc. from FaceBook and other sites that are the reason the candidate won't be getting the job.  As a side note, marking your profile 'private' or deleting it won't be of any use; FaceBook's privacy settings won't help, and any information that touches the web can be retrieved in some way regardless of deletion (http://www.archive.org/index.php for instance.)

So what’s this got to do with cloud?

FaceBook is just one example of the privacy and security concerns with placing data/information in web based services or moving services to the cloud.  Another great example would be Gmail.  When checking your Gmail through a web browser you're presented with advertisements targeted at you based on email content.  I'm actually a fan of this on the surface: I get non-intrusive, text based ads that are typically somewhat relevant to me, and they pay for the free service I'm using.  Now if Google took that one step further and sold keyword lists from my email history to advertisers, that would be a different story (I'm not saying they do or don't; if I were aware that they did I would close my account publicly as well.)  The same could be applied to cloud based business services such as SalesForce.com; if they started cross-referencing your business data with that of other hosted companies and selling it, that would be a major concern (again, not saying they do or don't.)

As you decide to use web based services, cloud based or not, for business and personal purposes, you need to carefully assess how the data is encrypted, secured, backed up, and used.  You also need to be very aware of changes to the privacy policies and End User License Agreements (EULAs.)  This is no small task, as these policies are typically lengthy and change frequently.  In every case remember that skepticism is your best tool.  If I walked up to you on the street and told you that for just $100.00 I could teach you how to be a millionaire, you'd laugh in my face; so why trust a company that says it can give you the world for $0.05 per gigabyte?

Summary:

This is not intended as an anti-cloud rant; if you look around my blog you'll see that I'm a definite endorser of cloud architectures in all shapes and forms.  The point here is that you need to carefully assess both what you move to the cloud and where you move it.  Throughout the history of the data center we as an industry have had a tendency to make it work first and worry about security and privacy later.  Fantastic security engineers and researchers are working hard to change this behavior; help them out.  There is a saying in carpentry that you should always 'measure twice, cut once'; apply the same to data center and cloud migration strategies.


The Cloud Storage Argument

The argument over the right type of storage for data center applications is an ongoing battle.  It gets amplified when discussing cloud architectures, both private and public.  Part of the reason for this disparity in thinking is that there is no 'one size fits all' solution.  The other part of the problem is that there may not be a current right solution at all.

When we discuss modern enterprise data center storage options there are typically five major choices:

  • Fibre Channel (FC)
  • Fibre Channel over Ethernet (FCoE)
  • Internet Small Computer System Interface (iSCSI)
  • Network File System (NFS)
  • Direct Attached Storage (DAS)

In a Windows server environment these will typically be coupled with the Common Internet File System (CIFS) for file sharing.  Behind these protocols there are a series of storage arrays and disk types that can be used to meet the application's I/O requirements.

As people move from traditional server architectures to virtualized servers, and from static physical silos to cloud based architectures, they will typically move away from DAS to one of the other protocols listed above to gain the advantages, features, and savings associated with shared storage.  For the purposes of this discussion we will focus on these four: FC, FCoE, iSCSI, and NFS.

The issue then becomes which storage protocol to use to transport your data from the server to the disk.  I've discussed the protocol differences in a previous post (http://www.definethecloud.net/?p=43) so I won't go into the details here.  Depending on who you're talking to, it's not uncommon to find extremely passionate opinions.  There are quite a few consultants and engineers that are hard-coded to one protocol or another.  That being said, most end users just want something that works, performs adequately, and isn't a headache to manage.

Most environments currently run a combination of these protocols: plenty of FC data centers rely on DAS to boot the operating system and NFS/CIFS for file sharing.  The same can be said for iSCSI.  With current options, a combination of these protocols is probably always going to be best; iSCSI, FCoE, and NFS/CIFS can be used side by side to provide the right performance at the right price on an application-by-application basis.

The one definite fact in all of the opinions is that running separate parallel networks, as we do today with FC and Ethernet, is not the way to move forward; it adds cost, complexity, management, power, cooling, and infrastructure that isn't needed.  Consolidating protocols down to one wire is key to the flexibility and cost savings promised by end-to-end virtualization and cloud architectures.  If that's the case, which wire do we choose, and which protocol rides directly on top to transport the rest?

10 Gigabit Ethernet is currently the industry's push for a single wire, and with good reason:

  • It currently has enough bandwidth/throughput to do it (10 gigabits using 64b/66b encoding, as opposed to FC/Infiniband, which currently use 8b/10b with 20% overhead); a quick efficiency calculation follows this list.
  • It's scaling fast: 40GE and 100GE are well on their way to standardization (as opposed to 16G and 32G FC).
  • Everyone already knows and uses it, yes that includes you.
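
On the encoding point above, a quick calculation (approximate, ignoring framing and protocol overhead) shows why the 64b/66b line code matters:

```python
# Line rates in Gbaud and encoding efficiency for each technology.
links = {
    "10GE (64b/66b)": (10.3125, 64 / 66),
    "8G FC (8b/10b)": (8.5,     8 / 10),
}

for name, (line_rate_gbaud, efficiency) in links.items():
    usable = line_rate_gbaud * efficiency
    print(f"{name}: ~{usable:.1f} Gb/s usable ({efficiency:.0%} efficient)")
# 10GE (64b/66b): ~10.0 Gb/s usable (97% efficient)
# 8G FC (8b/10b): ~6.8 Gb/s usable (80% efficient)
```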

For the sake of argument let's assume we all agree on 10GE as the right wire/protocol to carry all of our traffic; what do we layer on top?  FCoE, iSCSI, NFS, something else?  Well, that is a tough question.  The first part of the answer is that you don't have to decide, which is very important because none of these protocols is mutually exclusive.  The second part of the answer is that maybe none of these is the end-all-be-all long-term solution.  Each current protocol has benefits and drawbacks, so let's take a quick look:

  • iSCSI: A block-level protocol carrying SCSI over IP.  It works with standard Ethernet but can have performance issues on congested networks, and it also incurs IP protocol overhead.  iSCSI is great on standard Ethernet networks until congestion occurs; once the network becomes fully utilized, iSCSI performance will tend to drop.
  • FCoE: A block-level protocol which maintains Fibre Channel reliability and security while using underlying Ethernet.  It requires 10GE or above and DCB-capable switches (http://www.definethecloud.net/?p=31.)  FCoE is currently well proven and reliable at the access layer and a fantastic option there, but no current products exist to move it further up into the network.  Products are on the road map to push FCoE deeper into the network, but that may not necessarily be the best way forward.
  • NFS: A file-level protocol which runs on top of UDP or TCP and IP.

And a quick look at comparative performance:

Protocol Performance

image

While the above performance comparison is subjective, and network tuning and specific equipment will play a big role, the general idea holds sound.

One of the biggest factors that needs to be considered when choosing among these protocols is block vs. file.  Some applications require direct block access to disk; many databases fall into this category.  Just as importantly, if you want to boot an operating system from disk, a block-level protocol (iSCSI, FCoE) is required.  This means that for most diskless configurations you'll need to make a choice between FCoE and iSCSI (still within the assumption of consolidating on 10GE.)  Diskless configurations have major benefits in large-scale deployments, including power, cooling, administration, and flexibility, so you should at least be considering them.

If you've chosen a diskless configuration and settled on iSCSI or FCoE for your boot disks, you still need to figure out what to do about file shares.  CIFS or NFS is your next decision: CIFS is typically the choice for Windows, and NFS for Linux/UNIX environments.  Now you've wound up with 2-3 storage protocols, and you're stacking those alongside the rest of your typical LAN data.

Now, to look at management, step back and consider block data as a whole.  If you're using enterprise-class storage you've got several steps of management to configure the disks in that array.  It varies by vendor but typically looks something like the following (a small sketch after the list shows how these touch points add up):

  1. Configure the RAID for groups of disks
  2. Pool multiple RAID groups
  3. Logically sub divide the pool
  4. Assign the logical disks to the initiators/servers
  5. Configure required network security (FC zoning/ IP security/ACL, etc)
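
Here is that workflow as a hedged sketch; the function names are hypothetical placeholders, not any particular array's API:

```python
# Hypothetical provisioning calls: placeholders only, not a real array API.
def create_raid_group(disks, level):   return {"level": level, "disks": disks}
def pool_from(raid_groups):            return {"raid_groups": raid_groups}
def carve_lun(pool, size_gb):          return {"size_gb": size_gb}
def mask_lun(lun, initiator_wwpn):     pass   # step 4: present the LUN to the host
def zone(initiator_wwpn, target_wwpn): pass   # step 5: fabric access control

# Steps 1-2 happen once per pool of disks...
pool = pool_from([create_raid_group(disks=list(range(8)), level="RAID5")])

# ...but steps 3-5 repeat for every server or tenant, which is where the
# moves/adds/changes burden described below comes from.
for host in range(100):
    lun = carve_lun(pool, size_gb=50)
    mask_lun(lun, initiator_wwpn=f"initiator-{host}")
    zone(initiator_wwpn=f"initiator-{host}", target_wwpn="array-port-0")
```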

While this is easy stuff for storage and SAN administrators, it's time consuming, especially when you start talking about cloud infrastructures with lots and lots of moves, adds, and changes.  It becomes far too cumbersome to scale into petabytes with hundreds or thousands of customers.  NFS has more streamlined management, but it can't be used to boot an OS.  This makes for extremely tough decisions when looking to scale into large virtualized data center architectures or cloud infrastructures.

There is a current option that allows you to consolidate on 10GE, reduce storage protocols, and still get diskless servers.  It's definitely not the solution for every use case (there isn't one), and it's only a great option because there aren't a whole lot of other great options.

In a fully virtualized environment NFS is a great low-management-overhead protocol for virtual machine disks.  Because it can't boot the server, we need another way to get the operating system into server memory.  That's where PXE boot comes in.  The Preboot eXecution Environment (PXE) is a network OS boot mechanism that works well for small operating systems, typically terminal clients or Linux images.  It allows a single instance of the operating system to be stored on a PXE server attached to the network, and a diskless server to retrieve that OS at boot time.  Because some virtualization operating systems (hypervisors) are lightweight, they are great candidates for PXE boot.  This allows the architecture below.

PXE/NFS 100% Virtualized Environment

image

Summary:

While there are several options for data center storage, none of them solves every need.  Current options increase in complexity and management overhead as the scale of the implementation increases.  Looking to the future we need to be searching for better ways to handle storage.  Maybe block-based storage has run its course, maybe SCSI has run its course; either way, we need more scalable storage solutions available to the enterprise in order to meet the growing needs of the data center while maintaining manageability and flexibility.  New deployments should take all current options into account and never write off the advantages of using more than one, or all of them where they fit.


FCoE Initialization Protocol (FIP) Deep Dive

In an attempt to clarify my future posts I will begin categorizing a bit.  The following post will be part of a Technical Deep Dive series.

Fibre Channel over Ethernet (FCoE) is a protocol designed to move native Fibre Channel over 10 Gigabit (and faster) Ethernet links; I've described the protocol in a previous post (http://www.definethecloud.net/?p=80.)  In order for FCoE to work we need a mechanism to carry the base Fibre Channel port/device login mechanisms over Ethernet.  These are the processes a port uses to log in and obtain a routable Fibre Channel address.  Let's start with some background and definitions:

DCB – Data Center Bridging
FC – Native Fibre Channel protocol
FCF – Fibre Channel Forwarder (an Ethernet switch capable of encapsulating/de-encapsulating FCoE frames and providing some or all FC services)
FCID – Fibre Channel ID (24-bit routable address)
FCoE – Fibre Channel over Ethernet
FC-MAP – A 24-bit value identifying an individual fabric
FIP – FCoE Initialization Protocol
FLOGI – FC fabric login
FPMA – Fabric Provided MAC Address
PLOGI – FC port login
PRLI – Process login
SAN – Storage Area Network (switching infrastructure)
SCSI – Small Computer Systems Interface
 
Now for the background; you'll never grasp FIP properly if you don't first get the fundamentals of FC:
 
N_Port Initialization
image

 

When a node comes online, its port is considered an N_Port.  When an N_Port connects to the SAN it will connect to a switch port defined as a Fabric Port (F_Port) (this assumes you're using a switched fabric.)  All N_Ports operate the same way when they are brought online:

  1. FLOGI – Used to obtain a routable FCID for use in FC frame exchange.  The switch will provide the FCID during a FLOGI exchange.
  2. PLOGI – Used to register the N_Port with the FC name server

At this point a target's (disk or storage array) job is done; it can now sit and wait for requests.  An initiator (server), on the other hand, needs to perform a few more tasks to discover available targets:

  1. Query – Request available targets from the FC name server, zoning will dictate which targets are available.
  2. PLOGI – A 2nd port Login, this time into the target port.
  3. PRLI – Process login to exchange supported upper layer protocols (ULP) typically SCSI-3.

Once this process has been completed the initiator can exchange frames with the target, i.e. the server can write to disk.
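
A minimal sketch of that ordering (Python, purely illustrative of the sequence rather than the frame formats):

```python
# Order of operations for FC login, as described above.
COMMON = [
    "FLOGI  - fabric login; the switch assigns a routable FCID",
    "PLOGI  - register with the FC name server",
]
INITIATOR_EXTRA = [
    "Query  - ask the name server for targets (zoning filters what is visible)",
    "PLOGI  - a second port login, this time into the target port",
    "PRLI   - process login to agree on the ULP, typically SCSI-3",
]

def login_steps(role: str):
    """Targets stop after the common steps; initiators keep going."""
    return COMMON + (INITIATOR_EXTRA if role == "initiator" else [])

for role in ("target", "initiator"):
    print(f"{role}:")
    for step in login_steps(role):
        print("  ", step)
```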

FIP:

The reason the FC login process is key to understanding FIP is that this is the process FIP handles for FCoE networks.  FIP allows an Ethernet-attached FC node (Enode) to discover existing FCFs and supports the FC login procedure over 10+ GE networks.  Rather than just providing an FCID, FIP provides an FPMA, which is a MAC address comprised of two parts: the FC-MAP and the FCID.

48-bit FPMA (MAC Address)

image

FIP

image

So FIP provides an Ethernet MAC address, used by FCoE frames to traverse the Ethernet network, which contains the FCID required for routing on the FC network.  FIP also passes the query and query response from the FC name server.  FIP uses a separate Ethertype from FCoE, and its frames are standard Ethernet size (1518-byte 802.1q frames), whereas FCoE frames are 2242-byte jumbo frames.
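
To make the FPMA construction concrete, here is a small worked example (Python; the FC-MAP shown is the default value 0E-FC-00, and the FCID is just an example address):

```python
def fpma(fc_map: int, fcid: int) -> str:
    """Build a 48-bit Fabric Provided MAC Address:
    upper 24 bits = FC-MAP (fabric identifier),
    lower 24 bits = FCID (the routable FC address assigned at FLOGI)."""
    mac = (fc_map << 24) | fcid
    octets = [(mac >> shift) & 0xFF for shift in range(40, -1, -8)]
    return ":".join(f"{o:02x}" for o in octets)

print(fpma(fc_map=0x0EFC00, fcid=0x010101))   # -> 0e:fc:00:01:01:01
```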

FIP Snooping:

FIP snooping is used in multi-hop FCoE environments.  It is a frame inspection method used by FIP-snooping-capable DCB devices to monitor FIP frames and apply policies based on the information in those frames.  This allows for:

  • Enhanced FCoE security (prevents FCoE MAC spoofing)
  • Creation of FC point-to-point links within the Ethernet LAN
  • Auto-configuration of ACLs based on name server information read in the FIP frames

FIP Snooping

image

Summary:

FIP snooping uses dynamic Access Control Lists to enforce Fibre Channel rules within the DCB Ethernet network.  This prevents Enodes from seeing or communicating with other Enodes without first traversing an FCF.
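
As a rough illustration of that behavior (Python, conceptual only; this is not how any particular switch implements it), a FIP snooping bridge that observes a successful FLOGI could install a narrow permit rule and deny everything else at the FCoE Ethertype:

```python
FCOE_ETHERTYPE = 0x8906   # FIP itself uses a separate Ethertype, 0x8914

def acls_after_flogi(enode_mac: str, fcf_mac: str):
    """Conceptual rules a FIP snooping bridge could install once it sees a
    successful FLOGI between an Enode and an FCF."""
    return [
        {"action": "permit", "ethertype": FCOE_ETHERTYPE,
         "src": enode_mac, "dst": fcf_mac},
        {"action": "permit", "ethertype": FCOE_ETHERTYPE,
         "src": fcf_mac, "dst": enode_mac},
        # Everything else at the FCoE Ethertype from this Enode is dropped,
        # so two Enodes can never exchange FCoE frames without an FCF in path.
        {"action": "deny", "ethertype": FCOE_ETHERTYPE,
         "src": enode_mac, "dst": "any"},
    ]

for rule in acls_after_flogi("0e:fc:00:01:01:01", "00:0d:ec:aa:bb:01"):
    print(rule)
```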

Feedback, corrections, updates, questions?


What the iPad means to cloud computing

image: iPad with keyboard dock

Let me preface this by saying I am far from an Apple fanboy; in fact, I would consider myself the exact opposite.  I don't like their business tactics (iTunes trying to force Safari down my throat, for example) or the hardware lock-in of their laptops.  I would never buy a Mac as a PC or laptop replacement.  Most of all I despise the smug pretentiousness that owning Apple products seems to breed.

That being said, I just bought an iPad; I fell victim to its charms while fiddling with a friend's at a BBQ.  The device is as sleek and elegant as you could ask for, and it definitely has some cool factor.  The responsiveness of the touch screen and the potential for interaction with the device are just phenomenal.  For instance, when reading the illustrated iPad version of certain book formats you can watch the pictures respond as you move the device.  The applications for this type of thing are endless (think therapy for special needs children, for one.)

What finally convinced me to get over the 'it won't replace my laptop' and 'it doesn't have USB ports' arguments I'd had was the realization that it isn't intended to replace a laptop, and it doesn't need USB ports.  Why tether down to USB cables when I can do everything wirelessly on my Wi-Fi network?  I don't need a laptop replacement; I want something to supplement my laptop.  For instance, when I'm watching TV I don't want to break out a laptop to check my email, update my blog, or respond to a Twitter message; instead I grab my iPad and I'm set without adjusting position and flipping up a screen.  The iPad is what Apple does best: simple, interactive devices.

Now, that being said, what does this mean for cloud computing?  The answer came to me the first time I docked the iPad.  Plugging my toy (iPad) into its keyboard docking station, it sat upright charging with a good-sized high-definition screen and a full-size keyboard.  Seeing that, it occurred to me that this is the future of the dumb terminal.  I can carry around an iPad for light use and fun while traveling or out, and when I get to my desk at work or home, set it into a docking station and use it as a virtual desktop terminal connected to a backend Virtual Desktop Infrastructure (VDI) system.  Need more screen real estate?  Use the VGA dongle.  I don't need full PC functionality on my iPad because it can front-end a power-user virtual machine that does the heavy lifting for me.

Devices like this make all varieties of cloud services more attractive to individual end users.  I can access my Gmail from anywhere, and my iTunes music will follow suit.  Pushing my productivity suite into a services offering lets me access my work anywhere, and any number of companies will give me free cloud storage.

That covers most of my personal use, attaching to a corporate virtual desktop would cover most of my business needs. At that point there’s less to carry, less to worry about, less to lose in the event of a hardware failure, and no sacrifice of functionality.

At $499.00 US for the base model, it compares nicely with thin clients that range from about $300 to $1,000 US.

I think my favorite iPad selling point is that I typed this blog post on my iPad in a cramped coach seat on a flight; I can't even open the lid of the laptop I travel with in this seat, and the post was finished nearly as fast as it would have been on my laptop.



HP Flex-10, Cisco VIC, and Nexus 1000v

When discussing the underlying technologies for cloud computing topologies, virtualization is typically a key building block.  Virtualization can be applied to any portion of the data center architecture, from load balancers to routers and from servers to storage.  Server virtualization is one of the most widely adopted virtualization technologies and provides a great deal of benefit to the server architecture.

One of the most common challenges with server virtualization is the networking.  Virtualized servers typically contain networks of virtual machines that are configured by the server team with little to no management or monitoring possible from the network and security teams.  This causes inconsistent policy enforcement between physical and virtual servers as well as limited network functionality for virtual machines.

Virtual Networks

image

The separate network management models for virtual and physical servers present challenges for policy enforcement, compliance, and security, and they add complexity to the configuration and architecture of virtual server environments.  Because of this, many vendors are designing products and solutions to help draw these networks closer together.

The following is a discussion of three products that can be used for this: HP's Flex-10 adapters, Cisco's Nexus 1000v, and Cisco's Virtual Interface Card (VIC.)

This is not a pro/con comparison or a discussion of which is better, just an overview of the technology and how it relates to VMware.

HP Flex-10 for Virtual Connect:

Using HP's Virtual Connect switching modules for C-Class blades and either Flex-10 adapters or LAN-on-Motherboard (LOM) ports, administrators can 'partition the bandwidth of a single 10Gb pipeline into multiple "FlexNICs." In addition, customers can regulate the bandwidth for each partition by setting it to a user-defined portion of the total 10Gb connection. Speed can be set from 100 Megabits per second to 10 Gigabits per second in 100 Megabit increments.' (http://bit.ly/boRsiY)

This allows a single 10GE uplink to be presented to any operating system as four physical Network Interface Cards (NICs.)

FlexConnect

image

In order to perform this interface virtualization, FlexConnect uses internal VLAN mappings for traffic segregation within the 10GE Flex-10 port (the mid-plane blade chassis connection between the Virtual Connect Flex-10 10GbE interconnect module and the Flex-10 NIC device.)  Each FlexNIC can present one or more VLANs to the installed operating system.
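
One simple way to picture the partitioning (a hedged Python sketch with made-up FlexNIC names and speeds, not HP's configuration interface) is as a bandwidth budget carved out of one 10Gb port in 100 Mb increments:

```python
PORT_MB = 10_000          # one 10GE Flex-10 port, in megabits
STEP_MB = 100             # FlexNIC speeds are set in 100 Mb increments

# Illustrative split into four FlexNICs (names and values made up for the example).
flexnics_mb = {"vm-data": 6000, "vmotion": 2000, "ft": 1500, "console": 500}

assert all(speed % STEP_MB == 0 for speed in flexnics_mb.values()), \
    "each FlexNIC speed must be a multiple of 100 Mb"
assert sum(flexnics_mb.values()) <= PORT_MB, "allocations exceed the 10Gb port"

for name, mb in flexnics_mb.items():
    print(f"{name}: {mb / 1000:.1f} Gb of the shared 10GE uplink")
```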

Some of the advantages with this architecture are:

  • A single 10GE link can be divided into 4 separate logical links each with a defined portion of the bandwidth.
  • More interfaces can be presented from fewer physical adapters which is extremely advantageous within the limited space available on blade servers.

When the installed operating system is VMware, this allows 2x 10GE links to be presented to VMware as eight separate NICs and used for different purposes such as vMotion, Fault Tolerance (FT), Service Console, VMkernel, and VM data.

The requirements for Flex-10 as described here are:

  • HP C-Class blade chassis
  • VC Flex-10 10GE interconnect module (HP blade switches)
  • Flex-10 LOM and or Mezzanine cards

Cisco Nexus 1000v:

‘Cisco Nexus® 1000V Series Switches are virtual machine access switches that are an intelligent software switch implementation for VMware vSphere environments running the Cisco® NX-OS operating system. Operating inside the VMware ESX hypervisor, the Cisco Nexus 1000V Series supports Cisco VN-Link server virtualization technology to provide:

• Policy-based virtual machine (VM) connectivity

• Mobile VM security and network policy, and

• Non-disruptive operational model for your server virtualization, and networking teams’(http://bit.ly/b4JJX5.)

The Nexus 1000v is a Cisco software switch which is placed in the VMware environment and provides physical-network-style control and monitoring for VMware virtual networks.  The Nexus 1000v is comprised of two components: the Virtual Supervisor Module (VSM) and the Virtual Ethernet Module (VEM.)  The Nexus 1000v has no specific hardware requirements and can be used with any standards-compliant physical switching infrastructure; specifically, the upstream switch should support 802.1q trunks and LACP.

Cisco Nexus 1000v

image

Using the Nexus 1000v, network teams have complete control over the virtual network and manage it using the same tools and policies used on the physical network.
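
Conceptually (an illustrative Python sketch, not the NX-OS configuration syntax), the point is that policy is attached to the VM's port profile rather than to a physical switch port, so it follows the VM through a vMotion:

```python
# The network team defines port profiles once, centrally (conceptually, on the VSM).
port_profiles = {
    "web-dmz": {"vlan": 100, "acl": "permit-web-only", "monitoring": True},
}

# Each VM's vNIC references a profile by name, not a host-specific switch config.
vm = {"name": "web-01", "profile": "web-dmz", "host": "esx-01"}

def effective_policy(vm):
    """Whichever host the VM lands on, the same named profile is applied
    by that host's VEM, so security and monitoring stay consistent."""
    return port_profiles[vm["profile"]]

print(vm["host"], effective_policy(vm))
vm["host"] = "esx-02"                       # vMotion to another host
print(vm["host"], effective_policy(vm))     # identical policy, no reconfiguration
```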

Some advantages of the 1000v are:

  • Consistent policy enforcement for physical and virtual servers
  • vMotion aware policies migrate with the VM
  • Increased security, visibility, and control of virtual networks

The requirements for Cisco Nexus 1000v are:

  • vSphere 4.0 or higher
  • VMware Enterprise Plus license
  • Per physical host CPU VEM license
  • Virtual Center Server

Cisco Virtual Interface Card (VIC):

The Cisco VIC provides interface virtualization similar to the Flex-10 adapter.  One 10GE port can be presented to an operating system as up to 128 virtual interfaces, depending on the infrastructure. 'The Cisco UCS M81KR presents up to 128 virtual interfaces to the operating system on a given blade. The virtual interfaces can be dynamically configured by Cisco UCS Manager as either Fibre Channel or Ethernet devices' (http://bit.ly/9RT7kk.)

Fibre Channel interfaces are known as vFCs and Ethernet interfaces are known as vEths; they can be used in any combination up to the architectural limits.  Currently the VIC is only available for Cisco UCS blades, but it will be supported on UCS rack-mount servers as well by the end of 2010.  Interfaces are segregated using an internal tagging mechanism known as VN-Tag, which does not use VLAN tags and operates independently of VLAN operation.
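
As a rough sketch (Python, with names and the interface limit taken from the quote above; the real behavior is driven by UCS Manager), you can think of the VIC as a budget of virtual interfaces carved from one physical port, each typed as Ethernet or Fibre Channel:

```python
MAX_VIRTUAL_INTERFACES = 128   # architectural limit cited for the M81KR above

def carve_interfaces(n_veth: int, n_vfc: int):
    """Return (name, type) pairs for the requested mix, or fail if the
    combination exceeds the adapter's virtual interface budget."""
    if n_veth + n_vfc > MAX_VIRTUAL_INTERFACES:
        raise ValueError("requested more virtual interfaces than the VIC supports")
    veths = [(f"vEth{i}", "ethernet") for i in range(n_veth)]
    vfcs = [(f"vFC{i}", "fibre-channel") for i in range(n_vfc)]
    return veths + vfcs

# Example: a VMware host with eight vEths for different traffic types plus two vHBAs.
for name, kind in carve_interfaces(n_veth=8, n_vfc=2):
    print(name, kind)
```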

Virtual Interface Card

image

Each virtual interface acts as if directly connected to a physical switch port and can be configured in access or trunk mode using standard 802.1q trunking.  These interfaces can then be used by any operating system or by VMware.  For more information on their use see my post Defining VN-Link (http://bit.ly/ddxGU7.)

VIC Advantages:

  • Granular configuration of multiple Fibre Channel and Ethernet ports on one 10GE link.
  • Single point of network configuration handled by a network team rather than a server team.

Requirements:

  • Cisco UCS B-series blades (until C-Series support is released)
  • Cisco UCS Fabric Interconnect access-layer switches/managers

Summary:

Each of these products has benefits in specific use cases and can reduce overhead and/or administration for server networks.  When combining one or more of these products you should carefully analyze the benefits of each and identify the features that may be sacrificed by combining them.  For instance, using the Nexus 1000v along with FlexConnect adds a server-administered network management layer between the physical network and the virtual network.

Nexus 1000v with Flex-10

image

Comments and corrections are always welcome.
