EMC VSPEX

EMC recently announced VSPEX (http://www.emc.com/about/news/press/2012/20120412-01.htm), a series of reference architectures designed with Cisco, Brocade, Citrix, Intel, Microsoft, and VMware.  The intent of these architectures is to provide proven, flexible designs for cloud computing built from best-of-breed components while preserving customer choice.

The VSPEX solutions are focused on virtualized infrastructure for private cloud and end-user computing environments.  Current options provide VMware vSphere 5.0 and Microsoft Hyper-V server virtualization from 50 – 250 VMs, as well as VMware View and Citrix XenDesktop solutions from 50 – 2000 desktops.  Additionally, VSPEX architectures factor in unified management and backup/recovery.  The initial launch solutions are: VMware View (250, 500, 1000, and 2000 users), Citrix XenDesktop (250, 500, 1000, and 2000 users), VMware Private Cloud (125 and 250 virtual machines), VMware Private Cloud (50 and 100 virtual machines), and Microsoft Private Cloud (50 and 100 virtual machines).  Full details can be found at: http://www.emc.com/platform/virtualizing-information-infrastructure/vspex.htm#!resources.

The reference architectures are further supported through VSPEX Labs from EMC for testing and configuration, which enables partners to validate specific configurations.  The model also enables partners to further drive new functionality into VSPEX based on their customer base.  First-level support will be provided by the EMC channel partner and backed by EMC.

VSPEX is different from the Vblocks offered by VCE (the Virtual Computing Environment Company) and is more along the lines of FlexPod, a collaboration of NetApp and Cisco with flavors for VMware, Citrix and several other applications/deployments.  The VSPEX reference architectures offer more choice and flexibility while sacrificing some in the way of acquisition and operational support.  This gap presents an opportunity for EMC channel partners to differentiate themselves with custom offerings that fill it.

Overall VSPEX is an excellent offering for both customers and EMC channel partners.  It provides additional options for deploying reliable, tested integrated hardware stacks for private cloud and end-user computing environments.  It also provides a framework and foundation for partners to build a custom solution set from.


Choosing The Right Private Cloud Storage

One of the key decisions in architecting an infrastructure for private cloud is selecting a storage platform for the deployment. Storage is a key component of the infrastructure and will play a major role in the overall performance of the private cloud. The storage decision carries additional weight due to its larger investment and typically longer refresh cycle…

To view the full article visit Network Computing: http://www.networkcomputing.com/private-cloud/231901384


Innovative Versus Integration Cloud Stacks

The Live Webcast with NetApp and Kingman Tang went quite well with good discussion on private cloud and data center stacks.  Check out the recording below.

[Embedded BrightTALK recording]


Why NetApp is my ‘A-Game’ Storage Architecture

One of, if not the, most popular of my blog posts to date has been ‘Why Cisco UCS is my ‘A-Game’ Server Architecture’ (http://www.definethecloud.net/why-cisco-ucs-is-my-a-game-server-architecture).  In that post I describe why I lead with Cisco UCS for most consultative engagements.  This follow-up for storage has been a long time coming, and thanks to some ‘gentle’ nudging and random coincidence combined with an extended airport wait, I’ve decided to get it posted.

If you haven’t read my previous post I take the time to define my ‘A-Game’ architectures as such:

“The rule in regards to my A-Game is that it’s not a rule, it’s a launching point. I start with a specific hardware set in mind in order to visualize the customer need and analyze the best way to meet that need. If I hit a point of contention that negates the use of my A-Game I’ll fluidly adapt my thinking and proposed architecture to one that better fits the customer. These points of contention may be either technical, political, or business related:

  • Technical: My A-Game doesn’t fit the customer’s requirements due to some technical factor, support, feature, etc.
  • Political: My A-Game doesn’t fit the customer because they don’t want Vendor X (previous bad experience, hype, understanding, etc.)
  • Business: My A-Game isn’t on an approved vendor list, or something similar.

If I hit one of these roadblocks I’ll shift my vendor strategy for the particular engagement without a second thought. The exception to this is if one of these roadblocks isn’t actually a roadblock and my A-Game definitely provides the best fit for the customer I’ll work with the customer to analyze actual requirements and attempt to find ways around the roadblock.

Basically my A-Game is a product or product line that I’ve personally tested, worked with and trust above the others, and that is my starting point for any consultative engagement.”

In my A-Game Server post I run through my hate then love relationship that brought me around to trust, support, and evangelize UCS; I cannot express the same for NetApp.  My relationship with NetApp fell more along the lines of love at first sight.

NetApp – Love at first sight:

I began working with NetApp storage at the same time I was diving headfirst into the data center as a whole.  I was moving from server admin/engineer to architect and drinking from the SAN, virtualization, and storage firehose.  I had a fantastic boss, who to this day is a mentor and friend, who pushed me to learn quickly and execute rapidly and accurately – thanks Mike!  The main products our team handled at the time were: IBM blades/servers, VMware, SAN (Brocade and Cisco) and IBM/NetApp storage.  I was never a fan of the IBM storage.  It performed solidly but was a bear to configure, lacked a rich feature set and typically got put in place and left there untouched until refresh.  At the same time I was coming up to speed on IBM storage I was learning more and more about NetApp.

From the non-technical perspective NetApp had accessible training and experts, clear value-proposition messaging and a firm grasp on VMware, where virtualization was heading and how/why it should be executed on.  This hit right on with what my team was focused on.  Additionally NetApp worked hard to maintain an excellent partner channel relationship, make information accessible, and put the experts a phone call or flight away.  This made me WANT to learn more about their technology.

The lasting bonds:

Breakfast food, yep, breakfast food is what made NetApp stick for me, and still be my A-Game four years later.  Not just any breakfast food, but a personal favorite of mine: beer and waffles, err, umm… WAFL (second only to chicken and waffles and missing only bacon.)  Data ONTAP (the beer) and NetApp’s Write Anywhere File Layout (WAFL) are at the heart of why they are my A-Game.  While you can find dozens of blogs, competitive papers, etc. attacking the use of WAFL for primary block storage, what WAFL enables is amazing from a feature perspective, and the performance numbers NetApp can put up speak for themselves.  Because, unlike a traditional block-based array, NetApp owns the underlying file system, they can not only do more with the data, but they can also more rapidly adapt to market needs with software enhancements.  Don’t take my word for it: do some research, look at the latest announcements from other storage leaders and check what year NetApp announced their version of those same features; with few exceptions you’ll be surprised.  The second piece of my love for NetApp is Data ONTAP.  NetApp has several storage controller systems ranging from the lower end to the Tier-1 high-capacity, high-availability systems.  Regardless of which one you use, you’re always using the same operating/management system, Data ONTAP.  This means that as you scale, change, refresh, upgrade, downgrade, you name it, you never have to retrain AND you keep a common feature set.

My love for breakfast is not the only draw to NetApp, and in fact without a bacon offering I would have strayed if there weren’t more (note to NetApp: Incorporate fatty pork the way politicians do.) 

Other features that keep NetApp top of my list are:

  • Primary block-level storage deduplication with real-world savings of 70+% with minimal performance hit (and no license fee to boot)
  • Ease of upgrade/downgrade (keep the shelves of disks, replace the controllers, data stays)
  • Read/Write ‘0’ space/cost clones (the ability to clone various data sets in a read/write status using only pointers and storing only the change ‘delta’) and FlexClone capabilities as a whole
  • Highly optimized snapshots for point-in-time rollback, test/dev, etc.
  • VMware plugins to enable VMware admins to manage and monitor their own storage allotments
  • Storage virtualization, the ability to carve out storage and the management of that storage to multiple tenants in a similar fashion to what VMware does for servers
  • Ability to get 80% of the performance benefits of a shelf of SSD drives by adding Flash Cache (PAM II) cards 

Add to that more recent features, such as being first to market with FCoE-based storage, and you’ve got a winner in my book.  All that being said, I still haven’t covered the real reason NetApp is the first storage vendor in my head anytime I talk about storage.

Unification:

Anytime I’m talking about servers I’m talking about virtualization as well.  Because I don’t work in the Unix or mainframe worlds I’m most likely talking about VMware (90% market share has that effect.)  When dealing with virtualization my primary goals are consolidation/optimization and flexibility.  In my opinion nobody can touch NetApp storage for this.  I’m a fan of choice and options, and I also like particular features/protocols for particular use cases.  On most storage platforms I have to choose my hardware based on the features and protocols my customers require, and most likely use more than one platform to get them all.  This isn’t the case with NetApp.  With few exceptions every protocol/feature is available simultaneously on any given hardware platform.  This means I can run iSCSI, FC, FCoE or all of the above for block-based needs at the same time I run CIFS natively to replace Windows file servers, and NFS for my VMware datastores.  All of that from the same box or even the same ports!  This lets me tier my protocols and features to the application requirements instead of to my hardware limitations.

I’ve been working on VMware deployments in some fashion for four years and have seen dozens of unique deployments, but I’ve personally never deployed or worked with a VMware environment that ran off a single protocol.  Typically, at a minimum, NFS is used for ISO datastores and CIFS can be used to eliminate Windows file servers rather than virtualize them, with a block-based protocol possibly involved for boot or databases.

Additionally, NetApp offers features and functionality that allow multiple storage functions to be consolidated on a single system.  You no longer require separate hardware for primary, secondary, backup, DR, and archive.  All of this can then be easily set up and managed for replication across any of NetApp’s platforms, or many 3rd-party systems front-ended with V-Series.  These two pieces combined create a truly ‘unified’ platform.

When do I bring out my B-Game?

NetApp, like any solution I’ve ever come across, is not the right tool for every job.  For me they hit or exceed the 80/20 rule perfectly.  A few places where I don’t see NetApp as a current fit:

  • Small to Medium Business (SMB) – At the SMB level a single protocol solution may work and you can find lower cost solutions that fit the bill, but if you scale faster than expected you’re stuck with a single protocol platform and may end up having to purchase and manage additional devices if/when needs change
  • Massive scalability – Here I’m talking public cloud petabytes upon petabytes where systems like Isilon from EMC and its competitors have the lead
  • Top-Tier performance and enterprise class reliability for Tier-1 applications –  Here at the very high end typically EMC or Hitachi are the players, and IBM using SVC may also play
  • Mainframes, NetApp don’t play that and Big Blue don’t support it  

Summary:

While I stick to the position that there are no ‘one-size-fits-all’ IT solutions, and that my A-Game is a starting point not a rule, I find NetApp hits the bullseye for 80+ percent of the market I work with.  Not only do they fit up front, but they back it up with support, continued innovation, and product advancement.  NetApp isn’t ‘The Growth Company’ and #2 in storage by luck or chance (although I could argue they did luck out quite a bit with the timing of the industry move to converged storage on 10GE.)

Another reason NetApp still reigns king as my A-Game is the way in which it marries to my A-Game server architecture.  Cisco UCS enables unification, protocol choice and cable consolidation as well as virtualization acceleration, etc.  All of these are further amplified when used alongside NetApp storage, which allows rapid provisioning, protocol options, storage consolidation and storage virtualization, etc.  Do you want to pre-provision 50 (or 250) VMware hosts with 25 GB read/write boot LUNs ready to go at the click of a template?  Do you want to do this without utilizing any space up front?  UCS and NetApp have the toolset for you.  You can then rapidly bring up new customers, or stay at dinner with your family while a Network Operations Center (NOC) administrator deploys a pre-architected, pre-secured, pre-tested and pre-provisioned server from a template to meet a capacity burst.

If you’re considering a storage decision, a private cloud migration, or a converged infrastructure pod make sure you’re taking a look at NetApp as an option and see it for yourself.  For some more information on NetApp’s virtualization story see the links below:

TR3856: Quantifying the Value of Running VMware on NetApp 

TR3808: VMware vSphere and ESX 3.5 Multiprotocol Performance Comparison Using FC, iSCSI, and NFS


Have We Taken Data Redundancy too Far?

During a recent conversation about disk configuration and data redundancy on a storage array I began to think about everything we put into data redundancy.  The question that came to mind is the title of this post ‘Have we taken data redundancy too far?’

Now don’t get me wrong, I love data as much as the next fellow, and I definitely understand its importance to the business and to compliance.  I’m not advocating tossing out redundancy or data protection.  My question is: when is enough enough, and/or is there a better way?

To put this in perspective let’s take a look at everything that stacks up to protect enterprise data:

Disks:

We start with the lowly disk, which by itself has no redundancy.  While disks themselves tend to have one of the highest failure rates in the data center, they have definitely come a long way.  Many have the ability to self-protect and warn of impending failure at a low level, and they can last for years without issue.

Disks alone are a single point-of-failure in which all data on the disk is lost if the drive fails.  Because of this we’ve worked to come up with better ways to apply redundancy to the data.  The simplest form of this is RAID.

RAID:

RAID stands for ‘Redundant Array of Inexpensive Disks’; it’s also correct to say ‘Redundant Array of Independent Disks.’  No matter what you call it, RAID allows what would typically be a single disk on its own to act as part of a group of disks for the purposes of redundancy, performance, or both.  You can think of this like disk clustering.

Some common RAID types used for redundancy are:

    • RAID 0 – Disk striping: data is striped across two or more disks to improve performance.  There is no redundancy, so each disk becomes a single point of failure for the entirety of the stored data.
    • RAID 1 – Disk mirroring, data is written to both disks simultaneously as an exact copy.  This method allows either disk to fail with no data loss. 
    • RAID 5 – Disk striping with parity.  What this means is that, using three or more disks, all data is written in stripes across the available disks and additional parity data is striped across the disks to provide redundancy.  Because of the parity data, one disk can be lost from the group without data loss.  RAID 5 provides N-1 capacity, meaning you lose one disk’s worth of space to the parity data.
    • RAID 6 – Disk striping with double parity.  Think RAID 5 with an extra disk lost for parity but the ability to lose two disks without data loss.  This is N-2 capacity.

In many cases ‘hot-spares’ will also be added to RAID groups.  The purpose of a hot-spare is to have a drive online but not participating in the RAID group, standing by for failure events.  If a RAID member disk fails, the hot-spare can be used to replace it immediately until an administrator can swap out the bad drive.
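
For a rough feel of how usable capacity differs across the RAID levels above, here is a minimal Python sketch.  It is illustrative only and ignores vendor metadata, right-sizing and other real-world overhead; hot spares are modeled simply as disks that contribute no usable space.

    def usable_tb(raid_level, disk_count, disk_tb, hot_spares=0):
        """Approximate usable capacity for common RAID levels."""
        data_disks = disk_count - hot_spares     # spares sit idle until a failure
        if raid_level == 0:                      # striping only, no redundancy
            return data_disks * disk_tb
        if raid_level == 1:                      # mirroring, half the raw space
            return data_disks / 2 * disk_tb
        if raid_level == 5:                      # single parity, N-1
            return (data_disks - 1) * disk_tb
        if raid_level == 6:                      # double parity, N-2
            return (data_disks - 2) * disk_tb
        raise ValueError("unsupported RAID level")

    # 16 disks of 2 TB each (2 of them hot spares) in each configuration
    for level in (0, 1, 5, 6):
        print(f"RAID {level}: {usable_tb(level, 16, 2, hot_spares=2):.0f} TB usable of 32 TB raw")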

Snapshots:

Another level of redundancy many enterprise storage arrays will use is snapshots.  Snapshots can be used to perform point-in-time recoveries.  Basically, when a snapshot is taken it locks the associated blocks of data, ensuring they are not modified without being copied.  If a block needs to be changed, the new data is written in a new location without affecting the original.  In order to revert to a snapshot, the changed data is simply removed, leaving the original locked blocks.  While snapshots are not a backup or redundancy feature on their own, they can be used as part of other systems, and are excellent for development environments where testing is required on various data sets, etc.  Snapshots consume additional space because two copies are kept of any locked block that is changed.
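
Here is a minimal sketch of the pointer mechanics described above; it models the general copy-on-write/redirect-on-write idea, not any particular array’s implementation:

    class Volume:
        """Toy block volume with pointer-based snapshots."""

        def __init__(self, blocks):
            self.active = dict(enumerate(blocks))   # block number -> data
            self.snapshots = []

        def snapshot(self):
            # A snapshot is just a frozen set of pointers to the current blocks;
            # no data is copied at snapshot time.
            self.snapshots.append(dict(self.active))

        def write(self, block_no, data):
            # New data lands in a new location (a new object here); blocks still
            # referenced by a snapshot are left untouched.
            self.active[block_no] = data

        def revert(self, snap_index):
            # Rolling back simply discards the changed pointers.
            self.active = dict(self.snapshots[snap_index])

    vol = Volume(["A", "B", "C"])
    vol.snapshot()               # point-in-time view, consumes no extra blocks yet
    vol.write(1, "B-modified")   # original "B" is preserved for the snapshot
    vol.revert(0)
    print(vol.active)            # {0: 'A', 1: 'B', 2: 'C'}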

Primary/Secondary replication:

Another method for creating data redundancy is tiered storage.  In a tiered redundancy model the primary storage serving the applications is held on the highest-performing disk and data is backed up or replicated to lower-performance, less expensive disk or disk arrays.

Virtual Tape Libraries (VTL):

Virtual tape libraries are storage arrays that present themselves as standard tape libraries for the purposes of backup and archiving.  VTL is typically used in between primary storage and actual tape backups as a means of decreasing the backup window.

Tape backups:

In most cases the last stop for backup and archiving is still tape.  This is because tape is cheap, high density, and ultra-portable.  Large amounts of data can be streamed to tape libraries which can store the data and allow tapes to be sent to off-site storage facilities.

Adding it up:

When you put these redundancy and recovery systems together and start layering them on top of one another you end up with high ratios of storage media being purposed for redundancy and recovery compared to the actual data being served.  10:1, 20:1, 100:1 or more is not uncommon when considering archive/redundancy space compared to usable space.
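
As a back-of-the-envelope illustration of how those ratios stack up, here is a small sketch; every number below is invented for the example, so substitute your own environment’s figures:

    usable_tb = 10  # data actually served to applications

    # Illustrative media consumed at each layer of the stack described above.
    layers = {
        "primary (RAID 6, 14+2, plus 20% snapshot reserve)": usable_tb * (16 / 14) * 1.2,
        "secondary replica (RAID 6 on cheaper disk)": usable_tb * (16 / 14),
        "VTL staging copy": usable_tb,
        "tape (4 retained full backups)": usable_tb * 4,
    }

    total = sum(layers.values())
    for name, tb in layers.items():
        print(f"{name}: {tb:.1f} TB")
    print(f"total media: {total:.1f} TB for {usable_tb} TB served (~{total / usable_tb:.0f}:1)")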

Summary:

My summary is more of a repeat of the same question.  Have we taken this too far?  Do we need protection built in at each level, and layered on top of one another?  Can we afford to continue down this path adding redundancy at the expense of performance and utilization?  Should we throw higher parity RAID at our arrays and make up the performance hit with expensive cache?  Should we purchase 10TB of media for every 1TB we actually need to serve?  Is there a better way?

I don’t have the answer to this one, but would love to see a discussion on it.  The way I’m thinking now is bunches of dumb independent disks pooled and provisioned through software.  Drop the RAID and hot spares; use the software to maintain multiple local or global copies on different hardware.  When you start moving that disk thinking to cloud environments and talking about petabytes or more of data, the current model starts unraveling quickly.


FCoE Multi-hop: Do You Care?

There is a lot of discussion in the industry around FCoE’s current capabilities, and specifically around the ability to perform multi-hop transmission of FCoE frames and the standards required to do so.  A recent discussion between Brad Hedlund at Cisco and Ken Henault at HP (http://bit.ly/9Kj7zP) prompted me to write this post.  Ken proposes that FCoE is not quite ready and Brad argues that it is. 

When looking at this discussion remember that Cisco has had FCoE products shipping for about 2 years, and has a robust product line of devices with FCoE support including: UCS, Nexus 5000, Nexus 4000 and Nexus 2000, with more products on the road map for launch this year.  No other switching vendor has this level of current commitment to FCoE.  For any vendor with a less robust FCoE portfolio it makes no sense to drive FCoE sales and marketing at this point and so you will typically find articles and blogs like the one mentioned above.  The one quote from that blog that sticks out in my mind is:

“Solutions like HP’s upcoming FlexFabric can take advantage of FCoE to reduce complexity at the network edge, without requiring a major network upgrades or changes to the LAN and SAN before the standards are finalized.”

If you read between the lines here it would be easy to take this as ‘FCoE isn’t ready until we are.’  This is not unusual and if you take a minute to search through articles about FCoE over the last 2-3 years you’ll find that Cisco has been a big endorser of the protocol throughout (because they actually had a product to sell) and other vendors become less and less anti-FCoE as they announce FCoE products.

It’s also important to note that Cisco isn’t the only vendor out there embracing FCoE: NetApp has been shipping native FCoE storage controllers for some time, EMC has them roadmapped for the very near future, QLogic is shipping a second generation of Converged Network Adapters, and Emulex has fully embraced 10 Gigabit Ethernet as the way forward with their OneConnect adapter (10GE, iSCSI, FCoE all in one card.)  Additionally, FCoE switching of native Fibre Channel storage is widely supported by the storage community.

Fibre Channel over Ethernet (FCoE) is defined in the T11 FC-BB-5 standard and requires the switches it traverses to support the IEEE Data Center Bridging (DCB) standards for proper traffic treatment on the network.  For more information on FCoE or DCB see my previous posts on the subjects (FCoE: http://www.definethecloud.net/?p=80, DCB: http://www.definethecloud.net/?p=31.)

DCB has four major components, and the one in question in the above article is Quantized Congestion Notification (QCN), which the article states is required for multi-hop FCoE.  QCN is basically a regurgitation of FECN and BECN from Frame Relay: it allows a switch to monitor its buffers and push congestion to the edge rather than clog the core.  In the comments Brad correctly states that QCN is not required for FCoE.  The reason for this is that Fibre Channel operates today without any native version of QCN, so when placing it on Ethernet you do not need to add functionality that wasn’t there to begin with; remember, Ethernet is just a new layer 1-2 for native FC layers 2-4, and the FC secret sauce remains unmodified.  Also remember that not every standard defined by a standards body has to be adhered to by every device; some are required, some are optional.  Logical SANs are a great example of an optional standard.

Rather than discuss what is or isn’t required for multi-hop FCoE I’d like to ask a more important question that we as engineers tend to forget: Do I care?  This question is key because it avoids having us argue the technical merits of something we may never actually need, or may not have a need for today.

Do we care?

First let’s look at why we do multi-hop anything: to expand the port count of our network.  Take TCP/IP networks and the internet for example: we require the ability to move packets across the globe through multiple routers (hops) in order to attach devices in all corners of the world.

Now let’s look at what we do with FC today: typically one or two hop networks (sometimes three) used to connect several hundred devices (occasionally but rarely more.)  It’s actually quite common to find FC implementations with less than 100 attached ports.  This means that if you can hit the right port count without multiple hops you can remove complexity and decrease latency, in Storage Area Networks (SAN) we call this the collapsed core design.

The second thing to consider is a hypothetical question: if FCoE were permanently destined for single-hop access/edge-only deployments (it isn’t), should that actually stop you from using it?  The answer here is an emphatic no; I would still highly recommend FCoE as an access/edge architecture even if it were destined to connect back to an FC SAN and Ethernet LAN for all eternity.  Let’s jump to some diagrams to explain.  In the following diagrams I’m going to focus on Cisco architecture because, as stated above, they are currently the only vendor with a full FCoE product portfolio.

[Diagram: FCoE connectivity options]

In the above diagram you can see a fairly dynamic set of FCoE connectivity options.  The Nexus 5000 can be directly connected to servers, or to a Nexus 4000 in an IBM BladeCenter to pass FCoE.  It can also be connected to 10GE Nexus 2000s to increase its port density.

To use the Nexus 5000 + 2000 as an example, it’s possible to create a single-hop (the 2000 isn’t an L2 hop; it is an extension of the 5000) FCoE architecture of up to 384 ports with one point of switching management per fabric.  If you take server virtualization into the picture and assume 384 servers with a very modest virtualization ratio of 10 virtual machines to 1 physical machine, that brings you to 3,840 servers connected to a single-hop SAN.  That is major scalability with minimal management, all without the need for multi-hop.  The diagram above doesn’t include the Cisco UCS product portfolio, which architecturally supports up to 320 FCoE-connected servers/blades.

The next thing I’ve asked you to think about is whether or not you should implement FCoE in a hypothetical world where FCoE stays an access/edge architecture forever.  The answer would be yes.  In the following diagrams I outline the benefits of FCoE as an edge only architecture.

[Diagram: access-layer network reduction]

The first benefit is reducing the networks that are purchased, managed, powered, and cooled from 3 to 1 (2 FC and 1 Ethernet to 1 FCoE.)  Even just at the access layer this is a large reduction in overhead, and it reduces the refresh points as I/O demands increase.

[Diagram: per-server I/O reduction]

The second benefit is the overall infrastructure reduction at the access layer.  Taking a typical VMware server as an example, we reduce 6x 1GE ports, 2x 4G FC ports and the 8 cables required for them to 2x 10GE ports carrying FCoE.  This increases total bandwidth available while greatly reducing infrastructure.  Don’t forget the 4 top-of-rack switches (2x FC, 2x GE) reduced to 2 FCoE switches.

Since FCoE is fully compatible with both FC and pre-DCB Ethernet this requires 0 rip-and-replace of current infrastructure.  FCoE is instead used to build out new application environments or expand existing environments while minimizing infrastructure and complexity.

What if I need a larger FCoE environment?

If you require a larger environment than is currently supported extending your SAN is quite possible without multi-hop FCoE.  FCoE can be extended using existing FC infrastructure.  Remember customers that require an FCoE infrastructure this large already have an FC infrastructure to work with.

[Diagram: extending FCoE using existing FC infrastructure]

What if I need to extend my SAN between data centers?

FCoE SAN extension is handled in the exact same way as FC SAN extension, CWDM, DWDM, Dark Fiber, or FCIP.  Remember we’re still moving Fibre Channel frames.

[Diagram: SAN extension between data centers]

Summary:

FCoE multi-hop is not an argument that needs to be had for most current environments.  FCoE is a supplemental technology to current Fibre Channel implementations.  Multi-hop FCoE will be available by the end of CY2010, allowing 2+ tier FCoE networks with multiple switches in the path, but there is no need to wait for them to begin deploying FCoE.  The benefits of an FCoE deployment at the access layer only are significant, and many environments will be able to scale to full FCoE roll-outs without ever going multi-hop.


The Cloud Storage Argument

The argument over the right type of storage for data center applications is an ongoing battle.  This argument gets amplified when discussing cloud architectures both private and public.  Part of the reason for this disparity in thinking is that there is no ‘one size fits all solution.’  The other part of the problem is that there may not be a current right solution at all.

When we discuss modern enterprise data center storage options there are typically five major choices:

  • Fibre Channel (FC)
  • Fibre Channel over Ethernet (FCoE)
  • Internet Small Computer System Interface (iSCSI)
  • Network File System (NFS)
  • Direct Attached Storage (DAS)

In a Windows server environment these will typically be coupled with the Common Internet File System (CIFS) for file sharing.  Behind these protocols there are a series of storage arrays and disk types that can be used to meet the application’s I/O requirements.

As people move from traditional server architectures to virtualized servers, and from static physical silos to cloud based architectures they will typically move away from DAS into one of the other protocols listed above to gain the advantages, features and savings associated with shared storage.  For the purpose of this discussion we will focus on these four: FC, FCoE, iSCSI, NFS.

The issue then becomes: which storage protocol do you use to transport your data from the server to the disk?  I’ve discussed the protocol differences in a previous post (http://www.definethecloud.net/?p=43) so I won’t go into the details here.  Depending on who you’re talking to it’s not uncommon to find extremely passionate opinions.  There are quite a few consultants and engineers who are hard-coded to one protocol or another.  That being said, most end-users just want something that works, performs adequately and isn’t a headache to manage.

Most environments currently work on a combination of these protocols, plenty of FC data centers rely on DAS to boot the operating system and NFS/CIFS for file sharing.  The same can be said for iSCSI.  With current options a combination of these protocols is probably always going to be best, iSCSI, FCoE, and NFS/CIFS can be used side by side to provide the right performance at the right price on an application by application basis.

The one definite fact in all of the opinions is that running separate parallel networks as we do today with FC and Ethernet is not the way to move forward; it adds cost, complexity, management, power, cooling and infrastructure that isn’t needed.  Combining protocols down to one wire is key to the flexibility and cost savings promised by end-to-end virtualization and cloud architectures.  If that’s the case, which wire do we choose, and which protocol rides directly on top to transport the rest?

10 Gigabit Ethernet is currently the industry’s push for a single wire, and with good reason:

  • It’s currently got enough bandwidth/throughput to do it (10 gigabits using 64b/66b encoding, as opposed to FC/InfiniBand which currently use 8b/10b with its 20% overhead; see the quick math after this list)
  • It’s scaling fast: 40GE and 100GE are well on their way to standardization (as opposed to 16G and 32G FC)
  • Everyone already knows and uses it – yes, that includes you.
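
A quick sanity check on the encoding point above; the line rates are nominal and higher-layer protocol overhead is ignored, so treat this as a rough comparison only:

    def effective_gbps(line_rate_gbaud, data_bits, coded_bits):
        """Payload bandwidth left after line-encoding overhead."""
        return line_rate_gbaud * data_bits / coded_bits

    # 10GE uses 64b/66b (~3% overhead); 8G Fibre Channel uses 8b/10b (20% overhead)
    ten_ge = effective_gbps(10.3125, 64, 66)   # 10GE line rate is 10.3125 GBaud
    fc_8g = effective_gbps(8.5, 8, 10)         # "8G" FC line rate is 8.5 GBaud

    print(f"10GE : ~{ten_ge:.1f} Gbps of data")   # ~10.0 Gbps
    print(f"8G FC: ~{fc_8g:.1f} Gbps of data")    # ~6.8 Gbps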

For the sake of argument let’s assume we all agree on 10GE as the right wire/protocol to carry all of our traffic; what do we layer on top?  FCoE, iSCSI, NFS, something else?  Well, that is a tough question.  The first part of the answer is that you don’t have to decide.  This is very important because none of these protocols is mutually exclusive.  The second part of the answer is: maybe none of these is the end-all-be-all long-term solution.  Each current protocol has benefits and drawbacks, so let’s take a quick look:

  • iSCSI: Block-level protocol carrying SCSI over IP.  Works with standard Ethernet but can have performance issues on congested networks and also incurs IP protocol overhead.  iSCSI is great on standard Ethernet networks until congestion occurs; once the network becomes fully utilized, iSCSI performance will tend to drop.
  • FCoE: Block level protocol which maintains Fibre Channel reliability and security while using underlying Ethernet.  Requires 10GE or above and DCB (http://www.definethecloud.net/?p=31) capable switches.  FCoE is currently well proven and reliable at the access layer and a fantastic option there, but no current solutions exist to move it up further into the network.  Products are on the road map to push FCoE further into the network but that may not necessarily be the best way forward.
  • NFS: File level protocol which runs on top of UDP or TCP and IP.

And a quick look at comparative performance:

[Chart: relative protocol performance]

While the above performance model is subjective, and network tuning and specific equipment will play a big role, the general idea holds sound.

One of the biggest factors that needs to be considered when choosing among these protocols is block vs. file.  Some applications require direct block access to disk; many databases fall into this category.  Just as importantly, if you want to boot an operating system from disk, a block-level protocol (iSCSI, FCoE) is required.  This means that for most diskless configurations you’ll need to make a choice between FCoE and iSCSI (still within the assumption of consolidating on 10GE.)  Diskless configurations have major benefits in large-scale deployments including power, cooling, administration, and flexibility, so you should at least be considering them.

If you’ve chosen a diskless configuration and settled on iSCSI or FCoE for your boot disks, you still need to figure out what to do about file shares.  CIFS or NFS is your next decision: CIFS is typically the choice for Windows, and NFS for Linux/UNIX environments.  Now you’ve wound up with 2-3 protocols running to get your storage settled, and you’re stacking those alongside the rest of your typical LAN data.

Now, to look at management, step back and take a look at block data as a whole.  If you’re using enterprise-class storage you’ve got several steps of management to configure the disk in that array.  It varies with vendor but is typically something to the effect of:

  1. Configure the RAID for groups of disks
  2. Pool multiple RAID groups
  3. Logically sub divide the pool
  4. Assign the logical disks to the initiators/servers
  5. Configure required network security (FC zoning/ IP security/ACL, etc)

While this is easy stuff for storage and SAN administrators, it’s time consuming, especially when you start talking about cloud infrastructures with lots and lots of moves, adds and changes.  It becomes way too cumbersome to scale into petabytes with hundreds or thousands of customers.  NFS has more streamlined management but it can’t be used to boot an OS.  This makes for extremely tough decisions when looking to scale into large virtualized data center architectures or cloud infrastructures.
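
To make the management point concrete, here is the same five-step workflow expressed as a rough script.  The step descriptions are invented for illustration and do not correspond to any real vendor’s CLI or API; the point is the amount of per-tenant work, not the syntax.

    def block_provisioning_plan(tenant, lun_size_gb, initiators):
        """Return the per-tenant steps from the list above as a simple plan."""
        return [
            f"create RAID 6 group from free disks for {tenant}",               # step 1
            "add the RAID group to a storage pool",                            # step 2
            f"carve a {lun_size_gb} GB LUN '{tenant}-boot' from the pool",     # step 3
            f"mask the LUN to initiators {', '.join(initiators)}",             # step 4
            "zone the initiators to the array target ports on both fabrics",   # step 5
        ]

    # One tenant is easy; hundreds of tenants with constant moves/adds/changes
    # is where this model starts to strain.
    for tenant in ("tenant-a", "tenant-b", "tenant-c"):
        for step in block_provisioning_plan(tenant, 25, [f"{tenant}-hba0", f"{tenant}-hba1"]):
            print(f"{tenant}: {step}")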

There is a current option that allows you to consolidate on 10GE, reduce storage protocols and still get diskless servers.  It’s definitely not the solution for every use case (there isn’t one), and it’s only a great option because there aren’t a whole lot of other great options.

In a fully virtualized environment NFS is a great low-management-overhead protocol for virtual machine disks.  Because it can’t boot the server, we need another way to get the operating system into server memory.  That’s where PXE boot comes in.  The Preboot eXecution Environment (PXE) is a network OS boot method that works well for small operating systems, typically terminal clients or Linux images.  It allows for a single instance of the operating system to be stored on a PXE server attached to the network, and a diskless server to retrieve that OS at boot time.  Because some virtualization operating systems (hypervisors) are lightweight, they are great candidates for PXE boot.  This allows the architecture below.

PXE/NFS 100% Virtualized Environment

[Diagram]

Summary:

While there are several options for data center storage, none of them solves every need.  Current options increase in complexity and management as the scale of the implementation increases.  Looking to the future we need to be looking for better ways to handle storage.  Maybe block-based storage has run its course, maybe SCSI has run its course; either way we need more scalable storage solutions available to the enterprise in order to meet the growing needs of the data center and maintain manageability and flexibility.  New deployments should take all current options into account and never write off the advantages of using more than one, or all of them where they fit.


Storage Protocols

Storage is a major consideration for cloud initiatives; what type of disk, which vendor, and as importantly which protocol?  Experts will tout one over the other based on cost, performance, throughput, etc.  Let’s take a look at the major storage protocols at play in the data center:

Small Computer System Interface (SCSI):

SCSI is the dominant block-level access method for disk in the data center.  Blocks are typically the smallest unit that can be read or written on a disk; they exist in various sizes depending on disk type and usage.  Block-level access means that the server can directly access the disk blocks without the need for a file system in place on top of them; this is the opposite of the file-based storage discussed later.

SCSI has been in use since the early 1980s and was originally used to move data within a single server.  The operating system writes data using the SCSI protocol to a SCSI drive controller, which manages one or more devices on a SCSI cable within a system chassis.  The SCSI controller ensures that only one device is active on the cable at any time, which prevents contention on the SCSI bus.  Because SCSI was managed by a single controller and contained within a system, the chance of data loss or contention was minimal; this meant that SCSI did not require control mechanisms to handle data loss or contention the way networked protocols do.  SCSI itself is still widely used in its native format, but it has also been encapsulated into other protocols for use within storage networks for consolidated storage.

Fibre Channel (FC):

Fibre Channel was designed to extend the functionality of SCSI into point-to-point, loop, and switched topologies.  This allows for longer distances as well as storage consolidation.  FC encapsulates SCSI data and Command Descriptor Blocks (CDBs) into the payload of Fibre Channel frames.  Fibre Channel networks provide the addressing, routing, and flow control required to support SCSI data.  Additionally, Fibre Channel networks are designed to meet the needs of SCSI by providing ‘lossless,’ in-order delivery.  This means that in a stable network FC frames will not be dropped, and they are delivered in order, ensuring that the Upper Layer Protocols (ULPs) will not be forced to reorder or resend frames.

Fibre Channel networks are typically carried over fiber-optic links on dedicated infrastructures.  These infrastructures are traditionally built in pairs as exact mirrors of one another, which provides complete physical redundancy end-to-end.  These networks also provide high bandwidth and low latency.  FC networks come in 1/2/4/8 Gbps speeds with 16/32 Gbps in the works, and 10 Gbps FC links are typically available on a proprietary basis for links between switches.

Internet Small Computer System Interface (iSCSI):

iSCSI takes SCSI data and CDBs and places them in the payload of IP packets.  This allows the SCSI protocol to be extended across existing IP infrastructures.  While IP is routable within the data center and across the WAN, iSCSI is not traditionally used/supported across routed boundaries (exceptions do exist.)  The draw of iSCSI has been that storage data can be extended across the existing infrastructure with minimal additional cost.

iSCSI has not gained the market share many have predicted over the years due to flaws in the protocol and limitations of traditional Ethernet-based data center networks.  Until the standardization of 10 Gigabit Ethernet, most data centers relied on 1GE links which were typically saturated already.  This meant implementing iSCSI required new switching infrastructure.  10GE has changed the bandwidth limits but still not catapulted iSCSI into the mainstream.  There are several reasons for this: one being the large existing investment in Fibre Channel, and two being the iSCSI protocol itself.

The problem with iSCSI from a protocol standpoint is that it takes the SCSI protocol, which expects lossless, in-order delivery, and places it in TCP/IP packets, which are designed for heterogeneous WAN networks and experience packet loss and out-of-order delivery frequently.  This is done without providing any additional tools to either SCSI or TCP/IP for handling the SCSI payloads in the expected fashion.  This in no way means iSCSI is unusable or should be written off; it just means that additional considerations must be made when designing iSCSI, especially in enterprise or larger environments.

In order to provide proper performance for iSCSI on shared networks Quality of Service (QoS), physical architecture, and jumbo frame support must be taken into account.  Because of these considerations many iSCSI networks have traditionally been placed on separate network hardware from the data center LAN (isolated iSCSI networks.)  This has minimized some of the benefits of consolidating on a single protocol.  With 10 Gigabit Ethernet and the standardization of Data Center Bridging (DCB) iSCSI looks more promising for a greater audience.  For more information on DCB see my previous post (http://www.definethecloud.net/?p=31.)

Fibre Channel over Ethernet (FCoE):

FCoE was ratified in 2009 and provides the functionality for moving native Fibre Channel across consolidated Ethernet networks.  FCoE relies on the DCB standards referenced above.  FCoE encapsulates full Fibre Channel frames inside Ethernet jumbo frame payloads.  Utilizing jumbo frames ensures that the FC frame is not fragmented or changed in any way.  The FCoE and DCB standards provide a robust tool set for consolidating existing Fibre Channel workloads on shared 10GE networks while providing the lossless, in-order delivery SCSI expects.  FCoE does not modify the existing Fibre Channel protocol suite and allows for the same management model, including zoning, LUN masking, etc.  FCoE has started gaining ground over the last two years, pushed by several large hardware vendors in the storage, network, and server markets.  For more information on FCoE see my post (http://www.definethecloud.net/?p=80.)
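
One way to keep the block options straight is to look at what wraps what on the wire.  The sketch below is a simplification (it leaves out framing details and header sizes) but reflects the layering described in the sections above:

    # Simplified view of how each block protocol carries SCSI traffic.
    stacks = {
        "Native FC": ["SCSI CDB + data", "FCP (FC-4)", "FC frame", "FC physical (8b/10b)"],
        "iSCSI": ["SCSI CDB + data", "iSCSI PDU", "TCP", "IP", "Ethernet"],
        "FCoE": ["SCSI CDB + data", "FCP (FC-4)", "FC frame", "FCoE encapsulation",
                 "lossless (DCB) Ethernet"],
    }

    for name, layers in stacks.items():
        print(f"{name}: " + " -> ".join(layers))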

Common Internet File System (CIFS):

CIFS is a file-based storage protocol based on Server Message Block (SMB.)  It is a shared storage protocol typically used in Microsoft environments for file sharing.  Windows-based file shares rely on CIFS as the transfer protocol for file-level data.  File-based storage relies on an underlying file system such as FAT32, XFS, NTFS or otherwise, which differs from block-based storage, which does not.  File-level storage is an excellent medium for some applications but is not traditionally effective in others.  When an application needs direct block access to disk, file-based storage is not appropriate.  Deployments that fall into this category include some databases and most operating systems.

Network File System (NFS):

NFS is another file based storage protocol.  NFS is traditionally used in Linux and Unix environments.  NFS is also a widely used protocol for VMware environments and can offer several benefits for virtual machine storage.  As a file based storage protocol NFS experiences many of the same limitations as stated for CIFS above.

Hyper Text Transfer Protocol (HTTP) and others:

When the cloud discussion leaves the data center (private/internal cloud) and moves up to the service provider level, such as Google, Amazon, or the telcos, the protocols listed above may not have the necessary scalability.  When you begin talking about supporting thousands of customers with multiple terabytes each, traditional storage protocols may not suffice.  This has to do with both the scalability of the systems and the administration of the disk.  iSCSI and FC both require a fair amount of management for the RAID, volumes, and LUNs, whereas CIFS and NFS require a fair amount for the security and volumes.  Protocols such as HTTP-based storage are being used to simplify storage configuration and increase its scalability.
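
As a contrast with the block-provisioning workflow earlier, storing and retrieving an object over HTTP can be as simple as a PUT and a GET.  The endpoint below is a made-up placeholder; real object services layer authentication, buckets/containers and their own APIs on top of this basic idea.

    import urllib.request

    BASE = "http://storage.example.com/bucket"   # hypothetical object endpoint

    def put_object(key, data):
        req = urllib.request.Request(f"{BASE}/{key}", data=data, method="PUT")
        return urllib.request.urlopen(req).status

    def get_object(key):
        return urllib.request.urlopen(f"{BASE}/{key}").read()

    # No RAID groups, pools, LUN masking or zoning per tenant -- the provider
    # handles placement and protection behind the HTTP interface.
    # put_object("vm-image-001", b"...")
    # data = get_object("vm-image-001")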

Which is the right protocol to use when moving to the cloud?  Obviously there is only one answer!  As always in IT: ‘it depends.’  Each protocol has its uses, benefits and drawbacks.  The most important thing to remember is that most environments can benefit from more than one, or all, of these protocols.  Every application is different and any given protocol may have advantages for a particular app.  The only universal truth in cloud storage is that protocol flexibility will be key.


Virtualization

While not a new concept, virtualization has hit the mainstream over the last few years and become an uncontrollable buzzword driven by VMware and other server virtualization platforms.  Virtualization has been around in many forms for much longer than some realize; things like Logical Partitions (LPARs) on IBM mainframes have been around since the 80s and have been extended to other non-mainframe platforms, and networks have been virtualized by creating VLANs for years.  The virtualization term now gets used for all sorts of things in the data center.  Like it or love it, the term doesn’t look like it’s going away anytime soon.

Virtualization in all of its forms is a pillar of cloud computing, especially in the private/internal cloud architecture.  To define it loosely for the purpose of this discussion, let’s use: ‘the ability to divide a single hardware device or infrastructure into separate logical components.’

Virtualization is key to building cloud-based architectures because it allows greater flexibility and utilization of the underlying equipment.  Rather than requiring separate physical equipment for each ‘tenant,’ multiple tenants can be separated logically on a single underlying infrastructure.  This concept is also known as ‘multi-tenancy.’  Depending on the infrastructure being designed, a tenant can be an individual application, an internal team/department, or an external customer.  There are three areas to focus on when discussing a migration to cloud computing: servers, network, and storage.

Server Virtualization:

Within the x86 server platform (typically the Windows/Linux environment), VMware is the current server virtualization leader.  Many competitors exist, such as Microsoft’s Hyper-V and Xen for Linux, and they are continually gaining market share.  The most common server virtualization allows a single physical server to be divided into logical subsets by creating virtual hardware; this virtual hardware can then have an operating system and application suite installed and will operate as if it were an independent server.  Server virtualization comes in two major flavors: bare-metal virtualization and OS-based virtualization.

Bare-metal virtualization means that a lightweight virtualization-capable operating system is installed directly on the server hardware and provides the functionality to create virtual servers.  OS-based virtualization operates as an application or service within an OS, such as Microsoft Windows, that provides the ability to create virtual servers.  While both methods are commonly used, bare-metal virtualization is typically preferred for production use due to the reduced overhead involved.

Server virtualization provides many benefits but the key benefits to cloud environments are: increased server utilization, and operational flexibility.  Increased utilization means that less hardware is required to perform the same computing tasks which reduces overall cost.  The increased flexibility of virtual environments is key to cloud architectures.  When a new application needs to be brought online it can be done without procuring new hardware, and equally as important when an application is decommissioned the physical resources are automatically available for use without server repurposing.  Physical servers can be added seamlessly when capacity requirements increase.

Network Virtualization:

Network virtualization comes in many forms.  VLANs, LSANs, and VSANs allow a single physical LAN or SAN architecture to be carved up into separate networks without dependence on the physical connection.  Virtual Routing and Forwarding (VRF) allows separate routing tables to be used on a single piece of hardware to support different routes for different purposes.  Additionally, technologies exist which allow single network hardware components to be virtualized in a similar fashion to what VMware does on servers.  All of these tools can be used together to provide the proper underlying architecture for cloud computing.  The benefits of network virtualization are very similar to those of server virtualization: increased utilization and flexibility.

Storage Virtualization:

Storage virtualization encompasses a broad range of topics and features.  The term has been used to define anything from the underlying RAID configuration and partitioning of the disk to things like IBM’s SVC and NetApp’s V-Series, both used for managing heterogeneous storage.  Without getting into what’s right and wrong when talking about storage virtualization, let’s look at what is required for cloud.

First, consolidated storage itself is a big part of cloud infrastructures in most applications.  Having the data in one place to manage can simplify the infrastructure, but it also increases the feature set, especially when virtualizing servers.  At a top level there are two major considerations when looking at storage for cloud environments: flexibility and cost.  The storage should have the right feature set and protocol options to support the initial design goals, and it should also offer the flexibility to adapt as business requirements change.  Several vendors offer great storage platforms for cloud environments depending on the design goals and requirements.  Features that are typically useful for the cloud (and sometimes lumped into virtualization) are:

De-Duplication – Maintaining a single copy of duplicate data, reducing overall disk usage.

Thin-provisioning – Optimizes disk usage by allowing disks to be assigned to servers/applications based on predicted growth while consuming only the used space.  Allows for applications to grow without pre-consuming disk.

Snapshots – Low-disk-use point-in-time records which can be used in operations like point-in-time restores.
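
As a minimal illustration of the de-duplication idea above (real arrays fingerprint fixed-size blocks, inline or post-process, and are far more sophisticated):

    import hashlib

    def dedupe(blocks):
        """Store each unique block once; the logical view becomes pointers."""
        store = {}
        pointers = []
        for block in blocks:
            fingerprint = hashlib.sha256(block).hexdigest()
            store.setdefault(fingerprint, block)   # keep one copy per unique block
            pointers.append(fingerprint)           # duplicates cost only a pointer
        return store, pointers

    # Ten identical OS blocks (think ten cloned VMs) plus one unique block:
    blocks = [b"guest-os-block"] * 10 + [b"app-data-block"]
    store, pointers = dedupe(blocks)
    print(f"logical blocks: {len(pointers)}, physical blocks stored: {len(store)}")
    # -> logical blocks: 11, physical blocks stored: 2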

Overall, virtualization from end to end is the foundation of cloud environments, allowing for flexible, high-utilization infrastructures.


Consolidated I/O

Consolidated I/O (input/output) is a hot topic and has been for the last two years, but it’s not a new concept.  We’ve already consolidated I/O once in the data center and forgotten about it, remember those phone PBXs before we replaced them with IP Telephony?  The next step in consolidating I/O comes in the form of getting management traffic, backup traffic and storage traffic from centralized storage arrays to the servers on the same network that carries our IP data.  In the most general terms the concept is ‘one wire.’  ‘Cable Once’ or ‘One Wire’ allows a flexible I/O infrastructure with a greatly reduced cable count and a single network to power, cool and administer.

Solutions have existed and been used for years to do this; iSCSI (SCSI storage data over IP networks) is one tool that has been commonly used.  The reason the topic has hit the mainstream over the last two years is that 10 Gigabit Ethernet was ratified, and we now have a common protocol with the proper bandwidth to support this type of consolidation.  Prior to 10GE we simply didn’t have the right bandwidth to effectively put everything down the same pipe.

The first thing to remember when discussing I/O consolidation is that, contrary to popular belief, I/O consolidation does not mean Fibre Channel over Ethernet (FCoE.)  I/O consolidation is all about using a single infrastructure and underlying protocol to carry any and all traffic types required in the data center.  The underlying protocol of choice is 10G Ethernet because it’s lightweight and high bandwidth, and Ethernet itself is the most widely used data center protocol today.  Using 10GE and the IEEE standards for Data Center Bridging (DCB) as the underlying data center network, any and all protocols can be layered on top as needed on a per-application basis.  See my post on DCB for more information (http://www.definethecloud.net/?p=31.)  These protocols can be FCoE, iSCSI, UDP, TCP, NFS, CIFS, etc., or any combination of them all.

If you look at the data center today most are already using a combination of these protocols, but typically have 2 or more separate infrastructures to support them.  A data center that uses Fibre Channel heavily has two Fibre Channel networks (for redundancy) and one or more LAN networks. These ‘Fibre Channel shops’ are typically still using additional storage protocols such as NFS/CIFS for file based storage.  The cost of administering, powering, cooling, and eventually upgrading/refreshing these separate networks continues to grow.

Consolidating onto a single infrastructure not only provides obvious cost benefits but also provides the flexibility required for a cloud infrastructure.  Having a ‘Cable Once’ infrastructure allows you to provide the right protocol at the right time on an application basis, without the need for hardware changes.

Call it what you will I/O Consolidation, Network Convergence, or Network Virtualization, a cable once topology that can support the right protocol at the right time is one of the pillars of cloud architectures in the data center.
