HP’s FlexFabric

There were quite a few announcements this week at the HP Technology Forum in Vegas.  Several of them were extremely interesting; the ones that resonated most with me were:

Superdome 2:

I’m not familiar with the original Superdome, nor am I in any way an expert on non-x86 architectures.  In fact that’s exactly what struck me as excellent about this product announcement.  It allows the mission-critical servers that a company chooses to, or must, run on non-x86 hardware to run right alongside the more common x86 architecture in the same chassis.  This further consolidates the data center and reduces infrastructure for customers with mixed environments, of which there are many.  While some customers are currently pushing to migrate all data center applications onto x86-based platforms, that migration is not fast, cheap, or good for every use case.  Superdome 2 provides a common infrastructure for both the mission-critical applications and the x86-based applications.

For a more technical description see Kevin Houston’s Superdome 2 blog: http://bladesmadesimple.com/2010/04/its-a-bird-its-a-plane-its-superdome-2-on-a-blade-server/.

Note: As stated, I’m no expert in this space and have no technical knowledge of the Superdome platform, but conceptually it makes a lot of sense and seems like a move in the right direction.

Common Infrastructure:

There was a lot of talk in some of the keynotes about a common look, feel, and infrastructure across the separate HP systems (storage, servers, etc.)  At first I laughed this off as a ‘who cares’ but then I started to think about it.  If HP takes this message seriously and standardizes rail kits, cable management, components (where possible), etc., this will have big benefits for administration and deployment of equipment.

If you’ve never done a good deal of racking/stacking of data center gear you may not see the value here, but I spent a lot of time on the integration side with this as part of my job.  Within a single vendor (or sometimes product line) rail kits for server/storage, rack mounting hardware, etc. can all be different.  This adds time and complexity to integrating systems and can sometimes lead to less-than-ideal builds.  For example the first vBlock I helped a partner configure (for demo purposes only) had the two UCS systems stacked on top of one another on the bottom of the rack with no mounting hardware.  The reason for this was that the EMC racks being used had different rail mounts than the UCS system was designed for.  Issues like this can cause problems and delays, especially when the people in charge of infrastructure aren’t properly engaged during purchasing (very common.)

Overall I can see this as a very good thing for the end user.

HP FlexFabric

This is the piece that really grabbed my attention while watching the constant Twitter stream of HP announcements.  HP FlexFabric brings network consolidation to the HP blade chassis.  I specifically say network consolidation, because HP got this piece right.  Yes it does FCoE, but that doesn’t mean you have to.  FlexFabric provides the converged network tools to provide any protocol you want over 10GE to the blades and split that out to separate networks at a chassis level.  Here’s a picture of the switch from Kevin Houston’s blog: http://bladesmadesimple.com/2010/06/first-look-hps-new-blade-servers-and-converged-switch-hptf/.

HP Virtual Connect FlexFabric 10Gb/24-Port Module

The first thing to note when looking at this device is that all the front-end uplink ports look the same, so how do they split out Fibre Channel and Ethernet?  The answer is that QLogic (the manufacturer of the switch) has been doing some heavy lifting on the engineering side.  They’ve designed the front-end ports to support the optics for either Fibre Channel or 10GE.  This means you’ve got flexibility in how you use your bandwidth.  Doing this per port is an industry first; although the Cisco Nexus 5000 ASIC has been capable of it since FCS, there it was implemented on a per-module basis rather than the per-port basis of this switch.

The next piece that was quite interesting, and really provides flexibility and choice in the HP FlexFabric concept, is the decision to use Emulex’s OneConnect adapter as the LAN on Motherboard (LOM.)  This was a very smart decision by HP.  Emulex’s OneConnect is a product that has impressed me from square one; it shows a traditionally Fibre Channel company embracing the fact that Ethernet is the future of storage without locking the decision into an Upper Layer Protocol (ULP.)  OneConnect provides 10GE connectivity, TCP offload, iSCSI offload/boot, and FCoE capability all on the same card.  Now that’s a converged network!  HP seems to have seen the value there as well and built this into the system board.

Take a step back and soak that in: LOM has been owned by Intel, Broadcom, and other traditional NIC vendors since the beginning.  Until last year Emulex was looked at as one of two solid FC HBA vendors.  As of this week HP has announced the ousting of a traditional NIC vendor in favor of a traditional FC vendor on its system board.  That’s a big win for Emulex.  Kudos to Emulex for the technology (and the business decisions behind it) and to HP for recognizing that value.

Looking a little deeper, the next big piece of this overall architecture is that the whole FlexFabric system supports HP’s FlexConnect technology, which allows a server admin to carve up a single physical 10GE link into four logical links that are presented to the OS as individual NICs.
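To make the carving idea concrete, here’s a minimal sketch in Python of the constraint FlexConnect is enforcing: at most four logical NICs per physical link, with allocations that fit inside the 10GE pipe.  The function name, NIC names, and numbers are my own illustration, not HP’s actual configuration model.

```python
# Hypothetical sketch of the FlexConnect idea: one physical 10GE link
# carved into up to four logical NICs, each with its own bandwidth share.
# Names and numbers are illustrative only.

PHYSICAL_LINK_GBPS = 10
MAX_FLEXNICS = 4

def carve_flexnics(allocations):
    """Validate a set of logical NIC allocations against the physical link.

    allocations: dict of logical NIC name -> bandwidth in Gbps.
    Returns the allocations if valid, raises ValueError otherwise.
    """
    if len(allocations) > MAX_FLEXNICS:
        raise ValueError("at most four logical NICs per physical link")
    total = sum(allocations.values())
    if total > PHYSICAL_LINK_GBPS:
        raise ValueError(
            f"allocations total {total}Gb, exceeding the {PHYSICAL_LINK_GBPS}GE link")
    return allocations

# Example: management, vMotion, production, and storage traffic on one link.
nics = carve_flexnics({"mgmt": 1, "vmotion": 2, "prod": 4, "storage": 3})
```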

The only drawback I see in the FlexFabric picture is that FCoE is only used within the chassis and is split into separate networks from there.  This can definitely increase the required infrastructure depending on the architecture.  I’ll wait to go too deep into that until I hear a few good lines of thinking on why that direction was taken.


HP had a strong week in Vegas.  These were only a few of the announcements; several others, including mind-blowing stuff from HP Labs (start protecting John Connor now), can be found on blogs and HP’s website.  Of all of the announcements FlexFabric was the one that really caught my attention.  It embraces the idea of I/O consolidation without clinging to FCoE as the only way to do it, and it greatly increases the competitive landscape in that market, which always benefits the end-user/customer.

Comments, corrections, bitches, moans, gripes, and complaints all welcome.


Collapsing Server Management Points with UCS

I was invited to post a blog on the WWT virtualization blog, I chose to discuss Cisco Pass-Through switching with UCS.  See the blog there at: http://vblog.wwtlab.com/2010/06/22/collapsing-server-management-points-with-ucs/.

There are several other great posts there you should take a look at.


Building a Hybrid Cloud

At a recent Data Center Architect summit I attended, cloud computing was a key focus.  Of the concepts that were discussed, one recurring theme was Hybrid Clouds.  Conceptually a Hybrid Cloud is a mix of any two cloud types, typically thought of as a mix of Private Cloud and Public Cloud services.  For more information on the cloud types see my previous post on the subject (http://www.definethecloud.net/?p=109.)  There are several great use cases for this type of architecture; the two that resonate most with me are:

Cloud Bursting: 

Not to be confused with the psychokinesis exercise from “The Men Who Stare At Goats.”  Cloud Bursting is the ability to utilize public cloud resources for application burst requirements during peak periods.  This allows a company to maintain performance during expected or unexpected peaks without maintaining additional hardware.  Simply put: on-demand capacity.  Companies with varying workloads can maintain core processing in house and burst into the cloud for peaks.

Disaster Recovery / Business Continuity:

Business continuity is a concern for customers of all shapes and sizes but can be extremely costly to implement well.  For the companies that don’t have the budget of a major oil company, bank, etc. maintaining a DR site is typically out of the question.  Lower cost solutions/alternatives exist but the further down the spectrum you move the less data/capability you’ll recover and the longer it will take to do that.  In steps cloud based DR/Business continuity services.  Rather than maintaining your own DR capabilities you contract the failover out to a company that maintains multi-tenant infrastructure for that purpose and specializes in getting your business back online quickly.

Overall I’m an advocate for properly designed hybrid clouds as they provide the ability to utilize cloud resources while still maintaining control of the services/data you don’t want in the cloud.  Even more appealing from my perspective is the ability to use private and hybrid-clouds as a migration strategy for a shift to fully public cloud based IT infrastructure.  If you begin building your applications for in-house cloud infrastructures you’ll be able to migrate them more easily to public clouds.  There are also tools available to use your private cloud exactly as some major public cloud providers do to make that transition even easier.

We also thoroughly covered barriers to adoption for hybrid cloud architectures.  Most of the considerations were the usual concerns:

  • Compliance
  • Security
  • Performance
  • Standardization
  • Service Level Agreements (SLA)

There were two others discussed that I see as the key challenges: Organizational and cloud lock-in.


In my opinion organizational challenges are typically the most difficult to overcome when adopting new technology.  If the organization is on board and properly aligned to the shift, it will find solutions for the technical challenges.  Cloud architectures of all types require a change in organizational structure.  Companies that attempt to move to cloud architecture without first considering organizational structure will at best have a painful transition and at worst fail and pull back to siloed data centers.  I have a post covering some of the organizational challenges in more detail (http://www.definethecloud.net/?p=122.)

Cloud Lock-In:

Even more interesting was the concept of not just moving applications and services into the cloud, but also being able to move them out.  This is a very interesting concern because it means cloud computing has progressed through the acceptance stages.  Customers and architects have moved past whether migration to the cloud will happen, and how applications will be migrated, on to how to get them back if wanted.  There are several situations in which this may become important:

  • Service does not perform as expected
  • Cloud provider runs into business problems (potential for bankruptcy, etc.)
  • Your application outgrows what your provider can deliver
  • Compliance or regulations change
  • etc.

In order for the end-user of cloud services to be comfortable migrating applications into the cloud they need to be confident that they can get them back if/when they want them.  Cloud service providers who make their services portable to other vendor offerings will gain customers more quickly and most likely maintain them longer.


Cloud computing still has several concerns but none of them are road-blocks.  A sound strategy with a focus on analyzing and planning for problems will help ensure a successful migration.  The one major thing I gained from the discussion this week was that cloud has moved from an argument of should we/shouldn’t we to how do we make it happen and ensure it’s a smooth transition.


Recent Conversation with the Founder of Tolly Group

This morning I had a brief and interesting exchange with the founder of the Tolly Group, Kevin Tolly.  Because Kevin sent this exchange to my blogging email account, my assumption was that he intended it to be added to the blog.  Kevin was responding to the tweets posted below, above the email exchange; he was even kind enough to correct my spelling, which was quite convenient, so I’ll be sure to send my future emails to him first for editing.  I’ve reordered the emails top to bottom in chronological order, following the tweets that prompted the exchange.

Original tweets:

RT @********: New post summarizing a new Tolly Group report http://bit.ly/d1EFaJ < enough with the @TollyGroup #failures Seriously

@jonisick Let me know what specifically you have issues with in this http://bit.ly/d1EFaJ#analystsprofiling #failure

@******** I haven’t even looked yet but thanks to @tollygroups open bias total technical inaccuracy and HP funding I don’t have to. #bs

@******** HP makes some great products, go have them tested by a real non-biased analyst. @tollyGroup is an absolute #joke 

The person I was conversing with is extremely technically savvy and I was also not questioning the product.  Instead I’m questioning the value of anything published by Tolly after seeing the recent reports they’ve released on UCS which have been shredded repeatedly by engineers, bloggers, and vendors.  Remember these are my opinions and posted on my personal blog as such.

The exchange with Tolly’s founder:

Dear Sir,

It has come to my attention that you have certain unspecified issues with our recently-published report on HP’s X3820 offering. Would you be so kind as to advise me on the specific issues you have with this report?  http://www.tolly.com/DocDetail.aspx?DocNumber=210122


Kevin Tolly




I have not looked at the specific report yet, and most likely will not. Tolly has proven through a series of recent reports to be extremely biased and technically innacurate. HP is spending quite a bit of money with Tolly to test specific equipment in specific ways to show their strengths and others weaknesses.

After digging through the HP funded UCS reports from Tolly it’s obvious that Tolly is willing to be as innacurate as the paying customer would like and test equipment that they have not thoroughly taken the time to understand.

Tests like your recent set completely tarnish any idea of independant accurate testing from Tolly, and because of that I only read the reports when I need to speak to the innacuracies.



Dear Mr Onisick,

I repeat my request. You have made public comments about a specific report. If you believe there to be inaccurate information in that report, I would be pleased to review your concerns.

Otherwise, I would think it prudent to refrain from criticizing a report that you admit that you have not read. Please also be aware that public comments such as yours can be viewed as libelous.


Kevin Tolly

P.S. Please spell check: independent and inaccurate are spelled inaccurately in your message.



I greatly appreciate the threat, thanks. Please take a look back at my comments and you’ll notice I made no specific comments about that particular report. I’m sure both your time and your lawyers time can be best spent elsewhere.



People need to find better things to do with their time.  If you’d like a great overview of what vendor funded independent testing is actually worth take a look at Dave Alexander’s blog on the subject: http://www.unifiedcomputingblog.com/?p=161.


UCS Server Failover

I spent the day today with a customer doing a proof of concept and failover testing demo on a Cisco UCS, VMware and NetApp environment.  As I sit on the train heading back to Washington from NYC I thought it might be a good time to put together a technical post on the failover behavior of UCS blades.  UCS has some advanced availability features that should be highlighted; it additionally has some areas where failover behavior may not be obvious.  In this post I’m going to cover server failover situations within the UCS system, without heading very deep into the connections upstream to the network aggregation layer (mainly because I’m hoping Brad Hedlund at http://bradhedlund.com will cover that soon; hurry up Brad 😉)

**Update** Brad has posted his UCS Networking Best Practices Post I was hinting at above.  It’s a fantastic video blog in HD, check it out here: http://bradhedlund.com/2010/06/22/cisco-ucs-networking-best-practices/

To start this off let’s get up to a baseline level of understanding on how UCS moves server traffic.  UCS is comprised of a number of blade chassis and a pair of Fabric Interconnects (FI.)  The blade chassis hold the blade servers, and the FIs handle all of the LAN and SAN switching as well as the chassis/blade management that is typically done using six separate modules per blade chassis in other implementations.

Note: When running redundant Fabric Interconnects you must configure them as a cluster using the L1 and L2 cluster links between the FIs.  These ports carry only cluster heartbeat and high-level system messages, no data traffic or Ethernet protocols, and therefore I have not included them in the following diagrams.

UCS Network Connectivity


Each individual blade gets connectivity to the network(s) via mezzanine form factor I/O card(s.)  Depending on which blade type you select, each blade will have either one redundant set of connections to the FIs or two.  Regardless of the type of I/O card you select you will always have a 1x10GE connection to each FI through the blade chassis I/O module (IOM.)

UCS Blade Connectivity

In the diagram you’re seeing the blade connectivity for a blade with a single mezzanine slot.  You can see that the blade is redundantly connected to both Fabric A and Fabric B via 2x10GE links.  This connection occurs via the IOM, which is not a switch itself and instead acts as a remote device managed by the Fabric Interconnect.  What this means is that all forwarding decisions are handled by the FIs and frames are consistently scheduled within the system regardless of source and/or destination.  The total switching latency of the UCS system is approximately equal to that of a top-of-rack switch or the blade form factor LAN switches in other blade products.  Because the IOM is not making switching decisions, it needs another method to move traffic from its eight internal mid-plane ports upstream using its four available uplinks.  The method it uses is static pinning.  This method provides very elegant switching behavior with extremely predictable failover scenarios.  Let’s first look at the pinning, and later at what this means for UCS network failures.

Static Pinning

The chart above shows the static pinning mechanism used within UCS.  Given the configured number of uplinks from IOM to FI you will know exactly which uplink port a particular mid-plane port is using.  Each half-width blade attaches to a single mid-plane port and each full-width blade attaches to two.  There is no pinning mechanism for three ports in the chart because that configuration is not supported; if three links are used, the two-port method defines how uplinks are utilized.  This is because eight devices cannot be evenly load-balanced across three links.
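The pinning chart can be expressed as a small function.  This is my own sketch, assuming the chart follows a simple round-robin pattern, which matches the failure example later in the post where blades one and five share uplink one when four uplinks are in use.

```python
# Sketch of UCS IOM static pinning: each of the eight mid-plane ports is
# deterministically pinned to one IOM uplink.  Assumes a round-robin
# pattern consistent with the pinning chart; three active uplinks fall
# back to the two-uplink scheme, since eight ports cannot be balanced
# evenly across three links.

def pinned_uplink(midplane_port, active_uplinks):
    """Return the IOM uplink (1-based) a mid-plane port (1-8) is pinned to."""
    if not 1 <= midplane_port <= 8:
        raise ValueError("mid-plane ports are numbered 1-8")
    if active_uplinks == 3:          # unsupported; uses the 2-link pinning
        active_uplinks = 2
    if active_uplinks not in (1, 2, 4):
        raise ValueError("IOMs support 1, 2, or 4 uplinks")
    return (midplane_port - 1) % active_uplinks + 1

# With four uplinks, mid-plane ports 1 and 5 (blades one and five when
# using half-width blades) share uplink 1:
assert pinned_uplink(1, 4) == 1 and pinned_uplink(5, 4) == 1
```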

IOM Connectivity


The example above shows the numbering of mid-plane ports.  If you were using half-width blades their numbering would match.  When using full-width blades each blade has access to a pair of mid-plane ports (1-2, 3-4, 5-6, 7-8.)  In the example above blade three would utilize mid-plane port three in the left example and one in the second, based on the static pinning in the chart.

So now let’s discuss how failover happens, starting at the operating system.  We have two pieces of failover to discuss: NIC teaming and SAN multi-pathing.  In order to understand them we need a simple logical view of how a UCS blade sees the world.

UCS Logical Connectivity


In order to simplify your thinking when working with blade systems, reduce your logical diagram to the key components by removing the blade chassis itself from the picture.  Remember that a blade is nothing more than a server connected to a set of switches; the only difference is that the first hop link is on the mid-plane of the chassis rather than a cable.  The diagram above shows that a UCS blade is logically cabled directly to redundant Storage Area Network (SAN) switches for Fibre Channel (FC) and to the FIs for Ethernet.  Out of personal preference I leave the FIs out of the SAN side of the diagram because they operate in N_Port Virtualizer (NPV) mode, which means forwarding decisions are handled by the upstream NPIV standard compliant SAN switch.

Starting at the Operating System (OS) we will work up the network stack to the FIs to discuss failover.  We will assume FCoE is being used; if you are not using FCoE, ignore the FC piece of the discussion, as the Ethernet behavior remains the same.

SAN Multi-Pathing:

SAN multi-pathing is the way we obtain redundancy in FC, FCoE, and iSCSI networks.  It is used to provide the OS with two separate paths to the same logical disk.  This allows the server to access the data in the event of a failure and in some cases load-balance traffic across two paths to the same disk.  Multi-pathing comes in two general flavors: active/active or active/passive.  Active/active load balances and has the potential to use the full bandwidth of all available paths.  Active/passive uses one link as a primary and reserves the others for failover.  Typically the deciding factor is cost vs. performance.

Multi-pathing is handled by software residing in the OS usually provided by the storage vendor.  The software will monitor the entire path to the disk ensuring data can be written and/or read from the disk via that path.  Any failure in the path will cause a multi-pathing failover.

Multi-Pathing Failure Detection


Any of the failures designated by the X’s in the diagram above will trigger failover; this also includes failure of a storage controller itself, which is typically redundant in an enterprise-class array.  SAN multi-pathing is an end-to-end failure detection system.  This is much easier to implement in a SAN, as there is one constant target, as opposed to a LAN where data may be sent to several different targets across the LAN and WAN.  Within UCS, SAN multi-pathing does not change from the system used for standalone servers.  Each blade is redundantly connected and any path failure will trigger a failover.
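The active/passive behavior described above can be sketched in a few lines: software in the OS monitors the health of each path to the logical disk and diverts I/O when the active path stops responding.  The path names and health model here are purely illustrative, not any specific multi-pathing vendor’s API.

```python
# Minimal sketch of active/passive multi-pathing failover: the OS-resident
# software tracks end-to-end path health and fails over when the active
# path to the logical disk is no longer usable.  Illustrative only.

def select_path(paths, active):
    """Return the path to use given current health; fail over if needed.

    paths: dict of path name -> bool (True if the full path is healthy).
    active: name of the currently preferred (active) path.
    """
    if paths[active]:
        return active                      # active path healthy, keep using it
    for name, healthy in paths.items():    # otherwise fail over
        if healthy:
            return name
    raise RuntimeError("no healthy path to the logical disk")

paths = {"fabric_a": True, "fabric_b": True}
assert select_path(paths, "fabric_a") == "fabric_a"
paths["fabric_a"] = False                  # any failure along path A
assert select_path(paths, "fabric_a") == "fabric_b"
```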


NIC Teaming:

NIC teaming is handled in one of three general ways: active/active load-balancing, active/passive failover, or active/active transmit with active/passive receive.  The teaming type you use is dependent on the network configuration.

Supported teaming Configurations


In the diagram above we see two network configurations: one with a server dual connected to two switches, and a second with a server dual connected to a single switch using a bonded link.  Bonded links act as a single logical link with the redundancy of the physical links within.  Active/active load-balancing is only supported using a bonded link due to the MAC address forwarding decisions of the upstream switch.  In order to load balance, an active/active team will share a logical MAC address; this will cause instability upstream and lost packets if the upstream switches don’t see both links as a single logical link.  This bonding is typically done using the Link Aggregation Control Protocol (LACP) standard.

If you glance back up at the UCS logical connectivity diagram you’ll see that UCS blades are connected in the method on the left of the teaming diagram.  This means that our options for NIC teaming are active/passive failover and active/active transmit-only.  This assumes a bare-metal OS such as Windows or Linux installed directly on the hardware; in virtualized environments such as VMware all links can be actively used for transmit and receive because another layer of switching occurs in the hypervisor.

I typically get feedback that the lack of active/active NIC teaming on bare-metal UCS blades is a limitation.  In reality this is not the case.  Remember that active/active NIC teaming was traditionally used on 1GE networks to provide greater than 1GE of bandwidth, limited to a max of eight aggregated links for a total of 8GE.  A single UCS link at 10GE provides 25% more bandwidth than an 8-port active/active team.

NIC teaming, like SAN multi-pathing, relies on software in the OS, but unlike SAN multi-pathing it typically detects only link failures and in some cases loss of a gateway.  Due to the nature of the UCS system, NIC teaming in UCS will detect failures of the mid-plane path, the IOM, the utilized link from the IOM to the Fabric Interconnect, or the FI itself.  This is because the IOM acts as a linecard of the FI and the blade is logically connected directly to the FI.

UCS Hardware Failover:

UCS has a unique feature on several of the available mezzanine cards that provides hardware failure detection and failover on the card itself.  Basically, some of the mezzanine cards have a mini-switch built in with the ability to fail over from path A to path B or vice versa.  This provides additional failover functionality and improved bandwidth/failure management.  This feature is available on the Generation 1 Converged Network Adapters (CNA) and the Virtual Interface Card (VIC) and is currently only available in UCS blades.

UCS Hardware Failover


UCS hardware failover provides greater failure visibility than traditional NIC teaming due to the advanced intelligence built into the FI as well as the overall architecture of the system.  In the diagram above, HW failover detects mid-plane path, IOM, and IOM uplink failures as link failures due to the architecture.  Additionally, if the FI loses its upstream network connectivity to the LAN it will signal the mezzanine card, triggering failover.  In the diagram above, any failure at a point designated by an X will trigger the mezzanine card to divert Ethernet traffic to the B path.  UCS hardware failover applies only to Ethernet traffic, as SAN networks are built as redundant independent networks and would not support this failover method.

Using UCS hardware failover provides two key advantages over other architectures:

  • Allows redundancy for NIC ports in separate subnets/VLANs, which NIC teaming cannot do.
  • Provides the ability for network teams to define the failure capabilities and primary path for servers, alleviating misconfigurations caused by improper NIC teaming settings.

IOM Link Failure:

The next piece of UCS server failover involves the I/O modules themselves.  Each I/O module has a maximum of four 10GE uplinks serving 8x10GE mid-plane connections to the blades, at an oversubscription of 1:1 to 8:1 depending on configuration.  As stated above, UCS uses a static, non-configurable pinning mechanism to assign a mid-plane port to a specific uplink from the IOM to the FI.  Using this pinning system allows the IOM to operate as an extension of the FI without the need for Spanning Tree Protocol (STP) within the UCS system.  Additionally, this system provides a very clear basis for designing oversubscription in both nominal and failure situations.

For the discussion of IOM failover we will use an example of a max configuration of 8 half-width blades and 4 uplinks on each redundant IOM.

Fully Configured 8 Blade UCS Chassis

In this diagram each blade is redundantly connected via 2x10GE links, one through each IOM to each FI.  Both IOMs and FIs operate in an active/active fashion from a switching perspective, so each blade in this scenario has a potential bandwidth of 20GE depending on the operating system configuration.  The overall blade chassis is configured with 2:1 oversubscription in this diagram, as each IOM is using its max of 4x10GE uplinks while providing its max of 8x10GE mid-plane links for the eight blades.  If every blade were to attempt to push a sustained 20GE of throughput at the same time (a very unlikely scenario) each would receive only 10GE because of this oversubscription.  The bandwidth can be finely tuned to ensure proper performance in congestion scenarios such as this one using Quality of Service (QoS) and Enhanced Transmission Selection (ETS) within the UCS system.
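The oversubscription arithmetic above is simple enough to check directly.  This little helper just formalizes the ratio of server-facing bandwidth to uplink bandwidth per IOM; the function name is mine.

```python
# Oversubscription arithmetic for a UCS IOM: server-facing mid-plane
# bandwidth divided by uplink bandwidth to the Fabric Interconnect.
# Each IOM offers up to 8x10GE mid-plane ports and up to 4x10GE uplinks.

def oversubscription(midplane_ports, uplinks, port_speed_ge=10):
    """Return the oversubscription ratio (e.g. 2.0 means 2:1)."""
    server_facing_bw = midplane_ports * port_speed_ge
    uplink_bw = uplinks * port_speed_ge
    return server_facing_bw / uplink_bw

assert oversubscription(8, 4) == 2.0   # fully configured chassis: 2:1
assert oversubscription(8, 1) == 8.0   # single uplink: 8:1
```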

In the event that a link fails between the IOM and the FI the servers pinned to that link will no longer have a path to that FI.  The blade will still have a path to the redundant FI and will rely on SAN multi-pathing, NIC teaming and or UCS hardware failover to detect the failure and divert traffic to the active link.

For example, if link one on IOM A fails, blades one and five lose connectivity through Fabric A, and any traffic using that path fails over to link one on Fabric B, ensuring the blades are still able to send and receive data.  When link one on IOM A is repaired or replaced, traffic can immediately start using the A path again.

IOM A will not automatically divert traffic from blades one and five to an operational link, nor is this possible through a manual process.  The reason is that diverting blade one and five’s traffic to the remaining links would further oversubscribe those links and degrade servers that should be unaffected by the failure of link one.  In a real-world data center a failed link will be quickly replaced, and the only servers affected are blades one and five.

In the event that the link cannot be repaired quickly there is a manual process called re-acknowledgement which an administrator can perform.  This process re-pins the IOM-to-FI links based on the number of active links, using the same static pinning referenced above.  In the above example servers would be re-pinned based on two active ports, because three-port configurations are not supported.
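To illustrate re-acknowledgement, here’s a self-contained sketch of the re-pinning outcome.  It assumes the simple round-robin pinning pattern from the chart earlier in the post; with one of four uplinks failed, the two-uplink scheme is used, since three active links are not a supported pinning configuration.

```python
# Sketch of re-acknowledgement: re-pin all eight mid-plane ports based on
# the number of active IOM uplinks, assuming round-robin static pinning.
# Three active links are not supported, so the two-link scheme is used.

def repin_after_failure(original_uplinks, failed_links):
    """Return a dict of mid-plane port (1-8) -> pinned uplink after re-ack."""
    active = original_uplinks - failed_links
    if active == 3:           # unsupported pinning config; fall back to two
        active = 2
    if active not in (1, 2, 4):
        raise ValueError("resulting uplink count must be 1, 2, or 4")
    return {port: (port - 1) % active + 1 for port in range(1, 9)}

# Four uplinks with one failed: all eight mid-plane ports re-pin across
# two links, restoring Fabric A connectivity for every blade.
pinning = repin_after_failure(4, 1)
assert sorted(set(pinning.values())) == [1, 2]
```

Note the trade-off the post describes: re-acknowledgement restores the A path for blades one and five, but at the cost of raising oversubscription on the surviving links for every blade in the chassis.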

Overall this failure method and static pinning mechanism provides very predictable bandwidth management as well as limiting the scope of impact for link failures.


The UCS system architecture is uniquely designed to minimize management points and maximize link utilization by removing dependence on STP internally.  Because of its unique design network failure scenarios must be clearly understood in order to maximize the benefits provided by UCS.  The advanced failure management tools within UCS will provide for increased application uptime and application throughput in failure scenarios if properly designed.


Building a Private Cloud

Private clouds are currently one of the most popular concepts in cloud computing.  They promise the flexibility of cloud infrastructures without sacrificing the control of owning and maintaining your own data center.  For a definition of cloud architectures see my previous blog on Cloud Types  (http://www.definethecloud.net/?p=109.)

A private cloud is an architecture owned by an individual company, typically for internal use.  In order to be considered a true cloud architecture it must layer automation and orchestration over a robust, scalable infrastructure.  The intent of private clouds is an infrastructure that reacts fluidly to business changes, scaling up and down as applications and requirements change.  Typically consolidation and virtualization are the foundation of these architectures, with advanced management, monitoring, and automation systems layered on top.  In some cases this can be taken a step further by loading cloud management software suites on the underlying infrastructure to provide an internal self-service Software as a Service (SaaS) or Platform as a Service (PaaS) environment.

Private cloud architectures provide the additional benefit of being an excellent way to test the ability for a company to migrate to public cloud architecture.  Additionally if designed correctly private clouds also act as a migration step to public clouds by migrating applications onto cloud based platforms without exporting them to a cloud service host.  Private clouds can also be used in conjunction with public clouds in order to leverage public cloud resources for extra capacity, failover, or disaster recovery purposes.  This use is known as a hybrid cloud.

Private cloud architectures can be done in a roll-your-own fashion, selecting best of breed hardware, software, and services to build the appropriate architecture.  This can maximize the reuse of existing equipment while providing a custom tailored solution to achieve specific goals.  The drawback with roll-your-own solutions is that it requires extensive in-house knowledge in order to architect the solution properly.

A more common practice for migration into private clouds is to use the packaged solutions offered by the major IT vendors; companies like IBM, Sun, Cisco, and HP have announced cloud or cloud-like architecture solutions and initiatives.  These provide a more tightly coupled solution and, in some cases, a single point of contact and expertise for the complete solution.  These types of solutions can expedite your migration to the private cloud.

When selecting the hardware and software for private cloud infrastructures ensure you do your homework.  Work with a trusted integrator or reseller with expertise in the area, gather multiple vendor proposals and read the fine print.  These solutions are not all created equal.  Some of the offered solutions are no more than vaporware and a good number are just repackaging of old junk in a shiny new part number.  Some will support a staged migration and others will require rip-and-replace or at least a new build out.

There are several key factors I would focus on when selecting a solution:

Compatibility and Support:

Tested compatibility and simplified support are key factors that should be considered when choosing a solution.  If you use products from multiple vendors that don’t work together you’ll need to tie the support pieces together in-house and may need to open and maintain several support tickets when things go awry.  Additionally if compatibility hasn’t been tested or support isn’t in place for a specific configuration you may be up a creek without a paddle when something comes up.

Flexibility vs. Guaranteed Performance:

Some of the available solutions are very strict on hardware types and quantities but in return provide performance guarantees that have been thoroughly tested. This is a trade off that must be considered.

Hardware subcomponents of the solution:

Building a private cloud is a large commitment to both architectural and organizational changes.  Real Return On Investment (ROI) won’t be seen without both.  When making that kind of investment you don’t want to end up with a subpar component of your infrastructure (software or hardware) because your vendor tried to bundle their best of breed X and best of breed Y with their so so Z.  Getting everything under one corporate logo has its pros and cons.

Hardware Virtualization and Abstraction:

A great statement I’ve heard about cloud computing was that when defining it if you start talking about hardware you’re already wrong (I don’t remember the source so if you know it please comment.)  This is because cloud is more about the process and people than the equipment.  When choosing hardware/software for private cloud keep this in mind.  You don’t want to end up with a private cloud that can’t flex because your software and process is tied to the architecture or equipment underneath.


Private cloud architectures provide a fantastic set of tools to regain control of the data center and turn it back into a competitive advantage rather than a hole to throw money into.  Many options and technologies exist to accelerate your journey to private cloud but they must be carefully assessed.  If you don’t have the in-house expertise but are serious about cloud there are lots of consultant and integrator options out there to help walk you through the process.
