Private Cloud Lessons You Can't Learn From Amazon, Google, Etc

When discussing private cloud, questions often come up about how the big boys are doing it: Google, Amazon, Microsoft, etc. The thinking is that the large-scale data centers they run can teach us lessons about smaller-scale infrastructure for private clouds, which, on the surface, seems to make sense: take the lessons learned in big data centers about scale, efficiency and reliability and apply them to smaller-scale private cloud deployments. This approach is not, however, without problems. Very little of what the large public cloud providers do is actually applicable to a private cloud. The reason for this is twofold: scale and application. Read the full post on Network Computing: http://www.networkcomputing.com/private-cloud/232602398

Thought Experiment - Forget ROI

Boys and girls, today's homework assignment is a thought experiment. I want you all to put yourselves in the shoes of the CxO team making a decision to move to private cloud. There is, of course, one catch: you may not factor in ROI. We're dropping ROI because it clouds the subject (bad pun intended). Let's skip the "why should I do this experiment?" question; I'd of course default to "Because I told you so." To read the full story click here.

Ignoring BYOD Spells Disaster!

Last week, in a blog titled "BYOD--Bring Your Own Disaster," I urged caution and scope for BYOD projects. This week I'm playing devil's advocate with myself. A conversation with Greg Knieriemen (@knieriemen) got me thinking about the consequences of ignoring BYOD. Let's dive into the risk of burying your head in the sand and ignoring the BYOD push. To see the full article visit: http://www.networkcomputing.com/private-cloud/232300959.

Horton Hears Hadoop

I'm feeling Seuss-ish, so here goes (lines 1 and 2 by Ken Oestreich, @fountnhead).

Of this poem you should first realize, of course,

Is based on Big Data, and code open-source.

On disk that was spinning sat data quite large

So much that in fact it would fill up a barge.

This data had value.  To realize it hard.

The data named Horton.  His contents were barred.

You see to run queries, we needed some help,

Then one day from Yahoo came a very faint yelp.

I've got it, said Yahoo, we call it Hadoop!

Just give us a minute, we'll give you the scoop.

With this new fangled tool, value we'll recoup.

So Horton sat patient, while Yahoo did tell.

Of a man named Doug Cutting, here we will dwell.

Horton, you are so large your value's obtuse.

But we can fix that, with a tool MapReduce.

This tool comes from Google, it's really quite great.

With it and Apache, your value awaits.

We'll take your large size, distribute it broadly.

Place it on servers, with scale of an army.

Each will have data that sits there quite local.

Data divided and sent as a parcel.

You see with this method my very large friend.

We'll run great queries watch your value transcend.

Task Trackers / Data Nodes will do all the work.

You'll be the big hero, no longer the jerk.

With Name Node in charge of tracking the data.

Job Tracker oversees slaves alpha to zeta.

The workload is spread, we parallel process.

To make some sense of this big data nonsense.

With the power of scale, the smallest of all,

Can still have a seat at the processing ball.

They'll all work in tandem to help sort you out.

And this my friend, is what Hadoop is about.
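
Stepping out of verse for a moment: the poem sketches the classic Hadoop division of labor, where the Name Node tracks where data blocks live, the Job Tracker hands work to Task Trackers, and each node maps over its local data before results are gathered and reduced. Below is a toy Python sketch of that map/shuffle/reduce pattern for a word count (illustrative code of my own, not Hadoop's actual API):

    from collections import defaultdict

    # Toy word count in the MapReduce style: Hadoop would run map tasks on
    # the Data Nodes holding each block; here everything runs in one process.

    def map_phase(chunk):
        # Each "Task Tracker" emits (word, 1) pairs from its local chunk.
        for word in chunk.split():
            yield (word.lower(), 1)

    def shuffle(pairs):
        # The framework groups intermediate pairs by key between phases.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    # "Horton" split into blocks, as a Name Node would track them.
    chunks = ["so much data", "so very much data"]

    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    counts = {key: sum(values) for key, values in shuffle(mapped).items()}
    print(counts)  # {'so': 2, 'much': 2, 'data': 2, 'very': 1}

The line about data "that sits there quite local" is the real trick: Hadoop ships the computation to the node holding the block rather than shipping the data to the computation.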

The Idle Cycle Conundrum

One of the advantages of a private cloud architecture is the flexible pooling of resources, which allows rapid change to match business demands. These resource pools adapt to the changing demands of existing services and allow new services to be deployed rapidly. For these pools to maintain adequate performance, they must be designed to handle peak periods, which also results in periods with idle cycles… To see the full article visit Network Computing: http://www.networkcomputing.com/private-cloud/231903031.
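
As a back-of-the-napkin illustration (the numbers here are invented for the example, not taken from the article), sizing a pool for peak demand makes the idle capacity easy to quantify:

    # Made-up numbers: a pool sized for 6 peak hours sits mostly idle
    # during the other 18, dragging average utilization down.
    hourly_demand = [400] * 18 + [1000] * 6   # VMs needed per hour of the day
    capacity = max(hourly_demand)             # pool must be built for peak

    idle = [capacity - demand for demand in hourly_demand]
    avg_utilization = sum(hourly_demand) / (capacity * len(hourly_demand))

    print(f"Average utilization: {avg_utilization:.0%}")  # 55%
    print(f"Idle capacity off-peak: {idle[0]} VM slots")  # 600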

Blades are Not the Future

Kevin Houston, founder of Blades Made Simple and all-around server and blade rocket surgeon, posted an excellent, thought-provoking article titled 'Why Blade Servers Will Be the Core of Future Data Centers' (http://bladesmadesimple.com/2011/10/why-blade-servers-will-be-the-core-of-future-data-centers/). The article lays out his predictions and thoughts on where the server industry is headed. Kevin walks through several stages of blade server evolution he believes could be coming:

  1. Less I/O expansion, basically less switching infrastructure required in the chassis due to increased bandwidth.
  2. More on-board storage options, possibly utilizing the space reclaimed from I/O modules.
  3. External I/O expansion options such as those offered by Aprius and Xsigo.
  4. Going fully modular at the rack level, extending the concept of a blade chassis to rack size and adding shelves of PCIe, storage and servers.

I jokingly replied to him that he’d invented the ‘rack-mount’ server: the blades are not in a blade chassis but inserted into a rack, access external storage in the same rack, and have connections to shared resources (PCIe) in that rack. The reality is that Kevin’s vision is closer to a mainframe than a rack-mount.

Overall, while I enjoyed Kevin’s post for the thought experiment, I think his vision of the data center future is way off from where we’re headed. To start, I don’t think that blades are the solution for every problem now. I’ve previously summarized my thoughts on that, along with some bad Shakespeare prose, in a blog on my friend Thomas Jones’s site: http://niketown588.com/2010/09/08/to-blade-or-not-to-blade/. The short version: blades aren’t the right tool for every job.

Additionally, I don’t see blades as the long-term future of enterprise and above computing. I look to the way Microsoft, Google, Amazon and Facebook do their computing as the future: cheap commodity rack-mounts en masse. I see the industry transitioning this way. Blades (as we use them today) don’t hold water in that model due to cost, complexity, proprietary nature, etc. Blades are designed to save space and to be highly available; as we start to build our data centers for scale and design our applications for reliability on cloud platforms, highly available server hardware becomes irrelevant. No service is lost when one of the thousands of servers handling Bing search fails; a new server is put in its place and joins the pool of available resources.
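
A rough sketch of that operating model (server names, pool sizes and the failure trigger are all invented for illustration): the resilience lives in the pool logic, not in any individual box.

    import random

    # Toy scale-out resilience: the service lives in the pool, so a failed
    # commodity server is replaced, not repaired, and nothing goes down.
    pool = {f"server-{i:04d}" for i in range(1000)}
    spares = [f"spare-{i:02d}" for i in range(50)]

    def handle_failure(node):
        # Drop the dead node from rotation and join a spare in its place.
        pool.discard(node)
        if spares:
            pool.add(spares.pop())

    failed = random.choice(sorted(pool))
    handle_failure(failed)
    print(f"{failed} failed; pool still serves with {len(pool)} servers")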

If blades, or some transformation of them, were the future, I don’t see it playing out the same way Kevin does. I think Kevin’s end concept is built on a pair of shaky assumptions: external I/O appliances and blade chassis storage.

Let’s start with chassis-based storage (i.e., shared storage in the blade chassis). This is something I’ve never been a fan of, as it limits access to the shared disk to a single chassis, meaning 14 blades max… wait, fewer than 14 blades, because it uses blade slots to provide disk. At very small scale this may make sense because you have an ‘all-in-one’ chassis, but the second you outgrow that (oops, my business got bigger) you’re stuck with small silos of data.

The advantage of this approach, however, is low-latency access and high bandwidth across the blade back/mid-plane, which makes it a more interesting option now that we have lightning-fast SSDs and cache options. You can have extremely high-performance storage within the blade chassis, which provides a lot of options for demanding applications. In these instances local storage in the chassis will be a big hit, but it will not be the majority of deployments without additional features such as EMC’s ‘Project Lightning’ (http://www.emc.com/about/news/press/2011/20110509-05.htm) to free the trapped data from the confines of the chassis.

Next we have external I/O appliances… These have been on my absolute hate list since the first time I saw them. Kevin suggests a device based on industry standards, but current versions are fully proprietary and require not only the vendor’s appliance but also the vendor’s cards in either the appliance or the server; this is the first nightmare. Beyond that, these devices create a single point of failure for the I/O of multiple servers, and they sit directly in the I/O path. You’re basically adding complexity, cost and failure points, and for what? Let’s look at that:

From Aprius’s perspective: ‘Aprius PCI Express over Ethernet technology extends a server's internal PCIe bus out over the Ethernet network, enabling groups of servers to access and share network-attached pools of PCI Express devices, such as flash solid state storage, at PCIe performance levels’ (www.aprius.com). I’d really like to know how you get ‘PCIe performance levels’ over Ethernet infrastructure???

And from Xsigo: ’In the Xsigo wire-once infrastructure you connect each server to the I/O Director via a single cable. Then connect networks and storage to the I/O Director. You're now ready to provision Ethernet and Fibre Channel connections from servers to data center resources in real time’ (http://xsigo.com/). Basically you plug all your I/O into their server/appliance, then cable it to your server via InfiniBand or Ethernet. Why??? You’re adding a device in-band in order to consolidate storage and LAN traffic? FCoE, NFS, iSCSI, etc. already do that on standards-based 10GE or 40GE, with no in-band appliance.

Kevin mentions this as a way to allow more space in the blades for future memory and processor options. This makes sense, as HP, IBM, Dell and Sun designs have already run into barriers, with the height of their blades restricting processor options. This is because the blade form factor was designed years ago and didn’t account for today’s larger processors and heat sinks. Their only workaround is utilizing two blade slots, which consumes too much space per blade. Newer blade architectures like Cisco UCS take modern processors into account and don’t have this limitation, so they don’t require I/O offloading to free up space.

Lastly, I/O offloading as a whole just stinks to me. You still have to get the I/O into the server, right? Which means you’ll still have I/O adapters of some type in the server. With 40GE to the blade shipping this year, why would you require anything else? The GPU and cache storage argument? Sure, go that direction, and then explain to me why you’d want to pull those types of devices off the local PCIe bus and use them remotely, adding latency.

Finally, to end my rant: a rack-size blade enclosure presents a whole lot of lock-in. You’re at the mercy of the vendor you purchased it from for new hardware and support until it’s fully utilized. Sounds a lot like the reason we left mainframes for x86 in the first place, doesn’t it?

Thoughts, corrections, comments and sheer hate mail always appreciated! What do you think?

Choosing The Right Private Cloud Storage

One of the key decisions in architecting an infrastructure for private cloud is selecting a storage platform for the deployment. Storage is a key component of the infrastructure and will play a major role in the overall performance of the private cloud. The storage decision carries additional weight due to its larger investment and typically longer refresh cycle…

To view the full article visit Network Computing: http://www.networkcomputing.com/private-cloud/231901384

WWT's Geek Day 2012

A BrightTALK Channel

Private Cloud: It’s Not About ROI

Most private cloud discussions revolve around the return on investment of the architecture. Many discussions begin, and quickly end, with ROI. The reason is that ROI is very difficult to show in real numbers for any IT investment, and even more so when the majority of the costs are soft costs.

ROI is an important factor and can’t be left out of discussions, but it’s not the only factor, and likely not the most important one.

To read the rest see the blog on Network Computing (no registration required): http://www.networkcomputing.com/private-cloud/231601280

How to Boost Cloud Reliability

Clouds fail. That’s a fact. But if your company uses business apps that are tied to the availability of public cloud services, you can—and must—take steps to mitigate these failures by getting schooled on a few key factors:  service-level agreements (SLAs), redundancy options, application design, and the type of service being used. We’ll outline how these factors affect the availability of your applications in the cloud…
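
To put rough numbers behind the redundancy factor (illustrative availability figures, not ones from the article), independent deployments multiply out their failure probabilities:

    # Illustrative math: two independent deployments at 99% availability
    # each are both down only 0.01 * 0.01 = 0.0001 of the time.
    def combined_availability(levels):
        # Probability that at least one independent deployment is up.
        p_all_down = 1.0
        for availability in levels:
            p_all_down *= (1.0 - availability)
        return 1.0 - p_all_down

    single = 0.99
    redundant = combined_availability([0.99, 0.99])
    print(f"One region: {single:.2%}, two regions: {redundant:.2%}")
    # One region: 99.00%, two regions: 99.99%

The catch is independence: deployments that share a provider or region tend to fail together, which is where application design comes in.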


Read my full article in the August issue of Network Computing (For IT by IT). (Requires a free registration; my apologies.)

http://www.informationweek.com/nwcdigital/nwcaug11?k=nwchp&cid=onedit_ds_nwchp