TCO and ROI: A Cadillac Perspective

This summer I traded in my beloved Toyota 4 Runner for a Cadillac CTS-V Black Diamond edition.  There was no issue with the 4 Runner and I owed nothing on it.  I was not in need of an extra payment; I just wanted the V.  Maybe it was a mid-life crisis, maybe it was something else.  In reality it was the car itself, something about a Corvette supercar engine in iconic Cadillac luxury.  Something about a sports car that can take a stock Porsche off the line in a brand that most assume comes standard with a walker and a turn signal that never turns off.  Something about sitting comfortably in an American-made car that could dust the top German luxury sports cars at 20K less in price.  I love my decision.

When making my decision I went through a thought process very similar to assessing data center infrastructure (OK, that’s a total lie; it was a spur-of-the-moment decision that went from thought to purchase in less than a week, but it could have gone something like the following).

Step 1: The Requirements

My requirements for a new vehicle were:

  • Sports car or SUV based on personal taste.  (I want vendor X, or I don’t want vendor Y)
  • Creature Comforts: leather, heated seats, navigation.
    • Cooled seats, OnStar type service, Satellite radio preferred but optional.
  • Aesthetics, something that fit my tastes from a looks perspective.
    • Preferably something that not everyone on the road would be driving.
  • Reasonable maintenance costs/reliability.
  • Adequate trunk space, interior space. (this was going to be my primary vehicle.)
    • 4 Doors preferred for convenience.
  • New vehicle (no refurbished equipment)

Non factors (things that factor in to an average car purchase but were not important to me):

  • Gas-mileage
  • Safety rating
  • Brand / country of origin
  • etc.

Step 2: Narrowing the Playing Field

I narrowed down the vehicles I would be happy with to the following:

  • Jeep Grand Cherokee
  • BMW M5
  • Toyota 4 Runner
  • Land Rover Range Rover
  • Cadillac Escalade
  • Cadillac CTS-V sedan
  • Tesla Roadster

Step 3: ROI and TCO calculations

Next came reality.  I had options that met my requirements to varying degrees; now I had to look at costs.  Starting with initial price, I placed the vehicles in order from low to high.

  • Toyota 4 Runner
  • Jeep Grand Cherokee
  • Cadillac Escalade
  • Cadillac CTS-V sedan
  • BMW M5
  • Land Rover Range Rover 120K
  • Tesla Roadster 118K

Now in true solution assessment fashion I dropped the top and bottom price options from the running (Tesla and 4 Runner), leaving me with 5 options.  Next it was time to look at TCO.  TCO on a vehicle would include things like gas mileage, fuel type, maintenance costs, reliability, etc.  Of my remaining options the Grand Cherokee and Escalade looked the best for reliability and maintenance.  I dropped the Range Rover at this point after hearing some horror stories on repair cost and frequency.  (I did not validate these claims as I don’t care to invest the time, so I’m not saying they actually have these problems; hearing they did was enough to drop them with this many great options.)

Down to four options, two in the SUV category and two in the sports sedan category, it was time to decide what type of vehicle I was most interested in.  Both types met my requirements but I’d been in an SUV for the last 4 years and was ready for a change.  The decision was down to two: the CTS-V or the M5.

Between the two the Cadillac had the better maintenance plan, greater stock feature set (amazing what BMW considers extra on an M5) and was solidly the faster vehicle.  Additionally the Black Diamond special edition was exactly to my tastes for style, and would be much more unique on the road than an M5.  I ultimately settled on the Cadillac and have been very happy with the decision.  It wasn’t the lowest TCO of all of the options fitting my requirements, but it most closely fit my total requirements and optional features at an acceptable TCO.
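The narrowing steps above can be sketched as a small filter routine.  This is only an illustration under stated assumptions: the `Option` type, the `shortlist` function and the "suv"/"sports" category labels are hypothetical names, and the candidate order is simply the low-to-high price ranking from the list above rather than real price data.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    category: str  # "suv" or "sports" (labels assumed for illustration)


def drop_price_extremes(ranked_low_to_high):
    """Drop the cheapest and the most expensive option."""
    return ranked_low_to_high[1:-1]


def shortlist(ranked_low_to_high, excluded, preferred_category):
    """Trim price extremes, drop options ruled out on TCO grounds,
    then keep only the preferred vehicle category."""
    remaining = drop_price_extremes(ranked_low_to_high)
    remaining = [o for o in remaining if o.name not in excluded]
    return [o for o in remaining if o.category == preferred_category]


# Order taken from the low-to-high price list in the post.
ranked = [
    Option("Toyota 4 Runner", "suv"),
    Option("Jeep Grand Cherokee", "suv"),
    Option("Cadillac Escalade", "suv"),
    Option("Cadillac CTS-V sedan", "sports"),
    Option("BMW M5", "sports"),
    Option("Land Rover Range Rover", "suv"),
    Option("Tesla Roadster", "sports"),
]

finalists = shortlist(
    ranked,
    excluded={"Land Rover Range Rover"},  # dropped on repair-cost stories
    preferred_category="sports",
)
print([o.name for o in finalists])
```

The sketch stops at the two finalists on purpose; the last step (maintenance plan, stock features, style) is a judgment call against the requirements rather than something a price sort can decide.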

Summary:

When making IT purchase decisions remember that cost isn’t everything.  All too often I work with people that are so wrapped up in TCO conversations they forget to assess what the business objectives of the infrastructure are.  Cheaper solutions that can’t properly deliver the services required by the business, or scale with growth, are not better solutions.  Start the process with the business, defining requirements and assessing possible solutions.  Leave cost to the end.


Blades are Not the Future

Kevin Houston, Founder of Blades Made Simple and all around server and blade rocket surgeon, posted an excellent thought-provoking article titled ‘Why Blade Servers Will Be the Core of Future Data Centers’ (http://bladesmadesimple.com/2011/10/why-blade-servers-will-be-the-core-of-future-data-centers/).  The article lays out his predictions and thoughts on the way in which the server industry will move.  Kevin walks through several stages of blade server evolution he believes could be coming.

  1. Less I/O expansion, basically less switching infrastructure required in the chassis due to increased bandwidth.
  2. More on-board storage options, possibly utilizing the space reclaimed from I/O modules.
  3. External I/O expansion options such as those offered by Aprius and Xsigo.
  4. Going fully modular at the rack level, extending the concept of a blade chassis to rack size and adding shelves of PCIe, storage and servers.

I jokingly replied to him that he’d invented the ‘rack-mount’ server, as in the blades are not in a blade chassis, but inserted into a rack, access external storage in the same rack and have connections to shared resources (PCIe) in that rack.  The reality is Kevin’s vision is closer to a mainframe than a rack-mount.

Overall, while I enjoyed Kevin’s post for the thought experiment, I think his vision of the data center future is way off from where we’re headed.  Starting off, I don’t think that blades are the solution for every problem now.  I’ve previously summarized my thoughts on that, along with some bad Shakespeare prose, in a blog on my friend Thomas Jones’s site: http://niketown588.com/2010/09/08/to-blade-or-not-to-blade/.  The short version is that blades aren’t the right tool for every job.

Additionally I don’t see blades as the long-term future of enterprise and above computing.  I look to the way Microsoft, Google, Amazon and Facebook do their computing as the future: cheap commodity rack-mounts en masse.  I see the industry transitioning this way.  Blades (as we use them today) don’t hold water in that model due to cost, complexity, proprietary nature, etc.  Blades are designed to save space and they’re built to be highly available.  As we start to build our data centers to scale, and design our applications with more reliability for cloud platforms, highly available server hardware becomes irrelevant.  No service is lost when one of the thousands of servers handling Bing search fails; a new server is put in its place and joins the pool of available resources.

If blades, or some transformation of them, were the future I don’t see it playing out the same way as Kevin does.  I think Kevin’s end concept is built on a series of shaky assumptions: external I/O appliances, and blade chassis storage.

Let’s start with chassis based storage (i.e. shared storage in the blade chassis).  This is something I’ve never been a fan of as it limits access of the shared disk to a single chassis, meaning 14 blades max… wait, less than 14 blades because it uses blade slots to provide disk.  In very small scale this may make sense because you have an ‘all-in-one’ chassis, but the second you outgrow that (oops my business got bigger) you’re now stuck with small silos of data.

The advantage of this approach however is the low-latency access and the high bandwidth availability across the blade back/mid-plane.  This makes this a more interesting option now with lightning fast SSDs and cache options.  Now you can have extremely high performance storage within the blade chassis which provides a lot of options for demanding applications.  In these instances local storage in the chassis will be a big hit, but it will not be the majority of deployments without additional features such as EMC’s ‘Project Lightning’ (http://www.emc.com/about/news/press/2011/20110509-05.htm) to free the trapped data from the confines of the chassis.

Next we have external I/O appliances… These have been on my absolute hate list since the first time I saw them.  Kevin suggests a device based on industry standards, but current versions are fully proprietary and require not only the vendor’s appliance but also the vendor’s cards in either the appliance or the server; this is the first nightmare.  Beyond that, these devices create a single point of failure for the I/O devices of multiple servers, and run directly in the I/O path.  You’re basically adding complexity, cost and failure points, and for what?  Let’s look at that:

From Aprius’s perspective ‘Aprius PCI Express over Ethernet technology extends a server’s internal PCIe bus out over the Ethernet network, enabling groups of servers to access and share network-attached pools of PCIe Express devices, such as flash solid state storage, at PCIe performance levels’ (www.aprius.com).  I’d really like to know how you get ‘PCIe performance levels’ over Ethernet infrastructure?
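For a rough sense of the gap behind that question, here is a back-of-the-envelope comparison.  The choice of a PCIe 2.0 x8 slot and a single 10GE link is an assumption made purely for illustration (the quote names neither), and the figures are raw link rates that ignore protocol overhead and latency.

```python
# Raw link-rate comparison; ignores protocol overhead and latency.
PCIE2_GT_PER_LANE = 5.0   # PCIe 2.0 signals at 5 GT/s per lane
PCIE2_ENCODING = 8 / 10   # 8b/10b line encoding
LANES = 8                 # assumed x8 slot (hypothetical choice)

ETHERNET_GBPS = 10.0      # assumed single 10GE link


def pcie2_gbps(lanes):
    """Post-encoding data rate of a PCIe 2.0 link in Gbit/s, per direction."""
    return lanes * PCIE2_GT_PER_LANE * PCIE2_ENCODING


pcie_gbps = pcie2_gbps(LANES)
ratio = pcie_gbps / ETHERNET_GBPS

print(f"PCIe 2.0 x{LANES}: {pcie_gbps:.0f} Gbit/s per direction")
print(f"10GE: {ETHERNET_GBPS:.0f} Gbit/s")
print(f"Ratio: {ratio:.1f}x")
```

Under those assumptions the local bus has roughly three times the raw bandwidth of the Ethernet link before Ethernet encapsulation and switch latency are even considered; a different slot width or link speed would change the ratio.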

And from Xsigo: ‘In the Xsigo wire-once infrastructure you connect each server to the I/O Director via a single cable. Then connect networks and storage to the I/O Director. You’re now ready to provision Ethernet and Fibre channel connections from servers to data center resource in real time’ (http://xsigo.com/).  Basically you plug all your I/O into their server/appliance, then cable it to your server via Infiniband or Ethernet.  Why?  You’re adding a device in-band in order to consolidate storage and LAN traffic?  FCoE, NFS, iSCSI, etc. already do that on standards based 10GE or 40GE and with no in-band appliance.

Kevin mentions this as a way to allow more space in the blades for future memory and processor options.  This makes sense as HP, IBM, Dell and Sun designs have already run into barriers with the height of their blades restricting processor options.  This is because the blade size was designed years ago and didn’t account for today’s larger processors/heat sinks.  Their only workaround is utilizing two blade slots which consumes too much space per blade.  Newer blade architectures like Cisco UCS take modern processors into account and don’t have this limitation, so don’t require I/O offloading to free space.

Lastly I/O offloading as a whole just stinks to me.  You still have to get the I/O into the server right?  Which means you’ll still have I/O adapters of some type in the server.  With 40GE to the blade shipping this year why would you require anything else?  GPU and cache storage argument?  Sure go that direction and then explain to me why you’d want to pull those types of devices off the local PCIe bus and use them remotely adding latency?

Finally, to end my rant, a rack size blade enclosure presents a whole lot of lock-in.  You’re at the mercy of the vendor you purchase it from for new hardware and support until it’s fully utilized.  Sounds a lot like the reason we left mainframes for x86 in the first place, doesn’t it?

Thoughts, corrections, comments and sheer hate mail always appreciated!  What do you think?


Cloud Success Factor: Rethink Application Development

You’ve been driving a perfectly suitable family sedan for the last ten years. It’s highly rated by all the gurus who rate such things; it’s safe, reliable and gets acceptable gas mileage. You’ve never loved it in any way, although you did have a moment of pure capitalist joy when you drove it off the lot, and you’ve never disliked it in any way. Then one day you woke up and out of the blue you were bored and needed a change, a big change…

To see the full post visit Network Computing: http://www.networkcomputing.com/private-cloud/231901846


Hypervisors are not the Droids You Seek

Long ago, in a data center far, far away, we as an industry moved away from big iron and onto commodity hardware. That move brought with it many advantages, such as cost and flexibility. The change also brought along with it higher hardware and operating system software failure rates. This change in application stability forced us to change our deployment model and build the siloed application environment: One application, one operating system, one server….

To see the full post visit Network Computing: http://www.networkcomputing.com/private-cloud/231901662


Choosing The Right Private Cloud Storage

One of the key decisions in architecting an infrastructure for private cloud is selecting a storage platform for the deployment. Storage is a key component of the infrastructure and will play a major role in the overall performance of the private cloud. The storage decision carries additional weight due to its larger investment and typically longer refresh-cycle…..

To view the full article visit Network Computing: http://www.networkcomputing.com/private-cloud/231901384


Build For IT Nirvana

In many data centers large and small there is a history of making short-term decisions that affect long-term design. These may be based on putting out immediate fires, such as rolling out a new application, expanding an old one, or replacing failed hardware. They may also be made by short-sighted or near-sighted policies, or more commonly old policies that aren’t questioned in the light of new technology. These types of decisions can range from costly to crippling for data center operations…

To see the full post visit NetworkComputing: http://www.networkcomputing.com/private-cloud/231700329


Private Cloud: It’s Not About ROI

Most private cloud discussions revolve around the return on investment of the architecture. Many discussions begin and quickly end with ROI. The reason is that ROI is very difficult to show in real numbers for any IT investment, but more so when the majority of the costs are soft costs.

ROI is an important factor and can’t be left out of discussions, but it’s not the only factor and likely not the most important factor.

To read the rest see the blog on Network Computing (no registration required): http://www.networkcomputing.com/private-cloud/231601280


The Need to Design for Workload Mobility in the Cloud: DR and ROI Considerations

 

The pressure is on for business and information technology services to produce 100% available environments with an equally high return on the capital investment allocated to the infrastructure used to support and operate their technology environments. Despite businesses’ desire for 100% availability and an “availability-as-a-utility” model, a highly available IT infrastructure should not be architected as a utility. The availability-as-a-utility model currently lacks standards and the implementation architectures are complex; it is also interdependent on many components, and the level of people and process complexity in IT service delivery increases the risk of downtime when compared to technology adoption risks.  These components are not easily quantified and their interactions are not well understood, which is preventing practical development of the availability-as-a-utility model.

While availability-as-a-utility may not be practical, architecting your IT environment to be part of an active / active cloud is practical.  A recent study published by Gartner Research suggests that if the business impact of downtime can be considered significant for some business processes, such as those affecting revenue, regulatory compliance, customer loyalty, health, and safety, then the owners of enterprise technology infrastructure should invest in continuous availability architectures whose operating context is active / active (Scott, 2010).

 

Creating an active / active environment can be accomplished by using application level clustering or cloud based virtual mobile workloads.  The traditional approach of application level clustering does not scale at the same rate as virtualization-based application platforms.  In most cases, application level clusters need to be architected and coded on a case-by-case basis.  By contrast, hosting these applications on a virtualized server platform typically requires no changes to the application level configuration or metadata.  Many third party analysts recommend emerging technologies that enable mobile workloads to replace the fragile, script-based or application dependent recovery routines.  These new technologies are easier to maintain, can provide more granularity and greater consistency, and can increase efficiencies in the pursuit of this goal.  Because emerging tools in this space tend to be loosely coupled rather than tightly coupled (like traditional application clustering), enterprises will be more likely to reduce the “spare” infrastructures required for recovery, and thus reduce the overall cost of providing highly available recovery infrastructures.  In addition, as more virtualized cloud environments are deployed into production, these tools will be able to make use of the underlying virtual platform for providing something close to availability-as-a-utility via virtual server mobility (Witty & Morency, 2010).  Therefore, both large and small organizations gain a greater ROI by virtualizing the hosted application and relying on virtualized mobile workloads to provide availability than by investing in an application level active / active deployment.

 

Keep in mind that automated utility compute environments, a subset of cloud, do not improve availability on their own. To deliver high-performing and highly available services and applications, storage and networking infrastructures must also be designed to support these environments via support for workload mobility (Filks & Passmore, 2010).  For this, the best solution is to prepare your applications and infrastructure to exist within a virtual datacenter environment or to utilize fabric computing. This type of strategy can offer a number of advantages to an organization, such as improved time to deployment, greater infrastructure efficiencies, and increased resource utilization in the datacenter.  In addition, recent studies recommend placing fabric computing and virtualized datacenters on the priority list of data center architecture planning when your virtualization plans call for a dynamic infrastructure (Weiss & Butler, February 2011).  High availability, highly efficient multiple datacenter implementations are prime examples of the previously mentioned dynamic infrastructure.

 

One of the tools to implement virtualized mobile workloads is the use of long-distance live migration of virtualized workloads through one of the various types of datacenter bridging technologies.  The live migration of virtualized workloads enables an IT organization to move workloads as required.  This can be a manual process such as in anticipation of a disaster, datacenter moves, workload migrations, and planned maintenance.  It is also implemented automatically to rebalance capacity across datacenters.  Architecting your application infrastructure to support mobile workloads will reduce or eliminate the downtime associated with these initiatives or projects.   Moreover, the support for long-distance live migration could be used to enable live workload migration across internal and external service providers.  An example of this is leveraging additional utility compute resources of cloud datacenters and hybrid private / public cloud architectures.

Consider a VDI deployment built on a virtualized datacenter model across two geographic locations.  This deployment would leverage long distance live migrations of workloads, first hop redundancy protocol localization for egress traffic, an application delivery network for ingress traffic selection, and active / active SAN extensions to ensure storage consistency.

In this scenario:

  • The operations team is able to migrate workloads between datacenters and perform routine maintenance without the need for specialized maintenance windows.  This allows for an increased level of operational productivity by way of more efficient time management.
  • The need to maintain state of infrastructure metadata and configuration revisions is diminished significantly as the active / active virtualized datacenter is providing continuous validation of operational consistency.  This also increases productivity and reduces the task load of the operations team.
  • The investment of the compute, network, and storage infrastructure at both sites is being realized on a continual basis; one whole set of infrastructure is not sitting dormant for lengthy periods of time.
  • The need for periodic full scale “failover tests” is eliminated.  Both sites’ operational veracity is validated through continuous use.  Again, this reduces operational staff requirements and workload.  It also can result in removing the capital required to secure large recovery centers for testing purposes only.

This short example demonstrates where ROI can be increased while simultaneously providing for increased application performance and utilization.

The purposeful design and integration of workload mobility technologies into an organization’s IT strategy has significant potential business benefits.  Most enterprises approach availability in an opportunistic way after they have put their IT infrastructure into production. However, achieving 100% or near-100% availability and infrastructure efficiency requires comprehensive planning and integration; ad-hoc or point-in-time designs and implementations will not suffice.  When constructing your cloud or virtualized datacenter environment, it is critical to not just consider enabling specific piece-parts of workload migrations and automation, but also enable the entire end-to-end information technology service including network and storage infrastructures (Witty & Morency, 2010).

In some security circles there are the sayings, “secure by design” and “an environment that is 99% secure is eventually 100% insecure,” which are lessons directly related to the deployment of clouds and virtualized datacenters (in addition to the direct implications of the obvious InfoSec context).  Specifically, a cloud environment should be designed with location agnosticism via virtualized mobile workloads from the start.  It should not rely on legacy scripting, warm-standby modes, or offline migration processes that work 99% of the time.  Doing so raises the probability of a costly redesign to improve infrastructure productivity, or worse, of outright failure, to 100%.

 

 

Jason Maki is a Datacenter Business Consultant with World Wide Technologies.  He currently leads the cloud architecture design and implementation efforts for datacenter, commercial service providers, and federal customers.  Jason was chosen to speak at VMWorld to comment on the trajectory of information infrastructure best practices in the business continuity and disaster planning space.  Jason’s solutions have linked technical engineering and operational efficiencies, creating profitable innovative solutions.  During Jason’s career he has been honored by Cisco, VMware, SunGard Availability Services, Dell, and Fujitsu Network Services as being an architectural leader in the datacenter and business continuity space.


References

Filks, V., & Passmore, R. E. (2010). How to Implement High-Availability Storage for Server Virtualized Environments. Gartner Report

Scott, D. (2010). Continuous Availability Architectures. Gartner Report

Weiss, G. J., & Butler, A. (February 2011). Fabric Computing Poised as a Preferred Infrastructure. Gartner Report

Witty, R. J., & Morency, J. P. (2010). Hype Cycle for Business Continuity Management and IT. Gartner Report

 


Why FCoE Standards Matter

Mike Fratto at Network Computing recently wrote an article titled ‘FCoE: Standards Don’t Matter; Vendor Choice Does’ (http://www.networkcomputing.com/storage-networking-management/231002706).

I definitely differ from Mike’s opinion on the subject.  While I’m no fan of the process of making standards (puts sausage making to shame), or the idea of slowing progress to wait on standards, I do feel they are an absolutely necessary part of FCoE’s future.  It’s all about the timing at which we expect them, the way in which they’re written, and most importantly the way in which they’re adhered to.

Mike bases his opinion on Fibre Channel history and accurately describes the stranglehold the storage vendors have had on the customer.  The vendor’s Hardware Compatibility List (HCL) dictates which vendor you can connect to, and which model and which firmware you can use.  Slip off the list and you lose support.  This means that in the FC world customers typically went with the Storage Area Network (SAN) their VAR or storage vendor recommended, and stuck with it.  While not ideal, this worked fine in the small network environment of SAN with the specialized and dedicated purpose of delivering block data from array to server.  These extreme restrictions based on storage vendors and protocol compatibility will not fly as we converge networks.

As worried as storage/SAN admins may be about moving their block data onto Ethernet networks, the traditional network admins may be more worried because of the interoperability concept.  For years network admins have been able to intermix disparate vendors’ technology to build the networks that they desired, best-of-breed or not.  A load-balancer here, firewall there, data center switch here and presto, everything works.  They may have had to sacrifice some features (proprietary value-add that isn’t compatible) but they could safely connect the devices.  More importantly they didn’t have to answer to an HCL dictated by some end-point (storage disk) or another on their network.

For converged networking to work, this freedom must remain.  Adding FCoE to consolidate infrastructure cannot lock network admins into storage HCLs and extreme hardware incompatibility.  This means that the standards must exist, be agreed upon, be specific enough, and be adhered to.  While Mike is correct that you probably won’t want to build multi-vendor networks on day one, you will want to have the opportunity to incorporate other services and products, migrate from one vendor to another, etc.  You’ll want an interoperable standard that allows you to buy 3rd party FCoE appliances for things like de-duplication, compression, encryption or whatever you may need down the road.  We’re not talking about building an Ethernet network dedicated to FCoE, we’re talking about building one network to rule them all (hopefully we never have to take it to Mordor and toss it into molten lava).  To run one network we need the standards and compatibility that provide us flexibility.

There is no reason for storage vendors to hold the keys to what you can deploy any longer.  Hardware is stable, and if standards are in place the network will properly transport the blocks.  Customers and resellers shouldn’t accept lock-in and HCL dictation just because that has been the status quo.  We’re moving the technology forward; move your thinking forward.  The issue in the past has been the looseness with which T11’s FC-BB-5 has been written on some aspects since its inception.  This leaves room for interpretation, which is where interoperability issues arise between vendors who are both ‘standards based.’  The onus is on us as customers, resellers and an IT community to demand that the standards be well defined, and that the vendors adhere to them in an interoperable fashion.

Do not accept incompatibility and lack of interoperability in your FCoE switching just because we made the mistake of allowing that to happen with pure FC SANs.  Next time your storage vendor wants a few hundred thousand for your next disk array tell them it isn’t happening unless you can plug it into any standards compliant network without fear of their HCL and loss of support.


FlexPod Discussion with Vaughn Stewart and Abhinav Joshi

I enjoyed a great conversation with NetApp’s Vaughn Stewart and Cisco’s Abhinav Joshi about FlexPod last week during Cisco Live 2011. Check out the video below.
