Define The Cloud

The Intersection of Technology and Reality

Have We Taken Data Redundancy too Far?

Joe Onisick (@JoeOnisick), July 10, 2010

During a recent conversation about disk configuration and data redundancy on a storage array, I began to think about everything we put into data redundancy.  The question that came to mind is the title of this post: ‘Have we taken data redundancy too far?’

Now don’t get me wrong, I love data as much as the next fellow, and I definitely understand its importance to the business and to compliance.  I’m not advocating tossing out redundancy or data protection.  My question is: when is enough enough, and is there a better way?

To put this in perspective let’s take a look at everything that stacks up to protect enterprise data:

Disks:

We start with the lowly disk, which by itself has no redundancy.  While disks tend to have one of the highest failure rates in the data center, they have definitely come a long way.  Many can self-protect and warn of impending failure at a low level, and they can last for years without issue.

A disk on its own is a single point of failure: all data on it is lost if the drive fails.  Because of this we’ve worked to come up with better ways to apply redundancy to the data.  The simplest form of this is RAID.

RAID:

RAID stands for ‘Redundant Array of Inexpensive Disks’; it’s also correct to say ‘Redundant Array of Independent Disks.’  No matter what you call it, RAID allows what would typically be a single disk on its own to act as part of a group of disks for the purposes of redundancy, performance, or both.  You can think of this like disk clustering.

Some common RAID levels are:

    • RAID 0 – Disk striping.  Data is striped across two or more disks to improve performance.  RAID 0 provides no redundancy: each disk becomes a single point of failure for the entirety of the stored data.
    • RAID 1 – Disk mirroring.  Data is written to both disks simultaneously as an exact copy.  This method allows either disk to fail with no data loss, at the cost of half the raw capacity.
    • RAID 5 – Striping with parity.  Using three or more disks, all data is written in stripes across the available disks, and additional parity data is distributed across the disks to provide redundancy.  Because of the parity data, one disk can be lost from the group without data loss.  RAID 5 is N-1 capacity, meaning you lose one disk’s worth of space to the parity data.
    • RAID 6 – Disk striping with double parity.  Think RAID 5 with an extra disk’s worth of space lost to parity, but with the ability to lose two disks without data loss.  This is N-2 capacity and requires at least four disks.
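
To make the parity idea concrete, here is a minimal Python sketch of how a lost block in a RAID 5 stripe can be rebuilt with XOR, along with the N-1 and N-2 capacity arithmetic.  The function names, block contents, and four-disk layout are all made up for illustration; real controllers work on much larger stripes and rotate parity across the disks.

```python
from functools import reduce

# Minimum disk counts per RAID level (standard values for these levels).
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4}


def usable_disks(level, n):
    """Return how many disks' worth of capacity is usable with n equal disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if n < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return n  # striping only, nothing held back
    if level == 1:
        return 1  # assumption: every disk in the mirror holds a full copy
    return n - (1 if level == 5 else 2)  # N-1 for RAID 5, N-2 for RAID 6


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical four-disk RAID 5 group: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose the disk holding the second block, then rebuild it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
print(usable_disks(5, 4), usable_disks(6, 8))
```

The rebuild works because XOR-ing the parity with every surviving data block cancels them out and leaves only the missing block.  RAID 6 needs a second, independent parity calculation that this sketch does not attempt.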

In many cases ‘hot-spares’ will also be added to the RAID groups.  The purpose of a hot-spare is to have a drive online but not participating in the RAID group, held in reserve for failure events.  If a RAID disk fails, the array can rebuild onto the hot-spare immediately, restoring redundancy before an administrator swaps the bad drive.

Snapshots:

Another layer of protection many enterprise storage arrays offer is snapshots.  Snapshots can be used to perform point-in-time recoveries.  Basically, when a snapshot is taken it locks the associated blocks of data, ensuring they are not modified without copying them.  If a block needs to be changed, the new data is written to a new location without affecting the original.  To revert to a snapshot, the change data is simply removed, leaving the original locked blocks.  While snapshots are not a backup or redundancy feature on their own, they can be used as part of other systems, and they are excellent for development environments where testing is required on various data sets.  Snapshots consume additional space because two copies are kept of any locked block that is changed: the original and the changed version.
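
The snapshot behavior described above can be sketched in a few lines of Python.  This is a toy model under my own assumptions, not how any particular array implements it: the `Volume` class and its methods are hypothetical, it supports only one snapshot, and blocks are plain dictionary entries.

```python
class Volume:
    """Toy volume with a single snapshot that redirects writes to new locations."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # block number -> data (locked once snapshotted)
        self.changes = None         # None means no snapshot has been taken

    def snapshot(self):
        if self.changes is not None:
            raise RuntimeError("this sketch only supports one snapshot")
        self.changes = {}           # from now on the original blocks are locked

    def write(self, number, data):
        if self.changes is None:
            self.blocks[number] = data   # no snapshot: overwrite in place
        else:
            self.changes[number] = data  # snapshot: write to a new location

    def read(self, number):
        if self.changes is not None and number in self.changes:
            return self.changes[number]  # newest version wins
        return self.blocks[number]

    def revert(self):
        if self.changes is None:
            raise RuntimeError("no snapshot to revert to")
        self.changes = {}           # drop the change data, keep locked blocks

    def blocks_stored(self):
        return len(self.blocks) + len(self.changes or {})


vol = Volume({0: b"boot", 1: b"data"})
vol.snapshot()
vol.write(1, b"DATA")
print(vol.read(1), vol.blocks_stored())
vol.revert()
print(vol.read(1), vol.blocks_stored())
```

After the write, the volume holds three blocks for two blocks of served data, which is the extra space cost mentioned above.  Reverting discards the changed block and brings the count back to two.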

Primary/Secondary replication:

Another method for creating data redundancy is tiered storage.  In a tiered redundancy model the primary storage serving the applications is held on the highest-performing disk, and data is backed up or replicated to lower-performance, less expensive disks or disk arrays.

Virtual Tape Libraries (VTL):

Virtual tape libraries are storage arrays that present themselves as standard tape libraries for the purposes of backup and archiving.  VTL is typically used in between primary storage and actual tape backups as a means of decreasing the backup window.

Tape backups:

In most cases the last stop for backup and archiving is still tape.  This is because tape is cheap, high density, and ultra-portable.  Large amounts of data can be streamed to tape libraries which can store the data and allow tapes to be sent to off-site storage facilities.

Adding it up:

When you put these redundancy and recovery systems together and start layering them on top of one another, you end up with high ratios of storage media purposed for redundancy and recovery compared to the actual data being served.  Ratios of 10:1, 20:1, 100:1 or more are not uncommon when comparing archive/redundancy space to usable space.
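
To see how quickly the layers stack up, here is a back-of-the-envelope Python sketch.  Every number in it is an assumption I picked for illustration (disk counts, snapshot reserve, number of backup copies), not a measurement from a real environment, and `raw_media_tb` is a hypothetical helper.

```python
def raw_media_tb(usable_tb, layers):
    """Total raw media (TB) consumed across all layers for usable_tb of served data.

    Each layer is (name, copies, raw_per_usable): how many copies of the data
    the layer holds, and how much raw media each usable TB costs on that layer.
    """
    return sum(usable_tb * copies * raw_per_usable
               for _name, copies, raw_per_usable in layers)


# Hypothetical environment; every number below is an illustrative assumption.
layers = [
    ("primary array, RAID 6 on 8 disks", 1.0, 8 / 6),
    ("snapshot reserve, 20% of primary", 0.2, 8 / 6),
    ("secondary replica, RAID 5 on 5 disks", 1.0, 5 / 4),
    ("VTL, two full backups", 2.0, 1.0),
    ("tape, four weekly fulls", 4.0, 1.0),
]

total = raw_media_tb(1.0, layers)
print(f"{total:.2f} TB of media per 1 TB served")
```

Even with these fairly modest assumptions the total lands a little under 9 TB of media for each 1 TB served.  Longer tape retention or a second replicated site pushes the ratio up quickly.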

Summary:

My summary is more of a repeat of the same question.  Have we taken this too far?  Do we need protection built in at each level, with each layer stacked on top of the last?  Can we afford to continue down this path, adding redundancy at the expense of performance and utilization?  Should we throw higher-parity RAID at our arrays and make up the performance hit with expensive cache?  Should we purchase 10TB of media for every 1TB we actually need to serve?  Is there a better way?

I don’t have the answer to this one, but would love to see a discussion on it.  The way I’m thinking now is bunches of dumb, independent disks pooled and provisioned through software.  Drop the RAID and hot spares, and use the software to maintain multiple local or global copies on different hardware.  When you start moving the disk thinking to cloud environments and talking about petabytes or more of data, the current model starts unraveling quickly.
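
As a thought experiment, the "software keeps copies on different hardware" idea might look something like the Python sketch below.  To be clear, this is one possible approach and not a description of any existing product: I have picked rendezvous (highest-random-weight) hashing to choose hosts, and the host names, disk names, and `place_copies` function are all hypothetical.

```python
import hashlib


def _weight(object_id, node):
    """Deterministic pseudo-random weight for an (object, node) pair."""
    return hashlib.sha256(f"{object_id}:{node}".encode()).hexdigest()


def place_copies(object_id, disks_by_host, copies=3):
    """Pick one disk on each of `copies` different hosts for this object."""
    if copies > len(disks_by_host):
        raise ValueError("not enough hosts to keep every copy on separate hardware")
    # Rank hosts by weight so each object gets its own stable ordering.
    hosts = sorted(disks_by_host, key=lambda h: _weight(object_id, h), reverse=True)
    return [
        (host, max(disks_by_host[host],
                   key=lambda d: _weight(object_id, f"{host}/{d}")))
        for host in hosts[:copies]
    ]


# Hypothetical pool of dumb disks, grouped by the host they sit in.
pool = {
    "host-a": ["disk0", "disk1"],
    "host-b": ["disk0", "disk1", "disk2"],
    "host-c": ["disk0"],
    "host-d": ["disk0", "disk1"],
}

placement = place_copies("volume-42/block-7", pool)
for host, disk in placement:
    print(host, disk)
```

Because each copy lands on a different host, losing a single disk or a whole host still leaves two copies, with no RAID group or hot spare involved.  The trade-off is that three full copies cost more raw space than parity would, so this sketch only moves the utilization question around rather than answering it.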


Comments (2)

  1. Steven Senecal says:
    January 23, 2015 at 7:49 am

    I saw a link to this article, good read, I would love to get a follow up as you wrote this slightly over 5 years ago.


Creative Commons License
This work by Joe Onisick and Define the Cloud, LLC is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License

Disclaimer

All brand and company names are used for identification purposes only. These pages are not sponsored or sanctioned by any of the companies mentioned; they are the sole work and property of the authors. While the author(s) may have professional connections to some of the companies mentioned, all opinions are that of the individuals and may differ from official positions of those companies. This is a personal blog of the author, and does not necessarily represent the opinions and positions of his employer or their partners.