Disaster Recovery and the Cloud

Modern business relies on information technology, so operations personnel must consider the business impact of outages and plan accordingly. As an illustration, Virgin Blue recently experienced a twenty-hour outage in its reservation system that resulted in losses of up to $20 million. The cloud provides both considerable opportunities and significant challenges relating to disaster recovery.

In general, organizations must currently build multiple levels of redundancy into their systems to reach high-availability targets and to protect themselves from catastrophic outages during a natural or man-made disaster. A disaster recovery strategy requires that data and critical application infrastructure be duplicated at a separate location, away from the primary datacenter. Cutting over to a disaster recovery site is usually not instantaneous, and redundancy is often lost while operating under the contingency plan. For this reason, site-local redundancy mechanisms (such as high-availability network systems, failover for portions of the application stack, and SAN-level redundancy) are also required to achieve availability goals. Public clouds often further complicate disaster recovery planning, as the organization’s critical systems may now be spread across its own infrastructure and a multitude of outside vendors, each with its own data model and recovery practices.

Business requirements and application criticality should guide the approach chosen for business continuity. Two concepts are central: RPO (Recovery Point Objective) and RTO (Recovery Time Objective). The RPO of a system is the amount of data, usually expressed as a span of time, that may be lost in the event of a failure, while the RTO is the amount of time allowed to bring the system back online after a failure. In general, site-local mechanisms provide near-zero RPO and RTO, while disaster recovery systems often have an RPO of several hours or days of information and an RTO measured in tens of minutes. Increasingly sophisticated (and costly) infrastructure can reduce these times but not eliminate them entirely.

Figure: Timeline illustrating the concepts of RPO and RTO in a backup system
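
To make the two definitions concrete, here is a minimal Python sketch that measures the data-loss window and the downtime for a single incident and compares them against the objectives. The `Incident` class, the function names, and the timestamps are hypothetical and chosen only for illustration; they do not come from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical record of one failure (all fields are illustrative)."""
    last_recoverable_copy: datetime  # newest data that survives the failure
    failure: datetime                # moment the primary system went down
    service_restored: datetime       # moment the system was back online


def data_loss_window(incident: Incident) -> timedelta:
    """Span of work lost: compare this against the RPO."""
    return incident.failure - incident.last_recoverable_copy


def downtime(incident: Incident) -> timedelta:
    """Time the system was unavailable: compare this against the RTO."""
    return incident.service_restored - incident.failure


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> bool:
    """True only if both the data loss and the downtime stay within target."""
    return data_loss_window(incident) <= rpo and downtime(incident) <= rto


# Made-up timestamps: last good copy at 02:00, failure at 14:30,
# service restored at 15:15 on the same day.
example = Incident(
    last_recoverable_copy=datetime(2024, 1, 10, 2, 0),
    failure=datetime(2024, 1, 10, 14, 30),
    service_restored=datetime(2024, 1, 10, 15, 15),
)

print(data_loss_window(example))  # 12:30:00
print(downtime(example))          # 0:45:00
print(meets_objectives(example, rpo=timedelta(hours=24), rto=timedelta(hours=1)))
```

In this made-up incident, twelve and a half hours of data are lost and the system is down for 45 minutes, so it would satisfy an RPO of one day and an RTO of one hour but not an RPO of four hours.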

Dedicated redundancy infrastructure, both site-local and for disaster recovery purposes, must be regularly tested.  Additionally, it is essential to ensure that the disaster recovery environment is compatible with the existing infrastructure and capable of running the critical application.  This is an area where change management procedures are important, to ensure that critical changes to the production infrastructure are made in the standby environment as well.  Otherwise, the standby environment may not be able to correctly run the application when the disaster recovery plan is activated.
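
One way to support that change management process is an automated drift check that compares an inventory of the production environment with the standby environment. The sketch below assumes each environment can be summarized as a flat dictionary of settings; the function name `find_drift`, the keys, and the version strings are all hypothetical.

```python
MISSING = "<missing>"  # placeholder for a setting absent from one side


def find_drift(production: dict, standby: dict) -> dict:
    """Return settings that differ, as {name: (production, standby)}."""
    drift = {}
    for key in sorted(production.keys() | standby.keys()):
        prod_value = production.get(key, MISSING)
        standby_value = standby.get(key, MISSING)
        if prod_value != standby_value:
            drift[key] = (prod_value, standby_value)
    return drift


# Illustrative inventories; real ones would come from a configuration
# management database or a similar source.
production = {"db_version": "14.2", "app_build": "2.7.1", "tls": "1.3"}
standby = {"db_version": "14.2", "app_build": "2.6.9"}

for name, (prod_value, standby_value) in find_drift(production, standby).items():
    print(f"{name}: production={prod_value} standby={standby_value}")
```

A check like this only finds differences in what is inventoried, so it complements regular failover testing and does not replace it.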

The primary factor that determines RTO and RPO is the approach used to move data to the contingency site. The easiest and lowest-cost approach is tape backup. In this case, the RPO is the time between successive backups moved off-site (perhaps a week or more) and the RTO is the amount of time necessary to retrieve the backups, restore them, and activate the contingency site. This may be a significant amount of time, especially if personnel are not readily available during the disaster scenario. Alternatively, a hot contingency site may be maintained, and database log-shipping or volume snapshotting and replication can be used to send business data to the secondary site. These systems are costly, but readily attain an RTO of under an hour and an RPO of perhaps one day. With substantial investment and complexity, RPO can even be reduced to the range of minutes. However, organizations have often been surprised to find that the infrastructure doesn’t work when it is called upon, usually because of the complexity of the infrastructure and the difficulties involved in testing a standby site.
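
The arithmetic behind these estimates can be sketched in a few lines of Python. The function names and every duration below are assumptions chosen for illustration, not measurements: the worst-case RPO is taken as the interval between off-site transfers plus the time a transfer takes to arrive, and the RTO as the sum of the recovery steps.

```python
from datetime import timedelta
from typing import Mapping


def worst_case_rpo(transfer_interval: timedelta,
                   transfer_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst case: the failure hits just before the next copy arrives
    off-site, so the newest usable copy is one full interval older."""
    return transfer_interval + transfer_duration


def estimated_rto(steps: Mapping[str, timedelta]) -> timedelta:
    """Recovery steps run one after another, so their durations add up."""
    return sum(steps.values(), timedelta(0))


# Hypothetical weekly tape rotation with one day of courier transit.
tape_rpo = worst_case_rpo(timedelta(days=7), timedelta(days=1))
tape_rto = estimated_rto({
    "retrieve tapes": timedelta(hours=6),
    "restore data": timedelta(hours=10),
    "activate site": timedelta(hours=2),
})

# Hypothetical daily snapshot replication that takes one hour to transfer.
replication_rpo = worst_case_rpo(timedelta(days=1), timedelta(hours=1))

print(tape_rpo)         # 8 days, 0:00:00
print(tape_rto)         # 18:00:00
print(replication_rpo)  # 1 day, 1:00:00
```

The point of the exercise is that the transfer schedule, not the restore procedure, bounds the RPO, while the RTO depends on every manual step in the recovery path.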

When procuring IaaS (Infrastructure as a Service) or SaaS (Software as a Service), it is essential for the organization to perform due diligence regarding what disaster recovery mechanisms the service vendor uses. The stakes are too high to trust service level agreements alone: after a catastrophic failure during a disaster, will the vendor still be solvent, and will the compensation received be enough to cover business losses?

Disaster Recovery as a Service, or DRaaS, is an emerging category for organizations that wish to control their own infrastructure but not maintain the disaster recovery systems themselves.  With a DRaaS offering, an IT organization does not directly build a contingency site, but instead relies on a vendor to do so on a dedicated or utility computing infrastructure.  The cloud’s advantages in elasticity and cost-reduction are significant benefits in a disaster recovery scenario, and service offerings allow organizations to outsource portions of contingency planning to vendors with expertise in the area.  However, many of the complexities remain and it is essential to perform the due diligence to ensure that the contingency plan will work and provide a sufficient level of service if called upon.

Finally, there are emerging technologies that combine site-local redundancy and disaster recovery into a unified system.  For example, distributed synchronous multi-master databases allow an application to be spread across multiple locations, including cloud availability zones, with the application active and processing transactions in all of them.  A specified portion of the system can be lost without any downtime or recovery effort.  These emerging systems offer the prospect of dramatically reducing costs and minimizing the risk of contingency sites not functioning properly.
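
The article does not say how such systems decide what portion can be lost. One common rule in replicated systems, assumed here purely for illustration, is a majority quorum: the system keeps accepting writes as long as more than half of the replicas can reach each other. The function names below are hypothetical.

```python
def tolerated_failures(replicas: int) -> int:
    """Largest number of replicas that can fail while a strict majority remains."""
    return (replicas - 1) // 2


def can_commit(replicas: int, reachable: int) -> bool:
    """Under a majority-quorum rule, a write commits only if more than
    half of all replicas are reachable."""
    return reachable >= replicas // 2 + 1


for total in (3, 4, 5):
    print(total, "replicas tolerate", tolerated_failures(total), "failure(s)")

print(can_commit(replicas=5, reachable=3))  # True
print(can_commit(replicas=5, reachable=2))  # False
```

Under this assumption, five replicas spread across sites can lose any two without downtime, while four replicas tolerate only one failure, which is why odd replica counts are often preferred.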

About the Author

Michael Lyle (@MPLyle) is CTO and co-founder of Translattice, and is responsible for the company’s strategic technical direction.  He is a recognized leader in developing new technologies and has extensive experience in datacenter operations and distributed systems.

Post Author: Joe Onisick (@JoeOnisick)
