The pressure is on for business and information technology services to deliver 100% available environments with an equally high return on the capital invested in the infrastructure that supports and operates them. Despite businesses’ desire for 100% availability and an “availability-as-a-utility” model, a highly available IT infrastructure should not be architected as a utility. The availability-as-a-utility model currently lacks standards, and its implementation architectures are complex. It also depends on many interacting components, and the level of people and process complexity in IT service delivery raises the risk of downtime more than technology adoption risks do. These components are not easily quantified and their interactions are not well understood, which is preventing practical development of the availability-as-a-utility model.
While availability-as-a-utility may not be practical, architecting your IT environment to be part of an active / active cloud is. A recent study published by Gartner Research suggests that if the business impact of downtime is significant for some business processes, such as those affecting revenue, regulatory compliance, customer loyalty, health, and safety, then the owners of enterprise technology infrastructure should invest in continuous availability architectures that operate in an active / active context (Scott, 2010).
Creating an active / active environment can be accomplished by using application level clustering or cloud based virtual mobile workloads. The traditional approach of application level clustering does not scale at the same rate as virtualization based application platforms. In most cases, application level clusters need to be architected and coded on a case-by-case basis. By contrast, hosting these applications on a virtualized server platform typically requires no changes to the application level configuration or metadata. Many third party analysts recommend emerging technologies that enable mobile workloads to replace fragile, script-based or application dependent recovery routines. These new technologies are easier to maintain, provide more granularity and greater consistency, and can increase efficiency. Because emerging tools in this space tend to be loosely coupled rather than tightly coupled (as traditional application clustering is), enterprises will be more likely to reduce the “spare” infrastructure required for recovery, and thus reduce the overall cost of providing highly available recovery infrastructures. In addition, as more virtualized cloud environments are deployed into production, these tools will be able to use the underlying virtual platform to provide something close to availability-as-a-utility via virtual server mobility (Witty & Morency, 2010). Therefore, both large and small organizations gain a greater ROI by virtualizing the hosted application and relying on virtualized mobile workloads to provide availability than by investing in an application level active / active deployment.
Keep in mind that automated utility compute environments, a subset of cloud, do not improve availability on their own. To deliver high-performing and highly available services and applications, storage and networking infrastructures must also be designed to support workload mobility (Filks & Passmore, 2010). For this, the best solution is to prepare your applications and infrastructure to exist within a virtual datacenter environment or to utilize fabric computing. This type of strategy can offer a number of advantages to an organization, such as improved time to deployment, greater infrastructure efficiencies, and increased resource utilization in the datacenter. In addition, recent research suggests placing fabric computing and the creation of virtualized datacenters on the priority list for datacenter architecture planning when your virtualization plans call for a dynamic infrastructure (Weiss & Butler, February 2011). Highly available, highly efficient multiple datacenter implementations are prime examples of such a dynamic infrastructure.
One way to implement virtualized mobile workloads is long-distance live migration of virtualized workloads over one of the various types of datacenter bridging technologies. Live migration enables an IT organization to move workloads as required. This can be a manual process, such as in anticipation of a disaster or for datacenter moves, workload migrations, and planned maintenance. It can also be automated to rebalance capacity across datacenters. Architecting your application infrastructure to support mobile workloads will reduce or eliminate the downtime associated with these initiatives or projects. Moreover, support for long-distance live migration could be used to enable live workload migration across internal and external service providers. An example of this is leveraging the additional utility compute resources of cloud datacenters in hybrid private / public cloud architectures.
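The automated rebalancing case described above can be illustrated with a minimal sketch. It assumes two sites, workloads measured in abstract capacity units, and a simple greedy policy; the `Workload` class, the `plan_rebalance` function, and the `tolerance` parameter are hypothetical names introduced here for illustration and do not correspond to any particular hypervisor or orchestration API. The function only plans which workloads to move; triggering the actual live migration is left to whatever platform is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    load: int  # abstract capacity units; purely illustrative


def plan_rebalance(site_a, site_b, tolerance=10):
    """Plan live migrations that even out load between two sites.

    Returns a list of (workload_name, source_site, destination_site).
    Hypothetical greedy policy: while the load gap exceeds `tolerance`,
    move the workload that leaves the smallest remaining gap.
    """
    sites = {"A": list(site_a), "B": list(site_b)}
    moves = []
    while True:
        load_a = sum(w.load for w in sites["A"])
        load_b = sum(w.load for w in sites["B"])
        gap = abs(load_a - load_b)
        if gap <= tolerance:
            break
        src, dst = ("A", "B") if load_a > load_b else ("B", "A")
        # Moving a workload of size s turns the gap into |gap - 2s|,
        # so only workloads with 0 < s < gap narrow it.
        candidates = [w for w in sites[src] if 0 < w.load < gap]
        if not candidates:
            break  # nothing movable would narrow the gap
        pick = min(candidates, key=lambda w: abs(gap - 2 * w.load))
        sites[src].remove(pick)
        sites[dst].append(pick)
        moves.append((pick.name, src, dst))
    return moves


if __name__ == "__main__":
    # Illustrative inputs only; names and sizes are made up.
    site_a = [Workload("vdi-pool-1", 40), Workload("vdi-pool-2", 30), Workload("web", 20)]
    site_b = [Workload("db", 10)]
    print(plan_rebalance(site_a, site_b))
```

With these made-up inputs the plan is a single move of `vdi-pool-1` from site A to site B, which leaves both sites at 50 units. A production policy would also need to weigh factors this sketch ignores, such as inter-site latency, storage locality, and migration cost.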
Consider a VDI deployment in a virtualized datacenter model spanning two geographic locations. This deployment would leverage long-distance live migration of workloads, first hop redundancy protocol localization for egress traffic, an application delivery network for ingress traffic selection, and active / active SAN extensions to ensure storage consistency. Such a design offers several benefits:
- The operations team is able to migrate workloads between datacenters and perform routine maintenance without the need for specialized maintenance windows. This allows for an increased level of operational productivity by way of more efficient time management.
- The need to maintain state of infrastructure metadata and configuration revisions is diminished significantly as the active / active virtualized datacenter is providing continuous validation of operational consistency. This also increases productivity and reduces the task load of the operations team.
- The investment of the compute, network, and storage infrastructure at both sites is being realized on a continual basis; one whole set of infrastructure is not sitting dormant for lengthy periods of time.
- The need for periodic full scale “failover tests” is eliminated. Both sites’ operational veracity is validated through continuous use. Again, this reduces operational staff requirements and workload. It can also remove the capital required to secure large recovery centers for testing purposes only.
This short example demonstrates where ROI can be increased while simultaneously providing for increased application performance and utilization.
The purposeful design and integration of workload mobility technologies into an organization’s IT strategy has significant potential business benefits. Most enterprises approach availability opportunistically, after they have put their IT infrastructure into production. However, achieving 100% or near-100% availability and infrastructure efficiency requires comprehensive planning and integration; ad-hoc or point-in-time designs and implementations will not suffice. When constructing your cloud or virtualized datacenter environment, it is critical not just to enable specific pieces of workload migration and automation, but also to enable the entire end-to-end information technology service, including network and storage infrastructures (Witty & Morency, 2010).
In some security circles there are the sayings “secure by design” and “an environment that is 99% secure is eventually 100% insecure,” which are lessons directly related to the deployment of clouds and virtualized datacenters (in addition to the obvious InfoSec context). Specifically, a cloud environment should be designed with location agnosticism via virtualized mobile workloads from the start. It should not rely on legacy scripting, warm-standby modes, or offline migration processes that work 99% of the time. Doing so eventually raises the probability of a costly redesign to improve infrastructure productivity, or worse, of outright failure, to 100%.
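The intuition behind the “99%” saying can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each run of a recovery process succeeds independently with the same probability; real failures are often correlated, so this is a simplified model and not a measurement. The function name `prob_at_least_one_failure` is hypothetical.

```python
def prob_at_least_one_failure(success_rate, runs):
    """Probability that at least one of `runs` attempts fails.

    Assumes independent attempts with an identical success rate,
    which is a simplifying assumption, not a property of real systems.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return 1.0 - success_rate ** runs


if __name__ == "__main__":
    # How the risk compounds for a process that works 99% of the time.
    for runs in (1, 10, 100, 500):
        risk = prob_at_least_one_failure(0.99, runs)
        print(f"{runs:>4} runs: {risk:.1%} chance of at least one failure")
```

Under these assumptions, a process that works 99% of the time has roughly a 63% chance of failing at least once over 100 runs, and the figure keeps climbing toward 100% as the number of runs grows.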
Jason Maki is a Datacenter Business Consultant with World Wide Technologies. He currently leads the cloud architecture design and implementation efforts for datacenter, commercial service providers, and federal customers. Jason was chosen to speak at VMWorld to comment on the trajectory of information infrastructure best practices in the business continuity and disaster planning space. Jason’s solutions have linked technical engineering and operational efficiencies, creating profitable innovative solutions. During Jason’s career he has been honored by Cisco, VMware, SunGard Availability Services, Dell, and Fujitsu Network Services as being an architectural leader in the datacenter and business continuity space.
References
Filks, V., & Passmore, R. E. (2010). How to Implement High-Availability Storage for Server Virtualized Environments. Gartner Report.
Scott, D. (2010). Continuous Availability Architectures. Gartner Report.
Weiss, G. J., & Butler, A. (February 2011). Fabric Computing Poised as a Preferred Infrastructure. Gartner Report.
Witty, R. J., & Morency, J. P. (2010). Hype Cycle for Business Continuity Management and IT. Gartner Report.