Information Security Today Home

Dealing with High Availability/Disaster Recovery Issues in Multi-tier, Heterogeneous Environments

Dan Lamorena

Data centers are transitioning from big, static applications to distributed multi-tier applications. As organizations rely more heavily on these complex, integrated services, the requirements for end-to-end application availability have increased. Yet next-generation data centers, with their need for virtualized, scalable and distributed environments, push IT managers and technologies to the very edge and, in some cases, past the breaking point.

In fact, according to research conducted by Symantec in 2008, virtualization is the major factor causing 55 percent of organizations to reevaluate their disaster recovery (DR) plans. Globally, 35 percent of respondents cited too many different tools as the biggest challenge in protecting mission-critical data and applications within physical and virtual environments. Complications with having different tools for physical and virtual environments include higher training costs, operating inefficiencies, greater software costs and workforces that work in silos.

The goal of any high availability/disaster recovery (HA/DR) plan is uptime and quick recovery. Because IT departments provide the infrastructure that nearly every group within an organization needs to do business, they have the added challenge of ensuring reliable services to all of these groups. It is not surprising, then, that HA/DR has surfaced as a key business priority.

Set and Agree on Expectations
Keeping business-critical information available puts a great deal of pressure on IT departments as they struggle to find new ways to keep their systems up and running. This challenge, combined with an ever-growing number of mission-critical and business-critical applications that all demand the same level of availability, is more than enough to stress out any data center manager.

To protect their sanity and set the proper expectations, IT professionals must communicate well with their internal clients and business leaders about what to expect, and what is expected from them, in terms of uptime. This is most often done in the form of recovery time objectives (RTO) and recovery point objectives (RPO).

RTO and RPO are quantitative ways to determine how much application outage is acceptable (that's RTO) and how much data loss is acceptable (that's RPO). Once you have a good sense of the RTO and RPO for a given application, you can tailor the right HA/DR solution for that mission-critical app, whether it is the company's email system, ERP or financial system, CRM tool, portal, or other transactional system.

And the "right" HA/DR approach will likely involve a variety of technologies. For example, data backup likely meets the need of environments with an RPO and RTO of a day or two, and storage management and data replication tools can help ensure that a copy of critical data is always available at a DR site. With such tools in place, data is recoverable to a point measured in minutes. Organizations can then concentrate on making sure their important applications are also protected, using automated provisioning tools that help reduce application re-start time from days to just hours. And, for the "platinum standard" in application HA/DR, with an RTO of minutes, clustering is the answer.

It's not difficult to see that the synergy created when all of these technologies are used together can make for a much more complete HA/DR solution.
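
The tiers described above can be sketched as a simple lookup from an application's RTO and RPO to a candidate set of technologies. This is only an illustration: the function name and the one-day and one-hour thresholds are assumptions chosen to echo the "a day or two," "hours," and "minutes" figures mentioned here, not values taken from any product or standard.

```python
from datetime import timedelta
from typing import List

# Illustrative thresholds (assumptions, tune them for your own organization).
DAY = timedelta(days=1)
HOUR = timedelta(hours=1)


def suggest_technologies(rto: timedelta, rpo: timedelta) -> List[str]:
    """Suggest HA/DR technologies for one application's RTO and RPO targets."""
    techs = ["backup"]  # baseline: adequate alone when RTO and RPO are a day or more
    if rpo < DAY:
        # Tolerable data loss is under a day: keep a current copy at the DR site.
        techs.append("replication")
    if rto < DAY:
        # Restart must take hours rather than days.
        techs.append("automated provisioning")
    if rto < HOUR:
        # Restart must take minutes.
        techs.append("clustering")
    return techs


if __name__ == "__main__":
    # Hypothetical applications and targets, for demonstration only.
    print(suggest_technologies(rto=timedelta(days=2), rpo=timedelta(days=1)))
    print(suggest_technologies(rto=timedelta(hours=4), rpo=timedelta(minutes=15)))
    print(suggest_technologies(rto=timedelta(minutes=5), rpo=timedelta(minutes=5)))
```

The stricter the targets, the more technologies are layered on top of plain backup, which matches the point that the approaches work best in combination rather than as alternatives.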

Address the Problem
When companies put together an HA/DR solution, it's often in response to nightmares about what would happen if their key hardware went down, their critical software failed, their storage and networking components had problems, or a natural disaster took down their data center.

IT professionals often underestimate "human error," meaning configuration and other mistakes made by administrators and users, as the leading cause of downtime. They find that even after doing all they can to prevent external problems, they also need to look internally to protect their data centers against downtime.

In fact, one of the problems that often arises is that a company doesn't recognize which systems or applications are dependent on one another. Change management tools help them do that kind of mapping and then determine what the impact of change might be on multiple systems, not just the one the change was made on. These change management systems can tell you who made the change, when it was made, and what was actually changed so that if downtime should occur you can actually diagnose the problem a lot faster than you otherwise could.
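
The dependency mapping described above amounts to walking a graph of which systems rely on which. The sketch below is a minimal illustration under assumed data: the service names and the `DEPENDS_ON` map are hypothetical, and a real change management tool would discover these relationships rather than have them typed in by hand.

```python
from collections import deque

# Hypothetical dependency map: each service lists the components it depends on.
DEPENDS_ON = {
    "web": ["app"],
    "app": ["database", "auth"],
    "reporting": ["database"],
    "database": [],
    "auth": [],
}


def impacted_by(changed, depends_on):
    """Return every service that directly or indirectly depends on `changed`."""
    # Invert the map so we can look up "who depends on this component?"
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk outward from the changed component.
    impacted = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in impacted:
                impacted.add(service)
                queue.append(service)
    return impacted


if __name__ == "__main__":
    print(sorted(impacted_by("database", DEPENDS_ON)))
```

In this made-up environment, a change to the database touches the application tier, reporting, and, through the application tier, the web tier as well, which is exactly the kind of indirect impact that is easy to miss without a map.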

In clustered environments, configuration drift is a real threat to HA/DR. This happens when a change is made to one server but not to a backup server, and it can really cause problems for these clustered systems. If you can add change management into the clustered environment, then you can prevent this drift and make sure you're getting more out of your HA/DR investments.
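
At its simplest, detecting configuration drift means comparing the settings of a primary server with those of its backup and reporting every difference. The following sketch assumes each node's configuration has already been collected into a dictionary; the setting names and values are invented for illustration.

```python
def find_drift(primary, standby):
    """Return {setting: (primary_value, standby_value)} for each setting that differs.

    A setting missing on one node is reported with None on that side, so this
    simple version cannot tell a missing setting from one explicitly set to None.
    """
    drift = {}
    for key in sorted(set(primary) | set(standby)):
        p_value = primary.get(key)
        s_value = standby.get(key)
        if p_value != s_value:
            drift[key] = (p_value, s_value)
    return drift


if __name__ == "__main__":
    # Hypothetical settings gathered from two cluster nodes.
    primary = {"db_version": "11.2", "max_connections": 500, "listener_port": 1521}
    standby = {"db_version": "11.2", "max_connections": 200}
    for setting, (p_value, s_value) in find_drift(primary, standby).items():
        print(f"{setting}: primary={p_value} standby={s_value}")
```

Run on a schedule, or after every approved change, a comparison like this flags a standby that would not behave like the primary at failover time.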

Simplify Management of Systems
Organizations that are at the forefront of technology development are quick to adopt technologies to solve business problems. However, this causes a number of challenges for their distributed, multi-tier data center applications. The biggest one is multiple management tools for individual multi-tiered applications.

For example, an organization may have a database server running on one OS platform, a middle-tier application running on a different one, and a web tier on yet another. Each tier in such an environment must be controlled, integrated and reined in so that together they provide a single business service.

Because so many of these applications must have a high degree of integration, organizations need visibility into the application environment and the ability to monitor all these different components in order to provide true HA.

Even if a database server is up, many other applications connect to that database server and provide the services the end user actually sees. To maintain HA in such an environment, the data center manager must be able to manage not only the backend database that holds the data and takes the customer's credit card transaction, but also all the moving parts in between, because if any one of those parts stops moving, then as far as the customer is concerned, the service is down.
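
The rule that a service is only as available as its weakest tier can be expressed in a few lines. In this sketch the probes are stand-in functions that simply return True or False; in practice each would be a real check (an HTTP request, a database query, and so on), and the tier names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def check_service(probes: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Run one probe per tier; the service is up only if every tier is healthy.

    Returns (service_is_up, list_of_failed_tiers).
    """
    failed = []
    for tier, probe in probes.items():
        try:
            healthy = probe()
        except Exception:
            healthy = False  # a probe that crashes counts as a failed tier
        if not healthy:
            failed.append(tier)
    return (not failed, failed)


if __name__ == "__main__":
    # Stand-in probes for a three-tier application (assumed names).
    probes = {
        "web": lambda: True,
        "app": lambda: False,      # simulate a stalled middle tier
        "database": lambda: True,
    }
    is_up, failed_tiers = check_service(probes)
    print("service up" if is_up else f"service DOWN, failed tiers: {failed_tiers}")
```

Here the database is healthy, yet the service as a whole is reported down because the middle tier failed, which is how the customer would experience it.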

Most products aim at protecting just one database. The ideal situation, however, is to take the time to find a tool that lets the organization manage virtualized environments such as VMware, Solaris, and others alongside physical environments. By consolidating on a single tool, rather than using point tools or application-specific tools, IT departments can manage any application in any environment, whether the application is running in a virtual machine or on a physical machine.

In addition, IT departments can have the tool manage which virtual machines are running at any given time and manage the CPU utilization of each host, so that computing power can be turned off when it is not needed. Doing so also helps the organization mitigate the risk that a failure in one part of the environment will negatively affect other parts.

By choosing a single management tool, from an operations standpoint, IT departments can have the full ability to log in and see what is currently in their environments and the services that are running in the enterprise.

Conclusion
During budget crunches, some organizations may be tempted to remove disaster recovery as a line item. This short-sighted view, however, can have huge implications. The cost of downtime is much higher than the cost of protecting against it, and most studies show that HA/DR initiatives pay for themselves in terms of reduced losses. With an increase in new technologies that raise the bar for protection without breaking the budget, now may be the best time to spend on disaster recovery initiatives. With multi-tiered applications, a failure in one part of the system may cause more downtime than the failure of a monolithic application, because recovery requires coordination among all of the moving parts. By putting in place solutions that detect configuration drift, monitor the entire system, and automate recovery, companies will likely find that the solution pays for itself by reducing lost revenue, reputation, and productivity, which means happier customers, partners, employees, and management.


About the Author
Dan Lamorena is a Senior Product Marketing Manager at Symantec.

 
© Copyright 2009 Auerbach Publications