What Is a Disaster?
Clearly, the answer to this question changes with time. When this book was first published in the 1990s, Oklahoma City, Hurricane Katrina, the Tokyo subway incident and, of course, 9/11 had not occurred. These and other events have changed and colored our definition of disasters to the point where they have perhaps permanently altered our very psychology as a nation. What has remained constant over this time is the fact that computers and communications are more of an indispensable component of our economy than ever. Whole new classes of "businesses without storefronts" have appeared. These all depend almost exclusively on one form of "value-added-sand" or another, whether those silicon chips sit in a computer or in a telephone.
The classical scenarios of fire, flood, earthquake, tornado, sabotage, and other disasters still apply. Buildings still burn and get flooded. The impact of such disasters, however, is intensified today when they take enabling technologies with them and potentially affect millions of people.
At the 100,000-foot level we can split disasters into three categories: natural causes, human error, and intentional causes. Virtually all kinds of disasters can be grouped into one of these categories. A fourth category, acts of God, can be added as a catch-all for disasters that defy classification (the legal term for this is force majeure). With all this said, let's jump right in. (See Figure 1.)
Figure 1. Causes of disaster.
Finding the Resources to Complete the Plan
Whether you are a LAN/open systems manager, telecommunications manager, mainframe systems manager, or other infrastructure manager, planning for catastrophic disruptions in the systems you control should be an integral part of your job. Portions of this task can be shared between departments, spreading the workload over more people with the objective of producing a superior plan faster.
Consider the fact that the lines separating the voice communications, data communications, and local area network departments are becoming more blurred than ever. Just a few years ago, when the Internet was down you lost only data services. Today, with the advent of VoIP (Voice over Internet Protocol, which carries voice over the same network as data), many companies lose both their voice and data services when an internal, previously all-data network is down.
Reflective of these changes, equipment component categories themselves are becoming blurred as well. (First, consider that one network resides "in house" and another very similar network resides "out house.") Ed Pope and I predicted that this would happen when we wrote our 1993 book Understanding Emerging Network Services, Pricing and Regulation. We predicted that fiber optics would make telecommunications like Doritos (eat all you want, we'll make more) and that the network would become increasingly independent of whether the services were voice, data, or something else. We predicted that it would come down to how many "gigacells" would traverse the network and how the providers would manage them. You may recall that in 1993, the only technology that would reliably manage "gigacells" was ATM (Asynchronous Transfer Mode).
As it turned out, it is gigapackets that are managed today, as IP (Internet Protocol) has won over ATM in most environments. Even so, it's amazing to see the degree to which today's IP networks have become multipurpose and completely independent of whether the payload is voice, data, image, video, or something else. That fact needs to be reflected in our recovery plans today, because routers, for example, now handle more than data alone. Switches now handle more than voice alone. In some environments, physically speaking there is literally no difference between the two because Doritos are Doritos and data packets are data packets. Eat all you want, we'll make more. Our economy has had an insatiable appetite over the last few years. This brings me to another point:
As it is no longer necessary to physically segregate many types of equipment as we did in days past (voice is really data; data is really data, too - understand?), the recovery-planning task has in some ways gotten easier. Think about it. Traditional telecommunications switches (those that are still left after IP!) are large computers and require the same protection and operating standards as mainframes.
Mainframes, in turn, no longer require a lot of the excess baggage they once did, like chilled water, 400 Hz power, etc. They can sustain themselves just fine in a well-conditioned space, not necessarily the "environment" that they used to require.
Many "mission-critical" frontline applications continue to migrate to the "open" server environment. Therefore, operation and security standards that used to apply only to the mainframe should now apply to the servers as well. It's not the platform that's important, it's the application the platform supports, and how long the company can survive without it.
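As a rough illustration of ranking by application rather than by platform, here is a minimal Python sketch. The application names, platforms, and outage figures are hypothetical placeholders, not data from any real organization; in practice the tolerable-outage numbers would come from the people who own each business function.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    platform: str  # e.g. "mainframe", "open server", "telecom"
    max_tolerable_outage_hours: float  # hypothetical figure supplied by the business


def recovery_priority(apps):
    """Order applications so the one the company can survive without
    for the shortest time comes first, regardless of platform."""
    return sorted(apps, key=lambda a: a.max_tolerable_outage_hours)


# Hypothetical inventory, for illustration only.
apps = [
    Application("payroll", "mainframe", 72.0),
    Application("order entry", "open server", 4.0),
    Application("call center voice", "telecom", 1.0),
]

for app in recovery_priority(apps):
    print(f"{app.name:<20} {app.platform:<12} {app.max_tolerable_outage_hours:>5.1f} h")
```

Note that the platform never enters the sort key: a server-based application can easily outrank a mainframe one if the business can tolerate less downtime for it.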
Chances are that all three systems (telecom, open systems, and mainframes) reside today in the same equipment rooms in your organization. That means if you protect the room, you have protected all three technologies. In later chapters we discuss in detail how you can share the duty (and the cost) with other departments. For the remainder of this chapter, we will provide some basic information about what your planning objectives should be, what the plan should cost, where to get resources, and where you should start.
How Does One Begin?
I think it's safe to say that most of the people initially tasked by their organizations with responsibility for a disaster recovery plan will not really know where to start. Indeed, the responsibility to maintain the integrity of the business in the event of a natural disaster, catastrophic human error, major system failure, or even a terrorist attack can be a daunting task at first glance. When you think about it, however, as technologists we get presented with all kinds of difficult, even impossible, deadlines and most of the time we do just fine. So what else is new?
The key to a successful project, as any good project manager will tell you, is organization. You will need to define your goals and expectations, set clear objectives, and have a measurement in place to gauge your progress. To put it another way, you need a project plan for the recovery plan.
You will undoubtedly have financial constraints and probably will not have all the people you need for the project. Been there. Done that. It is possible, however, to get a plan in place even so, and at reasonable cost, if the project manager:
- Secures firm management commitment before beginning
- Uses expensive resources such as outside consultants judiciously to accomplish specific and well-defined goals
- Exploits the internal resources already available in the company
- Has a good project plan and means to measure progress
Consider Figure 2, which illustrates a four-step process to achieve the goals set forth earlier.
Figure 2. Four phases of the planning process.
I have personally seen this type of plan utilize as few as three steps, and as many as six. You know your organization best, so you decide. For the purposes of this article, I have settled on four. In this case they are as follows:
Phase I: Business Impact Analysis (BIA) and Executive Commitment
Phase II: The Standards Phase
Phase III: Documentation of the Recovery Plan
Phase IV: Integration with Corporate Plan
Phase I: Business Impact Analysis and Executive Commitment
The idea in Phase I is to utilize the most expensive resources as little as possible, but to accomplish some very complicated goals. One of the first tasks is a preliminary Business Impact Analysis (BIA). You are probably not going to be privy to a lot of the details of the core business in your organization, because chances are you work in technical services. Even if you find out the details and can describe them, management may not believe you. Management will believe the right consultant, however.
Why does management believe consultants but not the company's own people? It doesn't seem fair, does it? In fact, count the number of times you have chanted ad infinitum that "this must be a priority," only to have Ernst & Young come in and play a round of golf with your CEO. On Monday, the CEO comes in with the enthusiasm of a revivalist preacher proclaiming the gospel that "this must be a priority!" - the same advice, incidentally, that you have been giving for the last two years.
What does the Big 4 consulting company have that you don't? After all, isn't it logical that you would know more about your business than they do?
What they have that you don't have (but can acquire) is the ability to speak to management in terms they understand. That means business terms, not technical terms. The role of a good consultant is to borrow management's watch to tell them what time it is. You are the watch. The outside consultants are going to come to you for a lot of this information anyhow. Don't get me wrong, Big 4 consultants are very good at what they do. With a little coaching, you can use that to your advantage.
Oh, and by the way, if you as the reader are a Big 4 consultant, there is something here for you too. This same advice is a great way to package your services so that client companies can afford you. You will also delight your client because the techniques described here will not give your client a fish, they will teach your client to fish. Nothing makes for a better and more satisfying consulting engagement than the sense from your client that they have truly learned from you.
Getting back to the project manager, remember that high-end consulting resources are expensive. You will need to limit their participation to certain essential, clearly defined goals. In the meantime, learn everything you can from the consultant, first and foremost because it broadens your skill set and makes you more valuable, even on other non-disaster-recovery-related projects and, second, so that you can become the flag bearer for the disaster recovery project in Phase II - not the expensive consultant.
Let's assume this first phase is being performed by a high-level consultant, like PricewaterhouseCoopers, Ernst & Young, or one of the others. Bring your wallet. Your organization is going to pay a high rate for the consultants. But there is no reason that it cannot limit the hours somewhat and use this expensive resource judiciously. In other words, use the consultant but only for a relatively short time. During this phase, the following action items are undertaken.
The All-Important Executive Pitch:
As I stated earlier, consultants carry credibility with executive management and speak a language in terms executive management understands. This means that when properly utilized, consultants can be very useful for securing financial commitment from management.
First, the consultants may conduct a preliminary business impact analysis (BIA). They may also make the executive pitch complete with some very classy audiovisual material. They may also produce an executive white paper with lots of graphs that condense 5000 words into four pages (seriously, another very useful talent that a good consultant will possess). The consultants will make the compelling point that disaster recovery is important, presenting all the reasons management needs to fund and endorse the project. All for only $500.00 an hour.
Oh, well, I was probably doing a pretty good job selling you until I got to that hourly number. So now what do you do?
I remind you again that you are an experienced project manager. This will not be the first or the last time you will have to work within financial constraints. It's also not the first time you have been tasked with a complex project. The name of the game is what it has always been: resource optimization. Sure, consultants will be an expensive resource, but you will only utilize them to accomplish specific objectives in order to keep the cost down. The most important of these is to sell your boss.
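To make the cost concrete, here is a minimal Python sketch of capping consultant hours per deliverable. The $500 hourly rate is the figure quoted above; the deliverable list mirrors the tasks just described, but the hour caps are purely hypothetical assumptions that you would replace with numbers from your own engagement.

```python
HOURLY_RATE = 500.00  # hourly rate quoted in the text

# Hypothetical hour caps per deliverable; not figures from the text.
hour_caps = {
    "preliminary BIA": 40,
    "executive white paper": 16,
    "executive pitch": 8,
}


def engagement_cost(hours_by_task, rate=HOURLY_RATE):
    """Return the capped cost of each deliverable at the given hourly rate."""
    return {task: hours * rate for task, hours in hours_by_task.items()}


costs = engagement_cost(hour_caps)
total = sum(costs.values())

for task, cost in costs.items():
    print(f"{task:<24} ${cost:>10,.2f}")
print(f"{'total':<24} ${total:>10,.2f}")
```

Writing the caps down before the engagement starts gives you a number to put in front of management and a limit to hold the consultant to.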
Pros and Cons of Consultants:
One course of action you can consider if you can't afford a high-powered consultant to pitch the top brass is to do the executive presentation yourself. There are career advantages from the visibility you will receive; after all, for many companies disaster recovery planning is a board-of-directors-level issue. If, on the other hand, this prospect intimidates you, you will probably want to get someone to champion it for you. If you end up doing it yourself, there are a few tips on how to do it presented later in this book. The important thing is not to be intimidated by this project simply because it is something you have not done before. Think of it as a new learning experience that will elevate your standing as a technologist and broaden your horizons.
Presenting the Case to Management:
Ironically, you have to actually ask permission to plan. Without management buy-in and endorsement on the project (as well as funding), you are spinning your wheels. At best you can expect to be assigned the project to complete in your copious spare time, or at home in the evening on the kitchen table. If you expect to have people, money, and resources to complete a plan, there are some steps to take first. The first one is to sell your boss. The second is to ask permission to plan. When you ask permission to plan, there are three possible answers. Which answer do you think management gives most often?
C. Let's study this some more.
Now, why do you suppose "C" is the answer most often given? Stated another way, have you had a disaster recovery project that lasted five years? This is, in part, why. Management never gets off the dime in supporting the plan and the organization "studies" it forever. This is not to hang the blame on management, however. The problem usually arises because technical people are not always very adept at presenting to management in terms management understands. A consultant can help because consultants are adept at these presentations. This is discussed in more detail in the following chapters.
For the moment, however, as this is only an overview, let's return to our four-step process defined previously. We are now on Phase II.
Phase II is more "nuts and bolts" in orientation and, hopefully, less expensive. It centers on information gathering and standards.
Phase II: The Standards Phase
At the 100,000-foot level, activities undertaken in Phase II might include:
- Recruiting the planning team
- Training staff and hosting seminars
- Developing operating and security standards
- Gathering information via interviews and questionnaires
- Identifying critical databases to import into the plan
- Making long-term technology recommendations and making capital requests
- Writing the first draft of the actual plan
- Beginning to integrate the technology plan into the overall corporate plan
Phase III: Documenting the Recovery Plan
After completion of Phase I and Phase II (typically 90 to 120 days), you will finally begin writing the plan. This is not to say you will have no plan during that 120-day period. Indeed, many things like equipment inventories and personnel call-out lists are actually compiled in Phase II. If you do things right, you should be able to compile something good enough to get the auditors off your back in 90 to 120 days. A complete plan, however, takes two years or more to finish. It is almost always under refinement and, besides, you can't trash all the equipment you have today and buy new equipment. You have to phase out what you have and replace it with equipment having fault-tolerant or disaster-resistant characteristics. That takes time, but eventually it will get done.
Phase IV: Integration with Corporate Plan
You can't expect to plan in a vacuum. Consider the elemental issue of who "owns" the building. The data center manager may think he or she owns the building. There is a guard in a blue suit with a badge, however, who sits at the front door, and this person has different ideas. You may have a landlord. You can't just plan based on your department; you must involve others. Your plan will eventually have to be integrated into the Corporate Recovery Plan. This is part of what goes on in Phase IV. This is also the time you will test and verify your plan. When you refer back to the four-step diagram, do you notice how the cost decreases with each subsequent phase? This is because you are using fewer and fewer resources like outside consultants, and you are doing (and learning) more and more of the work yourself.
Wow, we make it sound so easy, don't we? As with any complicated project, the devil is in the details. That's why even though we have laid out a thumbnail sketch of a plan and how to implement it, the remaining several hundred pages will dive right into the details. These include not only the obvious things, like budget and technology limitations, but the less obvious ones as well, such as departmental "turf issues" and other politics.
In summary, the most difficult part of the planning process is often simply getting off square one and starting. We hope this book helps you do that. You have handled complex projects before, so don't be afraid of this one. Disaster recovery planning is a thoughtful and methodical process. As most experienced managers have dealt firsthand with projects of equal or greater complexity, most are up to the task of producing a plan. Sometimes, though, it helps to have a starting point and a template for the project. We hope to provide you with exactly that in the subsequent chapters.
With that said, let's kick off your plan!