APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

Disaster Recovery

© December 2005 Michael Desrosiers

This month's topic is disaster recovery and why it is an integral part of any organization's overall information security strategy.

A disaster recovery plan covers both the hardware and software required to run critical business applications and the associated processes needed to transition smoothly in the event of a natural disaster or one caused by human error. To plan effectively, first assess your mission critical business processes and their associated applications before creating the full disaster recovery plan. All disaster recovery plans assume a certain amount of risk, the primary one being how much data is lost in the event of a disaster. The planning is much like the insurance business in many ways: there are compromises between the time, effort, and money spent planning and preparing for a disaster and the amount of data loss you can sustain and still remain operational afterward. Time also factors into the equation.

Assess the impact of a disaster on your business from both a financial and infrastructural perspective by asking the following questions:

How much will it affect your value and market confidence?
How much of the organization's resources could be lost?
How long will it take to recover?
What are the total costs?
What is the impact on the overall organization?
What efforts are required to rebuild?
How are customers affected, and what is the impact on them?

Many organizations simply cannot function without the computers they need to stay in business. Their recovery efforts may therefore focus on quick recovery, or even zero downtime, by duplicating and maintaining their computer systems in separate facilities spread out over a geographic area. Over the years, dependence on computers in day-to-day business activities has become the norm for organizations large and small. Today you can find very powerful computers within every department of an organization. These machines are linked together by a sophisticated network that provides communications with other machines within the company and around the world, and vital functions of these systems depend on the availability of this network.

Performance indicators provide the mechanism by which you can measure the success of your disaster recovery process and plan. They are somewhat different from those used to measure network performance, because they combine project status with test runs from within the infrastructure.

Now let's talk about management awareness, the most important step in creating a successful disaster recovery plan. To obtain the necessary resources and time from each area of your organization, senior management has to understand the business impacts and risks and support the effort. Several key tasks are required to achieve management awareness. First, identify the top disasters that could affect your company and analyze their impact on your business. Your analysis should cover effects on communications with suppliers and customers, the impact on operations, and disruption to key business processes.

Senior management needs to be involved in the disaster recovery planning process and should be aware of the risks and potential impact on the organization. Once management understands the financial, physical, and business costs associated with a disaster, it is then able to build a strategy and ensure that this strategy is implemented across the organization. The initial step is an announcement of the disaster recovery project and kickoff of a planning group or steering committee, which should be led by a member of the senior management team.

In the disaster recovery planning stage, you should identify the mission critical, important, and less important processes, systems, and services in your network and put in place plans to ensure these are protected against the effects of a disaster. Key elements of this plan should include the following:

Establish your planning group;

Perform risk assessments and audits;

Establish priorities for your infrastructure;

Develop recovery strategies;

Prepare accurate inventory lists and documentation;

Develop validation criteria and procedures;

Implement your plan.

Establish a planning group to manage the development and implementation of the disaster recovery strategy and plan. Key people from each business unit or operational area should be members of the team, which is responsible for all disaster recovery activities and planning and for providing regular monthly reports to senior management. To create the disaster recovery plan, your planning group needs to thoroughly understand the business and its processes, technology, networks, systems, and services. The group should prepare a risk analysis and business impact analysis that includes at least the top ten potential disasters. The risk analysis should include the worst case scenario of completely damaged facilities and destroyed resources. It should address geographic situations, current design, lead times of services, and existing service contracts. Each analysis should also include an estimate of the financial impact of replacing damaged equipment, drafting additional resources, and setting up extra service contracts. When you've analyzed the risks posed to your business processes by each disaster scenario, assign a priority level to each business process.
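One common way to turn a risk analysis into a ranked list is a simple likelihood-times-impact score. The sketch below is illustrative only: the 1-to-5 scales, the example scenarios, and the names `Scenario` and `rank_scenarios` are assumptions made for this example, not a method prescribed here. Your planning group would substitute its own scenarios and scoring rules.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One potential disaster from the risk analysis (hypothetical structure)."""
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Simple risk-matrix score: likelihood multiplied by impact.
        return self.likelihood * self.impact


def rank_scenarios(scenarios):
    """Return the scenarios sorted from highest to lowest risk score."""
    return sorted(scenarios, key=lambda s: s.score, reverse=True)


# Example values are invented purely for illustration.
scenarios = [
    Scenario("Data center fire", likelihood=1, impact=5),
    Scenario("Extended power outage", likelihood=3, impact=3),
    Scenario("Accidental data deletion", likelihood=4, impact=2),
]

ranked = rank_scenarios(scenarios)
for s in ranked:
    print(f"{s.score:>2}  {s.name}")
```

A multiplicative score is only one possible choice. It tends to push rare but catastrophic events down the list, so a worst case scenario such as total loss of a facility may deserve a priority of its own regardless of its score.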

Just as the analysis of the business processes determines the priorities of the network, applications, and systems, the same analysis should be applied to your network design. The site priorities and location of key services contribute to a fault tolerant design, with resilience built into the network infrastructure and with services and resources spread over a wide geography. Develop a recovery strategy to cover the practicalities of dealing with a disaster. Such a strategy may be applicable to several scenarios; however, the plan should be assessed against each scenario to identify any actions specific to different disaster types.

Your recovery strategy should include the expected downtime of services, action plans, and escalation procedures. Your plan should also determine thresholds, such as the minimum level at which the business can operate, the systems that must have full functionality (all staff must have access), and the systems that can run at a reduced level. It is important to keep your inventory current and to have a complete list of all locations, devices, vendors, services in use, and contact names. The inventory and documentation should be part of the design and implementation process of all solutions.
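As a rough sketch of how an inventory with priorities and downtime thresholds might be kept in a machine-readable form, consider the following. Everything here is hypothetical: the field names, the three-tier numbering, the `max_downtime_hours` threshold, and the sample entries are assumptions chosen for illustration, and a real inventory would carry far more detail.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    """One inventory entry (hypothetical fields for illustration)."""
    name: str
    location: str
    vendor: str
    contact: str
    tier: int                  # assumed: 1 = mission critical, 2 = important, 3 = less important
    max_downtime_hours: float  # assumed threshold the business can tolerate


def recovery_order(items):
    """Sort by tier first, then by the tightest downtime threshold."""
    return sorted(items, key=lambda i: (i.tier, i.max_downtime_hours))


def over_threshold(items, outage_hours):
    """Names of items whose tolerated downtime is shorter than the outage."""
    return [i.name for i in items if outage_hours > i.max_downtime_hours]


# Sample entries are invented; replace them with your own inventory.
inventory = [
    InventoryItem("File server", "Branch office", "Vendor B", "J. Doe", 2, 24),
    InventoryItem("Order database", "Main data center", "Vendor A", "A. Smith", 1, 4),
    InventoryItem("Intranet wiki", "Main data center", "Vendor C", "R. Roe", 3, 72),
    InventoryItem("Email gateway", "Main data center", "Vendor A", "A. Smith", 1, 8),
]

for item in recovery_order(inventory):
    print(f"tier {item.tier}: {item.name} (contact: {item.contact})")

# Which systems would exceed their threshold in a 12-hour outage?
print(over_threshold(inventory, outage_hours=12))
```

Keeping the thresholds next to the inventory entries makes it easier to check, during a test run, whether the measured recovery time for each system stays within what the business said it could tolerate.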

Your disaster recovery documentation should include:

Complete inventory, including a prioritization of resources;

Review process structure assessments, audits, and reports;

Gap and risk analysis based on the outcome of the assessments and audits;

Implementation plan to eliminate the risks and gaps;

Disaster recovery plan containing action and escalation procedures;

Training and awareness material.

Once you've created a plan, create a validation process to prove the disaster recovery strategy and, if your strategy is already implemented, review and test the implementation. It's important that you test and review the plan frequently. We recommend documenting the validation process and procedures and designing a proof of concept process. The validation process should include an experience cycle: a disaster recovery plan is based on experience, and each disaster has different rules. You may want to call on experts to develop and prove the concept and on product vendors to design and verify the plan.

There you have it. The premise behind developing a sound and workable disaster recovery plan is to think it through before it is needed. When an emergency presents itself, you need a plan that provides criteria for a systematic, well thought out recovery that is proactive and not reactive.

To respond to this or previous newsletters or to inquire about an on-site presentation, please feel free to call us at 508-995-4933 or email us at mdesrosiers@m3ipinc.com.


Michael Desrosiers
m3ip, Inc.
We Manage Risk So You Can Manage Business
