Thursday, May 21, 2009

Business Continuity Series, pt 4 - Building a BCP Plan

Before you can build a plan, you need to understand what value you are deriving from the system. Unfortunately, in the real world, Business Continuity Program planning is constrained by resource allocation like any other project, so understanding the value derived from the program is essential. It is possible to quantify the problem by understanding:
  • Frequency of outages
  • Average duration of outage
  • Time value of outage
  • Value of data lost
  • Opportunity cost of capital investment in plan
Total cost of outages = Frequency x Duration x Time Value

This basic consideration will give you a foundation for justifying budgeting more or less funds into your Business Continuity Program.
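As a rough illustration, the formula above can be sketched in a few lines of Python. The function name and all of the figures (four outages a year, six hours each, $10,000 per hour) are made-up assumptions for the example, not numbers from any real program.

```python
def annual_outage_cost(outages_per_year, avg_duration_hours, cost_per_hour):
    """Total cost of outages = Frequency x Duration x Time Value."""
    return outages_per_year * avg_duration_hours * cost_per_hour


# Hypothetical figures, for illustration only.
baseline = annual_outage_cost(outages_per_year=4, avg_duration_hours=6.0, cost_per_hour=10_000)

# Assume a plan that cuts the average outage from 6 hours to 1 hour.
with_plan = annual_outage_cost(outages_per_year=4, avg_duration_hours=1.0, cost_per_hour=10_000)

# The difference is a ceiling on what the plan is worth per year.
annual_benefit = baseline - with_plan
print(f"Baseline: ${baseline:,.0f}  With plan: ${with_plan:,.0f}  Benefit: ${annual_benefit:,.0f}")
```

Note that the formula as written covers only the first three items in the list. The value of data lost and the opportunity cost of the capital invested would need to be added separately.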

When you've arrived at the stage where you need to choose a strategy, there are several categories of recovery strategy, each with an escalating financial and resource commitment and a proportional recovery / resiliency benefit:
  • Passive-Passive - Cold solution. New equipment may need to be ordered at the time of the event. Capital on-hand 'just-in-case'. Can be improved with planning (better use of capital). Essentially a "do nothing" solution. Probably manifests as a paper plan only with no physically available resources.
  • Active-Passive - Warm redundant systems. Literally turn-key or push-button solutions. Equipment is on hand but not currently in use, and it can be activated on short notice. This approach is usually chosen because technology or financial limitations rule out a fully active setup.
  • Active-Active - Traffic is load balanced across multiple systems. Disrupted systems are bypassed and traffic is routed to different machines. Minor disruptions usually pass unnoticed; only catastrophic events that knock out the entire system are noticed by users. The main concerns of an Active-Active system are cost and capacity. Problems generally only become visible when enough modules are knocked out that the remaining system is over capacity.
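The capacity concern in the Active-Active case can be made concrete with a small check. This is only a sketch under assumed numbers: the function name, the four nodes at 100 requests per second each, and the peak load of 250 are all hypothetical.

```python
def surviving_capacity_ok(node_capacities, failed_nodes, peak_load):
    """Return True if the nodes still running can carry the peak load."""
    remaining = sum(
        capacity
        for index, capacity in enumerate(node_capacities)
        if index not in failed_nodes
    )
    return remaining >= peak_load


# Hypothetical cluster: four nodes, each able to handle 100 requests/second.
nodes = [100, 100, 100, 100]
peak = 250  # assumed peak load in requests/second

one_down = surviving_capacity_ok(nodes, failed_nodes={0}, peak_load=peak)
two_down = surviving_capacity_ok(nodes, failed_nodes={0, 1}, peak_load=peak)
print(one_down, two_down)
```

With these numbers, losing one node goes unnoticed, but losing two leaves the system over capacity, which is the point at which users start to see problems.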
As usual, better plans cost more resources; however, sometimes there are non-zero-sum gains to be had. For instance, a Passive-Passive solution might be to have $5M allocated in the budget as "contingency" in the event of a disaster. Perhaps rather than have $5M budgeted as "contingency" you can employ $1M in capital expenditures to build resiliency into your processes. Although this investment will depreciate over time, it could potentially be better than keeping the capital idle and forgoing the return that the $5M could otherwise earn (its internal rate of return, or IRR).
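To put rough numbers on that trade-off, here is a minimal sketch. The 8% annual return, the five-year horizon, and straight-line depreciation over five years are assumptions made up for the example, and it further assumes that the $1M investment protects you as well as the $5M reserve would, which a real analysis would have to test.

```python
def idle_capital_cost(capital, annual_return_rate, years):
    """Return forgone by holding capital idle, compounded annually."""
    return capital * ((1 + annual_return_rate) ** years - 1)


def capex_option_cost(capex, annual_return_rate, years, useful_life_years):
    """Straight-line depreciation so far, plus the return forgone on the money spent."""
    depreciation = capex * min(years / useful_life_years, 1.0)
    return depreciation + idle_capital_cost(capex, annual_return_rate, years)


# Hypothetical inputs: 8% annual return, 5-year horizon, 5-year useful life.
rate, horizon = 0.08, 5

contingency_cost = idle_capital_cost(5_000_000, rate, horizon)
resiliency_cost = capex_option_cost(1_000_000, rate, horizon, useful_life_years=5)

print(f"Idle $5M contingency: ${contingency_cost:,.0f} in forgone return")
print(f"$1M resiliency capex: ${resiliency_cost:,.0f} in depreciation and forgone return")
```

Under these assumptions the idle reserve is the more expensive option, but the result is sensitive to the rate and horizon you pick, so treat it as a way of framing the question rather than an answer.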

Also, systems which are heavily used or mission critical will require more active plans. For instance, if Google or 911 suffered any downtime, people would notice.

When putting together a plan there are other important considerations. For instance, is there a correlation between risk factors and support infrastructure? What is the geographical distance between my redundant systems, and what is the possibility of a single event knocking out both of them? Understanding and process mapping all interdependencies is paramount in any BCP endeavour.

Before you dismiss this as too unlikely, recall the power outage in the summer of 2003, which knocked out power across Ontario and the northeastern USA. If your redundant system for Toronto was located in New York (or vice versa), on the theory that a different country provided enough insulation and redundancy, this event showed that sometimes it isn't enough.
