How to prepare your business for an IT disaster

January 17, 2020

What is an IT disaster?

Most businesses now rely heavily on IT infrastructure to quickly create, process, and communicate large volumes of data. That’s why even a few hours of disruption, whether it’s a full system failure or a mere slowdown, can throw your business into a tailspin. Such disruptions are often the result of an IT disaster, which in turn may be caused by a variety of factors, from natural disasters to human error to cyberattacks.

What is an IT disaster recovery plan and why is it important?

The best way to minimize the negative effects of an IT disaster is to have a sound disaster recovery plan in place. An IT disaster recovery plan, or DRP, is a set of step-by-step instructions designed to minimize the impact of natural or human-induced IT disasters. With it in hand, you’ll gain the following benefits:

Minimal to no interruption in business operations
Quick recovery of mission-critical workloads
Well-prepared IT personnel with a defined action plan
Predictable and limited extent of damage

To put a disaster recovery plan together, you’ll need to take four specific steps. Let’s take a closer look.

Step 1. Outline processes and resources that are crucial to your business

First, prioritize your business processes and figure out which of them must be restored right after a disruption at all costs. Then, identify the applications your most important processes depend on. Determine the acceptable outage duration for each application and assign staff members who will be responsible for its recovery.

Next, look at the same processes and resources from the standpoint of the company as a whole by conducting a Business Impact Analysis. It will help you find out how various aspects of your business will be affected by downtime. This analysis should cover legal and compliance risks, as well as the cost of downtime and revenue loss. The step should result in a list of your assets sorted by importance.
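The prioritized list from this step can be sketched in a few lines of Python. This is only an illustration: the `Asset` fields, the sorting rule (shortest acceptable outage first, then highest hourly downtime cost), and all sample names and figures are assumptions made for the example, not part of any standard Business Impact Analysis method.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    owner: str                      # staff member responsible for recovery
    acceptable_outage_hours: float  # how long the business can run without it
    downtime_cost_per_hour: float   # estimated revenue loss plus penalties


def prioritize(assets):
    """Return assets ordered so the ones to restore first come first.

    Assumed rule: shorter acceptable outage wins; ties are broken by
    higher hourly downtime cost.
    """
    return sorted(
        assets,
        key=lambda a: (a.acceptable_outage_hours, -a.downtime_cost_per_hour),
    )


# Hypothetical sample data, invented for illustration only.
assets = [
    Asset("internal wiki", "Dana", 72, 50),
    Asset("order processing", "Lee", 1, 4000),
    Asset("email", "Sam", 8, 300),
    Asset("payment gateway", "Lee", 1, 9000),
]

for rank, asset in enumerate(prioritize(assets), start=1):
    print(rank, asset.name, asset.owner)
```

In a real analysis, you would likely weigh more factors, such as compliance exposure, but even a simple ordering like this makes it clear who restores what first.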

Step 2. Determine how long your business can afford to be down

In this step, you’ll work with three metrics: RPO, RTO, and MTD. RPO, or the recovery point objective, is the maximum amount of data, measured in time, that you can afford to lose; in practice, it sets the maximum acceptable interval between your data backups. RTO, or the recovery time objective, is the maximum acceptable time for fully restoring a system after a disruption. MTD, or the maximum tolerable downtime, is an overall metric that shows how long your business can afford to be offline without being jeopardized, so the RTO should never exceed it. Determine these metrics carefully, then move on to the next step.
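As a rough sanity check on these numbers, the sketch below flags two common inconsistencies: backups that run less often than the RPO allows, and an RTO that is longer than the MTD. The class, the function name, and the sample values are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjectives:
    rpo_hours: float  # maximum tolerable data loss, expressed as time
    rto_hours: float  # target time to restore the system
    mtd_hours: float  # maximum tolerable downtime for the business


def check_objectives(objectives, backup_interval_hours):
    """Return a list of problems; an empty list means the numbers are consistent."""
    problems = []
    if backup_interval_hours > objectives.rpo_hours:
        problems.append("backups run less often than the RPO allows")
    if objectives.rto_hours > objectives.mtd_hours:
        problems.append("RTO exceeds MTD")
    return problems


# Hypothetical values: 4-hour RPO, 2-hour RTO, 8-hour MTD, hourly backups.
print(check_objectives(RecoveryObjectives(4, 2, 8), backup_interval_hours=1))
```

An empty result does not prove the targets are achievable; it only shows that they do not contradict each other.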

Step 3. Find a cost-effective disaster recovery solution that suits you best

If you ask Google about disaster recovery, you’ll see dozens of service providers with hundreds of options — an embarrassment of riches. To avoid getting lost in this abundance, remember that your goal is to get your data protected while keeping costs reasonable. Use the results of the research you’ve conducted in the previous steps to define the right set of features and properties for your case.

Usually, an effective disaster recovery solution includes the following essential features:

Automatic backup. The software you choose should comply with your RPO, so that your data is backed up automatically within the required intervals.
Deduplication. This feature allows you to optimize the amount of data by getting rid of duplicates, thus minimizing your storage costs.
Replication. This option is often overlooked, yet it is a cornerstone of almost any disaster recovery plan. Replication keeps exact copies of your data on several servers and is well suited to quickly restoring access to mission-critical systems after a disaster, which helps shorten RTOs.
Disaster Recovery Plan. Any DR solution should include a disaster recovery plan. Most solution providers offer DR plan development based on analysis of your company and its mission-critical workloads and assets.
Orchestration. As a form of automation, orchestration ensures that critical systems and business operations are brought back online in a coordinated way during an outage.
Automated failback. This feature allows the systems to return to their original primary site as soon as its functionality is restored.
Snapshot download. If you want to be able to return a part of your environment, or even all of it, to its original state or some other previous state, consider opting for this feature.
Monitoring. A robust disaster recovery solution should let you monitor the current state of your IT assets so that you can spot problems early and, ideally, prevent disasters before they happen.
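To make one of these features more concrete, here is a minimal sketch of how deduplication can work in principle: each distinct piece of data is stored once under its content hash, and the backup keeps only a list of hashes. This is a simplified illustration, not how any particular DR product implements it; the function names are invented for the example.

```python
import hashlib


def deduplicate(blobs):
    """Store each distinct blob once, keyed by its SHA-256 digest.

    Returns (store, refs): store maps digest -> bytes, and refs lists
    the digest of every input blob in the original order.
    """
    store = {}
    refs = []
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        store.setdefault(digest, blob)  # keep only the first copy
        refs.append(digest)
    return store, refs


def restore(store, refs):
    """Rebuild the original sequence of blobs from the store."""
    return [store[digest] for digest in refs]


# Hypothetical data: three blobs, two of them identical.
blobs = [b"invoice-2020-01", b"report-q4", b"invoice-2020-01"]
store, refs = deduplicate(blobs)
print(len(blobs), "blobs backed up,", len(store), "stored")
```

Real products typically deduplicate at the block or chunk level rather than per file, but the storage saving comes from the same idea.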

This list is not exhaustive: advanced DR solutions are complex and cover more aspects. With so many offerings in the DR niche, it is worth asking potential DR providers to help you choose, and using that conversation to analyze the technologies they offer and assess their expertise.

Step 4. Test, adjust, and then keep your disaster recovery plan up-to-date

To make your DR plan work, you need to test it using the test-to-fail approach. In other words, look for scenarios where your plan is going to fail: this way, you’ll be able to find its major flaws and fix them. Regular testing will also keep you and your key personnel prepared for a real emergency. In addition, check your software, hardware, and internal processes.

There are different ways to test your DR plan, from cheap and simple walkthroughs to full-interruption tests, which are the most thorough and the most disruptive because they reproduce real-world conditions.
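Whatever type of test you run, it helps to compare the results against your targets. The sketch below assumes you record how long each system took to recover during a test and checks those times against the RTO you set for it. The function name, the system names, and the numbers are hypothetical.

```python
def evaluate_drill(rto_hours, measured_hours):
    """Compare recovery times measured in a test against RTO targets.

    Both arguments map a system name to hours. Returns the sorted names
    of systems that missed their target or were not exercised at all.
    """
    failed = []
    for system, target in rto_hours.items():
        measured = measured_hours.get(system)
        if measured is None or measured > target:
            failed.append(system)
    return sorted(failed)


# Hypothetical targets and test results, invented for illustration.
targets = {"database": 2, "web": 1, "mail": 4}
results = {"database": 3, "web": 0.5}  # mail was not tested
print(evaluate_drill(targets, results))
```

Treating an untested system as a failure is a deliberate choice here: a plan that has never been exercised for a system gives you no evidence that it works.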

Given that any IT infrastructure is continually evolving, you should ensure that every change is reflected in your disaster recovery plan. Keep your plan up-to-date and make sure that your staff members are aware of its latest updates.

What else should you take into consideration to prepare your business for an IT disaster?

Remember that coming up with a disaster recovery strategy might not be enough to keep you safe from an IT disaster. In fact, it’s just one part of a business continuity plan, which is a more strategic approach that will help you bounce back after a disaster.

Apart from DR and BC plans, you might want to consider another approach, IT resilience, which is all about IT disaster prevention. It is intended to protect all critical business workloads from failure before any issues occur. Although prevention is always better than cure, no system can be fully resilient. So, you’ll always need disaster recovery and business continuity plans to keep your mission-critical workloads up and running.

Read more about building a robust BCDR strategy in our previous article.
