
Backup and Disaster Recovery in Public Clouds

Public clouds tend to be resilient, so should I build a DR strategy there, or is backup enough? And if I do need DR, what is the best strategy for me? Let’s discuss…

First of all, let’s briefly cover the difference between Disaster Recovery and Backup, as many people see no difference or even use the terms interchangeably.

Backup is a process in which you replicate some state of your data to storage (tape, NAS, cloud storage, etc.) and then have a way to restore a missing item or items from the restore points. It is used to protect against data loss, ransomware or system failure according to retention settings and policies. The best backup solution is one that stores data efficiently and provides various recovery options, such as granular file/folder recovery or restore to a database. Recovery is usually expected to run in the same or a similar environment.

Disaster Recovery, in contrast, is about fast replication and recovery of the applications and the entire infrastructure. Storage consumption matters, but the main focus is on a low RPO (Recovery Point Objective – the time between replications, i.e. the most data you can lose) and RTO (Recovery Time Objective – the time to restore the entire system after a disaster). Data is stored in a ready-to-use format; granular file/folder recovery is an advantage but not a core aspect. An ideal DR solution should be capable of restoring to another cloud or region and have smooth failback (returning apps when the disaster is fixed) functionality.
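To make the two objectives concrete, here is a minimal sketch of how RPO and RTO relate to replication frequency and restore time. The function names and the backup/DR numbers are purely illustrative assumptions, not from any product:

```python
from datetime import timedelta

def worst_case_data_loss(replication_interval: timedelta) -> timedelta:
    """RPO: at worst you lose everything written since the last replication."""
    return replication_interval

def total_downtime(detection: timedelta, restore: timedelta) -> timedelta:
    """RTO: time to notice the disaster plus time to bring the system back."""
    return detection + restore

# Backup-style protection: nightly copies, multi-hour manual restore.
backup_rpo = worst_case_data_loss(timedelta(hours=24))
backup_rto = total_downtime(detection=timedelta(minutes=30),
                            restore=timedelta(hours=8))

# DR-style protection: frequent replication, near-instant failover.
dr_rpo = worst_case_data_loss(timedelta(minutes=15))
dr_rto = total_downtime(detection=timedelta(minutes=5),
                        restore=timedelta(minutes=10))

assert dr_rpo < backup_rpo and dr_rto < backup_rto
```

The point of the sketch: backup tooling optimizes storage efficiency and recovery options, while DR tooling is judged almost entirely by how small those two timedeltas are.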

Now that we know the difference, what technology should be used? I would say that, at a minimum, you need backup functionality. It’s better than nothing and will help you restore after a failure or a disaster. You can use standard public cloud capabilities to take volume snapshots or use a vendor for that. Just be sure you understand how long recovery takes, where the data is stored, how much you pay for the backup (not just licenses, if it isn’t free, but also storage and data transfer) and how to restore data or VMs.
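The cost factors above can be sketched as a simple calculator. All prices and volumes below are illustrative assumptions; check your provider’s actual price list:

```python
def monthly_backup_cost(stored_gb: float, storage_price_per_gb: float,
                        restore_gb: float, egress_price_per_gb: float,
                        license_fee: float = 0.0) -> float:
    """Rough monthly bill: storage at rest + data transferred on restore + licenses."""
    storage = stored_gb * storage_price_per_gb
    egress = restore_gb * egress_price_per_gb
    return storage + egress + license_fee

# Hypothetical numbers: 2 TB stored, one 500 GB restore, a $50 license.
cost = monthly_backup_cost(stored_gb=2000, storage_price_per_gb=0.023,
                           restore_gb=500, egress_price_per_gb=0.09,
                           license_fee=50)
print(round(cost, 2))  # 141.0
```

Note how egress (data transfer out) can rival the storage line item: a “cheap” backup can become expensive on the day you actually need to restore from it.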

Backup is fine when you need to restore a piece of your information, but it doesn’t help much when the whole data center or availability zone is down.

A few years ago companies considered public clouds insecure and unstable. Now the opposite trend prevails: people tend to think that public clouds are extremely reliable, keep all data in several copies and provide up to 100% uptime. Sometimes that is true, but in general the truth lies in between.

Public clouds do have issues from time to time: their regions or specific services may go down, affecting their customers, including you. You can check public cloud health on the providers’ status pages. Let’s say you run your VMs in the AWS us-west region. If it has connectivity issues, your apps and VMs are in trouble. You may say this happens too rarely to worry about. Well, I don’t remember a month without some cloud service issue at one of the Big Three. If up to 6 hours of outage is acceptable for you, you can stop here; public clouds do a good job of restoring their services within that timeframe.

If not, you need to calculate two things:

  1. How much an hour of downtime costs you.
  2. How long it will take to restore your entire infrastructure from a backup.

This will not only help protect you from a cloud outage but also prepare you for ransomware, human error (the cause of about 70% of all outages) or any hardware failure.

If multiplying item 1 by item 2 gives you an unacceptable number, you need a DR solution.
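That decision rule is just a multiplication and a comparison. A minimal sketch, with a hypothetical $10k/hour business and a 12-hour restore from backup (the names and figures are illustrative):

```python
def downtime_exposure(cost_per_hour: float, restore_hours: float) -> float:
    """Expected loss from one incident if backups are your only protection."""
    return cost_per_hour * restore_hours

def dr_is_justified(cost_per_hour: float, restore_hours: float,
                    acceptable_loss: float) -> bool:
    """True when the potential loss exceeds what the business can tolerate."""
    return downtime_exposure(cost_per_hour, restore_hours) > acceptable_loss

print(downtime_exposure(10_000, 12))        # 120000
print(dr_is_justified(10_000, 12, 50_000))  # True
```

In this hypothetical case, $120k of exposure against a $50k tolerance is the signal to invest in DR rather than rely on backups alone.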


There are multiple DR solutions available on the market. I suggest keeping the following criteria in mind:

  1. Use a different public cloud for failover, if possible. This protects you from a provider-wide error and gives you true workload mobility: you are not bound to any specific cloud and can use the best of each.
  2. Run regular DR tests. It’s a pity to see companies without a DR strategy, and even more disappointing to see people paying for something that doesn’t work. Run a test at least once a month.
  3. Find a balance between native cloud services and running applications on your own. Cloud services are convenient and easy to use, but there is no simple way to fail them over.
  4. Benchmark multiple DR products: some backup companies are aware of the backup/DR confusion and claim they can run a full infrastructure failover. Test and see whether you are satisfied.
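The monthly test cadence from criterion 2 is easy to enforce mechanically. A minimal sketch (the function name and the 31-day threshold are my own assumptions):

```python
from datetime import date

def dr_test_overdue(last_test: date, today: date, max_days: int = 31) -> bool:
    """Flag a DR plan whose last successful test is older than the monthly cadence."""
    return (today - last_test).days > max_days

print(dr_test_overdue(date(2023, 1, 10), date(2023, 3, 1)))  # True
print(dr_test_overdue(date(2023, 2, 20), date(2023, 3, 1)))  # False
```

A check like this belongs in a scheduled job that pages the team, not in a wiki page nobody reads: an untested DR plan is exactly the “paying for something that doesn’t work” scenario above.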


Public clouds are ideal as a failover site: you don’t need to build and support a separate DR site with hardware and software licenses, especially since it would sit idle 80% of the time waiting for a failover. Meanwhile, a public cloud can simply store your snapshots, so you don’t pay for compute until you actually run a failover.

Keep in mind that a backup solution, at the very least, is a must-have nowadays. Consider how critical it is for your business to be down until you recover from a backup, and think about a DR solution; public clouds are the best option for a failover site. Remember, there are two types of companies: those that do not back up yet and those that already do.

Feel free to read my recent article ‘Top three public cloud services used’.
