Backup and Disaster Recovery in Public Clouds

Should I build a DR strategy in public clouds, given that they tend to be resilient, or is a backup enough? And if I do need DR, what is the best strategy for me? Let’s discuss…

First, let’s briefly clarify the difference between Disaster Recovery and Backup, as many people see no difference or even use the terms interchangeably. Backup means replicating some state of your data to storage (tape, NAS, cloud storage, etc.) so that you can later restore a missing item or items from the resulting restore points. Governed by retention settings and policies, it protects you from data loss, ransomware, or system failure. The best backup solutions store data efficiently and provide various recovery options, such as granular file/folder recovery or restoring directly into a database. Recovery is usually expected to run in the same or a similar environment.

Disaster Recovery, in contrast, is about fast replication and recovery of applications and the entire infrastructure. Storage consumption still matters, but the main focus is on a low RPO (Recovery Point Objective, the time between replications, which defines how much data you can afford to lose) and a low RTO (Recovery Time Objective, the time to restore the entire system after a disaster). Data is stored in a ready-to-use format; granular file/folder recovery is an advantage but not a core requirement. An ideal DR solution can restore to another cloud or region and offers smooth failback (returning applications once the disaster is resolved).
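To make the two objectives concrete, here is a minimal numeric sketch. The replication interval and restore time below are hypothetical examples, not recommendations:

```python
# Hypothetical plan: replicate every 15 minutes, full restore takes 2 hours.
replication_interval_min = 15   # worst-case data-loss window -> drives RPO
full_restore_hours = 2          # time to bring the whole system back -> drives RTO

rpo_minutes = replication_interval_min   # you can lose at most this much recent data
rto_minutes = full_restore_hours * 60    # users are down for roughly this long

print(f"RPO: up to {rpo_minutes} minutes of data lost")
print(f"RTO: roughly {rto_minutes} minutes of downtime")
```

Tightening either number usually costs money: a shorter RPO means more frequent replication traffic, and a shorter RTO means keeping data closer to a ready-to-run state.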

Now that we know the difference, what technology should be used? I would say that, at a minimum, you need backup functionality. It’s better than nothing and will help you restore after a failure or a disaster. You can use standard public cloud capabilities to take volume snapshots or use a backup vendor for that. Just be sure you understand how long recovery takes, where the data is stored, how much you pay for the backup (not just license fees, if it isn’t free, but also storage and data transfer), and how to restore data or VMs.
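The cost question above can be sketched with simple arithmetic. All per-GB prices in this example are made-up placeholders, not quotes from any cloud’s price list; substitute your provider’s actual rates:

```python
# All prices below are assumed placeholders for illustration only.
snapshot_size_gb = 500            # total size of stored snapshots
storage_price_gb_month = 0.05     # assumed $/GB-month for snapshot storage
restore_transfer_gb = 500         # data pulled out during a full restore
egress_price_gb = 0.09            # assumed $/GB egress charge on restore

monthly_storage_cost = snapshot_size_gb * storage_price_gb_month
one_restore_cost = restore_transfer_gb * egress_price_gb

print(f"Snapshot storage: ${monthly_storage_cost:.2f}/month")
print(f"One full restore: ${one_restore_cost:.2f} in data transfer")
```

Note that the restore-time transfer cost is easy to forget when comparing vendors, since it only shows up when you actually recover.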

Backup is excellent when you need to restore a piece of your information, but it doesn’t help much when a whole data center or availability zone is down.

A few years ago, companies considered public clouds insecure and unstable. Now the trend is the opposite: people tend to think that public clouds are highly reliable, keep all data in multiple copies, and provide close to 100% uptime. Sometimes that is true, but the reality is somewhere in between.

Public clouds do have issues from time to time: their regions or specific services may go down, affecting you as their customer. You can see the public cloud status here, here, or here. For example, say you run your VMs in AWS in the us-west region. If that region has connectivity issues, your apps and VMs are in trouble. You may say it happens too rarely to worry about, yet I don’t remember a month without a cloud service issue at one of the Big Three. If an outage of up to 6 hours is acceptable for you, you can stop reading here; public clouds do an excellent job of restoring their services within that timeframe.

If not, you need to calculate two things:

  1. How much does an hour of downtime cost you?
  2. How long will it take to restore your entire infrastructure from a backup?

This will not only protect you from a cloud outage but also prepare you for ransomware, human error (about 70% of all outages are caused by it), or any hardware failure.

If multiplying the answer to point 1 by the answer to point 2 gives you an unacceptable number, you need a DR solution.
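The decision rule above can be written out in a few lines. The hourly cost, restore time, and pain threshold below are hypothetical placeholders, not benchmarks:

```python
# Assumed example figures; plug in your own estimates.
downtime_cost_per_hour = 10_000.0   # point 1: $/hour of downtime
restore_time_hours = 8.0            # point 2: hours to rebuild from backup
acceptable_loss = 20_000.0          # assumed pain threshold in dollars

expected_outage_cost = downtime_cost_per_hour * restore_time_hours
need_dr = expected_outage_cost > acceptable_loss

print(f"Expected cost of one restore-from-backup event: ${expected_outage_cost:,.0f}")
print("DR solution recommended" if need_dr else "Backup alone may be enough")
```

With these example numbers, a single outage costs four times the acceptable loss, which is exactly the situation where backup alone stops being a sufficient answer.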


There are multiple DR solutions available on the market. I suggest keeping the following criteria in mind:

  1. Use a different public cloud for failover, if possible. This keeps a cloud-wide incident from affecting both sites and gives you true workload mobility: you are not bound to any specific cloud and can use the best from each of them.
  2. Run regular DR tests. It’s a pity to see companies not leveraging a DR strategy, and even more disappointing to see people paying for something that doesn’t work. Run a test at least once a month.
  3. Find a balance between native cloud services and running applications on your own. Cloud services are convenient and easy to use, but there is no simple way to fail them over.
  4. Benchmark multiple DR products. Some backup vendors are aware of the backup/DR confusion and pretend they can run a full infrastructure failover. Test and see whether you are satisfied.


Public clouds are ideal as a failover site: you don’t need to build and support a separate DR site with its own hardware and software licenses, especially considering that it would sit idle about 80% of the time, waiting for a failover. In the meantime, public clouds can store your snapshots, so you don’t pay for compute until you actually run a failover.

Keep in mind that a backup solution, at the very least, is a must-have nowadays. Consider how critical it is for your business to stay down until you recover from a backup, and if that downtime is unacceptable, think about a DR solution; public clouds are the best option for a failover site. Remember that there are two types of companies: a) those that do not back up yet and b) those that already do.

Please feel free to read my recent article ‘Top three public cloud services used’ here.
