
Understanding and leveraging disaster recovery and backup in public clouds

Should I implement a disaster recovery (DR) plan in public clouds, given their inherent resiliency, or would employing backups suffice? If I need a DR strategy, how can I determine the best one for my needs? Let’s delve into these questions…

What is the difference between disaster recovery and backup?

To begin with, it’s crucial to clarify the distinction between disaster recovery and backup, since the two terms are often incorrectly used interchangeably. Backup means copying your data to a storage system (tape, NAS, cloud storage, etc.) so that it can be restored after data loss, corruption, a ransomware attack, or a system failure, all in accordance with specified retention settings and policies. The most effective backup solutions store data efficiently and offer diverse recovery options, such as granular file/folder recovery or restoration to a database. Recovery operations are usually expected to run in the same or a similar environment.

Disaster recovery, on the other hand, focuses on the rapid replication and restoration of applications and the entire infrastructure. Although storage efficiency matters, the priority is minimizing RPO (Recovery Point Objective: the maximum amount of data, measured in time, that you can afford to lose, which in practice is bounded by the interval between replications) and RTO (Recovery Time Objective: the time needed to restore the entire system after a disaster). Data is stored in a ready-to-launch format; granular file/folder recovery is beneficial but not a focal point. The best DR solution should allow restoration in a different cloud or region and provide seamless failback (returning applications to the original site once the disaster has been resolved).
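To make the RPO/RTO distinction concrete, here is a minimal Python sketch that checks whether a given protection setup meets a pair of recovery targets. The class and function names, as well as all the durations, are hypothetical and chosen only for illustration; plug in your own targets and measured restore times.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTargets:
    """Business targets (illustrative values are supplied by the caller)."""
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time to restore service


def meets_targets(targets, replication_interval, measured_restore_time):
    """Return (rpo_ok, rto_ok) for a given protection setup.

    Worst case, a disaster hits just before the next replication runs,
    so up to one full replication interval of data can be lost.
    """
    rpo_ok = replication_interval <= targets.rpo
    rto_ok = measured_restore_time <= targets.rto
    return rpo_ok, rto_ok


# Hypothetical targets: lose at most 15 minutes of data, be back within 1 hour.
targets = RecoveryTargets(rpo=timedelta(minutes=15), rto=timedelta(hours=1))

# A nightly backup restored by hand vs. frequent replication
# with ready-to-launch copies (both sets of numbers are assumptions).
nightly_backup = meets_targets(targets, timedelta(hours=24), timedelta(hours=8))
dr_replication = meets_targets(targets, timedelta(minutes=5), timedelta(minutes=30))

print("nightly backup meets (RPO, RTO):", nightly_backup)
print("DR replication meets (RPO, RTO):", dr_replication)
```

With these assumed numbers the nightly backup misses both targets while the replicated setup meets both, which is the gap a DR solution is meant to close.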


Having clarified these distinctions, what technology should you opt for? At the bare minimum, it’s safe to say that incorporating backup functionality is necessary. This provides a safety net for data restoration after failures or disasters. You can rely on native public cloud features for volume snapshots or utilize third-party vendors. However, it’s vital to be clear about the recovery duration, the data storage location, backup costs (not just license costs, if any, but also those associated with storage and data transfer), and how to restore data or Virtual Machines (VMs).
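As a rough illustration of why backup costs go beyond the license fee, the following Python sketch adds up the three components mentioned above. The function name and every price in the example are assumptions made up for illustration, not actual rates of any cloud provider or vendor.

```python
def monthly_backup_cost(stored_gb, storage_price_per_gb,
                        transferred_gb, transfer_price_per_gb,
                        license_fee=0.0):
    """Estimate the monthly backup bill: storage + data transfer + license."""
    storage_cost = stored_gb * storage_price_per_gb
    transfer_cost = transferred_gb * transfer_price_per_gb
    return storage_cost + transfer_cost + license_fee


# Hypothetical inputs: 2 TB kept in cloud storage, 100 GB moved out per month,
# and a flat license fee. Replace with your provider's real pricing.
estimate = monthly_backup_cost(
    stored_gb=2000, storage_price_per_gb=0.02,
    transferred_gb=100, transfer_price_per_gb=0.09,
    license_fee=50.0,
)
print(f"Estimated monthly backup cost: ${estimate:.2f}")
```

Even in this toy example, storage and transfer together are of the same order as the license fee, so leaving them out would noticeably understate the bill.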

While backup is a convenient solution when you need to restore a specific piece of information, it is of limited help when your entire data center or availability zone goes down.

Years ago, organizations viewed public clouds skeptically, perceiving them as unstable and insecure. Today the sentiment has swung the other way: many regard public clouds as extremely robust, believing they maintain multiple data copies and guarantee near 100% uptime. While this holds true in some instances, the reality lies somewhere in between.

Public clouds do occasionally face issues: a region or a specific service goes down and affects customers, you included. You can monitor the status of public cloud services via the providers’ status pages (AWS, MS Azure, Google Cloud). For instance, if you’re running your Virtual Machines (VMs) in an AWS us-west region and it experiences connectivity issues, your applications and VMs may suffer. It’s easy to dismiss such occurrences as infrequent and therefore not a serious concern. However, I can’t recall a single month without a cloud service issue affecting one of the Big Three. If an outage of up to 6 hours is tolerable for you, then public clouds are generally reliable enough, as they typically restore their services within this time.

Otherwise, you’ll need to make two crucial calculations:

  1. The cost to your business per hour of downtime.
  2. The time it would take to restore your entire infrastructure from a backup.

These considerations not only equip you to deal with a cloud outage but also help you prepare for events like ransomware attacks, human error (responsible for about 70% of all outages), or hardware failure.

If the product of points 1 and 2 results in a financially unacceptable figure, it becomes clear that you require a Disaster Recovery (DR) solution.
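The calculation above can be sketched in a few lines of Python. The function names, the hourly cost, the restore time, and the acceptable-loss threshold are all hypothetical; the only thing taken from the text is the rule itself: multiply the hourly cost of downtime by the restore time and compare the result with what the business can tolerate.

```python
def downtime_loss(cost_per_hour, restore_hours):
    """Point 1 multiplied by point 2: money lost while restoring from backup."""
    return cost_per_hour * restore_hours


def needs_dr(cost_per_hour, restore_hours, acceptable_loss):
    """True when restoring from backup alone would cost more than is tolerable."""
    return downtime_loss(cost_per_hour, restore_hours) > acceptable_loss


# Hypothetical business: $5,000 lost per hour of downtime,
# 12 hours to rebuild everything from backup, $20,000 tolerable loss.
loss = downtime_loss(cost_per_hour=5000, restore_hours=12)
decision = needs_dr(cost_per_hour=5000, restore_hours=12, acceptable_loss=20000)

print(f"Estimated loss while restoring from backup: ${loss:,}")
print("DR solution recommended:", decision)
```

In this assumed scenario the estimated loss exceeds the tolerable figure, so backup alone would not be enough and a DR solution is worth evaluating.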

What to keep in mind when choosing a disaster recovery strategy

Various Disaster Recovery (DR) solutions exist in the market today. Here are some important criteria to consider when choosing:

  1. If possible, use a different public cloud for failover. This will shield you from the impact of a widespread error and allow true workload mobility. In other words, you aren’t tied to a specific cloud and can utilize the best features from all of them.
  2. Conduct regular DR tests. It’s regrettable when companies neglect their DR strategies, and even worse when they’re paying for a service that doesn’t deliver as promised. Aim for a monthly test, at the very least.
  3. Strike a balance between using native cloud services and operating your own applications. While cloud services are user-friendly and convenient, they don’t readily facilitate failover.
  4. Evaluate several DR software options – some backup companies, cognizant of the backup/DR confusion, claim to offer comprehensive infrastructure failover capabilities. Put them to the test to ensure satisfaction.

Public clouds make excellent failover sites: they eliminate the need to set up a separate DR site complete with hardware, software licenses, and support – all of which may remain unused 80% of the time, simply waiting for a failover event. Alternatively, public clouds can be used to store snapshots, incurring compute costs only during a failover.

At the very least, having a backup solution is non-negotiable today. Consider how critical it would be for your business if operations stayed disrupted until recovery from a backup completes, and if that is unacceptable, consider a DR solution. Public clouds work well as failover sites. Keep in mind that companies generally fall into one of two categories: a) those that have yet to implement backups, and b) those that already have backup systems in place.
