Public clouds tend to be resilient, so should I build a DR strategy there or simply rely on backup? And if I do need DR, what is the best strategy for me? Let’s discuss.
First, let’s briefly cover the difference between Disaster Recovery and Backup, because many people see no difference or misuse the terms. Backup means copying some state of your data to storage (tape, NAS, cloud storage, etc.) and being able to restore a missing item or items from the restore points, according to retention settings and policies. It protects against data loss, ransomware, or system failure. The best backup solution stores data efficiently and provides various recovery options, such as granular file/folder recovery or restore to a database. Recovery is usually expected to run in the same or a similar environment.
Disaster Recovery, by contrast, is about fast replication and recovery of applications and the entire infrastructure. Storage consumption matters, but the main focus is on low RPO (Recovery Point Objective: the maximum acceptable data loss measured in time, which in practice is bounded by the time between replications) and low RTO (Recovery Time Objective: the time to restore the entire system after a disaster). Data is stored in a ready-to-use format; granular file/folder recovery is an advantage but not a core requirement. An ideal DR solution should be able to restore to another cloud or region and provide smooth failback (returning the applications to the original site once the disaster is resolved).
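To make the RPO/RTO distinction concrete, here is a minimal sketch that checks whether a protection plan meets a pair of targets. The `ProtectionPlan` class, the plan names, and all the numbers are hypothetical and chosen only for illustration; real replication intervals and restore times have to be measured in your own environment.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ProtectionPlan:
    """Hypothetical description of how a workload is protected."""
    name: str
    replication_interval: timedelta    # time between restore points
    estimated_restore_time: timedelta  # time to bring the whole system back


def meets_objectives(plan: ProtectionPlan, rpo: timedelta, rto: timedelta) -> bool:
    """Return True if the plan satisfies both targets.

    Worst-case data loss is roughly one replication interval, so it must
    not exceed the RPO; the restore time must not exceed the RTO.
    """
    return (plan.replication_interval <= rpo
            and plan.estimated_restore_time <= rto)


# Illustrative numbers only (assumptions, not measurements).
nightly_backup = ProtectionPlan("nightly backup", timedelta(hours=24), timedelta(hours=8))
dr_replication = ProtectionPlan("DR replication", timedelta(minutes=15), timedelta(minutes=30))

rpo_target = timedelta(hours=1)
rto_target = timedelta(hours=2)

for plan in (nightly_backup, dr_replication):
    verdict = "meets" if meets_objectives(plan, rpo_target, rto_target) else "misses"
    print(f"{plan.name}: {verdict} the targets")
```

With these assumed figures, the nightly backup fails both targets while the replication-based plan passes, which is the gap between backup and DR described above.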
Now that we know the difference, which technology should you use? At a minimum, you need backup. It’s better than nothing and will help you recover after a failure or a disaster. You can use the standard public cloud capabilities to take volume snapshots, or use a vendor product for that. Just make sure you understand how long recovery takes, where the data is stored, how much the backup costs (not only licenses, if it isn’t free, but also storage and data transfer), and how to restore data or VMs.
Backup is excellent when you need to restore a piece of your information, but it doesn’t help much when a whole data center or availability zone is down.
A few years ago, companies considered public clouds insecure and unstable. Now the trend is the opposite: people tend to think that public clouds are highly reliable, store all data in several copies, and provide up to 100% uptime. Sometimes that is the case, but the truth lies somewhere in between.
Public clouds do have issues from time to time: their regions or specific services may go down, which affects their customers, including you. You can check the public status pages of AWS, Azure, or GCP. For example, let’s say you run your VMs in AWS in the US-West region. If that region has connectivity issues, your apps and VMs are in trouble. You may say it happens too seldom to worry about. Yet I don’t remember a month without cloud service issues at one or another of the Big Three. If an outage of up to 6 hours is acceptable for you, you can skip DR; public clouds do an excellent job of restoring their services within this timeframe.
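The cost components mentioned above (licenses, storage, and data transfer) can be added up in a rough estimate like the one below. This is only a sketch: the function name, the per-GB prices, and the volumes are made-up placeholders, not any provider’s actual pricing.

```python
def monthly_backup_cost(
    stored_gb: float,
    storage_price_per_gb: float,
    restored_gb: float,
    transfer_price_per_gb: float,
    license_fee: float = 0.0,
) -> float:
    """Sum license, storage, and data-transfer costs for one month."""
    inputs = (stored_gb, storage_price_per_gb, restored_gb,
              transfer_price_per_gb, license_fee)
    if min(inputs) < 0:
        raise ValueError("all inputs must be non-negative")
    storage = stored_gb * storage_price_per_gb
    transfer = restored_gb * transfer_price_per_gb
    return license_fee + storage + transfer


# Placeholder figures (assumptions): 2 TB stored, 100 GB restored per month.
cost = monthly_backup_cost(
    stored_gb=2000,
    storage_price_per_gb=0.02,
    restored_gb=100,
    transfer_price_per_gb=0.09,
    license_fee=50.0,
)
print(f"Estimated monthly backup cost: ${cost:.2f}")
```

Replace the placeholders with the prices from your provider’s current price list; the point is only that storage and transfer belong in the estimate next to the license fee.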
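To put the six-hour figure in perspective, an outage length can be converted into an availability percentage for a given period. The helper below is a small illustrative sketch; the function name and the choice of periods (a 30-day month and a 365-day year) are assumptions made for the example.

```python
def availability_percent(outage_hours: float, period_hours: float) -> float:
    """Share of the period during which the service was up, in percent."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= outage_hours <= period_hours:
        raise ValueError("outage_hours must be between 0 and period_hours")
    return 100.0 * (1 - outage_hours / period_hours)


HOURS_PER_MONTH = 30 * 24   # 30-day month
HOURS_PER_YEAR = 365 * 24   # non-leap year

monthly = availability_percent(6, HOURS_PER_MONTH)
yearly = availability_percent(6, HOURS_PER_YEAR)
print(f"6 h outage in a month: {monthly:.3f}% availability")
print(f"6 h outage in a year:  {yearly:.3f}% availability")
```

Comparing the result with the availability your business actually needs is a quick way to judge whether a six-hour outage really is acceptable.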
If not, you need to calculate two things:
This will help protect against a cloud outage and prepare for ransomware, human error (about 70% of all outages happen because of this), or any hardware failure.
If multiplying p.1 and p.2 gives you some unacceptable numbers, you need a DR solution.
There are multiple DR solutions available on the market. I suggest you keep in mind the following criteria:
Public clouds are ideal as a failover site: you don’t need to build and support a separate DR site with its own hardware and software licenses, considering that it would sit idle 80% of the time waiting for a failover. In addition, public clouds can be used to store snapshots, so you don’t pay for compute until you run a failover.
Keep in mind that a backup solution, at the very least, is a must-have nowadays. Consider how critical it is for your business to be down while you recover from a backup, and think about a DR solution. Public clouds are the best option for a failover site. Remember that there are two types of companies: a) those that have not backed up yet, and b) those that have already backed up.
Please feel free to read my recent article ‘Top three public cloud services used’.