How to avoid double bubble during cloud migration

September 3, 2020

I’ve recently read a nice article by Intuit engineering team where they mentioned an interesting topic — double bubble. In terms of clouds it’s a state during cloud migration or digital transformation when you pay for both clouds, source and target sites. Let’s discuss how common it is.

A standard cloud migration project consists of a defined set of steps:

1. IT justification and proving a business need. The justification can be to avoid vendor lock-in, datacenter lease hardware or software license renewal cycle, scaling velocity, TCO or public cloud discount etc. Usually, when there is a clear justification, it’s not an issue to prove the business need.

2. Project scope definition. At this step a responsible person (project manager) and specific applications / resources are defined.

3. Cloud migration phase.

4. Post-mortem.

We’ll be talking only about p. 2 and 3 for now. It’s normal during the migration process there are times when you have resources running on both, source and target cloud. Some of the resources may be already migrated, some still can be in a queue or even not defined for migration and still running on a source cloud. But there is an interesting case when you migrate some resources (in some cases hundreds or thousands of machines) and need to pay for them twice.

Achieving effective cost management is all about optimization and tracking. With a set of policies, principles and processes in place, businesses can appease stakeholders and ensure their cloud spends remain under control. If your cloud spend bill comes as a surprise each time you receive it, you’re simply not taking advantage of all of the cost governance tools out there.

When companies define what to move to a cloud, they usually think in categories of applications, departments, or entire resources. Experienced migration consultants or vendors will always advise splitting resources into chunks of 30–50 VMs and migrating by phases. It reduces risk from one side; from the other, it helps blow off the bubble. Ideally, a single application should be in one chunk. In that case, you can migrate and test the whole granular part of your system and avoid data locality issues. Remember that cross-region and outbound cloud traffic is not for free and is pretty expensive. It’s better to think about that before you receive your first cloud bill 🙂

The root cause of the bubble is in ‘replication -> testing’ -> cutover scope. When you migrate some chunk, it takes some time to replicate data, define how you’ll grab increments (better in an automated way), test the chunk on a target cloud, and schedule a maintenance window to execute the cutover. And those 3 phases form the bubble. You store the data in block devices or object storage, run VMs on a target cloud, and pay for compute. In most cases, test migrations can run for 1–3 weeks (can be even more) until a team that owns the migrated application validates that all is fine with it on a target cloud and there are no performance degradation or other issues. And if you migrate multiple chunks in parallel, the bubble will grow.

So how to avoid the bubble…

1. First of all, identify your migration pace. Be very frank with yourself. This is exactly how to learn a new skill—very slow in the beginning and much better after a few iterations.

2. Define a queue of applications/chunks. Put it into your migration project calendar.

3. Figure out a way to replicate VMs without downtime and without constantly re-replicating them. Dozens of tools do that, saving you time and money as you pay less for storage.

4. Communicate with the teams owning the chunks or apps. Define acceptance criteria and the cutover process with them. The earlier they start thinking about that, the more prepared they will be when the time comes. Define time slots when they need to test the migration. This is the most important step as testing blows the bubble, and usually, teams don’t have any idea how to test applications, what the components are, or who owns individual machines.

5. Define the waiting period—how long you wait until you remove migrated VMs from the source environment. Don’t forget that you need a backup plan if something goes wrong with VMs and apps on a target cloud, and you still pay (directly or indirectly) for the machines on a source side.

6. Shut down the tests as soon as you see that the team is not prepared or it’s not their proper priority. If they are not motivated, they will just waste time (equal to money in public clouds) or, even worse, make a decision (accept or reject) based on some odd criteria and either proceed to use the machines on a source cloud or will figure out that were issues when source VMs would have been already removed. Reiterate with their manager or upper management to adjust both teams’ priorities.

7. If tests pass, proceed with the cutover. Remove all snapshots and test migrations for the migrated applications. Remember to start the clock for their waiting time.

8. Revisit your pace estimates and adjust your schedule.

You’ll have some sort of double bubble in any migration project, but you can control how big it will be by proper planning and communication with application and VM owners. Only brave people migrate the entire infrastructure in a single run; smart people plan and do it in chunks and phases.

Enter your email to be notified about new and relevant content.

Thank you for joining us!

We hope you'll find it usefull.

You can unsubscribe from these communications at any time. Privacy Policy

News & Reports

New release

OptScale AI – AI Governance Platform

Hystax has announced the release of OptScale AI, extending its OptScale FinOps platform with capabilities designed to help organizations manage and optimize AI usage across teams, models, and AI agents.

How-tos

FinOps, cloud cost optimization and security

Discover our best practices:

How to release Elastic IPs on Amazon EC2
Detect incorrectly stopped MS Azure VMs
Reduce your AWS bill by eliminating orphaned and unused disk snapshots
And much more deep insights

OptScale

Optimize RI/SP usage for ML/AI teams with OptScale

Find out how to:

see RI/SP coverage
get recommendations for optimal RI/SP usage
enhance RI/SP utilization by ML/AI teams with OptScale