Nick: Hi everybody, today we’ll be discussing “How to significantly reduce costs and optimize cloud spend”.
Nick: I’m Nick Smirnov, CEO at Hystax, and with me is Max Bozhenko, CTO and Engineering Manager, who has been running engineering teams for more than 10 years.
Max: Hi Nick, hi everybody. Today we are going to talk about cloud consumption: how to bring your cloud bill under control and how to reduce it.
Nick: Thank you, Max. But before we start, a few words about Hystax:
1. Hystax is a global leader in cloud migration and in cross-cloud and hybrid cloud disaster recovery
2. We have customers on 5 continents and employees in 4 countries
3. We are now building a cloud platform for cost governance and collaborative R&D
Nick: 1. OK, Max, can you please tell us what a cloud bill usually consists of?
Max: Yes, sure. When you receive a cloud bill and drill into the details, you see a list of the cloud services, such as compute, storage or networking, that you used during the billing period.
You might also see line items for third-party services, such as software bought from the cloud marketplace. Some bill items are fairly straightforward: when you pay for compute resources and storage, you can more or less predict the range of your spend from instance flavors and storage size. Others, such as data transfer charges, need more investigation before they become transparent.
Long story short, it varies from company to company and depends on use cases: for some, compute and storage are the main cost drivers, while others make extensive use of Lambda functions, voice recognition or IoT services. It might sound obvious, but you need to take a few of your recent invoices, pick your top three services, and work on those to find the best way to optimize.
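Picking the top three services is a simple aggregation. The sketch below assumes the invoice has already been flattened into hypothetical (service, cost) pairs; real billing exports have many more columns and a different layout on every provider, so treat the input format as an assumption.

```python
from collections import defaultdict

# Hypothetical, simplified invoice line item: (service name, cost in USD).
LineItem = tuple[str, float]


def top_services(line_items: list[LineItem], n: int = 3) -> list[tuple[str, float]]:
    """Sum costs per service and return the n most expensive services."""
    totals: dict[str, float] = defaultdict(float)
    for service, cost in line_items:
        totals[service] += cost
    # Highest total first, then keep only the first n entries.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up numbers, combining line items from a couple of recent invoices.
    invoice = [
        ("compute", 1200.0),
        ("storage", 300.0),
        ("data-transfer", 450.0),
        ("compute", 800.0),
        ("marketplace", 100.0),
    ]
    for service, total in top_services(invoice):
        print(f"{service}: ${total:,.2f}")
```

Summing over several invoices rather than one smooths out one-off spikes before you decide where to focus.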
There are dozens of different cloud services, so there is no secret sauce for doing this manually. In general, though, compute and storage resources are usually the top two lines in a cloud bill.
Nick: 2. OK, so it’s not just about compute resources; there are other costs that may not be so obvious. Is there a typical breakdown of services in a cloud bill?
Max: It depends on the use cases. Enterprises still rely heavily on compute, storage and networking. For compute, it’s primarily about VMs, but containers are growing very fast. Enterprises also use cloud-native services, but one interesting thing is worth mentioning: they prefer to run big data, AI, IoT and other use cases on third-party solutions rather than on cloud-native services. I think the explanation is vendor lock-in and the internal policies designed to avoid it.
For storage, they use both block and object storage. And since most enterprises follow multicloud or hybrid models, they generate intensive traffic between sites.
Startups are born with new methodologies. They are free of legacy apps and code, so instead of legacy resources they make extensive use of containers and cloud-native services. We recently conducted a survey, and most of our respondents said that on AWS their top three services are RDS, Lambda and Fargate. Even at Hystax, all applications and services are containerized, and we use EC2 only for Kubernetes nodes.
Nick: 3. Great. So it mostly depends on the type of workloads and use cases. OK, in your opinion, what approaches significantly decrease cloud expenses?
Max: The first step in reducing expenses is to understand where they come from, how they relate to your needs, and which workloads don’t need to run all the time or are being used inefficiently. In other words, list your top three expenses and focus on them.
Work with your team to review your resources and check whether all of them are still needed. Create clean-up practices and run them once a week or once per sprint. Only an established process delivers stable results; if clean-up is set aside and neglected, the same issues will reappear within three to four weeks.
Cloud optimization tools can also help you break down costs and build a practice of cloud governance and control within your team. These tools pull information about your cloud usage from different cloud reports, let you analyse it more effectively, and advise on how to spend more efficiently. Of course, these tools also cost money; the fee is usually a percentage of your annual cloud bill.
Nick: 4. So, just to summarize: you suggest using cloud optimization tools and strict cloud cost governance and control, which we’ll talk about later. But there are some actions IT managers can take on their own to reduce costs. Imagine for a second that I am an IT manager with a cloud budget. What should I do to optimize compute resources?
Max: There are two basic things that you can do to get an immediate effect on your compute expenses.
First, VMs are often underutilized: their flavor is too large for their actual consumption. For example, an instance with 8 vCPUs and 16 GB of RAM may use only ten to fifteen percent of its compute capacity. You can save several hundred dollars just by reflavoring it to a more appropriate size, and that is for a single instance; I’m sure you have more than one like it. Second, if you can forecast your compute requirements for the coming months or years, you can save up to 60% of VM costs by switching to reserved instances or savings plans. Most public clouds offer significant discounts for paying upfront.
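Both levers come down to simple arithmetic once you have utilization data. The sketch below assumes you have already exported average CPU utilization per instance from your monitoring system; the 20% threshold, the 730 hours per month approximation and all prices are placeholders, not real quotes. Before reflavoring, also check memory usage and peak load, not just the average CPU.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation: 8,760 hours per year / 12


@dataclass
class Instance:
    name: str
    vcpus: int
    avg_cpu_percent: float  # average CPU utilization over an observation window
    hourly_price: float     # on-demand price in USD (placeholder, not a real quote)


def rightsizing_candidates(instances, cpu_threshold=20.0):
    """Return instances whose average CPU utilization is below cpu_threshold percent."""
    return [i for i in instances if i.avg_cpu_percent < cpu_threshold]


def monthly_savings(current_hourly, target_hourly):
    """Monthly savings from moving to a cheaper flavor or a discounted rate."""
    return (current_hourly - target_hourly) * HOURS_PER_MONTH


def committed_hourly(on_demand_hourly, discount):
    """Effective hourly price after a commitment discount (0.6 means 60% off)."""
    return on_demand_hourly * (1 - discount)


if __name__ == "__main__":
    fleet = [Instance("app-1", 8, 12.0, 0.40), Instance("db-1", 8, 70.0, 0.40)]
    for inst in rightsizing_candidates(fleet):
        # Assumption: a smaller flavor at a placeholder $0.10/hour fits the load.
        saved = monthly_savings(inst.hourly_price, 0.10)
        print(f"{inst.name}: ~${saved:.0f}/month by reflavoring")
    # 60% is the upper bound mentioned in the discussion, not a guaranteed rate.
    print(f"committed rate: ${committed_hourly(0.40, 0.60):.2f}/hour")
```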
You also need to know who is using your compute resources and for what purpose. This is usually done through resource tagging: you define a cloud resource tagging policy that your team must follow. The tags are then used for cost breakdowns, per-project usage reports and clean-up activities.
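A tagging policy is only useful if it can be checked automatically. The sketch below validates one resource’s tags against a policy; the required keys (owner, project, environment) and the allowed environment values are hypothetical examples of what such a policy might contain, and how you fetch the tags from your cloud is left out.

```python
# Hypothetical tagging policy: every resource must carry these keys.
REQUIRED_TAGS = {"owner", "project", "environment"}
# Hypothetical allowed values, to catch variants such as "Production" or "prd".
ALLOWED_ENVIRONMENTS = {"dev", "test", "staging", "prod"}


def policy_violations(tags: dict[str, str]) -> list[str]:
    """Return the problems with a resource's tags; an empty list means compliant."""
    problems = [f"missing tag '{key}'" for key in sorted(REQUIRED_TAGS - set(tags))]
    env = tags.get("environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unexpected environment '{env}'")
    return problems


if __name__ == "__main__":
    print(policy_violations({"owner": "alice", "environment": "Production"}))
```

Restricting values as well as keys matters for reporting: “prod” and “Production” would otherwise show up as two separate lines in a cost breakdown.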
Interestingly, the simple task of tagging resources can already save you money: people start taking inventory and realize that some resources are no longer being used. Some may even have been left behind by engineers who are no longer with the company.
Nick: 5. Great, so it’s reflavoring, using reserved instances, which can save up to 60% of VM costs, and tagging, tagging, tagging. But if we do this once, how do we make sure we won’t have the same problems a month later?
Max: There is only one answer to this question: do it regularly, or assign it to someone on your team. People tend to forget about things that are outside their scope, so if you don’t remind them, they will focus on their current tasks. At the end of the day, saving money is not their responsibility, it’s yours.
And of course, this shouldn’t stay a manual activity: software tools do it far more efficiently.
Nick: 6. I see, so it’s tricky and time-consuming to monitor without automation. OK, let’s say we have all the tags in place. How can this save us some cash?
Max: Once tagging policies are set and your team follows them, you can create clean-up scripts that periodically list and remove all untagged resources in your account. There is also a useful psychological effect: when engineers see their own resources, they start asking whether they still need them. Tagging brings order and mindfulness to their provisioning habits.
We had the same issues at Hystax before we started tagging all of our resources. We used AWS for our R&D resources and had several incidents where somebody forgot to remove a cluster or shut down a VM that was no longer needed. As a result, we got a bill four to five times larger than we expected. That was one of the reasons that motivated us to explore this topic further and build a product around it.
Nick: 7. Isn’t it too harsh to remove all untagged resources? What if somebody just forgot to add a tag?
Max: It might look too harsh and may cause some hiccups at first, but look at it from a different perspective: an untagged resource is one you know nothing about, except that it inflates your cloud bill. You don’t want to pay for something you don’t know about and don’t need, right? So clean-up scripts might not be an ideal cost-saving technique, but they are better than nothing.
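A periodic clean-up script of the kind discussed here might be structured like the sketch below. It is a minimal illustration under several assumptions: resources have already been listed into a simple in-memory form, “untagged” means no tags at all, a hypothetical two-day grace period gives engineers who forgot a tag time to add it, and `delete` is a caller-supplied function wrapping your provider’s real delete call. It defaults to a dry run that only lists what it would remove.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Resource:
    resource_id: str
    created_at: datetime
    tags: dict[str, str] = field(default_factory=dict)


# Hypothetical grace period so freshly created resources are not removed
# before their owner has had a chance to tag them.
GRACE_PERIOD = timedelta(days=2)


def select_for_cleanup(resources, now, grace=GRACE_PERIOD):
    """Return untagged resources that are older than the grace period."""
    return [r for r in resources if not r.tags and now - r.created_at > grace]


def run_cleanup(resources, delete, now=None, dry_run=True):
    """List (and, if dry_run is False, delete) the selected resources.

    `delete` is a hypothetical callback taking a resource ID; in a real
    script it would call your cloud provider's SDK.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    selected = select_for_cleanup(resources, now)
    for r in selected:
        print(("[dry-run] would delete " if dry_run else "deleting ") + r.resource_id)
        if not dry_run:
            delete(r.resource_id)
    return [r.resource_id for r in selected]
```

Running it in dry-run mode for a sprint or two, and sharing the list with the team, softens the hiccups mentioned above before actual deletion is switched on.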
Nick: 8. Great, got it. So a clean-up script is better than nothing. Is there any way to save money on storage?
Max: The same applies to storage: both object and block storage need to be tagged. In addition, you can extend your clean-up script to detect unused volumes and snapshots in block storage and remove them too. Many people don’t think about it, but snapshots, volumes and images can cost you a lot. Engineers focus on running VMs, so they may forget to remove unnecessary snapshots, unattached volumes and redundant images. Tagging helps here as well.
Nick: 9. Thank you, Max. Did I hear you correctly that we can remove unused volumes and snapshots as part of a clean-up script?
Max: Yes, that’s right. Block storage is quite expensive, and forgotten, unused volumes can make up a significant part of your cloud bill. I suggest making it a golden rule: any volume that is both unattached and untagged should be removed.
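The golden rule translates directly into a filter. The sketch below assumes volumes and snapshots have already been listed into simple records (the field names are hypothetical); it applies the rule to volumes and also flags untagged snapshots whose source volume no longer exists. Such orphaned snapshots are only candidates for review, since some may be intentional backups.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Volume:
    volume_id: str
    attached_to: Optional[str] = None  # instance ID, or None if unattached
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Snapshot:
    snapshot_id: str
    source_volume_id: str
    tags: dict[str, str] = field(default_factory=dict)


def volumes_to_remove(volumes):
    """Golden rule: a volume that is both unattached and untagged is removed."""
    return [v.volume_id for v in volumes if v.attached_to is None and not v.tags]


def orphaned_snapshots(snapshots, volumes):
    """Untagged snapshots whose source volume no longer exists (review first)."""
    existing = {v.volume_id for v in volumes}
    return [
        s.snapshot_id
        for s in snapshots
        if s.source_volume_id not in existing and not s.tags
    ]
```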
Nick: 10. Cool. What about object storage? Is there any way to optimize it?
Max: Absolutely. In addition to tagging, you can optimize storage tiers, using cold storage for data that is not accessed frequently, and optimize geographical placement, which also affects your spend on network traffic.
I strongly recommend creating a policy on how storage is used; one bucket per engineer is a good starting point. Later, you can review bucket sizes and ask whether all the data is needed. It also makes sense to consider moving some data to cold storage: public clouds offer fairly cheap object storage for data that is kept for a long time and accessed rarely. The trade-off, and of course there is always a trade-off, is longer read access times for that data due to the nature of the storage.
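The tiering decision can be sketched as a rule based on how long ago data was last read. The tier names and the 30/180-day thresholds below are hypothetical; pick thresholds that fit your access patterns and your provider’s retrieval times and fees. In practice, object storage services usually let you express such rules as lifecycle policies that the provider applies for you, so a script like this is mainly useful for analysis.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tiers, checked in order: (maximum age since last access, tier).
TIERS = [
    (timedelta(days=30), "standard"),
    (timedelta(days=180), "infrequent-access"),
]
COLDEST_TIER = "archive"  # cheapest to store, slowest to read back


def recommend_tier(last_accessed: datetime, now: datetime) -> str:
    """Recommend a storage tier based on the time since data was last read."""
    age = now - last_accessed
    for max_age, tier in TIERS:
        if age <= max_age:
            return tier
    return COLDEST_TIER


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(recommend_tier(now - timedelta(days=400), now))
```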
Nick: 11. Thank you. I’ve heard that network traffic can be a real pain, and it isn’t an obvious expense until you get your first bill. What are the best practices here?
Max: Usually, network traffic within a single availability zone is free, and so is incoming traffic. But cross-region and outgoing traffic is charged, as is traffic between availability zones on some clouds, and the amount is hard to estimate. Please be cautious: network costs can be very high.
To avoid that, think about cloud regions and the data locality of your compute and storage resources, so that cross-region and external traffic is minimized. Work with your architects to identify data locality issues before you scale in the cloud. If you can, simply place all application components in one region.
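To see why locality matters, you can estimate monthly transfer costs per traffic type before and after moving components together. The per-GB rates below are placeholders, not real prices; look up your provider’s current data transfer pricing. The traffic volumes in the example are made up.

```python
# Placeholder per-GB rates in USD, for illustration only (NOT real prices).
RATES_PER_GB = {
    "ingress": 0.00,
    "same_zone": 0.00,
    "cross_zone": 0.01,
    "cross_region": 0.02,
    "internet_egress": 0.09,
}


def monthly_transfer_cost(flows):
    """Estimate cost from (traffic kind, gigabytes per month) pairs."""
    return sum(RATES_PER_GB[kind] * gb for kind, gb in flows)


if __name__ == "__main__":
    # App and database in different regions vs. placed in the same zone;
    # traffic to customers on the internet stays the same in both layouts.
    split = [("cross_region", 5000), ("internet_egress", 1000)]
    co_located = [("same_zone", 5000), ("internet_egress", 1000)]
    print(f"split: ${monthly_transfer_cost(split):.2f}/month")
    print(f"co-located: ${monthly_transfer_cost(co_located):.2f}/month")
```

The remaining internet egress is where a CDN can help, since it serves repeated requests from locations closer to your customers.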
Also, don’t forget about CDN services. It certainly depends on your applications and use cases, but in most cases your data should be closer to your customers than to your engineers.
Nick: 12. But do public clouds themselves help with cost control and optimization?
Max: Public clouds respond to their customers’ needs and offer built-in services for cost reporting and analysis. But they usually require you to build your expense control process and tagging scheme yourself, or to use consulting services.
They also offer only limited optimization scenarios, so you largely need to handle optimization yourself as well. Building proper expense control using only the cost exploration tools from cloud vendors is therefore quite complex and far from ready to use.
Nick: 13. Cool. Everything we discussed today sounds reasonable for a single cloud. But what if I use a multicloud or hybrid cloud strategy?
Max: It becomes much more complex with multicloud: you need to use different billing consoles with very different feature sets and workflows. It’s even more complex with hybrid cloud, because you also need tooling for on-premises workloads that integrates with public cloud billing. So the time you spend as a manager, both establishing the optimization process and executing it, at least doubles.
I strongly recommend focusing on one cloud first, establishing a process, achieving some results, and only then rolling it out to the other clouds you use.
Nick: 14. Earlier you said that optimization alone is not enough and that there should also be cost governance and control. Please talk us through that.
Max: Ultimately, all of these tagging techniques are a cost governance tactic. Ideally, if your resources are properly tagged, you can map every cent of your cloud bill to a specific business unit, team or project within your organization. This helps you budget properly and control cloud expenses at every level of your organization.
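Mapping the bill to the organization can be sketched as grouping costs by a tag value. The sketch assumes billing line items have been reduced to hypothetical dictionaries with a `cost` and an optional `tags` mapping, and that a `team` tag identifies the owner; any spend without that tag lands in an “unallocated” bucket, and its share is a direct measure of how well the tagging policy is followed.

```python
from collections import defaultdict


def allocate_costs(line_items, tag_key="team"):
    """Sum cost per value of tag_key; spend without the tag is 'unallocated'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "unallocated")
        totals[owner] += item["cost"]
    return dict(totals)


def allocation_coverage(totals):
    """Fraction of total spend that could be mapped to an owner."""
    total = sum(totals.values())
    if total == 0:
        return 1.0
    return 1.0 - totals.get("unallocated", 0.0) / total


if __name__ == "__main__":
    items = [
        {"cost": 100.0, "tags": {"team": "data"}},
        {"cost": 50.0, "tags": {"team": "web"}},
        {"cost": 50.0},  # untagged resource
    ]
    totals = allocate_costs(items)
    print(totals, f"coverage: {allocation_coverage(totals):.0%}")
```

Grouping by a different key, such as a project or business unit tag, gives the other levels of the breakdown.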
All the scenarios we discussed today should work immediately, but their benefits increase significantly when they are performed automatically and in real time. It’s like learning something new: if you don’t practice regularly, the results are less significant.
Nick: 15. OK, Max, let’s summarize the steps IT managers can take now to optimize their cloud expenses.
Max: I would summarize with the following list of action items:
1. Identify the top three items in your cloud bill
2. Tag resources and implement clean-up techniques
3. Identify and review data locality practices
4. Consider using automation for cost governance and optimization
Nick: 16. Great. Max, can you please tell us whether the platform we are currently building covers the scenarios we described?
Max: Yes, sure. We are building a hybrid cloud management platform focused on cost governance, control and optimization. It provides immediate optimization scenarios and a ready-to-use flow for mapping your cloud expenses to your organizational structure across all the cloud platforms you use. It also offers provisioning capabilities for collaborative R&D, such as basic provisioning, templates and versioning.