With many companies forced into drastic cost-cutting measures this year, reviewing and monitoring variable expenses has never been more important.

In the digital age, one such variable expense that nearly every company incurs is its cloud costs. Similar to other business metrics, cloud costs are notoriously difficult to monitor using traditional tools due to their inherent complexity.

This guide explores what cloud cost monitoring is, how machine learning can drive it, and how Anodot is using its own ML-based business monitoring to cut a projected $360K in annual cloud costs.

What is Cloud Cost Monitoring?

As you may know, monitoring and optimizing cloud costs have become increasingly complex due to the growing number of services, regions, and instance types.

Monitoring cloud costs is even more challenging because the traditional tools offered by cloud providers typically report costs with a delay. If something goes wrong in your cloud environment, you may not find out until the resulting surprise expenses have already affected the bottom line.

In addition to this data lag, cloud costs are difficult to monitor for three main reasons:

Context: Each metric that makes up cloud expenses derives significance from its own context, which means it cannot be evaluated in absolute terms.

Topology: The business topology of cloud costs is also unknown, which makes the relationships and correlations between metrics highly complex and dynamic.

Volatility: Finally, cloud-based metrics are typically sampled irregularly, which means there may be gaps of several minutes or hours between consecutive readings of a metric.

Machine learning addresses these challenges because it can handle complex metrics and their dynamic behavior.

Machine Learning for Cloud Cost Monitoring

As described in this article, there are three layers to AI-based cloud cost monitoring:

Cost monitoring: Algorithms are used to track cloud costs at each individual service, region, team, and instance type. This means that when anomalies do occur, you can dive into the dimensions associated with the root cause.

Usage monitoring: Next, cloud usage is monitored on an hourly basis. Traditional cloud cost monitoring tools often have a data lag of 8 to 48 hours, so hourly monitoring lets you be more proactive in preventing runaway usage and costs.

Cost forecasting: Finally, because machine learning can monitor each individual metric and learn its new normal behavior as it evolves, the solution can provide more accurate cost forecasts and, ultimately, better budget planning and resource allocation.
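
To make the usage monitoring layer more concrete, here is a minimal Python sketch of flagging anomalous hours for each service and region. It is not Anodot's algorithm: it swaps in a simple trailing-median baseline scored with the median absolute deviation (MAD). The function names, the `window`, `threshold`, and `min_scale` parameters, and the sample data are all assumptions made for illustration.

```python
from statistics import median


def find_usage_anomalies(hourly_usage, window=24, threshold=3.5, min_scale=1.0):
    """Flag hours that deviate strongly from the trailing `window` hours.

    Returns a list of (hour_index, value, score) tuples. The score is a
    robust z-score: distance from the trailing median divided by a
    MAD-based scale. All parameter defaults are illustrative assumptions.
    """
    anomalies = []
    for i in range(window, len(hourly_usage)):
        history = hourly_usage[i - window:i]
        center = median(history)
        mad = median(abs(x - center) for x in history)
        # 1.4826 makes the MAD comparable to a standard deviation for
        # normally distributed data; min_scale avoids over-reacting to
        # tiny changes on an almost flat series.
        scale = max(1.4826 * mad, min_scale)
        score = abs(hourly_usage[i] - center) / scale
        if score > threshold:
            anomalies.append((i, hourly_usage[i], round(score, 1)))
    return anomalies


def scan_dimensions(usage_by_dimension):
    """Run the detector per (service, region) and keep only flagged series."""
    flagged = {}
    for dimension, series in usage_by_dimension.items():
        hits = find_usage_anomalies(series)
        if hits:
            flagged[dimension] = hits
    return flagged


# Hypothetical hourly usage: 48 normal hours, then one sudden spike on EC2.
usage_by_dimension = {
    ("ec2", "us-east-1"): [100 + (h % 3) for h in range(48)] + [400],
    ("s3", "eu-west-1"): [20 + (h % 2) for h in range(49)],
}

for dimension, hits in scan_dimensions(usage_by_dimension).items():
    print(dimension, hits)
```

Scoring each series against its own recent history, rather than against a fixed dollar or usage limit, reflects the point made earlier that a metric cannot be evaluated in absolute terms. A production system would also need to handle seasonality and irregular sampling, which this sketch ignores.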

Now that we’ve discussed the three layers of monitoring, let’s review how Anodot’s R&D team uses its own cloud cost monitoring solution to cut a projected $360K from the company’s annual cloud bill.

Case Study: Using Autonomous Business Monitoring to Optimize Cloud Costs

As the global economy shifted drastically for nearly every business, companies have had to adopt what investor Elad Gil calls “Startup Offense and Defense in the Recession.” In other words, we’ve had to do everything we can to improve on the sales side and also cut costs wherever possible.

As discussed, one of the largest variable expenses that companies have today is their cloud costs. With this in mind, we chose to make optimizing and reducing cloud costs one of the pillars of our defensive strategy. In particular, we had the following three objectives in mind:

The process of optimizing cloud costs needed to take less than one month with three full-time engineers working on it.
The new system had to fit within our existing code base without any major changes required.
The change needed to persist over the long run, so that our expenses would continue to be reduced in the following months.

With these objectives clearly laid out, three of our engineers set out to solve this challenge by making changes in four key verticals:

Tag AWS Resources: The first step was to pinpoint exactly what costs every AWS service was incurring by tagging each resource with the associated instance type, components, and processes. This allowed us to go through each resource and determine where we could store, compress, and process data more efficiently.

Minimize Workloads in R&D: After pinpointing the costs of each resource, we realized that we could optimize our workload size in the development cycle by using feature branches, so that each branch has its own environment. This allowed us to minimize the operation size and remove non-essential services from development clusters, which ultimately led to a roughly 50 percent reduction in costs for this particular cycle.

Efficient Cloud Usage Planning: Another benefit of tagging AWS resources is that we were able to predict usage more efficiently. This not only increased our financial planning capabilities but also allowed us to reserve and switch between instance types more efficiently.

Cloud Cost Monitoring in Real Time: Finally, since we wanted these cloud cost savings to persist in the coming months, we implemented a cost monitoring solution that uses machine learning to monitor costs autonomously. This gave us hourly updates on cloud usage instead of the 8- to 48-hour delay we had previously. Additionally, real-time alerts on anomalous cloud usage included related instances and influencing events. This correlation analysis enabled our engineers to zero in on the root cause and remediate incidents much faster.

By tackling cloud costs in these four verticals, we were able to resolve incidents before runaway costs impacted our bottom line.
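
As a rough illustration of why tagging matters, the Python sketch below rolls up billing line items by a tag and surfaces spend that has no tag at all. The record layout (`resource`, `cost_usd`, `tags`), the `team` tag key, and the sample figures are hypothetical. They do not mirror the AWS billing export format or the tags Anodot actually used.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key):
    """Sum cost per value of `tag_key`, grouping untagged spend separately.

    `line_items` is a list of dicts in a hypothetical shape:
    {"resource": str, "cost_usd": float, "tags": {str: str}}.
    Returns a dict ordered from the largest total to the smallest.
    """
    totals = defaultdict(float)
    for item in line_items:
        # Resources missing the tag are pooled so they stay visible.
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost_usd"]
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {owner: round(total, 2) for owner, total in ranked}


# Hypothetical line items for illustration only.
line_items = [
    {"resource": "i-aaa", "cost_usd": 120.0, "tags": {"team": "platform"}},
    {"resource": "i-bbb", "cost_usd": 80.0, "tags": {"team": "data"}},
    {"resource": "vol-ccc", "cost_usd": 45.5, "tags": {}},
    {"resource": "i-ddd", "cost_usd": 30.0, "tags": {"team": "platform"}},
]

for owner, total in cost_by_tag(line_items, "team").items():
    print(f"{owner}: ${total:.2f}")
```

Keeping an explicit "untagged" bucket is a deliberate choice in this sketch: spend that cannot be attributed to a team or component is often the first place to look when tagging coverage is incomplete.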

Summary: Cloud Cost Monitoring

When looking to better manage operational costs, cloud usage is an area ripe for optimization. To initiate this effort, teams should consider implementing machine learning for real-time cost monitoring, usage monitoring, and cost forecasting. What’s more, ML-based solutions offer the granularity, speed, and accuracy to detect and resolve incidents before any serious financial damage has been done.

To adopt a more proactive cost monitoring strategy this year, Anodot’s R&D team used our own technology to cut a projected $360K from our annual cloud expenses. In particular, Anodot accomplished this by tagging cloud resources, reducing R&D workload sizes, planning for usage more efficiently, and monitoring our cloud costs in real time.

Written by Anodot

Anodot is the leader in Autonomous Business Monitoring. Data-driven companies use Anodot's machine learning platform to detect business incidents in real time, helping slash time to detection by as much as 80 percent and reduce alert noise by as much as 95 percent. Thus far, Anodot has helped customers reclaim millions in time and revenue.
