Aside from ensuring each service is working properly, one of the most challenging parts of managing a cloud-based infrastructure is cost monitoring. There are countless services to keep track of, including storage, databases, and computation, each with its own complex pricing structure.
Monitoring cloud costs differs from monitoring other organizational costs in that it can be difficult to detect anomalies in real time and to forecast monthly costs accurately.
Many cloud providers, such as AWS, Google Cloud, and Azure, provide you with a daily cost report, but in most cases this is not enough. For example, if someone incorrectly queries a database for a few hours, costs can skyrocket, and with a daily report you wouldn’t be able to detect the spike until it’s too late.
While there are cloud management platforms that allow you to interpret costs, these technologies also often fall short because they don’t provide the granularity that real-time monitoring requires. And without a real-time alert to detect and resolve an anomaly, the potential damage to the bottom line is significant.
As we’ll see from the examples below, only an AI-based monitoring solution can effectively monitor cloud costs. In particular, Anodot’s holistic cloud monitoring solution has three layers:
- Cost monitoring: Instead of just providing generic cloud costs, one of the main advantages of AI-based monitoring is that costs are specific to the service, region, team, and instance type. When anomalies do occur, this level of granularity allows for a much faster time-to-resolution.
- Usage monitoring: The next layer consists of monitoring usage on an hourly basis. This means that if usage spikes, you don’t need to wait a full day to resolve the issue and can actively prevent cost increases.
- Cost forecasting: Finally, the AI-based solution can take in every single cloud-based metric, learn its normal behavior on its own, and create cost forecasts which allow for more effective budget planning and resource allocation.
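As a rough illustration of the first and third layers, the sketch below groups cost records by service, region, and team, and then projects a month-end figure for each group. It is deliberately simple: the records, the function names, and the straight run-rate extrapolation are all assumptions made for illustration, and a learned forecasting model would replace the last step in practice.

```python
from collections import defaultdict

# Hypothetical cost records: (service, region, team, day_of_month, cost_usd).
RECORDS = [
    ("EC2", "us-east-1", "search", 1, 120.0),
    ("EC2", "us-east-1", "search", 2, 130.0),
    ("S3", "eu-west-1", "data", 1, 40.0),
    ("S3", "eu-west-1", "data", 2, 44.0),
]


def month_to_date_by_dimension(records):
    """Sum cost per (service, region, team) instead of one generic total."""
    totals = defaultdict(float)
    for service, region, team, _day, cost in records:
        totals[(service, region, team)] += cost
    return dict(totals)


def naive_month_end_forecast(month_to_date, days_elapsed, days_in_month):
    """Extrapolate the average daily spend so far to the full month.

    This is a plain run-rate projection (an assumption for this sketch),
    not a model that learns seasonality or trend.
    """
    daily_run_rate = month_to_date / days_elapsed
    return daily_run_rate * days_in_month


totals = month_to_date_by_dimension(RECORDS)
forecasts = {
    key: naive_month_end_forecast(total, days_elapsed=2, days_in_month=30)
    for key, total in totals.items()
}

for key, forecast in sorted(forecasts.items()):
    print(key, f"month-to-date={totals[key]:.2f}", f"forecast={forecast:.2f}")
```

Keeping the dimensions as the grouping key is what makes a later anomaly traceable to a specific service, region, and team rather than to the bill as a whole.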
Now that we’ve discussed the three layers of AI-based cloud cost monitoring, let’s review several real-world use cases.
Network Traffic Spikes
In the example below, we can see that the service is an AWS EC2 instance, which is being monitored on an hourly basis. As you can see, the service experienced a 1000+ percent increase in network traffic, from 292.5M to 5.73B (roughly a 20-fold jump), over the course of three hours.
In this case, if the company had relied only on a daily cost report, this spike would have been missed and costs would have skyrocketed, since the network traffic would likely have stayed at this heightened level at least until the end of the day.
Because a real-time alert, paired with a root-cause analysis, was sent to the appropriate team, the anomaly was resolved promptly, ultimately resulting in cost savings for the company.
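To put the size of this spike in perspective, the two traffic figures quoted above can be plugged into the standard percent-change formula. The snippet below is only an arithmetic check; the function name is illustrative, and the unit of the traffic values is left unspecified because the example does not state it.

```python
def percent_increase(before, after):
    """Percent change from `before` to `after`, relative to `before`."""
    return (after - before) / before * 100.0


baseline = 292.5e6  # traffic before the spike, as quoted in the example
peak = 5.73e9       # traffic at the peak, as quoted in the example

increase = percent_increase(baseline, peak)
multiple = peak / baseline

print(f"increase: {increase:.0f} percent")  # about 1859 percent
print(f"multiple: {multiple:.1f}x")         # about 19.6x
```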
Spike in Average Daily Bucket Size
The next use case is from an AWS S3 service on an hourly time frame. In this case, the first alert was sent regarding a spike in HEAD requests by bucket. As you may know, bucket sizes can go up and down frequently, but if you look only at the current bucket size, you often don’t know how much you’re using relative to normal levels.
The key difference in the example below is that, instead of simply looking at absolute values, Anodot’s anomaly detection was looking at the average daily bucket size. You can see that the spike in the bucket size is not larger than the typical spikes, but what is anomalous is the time of day of the spike. In this case, by looking at the average daily bucket size and monitoring on a shorter time frame, the company received a real-time alert and was able to resolve it before it incurred a significant cost.
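The idea that a value can be ordinary in size but anomalous for its time of day can be sketched with a per-hour baseline. The code below is a minimal illustration, not Anodot’s algorithm: it assumes a z-score against the history of the same hour of day, and the data, function names, and thresholds are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_hourly_baseline(history):
    """history: iterable of (hour_of_day, value) -> {hour: (mean, std)}."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def is_anomalous(baseline, hour, value, z_threshold=3.0, min_std=1.0):
    """Compare a value with what is normal for that hour of day."""
    mu, sigma = baseline[hour]
    sigma = max(sigma, min_std)  # floor so very flat hours do not divide by ~0
    return abs(value - mu) / sigma > z_threshold


# Hypothetical week of hourly values with a regular spike at 14:00.
history = []
for day in range(7):
    for hour in range(24):
        if hour == 14:
            value = 100.0 + day        # the usual afternoon spike
        else:
            value = 20.0 + (day % 3)   # quiet level the rest of the day
        history.append((hour, value))

baseline = build_hourly_baseline(history)

# The same absolute value is judged differently depending on the hour.
afternoon = is_anomalous(baseline, hour=14, value=102.0)
night = is_anomalous(baseline, hour=3, value=102.0)
print("14:00 anomalous?", afternoon)
print("03:00 anomalous?", night)
```

A fixed threshold on the absolute value would treat both readings the same way, which is why the comparison is made against the same hour’s history instead.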
Spike in Download Rates
The final example comes from the AWS CloudFront service, which was again being monitored on an hourly timescale.
In this case, there was an irregular spike in the rate of CloudFront bytes downloaded. As in the other examples, if the company had only been monitoring costs reactively at the end of the day, this could have severely impacted the bottom line. By taking a proactive approach to cloud cost monitoring with the use of AI and machine learning, the anomaly was quickly resolved and the company was able to save a significant amount of otherwise wasted costs.
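One simple way to catch this kind of hourly spike as it happens is to compare each new reading with a robust baseline from the preceding hours. The sketch below uses a rolling median and median absolute deviation (MAD) as a stand-in technique; it is an assumption for illustration rather than the method used in the example, and the series, window, and threshold are hypothetical.

```python
from statistics import median


def robust_spike_indices(series, window=24, threshold=5.0, min_scale=1.0):
    """Return indices whose value sits far above the trailing median.

    The score is (value - median) / (1.4826 * MAD) over the previous
    `window` points; only upward deviations are flagged, since the
    concern here is unexpected cost increases.
    """
    flagged = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        med = median(recent)
        mad = median([abs(x - med) for x in recent])
        scale = max(1.4826 * mad, min_scale)  # floor guards against MAD == 0
        if (series[i] - med) / scale > threshold:
            flagged.append(i)
    return flagged


# Hypothetical hourly bytes-downloaded readings with one spike at hour 40.
series = [1000 + (i % 5) * 10 for i in range(48)]
series[40] = 5000

flagged = robust_spike_indices(series)
print("spike hours:", flagged)
```

Using the median rather than the mean keeps the baseline from being dragged upward by the spike itself, so the hours that follow it are still judged against normal levels.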
Summary: Cloud Cost Monitoring
As we’ve seen from these three examples, monitoring a cloud-based infrastructure requires a highly granular solution that can monitor 100 percent of the data in real time.
If this unexpected cloud activity isn’t tracked in real time, it opens the door to runaway costs, which in most cases are entirely preventable.
AI models allow companies to shift from a reactive to a proactive approach to their cloud costs by catching anomalies and alerting on them as they occur. Each alert is paired with a deep root-cause analysis so that incidents can be remediated as fast as possible.
By distilling billions of events into a single scored metric, IT teams are able to focus on what matters and leave alert storms, false positives, and false negatives behind.