As the business world continues to integrate AI and machine learning to better manage big data processes, one area that arguably has benefited the most is business monitoring. From IT management to business intelligence, the last few years have seen a drastic shift in how companies are monitoring their data.
In particular, traditional IT monitoring has put a significant burden on DevOps teams, who have had to constantly update and maintain systems manually. This often involves spending countless hours digging through log records to try to resolve an incident. When the team does find the root cause, they often manually update a static threshold and hope the problem doesn’t happen again, only to find new anomalous incidents the next week, and the cycle continues.
Many companies have tried to feed business data, such as business activity, into IT or APM monitoring solutions, only to discover the data is too dynamic for static thresholds. Some companies choose to depend on BI dashboards to find issues, but that leaves anomaly detection to chance. Let’s face it, our eyes can’t be everywhere, all the time.
As companies have tried to solve these challenges, AI is driving a future where business data is monitored autonomously.
Before we look at how AI, specifically machine learning, is transforming business monitoring, let’s first review how this field has evolved over the last decade and its origins in IT and application monitoring.
The Traditional Monitoring Paradigm
The story starts with IT monitoring, an area where machine learning was embraced to confront many of the same challenges now present in business monitoring.
The way organizations monitored their IT processes largely depended on human involvement. Teams needed to create static thresholds and alerting rules for the incidents that most impacted their back-end processes.
Alerting was actually one of the few things in IT monitoring that was automated; ironically, that “enhancement” created a significant amount of work for users. As one adtech company realized, setting static alerts triggered too many false positives.
Distinguishing which alerts were slight deviations and which were significant anomalies was an arduous task.
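To make the static threshold problem concrete, here is a minimal sketch. The metric, its values and both thresholds are made up for illustration; the point is only that a single fixed bound struggles with data that has a daily cycle.

```python
import math

def static_threshold_alerts(values, upper):
    """Return the indices of values that exceed a fixed upper bound."""
    return [i for i, v in enumerate(values) if v > upper]

# Hypothetical metric: 48 hourly request counts with a daily cycle peaking at noon.
hourly = [100 + 80 * math.sin(math.pi * (h % 24) / 24) for h in range(48)]
hourly[26] = 170  # injected incident at 2 a.m. on day two (normal value is about 121)

tight = static_threshold_alerts(hourly, upper=160)  # fires on every daily peak
loose = static_threshold_alerts(hourly, upper=185)  # silent, including the incident

print(len(tight), 26 in tight, loose)
```

With these made-up numbers, the tight threshold fires on 23 of the 48 hours, only one of which is the injected incident, while the loose threshold never fires at all. Neither setting is right, which is exactly the tuning loop described above.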
Other than alerts, the rest of the IT monitoring workflow required DevOps teams to manually perform tasks like troubleshooting, root cause analysis and incident resolution.
Service Mesh Monitoring
One area where DevOps has outpaced traditional monitoring is the service mesh.
With the rise of rapid software releases and the mindset of “move fast and break things”, service-oriented architecture (SOA) has taken over the software development landscape. Specifically, microservices have become an increasingly popular approach to reduce risk, deploy faster and scale services.
While microservices have theoretically simplified deployment, monitoring these services at scale poses a significant challenge. To solve this, service mesh technologies emerged as an additional layer on top of microservices, mainly to increase observability.
What many companies have realized, however, is that observability within a large service mesh isn’t practical, and certainly isn’t autonomous. In particular, gaining visibility requires significant time and resources to visually monitor clusters and still fails to actually detect issues in real time.
As a result, many forward-thinking companies have turned to AI and machine learning as a more efficient method for scaling IT monitoring while reducing the manual work of DevOps teams.
AI & Machine Learning: The Driving Force of Autonomous Monitoring
AI/ML is the driving force in the new age of autonomous business monitoring. Before looking at how this technology is currently being applied in the real world, let’s first review how it actually works.
What is Unsupervised Learning?
One of the main technologies powering autonomous business monitoring is a particular branch of machine learning called unsupervised learning.
We won’t cover this topic extensively in this article, but the key point about unsupervised learning is that it can be fed data without any pre-existing labels or human intervention, and still identify patterns in the data that may not be apparent to a human observer.
A few key advantages of unsupervised learning for monitoring:
- Unsupervised anomaly detection: These AI-based algorithms can handle any number of data metrics and automatically learn each one’s normal behavior on its own. As soon as an anomaly occurs, the team is alerted to the incident in real time.
- Auto correlation and detection: Not only can the system learn each data metric individually, it can also correlate related alerts to prevent alert storms, provide root cause analysis, and ensure the shortest time to resolution possible.
- Data agnostic: Another unique feature of unsupervised learning is that it can detect patterns and anomalies in any type of data imaginable, from IT data such as machine temperature, to complex business metrics such as bounce rate, conversion rate and more.
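As a rough illustration of learning each metric’s normal behavior without labels, the sketch below scores every point against the median of its own recent history. This is a simple robust statistic (median absolute deviation), chosen for illustration, not the algorithm of any particular monitoring product. The function name, window size, threshold and both sample metrics are assumptions.

```python
from statistics import median

def robust_anomalies(values, window=24, threshold=5.0):
    """Flag points that deviate strongly from the median of the preceding window.

    No labels are used: "normal" is estimated from the metric's own recent history.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        center = median(history)
        # Median absolute deviation, scaled to be comparable to a standard deviation.
        mad = median(abs(v - center) for v in history) * 1.4826
        if mad == 0:
            mad = 1e-9  # avoid division by zero on a perfectly flat history
        if abs(values[i] - center) / mad > threshold:
            flagged.append(i)
    return flagged

# Two hypothetical metrics on very different scales, each with one injected anomaly.
temperature = [70 + (i % 5) * 0.5 for i in range(60)]
temperature[50] = 95
bounce_rate = [0.40 + (i % 4) * 0.01 for i in range(60)]
bounce_rate[45] = 0.80

print(robust_anomalies(temperature), robust_anomalies(bounce_rate))
```

The same function with the same settings handles a temperature around 70 and a ratio around 0.4, because the score is relative to each metric’s own spread. That is the sense in which this kind of approach is data agnostic.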
Why has it taken so long for autonomous monitoring to work in a business setting?
Another important part of using machine learning for monitoring is not only that it initially learns the unique behavior of each data metric, but also that it adjusts to new patterns as user behavior evolves.
For example, COVID-19 has had an immense impact on behavior in the travel industry. The graph below shows the volume of bookings for a travel company, with the blue shaded area indicating where the data would normally register for that period. As countries began to announce social restrictions and lockdown measures, the number of bookings dropped dramatically. The autonomous monitoring solution was able to identify and adjust to the new normal within days — something that would have almost certainly required a manual adjustment with a traditional monitoring solution.
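The idea of adjusting to a new normal can be sketched with an exponentially weighted baseline. This is an illustrative stand-in rather than the method used in the example above, and the booking numbers, `alpha` and `tolerance` are invented for the demonstration.

```python
def adaptive_monitor(values, alpha=0.3, tolerance=0.25):
    """Track an exponentially weighted baseline and flag large relative deviations.

    Assumes a strictly positive metric. Returns one (baseline, is_anomaly) pair per
    point, where baseline is the value expected before seeing that point.
    """
    baseline = values[0]
    results = []
    for v in values:
        deviation = abs(v - baseline) / baseline
        results.append((baseline, deviation > tolerance))
        # Keep learning from every point so the baseline follows a lasting shift.
        baseline = alpha * v + (1 - alpha) * baseline
    return results

# Hypothetical daily bookings: stable at 1000, then a lasting drop to 300.
bookings = [1000] * 10 + [300] * 15
results = adaptive_monitor(bookings)
flagged_days = [day for day, (_, is_anomaly) in enumerate(results) if is_anomaly]
print(flagged_days)
```

With these numbers the drop is flagged on days 10 through 15, after which the baseline has moved close enough to 300 that the new level is treated as normal, with no manual threshold change. The trade-off in this simple version is that short spikes also pull the baseline, so a real system would likely be more selective about what it learns from.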
Use Cases: Early Adopters of Autonomous Monitoring
Now that we’ve discussed how autonomous monitoring works in theory, let’s review some of the business use cases for autonomous monitoring in different industries. In this section, we’ll focus on the telecommunications and eCommerce industries, although there are use cases in many others.
One of the best ways to improve the customer experience in the telecommunications industry is to identify changes in service quality and find glitches in real time. These are a few ways data leaders in telco are embracing autonomous monitoring:
- Ensuring network stability: The success of a telco network relies on monitoring countless data streams autonomously and in real time. To do so, AI is used to monitor each metric individually and then correlate these with any sudden drops in network performance. On top of network monitoring for critical infrastructure issues, machine learning can be used to minimize overhead and maximize profitability.
- Roaming service transfer: Another major challenge in telco is monitoring the sheer number of countries, partners, seasonal patterns and the complex interconnectivity between networks. Machine learning automatically learns the traffic pattern of each of these interdependent variables to monitor roaming services, so that when an incident does arise companies can instantly steer coverage to another partner.
- Revenue monitoring: A final example of autonomous monitoring in telco is monitoring the complex pipeline of infrastructure to uncover any potential revenue leaks. In particular, autonomous monitoring is used to immediately detect CDR loss, gain visibility into revenue-impactful events, and prevent customer refunds. Traditional analytics has proven to be too reactive in this field, while an AI-based solution can proactively identify anomalies and map out related metrics so you know exactly how to fix the issue.
In short, by monitoring billions of daily events, telco companies that have embraced autonomous monitoring are aware of critical issues before they impact the customer experience.
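Correlating alerts from many metrics into a single incident, as in the network stability example, can be approximated very roughly by grouping alerts that occur close together in time. This is a simplified sketch: time proximity stands in for the learned relationships between metrics that a real system would use, and the `Alert` type, metric names and five-minute gap are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    metric: str
    timestamp: int  # seconds since an arbitrary epoch

def group_alerts(alerts, max_gap=300):
    """Group alerts into incidents when each follows the previous within max_gap seconds."""
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if incidents and alert.timestamp - incidents[-1][-1].timestamp <= max_gap:
            incidents[-1].append(alert)
        else:
            incidents.append([alert])
    return incidents

# Hypothetical alerts: three within a few minutes of each other, one hours later.
alerts = [
    Alert("latency_ms", 1060),
    Alert("dropped_calls", 1000),
    Alert("cdr_count", 9000),
    Alert("roaming_traffic", 1200),
]
incidents = group_alerts(alerts)
for incident in incidents:
    print([a.metric for a in incident])
```

Here four alerts collapse into two incidents, so the on-call team sees one notification for the cluster of related symptoms instead of an alert storm.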
Another area that has seen a surge in demand this past year is eCommerce. As more and more people are shopping from home, the need to detect and resolve incidents in real time is more important than ever. In particular, several ways eCommerce companies can use autonomous monitoring include:
- Conversion rate monitoring: As conversion rate has a direct impact on revenue, monitoring for sudden drops can alert a company to errors in their checkout process and save a significant amount of otherwise lost revenue.
- Revenue & cost monitoring: Aside from just conversion rate, autonomous monitoring can be applied to all revenue and cost-related metrics.
- Customer fraud: AI analytics can also be applied to fraud detection, protecting merchants by flagging unexpected patterns in user behavior.
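For the conversion rate case, a minimal sketch might compute the rate per hour and compare it with a trailing average. The function name, the six-hour lookback, the 40% drop limit and the sample data are assumptions for illustration only.

```python
def conversion_drop_alerts(sessions, orders, lookback=6, max_drop=0.4):
    """Flag hours whose conversion rate falls more than max_drop below the trailing mean."""
    rates = [o / s if s else 0.0 for s, o in zip(sessions, orders)]
    flagged = []
    for i in range(lookback, len(rates)):
        baseline = sum(rates[i - lookback:i]) / lookback
        if baseline > 0 and rates[i] < baseline * (1 - max_drop):
            flagged.append(i)
    return flagged

# Hypothetical hourly data: steady traffic, with orders collapsing in hour 8
# (for example, because of a broken checkout step).
sessions = [1000] * 10
orders = [30, 31, 29, 30, 32, 30, 31, 29, 8, 30]

drop_hours = conversion_drop_alerts(sessions, orders)
print(drop_hours)
```

Only hour 8 is flagged in this data. Monitoring the ratio rather than raw order counts means a quiet hour with fewer sessions does not look like a checkout failure.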
Summary: Autonomous monitoring is here to stay
The last decade has seen immense progress in the field of monitoring, but the field changes rapidly, so companies need to adapt quickly to keep up.
While monitoring metrics manually was feasible in the past, as the rate of data creation has exploded, many companies have become overwhelmed with the amount of human intervention required to keep these systems running effectively.
AI is super-charging business monitoring to handle the scale, granularity and accuracy that data leaders need to keep tabs on big data and take immediate action.