Delivering fast results at scale: a closer look at unsupervised machine learning anomaly detection techniques
In this final installment of our three-part series, let’s recap our previous discussions of anomalies – what they are and why we need to find them. Our starting point was that every business records and analyzes many metrics. Each of these business metrics takes the form of a time series, and each time series has a normal baseline of behavior. For some metrics that baseline is a seasonal pattern, while others are distributed tightly around a nominal value. If a data point falls outside the range expected for that metric at that point in time, it is considered an anomaly.
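To make “the range expected for that metric at that point in time” concrete, here is a minimal Python sketch. It assumes a metric with a fixed cycle length (the hypothetical `period`, e.g. 24 for hourly data with a daily pattern) and uses a simple mean ± k standard deviations band for each position in the cycle. It is only an illustration, not Anodot’s model; `seasonal_baseline`, `is_anomaly` and `k` are names chosen for this example.

```python
from collections import defaultdict
from statistics import mean, stdev

def seasonal_baseline(history, period):
    """Learn a per-slot baseline: mean and standard deviation of the values
    seen at each position in the cycle (e.g. each hour of the day).
    Assumes history covers at least two full cycles so stdev is defined."""
    slots = defaultdict(list)
    for i, value in enumerate(history):
        slots[i % period].append(value)
    return {slot: (mean(vals), stdev(vals)) for slot, vals in slots.items()}

def is_anomaly(value, index, baseline, period, k=3.0):
    """Flag a value outside mean +/- k * stdev for its position in the cycle."""
    mu, sigma = baseline[index % period]
    return abs(value - mu) > k * sigma

# Hypothetical metric alternating between roughly 10 and roughly 50 (period 2).
history = [10, 50, 10, 50, 11, 49, 9, 51, 10, 50, 10, 50]
baseline = seasonal_baseline(history, period=2)
print(is_anomaly(50, 12, baseline, period=2))  # True: even slots expect ~10
print(is_anomaly(50, 13, baseline, period=2))  # False: odd slots expect ~50
```

The same value, 50, is normal at one point in the cycle and anomalous at another, which is exactly why the expected range has to depend on time.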
Since business metrics report data relating to some facet of a business, an anomaly in the data can reflect an important event or change in the business, possibly affecting the revenue stream. The anomaly in the data could be a sign of an opportunity to earn more money – or of a problem that’s causing you to lose it.
Either way, companies need to know what their data is telling them right away so they can take advantage of opportunities or fix costly problems, which is why real-time anomaly detection is a requirement for modern businesses. Because a single company can actively monitor thousands or even millions of metrics, manually configured monitoring and traditional BI are insufficient. At this scale, automated real-time anomaly detection is a necessity.
Anodot’s spin on unsupervised anomaly detection techniques
Let’s discuss these methods in more detail. In our first post of the series, we used an example of a neighborhood jogger to illustrate how our brains notice odd occurrences in everyday life:
- A series of observations establishes and confirms a mental model
- That mental model provides a specific expectation for the next observation
- That next observation is then made and compared against what the model predicted
- If that observation and the prediction don’t agree, that observation is flagged as strange or odd
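The four steps above map onto a short loop. This minimal Python sketch assumes the “mental model” is just the average of the last `window` observations and that a disagreement means a relative difference larger than `tolerance`; both parameters are hypothetical and far simpler than anything used in production.

```python
def detect(observations, window=3, tolerance=0.2):
    """Flag indices whose value disagrees with a simple prediction.

    1. The last `window` observations form the model.
    2. The model predicts the next value as their average.
    3. The actual next observation is compared with the prediction.
    4. A relative difference above `tolerance` is flagged as odd.
    """
    flagged = []
    for i in range(window, len(observations)):
        recent = observations[i - window:i]
        expected = sum(recent) / window
        actual = observations[i]
        if abs(actual - expected) > tolerance * abs(expected):
            flagged.append(i)
    return flagged

print(detect([100, 102, 98, 101, 150, 99, 100]))  # [4]
```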
Anodot’s real-time anomaly detection techniques do the same thing, but with time series data from business metrics. This is made possible by machine learning, a branch of artificial intelligence (AI).
There are two main categories of machine learning methods: supervised and unsupervised. In a nutshell, supervised machine learning algorithms are trained by example. Humans feed them datasets containing examples that are already labeled or categorized, which enables the algorithm to build a general model of each category. The algorithm then processes real, unlabeled data and attempts to place each item into one of the pre-learned categories. Because a supervised algorithm knows only the categories present in its labeled training data, it cannot place an item into a category it has never seen an example of. An automated anomaly detection system built on such an algorithm would therefore have to be given examples of every possible type of anomaly on every possible data distribution, pattern and trend.
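A nearest-centroid classifier, used here as a deliberately simple stand-in for a supervised algorithm (not one Anodot describes), shows the limitation: whatever input it receives, its answer is one of the labels it was trained on. The labels and feature values below are made up for illustration.

```python
from math import dist

def train(labeled_examples):
    """Supervised training: average the feature vectors of each label."""
    sums, counts = {}, {}
    for features, label in labeled_examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, x in enumerate(features):
            acc[j] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Return the label of the nearest centroid. The answer is always one of
    the trained labels; there is no way to report 'something new'."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

examples = [([0.0, 0.0], "normal"), ([1.0, 0.0], "normal"), ([10.0, 10.0], "spike")]
centroids = train(examples)
# A point unlike anything in training is still forced into a known label.
print(classify(centroids, [-1000.0, -1000.0]))  # normal
```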
Unsupervised machine learning algorithms, however, learn what normal is, and then apply a statistical test to determine if a specific data point is an anomaly. A system based on this kind of anomaly detection technique is able to detect any type of anomaly, including ones which have never been seen before. The main challenge in using unsupervised machine learning methods for detecting anomalies is deciding what is normal for the time series being monitored.
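As a minimal sketch of “learn what normal is, then apply a statistical test,” assume normal behavior is Gaussian, with a mean and standard deviation estimated from history, and flag a point whose two-sided p-value falls below a hypothetical significance level `alpha`. Real business metrics are often not Gaussian, so this illustrates the idea rather than recommending a particular test.

```python
from math import erfc, sqrt
from statistics import mean, stdev

def fit_normal(history):
    """Learn 'normal' as a mean and standard deviation (assumes a Gaussian)."""
    return mean(history), stdev(history)

def p_value(value, model):
    """Two-sided p-value: probability of a deviation at least this large
    if the value came from the learned Gaussian."""
    mu, sigma = model
    if sigma == 0:
        return 1.0 if value == mu else 0.0
    z = abs(value - mu) / sigma
    return erfc(z / sqrt(2))

def is_anomalous(value, model, alpha=0.001):
    """Statistical test: reject 'normal' when the p-value is below alpha."""
    return p_value(value, model) < alpha

model = fit_normal([10, 12, 9, 11, 10, 8, 12, 10])
print(is_anomalous(11, model))  # False
print(is_anomalous(30, model))  # True
```

Nothing here needs labeled anomalies, which is why a test like this can flag anomaly types it has never encountered.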
At Anodot, we use a hybrid, “semi-supervised” machine learning approach. The vast majority of classifications are made in an unsupervised manner, but customers can also give feedback, indicating “this is a real anomaly, but that is not.” Although this feedback covers only a very small subset of the anomalies the system identifies, it provides valuable input to the mainly unsupervised system.
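One simple way to fold a handful of labels into a mostly unsupervised detector is to let feedback nudge its sensitivity. The sketch below is hypothetical: the class name, step size and bounds are assumptions, and it does not describe how Anodot actually uses feedback. A z-score detector raises its threshold a little after each false positive and lowers it after each confirmed anomaly.

```python
class FeedbackDetector:
    """Hypothetical semi-supervised wrapper around an unsupervised z-score test.

    The detector flags |z| > k on its own; occasional user labels nudge k.
    """

    def __init__(self, k=3.0, step=0.25, k_min=2.0, k_max=6.0):
        self.k = k
        self.step = step
        self.k_min = k_min
        self.k_max = k_max

    def is_anomaly(self, z_score):
        return abs(z_score) > self.k

    def feedback(self, is_real):
        """Apply one user label for a previously flagged point."""
        if is_real:
            # Confirmed anomaly: become slightly more sensitive.
            self.k = max(self.k_min, self.k - self.step)
        else:
            # False positive: become slightly less sensitive.
            self.k = min(self.k_max, self.k + self.step)

detector = FeedbackDetector()
print(detector.is_anomaly(3.2))  # True with the default k = 3.0
detector.feedback(is_real=False)
detector.feedback(is_real=False)
print(detector.is_anomaly(3.2))  # False once k has risen to 3.5
```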
Unsupervised and Adaptive: Go with the flow (of anomaly detection)
Not only are Anodot’s algorithms unsupervised, they are also adaptive: they adjust to changes in a time series and eventually accept a persistent shift as the new normal. Going back to our jogger metaphor, imagine not seeing your neighbor jogging in the morning for several weeks in a row. Eventually, you update your mental model of your neighbor: she no longer jogs at 7 AM every morning. Not seeing her jog then stops registering as strange. The first morning you didn’t see her jogging was an anomaly; weeks later, the same observation simply confirms your neighbor’s (now normal) behavior.
Anodot’s adaptive learning algorithms achieve the same kind of adaptability by giving an anomaly increasing influence over the “normal” model the longer it persists. The result is an automated anomaly detection method that both flags anomalies and adapts to changes in the data pattern.
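The idea of an anomaly gaining influence the longer it persists can be sketched with an exponentially weighted baseline. In this minimal Python version, assuming a fixed absolute `tolerance`, normal points update the baseline with weight `alpha`, while anomalous points start with a small weight that grows by `growth` for each consecutive anomalous point. The parameters and the linear growth rule are assumptions for illustration, not Anodot’s algorithm.

```python
def adaptive_detect(values, tolerance, alpha=0.2, growth=0.1):
    """Flag points that stray more than `tolerance` from an adaptive baseline.

    Returns one flag per value after the first. Normal points pull the
    baseline with weight `alpha`. An anomalous point's weight is
    growth * (length of the current anomalous run), capped at 1, so a one-off
    spike barely moves the baseline while a sustained shift soon does.
    """
    baseline = values[0]
    run = 0  # consecutive anomalous points so far
    flags = []
    for x in values[1:]:
        if abs(x - baseline) > tolerance:
            run += 1
            weight = min(1.0, growth * run)
            flags.append(True)
        else:
            run = 0
            weight = alpha
            flags.append(False)
        baseline += weight * (x - baseline)
    return flags

# A metric that steps from 10 to a new normal of 20.
flags = adaptive_detect([10.0] * 5 + [20.0] * 30, tolerance=3.0)
print(sum(flags))  # 5: only the first points after the shift are flagged
```

If anomalous points never updated the baseline, every point at the new level would be flagged indefinitely; letting their weight grow is what turns a persistent anomaly into the new normal.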
Intelligent anomaly detection at each level
Once these interacting machine learning algorithms have found the anomalies, a separate layer of machine learning – one that uses deep neural networks along with other clustering and similarity algorithms – discovers the relationships between metrics. This distills the flood of detected anomalies down to a much more manageable number of correlated incidents, which human experts can then investigate.
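Anodot describes deep neural networks and other clustering and similarity algorithms for this step. As a much simpler stand-in, this sketch assumes each metric has an anomaly-score series over the same time steps, links two metrics when the Pearson correlation of their scores reaches a hypothetical `threshold`, and reports each connected group as one incident. The metric names and scores in the example are made up.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)

def group_incidents(scores, threshold=0.8):
    """Union metrics whose anomaly-score series are strongly correlated."""
    parent = {m: m for m in scores}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for a, b in combinations(scores, 2):
        if pearson(scores[a], scores[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for m in scores:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())

incidents = group_incidents({
    "checkout_errors": [0, 0, 5, 6, 0, 0],
    "revenue_drop": [0, 0, 4, 5, 0, 0],
    "signup_latency": [0, 3, 0, 0, 0, 4],
})
print(incidents)  # [['checkout_errors', 'revenue_drop'], ['signup_latency']]
```

Three anomalous metrics become two incidents: the checkout and revenue anomalies move together, so a person investigates them once rather than twice.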
By filtering an otherwise overwhelming volume of data down to the issues that matter, we can quickly extract actionable insights from the anomalies, turning issues into opportunities and errors into lessons.