Small Glitches, Big Trouble: How Checking for Potential Outliers in Time Series Data is a Necessity in eCommerce
Just before we get into how Anodot extracts actionable insights from time series data, it’s worthwhile to recap what exactly a time series is and how businesses typically generate them.
First, a company picks a metric (a value or quantity) it considers important, most often one of the usual key performance indicators (KPIs): revenue, profit, or cost. Then it decides how often to sample (update) that number and pulls in data samples at that interval. Lastly, each sample, a two-item data point pairing a timestamp with a value, goes into some designated data bucket, such as a database. Analytics tools like dashboards then retrieve the data as a set, generate a plot and update it as each new data point comes in. Depending on the type of data, the amount of noise and the sampling rate, the actual data distribution (and thus the appearance of the plotted time series) can vary widely.
It’s important to note that the data bucket is for the benefit of later in-depth analysis, not for Anodot’s machine-learning powered outlier detection. This is because Anodot uses “online” algorithms: computational processes which learn and update their models with each incoming data point, without having to go back over all the earlier data points in that time series. All the previous data points are already “encoded” in the models.
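To make the idea of an “online” algorithm concrete, here is a minimal sketch in Python. It uses Welford's method for a running mean and standard deviation, which is a generic textbook technique chosen for illustration and not a description of Anodot's actual models. The class name `RunningStats` and the sample values are hypothetical.

```python
import math


class RunningStats:
    """Welford's online algorithm: running mean and variance in O(1) memory.

    Illustrative only; this is not Anodot's model.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new sample into the summary; no history is kept."""
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    @property
    def std(self):
        """Sample standard deviation (0.0 until two samples have arrived)."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


# Hypothetical samples arriving one at a time.
stats = RunningStats()
for sample in [100.0, 102.0, 98.0, 101.0, 99.0]:
    stats.update(sample)
```

After every call to `update`, the three stored numbers summarize everything seen so far, which is the sense in which earlier data points are “encoded” in the model rather than re-read from the database.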
Examples of time series data
Time series data sets are very useful when monitoring a given value over time. Since this is a near-universal need in many industries, it’s no surprise that all of today’s fastest growing (and data-intensive) industries use time series data sets. In ad tech, for example, there are metrics such as cost per lead, impression share, cost per click, bounce rate, page views and click-through rate. Common metrics in ecommerce include conversion rate, revenue per click, number of transactions, and average order value.
However, the time series that are actually measured are far more specific, because each of the metrics above is often broken down by geographic region (e.g. North America or Asia) or by operating system. This is especially true for mobile app metrics, since revenue-sapping compatibility problems are often OS-specific.
This level of granularity allows companies to spot very specific anomalies, particularly those which would get smoothed out, and thus go unnoticed, in more encompassing metrics such as averages and company-wide totals.
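A small sketch shows how a total can hide a segment-level problem. All of the numbers, the region and OS breakdown, and the helper `pct_change` are made up for illustration.

```python
def pct_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Hypothetical daily transaction counts keyed by (region, operating system).
yesterday = {
    ("North America", "iOS"): 5000,
    ("North America", "Android"): 4800,
    ("Asia", "iOS"): 3000,
    ("Asia", "Android"): 400,
}
today = {
    ("North America", "iOS"): 5100,
    ("North America", "Android"): 4900,
    ("Asia", "iOS"): 3050,
    ("Asia", "Android"): 40,
}

# The company-wide total barely moves...
total_change = pct_change(sum(yesterday.values()), sum(today.values()))

# ...while one segment has collapsed.
segment_change = {key: pct_change(yesterday[key], today[key]) for key in yesterday}
worst_segment = min(segment_change, key=segment_change.get)
```

In this invented data the total changes by less than one percent, while the Asia/Android segment loses about 90% of its transactions. Only the per-segment series makes that drop visible.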
How Anodot checks for different types of potential outliers
In the first installment of this series, we discussed the three different categories of outliers: global (or point), contextual (also called conditional) and collective outliers. All three of these can occur in time series data and all three can be detected. Take, for example, a large spike in transaction volume at an ecommerce company which reaches a value never before seen in the data, making it a textbook example of a global outlier. This can be a great thing, since more sales usually means more revenue.
Well, usually. A large spike in sales volume can also indicate that you have a stampede of online shoppers taking advantage of a pricing glitch. In such a case, your average revenue per transaction might actually dip a little, depending on the ratio of glitch sales to normal sales. This slight dip might be completely normal at other times of the year (like the retail slow periods which fall outside the holiday season or back-to-school shopping), but not when you’re running a promotion. In this case, the low values of average revenue per transaction would be considered a contextual outlier.
Hmmm, perhaps the promotional sale price for those TVs was entered as $369 instead of $639.
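The arithmetic behind that “slight dip” can be sketched as follows. Only the two TV prices come from the example above. The $400 average for normal transactions, the transaction counts, and the assumption that each glitch order contains exactly one TV are all hypothetical.

```python
INTENDED_PRICE = 639.0  # promotional price that should have been entered
GLITCH_PRICE = 369.0    # price that was entered by mistake
NORMAL_AVG = 400.0      # ASSUMPTION: average value of a normal transaction


def avg_revenue_per_transaction(n_normal, n_glitch):
    """Blended average, assuming each glitch order is a single TV."""
    total = n_normal * NORMAL_AVG + n_glitch * GLITCH_PRICE
    return total / (n_normal + n_glitch)


def revenue_lost(n_glitch):
    """Revenue given away versus selling the same TVs at the intended price."""
    return n_glitch * (INTENDED_PRICE - GLITCH_PRICE)


# 900 normal orders plus 100 glitch orders (hypothetical counts).
blended = avg_revenue_per_transaction(900, 100)
lost = revenue_lost(100)
```

Under these assumptions the blended average moves from $400 to about $396.90, a dip of less than one percent, while the 100 glitch orders give away $27,000. The metric barely moves even though the loss is real, which is why the context of the promotion matters.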
Anodot is able to detect both types of outliers. It accounts for any and all seasonal patterns in a time series metric, which catches the contextual outliers, and it accurately determines whether a data point falls far outside the natural variance of that time series, which catches the global outliers.
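The distinction between the two checks can be sketched with a deliberately simple stand-in: a z-score against the whole history for global outliers, and a z-score against the same hour of day for contextual ones. This is an illustrative simplification, not Anodot's method. The function `classify`, the threshold of 3, and the synthetic history with a daily pattern are all assumptions.

```python
from statistics import mean, stdev


def z_score(value, sample):
    """Standard score of `value` relative to `sample` (0.0 if no spread)."""
    spread = stdev(sample)
    return 0.0 if spread == 0 else (value - mean(sample)) / spread


def classify(history, hour, value, threshold=3.0):
    """Label a new point as 'global', 'contextual' or 'normal'.

    history: list of (hour_of_day, value) pairs.
    """
    all_values = [v for _, v in history]
    same_hour = [v for h, v in history if h == hour]
    if abs(z_score(value, all_values)) > threshold:
        return "global"      # extreme relative to everything seen
    if abs(z_score(value, same_hour)) > threshold:
        return "contextual"  # ordinary value, wrong time of day
    return "normal"


# Synthetic two weeks of hourly transaction counts: quiet nights
# (about 100), busy days (about 1000), with a small repeating jitter.
history = [
    (hour, (100.0 if hour < 6 else 1000.0) + (day % 3 - 1) * 10.0)
    for day in range(14)
    for hour in range(24)
]

night_spike = classify(history, 3, 1000.0)   # normal at noon, odd at 3 a.m.
huge_spike = classify(history, 12, 5000.0)   # never seen at any hour
ordinary = classify(history, 12, 1005.0)
```

A count of 1,000 is unremarkable at noon but far outside the 3 a.m. pattern, so only the seasonal comparison flags it. A count of 5,000 is extreme against the whole history, so the global check catches it first.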
Anodot’s first layer – univariate outlier detection – is all about identifying global and contextual outliers in individual metrics. A second layer then focuses on what are called collective outliers (a subset of data which, as a collection, deviates from the rest of the data it’s found in). This second layer uses multivariate outlier detection to group related anomalies together. This two-layer approach gives analysts both granularity and concise alerts at the same time.
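As a rough illustration of how a second layer might turn many per-metric anomalies into a few alerts, the sketch below groups anomalies whose time windows overlap. Overlap in time is an assumed grouping criterion chosen for simplicity. The source does not describe how Anodot decides that anomalies are related, and the metric names and timestamps (minutes since midnight) are hypothetical.

```python
def group_by_overlap(anomalies):
    """Group anomalies whose [start, end] windows overlap in time.

    anomalies: list of (metric_name, start, end) tuples.
    Returns a list of groups, each a list of metric names.
    """
    groups = []
    current, current_end = [], None
    for name, start, end in sorted(anomalies, key=lambda a: a[1]):
        if current and start <= current_end:
            # Overlaps the running group: extend it.
            current.append(name)
            current_end = max(current_end, end)
        else:
            # Gap in time: close the running group and start a new one.
            if current:
                groups.append(current)
            current, current_end = [name], end
    if current:
        groups.append(current)
    return groups


# Hypothetical output of the first (univariate) layer.
anomalies = [
    ("transactions.tv.us", 600, 720),
    ("avg_revenue_per_transaction.us", 610, 730),
    ("page_views.tv.us", 595, 700),
    ("bounce_rate.asia.android", 1300, 1330),
]
groups = group_by_overlap(anomalies)
```

Here four per-metric anomalies collapse into two alerts: one combining the three metrics tied to the TV pricing incident, and one for the unrelated bounce-rate anomaly later in the day.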
The advantage of automated outlier detection for finding potential outliers
Human beings are quite good at spotting outliers visually on a time series plot, because we benefit from possessing the most sophisticated neural network we know of. Our “wetware”, however, can’t do this instantly at the scale of millions of metrics in real time. Under those constraints, not only are recall and precision important, but also detection time. In the case of our hypothetical price glitch above, each second that glitch persists unfixed means thousands of dollars lost.
Automated anomaly detection simplifies the detection and correlation of outliers in your data, giving you real-time insights and real-world savings.