Unknown Unknowns are Big Data Opportunities
In business, few surprises are good. When you learn that, for example, a business process error has been preventing customers from completing their orders for over a week, you probably won’t be happy. You’ve just tripped over an “unknown unknown,” one of “the ones we don’t know we don’t know.”
The term “unknown unknowns,” attributed to Donald Rumsfeld, refers to situations where you don’t know you have a problem, so you don’t know that you need to apply resources to solve it.
What is an Unknown Unknown in Business?
The problem here isn’t that business processes fail – that may still happen. Rather, problems occur when you not only can’t see the failure but also can’t see the shadow that the failure casts in the data. In the example above, the shadow would take the form of a decline in sales, and you might not see it because your BI platform misrepresents the data, leaving you feeling that there is nothing to worry about.
The error and the subsequent sales drop are unknown unknowns. The converse, a known unknown, might take the form of, ‘We know we’re seeing an unusual drop in sales, but we don’t know what’s causing it yet.’
In business intelligence, unknown unknowns are an unfortunate fact of doing business, especially in a world that is scaling up and dealing with ever-larger datasets. The problem is that you don’t know what you don’t know until it’s too late. So how do you identify unknown unknowns in a timely way and avoid complex, revenue-losing problems that can persist right under your nose?
Think of the Unknown Unknown Problem as an Opportunity
In our example, the effect of a business process error was masked by seasonality. Seasonality refers to the presence of cyclical patterns in time series data, patterns that occur naturally in customer data. Seasonal patterns are changes we expect: they are part of the normal behavior of a given metric and thus must be included in the model of that metric. Some metrics, however, have no seasonal pattern, and others contain several seasonal patterns at once. Because seasonal swings can be misidentified as outliers that deserve attention, seasonal variability must be identified and filtered out before looking for real anomalies.
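To make the idea of filtering out seasonality concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method of any particular BI or analytics product: the daily order counts are invented, the seasonal period (7 days) is assumed to be known in advance, and the function names `seasonal_residuals` and `flag_anomalies` are hypothetical. The baseline for each weekday is simply that weekday's median, and a point is flagged when its distance from the baseline is large relative to the typical deviation.

```python
from statistics import median

def seasonal_residuals(values, period):
    """Subtract each slot's typical value (e.g. the usual Saturday) from the data."""
    # Baseline for each position in the cycle is the median of that position.
    baseline = [median(values[i::period]) for i in range(period)]
    return [v - baseline[i % period] for i, v in enumerate(values)]

def flag_anomalies(values, period, threshold=5.0):
    """Return indices whose deseasonalized value is far from typical."""
    residuals = seasonal_residuals(values, period)
    center = median(residuals)
    # Median absolute deviation: a spread estimate that a single outlier
    # cannot inflate. Fall back to a tiny value if the data is perfectly regular.
    mad = median([abs(r - center) for r in residuals]) or 1e-9
    return [i for i, r in enumerate(residuals)
            if abs(r - center) / mad > threshold]

# Invented daily order counts, Monday to Sunday, for four weeks.
# Weekends normally run around 160; the third Saturday (index 19) is 105.
values = [
    100, 102,  98, 101, 103, 160, 158,
    101, 101,  99, 100, 104, 161, 157,
     99, 103,  97, 102, 102, 105, 159,
    100, 102,  98, 101, 103, 159, 158,
]

flagged = flag_anomalies(values, period=7)
print(flagged)  # [19]
```

The flagged Saturday is the interesting part: 105 orders sits comfortably inside the overall range of the raw data, so a check that ignores the weekly cycle would see nothing unusual. Only when it is compared with other Saturdays does the drop stand out.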
Finding unknown unknowns in your data might sound like you are flying blind. Even though these are unknown unknowns, very often, your data will hold clues that can point you to the unseen drivers that are impacting your business.
For example, as online commerce has grown exponentially, so has its complexity. Glitches have scaled down in size, but they’ve grown in number, in sophistication, and especially in how hard they are to identify. The fact is that multiple micro-sized glitches may cut as deeply as, if not more deeply than, the headline-grabbing failures we often hear about.
When we deal with such huge populations of data that it becomes nearly impossible to draw a representative sample, whatever the sampling technique, big data has tremendous value in giving us a sense of the bigger story the data can tell.
Revenue-losing problems can be discovered early to preserve the revenue stream that would otherwise be lost. In other words, unknown unknowns aren’t necessarily a problem – they’re an opportunity to discover revenue that may otherwise be lost.
Uncovering Unknown Unknowns in Business
Take AppNexus, a company that experienced unknown unknowns firsthand. Its account management team might not see an issue at all, or the issue would take a couple of days to surface, for instance when someone ran a report and saw that something had changed within the business. Only then would they start trying to figure out which metric had changed or which site had stopped showing ads.
AppNexus handled automated online ad purchasing and processed about 10 billion transactions every day, each taking just milliseconds, so things were sometimes missed. AppNexus’ VP of Engineering, Travis Johnson, explained, “We wanted to be able to reach out to our clients to work with them to resolve these issues faster. Every minute counts, every minute can be a missed impression. However, sifting through 10 billion daily transactions to try and find these signals was a difficult problem for us. We didn’t have the tools to do this. We weren’t a team of data scientists. So, we needed to know really where to look and come up with a way to solve this problem.”
They found that with over 40 metric attributes per transaction, there was no realistic way for human analysts to keep track of the data. “We had too much data that we wanted to watch. We weren’t able to just set up a simple dashboard and give it to our account management team and have them just say hey, watch these, watch these five metrics for your client and understand their health.”
Shining a Light on Unknown Unknowns with AI-Powered Analytics
AppNexus changed the story by applying machine learning to analyze their time series metrics and detect anomalies in real time. Speed was essential: they needed to know when things were going wrong so that they could reach out to their clients. Equipped with real-time analytics and accurate alerts, AppNexus could become aware of errors almost as soon as they happened, without having to rely on skilled data scientists to perform effort-intensive sleuthing. Instead of wading through dashboards to find revenue-leaking unknown unknowns, they had AI-powered analytics point them out right away.
For instance, a client might have an ad running, and an anomaly would show that its clicks had suddenly dropped from the expected pattern to just a trickle. There could be many reasons for such a drop. AI-powered analytics shows all the related metrics that are also behaving anomalously, giving better insight into the cause.
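One simple way to picture how related warnings can be surfaced together is to group anomalies that occur close to each other in time. The sketch below is a hypothetical illustration only and does not describe how AppNexus or any analytics vendor actually correlates metrics. The metric names, the timestamps (minutes since the start of the day), the 10-minute window, and the function name `related_anomalies` are all assumptions made for the example.

```python
def related_anomalies(anomalies, metric, window_minutes=10):
    """Find other metrics with anomalies near those of `metric`.

    `anomalies` maps a metric name to a list of anomaly times,
    expressed here as minutes since the start of the day.
    """
    related = {}
    for t in anomalies.get(metric, []):
        for other, times in anomalies.items():
            if other == metric:
                continue
            # Keep only the other metric's anomalies inside the time window.
            hits = {u for u in times if abs(u - t) <= window_minutes}
            if hits:
                related.setdefault(other, set()).update(hits)
    return {name: sorted(times) for name, times in related.items()}

# Invented anomaly times for one client's metrics.
anomalies = {
    "clicks": [125],
    "impressions": [122, 400],
    "bid_requests": [127],
    "page_latency": [30],
}

related = related_anomalies(anomalies, "clicks")
print(related)  # {'impressions': [122], 'bid_requests': [127]}
```

Here the drop in clicks at minute 125 is shown alongside the impression and bid-request anomalies from the same few minutes, while the unrelated latency blip much earlier in the day is left out. Time proximity is only a rough stand-in for a real relationship between metrics, but it shows why seeing the warnings side by side narrows down where to look.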
Data analysis is great for finding many things you want to know, but another big advantage is that it can discover what you perhaps don’t want to know: the glitches and the anomalies, the things that just aren’t right and are absolutely crucial to discover and mitigate in near real time. Machine learning analytics disrupts and replaces traditional BI tools, turning unknown unknowns into valuable, solvable issues.