Data is a crucial asset for most businesses. But that data is worthless if it is unreliable. While data supports major business initiatives, the level of data accuracy for many organizations is still low. Saul Judah, research director at Gartner, explained, “Poor data quality contributes to a crisis in information trust and business value, such as financial and operational performance. Unless tangible and pragmatic steps are taken to understand, address and control data quality, the situation will worsen.”

In the previous two posts in this series, we discussed the challenges of maintaining high data quality and why it matters, and then examined the different types of costs an organization can incur as a result of poor data quality. Now we’ll explore how an AI analytics solution can help identify data quality issues before compromised data impacts critical business decisions.

Quality of Data is Critical for Business

The quality of the data a business uses is critical for improving customer experience, enhancing business processes, finding new market opportunities, and enabling new initiatives. Organizations that put a premium on high data quality can leverage their data as a valuable competitive asset, increasing efficiency, enhancing customer service, and, ultimately, driving revenue.

Organizations with poor data quality, on the other hand, work from outdated, inconsistent, and flawed data. They waste time reconciling contradictory reports and inaccurate business plans, and ultimately make misguided (and damaging) decisions.

Poor data quality destroys business value. As Gartner reports, “Recent Gartner research indicates that the average financial impact of poor data quality on organizations is $9.7 million per year. This is likely to worsen as information environments become increasingly complex.” (How to Create a Business Case for Data Quality Improvement, January 9, 2017)

Make Sure Data is Reliable

Quality data can speak volumes about your business. As noted, companies collect enormous amounts of data, and in the process they usually need to translate and standardize it in order to send it to, and catalog it in, data warehouses. Because so many analytics and decisions are based on this data, its accuracy is critically important. As our customer Pedro Silva at Credit Karma recently shared, “We ingest over 5 terabytes of data per day. The quality of data is critical to business decisions. So we need to make sure that the quality is always there.”

Event data is collected from multiple sources. In some cases, every single event that happens on a mobile device is collected and stored in data warehouses. With this volume of information being processed at such a demanding pace, there is ample room for errors to creep into the process. Such errors include:

  • Stuck Values: events that suddenly stop being reported or updated
  • Inaccurate Values: events are still reported, but something goes wrong in translation
  • Wrong Data: a spike in the number of null values
  • Missing Data: fields arrive empty
  • Implausible Values: interpolated data that is “just too good to be true”

This can result in decisions being made on the basis of wrong data, and such lapses may only be discovered weeks after those decisions have been implemented. The damage compounds: by the time the faulty data is located, cleaned, sorted, and re-applied, it may be out of date and no longer relevant to the decision at hand.
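As a rough illustration, the sketch below shows how a few of these error categories might be flagged on a batch of per-minute event data. The column names (timestamp, event_count, null_count) and the thresholds are hypothetical assumptions for this example, not a prescription.

```python
import pandas as pd

def basic_quality_checks(events: pd.DataFrame) -> dict:
    """Flag a few of the error categories above in a per-minute event table.

    Assumes hypothetical columns: 'timestamp' (datetime), 'event_count', 'null_count'.
    """
    issues = {}

    # Stuck values: the metric has not changed at all over the last 10 intervals
    issues["stuck_values"] = events["event_count"].tail(10).nunique() == 1

    # Spike in nulls: the latest null rate is far above the historical average
    null_rate = events["null_count"] / events["event_count"].clip(lower=1)
    issues["null_spike"] = null_rate.iloc[-1] > 3 * null_rate.iloc[:-1].mean()

    # Missing data: a gap in the expected one-minute reporting cadence
    max_gap_seconds = events["timestamp"].diff().dt.total_seconds().max()
    issues["missing_intervals"] = max_gap_seconds > 60

    return issues
```

Fixed rules like these catch the obvious cases, but they have to be tuned per metric by hand, which is exactly what becomes unmanageable at scale.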

Innovative Solutions for Data Quality Challenges

Rather than stick with established, standard BI reporting tools, organizations should consider more innovative solutions to better address their data quality processes. As Forrester analyst Michele Goetz adds, “The data quality solutions market is growing because more enterprise architecture professionals see data quality as a way to address their top challenges. In large part, this market growth is due to the fact that EA pros increasingly trust data quality solution providers to act as strategic partners that advise them on top data management and business decisions.” (The Forrester Wave™: Data Quality Solutions, Q4 2015)

A Smarter Way To Approach Data Quality Problems

Since, at the end of the day, everything is data, a smarter way to approach data quality problems is through AI analytics that leverage anomaly detection. Anomaly detection flags “bad” data by identifying suspicious deviations that can impact data quality. By continuously tracking and evaluating the data, anomaly detection provides critical insight into data quality as the data is processed.
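To make the idea concrete, here is a minimal sketch of anomaly detection on a single data-quality metric, such as nulls per minute, using a rolling baseline. The window size and z-score threshold are illustrative assumptions; a production system like Anodot learns a baseline per metric automatically rather than relying on a fixed threshold.

```python
import pandas as pd

def detect_anomalies(metric: pd.Series, window: int = 60, threshold: float = 3.0) -> pd.Series:
    """Mark points that deviate sharply from a rolling baseline.

    `metric` is a time-indexed series (e.g. nulls per minute); the window
    and threshold values here are purely illustrative.
    """
    baseline = metric.rolling(window, min_periods=window // 2).mean()
    spread = metric.rolling(window, min_periods=window // 2).std().replace(0, 1)
    z_scores = (metric - baseline) / spread
    return z_scores.abs() > threshold
```

Any point flagged as True is a candidate data quality incident worth surfacing before it propagates into reports and decisions.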

Collecting Time Series Data

Today’s fastest growing (and most data-intensive) industries use time series data sets, tracking business metrics over time. For data quality problems, you want to know about shifts in those business metrics: changing error rates, missing data, gaps in the event or time series data, value spikes, signal noise, data that wasn’t updated, and inaccurate data formats.

Time series data is the basic input to an automated anomaly detection system. Anomaly detection can find actionable signals in that data because those signals often take the form of anomalies (i.e., unexpected deviations in the data), highlighting data quality problems in real time.
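For instance, raw event data can be rolled up into exactly the kind of quality time series an anomaly detector consumes. In the sketch below, the column names ('timestamp', 'value', 'status') are hypothetical placeholders for whatever fields a real pipeline carries.

```python
import pandas as pd

def quality_time_series(events: pd.DataFrame) -> pd.DataFrame:
    """Roll raw events up into per-minute data-quality metrics.

    Assumes hypothetical columns 'timestamp' (datetime), 'value', and 'status'.
    """
    indexed = events.set_index("timestamp").sort_index()
    return pd.DataFrame({
        # Volume: sudden drops or gaps point at missing or stuck data
        "event_count": indexed["value"].resample("1min").size(),
        # Null rate: a spike suggests a translation or parsing problem upstream
        "null_rate": indexed["value"].isna().resample("1min").mean(),
        # Error rate: the share of events whose status marks a failure
        "error_rate": (indexed["status"] == "error").resample("1min").mean(),
    })
```

Each of these columns is a time series that can be fed to an anomaly detector like the one sketched above.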

Missing event data can go undiscovered for weeks. Using AI analytics to preserve data quality should be on everyone’s agenda.

AI Analytics Puts Trust Back in Your Data

Large data sets come with large data quality issues. When you are dealing with millions of data points spread across so many permutations of business metrics and dimensions, it is a challenge to know where things are changing. This calls for an analytics platform that can efficiently run detection algorithms at multiple steps of the data pipeline, identifying data quality issues and business metric shifts.

An AI analytics solution can address data integrity issues at the earliest point of data processing, rapidly transforming these vast volumes of data into trusted business information.

Anodot’s real-time, large-scale AI analytics solution is fully automated at each step of the process: detection, ranking, and grouping, issuing concise alerts about changes in key business metrics, such as missing data, unexpected data types, nulls where there shouldn’t be any, or malformed records. Based on these notifications, if you suspect that something is not quite right with your data, you can focus directly on the specific issue and decide how to proceed. This level of granularity helps companies spot very specific data quality anomalies, especially those that would be smoothed out or go unnoticed in broader metrics such as averages and company-wide totals.
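Purely as an illustration of that detect, rank, and group flow (and not Anodot's actual implementation), the sketch below scores detected anomalies and groups them by metric so that related issues surface as a single, concise alert. All of the field names are hypothetical.

```python
from collections import defaultdict

def build_alerts(anomalies: list[dict]) -> list[dict]:
    """Rank anomalies by deviation and group them by metric into concise alerts.

    Each anomaly is a hypothetical dict such as
    {"metric": "null_rate", "dimension": "mobile_ios", "z_score": 4.2}.
    """
    grouped = defaultdict(list)
    for anomaly in anomalies:
        grouped[anomaly["metric"]].append(anomaly)

    alerts = []
    for metric, items in grouped.items():
        # Rank within the group so the most significant deviation leads the alert
        items.sort(key=lambda a: abs(a["z_score"]), reverse=True)
        alerts.append({
            "metric": metric,
            "top_dimension": items[0]["dimension"],
            "affected_dimensions": len(items),
            "max_z_score": items[0]["z_score"],
        })

    # Most severe alerts first
    alerts.sort(key=lambda a: abs(a["max_z_score"]), reverse=True)
    return alerts
```

Grouping by metric keeps a single upstream problem from fanning out into hundreds of individual notifications.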

Data credibility is one of the most valuable assets organizations have today. Proactive alerts about the quality and integrity of that data can save resources and open up many valuable opportunities.


Written by Ira Cohen

Ira Cohen is not only a co-founder but also Anodot's chief data scientist. He developed the company's patented real-time multivariate anomaly detection algorithms that oversee millions of time series signals, holds a PhD in machine learning from the University of Illinois at Urbana-Champaign, and has more than 12 years of industry experience.
