How Was Autonomous Analytics Created?
We interviewed our teams, from data science to marketing
to engineering, to learn what makes Anodot, well… Anodot.
These 12 characteristics stood out among the rest.
Anodot’s design principles conform to the highest requirements of anomaly detection:
Commitment to Autonomous Analytics
Connection to Data Sources
A business monitoring solution can achieve its full value only by covering and correlating all data streams and metrics, regardless of the business’s original data architecture and silos. Integrating all data sources is essential. At Anodot we rely on turn-key integrations that seamlessly aggregate inputs from storage, databases, analytics, monitoring, APIs and SDKs, CRM and data streams, into one centralized analytics platform.
Machine Learning on 100% of Data
Significant anomalies will and do occur across 100% of business data, so achieving a watertight solution—one that can also correlate between disparate anomalies to report on incidents in context—requires complete data coverage. Anodot analyzes 100% of the business’s metrics in real time and at scale by running its machine learning algorithms on the live data stream itself, without reading and writing to a database. Every data point that flows into Anodot from all data sources is correlated with the relevant metric’s existing normal model, and is either flagged as an anomaly or used to update the normal model.
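A minimal sketch of this stream-first flow, assuming a deliberately simplified “normal model” (a rolling mean and variance — the class name `NormalModel` and the thresholds are illustrative, not Anodot’s actual models): each incoming point is tested against the model; anomalous points are flagged and left out, normal points update the model sequentially.

```python
import math

class NormalModel:
    """Illustrative rolling model of a metric's normal behavior
    (exponentially weighted mean and variance)."""
    def __init__(self, alpha=0.1, threshold=4.0):
        self.alpha = alpha          # update rate for the sequential model
        self.threshold = threshold  # anomaly cutoff, in standard deviations
        self.mean = None
        self.var = 1.0

    def process(self, x):
        """Return True if x is anomalous; otherwise fold x into the model."""
        if self.mean is None:       # first point initializes the model
            self.mean = x
            return False
        z = abs(x - self.mean) / math.sqrt(self.var + 1e-9)
        if z > self.threshold:
            return True             # flagged as anomaly; model left untouched
        # normal point: sequential O(1) update of mean and variance
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        return False

model = NormalModel()
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 10.1]
flags = [model.process(x) for x in stream]   # only the 25.0 spike is flagged
```

Note that no history is stored and nothing is written to a database: a point either triggers a flag or mutates two floats in the model.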
Allocation of Optimal Model
Metrics exhibit a wide variety of behaviors, patterns and distributions, so no single model can be used to cover all metrics. To allocate the optimal model for each metric, we first create a library of model types for different signal types (metrics that are stationary, non-stationary, multimodal, discrete, irregularly sampled, sparse, stepwise, etc.). Every metric that comes in goes through a classification phase, and is matched with the optimal model. Keep in mind that open source models generally work for stationary metrics only, while tending to produce frequent false positives and false negatives for other signal types.
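To make the classify-then-allocate step concrete, here is a deliberately crude sketch. The heuristics (zero-fraction for sparsity, distinct-value count for discreteness, half-vs-half mean shift for non-stationarity) and the `MODEL_LIBRARY` registry are invented for illustration; Anodot’s actual classifier is not described in this article.

```python
def classify_metric(values):
    """Crude signal-type classification (illustrative heuristics only):
    routes each metric to a model family based on simple tests."""
    n = len(values)
    zero_frac = sum(1 for v in values if v == 0) / n
    if zero_frac > 0.8:
        return "sparse"
    distinct = len(set(values))
    if distinct <= max(2, n // 20):
        return "discrete"
    # compare the two halves' means to guess (non-)stationarity
    half = n // 2
    m1 = sum(values[:half]) / half
    m2 = sum(values[half:]) / (n - half)
    spread = max(values) - min(values)
    if spread > 0 and abs(m2 - m1) / spread > 0.25:
        return "non-stationary"
    return "stationary"

# hypothetical registry mapping signal type -> model family
MODEL_LIBRARY = {
    "stationary": "exponential smoothing",
    "non-stationary": "double exponential smoothing (trend)",
    "discrete": "categorical frequency model",
    "sparse": "zero-inflated model",
}

signal = [0.0] * 45 + [1.0, 3.0, 2.0, 1.0, 4.0]   # mostly zeros
model_family = MODEL_LIBRARY[classify_metric(signal)]  # zero-inflated model
```

The point is the dispatch structure: one classifier, one model library, and every new metric is routed through both before any anomaly detection happens.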
Learning of Normal
Learning every metric’s “normal behavior” is a prerequisite to identifying anomalous behavior. To accommodate this kind of learning in real-time at scale, you’ll want to use sequential adaptive learning algorithms which initialize a model of what is normal on the fly, and then compute the relation of each new data point going forward. Even well-known models such as Double/Triple Exponential Smoothing (Holt-Winters) or ARIMA require modifications to allow sequential learning. At Anodot we developed a sequential update for all model types that are used for the various metric types.
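As an example of what “sequential” means here, below is the textbook Holt (double exponential smoothing) recursion written as an online update: each new point adjusts the level and trend in O(1), with no refit over history. This is the standard formulation, not Anodot’s proprietary variant.

```python
class SequentialHolt:
    """Holt's linear (double exponential) smoothing as an online update:
    level and trend are adjusted point by point, never refit from scratch."""
    def __init__(self, alpha=0.5, beta=0.3):
        self.alpha, self.beta = alpha, beta
        self.level = None
        self.trend = 0.0

    def update(self, x):
        """Fold in one observation; return the one-step-ahead forecast."""
        if self.level is None:
            self.level = x           # first point initializes the level
        else:
            prev_level = self.level
            # standard Holt recursions
            self.level = self.alpha * x + (1 - self.alpha) * (prev_level + self.trend)
            self.trend = self.beta * (self.level - prev_level) + (1 - self.beta) * self.trend
        return self.level + self.trend

model = SequentialHolt()
forecasts = [model.update(x) for x in [10.0, 12.0, 14.0, 16.0]]
# the forecast climbs with the trend in the data
```

The forecast at each step doubles as the metric’s expected “normal” value, which is exactly what an anomaly test needs to compare against.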
The two most common methods for determining baseline and seasonality patterns are Fourier transforms and ACF (Auto-Correlation Function). Fourier transform is efficient but sensitive to missing data and multiple seasonal patterns. ACF does better—at a steep computational complexity cost. Anodot’s patented Vivaldi method is based on the more accurate and robust ACF, but uses smart subsampling to reduce computational complexity. As opposed to auto-learning of normal behavior, auto-learning of seasonality can be done offline. We use a Hadoop cluster to efficiently run the algorithm daily on hundreds of millions of metrics.
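The ACF baseline idea can be shown in a few lines. The sketch below is a plain ACF peak search over candidate lags — it illustrates why ACF finds a season length, and also why it is expensive (every lag costs a pass over the data, which is the cost the patented Vivaldi subsampling attacks; that method is not reproduced here).

```python
import math

def acf(values, lag):
    """Sample autocorrelation of a series at a given lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        return 0.0
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return cov / var

def season_length(values, max_lag=50):
    """Return the lag (> 1) with the highest autocorrelation.
    Naive O(n * max_lag): this is the cost smart subsampling reduces."""
    return max(range(2, max_lag + 1), key=lambda lag: acf(values, lag))

# synthetic metric with a 24-point (e.g. hourly/daily) cycle
daily = [math.sin(2 * math.pi * i / 24) for i in range(240)]
period = season_length(daily)   # -> 24
```

A Fourier transform would find the same period faster, but as the paragraph notes, it degrades with missing samples and overlapping seasonal patterns, where the ACF peak remains usable.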
Grading anomalies is critical for filtering alerts by significance. Alerts are scored according to deviation, duration, frequency, and other related conditions. But results achieved with statistical tests—which score anomalies only relative to normal—aren’t finely tuned to the business’s needs. That’s because people tend to perceive anomaly significance not only relative to normal, but also relative to each other. Anodot’s patented anomaly scoring method runs probabilistic Bayesian models to evaluate anomalies both relative to normal based on their anomaly pattern, and relative to each other, to arrive at a more accurate score.
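Anodot’s Bayesian method itself is patented and not described in enough detail here to reproduce, but the two-axis idea can be sketched with made-up arithmetic: combine a “relative to normal” term (a squashed deviation) with a “relative to each other” term (percentile rank among recently seen anomalies). All constants and weights below are invented for illustration.

```python
def anomaly_score(deviation, recent_deviations):
    """Illustrative two-part significance score (not Anodot's method):
    - relative to normal: squash the deviation (in std units) into [0, 1)
    - relative to peers: percentile rank among recent anomalies
    """
    rel_normal = deviation / (deviation + 3.0)          # saturating squash
    if recent_deviations:
        rank = sum(1 for d in recent_deviations if d <= deviation)
        rel_peers = rank / len(recent_deviations)
    else:
        rel_peers = 0.5                                  # no peers yet: neutral
    return 0.5 * rel_normal + 0.5 * rel_peers            # arbitrary 50/50 blend

history = [2.5, 3.0, 4.0, 8.0]        # deviations of recently seen anomalies
small = anomaly_score(3.0, history)   # modest spike -> 0.5
large = anomaly_score(12.0, history)  # biggest spike seen recently -> 0.9
```

The useful property is the second term: an anomaly that is mild in absolute terms but larger than everything seen lately still rises in rank, matching how people actually triage alerts.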
Metric correlation combines anomalies at the single metric level so the system can consider them simultaneously in order to describe the whole incident. This contextual awareness depends on an initial understanding of related metrics. Numerous learning methodologies can be applied here, with varying accuracy, efficiency, scale and cost. Anodot uses a patented combination of four derivatives of behavioral topology learning: abnormal behavior similarity, naming similarity, normal similarity, and implicit analytics topology. Scale is achieved through algorithmic metric partitioning and grouping, which makes it possible to maintain rapid run times at any scale, without increasing computational costs.
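One of those four signals, abnormal behavior similarity, can be sketched simply: metrics whose anomalies fire at overlapping timestamps probably belong to the same incident. Below, Jaccard overlap of anomaly-time sets drives a greedy single-link grouping. This is a toy reduction of one signal out of four; the metric names and the 0.5 threshold are invented.

```python
def jaccard(a, b):
    """Overlap of two metrics' anomaly-timestamp sets, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def group_metrics(anomaly_times, threshold=0.5):
    """Greedy single-link grouping by abnormal-behavior similarity
    (a simplification -- the article says four signals are combined)."""
    groups = []
    for name, times in anomaly_times.items():
        for group in groups:
            if any(jaccard(times, anomaly_times[m]) >= threshold for m in group):
                group.append(name)   # joins the first sufficiently similar group
                break
        else:
            groups.append([name])    # no match: starts its own group
    return groups

anomaly_times = {                    # hypothetical metrics and anomaly times
    "api.latency":   [100, 205, 310],
    "api.errors":    [100, 205, 311],
    "db.cpu":        [100, 205, 310, 400],
    "checkout.rate": [990],
}
groups = group_metrics(anomaly_times)
# -> [['api.latency', 'api.errors', 'db.cpu'], ['checkout.rate']]
```

Grouping like this is also where the scale trick lives: partitioning metrics first means similarity is only computed within partitions, not across all pairs.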
Events external to the business (such as holidays, weather or traffic) or to the data (version releases, special sale dates, etc.) affect businesses on a daily basis. Taking external events into account is critical both for understanding the root causes of incidents (e.g., a version release causing bugs that manifest as anomalies), and for normalizing the event’s impact on the data (e.g., Black Friday’s impact on e-commerce metrics). Anodot collects external event data through third party integrations, and runs the relevant algorithms—annual seasonality models, regression models etc.—to normalize their effect.
Anodot is built for detection accuracy, reducing false positives and false negatives to a minimum. Alert simulation is used to test the system on historical data in order to fine-tune alert sensitivity pre-launch. Statistical models—such as ratios between metrics and influencing metrics—group and correlate different metrics in order to analyze them according to the specific business context. A patented anomaly scoring methodology, which measures the anomaly delta both relative to normal and relative to other anomalies, filters alerts according to their significance.
Real-time business monitoring alerts stakeholders to mission critical incidents, so it’s imperative that notifications are served without delay. This is where integrations with alert channels come in, enabling the system to notify every user through her choice of channel or channels. At Anodot, integrations include—but are not limited to—Slack, API, email, PagerDuty, Jira, Microsoft Teams, OpsGenie, and more.
With the receipt of every alert, users are prompted to give the alert a binary score (good catch / bad catch). This input is fed back into the learning model to further tune it by providing real-life indications about the validity of its performance. By training the algorithms with direct feedback on anomalies, users can influence the system’s functionality and results.
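The article says the binary good-catch/bad-catch signal is fed back into the learning model; the mechanics aren’t specified, so the sketch below shows one plausible, deliberately simple form of the loop — nudging an alert threshold per feedback event. The class, step size, and multiplicative rule are all assumptions for illustration.

```python
class FeedbackTuner:
    """Nudges the alert threshold from user feedback:
    'bad catch' raises it (fewer alerts), 'good catch' lowers it slightly.
    Illustrative only -- the actual feedback path is not public."""
    def __init__(self, threshold=4.0, step=0.1):
        self.threshold = threshold
        self.step = step

    def feedback(self, good_catch):
        if good_catch:
            self.threshold *= 1 - self.step   # alerts are trusted: be a bit more sensitive
        else:
            self.threshold *= 1 + self.step   # false positive: demand stronger evidence
        return self.threshold

tuner = FeedbackTuner()
tuner.feedback(good_catch=False)   # user marks an alert as a bad catch
tuner.feedback(good_catch=False)   # and another
# threshold has risen from 4.0 to 4.0 * 1.1 * 1.1 = 4.84
```

Even this toy version has the key property the paragraph describes: users’ judgments directly change what the system will alert on next.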
While detection accuracy is the main evaluation criterion of business monitoring, an effective user interface is critical for its functional performance within the business. The platform’s UI/UX must enable users to interact with the system’s data in ways that will empower them to understand and investigate its findings. Visual conciseness and clarity are essential enablers of fast anomaly evaluation and remediation. Investment in UI is also vital for the wide adoption of your monitoring solution across the business.