Resources


Documents 1 min read

Case Study: Autonomous Monitoring for Telco - OSS, BSS, CEM and More

Learn how telcos are using Anodot to automatically monitor their OSS, BSS and CEM layers and use real-time alerts for proactive incident management.
Blog Post 4 min read

Real-Time Anomaly Detection: Solving Problems, Seizing Opportunities

The business case

In the first of our three-part series, What is anomaly detection?, we summarized how machine learning is enabling real-time, automated incident management. In this second post, we'll discuss why this capability is so essential to today's data-driven business.

The necessity

In our previous post, we gave an example of a software update causing online sales from Asia to plummet. Obviously, an anomaly in online sales volume for any specific region or device type needs to be detected immediately, and the same is true for other anomalies, because many real-life business anomalies require immediate action. That bad software update is costing you money every second. And since discovering the problem is the first step in resolving it, eliminating the delay between when the problem occurs and when it is detected brings you one crucial step closer to rolling back that update and restoring revenue flow from Asia.

This is also true for anomalies which aren't problems to be solved, but opportunities to be seized. For example, an unusual uptick in mobile app installations from a specific geographical area may be due to a successful social media marketing campaign that has gone viral in that region. Given the short lifespan of such surges, your business has a limited time window in which to capitalize on this popularity and turn all those shares, likes and tweets into sales.

Real-time anomaly detection is advantageous even when the detected anomalies include ones which don't require an immediate response: you can always choose to postpone action on an instant alert, but you can never react in real time to a delayed alert. In other words, real-time anomaly detection is always preferable to delayed detection.

But what kind of anomaly detection system can provide this type of real-time notification? For only one or a few KPIs, a human monitoring a dashboard may work. This manual approach, however, does not scale to thousands or millions of metrics while maintaining real-time responsiveness. Beyond the sheer number of metrics in many businesses, there is the complexity of each individual metric: different metrics have different patterns (or no patterns at all) and different amounts of variability in their sampled values. In addition, the metrics themselves often change, exhibiting new patterns as the data settles into a new "normal."

Manual vs. automated anomaly detection

If manual anomaly detection is inadequate, then automated anomaly detection must be used to achieve real-time detection at large scale, and it must be sophisticated enough to handle all the complexity described above at the scale of millions of data points or more, updating every second. The machine learning algorithms that power Anodot's automated anomaly detection system utilize the latest in AI research to meet this task. Our patented machine learning algorithms fall under the "online" category, meaning that each data point in the sequence is processed only once and then never considered again. Online machine learning has the added benefit of scaling to the massive number of metrics businesses keep track of. As each data point is processed, the online machine learning algorithms work in a way similar to the human brain in the jogger example from the previous post:

- A model which fits the data is created.
- This model, in turn, is used to predict the value of the next data point.
- If the next data point differs significantly from what the model predicted, it is flagged as a potential anomaly.
- Each new data point is then used to intelligently update the model.

(A minimal sketch of this predict-compare-update loop appears at the end of this post.)

AI anomaly detection in the real world

The power of this application of AI, spotting anomalies and the opportunities they present far faster than humans could, has already been put to great scientific use. An AI system developed by NASA's Jet Propulsion Laboratory detected a rare volcanic event in Ethiopia and commanded an orbital satellite to image it - before volcanologists even asked NASA for images of the eruption. When working with thousands or millions of metrics, real-time decision making requires online machine learning algorithms. Whether it's saving your business money or gleaning scientific insights from a brief volcanic eruption, real-time anomaly detection has enormous potential for catching the important deviations in your data. In the third post, we'll dive a little deeper into the anomaly detection techniques that power Anodot's software.
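The post describes Anodot's online algorithms only at a high level. As a rough illustration of the predict-compare-update loop above, here is a minimal sketch; it is not Anodot's patented algorithm, and the exponentially smoothed model, the error estimate and all parameter values are assumptions made for the example.

```python
# Minimal sketch of an online predict / compare / update loop for one metric.
# Stand-in model: simple exponential smoothing plus a running estimate of the
# typical prediction error. Not Anodot's algorithm; parameters are illustrative.

class OnlineAnomalyDetector:
    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # smoothing factor for model updates
        self.threshold = threshold  # flag points this many error-units away
        self.warmup = warmup        # points to observe before flagging anything
        self.seen = 0
        self.level = None           # the "model": a smoothed level estimate
        self.mad = 0.0              # running mean absolute deviation of residuals

    def process(self, value):
        """Process one data point exactly once; return True if it looks anomalous."""
        self.seen += 1
        if self.level is None:      # the first point only initializes the model
            self.level = value
            return False
        residual = abs(value - self.level)   # compare the point to the prediction
        is_anomaly = (self.seen > self.warmup and self.mad > 0
                      and residual > self.threshold * self.mad)
        # update the model with the new point, whether or not it was flagged
        self.mad = (1 - self.alpha) * self.mad + self.alpha * residual
        self.level = (1 - self.alpha) * self.level + self.alpha * value
        return is_anomaly

detector = OnlineAnomalyDetector()
for v in [10, 11, 10, 12, 11, 10, 45, 11, 10]:
    if detector.process(v):
        print(f"potential anomaly: {v}")
```

On this toy series the detector flags only the spike at 45. A production system would also have to model seasonality, trends and very different metric behaviors, which is exactly the complexity the post argues a human cannot track at scale.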
Documents 1 min read

Case Study: How 5 Leading Adtech Companies Used AI Analytics to Save Millions

Learn how leading adtech companies -- including Rubicon Project, Uprise and NetSeer -- are leveraging the power of machine learning to find outliers in time series data and turn them into valuable business insights.
Documents 1 min read

White Paper: The Build or Buy Dilemma For AI-Based Anomaly Detection

Leveraging the vast amount of business data available today to better meet customer needs and detect business incidents presents organizations with the challenge of whether to build their own anomaly detection system or buy one ready-made. Before organizations make this critical decision, it is important to weigh the benefits and challenges of each approach.
Videos & Podcasts 40 min read

Avoiding the App Trap: Using Anomaly Detection to Optimize Performance, Prevent Issues

Mobile app business models are often built around advertising and cross-promotions. Yet with so many moving parts, there are many opportunities for something to break.
Blog Post 3 min read

Closing the Loop on Anomalies, Alerts and Dashboards

Team Anodot is always busy working on new features and capabilities for our users. Our most recent version upgrade rolled out yesterday, and we've already received great feedback. So what's all the fuss? We just closed the loop between your metrics, anomalies, alerts and dashboards! Almost every BI and visualization tool provides a dashboard...it's a familiar and logical way to keep track of the metrics you're interested in. Our newest version upgrade takes the dashboard concept to the next level. By showing anomaly alerts directly in your dashboard tiles, we're making it even easier to uncover and access business insights in real time. Not only will you receive the traditional email/JSON/webhook alerts on anomalies in the data streams that interest you, you'll now also see these alerts in the context of the relevant dashboards.

Get Started

So how does it work? You've created a dashboard with graphs and meters... now click the settings icon in the upper right corner of a tile to display the options. Clicking "Create Alert" tells the system that you want to receive alerts whenever any of the metrics in the tile are anomalous. Once you've created the alert, a small bell outline icon appears in the top left corner of the tile (see image below). From then on, if the alert bell is completely black, it means that anomalies occurred within the time frame you're looking at. This is in addition to the regular alert notification you would receive, but may have missed.

Anomalies Can Hide in Plain Sight!

The alert bell will appear even if the anomalies on the dashboard are not obvious to the human eye. In this example, the alert notification icon clearly shows that anomalies occurred in the selected data, yet from a quick glance at the dashboard it is not possible to actually SEE the anomalies.

Drill Down to Investigate Root Cause

To investigate further, you can see the full list of alert notifications on the right-hand side. Click each notification to drill down into the Anomap page, where you'll find information about the individual anomalies that were alerted on, along with correlated events for other metrics that may not have been displayed on the dashboard. In the example below, the anomalies that were not obvious in the high-level view become easy to understand when you look more closely at the individual alerts and correlations: there is a correlation between an increase in Payment API Failures and a decrease in the Revenue metrics.

For full documentation, visit our Support entry, where you'll find detailed information about creating and editing alerts as well as viewing dashboard tile alert events. Got ideas for new features you'd love to see? Drop us an email at [email protected] and let us know. We'd love to hear from you.
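The post notes that these in-dashboard alerts come in addition to the traditional email/JSON/webhook notifications. As a rough sketch of what consuming such a webhook alert might look like, here is a minimal receiver; the payload fields, port and routing are hypothetical assumptions, not Anodot's documented webhook schema.

```python
# Minimal sketch of a service that accepts JSON webhook alerts and routes them.
# The payload fields ("metric", "anomaly_score") are hypothetical placeholders,
# not Anodot's documented schema.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class AlertWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        alert = json.loads(self.rfile.read(length) or b"{}")
        # hand the alert to whatever your team already uses for incidents
        # (ticketing, paging, chat); here we just log it
        print(f"anomaly alert on {alert.get('metric', '?')} "
              f"(score={alert.get('anomaly_score', 'n/a')})")
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), AlertWebhookHandler).serve_forever()
```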
Videos & Podcasts 11 min read

Rich Galan of Rubicon Project: The Need for Real-Time Anomaly Detection

Rich Galan of Rubicon Project presents the need for real-time anomaly detection at the Innovation Enterprise CTO Conference.
Blog Post 4 min read

Website down? How to Track the Impact on Your Bottom Line

Costs of Unavailability

Availability is one of the key measurements for every company with an online presence. Customer expectations are constantly increasing: today they expect access to the service at any time, from any device, and they expect nothing less than 100% availability. Measuring availability is a difficult but critical task. I strongly advise that, no matter how difficult it is, you take the time to define what availability means for your business and start tracking it. The following table shows the effect of different availability service level agreements (SLAs) in terms of potential downtime:

            99.9%          99.95%         99.99%
Daily       1m 26.4s       43.2s          8.6s
Weekly      10m 4.8s       5m 2.4s        1m 0.5s
Monthly     43m 49.7s      21m 54.9s      4m 23.0s
Yearly      8h 45m 57.0s   4h 22m 58.8s   52m 35.7s

Below, I share some of the potential impacts of unavailability. The emphasis you put on these factors will depend on the service being offered and your own circumstances.

Lost Revenue

If you are conducting business over the internet, every minute of downtime is directly linked to lost revenue. There are different ways to calculate it (a minimal sketch of the calculation appears at the end of this post):

- Determine how much revenue you make per hour, and use this as the cost to the enterprise of unavailability per hour or minute. For example, in this article, Google's cost of downtime was calculated at $108,000 per minute based on its Q2 2013 revenue of $14.1 billion. In another article, Facebook's downtime cost was calculated at $22,453 per minute. This is the simple method, but it is not very accurate, as revenue varies by time of day, day of week and so on.
- Consider seasonality and recovered revenue, comparing actual behavior against the same window in the previous week as the expected baseline. This is a more accurate method. In the following example, we see a significant drop in transaction volume for about 10 minutes. Let's assume that revenue dropped by $110,000 and that, once the service was restored, users retried and completed their transactions, resulting in an increase of $80,000. We can then calculate the real impact as recovered revenue minus lost revenue: $80,000 - $110,000 = -$30,000 for those 10 minutes of downtime.

Contractual Penalties

Some organizations face financial penalties in the event of downtime. If your partners rely on your service being available, there is probably an SLA in place to guarantee a certain level of availability. If it is not met, the provider must compensate the partner.

Negative Brand Impact

Almost every online service, and certainly every mature service, has competition: Uber vs. Lyft, Airbnb vs. VRBO, hotels.com vs. booking.com, and so on. If one service is not available, it is very easy for customers to switch to the competition, because customers today expect the service to be available all the time.

In a previous post, we discussed the different elements of an incident life cycle. Major incidents are detected very easily, even with very basic monitoring in place. The real challenge is getting to the root cause of the issue and fixing it quickly. Even if you have the right set of signals across the entire technology stack, including infrastructure, application and business metrics, the data most likely resides in silos. Because of this, the person who triages the issue doesn't have complete visibility, so different teams must investigate the root cause simultaneously. Adopting machine learning-based anomaly detection enables the processing of all relevant metrics in a single system.
In this setup, if an anomaly is detected in one of the metrics, it is easier to correlate it with all the other metrics and uncover the root cause much faster. In fact, a good anomaly detection system not only detects the issue faster and more accurately than traditional threshold-based alerts, it also correlates across all relevant metrics and provides visibility into all other related anomalies. Consider an example of a drop in volume of a specific product in a specific country. In this case, the system sends an alert that an anomaly was detected in conversion rates in that country and provides visibility into signals that may have caused the issue, such as:

- Events that happened at the same time, such as a code push
- Another anomaly that occurred at the same time in the database metrics
- Network metrics that might indicate a DDoS attack

The idea is that along with an anomaly alert, we also receive other correlated events and anomalies that help us get to the root cause much faster, shortening the time it takes to triage the issue and reducing the impact on the business.
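As a rough illustration of the arithmetic behind the availability table and the Lost Revenue example above, here is a minimal sketch; the function names and the choice of last week's matching window as the expected baseline are assumptions made for the example, not a prescribed formula.

```python
# Minimal sketch of the availability-budget and revenue-impact arithmetic
# described above. Numbers and function names are illustrative only.

def allowed_downtime_seconds(sla_percent: float, period_seconds: float) -> float:
    """Downtime budget implied by an availability SLA over a given period."""
    return (1 - sla_percent / 100) * period_seconds

YEAR = 365.25 * 24 * 3600
print(f"99.9% yearly budget: {allowed_downtime_seconds(99.9, YEAR):,.0f} s")
# ~31,558 s, i.e. roughly the 8h 45m shown in the yearly row of the table above

def net_revenue_impact(lost: float, recovered: float) -> float:
    """Real impact of an outage = recovered revenue minus lost revenue, where
    'lost' is the drop vs. the same window last week (the expected baseline)
    and 'recovered' is revenue from users retrying once service is restored."""
    return recovered - lost

# The 10-minute outage from the post: $110,000 lost, $80,000 recovered.
print(f"net impact: {net_revenue_impact(110_000, 80_000):+,.0f} USD")  # -30,000 USD
```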
Videos & Podcasts 30 min read

Disrupt the Static Nature of BI with Predictive Anomaly Detection

Anodot's Uri Moaz discusses how predictive anomaly detection can identify revenue-impacting business incidents in minutes, not days or weeks.