Real-time anomaly detection is critical for eCommerce companies to detect costly issues in time series data as they happen, long before those issues cause revenue loss, negative social media shares, embarrassing news articles, or calls from upset customers. Anodot Sr. Director of Customer Success Nir Kalish presents how eCommerce organizations can leverage the powerful anomaly detection capabilities of Anodot’s AI analytics to proactively address business incidents in real time, protect revenue, and keep customers happy.

 

Presented By Nir Kalish

Nir Kalish is the Sr. Director of Customer Success at Anodot. Nir is passionate about the two complementary worlds of business and technology: strategizing to resolve data mysteries while upholding exceptional user experiences.

Video Transcription

Nir Kalish: What I’m going to talk about today is how e-commerce and retail companies, and everybody involved in selling something, can use anomaly detection based on machine learning to identify issues much, much earlier, and to spot opportunities in a way they can leverage.

Let’s talk a little bit about the problem. The problem is that e-commerce and retail companies are swamped with data. You track data from your website using web analytics tools. You track data from your e-commerce platform: number of transactions, number of sales, how much revenue I got today for different product lines, in different countries, and so on.

You track the weather to understand whether it impacts your sales. You track your marketing campaigns and what people say on Facebook, Twitter, Instagram, and so on. And it’s almost impossible today to track all this data all the time, in real time, unless you have a huge budget to hire many, many people to track it. The idea is that we want a machine to do the job for us.

It’s not going to replace us, which is very important. It’s going to do the job for me so that my day-to-day work becomes much, much easier. Now, let’s take an example: I am responsible for the sales of product line X in my company. On Friday evening, running the reports on what happened in the last week, I see a drop in revenue in one of my product lines.

The drop is 15%. Nothing special happened this week: no elections, no crisis, no war, and I don’t understand what is going on. It means that Monday morning, if not over the weekend, I’m going to activate… Wow, it’s going to be a very big challenge, I think. Any solution, by any chance?

Okay, in week two, basically, we activate everybody: the DevOps team to check whether there are any errors in the system, the UI engineers to look for bugs, and so on, and we try to find out whether there are price glitches. After a week, give or take, one of the employees wants to use his 25% discount to buy something for himself.

He went to the site, chose the latest product, and in the transaction flow he couldn’t complete the purchase. He tried once, twice, a third time, gave up, and opened a ticket. We started to analyze it and found that there were 30 or 40 different products that could not be purchased. Fifty percent of them were from the latest summer sale: a mysterious bug that didn’t impact all the products, only a few of them.

But those few, including the hot summer sale, were impacted, and this is why I saw the 15% revenue drop. Now, this is a case that has happened in almost every e-commerce company I have worked with, in the past and to this day. And when you try to understand how it happened and why we couldn’t discover it, the analysis shows that we couldn’t, because we have gazillions of product lines and gazillions of devices.

And if we are a big e-commerce company, we sell in 52 different countries, including the 50 states in the United States. How can we actually track all this data all the time? If I put static thresholds in place, I will need to manage them all the time, and it doesn’t make sense anyway: I cannot put static thresholds on 5,000 product lines. If I put people on it, they will stare at dashboards all day long and, for sure, will miss a lot of issues.

And in e-commerce, for an incident where we are losing sales, it depends, of course, on whether we are in the holiday season or not, but we usually talk about between $15,000 and $200,000 lost for every hour, depending on your size and on whether it’s a seasonal holiday period. We want a way to discover it much, much earlier and react quickly, because every hour costs us money.
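To make the cost of delay concrete, here is a back-of-the-envelope comparison using the $15,000 to $200,000 per hour range quoted above. It is a minimal sketch that assumes a constant hourly loss, and the two-week discovery time is an assumption loosely based on the product-line story earlier in the talk; real losses vary by hour and by how much of the site is affected. The function name is hypothetical.

```python
def lost_revenue(hourly_loss: float, hours_undetected: float) -> float:
    """Revenue lost while an incident goes unnoticed, assuming a constant hourly loss."""
    return hourly_loss * hours_undetected


HOURS_IN_TWO_WEEKS = 14 * 24  # assumed manual discovery time, loosely based on the story above

for hourly in (15_000, 200_000):  # low and high end of the range quoted in the talk
    fast = lost_revenue(hourly, 1)                   # alerted within about an hour
    slow = lost_revenue(hourly, HOURS_IN_TWO_WEEKS)  # found by accident two weeks later
    print(f"${hourly:,}/hour: 1 hour -> ${fast:,.0f}; two weeks -> ${slow:,.0f}")
```

At the low end this prints $15,000 versus $5,040,000; at the high end, $200,000 versus $67,200,000. The linear model overstates a partial drop like the 15% in the story, but the ratio between the two columns (336 to 1) holds for any constant hourly rate.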

So, once again, every delay in business insights and in spotting new opportunities matters, especially in a world where Amazon, eBay, and other competitors do all kinds of things that might impact us. We want to identify losses of millions of dollars, or any impact on my brand, quickly, and I want to identify opportunities as soon as possible, because I want to react to them.

Oh, I see that in a small country out there in the Middle East, suddenly more people are buying more stuff from my website. Maybe it’s time to think about building a physical store there. But how can I track it? That country only generates $2 million in revenue every month, out of the $5 billion that the other countries generate for me.

So things can get lost in the average. I cannot track those small amounts, those small product lines that might have issues. And, of course, there is a nice dashboard that analysts need to review to try to find what is going on, and they cannot do it. Not because they are not good, but because a person cannot stare at charts all day long, seven days a week, and understand what is going on all the time.
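The "lost in the average" effect is simple arithmetic. Using the talk's figures ($2 million a month from the small country, $5 billion from everywhere else), a dramatic change in the small market barely moves the total. The 50% drop below is a hypothetical scenario; the same arithmetic applies to a sudden increase, like the opportunity described above.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


big_markets = 5_000_000_000       # monthly revenue from all other countries
small_market = 2_000_000          # monthly revenue from the small country
small_after = small_market * 0.5  # hypothetical: the small market drops by half

total_before = big_markets + small_market
total_after = big_markets + small_after

print(f"small market: {pct_change(small_market, small_after):.1f}%")  # -50.0%
print(f"aggregate:    {pct_change(total_before, total_after):.3f}%")  # -0.020%
```

A 50% collapse in one market shows up as a 0.02% dip in the company-wide number, well inside normal day-to-day noise. That is why the talk argues for monitoring each segment as its own metric rather than relying on totals.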

So we want to use real-time machine learning. And real time is important: I want to identify things as they happen. I don’t want to ask what happened last month after I’ve already lost the revenue. I want something that tracks the data for me all the time, learns the data and its normal behavior, establishes a baseline, identifies any abnormal behavior, a.k.a. anomalies, and alerts me in real time.
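The loop described here (learn normal behavior, establish a baseline, alert on deviations as points arrive) can be sketched in a few lines. This is a deliberately simple stand-in, not Anodot’s algorithm: it assumes a non-seasonal metric and uses a sliding-window mean plus or minus k standard deviations. The class name, window size, and k are hypothetical choices for illustration.

```python
from collections import deque
from statistics import mean, stdev


class RollingBaseline:
    """Learns a moving baseline and flags points that fall far outside it.

    Hypothetical sketch: sliding-window mean +/- k standard deviations.
    It ignores seasonality and is not Anodot's algorithm.
    """

    def __init__(self, window: int = 30, k: float = 3.0, min_points: int = 10):
        self.history = deque(maxlen=window)  # recent "normal" values only
        self.k = k
        self.min_points = min_points  # don't alert until the baseline has some data

    def update(self, value: float) -> bool:
        """Score one new point as it arrives; return True if it is anomalous."""
        is_anomaly = False
        if len(self.history) >= self.min_points:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0:
                is_anomaly = abs(value - mu) > self.k * sigma
            else:
                is_anomaly = value != mu  # perfectly flat history: any change stands out
        if not is_anomaly:
            # Only normal points update the baseline, so an outage is not learned as "normal".
            self.history.append(value)
        return is_anomaly


detector = RollingBaseline(window=30, k=3.0)
orders_per_minute = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99, 100, 40, 101]
flags = [detector.update(x) for x in orders_per_minute]
print([i for i, flagged in enumerate(flags) if flagged])  # [11]
```

Only minute 11 (the drop to 40 orders) is flagged. One trade-off of never learning from anomalous points: after a permanent level shift, this sketch would keep alerting until its history is reset, which is one reason production systems use more adaptive models.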

And my machine learning needs to support different data sources. I don’t want to build something that is focused only on web analytics, because a year later I will need to build something for my e-commerce platform data, for my system data, and for the weather data. I want to build a machine learning system that is agnostic: I can give it any data, any metric, and it can establish a baseline, find anomalies, and alert me in real time.

And this is just an example of the data and how it looks. The bold line is my data over time. The shaded area is the baseline, the automatically calculated baseline that the machine learning needs to learn. And the orange line is the anomaly: something that doesn’t match the expected baseline and that I want to start investigating. Now, most of you are probably familiar with this kind of graph, because it shows seasonality.
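For a seasonal metric, one common way to compute a shaded band like this is to compare each new point only with the same phase (same weekday, or same hour of the week) in earlier cycles. The sketch below uses the median and the median absolute deviation (MAD), a generic robust-statistics heuristic; it is an illustrative assumption, not Anodot’s method, and the function name, data, and default k are hypothetical.

```python
from statistics import median


def seasonal_band(history, period, k=3.0):
    """Expected (low, high) range for the point that follows `history`.

    history: past values, oldest first, at a fixed sampling interval
    period:  samples per season, e.g. 7 for daily data with weekly seasonality,
             or 24 * 7 for hourly data with weekly seasonality
    k:       band width in robust standard deviations (illustrative default)
    """
    phase = len(history) % period        # position of the next point within the season
    same_phase = history[phase::period]  # e.g. every earlier Monday
    if len(same_phase) < 2:
        raise ValueError("need at least two past values for this phase")
    center = median(same_phase)
    mad = median(abs(v - center) for v in same_phase)
    spread = 1.4826 * mad                # scales MAD to approximate a standard deviation
    return center - k * spread, center + k * spread


# Three weeks of hypothetical daily revenue (Mon..Sun, in $K); weekends run higher.
weeks = [
    [100, 98, 102, 101, 99, 150, 160],
    [102, 100, 99, 103, 98, 155, 158],
    [99, 101, 100, 100, 102, 148, 162],
]
history = [v for week in weeks for v in week]

low, high = seasonal_band(history, period=7)               # band for the coming Monday
print(low <= 150 <= high)                                  # False: 150 is abnormal for a Monday

sat_low, sat_high = seasonal_band(history[:-2], period=7)  # history through Friday of week 3
print(sat_low <= 150 <= sat_high)                          # True: 150 is a normal Saturday
```

The same value, 150, is an anomaly on a Monday and perfectly normal on a Saturday, which is exactly what a single static threshold cannot express. A metric whose same-phase values are identical would get a zero-width band here; a real system would add a minimum spread.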

Okay, we see the weekly seasonality and we see the weekend seasonality. And in the end, my system needs to support that seasonality: it needs to track intraday seasonality and weekly seasonality. But some of the data is not seasonal. It can be sparse. It can be multimodal. It can be irregularly sampled. At Anodot, when we analyzed the data of all our customers, including e-commerce and retail, only 38% of the data was smooth, meaning it has seasonality.

The other sixty-two percent of the data falls into all the other types of signals. This means that if we build a system that focuses only on seasonal data, because most people in e-commerce think that since sales are seasonal the rest of the data must be seasonal too, we will find ourselves with a big issue: we will take an algorithm built for seasonal data and force it onto signals that are not seasonal.

And then either we are going to miss incidents, or there are going to be a lot of incidents that are not really incidents. So, in the end, building machine learning that does all these things is a fairly complex task. Why? Because I need it to be agnostic to the data. All the infrastructure, the database, everything behind the scenes needs to be agnostic to the data.
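Before choosing a baseline model, a system has to recognize what kind of signal it is looking at, because a seasonal model applied to a sparse signal produces exactly the missed incidents and false alarms described above. A minimal routing heuristic might look like this. The categories loosely follow the talk, but the thresholds and the function are hypothetical, "smooth" here only means "neither sparse nor discrete" (it does not verify seasonality), and the sketch makes no attempt to detect multimodal or irregularly sampled signals.

```python
def classify_signal(values, sparse_threshold=0.5, max_discrete_levels=5):
    """Very rough signal-type heuristic; thresholds are illustrative assumptions.

    - "sparse":   most samples are missing (None) or zero, e.g. sales of a niche SKU
    - "discrete": only a handful of distinct levels, e.g. a server up/down flag
    - "smooth":   everything else; a candidate for a seasonal baseline
    """
    if not values:
        raise ValueError("empty series")
    empty = sum(1 for v in values if v is None or v == 0)
    if empty / len(values) > sparse_threshold:
        return "sparse"
    present = [v for v in values if v is not None]
    if len(set(present)) <= max_discrete_levels:
        return "discrete"
    return "smooth"


print(classify_signal([0, 0, 3, 0, None, 0, 1, 0]))               # sparse
print(classify_signal([1, 1, 0, 1, 1, 1, 0, 1]))                  # discrete
print(classify_signal([120, 131, 118, 140, 155, 149, 122, 117]))  # smooth
```

The point of routing first is that each class can then get a model suited to it (for example, counting gaps between events for sparse data instead of fitting a seasonal band).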

It can be weather, it can be sales, it can come from my machines. Hey, it can come from a partner I work with. It needs to support rapid business decisions, meaning it needs to be real time. Some of my data arrives every minute, my revenue data every day, and the weather data every hour. It needs to support different time scales.

And I want it to analyze all the metrics all the time, without me having to tell it which ones to analyze. I don’t want to build a system that I need to ask questions. I want a system that analyzes everything. If I need to ask the question “find the anomalies in revenue,” that’s very, very easy. But what about all the metrics you don’t even know to ask about? Maybe it’s not revenue.

Maybe there is abnormal behavior in the relationship between how many users enter the store and how many users go to your website. Some bizarre issue in data you track but never thought to ask about: are there any anomalies in that behavior? And once the system finds something and alerts me, we want it to be easy for me to analyze it and understand what the root cause might be.

If I build a system that alerts me that I’m losing revenue, but I still need to activate 50 things in my company and only get the answer to what went wrong two weeks later, I miss the entire point of being able to react fast. And, once again, this applies to retail, but also to any other industry out there. In retail, we want to analyze customer behavior via web analytics and track competitor pricing. Right?

There are all kinds of tools, Channel IQ and so on, that check what my competitors are doing with their pricing and other activities. I want to track my marketing data to understand whether it influences my sales. The weather feeds are very, very important to us. Everything related to my physical stores, and so on. And, of course, the system data is also very important.

I want to track all of this in one system. And the power of machine learning, which everybody thinks is a black box and some kind of magic, rests on a very simple idea. Without machine learning, no matter how many people I have or how many BI tools I have (and I have great BI tools), none of them can tell me what the constellations in the sky are. They can show me all the stars. I have to find those constellations myself.

Machine learning, because it doesn’t need human beings to react, to do something, or to configure it, can find the constellations all the time. Now, as an e-commerce or retail company, we understand: okay, we have all the BI tools out there, we have great analytics, we have great data scientists, and we definitely understand that machine learning is something we need. Okay?

Now comes the question: do I buy it, or do I build it? As human beings, and as data scientists and managers, we want to build, because then it’s ours, we can manage it easily, and so on. But in the world of machine learning, you need to do a deeper analysis. Otherwise, you might find yourself spending two, three, or five years and between $1 million and $10 million, and at the end of that time you have nothing, because either your machine learning doesn’t work as expected, or the amount of continuous work it requires is killing the company and somebody decides to shut the project down.

And in the end, everybody loses. There are companies that need to build it and shouldn’t buy. Now I’m going to give you the things to think about when you ask these questions. First, we need to talk about budget. When we build machine learning, we need data scientists, and not just one: usually between two and four. And data scientists are very, very expensive.

We need developers: front end to build the UI, back end to build everything related to the database, and also developers who know how to build machine learning, which is a different skill set. Finding those developers is a hard task, and they also cost a lot. We need a product manager to collect the data and all the requirements from the different departments that are going to work with it.

We will need QA. And it’s not just, okay, let’s take 5,000 people in India, pay them $14 an hour, and that’s it. You need QA engineers who understand machine learning and time series algorithms, because they need to test and make sure the algorithm works as expected. And all of this is without even including the infrastructure: I need a cloud, or I need to buy 500 servers, to be able to process that amount of data in real time.

The second question I need to ask is: what is the volume of data? Is it terabytes, megabytes? How often are my metrics reported: every minute, every hour, once a day? All these questions affect how much time it’s going to take us to build such a system. Then there’s our expansion plan. Do we expect to grow by 25% year over year? Because then I might need to build a system that jumps from terabytes to petabytes. And so on.

Production and maintenance. Finishing building a system doesn’t mean we are done. Now we need to maintain it. There are bugs and new features. So what do I do with the whole team? Do I keep them and pay their salaries just for the times someone asks for an improvement or a feature? These are tough questions. Or do I let them go, and then discover that I have a big issue and nobody left to fix it?

Another thing to remember when you want to build something like this is deployment time. Great, I have a project, I have the budget, I understand all the requirements. It’s going to take me four years. Now what? Four years without anomaly detection? Meanwhile Amazon, my biggest competitor, already has and uses its own anomaly detection. They are checking my prices.

They react much, much faster than I do. What will happen during those four years? Maybe I need to build it, but during those four years I will need to hire a company to do those calculations for me in the meantime. And the last thing is ease of use. We want a system that serves not only the analysts and the data scientists. We want the product line manager to be able to set up alerts and understand what impacts sales on his own product line.

We want the CEO, the CFO, and so on to be able to understand what is going on. So we want a system that is very, very easy to use, and this is part of the big challenge: how do you build a system based on machine learning that does anomaly detection, and that every person in this room can read, use, and analyze without spending five years in college to understand what anomaly detection even means? Okay?

When we did the analysis, based on discussions I had with companies that are very, very rich, that have a lot of money and for which budget is usually not an issue, including some that tried to build it or did the analysis, we came to the following numbers. For building, the cost is between $400K and $3 million, depending on size. We are usually talking about a minimum of two data scientists, four developers, one of whom is front end, and at least three QA engineers. Okay?

And the average lifespan of such a project is one to five years. One year is a very basic, very simple system: if you really succeed in hiring the best of the best, it might take them one year. But usually it’s closer to three to five years. For buying, from companies like Anodot, the price depends on the number of metrics; depending on your size, it can range from several hundred thousand dollars, or even less, up to a million dollars. Okay?

But even at a million dollars per year, it might be less costly than managing such a service yourself over the long run. And time to value is usually 30 days. I can tell you that we have e-commerce, IoT, and AdTech companies that, a week after they started sending us data, had an incident that their existing tools couldn’t find. So it’s very easy to show the value, and it’s very easy for you to understand whether company A is better than company B.

Now, just an example of use cases. In e-commerce, I could talk for the next five hours about anomaly detection use cases. But, for example, my system can find a drop in sessions for a specific product line in different countries on different devices.

Okay, so let’s talk about that example. One of the things we want to do is exactly this correlation: I have a drop in sessions on a specific product line, in different countries, on different devices.

I want to get an email that tells me, “You asked to be alerted if there is any abnormal behavior in sessions. Here you go. There is abnormal behavior in 52 different metrics related to sessions.” I go into the incident. I see all 52, which in our case is 52 countries, or maybe 20 countries on different devices: tablets, iPhones, and desktops as well. And it also correlates them with the number of HTTP errors on my servers in those countries.

So now I understand the incident itself. I’m not only losing sessions in a single state like New York; I’m losing sessions in all the biggest countries where I do sales. And it’s correlated with HTTP errors. Now the next step, instead of activating the entire company, is to pick up the phone, call DevOps, and tell them, “Guys, did you deploy a new version? Is something wrong with our service? Because I see a huge spike in HTTP errors and a huge drop in my sessions across the board. What is going on?”
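Turning 52 separate alerts into one incident means grouping anomalies that belong together. The simplest version of that idea is grouping by time overlap: anomalies whose time ranges overlap (or nearly touch) become one incident. The sketch below is that simple stand-in, not Anodot’s correlation engine; the class names, metric naming scheme, and `gap` parameter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Anomaly:
    metric: str  # hypothetical naming scheme, e.g. "sessions.US.desktop"
    start: int   # minutes since some reference time
    end: int


@dataclass
class Incident:
    start: int
    end: int
    metrics: list = field(default_factory=list)


def group_into_incidents(anomalies, gap=0):
    """Merge anomalies whose time ranges overlap (or are within `gap` minutes)."""
    incidents = []
    for a in sorted(anomalies, key=lambda a: a.start):
        if incidents and a.start <= incidents[-1].end + gap:
            current = incidents[-1]  # overlaps the open incident: extend it
            current.end = max(current.end, a.end)
            current.metrics.append(a.metric)
        else:
            incidents.append(Incident(a.start, a.end, [a.metric]))
    return incidents


anomalies = [
    Anomaly("sessions.US.desktop", 60, 120),
    Anomaly("sessions.DE.tablet", 65, 118),
    Anomaly("http_errors.US", 55, 125),
    Anomaly("sessions.FR.mobile", 400, 430),
]
for incident in group_into_incidents(anomalies, gap=5):
    print(incident.start, incident.end, sorted(incident.metrics))
```

The session drops and the HTTP error spike land in one incident (minutes 55 to 125), while the unrelated later drop stays separate. Time overlap alone can merge coincidences, so a real system would also weigh shape similarity and shared dimensions such as country or device.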

So I got the alert after one hour, and five minutes later I can already activate DevOps. That’s one very simple example. Another example: for three hours I see a drop in the number of sales on mobile, which is correlated with the mobile load time. And, once again, think about it not only in one country, one state, or one product line.

Sometimes it’s only one; sometimes it’s many. You want to see it because you want to understand what is going on. Now I see a drop in mobile sales and a spike in mobile average load time, and I can understand. Maybe someone changed the CDN configuration. Maybe we released a new mobile version that, instead of sending small images, now sends megabytes of images, and that’s making the load very, very slow.
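Once two anomalies coincide, a correlation coefficient gives a quick sense of how closely the two series move together. The sketch below computes the Pearson correlation over a short, invented hourly window of mobile orders and average load time (the numbers are made up for illustration). Python 3.10+ also ships `statistics.correlation`; the explicit version is shown to make the formula visible.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


# Hypothetical hourly data: mobile orders fall as average page load time climbs.
mobile_orders = [520, 515, 530, 510, 300, 210, 190]
load_time_s = [1.8, 1.9, 1.7, 1.9, 6.5, 9.8, 11.2]
r = pearson(mobile_orders, load_time_s)
print(round(r, 2))
```

A value close to -1 says the two metrics moved in lockstep in opposite directions. That tells you where to look first (CDN settings, image sizes), but correlation alone does not prove which one caused the other.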

And those of you from e-commerce know that people are not going to wait 30 seconds until they can browse your mobile app. If it’s not there after two seconds, bye-bye, I’m going to Amazon. Now, Anodot has a big retail customer with many, many brands. When we started to work with them, they came to us with one problem: “We have many brands. We track a lot of data from different data sources.

We are very, very lean. We cannot track everything. Can your solution work for us? We tried to build something. It didn’t work. We don’t have the budget.” Yada, yada, yada. We started a POC with them, and we showed them how easy it is: instead of staring at charts and building all kinds of complex Excel sheets every day, just set up the alerts and send all the data you have from across the board.

Now the marketing manager can see what happens on Facebook, see the correlation with the increase in sales, and go to the CEO and say, “You see? The campaign works. We succeeded in increasing sales thanks to that campaign.” Why? Because the system shows the correlations. Another interesting thing happened when they had a drop in sales.

Some of the brands said, “Oh, it’s not us. It’s because of the weather.” Before Anodot, nobody could actually check that. But now, because they send us the weather for all the states, the CEO went to the system and said, “But, guys, I see that the incidents of dropping sales have no correlation with the weather. There aren’t any anomalies in the weather across all the states of the United States. So that’s not the reason. Can you please give me another explanation? Go back to your meeting rooms and analyze why we lost sales in the last three weeks, since we got the alert.”

So they had to go back and find another explanation. Before that, everybody could say, “Oh, you know, we are losing sales; there was rough weather in New York.” Okay. Nobody could really tell whether that was indeed the root cause or something else. Maybe my competitor not only reduced the prices of similar product lines but also increased his bids on the ads he runs on the Google search engine.

If you track it, you will be able to see the correlations. And, once again, machine learning is needed in all industries. But even if we talk just about the current industries, six of them are related to e-commerce. Take AdTech: e-commerce companies invest a lot of money in ads, and you want to track all that information. Or payments, where this is exactly what you need: at the end of the transaction, someone needs to pay.

Whether or not you use third-party services, you want to know if there is an increase in fraud. Maybe the partner, the payment gateway, has some kind of issue and is declining more of your transactions, and this is what’s causing you to lose sales. Security and fraud, we already talked about. Shipping and delivery: are there any anomalies affecting your delivery times that you are not even aware of? Okay?
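A gateway that starts declining more transactions shows up more clearly in a derived ratio (declined divided by attempted) than in raw counts, because raw counts also rise and fall with traffic. The sketch below computes an hourly decline rate and flags hours where it jumps well above its recent median. The data, window size, and factor are hypothetical, and this is a generic illustration rather than how any particular product does it.

```python
from statistics import median


def decline_rates(attempted, declined):
    """Per-interval decline rate; intervals with no attempts yield None."""
    return [d / a if a else None for a, d in zip(attempted, declined)]


def flag_rate_jumps(rates, baseline_window=6, factor=2.0):
    """Flag intervals whose rate exceeds `factor` x the median of the previous
    `baseline_window` intervals (ignoring None). Window and factor are
    illustrative assumptions."""
    flagged = []
    for i, rate in enumerate(rates):
        past = [r for r in rates[max(0, i - baseline_window):i] if r is not None]
        if rate is not None and len(past) >= 3 and rate > factor * median(past):
            flagged.append(i)
    return flagged


# Hypothetical hourly payment data: traffic is flat, but declines triple in hour 6.
attempted = [1000, 980, 1010, 995, 1005, 990, 1000, 1002]
declined = [30, 29, 31, 30, 28, 31, 95, 102]
print(flag_rate_jumps(decline_rates(attempted, declined)))  # [6, 7]
```

Because attempts stay flat, total transactions look normal, yet the decline rate roughly triples from about 3% to about 10% and both affected hours are flagged. Using the median keeps one bad hour from dragging the baseline up enough to hide the next.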

If you have a system that can track all of this, you will be able to react much, much faster and make life hard for Amazon, which we all know is currently the biggest competitor of all the e-commerce people sitting here. And they use anomaly detection, by the way. But for them it’s easy: for infrastructure they already have AWS, so they are paying zero. Okay?

They have data scientists who build exactly this, and they are using it against you. So use machine learning. Whether it’s Anodot or anything else, use machine learning against them as well. And this is just a list of our customers, which you probably don’t really care about. But we have a booth there, and if you want to see more about how anomaly detection works and how it can help you, you are more than welcome.

Any questions, by any chance? Yes.

Woman: You talked about looking at what was going on with other data, things like weather. Does Anodot actually pull those APIs in as part of its offering, or is that something that the customer asks you to do and pays for as part of their data?

Nir Kalish: So, customers, especially in e-commerce, usually each work with their own weather data tools. But we build collectors. If a customer gives us access to their weather account, we can build a collector that pulls the data every hour or every day, transforms it into the Anodot format, and sends it to their account. Any other questions?
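A collector like the one described boils down to three steps run on a schedule: fetch, transform, send. Only the transform step is sketched here, because fetching and sending depend on the weather provider’s API and on Anodot’s ingestion API, and neither is described in the talk. The input record fields, the output point shape, the metric names, and the function name are all hypothetical placeholders, not the actual Anodot format.

```python
from datetime import datetime, timezone


def to_metric_points(reading):
    """Transform one weather reading into generic time-series points.

    Both the input shape and the output shape are hypothetical placeholders;
    a real collector would follow the provider's response format and the
    target platform's documented ingestion format.
    """
    observed = datetime.fromisoformat(reading["observed_at"])
    if observed.tzinfo is None:
        observed = observed.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    timestamp = int(observed.timestamp())

    points = []
    for measure in ("temperature_c", "precipitation_mm"):
        value = reading.get(measure)
        if value is None:
            continue  # skip measures the provider did not report for this interval
        points.append({
            "name": f"weather.{measure}",
            "dimensions": {"state": reading["state"]},  # lets anomalies be split per state
            "timestamp": timestamp,
            "value": float(value),
        })
    return points


reading = {"state": "NY", "observed_at": "2019-06-01T14:00:00",
           "temperature_c": 24.5, "precipitation_mm": 0.0}
for point in to_metric_points(reading):
    print(point)
```

Keeping the state as a dimension is what later makes a check like “are there any weather anomalies in any state?” possible. A scheduler such as cron would run fetch, `to_metric_points`, and send every hour or every day.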

Either I did a really amazing job, or I did a lousy job.

Woman: I think you did an amazing job.

Nir Kalish: Thank you very much.

Written by Anodot

Anodot leads in Autonomous Business Monitoring, offering real-time incident detection and innovative cloud cost management solutions with a primary focus on partnerships and MSP collaboration. Our machine learning platform not only identifies business incidents promptly but also optimizes cloud resources, reducing waste. By reducing alert noise by up to 95 percent and slashing time to detection by as much as 80 percent, Anodot has helped customers recover millions in time and revenue.
