For many CSPs, increasingly complex networks and immature technological solutions mean that it typically takes a long time to detect and resolve incidents that impact the customer experience, the brand’s reputation, and the bottom line. With RAN, mobile and IP core, transport, applications, and dozens of other integrated components, the network is one of the most complex areas to monitor. And since in most cases one KPI alone isn’t enough to surface the problem, the detection of outages and service degradation depends on the correlation of multiple anomalies.
The ability to autonomously correlate thousands of KPIs, related events and glitches across multiple domains, layers, technologies and vendors provides the full context of what is happening. CSPs using solutions that offer autonomous correlations across the network enjoy these benefits:
- Shorten time to detection and resolution. Discovery of correlations between different monitored elements in the network is essential for autonomous root cause analysis for the fastest possible remediation, and is therefore a key step on the roadmap to zero touch. Only by monitoring and correlating between different KPIs across the entire telco stack can CSPs significantly reduce time to detection (TTD) and time to remediation (TTR).
- Gauge real customer impact. Autonomous correlations are also the only way to effectively gauge real customer impact. Sleeping cells are a great example, as no single KPI indicates that the cell is not working: multiple metrics, including a drop in downlink throughput, spikes in ERAB drops, and increases in S1 failures, need to be correlated in order to identify a sleeping cell. When the carrier doesn’t actively correlate these metrics, the real monitoring is being done by the customers who call and complain. Using AI-based anomaly detection with advanced correlation capabilities is the only way to raise such an alarm. The correlation takes anomalies in different KPIs (such as a drop in downlink throughput) and compares them with the rest of the monitored data to analyze the actual root cause.
- Reduce alert fatigue. Correlation analysis reduces alert fatigue by filtering irrelevant anomalies (based on the correlation) and grouping correlated anomalies into a single alert. This dramatically reduces one of the pains many CSPs face today – managing hundreds, even thousands of separate alerts from multiple systems, many of which stem from the same incident.
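As a rough illustration of the sleeping-cell and alert-grouping points above, here is a minimal Python sketch. It is not Anodot's engine or any vendor's actual method: the KPI names, the `Anomaly` record, the five-minute window, and both function names are assumptions made up for this example. It groups anomalies that occur on the same cell close together in time, then raises a single alert only when all three sleeping-cell indicators appear in the same group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    cell_id: str
    kpi: str        # hypothetical KPI label, not a standard counter name
    timestamp: int  # epoch seconds


# Assumed labels for the three indicators named in the text.
SLEEPING_CELL_KPIS = {
    "dl_throughput_drop",
    "erab_drop_spike",
    "s1_failure_increase",
}


def group_anomalies(anomalies, window_s=300):
    """Group anomalies per cell when they fall within window_s seconds
    of the first anomaly in the group."""
    groups = []
    current = {}  # cell_id -> the group currently being filled
    for a in sorted(anomalies, key=lambda a: (a.cell_id, a.timestamp)):
        g = current.get(a.cell_id)
        if g is not None and a.timestamp - g[0].timestamp <= window_s:
            g.append(a)
        else:
            g = [a]
            current[a.cell_id] = g
            groups.append(g)
    return groups


def sleeping_cell_alerts(anomalies, window_s=300):
    """Emit one alert per group that contains all sleeping-cell KPIs."""
    alerts = []
    for g in group_anomalies(anomalies, window_s):
        kpis = {a.kpi for a in g}
        if SLEEPING_CELL_KPIS <= kpis:
            alerts.append(
                {"cell_id": g[0].cell_id, "start": g[0].timestamp, "kpis": sorted(kpis)}
            )
    return alerts
```

In this sketch, a cell that shows only a throughput drop produces no alert, while a cell that shows all three anomalies within the window produces exactly one alert instead of three. A production system would learn which KPIs belong together rather than rely on a hand-written set.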
Current CSP correlation challenges
Despite significant advantages that translate directly into improved ROI, AI-based correlations are still far down the roadmap for many CSPs. The main hurdles CSPs face are the complexity of the network, limited resources and internal knowledge, and an overwhelming number of potential rules, leaving the majority of companies stuck at the POC stage.
Moreover, most existing correlation solutions are rule-based and rely on static threshold models that are best suited for data-at-rest, and therefore cannot provide the benefits inherent in the latest technologies. With older solutions, correlation rules are limited and mostly applied to fault alarms, which are reactive, instead of to granular time-series data. Minimizing TTR, however, requires anomaly detection and correlation analysis carried out on live data streams, which in turn calls for advanced correlation techniques.
In addition, cross-domain correlation is currently unachievable with most of the monitoring solutions CSPs use, since these solutions work in silos: every network layer and every network type is monitored separately, using rule-based logic or static thresholds. The siloed approach prevents effective correlation between related issues. As a result, the NOC team’s only way to understand the actual service impact and experience is by collecting customer complaints and checking Downdetector, which typically takes anywhere from a few hours to a few days, resulting in significant revenue loss and damage to brand reputation.
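To make the contrast between a static threshold and detection on a live data stream more concrete, the following Python sketch compares the two on a single KPI. It is an illustrative assumption, not the technique used by any particular product: a rolling z-score stands in for "anomaly detection on streaming data", and the class name, window size, z-score cutoff, and static floor value are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def static_threshold(value, floor=10.0):
    """Rule-based check: alarm only when the KPI falls below a fixed floor."""
    return value < floor


class RollingZScoreDetector:
    """Flags a sample as anomalous when it deviates strongly from the
    recent baseline learned from the stream itself."""

    def __init__(self, window=30, z_threshold=3.0, min_samples=5):
        self.history = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def update(self, value):
        is_anomaly = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                is_anomaly = True
        # Keep anomalous samples out of the baseline so that a sustained
        # incident does not quickly become the new "normal".
        if not is_anomaly:
            self.history.append(value)
        return is_anomaly
```

With a downlink throughput that normally hovers around 50 and then falls to 30, the static floor of 10 stays silent while the rolling detector flags the drop, because it measures deviation from recent behavior rather than distance from a fixed number. Real deployments would also need to handle seasonality and trend, which this sketch ignores.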
Provide the service customers expect
Providing the service that customers expect requires CSPs to create a holistic view across multi-vendor, multi-domain environments for real-time detection and correlation of service-impacting incidents. Anodot serves as the brain on top of the OSS, giving CSPs the cross-domain and service experience view they require. Anodot’s patented correlation engine correlates anomalies across the network for holistic root cause analysis and the fastest time to resolution, leading to enhanced network monitoring capabilities and improved network availability and customer experience. These capabilities are critical for reducing the long time it typically takes to detect and resolve customer-impacting incidents, for taming continuous alert storms, and, ultimately, for limiting revenue loss and damage to brand reputation.