Establishing the Right Conditions to Trigger an Anomaly
Determining the correct conditions to identify an anomaly is vital if you want to reduce the number of false positives you get. You’ll want to set rules based on the three attributes above, as well as an understanding of how other factors influence whether unusual values for a particular metric need attention or not.
The first step is to decide which metrics to use to detect anomalies. Which will indicate that there is an issue in your system? And will you use the raw value, or do you need to apply some normalization before it becomes valuable?
For example, the number of failed payments might be an indicator that there is an issue with your payment gateway. Or it could be due to your marketing team’s new campaign driving extra traffic to your website, since more payment attempts naturally produce more failures. In this case, the failure rate is more informative than the raw number of failures, but you need to calculate it from the total number of attempted payments.
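To make the difference concrete, here is a minimal Python sketch that normalizes failed payments by attempted payments. The function name and the figures are illustrative assumptions, not data from a real system.

```python
def failure_rate(failed: int, attempted: int) -> float:
    """Return failed payments as a fraction of attempted payments."""
    if attempted == 0:
        return 0.0  # no attempts, so there is no rate to report
    return failed / attempted


# Hypothetical figures: a quiet day and a day with a marketing campaign.
quiet_day = failure_rate(failed=20, attempted=1_000)
campaign_day = failure_rate(failed=60, attempted=3_000)

print(f"quiet day: {quiet_day:.1%}, campaign day: {campaign_day:.1%}")
```

The campaign day has three times as many failures, yet both days share the same failure rate. An alert based on the raw count would have fired for what is simply extra traffic.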
You’ll also need to decide how granular you want your alerts to be. Will you monitor overall payment success, for example, or will you break it down by payment type, location, device, or a combination of these? The more granular your conditions, the easier it will be to isolate issues. But you’ll also spend more time investigating every single anomaly.
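As a rough sketch of what granularity means in practice, the Python snippet below computes the failure rate for any combination of segment keys. The record fields (`payment_type`, `device`, `succeeded`) and the sample records are assumptions made for illustration.

```python
from collections import defaultdict


def failure_rate_by_segment(payments, keys):
    """Group payment records by the given keys and return each group's failure rate."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [failed, attempted]
    for payment in payments:
        segment = tuple(payment[k] for k in keys)
        totals[segment][1] += 1
        if not payment["succeeded"]:
            totals[segment][0] += 1
    return {seg: failed / attempted for seg, (failed, attempted) in totals.items()}


# Hypothetical payment records.
payments = [
    {"payment_type": "card", "device": "mobile", "succeeded": True},
    {"payment_type": "card", "device": "mobile", "succeeded": False},
    {"payment_type": "card", "device": "desktop", "succeeded": True},
    {"payment_type": "wallet", "device": "mobile", "succeeded": True},
]

overall = failure_rate_by_segment(payments, keys=())
by_type_and_device = failure_rate_by_segment(payments, keys=("payment_type", "device"))
```

With no keys, everything falls into a single overall segment. Adding keys pinpoints the failures to card payments on mobile, but it also multiplies the number of series you have to watch.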
Once your metrics are chosen, you need to decide for each one:
- What direction indicates an anomaly that is a concern – up or down?
- How long should the anomaly be present before it triggers an alert?
- How far should the metric deviate from the normal range to be considered an issue?
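The three questions above map naturally onto a small rule definition. The following Python sketch is one possible way to encode them. The class, field names, and sample values are hypothetical, and the expected value is assumed to be known in advance.

```python
from dataclasses import dataclass


@dataclass
class AnomalyRule:
    direction: str        # "up", "down", or "both"
    min_duration: int     # consecutive out-of-range points required
    max_deviation: float  # allowed distance from the expected value


def is_out_of_range(value, expected, rule):
    """Check a single point against the rule's direction and deviation."""
    diff = value - expected
    if rule.direction == "up":
        return diff > rule.max_deviation
    if rule.direction == "down":
        return -diff > rule.max_deviation
    return abs(diff) > rule.max_deviation


def should_flag(values, expected, rule):
    """Return True once min_duration consecutive points are out of range."""
    streak = 0
    for value in values:
        streak = streak + 1 if is_out_of_range(value, expected, rule) else 0
        if streak >= rule.min_duration:
            return True
    return False


# Hypothetical rule: failure rate more than 2 points above normal for 3 readings.
rule = AnomalyRule(direction="up", min_duration=3, max_deviation=0.02)
brief_spike = should_flag([0.02, 0.09, 0.02], expected=0.02, rule=rule)
sustained = should_flag([0.02, 0.05, 0.06, 0.03, 0.05, 0.05, 0.07], expected=0.02, rule=rule)
```

Here the brief spike is ignored because it lasts a single reading, while the sustained rise is flagged once three consecutive readings exceed the allowed deviation.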
What Warrants the Triggering of an Alert?
Once you have set your conditions to identify anomalies, you need to decide when an alert will be triggered.
In many ways, this relates more to how your business operates than to outliers in your data. At what point do anomalies require action?
The answer will vary for each metric you monitor. And it may even have two answers since, for many of your metrics, both a dip and a peak might be worthy of note.
A sudden drop in contact form submissions, for example, might suggest an issue with your website. A sudden increase might just indicate plenty of new leads. But it could relate to unusual activity from spambots. The drop in submissions will likely be a concern earlier than the increase is, but both could require action.
When an alert is sent will depend on how urgent the issue is, whether it has persisted for a long time, and how large the anomaly is. It might also depend on the interaction between different metrics. For example, you would expect a correlation between website sessions and purchases. If these increase in tandem, it might not be worth an alert.
But if one increases while the other drops, the behavior is unusual and you would want to know that it is happening since it could indicate an issue that needs your attention.
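A simple way to express this kind of cross-metric condition is to compare how each metric has moved against its own baseline. The sketch below is an illustration only: the function names, the 10% minimum move, and the sample figures are assumptions.

```python
def pct_change(current, baseline):
    """Relative change of a metric against its baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline


def is_diverging(sessions_change, purchases_change, min_move=0.10):
    """True when the two metrics move in opposite directions by at least min_move each."""
    opposite = sessions_change * purchases_change < 0
    significant = abs(sessions_change) >= min_move and abs(purchases_change) >= min_move
    return opposite and significant


# Hypothetical readings: sessions rise 30% in both cases.
sessions = pct_change(current=1_300, baseline=1_000)
in_tandem = is_diverging(sessions, pct_change(current=65, baseline=50))
split = is_diverging(sessions, pct_change(current=35, baseline=50))
```

When purchases rise alongside sessions, no alert is raised. When purchases fall while sessions climb, the check returns True and the pair is worth a closer look.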
The Role of Machine Learning in Anomaly Detection
As we’ve seen, there is a lot to consider in setting up an anomaly detection system that doesn’t send too many false positives.
Configuring such systems manually requires an expert with deep knowledge of your data and a good understanding of your business’s priorities and operation.
Even then, the problem with manually built anomaly detection systems is that they don’t adapt to changing circumstances. The person who built the system must go back in regularly to make changes, so it stays up to date and keeps performing correctly. If those changes aren’t made, your number of false positives will increase as your system fails to consider the new conditions.
In contrast, systems that use machine learning are constantly adapting to incorporate new information. They automatically update their anomaly detection criteria according to changing data. That new information can include feedback from your team, such as alerts being marked as false positives.
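The snippet below illustrates the idea of adaptation with a deliberately simple stand-in: a rolling mean and standard deviation rather than a trained machine learning model. The class name, window size, and threshold are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveBaseline:
    """Flags values far from the recent average, then learns from every value."""

    def __init__(self, window=30, z_threshold=3.0, min_history=5):
        self.history = deque(maxlen=window)  # only the most recent points are kept
        self.z_threshold = z_threshold
        self.min_history = min_history

    def check_and_update(self, value):
        anomalous = False
        if len(self.history) >= self.min_history:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.z_threshold
        self.history.append(value)  # the baseline shifts as new data arrives
        return anomalous


# Hypothetical metric that settles at a new, higher level.
old_level = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
new_level = [200, 201, 199, 202, 198, 200, 201, 199, 200, 202, 200]

baseline = AdaptiveBaseline(window=10)
flags = [baseline.check_and_update(v) for v in old_level + new_level]
```

The first reading at the new level is flagged, but once the window has filled with recent values, readings around 200 are treated as normal again. A fixed threshold set for the old level would keep alerting indefinitely. The trade-off is that a baseline like this will also get used to a genuine problem if it persists long enough.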
There are two main types of machine learning methods for anomaly detection. The first is supervised. A predictive model is built based on a labeled training set, which includes both expected and unusual results.
The other is unsupervised, where the training data isn’t manually labeled. Instead, it assumes that most of the points fall within the normal range and uses infrequent occurrences to identify anomalies.
Some systems combine supervised and unsupervised methods, typically learning from a small labeled set alongside a larger unlabeled one. This is called semi-supervised learning.
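To contrast the supervised and unsupervised approaches, here is a toy Python sketch of each. Neither is a production model. The supervised example learns a single cutoff from labeled points, and the unsupervised example uses the modified z-score (based on the median and median absolute deviation) as a simple statistical stand-in. All function names and data are hypothetical.

```python
from statistics import median


def fit_threshold(labeled):
    """Supervised: pick the cutoff that misclassifies the fewest labeled points.

    labeled is a list of (value, is_anomaly) pairs.
    """
    best_cutoff, best_errors = None, len(labeled) + 1
    for cutoff in sorted({value for value, _ in labeled}):
        errors = sum((value >= cutoff) != label for value, label in labeled)
        if errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff


def mad_outliers(values, threshold=3.5):
    """Unsupervised: assume most points are normal and flag the rare distant ones."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure distance against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical labeled failure rates for the supervised case.
labeled = [(0.01, False), (0.02, False), (0.03, False), (0.02, False),
           (0.08, True), (0.10, True), (0.04, False), (0.09, True)]
cutoff = fit_threshold(labeled)

# Hypothetical unlabeled readings for the unsupervised case.
outliers = mad_outliers([12, 11, 13, 12, 14, 11, 12, 13, 48, 12])
```

The supervised function needs someone to have labeled past results as expected or unusual. The unsupervised function needs no labels, but it relies on anomalies being rare in the data it sees.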
How Machine Learning Can Reduce Your False Positive Rate
Machine learning automatically identifies patterns in your data and uses them to update its predictive modeling. This creates an anomaly detection system with the flexibility to respond to changing circumstances and integrate new information.
Machine learning is designed to cope with large datasets and can spot patterns that human analysts may miss. The accuracy of its anomaly detection should improve as it continues to assess and evaluate your data, reducing false positives.
The interplay between different metrics is a vital part of anomaly detection and another area where machine learning can increase the accuracy of your alerts. By combining and analyzing datasets, the machine learning model works to understand which correlations are expected between trends in different metrics.
Because the system can adapt and learn, it can modify rule-based alerts according to the insights it gleans from your data. This reduces the number of false positives, meaning you and your team can concentrate on investigating genuine anomalies.