Time Series Forecasting & Anomaly Detection – do you really need both?
Fibre Marketing, 19 May 2021
What is Time Series Forecasting?
Time series forecasting uses historical data to predict the future behaviour of a time series. It has important applications in business: for example, forecasting future sales of a product, the number of viewers of a TV show, or the number of conversions in a marketing campaign.
The forecast horizon varies considerably with the use case: projecting revenues 12 months ahead is typical, whilst predicting the number of converted customers from a marketing campaign more than 3 months ahead may be less useful, given that new campaigns launch on a regular basis.
Because they rely on historical data, forecasts can become highly complex undertakings when users try to incorporate every possible factor into the calculation. Whilst noble in intent, this runs into a practical limit: many of the influencing factors are external, or are not available as measurements at all.
Real data also introduces an important complication: anomalies and aberrations in the time series. These can significantly affect the forecast, usually for the worse, making the results less accurate and less intuitive.
Below is an example from a popular article on Towards Data Science, where an anomaly (obvious to a human reader) is incorporated into an LSTM forecast:
Fig 1: Data from Adithya Krishnan
Why does Anomaly Detection matter in Forecasting?
Anomalies are large deviations from the expected values of a time series. If they are not accounted for, errors are likely in the following areas:
The trend is skewed by the anomaly. This is particularly acute when the anomaly’s magnitude is large: the system still tries to draw a line of best fit through every point, and the anomaly pulls that line away from the underlying trend
Periodic components are not correctly identified: a pattern that would normally repeat weekly can be missed because the anomalies mask it
Both problems reduce forecast accuracy, which erodes trust in the forecasting system (and typically in the team implementing it) and prompts a return to “traditional” manual forecasting.
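To make the first point concrete, here is a minimal Python sketch of how a single spike can distort a least-squares trend line. The series, the size of the spike and the `fit_line` helper are illustrative assumptions, not taken from any real dataset or library.

```python
def fit_line(ys):
    """Ordinary least-squares fit of y = slope * t + intercept for t = 0..n-1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


# Hypothetical series: a steady upward trend of 2 units per step.
clean = [100 + 2 * t for t in range(30)]

# The same series with one large spike near the end.
spiked = list(clean)
spiked[27] += 500

clean_slope, _ = fit_line(clean)
spiked_slope, _ = fit_line(spiked)
print(f"slope without anomaly: {clean_slope:.2f}")
print(f"slope with anomaly:    {spiked_slope:.2f}")
```

On this made-up data the fitted slope more than doubles once the spike is included, even though 29 of the 30 points are unchanged.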
How to conduct Time Series Forecasting with Anomaly Detection
Now that the challenges have been laid out, it’s worth considering what a robust, intuitive solution looks like – we see it requiring the following steps:
Break down historical data into components
Identify anomalies from the historical data
Remove anomalies from the trend and seasonality components
(Optional) Include information about known events (e.g. sales promotions, Black Friday, Christmas)
Project ahead based on trend & seasonality components that:
Reflect the latest behaviour in trend and seasonality
Are free of anomalies
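The steps above could be sketched roughly as follows. This is a simplified illustration using only the Python standard library, not the method behind the figures in this article. It assumes a linear trend and a fixed weekly period, uses medians so that spikes have little influence on the components, and flags anomalies with a median absolute deviation (MAD) rule. The function names, the threshold of 5 and the synthetic data are all hypothetical choices.

```python
import math
from statistics import median

PERIOD = 7  # assumed weekly seasonality on daily data


def decompose(series, period=PERIOD):
    """Split a series into a linear trend, a seasonal profile and residuals."""
    n = len(series)
    # Comparing points one full period apart cancels the seasonal effect,
    # and taking the median keeps a few spikes from dominating the slope.
    slope = median((series[t + period] - series[t]) / period
                   for t in range(n - period))
    detrended = [x - slope * t for t, x in enumerate(series)]
    level = median(detrended)
    # Seasonal profile: typical detrended value at each position in the cycle.
    seasonal = [median(detrended[p::period]) - level for p in range(period)]
    residual = [detrended[t] - level - seasonal[t % period] for t in range(n)]
    return slope, level, seasonal, residual


def find_anomalies(residual, threshold=5.0):
    """Flag residuals that sit far from the median, measured in scaled MADs."""
    centre = median(residual)
    mad = median(abs(r - centre) for r in residual) or 1e-9
    return [t for t, r in enumerate(residual)
            if abs(r - centre) / (1.4826 * mad) > threshold]


def forecast(series, horizon, period=PERIOD):
    slope, level, seasonal, residual = decompose(series, period)
    anomalies = find_anomalies(residual)
    # Replace each anomaly with the value the components expect at that point.
    cleaned = list(series)
    for t in anomalies:
        cleaned[t] = level + slope * t + seasonal[t % period]
    # Re-estimate the components on the cleaned history, then project ahead.
    slope, level, seasonal, _ = decompose(cleaned, period)
    n = len(series)
    preds = [level + slope * t + seasonal[t % period]
             for t in range(n, n + horizon)]
    return preds, anomalies


# Hypothetical data: upward trend, weekly pattern, small wobble, two spikes.
pattern = [0, 3, 6, 3, 0, -4, -8]


def true_value(t):
    return 50 + 0.5 * t + pattern[t % PERIOD]


history = [true_value(t) + 0.5 * math.sin(1.7 * t) for t in range(56)]
history[20] += 40
history[40] -= 35

preds, anomalies = forecast(history, horizon=14)
print("flagged indices:", anomalies)
print("first week of forecast:", [round(p, 1) for p in preds[:7]])
```

Known events such as promotions (the optional step) are not handled here; one simple option would be to exclude those dates from the anomaly check so that they are not removed as if they were errors.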
Here’s an example of what that could look like, using the code from examples discussed previously, with the first 50 datapoints as training data and the last 50 as test data:
Actual data, without anomalies present – the SARIMAX forecasting method offers reasonable accuracy:
Fig 2: Historical data, without anomalies
Fig 3: Actual vs. Predicted, without anomalies, using SARIMAX – reasonable accuracy
Actual data, with anomalies present – the SARIMAX forecasting method offers poor accuracy, especially immediately following an anomaly. This shows the importance of identifying and removing anomalies from historical datasets – otherwise, the quality of the forecast suffers dramatically.
Fig 4: Historical data, with multiple anomalies
Fig 5: Actual vs. Predicted, with anomalies, using SARIMAX – poor accuracy surrounding anomalous points and after
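The figures above use SARIMAX. The sketch below swaps in simple exponential smoothing instead, purely because it fits in a few lines of standard-library Python, so it illustrates the same after-effect rather than reproducing those results. The 100-point series, the spike positions and the smoothing factor `alpha` are all assumptions made for the example. As in the example above, the first 50 points warm up the model and the last 50 are scored.

```python
import math


def one_step_forecasts(series, alpha=0.5):
    """Simple exponential smoothing: each forecast uses only earlier points."""
    level = series[0]
    preds = []
    for x in series:
        preds.append(level)  # forecast made before seeing x
        level = alpha * x + (1 - alpha) * level
    return preds


def mae(actual, predicted, skip=()):
    """Mean absolute error, ignoring the positions listed in `skip`."""
    errors = [abs(a - p)
              for t, (a, p) in enumerate(zip(actual, predicted))
              if t not in skip]
    return sum(errors) / len(errors)


# Hypothetical 100-point series: a slow cycle around a level of 100.
clean = [100 + 5 * math.sin(2 * math.pi * t / 50) for t in range(100)]

# The same series with three large spikes.
spike_positions = [30, 60, 80]
spiked = list(clean)
for t in spike_positions:
    spiked[t] += 50

split = 50  # first 50 points warm up the model, last 50 are scored
test_spikes = {t - split for t in spike_positions if t >= split}

clean_preds = one_step_forecasts(clean)[split:]
spiked_preds = one_step_forecasts(spiked)[split:]

# Leave the spike points themselves out of the score, so the difference
# reflects only how the spikes affect the forecasts that follow them.
mae_clean = mae(clean[split:], clean_preds, skip=test_spikes)
mae_spiked = mae(spiked[split:], spiked_preds, skip=test_spikes)
print(f"MAE on clean series:  {mae_clean:.2f}")
print(f"MAE on spiked series: {mae_spiked:.2f}")
```

On this synthetic data the error on the spiked series is noticeably higher even though the spikes themselves are excluded from scoring, because each spike drags the next few forecasts along with it.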
Get Started With Anomaly Detection Today
Avora is always on hand, utilising both AI analytics and machine learning to uncover unexpected changes in your data.