Time Series Forecasting & Anomaly Detection – do you really need both?

What is Time Series Forecasting?

Time series forecasting uses historical data to predict the future behaviour of a series, and has important applications in business – for example, forecasting future sales of a product, the number of viewers of a TV show, or the number of conversions in a marketing campaign.

The forecast horizon can vary dramatically depending on the use case – projecting revenues 12 months into the future is typical, whilst predicting the number of converted customers from a marketing campaign more than 3 months ahead may not be as useful, given that new campaigns launch on a regular basis.

Because they rely on historical data, forecasts can become highly complex undertakings when users attempt to incorporate every possible factor into the calculation.  Whilst noble in intent, the reality is that many of these influencing factors are external and/or not available as measurements.

Real data also introduces an important complication – anomalies and aberrations in the time series.  These can significantly affect the forecast, usually for the worse, making the results less accurate and less intuitive.  Below is an example from a popular article on Towards Data Science, where an anomaly (intuitively obvious to a human) is incorporated into an LSTM forecast:

Fig 1: Data from Adithya Krishnan

Why does Anomaly Detection matter in Forecasting?

Anomalies are large deviations from the expected values of a time series.  If they are not accounted for, errors are likely to occur in the following areas:

  • The trend is skewed by the presence of an anomaly – this is particularly acute when the anomaly’s magnitude is large, since a line of best fit gets pulled towards it, producing poor results
  • Periodic components are not correctly identified – what would usually be recognised as a weekly repeating pattern is missed because anomalies obscure it
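The trend-skew point is easy to demonstrate.  Below is a minimal sketch (the series, seed and anomaly size are all made up for illustration) showing how a single large anomaly pulls a least-squares trend line away from the true slope:

```python
import numpy as np

# Synthetic illustration: a gentle upward trend (true slope 0.5) with
# small noise, plus one large anomaly injected near the end.
rng = np.random.default_rng(0)
t = np.arange(50)
clean = 10 + 0.5 * t + rng.normal(0, 0.5, 50)

anomalous = clean.copy()
anomalous[45] += 40  # single large spike late in the history

# Least-squares trend line for each version of the series.
slope_clean, _ = np.polyfit(t, clean, 1)
slope_anom, _ = np.polyfit(t, anomalous, 1)

print(f"slope without anomaly: {slope_clean:.3f}")
print(f"slope with anomaly:    {slope_anom:.3f}")
```

The anomaly sits above the trend late in the series, so the fitted slope is dragged upwards – exactly the skew described in the first bullet.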

Both of these challenges result in poorer forecast accuracy, a loss of trust in the forecasting system (and, typically, the team implementing it), and a regression back to “traditional” manual forecasting.

How to conduct Time Series Forecasting with Anomaly Detection

Now that the challenges have been laid out, it’s worth considering what a robust, intuitive solution looks like – we see it requiring the following steps:

  1. Break down historical data into components
    1. Trend
    2. Seasonality
    3. Residual
  2. Identify anomalies from the historical data
  3. Remove anomalies from the trend and seasonality components
  4. (Optional) Include information about known events (e.g. sales promotions, Black Friday, Christmas)
  5. Project ahead based on trend & seasonality components that:
    1. Reflect the latest behaviour in trend and seasonality
    2. Are free of anomalies
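As a rough illustration, the steps above can be sketched with plain NumPy.  The weekly period, the linear trend model and the 3-sigma threshold are all illustrative assumptions, and step 4 (known events) is omitted:

```python
import numpy as np

# Synthetic data: linear trend + weekly seasonality + noise + two anomalies.
rng = np.random.default_rng(1)
n, period = 100, 7
t = np.arange(n)
y = 20 + 0.3 * t + 5 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.5, n)
y[[30, 60]] += 25  # two injected anomalies

def decompose(series):
    # Step 1: trend (linear fit), seasonality (period means), residual.
    slope, intercept = np.polyfit(t, series, 1)
    trend = intercept + slope * t
    phase_means = np.array([(series - trend)[p::period].mean()
                            for p in range(period)])
    seasonal = phase_means[t % period]
    return trend, seasonal, series - trend - seasonal, slope, intercept, phase_means

trend, seasonal, residual, *_ = decompose(y)

# Step 2: flag residuals more than 3 standard deviations from the mean.
z = (residual - residual.mean()) / residual.std()
anomalies = np.abs(z) > 3

# Step 3: replace flagged points with their fitted values, then re-decompose.
y_clean = np.where(anomalies, trend + seasonal, y)
_, _, _, slope, intercept, phase_means = decompose(y_clean)

# Step 5: project ahead using the anomaly-free trend and seasonality.
horizon = np.arange(n, n + 14)
forecast = intercept + slope * horizon + phase_means[horizon % period]
print(f"{anomalies.sum()} anomalies flagged; recovered slope {slope:.3f}")
```

A production system would use a robust decomposition (e.g. STL) rather than a straight-line trend, but the shape of the pipeline – decompose, flag, clean, re-fit, project – is the same.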

Here’s an example of what that could look like, leveraging the code from examples discussed previously (using the first 50 datapoints as training data, and the last 50 as test data):

Actual data, without anomalies present – the SARIMAX forecasting method offers reasonable accuracy:

Fig 2: Historical data, without anomalies

Fig 3: Actual vs. Predicted, without anomalies, using SARIMAX – reasonable accuracy

Actual data, with anomalies present – the SARIMAX forecasting method offers poor accuracy, especially immediately following an anomaly.  This shows the importance of identifying and removing anomalies from historical datasets – otherwise, the quality of the forecast suffers dramatically.

Fig 4: Historical data, with multiple anomalies

Fig 5: Actual vs. Predicted, with anomalies, using SARIMAX – poor accuracy around anomalous points and afterwards

Get Started With Anomaly Detection Today

Avora is always on hand, utilising both AI analytics and machine learning to uncover unexpected changes in your data.


When does Forecasting break down?

Forecasting is not foolproof – even with anomaly detection in place, there are situations where the forecast may not offer good accuracy.  Below are some broad examples:

  • New & Unknown events
  • Black Swan/Paradigm shifts (e.g. Covid-19)
  • Changes in seasonality
  • Changes in trend
  • Human-forced actions – e.g. stopping spend on a certain marketing channel
  • Unknown external influencing factors – e.g. competitors changing pricing

Parting Thoughts & Further Reading

  • Forecasting is doubtless a useful tool – with its limitations, as with anything!
  • The underlying assumptions are important
  • Forecasting usually assumes that behaviour carries on as before – so accept that there will be paradigm shifts; that’s where human intuition will beat a system, any day of the week
  • Make sure the forecasts are explainable in context of the historical data
  • Always check forecasts against actual performance – a large deviation between the two (think anomaly detection) may be an indicator of poor model performance, or of the world having dramatically changed
  • Take extra care when behaviour is rapidly evolving – i.e. historical data is no longer a reliable predictor of future performance
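On the point about checking forecasts against actuals: a simple sketch (the numbers and the 20% tolerance are purely illustrative) is to flag any period where the actual deviates from the forecast by more than a chosen threshold:

```python
import numpy as np

# Toy forecast vs. actuals with one large miss in period 2 (0-indexed).
forecast = np.array([100.0, 105.0, 110.0, 115.0, 120.0])
actual = np.array([98.0, 104.0, 150.0, 117.0, 119.0])

# Percentage error per period; flag anything beyond a 20% tolerance.
pct_error = np.abs(actual - forecast) / np.abs(actual)
flagged = pct_error > 0.20

mape = pct_error.mean() * 100
print(f"MAPE: {mape:.1f}%, periods flagged: {np.flatnonzero(flagged).tolist()}")
```

A persistent stream of flagged periods is a signal either that the model has degraded or that behaviour has genuinely shifted – both of which warrant re-fitting on fresher data.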