Forecast accuracy refers to the degree of closeness between forecasts and corresponding actual sales.

This article is a section from my upcoming book "An Introduction to Probabilistic Planning and Forecasting", edited for the current context. Readers are invited to comment here. Any comments provided may be used to improve the text in the book; if used, credit will be given, with permission.

In the time-series forecasting domain, the concepts of accuracy and precision are often confused and the terms frequently used interchangeably. Before the full benefit of demand forecasts can be achieved, it is imperative that they are measured correctly. Making the proper distinction between precision and accuracy is the essential first step in doing so. [...] In previous articles, I described the difference irrespective of domain and how traditional metrics like the APEs need a minor adjustment to stop one polluting the other. In this article, the difference is explained as it applies specifically to the time-series forecasting domain.

As we learned in chapter 3, a probabilistic forecast is expressed as a time series of probability distributions, where each time period will generally have a different distribution. Deterministic forecasts, such as statistical forecasts, are typically expressed as a time series of single points, plus some measure of the dispersion of the error residuals.
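To make this distinction concrete, below is a minimal sketch of how the two kinds of forecast could be held in memory. The representation (per-period percentiles for the probabilistic forecast, points plus a single residual standard deviation for the deterministic one), the period labels, and all numbers are illustrative assumptions, not the book's data model.

```python
import numpy as np

# Illustrative only: one possible representation of the two forecast types.
percentile_levels = np.array([5, 25, 50, 75, 95])

# Probabilistic forecast: a time series of distributions, here summarised
# by five quantile values per period (each period may differ).
probabilistic_forecast = {
    "2024-W01": np.array([55, 78, 96, 121, 168]),
    "2024-W02": np.array([60, 82, 101, 128, 175]),
}

# Deterministic forecast: a time series of single points, plus one measure
# of the dispersion of the error residuals.
deterministic_forecast = {
    "points": {"2024-W01": 98.0, "2024-W02": 103.0},
    "residual_sd": 15.0,
}
```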

[for LinkedIn readers: a forecast error residual is the expected difference between the forecast and the actuals at the time of forecasting, whereas a forecast error is the recorded difference between the forecast and the actuals after they are measured.]

This dispersion is generally assumed to be independent and identically distributed, with a normal distribution around the point forecast. Hence we can view a deterministic forecast as a naive probabilistic forecast. This perspective is instrumental in illustrating the difference between precision and accuracy in a manner that can be applied across all types of forecasts. Most statistical forecasts include a standard deviation or another size indicator (such as MAD) of the residuals, which together with the assumption of an unbiased normal distribution uniquely defines their dispersion. Figure 6.12 below shows an example of a probabilistic forecast with its median and a deterministic forecast with its residual distribution, for a single item in a single time period. Demand quantity is drawn along the horizontal axis, and the probability that a quantity could occur is drawn along the vertical axis for the distribution curves.
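As a small illustration of this "naive probabilistic" view, the sketch below wraps a point forecast and its residual standard deviation in a normal distribution. The specific numbers are made up, and the percentile read off at the end is just one example of what the implied distribution lets you do.

```python
from scipy.stats import norm

point_forecast = 100.0   # deterministic single-point forecast for one period
residual_sd = 15.0       # standard deviation of the error residuals

# Treat the deterministic forecast as a naive probabilistic forecast:
# an unbiased normal distribution centred on the point forecast.
naive_probabilistic = norm(loc=point_forecast, scale=residual_sd)

# The implied distribution can now answer probabilistic questions, e.g.
# the demand quantity exceeded in only 5% of cases.
print(naive_probabilistic.ppf(0.95))   # approximately 124.7
```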

Figure 6.12: Distributions of a deterministic forecast (left) and a probabilistic forecast (right) for a single item in a single period.

The similarity between the two should be evident. The only noticeable difference between them is that the probabilistic distribution is skewed whilst the deterministic one is not. The numerical precision of these two distributions can now be determined. Remember, precision is a measure of how close together forecast values are. If these distributions are accurate representations of the expected uncertainty of the unknown future value, then each distribution itself contains the information required to determine a single-item, single-period precision of its respective forecast. All that is needed is to pick the desired precision metric and apply it. The various precision metrics are further explored in section 13.2. Note that precision and the precision metrics described in this section cover one aspect of precision, numerical precision. The other aspect of precision, granularity, is described in the next section. Any references in this section to precision refer only to numerical precision. Figure 6.13 below shows three example precision metrics applied to the distributions of figure 6.12.

Figure 6.13: Three precision metrics (standard deviation, interquartile range, and 95% confidence range) applied to two distributions. Percentiles labeled for each.

Each of these metrics says something about how narrow the predicted range of possible values is. Each has benefits and drawbacks. Ideally, the metric measures the widest possible range, without sensitivity to outliers. Narrower metrics tend to be less sensitive to outliers than wider metrics, but only if they allow for the skewness of the distribution. For example, the standard deviation is much narrower than the 95% confidence range but more sensitive to outliers. On the other hand, the standard deviation can be aggregated across multiple items or multiple time periods, whilst the 95% confidence range cannot. These trade-offs and more are explored in depth in section 13.2.
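A minimal sketch of these three precision metrics, computed from samples of a skewed forecast distribution, is shown below. The gamma distribution and its parameters are stand-ins for a real probabilistic forecast, and the 95% confidence range is assumed here to be the central interval between the 2.5th and 97.5th percentiles; the exact definitions the book uses are those of section 13.2.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a single-item, single-period probabilistic forecast:
# 10,000 samples from a skewed (gamma) distribution.
forecast_samples = rng.gamma(shape=4.0, scale=25.0, size=10_000)

# Three example precision metrics. Note that the actual demand plays no
# role: precision can be determined at the time of forecasting.
std_dev = forecast_samples.std()
q25, q75 = np.percentile(forecast_samples, [25, 75])
iqr = q75 - q25
p2_5, p97_5 = np.percentile(forecast_samples, [2.5, 97.5])
range_95 = p97_5 - p2_5

print(f"standard deviation : {std_dev:.1f}")
print(f"interquartile range: {iqr:.1f}")
print(f"95% range          : {range_95:.1f}")
```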

One important observation is that precision can be determined at the time of forecasting. In the discussion above and in the example precision metrics, the actual demand plays no role. We do not need to wait until we can record the actual demand to determine precision. The same cannot be said for accuracy. Accuracy states how close the forecast is to actual demand. Note that this same distinction holds in general, beyond the time-series forecasting domain. If, in the example of bullets fired at a target in [...] the top image of the current article (explained in this LinkedIn article), one were to imagine that the target is invisible, it would be impossible to tell the difference between different accuracies, but the differences in precision would still be evident. In forecasting, that is exactly the situation: the target is unknown until after the actuals are measured.

One assumption made in the above definition of precision is that the distributions are accurate representations of the uncertainty of the unknown future value. For this assumption to hold, we need an accuracy metric that is orthogonal to the chosen precision metric, and more generally, accuracy needs to be a forecast quality that is orthogonal to precision. Orthogonal means that the metrics measure different things without overlap. For example, a person's height and width are orthogonal, but their width and weight are not, since wider people tend to weigh more. For metrics to be complementary, which is an objective of the framework introduced in section 13.2, orthogonality is a necessary, but not a sufficient, requirement.

A common way to measure the accuracy of deterministic forecasts for a single item and single time period is to take the difference between the single point of the forecast and the actual recorded value. This value is called the forecast error and is the opposite of accuracy. In other words, it is a measure of inaccuracy rather than accuracy. Figure 6.14 shows this for the two distributions used earlier in this section.

Figure 6.14: Forecast error measured for one actual measurement of a single item and a single time period for two distributions.

In the figure, the forecast error is taken at face value. In practice, it may be transformed in many different ways for different error and accuracy metrics. Common transformations include taking the absolute value of the error and/or dividing it by either the forecasted value or the actual value to obtain a relative forecast error. This error perspective and its transformations provide useful insights into the forecast. Additionally, it is clearly orthogonal to precision. Where the latter depicts the width of the distribution, the former says nothing about the width, only about the middle.
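A quick sketch of the single-item, single-period error and the common transformations just mentioned; the numbers and the sign convention (forecast minus actual) are assumptions for illustration.

```python
forecast_median = 95.0   # middle point of the forecast distribution
actual = 110.0           # recorded actual demand

error = forecast_median - actual                    # signed forecast error
absolute_error = abs(error)                         # absolute error
ape_vs_actual = absolute_error / actual             # relative to the actual
ape_vs_forecast = absolute_error / forecast_median  # relative to the forecast

print(error, absolute_error, round(ape_vs_actual, 3), round(ape_vs_forecast, 3))
```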

Figure 6.14 shows one important departure from the traditional perspective. In that perspective, forecast error is defined as the difference between the mean of the forecast and the actual value, whereas figure 6.14 shows it as the difference between the median of the forecast and the actual value. In the deterministic perspective, these two are identical since the distribution is assumed to be normal, as was discussed in section 3.4. This means they can be used interchangeably without detriment to the results. In the probabilistic perspective, they are generally different, and the median is the key central tendency whilst the mean only has a supporting role. The various reasons are explained in detail in sections 3.4, 10.3, and 13.2, but an important one is that the mean is sensitive to outliers, whilst the median is highly robust.
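The outlier sensitivity is easy to see with a handful of made-up numbers: one extreme value moves the mean substantially while leaving the median almost untouched.

```python
import numpy as np

forecast_samples = np.array([80, 90, 95, 100, 105, 110, 120])
print(np.mean(forecast_samples), np.median(forecast_samples))   # 100.0 100.0

# One extreme sample drags the mean far away; the median barely moves.
with_outlier = np.append(forecast_samples, 1000)
print(np.mean(with_outlier), np.median(with_outlier))           # 212.5 102.5
```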

One problem with using forecast error as a metric for accuracy is that it only assigns a value to one middle point of the distribution and says nothing about the rest of the distribution. The assumption above, however, requires it to cover the entire distribution. This problem cannot be overcome for a single-item, single-period forecast. If a forecast distribution states that there is a 30% probability that demand will be at least equal to a given value, and that value actually occurs, is it correct? Or is it 70% accurate, or 100% wrong? What if that probability was 50% or 90%? Nassim Nicholas Taleb explains in his famous book "The Black Swan: The Impact of the Highly Improbable" [Taleb, 2007] that for a probabilistic forecast to be accurate, a value that is predicted to be 30% probable should, in fact, occur 30% of the time. Not only that: every probability forecasted, not just the 30th percentile, needs to occur the stated percentage of the time. Naturally, this perspective requires many occurrences to be applicable. This means that forecast accuracy needs to be determined across multiple items or across multiple time periods. The single-item, single-period forecast error remains useful, but primarily as a means to find exceptional issues.

Now we will explore the same concept applied to many measurements. For explanatory purposes only, we use a single-item time series and assume that the time series is stationary. This means that the distributions of multiple time periods are all identical. In practice, time series will not be stationary, but they can either be transformed to become stationary or accuracy metrics can be used that do not require stationarity. Similarly, measurements of multiple items in a single time period can be transformed, or a metric can be used that does not require an identical scale.
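Taleb's criterion can be checked directly once many measurements are available. The sketch below uses a fictitious stationary forecast (a gamma distribution) and 52 simulated weekly actuals, then compares each forecasted probability with the observed frequency; a well-calibrated forecast produces observed frequencies close to the stated probabilities. All distributions and numbers are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(7)

forecast_dist = gamma(a=4.0, scale=25.0)                # fictitious forecast
actuals = forecast_dist.rvs(size=52, random_state=rng)  # simulated weekly actuals

# A value forecast to fall below the p-th percentile p% of the time should
# indeed fall below it roughly p% of the time, for every p.
for p in (0.10, 0.30, 0.50, 0.70, 0.90):
    threshold = forecast_dist.ppf(p)           # forecasted p-th percentile
    observed = np.mean(actuals <= threshold)   # observed frequency
    print(f"forecasted {p:.0%} -> observed {observed:.0%}")
```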

Figure 6.15: a stationary time series across 52 weeks, with the distribution of quantities accumulated on the right.

The two distributions of the prior figures are fictitious forecasts for this stationary time-series. Figure 6.16 shows those distributions overlaid onto the quantity histogram of Figure 6.15, rotated and mirrored to align them.

Figure 6.16: Forecast error measured across many time periods for two distributions of stationary time series. The height of the bars determines how often an error is counted.

When determining accuracy for a portfolio of items or across multiple time periods, it is highly impractical to consider each forecast error value separately. The multiple error measurements are typically combined into one or a few summary metrics. If a simple sum is taken of all the error values, the result is the traditional bias metric. If the absolute value of each error is taken before summation, an absolute error metric is returned. Both provide different insights. Dividing the former by the latter gives a scaled bias metric, often called the "tracking signal", that is a good complement to the absolute error. Other common transformations are taking the sum of the squares of the errors, or the square root thereof for an error metric in the same unit as the forecast. For relative errors, the aggregated errors are often divided by the sum of the actual values, the sum of the forecasts, or the sum of their absolute values.

Like figure 6.14, figure 6.16 also uses the median instead of the mean to measure forecast error. Again, in the deterministic perspective this is immaterial, since the mean and median of the forecast are identical. In the probabilistic perspective they are not, and the median is the correct one to use. Hence a probabilistic bias metric is similar to a traditional bias, with the one change that it is centered around the median and not the mean, and is therefore robust against outliers. In the deterministic perspective, this raises an undesirable side-effect. If bias is caused by high forecast values being increased further by an overly optimistic sales team, the median-based version will be robust against that too. In other words, it will not reflect it. Many advanced statistical forecasting practitioners will watch both the mean-based and median-based versions to discover both chronic bias and outlier bias, and be mindful of differences between them.

In the probabilistic perspective, a tail-error metric is a better complement to the median-based bias, removing the significant overlap between the two bias versions. A tail-error metric measures how accurate the extremes of the distributions are. A very basic such metric could be one where the number of actual occurrences above a confidence level is divided by the number of forecasted occurrences above that level. If this number is less than 1, the upper tail of the distribution is negatively biased, meaning the forecast was optimistic about the size of high demand. If the value is greater than 1, the forecast was pessimistic.
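A minimal sketch of these summary metrics, computed for a hypothetical stationary item over 52 weeks, is given below. The forecast median and 95th percentile, the choice of the 95% confidence level for the tail-error, and all numbers are assumptions for illustration; section 13.2 defines the metrics the book actually uses.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical data: 52 weekly actuals, with a constant forecast median and
# forecast 95th percentile (stationary time series).
actuals = rng.gamma(shape=4.0, scale=25.0, size=52)
forecast_median = np.full(52, 92.0)
forecast_p95 = np.full(52, 194.0)

# Per-period errors centred on the forecast median (the traditional
# version would use the forecast mean instead).
errors = forecast_median - actuals

bias = errors.sum()                                # traditional (summed) bias
absolute_error = np.abs(errors).sum()              # summed absolute error
tracking_signal = bias / absolute_error            # scaled bias
rmse = np.sqrt(np.mean(errors ** 2))               # same unit as the forecast

# A very basic tail-error metric: actual occurrences above the 95th
# percentile divided by the forecasted number of occurrences above it.
actual_above = np.sum(actuals > forecast_p95)
forecast_above = 0.05 * len(actuals)               # 5% of 52 periods
tail_error = actual_above / forecast_above

print(bias, absolute_error, tracking_signal, rmse, tail_error)
```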

All of these metrics have good uses, but none measure the accuracy of the entire distribution and hence fall short of being adequate accuracy metrics for the generic probabilistic scenario. All metrics based on forecast error only measure the accuracy of the middle single-point forecast. And the tail-errors measure only the accuracy of a single point on the upper tail of the distribution. All other points of the distribution are ignored. Figure 6.17 shows the same distributions and actual values of figure 6.16 but from a different angle. If the forecast were perfect the shape of the tops of the bars would be identical to the shape of the forecast distribution.

Figure 6.17: Distribution error measured across many time periods for two distributions of stationary time series.

The key difference between figures 6.16 and 6.17 is that errors are measured horizontally in the first, and vertically in the second. In the former, errors are the difference between the actual quantity and the central forecasted quantity. In the latter, errors are the difference between the number of occurrences of an actual quantity and the forecasted number of occurrences of that quantity. Where the traditional approach measures the distance to a single point, this alternate approach measures the distance to the entire distribution.

It should be clear that this approach does not work for individual measurements, or even for measurements across very small samples. In our weather example of chapter 3, if the forecast is a 40% chance of rain and it does not rain, it is unrealistic to make a judgment on the accuracy of that forecast. However, if it indeed rained in around 40% of the cases where the forecast was a 40% chance of rain, that forecast would be accurate. The same is true in this probabilistic way of measuring forecast accuracy. The question is, how large does the sample size need to be for a fair assessment of accuracy? In practice, results start to become meaningful with sample sizes above around 10 for well-chosen accuracy metrics. This means the approach can be used to measure accuracy for a single item across a year of monthly forecasts, or a quarter of weekly forecasts, or within a single time period for groups of at least 10 items, or for a single item in a single period across at least 10 locations.

Since probabilistic forecasts perform better as more detail is provided, and companies typically judge the quality of forecasts at levels of aggregation, the cases where this way of measuring accuracy cannot be applied are particularly rare. On the other hand, deterministic metrics, like absolute percentage errors (APEs) and tail errors, only measure local symptoms of forecast accuracy, making their strengths complementary. In section 13.2 a forecast quality framework is introduced where probabilistic accuracy is used to judge a forecast at levels of aggregation, whilst deterministic accuracy is used to pinpoint specific problem cases in the detail. This differentiated approach is required to be able to correlate forecast accuracy to business value, which is generally not possible with deterministic accuracy metrics, whilst still being able to identify specific issues, which is not possible with probabilistic accuracy metrics.
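One simple way to implement this "vertical" error measurement is sketched below: bin the quantity axis, then compare how often an actual quantity fell in each bin with how often the forecast distribution said it should. The binning, the distribution, and the numbers are illustrative assumptions, not the metric of section 13.2.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(3)

forecast_dist = gamma(a=4.0, scale=25.0)                # fictitious forecast
actuals = forecast_dist.rvs(size=52, random_state=rng)  # 52 weekly actuals

# Errors measured vertically: observed minus forecasted number of
# occurrences, per quantity bin.
bin_edges = np.linspace(0, 300, 11)                     # 10 quantity bins
observed_counts, _ = np.histogram(actuals, bins=bin_edges)
forecast_probs = np.diff(forecast_dist.cdf(bin_edges))
forecast_counts = forecast_probs * len(actuals)

vertical_errors = observed_counts - forecast_counts
print(np.round(vertical_errors, 1))
```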

This excerpt of the book explained how accuracy and precision in the common sense translate to accuracy and precision in the time-series forecasting domain. In the traditional, statistical, perspective of forecasting this translation is not easy to see, and may even seem unnecessary. In the probabilistic perspective, however, this generalization is critical to correlating the quality of forecasts with their impact on business value. If you have ever wondered how much money a percent increase in forecast accuracy is worth to your business, this perspective is what enables you to assess that. Our use of traditional accuracy metrics is the foremost reason businesses have struggled to determine this value, and the foremost reason forecasts have not improved in this regard in any meaningful measure over the decades. In following articles I will present excerpts from the book that cover:

  • How precision and accuracy complement each other
  • The three parts of precision: arithmetic, stochastic, and granularity
  • The forecast quality framework mentioned in this article
  • A complementary numerical precision metric to the Total Percentile Error

If you are interested in probabilistic planning and forecasting please consider joining the "Probabilistic Supply Chain Planning" group here on LinkedIn.

Find all my articles by category here, along with a listing of outstanding articles by other authors.
