We have a pre-print manuscript that describes the performance of the ensemble forecasts.
Each week, we generate ensemble forecasts of cumulative and incident COVID-19 deaths and incident COVID-19 cases over the next four weeks. Each ensemble combines the forecasts from a designated model submitted by each team. This is helpful because it gives a sense of the general consensus forecast across all teams. Previous work in infectious disease forecasting and other fields has also shown that ensemble forecasts are often more accurate than any individual model that went into the ensemble. Readers who are more familiar with forecasting methods may also find it helpful to explore forecasts from individual models to obtain a more detailed understanding of the underlying uncertainty and the range of projections generated by models built on different assumptions. We published a medRxiv pre-print in August 2020 describing the performance of the ensemble forecast during the first few months of the pandemic.
Summary of how the ensemble is built
Typically on Tuesday by noon Eastern Time, we update our COVID-19 Forecast Hub ensemble forecast and interactive visualization using the most recent forecast from each team submitted since the previous Tuesday. From April 13 to July 21, 2020, the ensemble was created by taking the arithmetic average of each prediction quantile across all eligible models for a given location. Starting the week of July 28, 2020, we instead used the median prediction across all eligible models at each quantile level.
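The two combination rules described above can be illustrated with a minimal sketch. This is not the Hub's actual code: the function name build_ensemble, the model names, and the numbers are hypothetical, and the sketch assumes each model's forecast for one location and target is given as a mapping from quantile level to predicted value.

```python
from statistics import mean, median


def build_ensemble(forecasts, method="median"):
    """Combine quantile forecasts from several models for one location and target.

    forecasts: dict mapping model name -> dict mapping quantile level -> value.
    method: "mean" (arithmetic average per quantile) or "median".
    """
    combine = {"mean": mean, "median": median}[method]
    # Assumption: only quantile levels supplied by every model are combined.
    levels = set.intersection(*(set(q) for q in forecasts.values()))
    return {
        level: combine([q[level] for q in forecasts.values()])
        for level in sorted(levels)
    }


# Hypothetical forecasts; model_c is an outlier.
forecasts = {
    "model_a": {0.25: 100, 0.5: 120, 0.75: 150},
    "model_b": {0.25: 90, 0.5: 110, 0.75: 140},
    "model_c": {0.25: 200, 0.5: 400, 0.75: 900},
}

mean_ensemble = build_ensemble(forecasts, "mean")
median_ensemble = build_ensemble(forecasts, "median")
```

With these made-up numbers, the outlying model pulls the averaged 0.5 quantile up to 210, while the median at that level stays at 120, which shows how the median rule is less sensitive to a single extreme forecast.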
To be included in the ensemble, a team’s designated model must meet certain specified inclusion criteria. Forecasts must be submitted during the week preceding each Tuesday to be included in that week’s ensemble. For forecasts of cumulative deaths, we also perform two additional checks for internal consistency. By definition, cumulative deaths cannot decrease over time (other than possibly because of revisions to reporting). We therefore require that (1) a team assigns at most a 10% chance that cumulative deaths would decrease in its one-week-ahead forecast, and (2) at each quantile level of the predictive distribution, the quantile is constant or increasing over time. Additionally, models that project case or death values larger than the population of the geographic location are not included. Before the week of July 28, we performed manual visual inspection checks to ensure that forecasts were in alignment with the ground truth data; this step is no longer a part of our weekly ensemble generation process. Details on which models were included in the ensemble each week are available in the ensemble metadata folder.
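The consistency checks for cumulative death forecasts can be sketched roughly as follows. This is an illustration only, not the Hub's implementation. The function name passes_checks and the data layout are hypothetical. The sketch also assumes that check (1) can be tested by comparing the 0.1 quantile of the one-week-ahead forecast with the most recently observed cumulative count, and that every forecast includes a 0.1 quantile and a one-week-ahead horizon.

```python
def passes_checks(forecast, last_observed, population):
    """Apply three illustrative inclusion checks to a cumulative death forecast.

    forecast: dict mapping horizon in weeks ahead -> dict mapping
              quantile level -> predicted cumulative deaths.
    last_observed: most recently reported cumulative deaths.
    population: population of the forecast location.
    """
    horizons = sorted(forecast)

    # (1) At most a 10% chance of a decrease one week ahead: under the stated
    # assumption, the 0.1 quantile must not fall below the last observed value.
    if forecast[1][0.1] < last_observed:
        return False

    # (2) Each quantile level must be constant or increasing across horizons.
    for level in forecast[horizons[0]]:
        values = [forecast[h][level] for h in horizons]
        if any(later < earlier for earlier, later in zip(values, values[1:])):
            return False

    # (3) No predicted value may exceed the population of the location.
    if any(v > population for h in horizons for v in forecast[h].values()):
        return False

    return True


# Hypothetical two-horizon forecasts.
consistent = {
    1: {0.1: 1010, 0.5: 1050, 0.9: 1100},
    2: {0.1: 1020, 0.5: 1100, 0.9: 1200},
}
decreasing_median = {
    1: {0.1: 1010, 0.5: 1050, 0.9: 1100},
    2: {0.1: 1020, 0.5: 1040, 0.9: 1200},
}

ok = passes_checks(consistent, last_observed=1000, population=50000)
rejected = passes_checks(decreasing_median, last_observed=1000, population=50000)
```

Here the first forecast passes all three checks, while the second is rejected because its 0.5 quantile drops from 1050 to 1040 between the one-week and two-week horizons.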