We have a pre-print manuscript that describes the performance of the ensemble forecasts.


Each week, we generate ensemble forecasts of cumulative and incident COVID-19 deaths, incident COVID-19 cases, and incident COVID-19 hospitalizations over the next four weeks. Each ensemble combines the forecasts from one designated model submitted by each team. This is helpful because it conveys the general consensus forecast across all teams. Previous work in infectious disease forecasting and other fields has also shown that ensemble forecasts are often more accurate than any individual model that went into the ensemble. Readers who are more familiar with forecasting methods may also find it helpful to explore forecasts from individual models to obtain a more detailed understanding of the underlying uncertainty and the range of projections generated by models built on different assumptions. We published a medRxiv pre-print in August 2020 describing the performance of the ensemble forecast during the first few months of the pandemic.

Summary of how the ensemble is built

Typically by noon Eastern Time on Tuesday, we update our COVID-19 Forecast Hub ensemble forecast and interactive visualization using the most recent forecast from each team submitted since the previous Tuesday. From April 13 to July 21, 2020, the ensemble was created by taking the arithmetic average of each prediction quantile across all eligible models for a given location. Starting the week of July 28, 2020, we instead take the median prediction across all eligible models at each quantile level.
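The difference between the two combination rules can be illustrated with a short sketch. This is not the Forecast Hub's actual code: the `quantile_ensemble` function and the representation of a forecast as a dict mapping quantile level to predicted value (for one location, target, and horizon) are hypothetical, and the sketch assumes all eligible models report the same quantile levels.

```python
from statistics import mean, median
from typing import Dict, List

# Hypothetical representation: quantile level (e.g. 0.1, 0.5, 0.9) -> predicted
# value, for a single location, target, and forecast horizon.
Forecast = Dict[float, float]


def quantile_ensemble(forecasts: List[Forecast], method: str = "median") -> Forecast:
    """Combine eligible models' forecasts separately at each quantile level."""
    if not forecasts:
        raise ValueError("need at least one eligible forecast")
    combine = {"mean": mean, "median": median}[method]
    levels = forecasts[0].keys()
    # This sketch assumes every model reports exactly the same quantile levels.
    for f in forecasts:
        if f.keys() != levels:
            raise ValueError("all forecasts must report the same quantile levels")
    return {level: combine([f[level] for f in forecasts]) for level in sorted(levels)}


models = [
    {0.1: 90.0, 0.5: 100.0, 0.9: 110.0},
    {0.1: 80.0, 0.5: 95.0, 0.9: 130.0},
    {0.1: 200.0, 0.5: 400.0, 0.9: 900.0},  # an outlying model
]
print(quantile_ensemble(models, "mean"))    # the outlier pulls every level upward
print(quantile_ensemble(models, "median"))  # {0.1: 90.0, 0.5: 100.0, 0.9: 130.0}
```

In this made-up example, the mean rule puts the 0.9 quantile at 380.0 because of the single outlying model, while the median rule gives 130.0. This robustness to individual outlying forecasts is one plausible motivation for a median-based ensemble.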

To be included in the ensemble, a team’s designated model must meet certain inclusion criteria. Forecasts must be submitted within the week before a given Tuesday to be included in that week’s ensemble. For forecasts of cumulative deaths, we also perform two additional checks for internal consistency. By definition, cumulative deaths cannot decrease over time (except possibly because of revisions to reported data). We therefore require that (1) a team assigns at most a 10% chance that cumulative deaths will decrease in its one-week-ahead forecast, and (2) at each quantile level of the predictive distribution, the predicted value is constant or increasing over time. Additionally, models that project case or death values larger than the population of the geographic location are not included. Before the week of July 28, we also visually inspected forecasts to ensure that they were consistent with the ground truth data; this manual step is no longer part of our weekly ensemble generation process. Details on which models were included in the ensemble each week are available in the ensemble metadata folder.
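The cumulative-death checks and the population check can be sketched as below. This is a hypothetical illustration, not the Hub's validation code. It assumes a forecast is stored as horizon (in weeks) mapped to a dict of quantile levels and values, that the one-week-ahead forecast includes the 0.1 quantile, and that "a decrease" means falling below the most recently observed cumulative count. Under those assumptions, requiring the 0.1 quantile to be at least the last observed value is one way to enforce "at most a 10% chance of a decrease."

```python
from typing import Dict

# Hypothetical layout: horizon in weeks -> {quantile level -> cumulative deaths}.
CumulativeForecast = Dict[int, Dict[float, float]]


def passes_cumulative_death_checks(
    forecast: CumulativeForecast,
    last_observed: float,
    population: float,
) -> bool:
    horizons = sorted(forecast)
    levels = sorted(forecast[horizons[0]])

    # Check (1), assumed operationalization: if the 0.1 quantile of the
    # one-week-ahead forecast is at least the last observed cumulative count,
    # the forecast assigns at most a 10% chance to a decrease.
    if forecast[1][0.1] < last_observed:
        return False

    # Check (2): at every quantile level, values must be constant or
    # increasing as the forecast horizon grows.
    for level in levels:
        values = [forecast[h][level] for h in horizons]
        if any(later < earlier for earlier, later in zip(values, values[1:])):
            return False

    # Population check: no predicted value may exceed the location's population.
    return all(v <= population for q in forecast.values() for v in q.values())


example = {
    1: {0.1: 1000.0, 0.5: 1050.0, 0.9: 1100.0},
    2: {0.1: 1010.0, 0.5: 1100.0, 0.9: 1200.0},
}
print(passes_cumulative_death_checks(example, last_observed=1000.0, population=1e6))  # True
```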

We created ensemble forecasts for hospitalizations due to COVID-19 for the first time the week of December 7, 2020. This is a beta version of the ensemble, and it has not been assessed for accuracy or calibration. To be eligible for inclusion in the ensemble, an individual model’s forecasts must pass a check for consistency with recently observed data. We have periodically made minor updates to this check since hospitalization ensemble forecasts were introduced:

  • For the weeks of December 7 through December 21, 2020, we required that, for a given location, the mean daily point prediction over the first seven days of the forecast (e.g., Tuesday, December 8 through Monday, December 14 for forecasts submitted December 7) be at least as large as the mean of the reported daily confirmed hospital admissions for that location over the past 14 days minus four times the standard deviation of those admissions. This check was performed separately for each location, but a model was included in the ensemble for all locations if it passed the check in at least 75% of jurisdictions, and excluded for all locations otherwise.
  • For the weeks of December 28, 2020 and January 4, 2021, we used the same check, but inclusion was determined separately for each location.
  • Starting the week of January 11, 2021, the check uses the mean of the predictive median over the first seven days rather than the mean point prediction. Inclusion is still determined separately for each location.

For all checks described above, daily reported hospital admissions are taken from HealthData.gov.
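The hospitalization consistency check and the December 7 to 21 "75% of jurisdictions" rule can be sketched as follows. This is a hypothetical illustration, not the Hub's code: the function names are invented, the inputs are assumed to be already-downloaded daily values (the sketch does not fetch anything from HealthData.gov), and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def passes_hosp_check(first_week_forecast: Sequence[float],
                      recent_admissions: Sequence[float]) -> bool:
    """Compare a model's first-week forecast with recent reported admissions.

    first_week_forecast: 7 daily values (point predictions before January 11,
        2021; predictive medians from that week onward).
    recent_admissions: reported daily confirmed admissions, past 14 days.
    """
    if len(first_week_forecast) != 7 or len(recent_admissions) != 14:
        raise ValueError("expected 7 forecast days and 14 observed days")
    # Assumption: sample standard deviation of the 14 observed days.
    threshold = mean(recent_admissions) - 4 * stdev(recent_admissions)
    return mean(first_week_forecast) >= threshold


def included_everywhere(results: Dict[str, bool], share: float = 0.75) -> bool:
    """December 7-21, 2020 rule: include a model in all locations if it
    passed the check in at least `share` of locations, else exclude it."""
    return sum(results.values()) >= share * len(results)


recent = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0,
          15.0, 14.0, 16.0, 15.0, 17.0, 16.0, 18.0]
print(passes_hosp_check([15.0] * 7, recent))  # True
print(passes_hosp_check([2.0] * 7, recent))   # False
print(included_everywhere({"CA": True, "NY": True, "TX": True, "WA": False}))  # True
```

From the week of December 28, 2020 onward, the per-location result of `passes_hosp_check` would be used directly for each location instead of being aggregated by `included_everywhere`.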