We have one pre-print manuscript that describes the performance of the ensemble forecasts through July 2020, as well as a second pre-print that evaluates the predictive performance of the ensemble and dozens of other models through all of 2020.

Overview

Each week, we generate ensemble forecasts of cumulative and incident COVID-19 deaths, incident COVID-19 cases, and incident COVID-19 hospitalizations over the next four weeks that combine the forecasts from a designated model submitted by each team. This is helpful because it gives a sense of the general consensus forecast across all teams. Previous work in infectious disease forecasting and other fields has also shown that ensemble forecasts are often more accurate than any individual model that went into the ensemble. Readers who are more familiar with the forecasting methods may also find it helpful to explore forecasts from individual models to obtain a more detailed understanding of the underlying uncertainty and the range of projections generated by models built on different assumptions. We published a medRxiv pre-print in August 2020 describing the performance of the ensemble forecast during the first few months of the pandemic.

Summary of how the ensemble is built

Typically on Monday evening or Tuesday morning, we update the COVID-19 Forecast Hub ensemble forecast using the most recent forecast from each team submitted since the previous Tuesday.

The following is a timeline of major changes to the ensemble methods:

  • From April 13 to July 21, 2020, the ensemble was created by taking the arithmetic average of each prediction quantile across all eligible models for a given location.
  • Starting on the week of July 28, 2020, we instead used the median prediction across all eligible models at each quantile level.
  • We created ensemble forecasts for hospitalizations due to COVID-19 for the first time on the week of December 7, 2020.
  • Starting the week of November 15, 2021, we used a weighted ensemble method for forecasts of incident and cumulative deaths. In this ensemble, the ten component models with the best performance, as measured by their Weighted Interval Score (WIS) over the 12 weeks prior to the forecast date, are included. These component models are assigned weights that are a function of their relative WIS during those 12 weeks, with models that have a stronger record of accuracy receiving higher weight.
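The two unweighted combination rules in the timeline above can be sketched in a few lines. The numbers below are hypothetical component forecasts, not Hub data; each row is one model and each column is one quantile level:

```python
import numpy as np

# Hypothetical component forecasts: rows are models, columns are the
# predicted values at quantile levels 0.25, 0.50, and 0.75.
component_forecasts = np.array([
    [120.0, 150.0, 180.0],   # model A
    [100.0, 140.0, 200.0],   # model B
    [130.0, 160.0, 170.0],   # model C
])

# April 13 - July 21, 2020: arithmetic mean at each quantile level.
mean_ensemble = component_forecasts.mean(axis=0)

# From the week of July 28, 2020: median at each quantile level.
median_ensemble = np.median(component_forecasts, axis=0)

print(mean_ensemble)    # [116.66666667 150.         183.33333333]
print(median_ensemble)  # [120. 150. 180.]
```

Combining quantile by quantile keeps the ensemble's predictive distribution well defined (quantiles remain sorted), and the median variant is more robust to a single outlying component model than the mean.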

Detailed eligibility criteria

Forecasts submitted by 3pm ET on Monday are guaranteed consideration for inclusion in the ensemble for that week, as long as the forecast is associated with a date since the previous Tuesday.

To be included in the ensemble, a team’s designated model must meet certain specified inclusion criteria.
We require that forecasts include a full set of 23 quantiles for each of the one- through four-week-ahead values for forecasts of deaths, a full set of 7 quantiles for the one- through four-week-ahead values for forecasts of cases, or a full set of 7 quantiles for the one- through twenty-eight-day-ahead values for forecasts of hospitalizations (see the Technical README for details).
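The completeness check above amounts to verifying that every required quantile level is present for every horizon. The specific levels below are the ones commonly used by the Hub; the Technical README is the authoritative source, and the function name is illustrative rather than the Hub's actual validation code:

```python
# Quantile levels assumed here for illustration (see the Technical README).
# Deaths: 23 levels (0.01, 0.025, 0.05, 0.10, ..., 0.95, 0.975, 0.99).
DEATH_LEVELS = [0.01, 0.025] + [round(0.05 * i, 2) for i in range(1, 20)] + [0.975, 0.99]
# Cases and hospitalizations: 7 levels.
CASE_HOSP_LEVELS = [0.025, 0.1, 0.25, 0.5, 0.75, 0.9, 0.975]

def has_full_quantile_set(submitted_levels, required_levels):
    """Return True if every required quantile level appears in the submission."""
    return set(required_levels) <= set(submitted_levels)

print(len(DEATH_LEVELS), len(CASE_HOSP_LEVELS))  # 23 7
```

In practice this check is applied to each horizon (each of the one- through four-week-ahead or one- through twenty-eight-day-ahead targets) for a given location.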

For forecasts of cumulative deaths, we also perform two additional checks for internal consistency. By definition, cumulative deaths cannot decrease over time (other than possibly because of revisions to reporting). We therefore require that (1) a team assigns at most a 10% chance that cumulative deaths would decrease in their one-week ahead forecasts, and (2) at each quantile level of the predictive distribution, that quantile is constant or increasing over time. Additionally, models that project case or death values that are larger than the population size of the geographic location are not included. Before the week of July 28, we performed manual visual inspection checks to ensure that forecasts were in alignment with the ground truth data; this step is no longer a part of our weekly ensemble generation process. Details on which models were included each week in the ensemble are available on GitHub.

To be eligible for inclusion in the hospitalizations ensemble between December 7, 2020 and July 12, 2021, individual model forecasts had to meet a check for consistency with recent observed data. We have periodically made minor updates to the criteria since the introduction of the ensemble forecasts for hospitalizations, and stopped using these criteria with hospitalization forecasts generated the week of July 19, 2021:

  • On the weeks of December 7, 2020 through December 21, 2020, we required that the mean daily point prediction for a given location during the first seven days (e.g., covering Tuesday, December 8 through Monday, December 14 for forecasts submitted December 7) must be at least as large as the mean reported daily confirmed hospital admissions over the previous seven days minus four times the standard deviation of the reported daily confirmed hospital admissions data for that location over the most recent 14 days. This check was performed separately for each location, but a given model was included in the ensemble for all locations if it passed this check in at least 75% of jurisdictions, and excluded for all locations otherwise.
  • On the weeks of December 28, 2020 and January 4, 2021 we used the check described above, but inclusion was determined separately for each location.
  • Starting on the week of January 11, 2021, the check was based on the mean of the predictive median during the first seven days rather than the mean point prediction. Model inclusion was still determined separately for each location.
  • Starting on the week of February 8, 2021, we updated this check to include an upper bound. The new check required that the mean of the predictive median during the first seven days after the forecast date be within the mean reported daily confirmed hospital admissions over the previous seven days, plus or minus four times the standard deviation of the reported daily confirmed hospital admissions data for that location over the most recent 14 days. This check was performed separately for each location.
  • Starting on the week of February 15, 2021, we updated this check so that the minimum width of the acceptance band is 2 (i.e., if the standard deviation is less than 0.25, we compare to the mean of the observed values over the last 7 days plus or minus 1). This is necessary to produce a forecast for locations that reported all zeros in the last two weeks.
  • Starting on the week of July 19, 2021, these exclusion criteria were not used for the hospitalizations ensemble.
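The final form of the plausibility check (weeks of February 15 through July 12, 2021) can be sketched as below. The function name and inputs are illustrative assumptions, not the Hub's actual code, and it would be applied separately to each location:

```python
import numpy as np

def passes_hospitalization_check(pred_medians_first7, observed_last7, observed_last14):
    """Sketch of the per-location plausibility check for hospitalization forecasts.

    pred_medians_first7: predictive medians for the first seven forecast days.
    observed_last7 / observed_last14: reported daily confirmed hospital
    admissions over the previous 7 and 14 days.
    """
    center = np.mean(observed_last7)
    # Band half-width is 4 standard deviations, with a floor of 1 so the
    # acceptance band is never narrower than 2.
    half_width = max(4 * np.std(observed_last14), 1.0)
    forecast_mean = np.mean(pred_medians_first7)
    return center - half_width <= forecast_mean <= center + half_width
```

For a location reporting all zeros over the last two weeks, the standard deviation is 0, so the floor yields an acceptance band of [-1, 1]: a forecast with mean predictive median 0.5 passes, while one with mean 2.0 is excluded.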

For all checks described above, daily reported hospital admissions are taken from HealthData.gov.