Overview

Each week, we generate ensemble forecasts of incident COVID-19 hospitalizations over the next four weeks that combine the forecasts from different teams. This is helpful because it gives a sense of the general consensus forecast across all teams. Previous work in infectious disease forecasting and other fields has also shown that ensemble forecasts are often more accurate than any individual model that went into the ensemble. Readers who are more familiar with the forecasting methods may also find it helpful to explore forecasts from individual models to obtain a more detailed understanding of the underlying uncertainty and the range of projections generated by models built on different assumptions.

We have one pre-print manuscript that describes the performance of the ensemble forecasts through July 2020, as well as a second pre-print that evaluates the predictive performance of the ensemble and dozens of other models from April 2020 through October 2021. Additionally, we have compared the performance of trained and untrained probabilistic ensemble forecasts of COVID-19 cases and deaths in another pre-print manuscript.

Summary of how the ensembles are built

Ensemble build timing

On Monday afternoon, we update the COVID-19 Forecast Hub ensembles using the most recent submission from each team.

Overview of major updates to methodology

The following is a timeline of major changes to the ensemble methods:

  • From April 13 to July 21, 2020, the ensemble was created by taking the arithmetic average of each prediction quantile for all eligible models for a given location.
  • Starting on the week of July 28, 2020, we instead used the median prediction across all eligible models at each quantile level.
  • We created ensemble forecasts for hospitalizations due to COVID-19 for the first time on the week of December 7, 2020.
  • Starting the week of September 27, 2021, the official COVIDhub-ensemble only generates forecasts of incident cases for forecast horizons up to 1 week ahead and forecasts of incident hospitalizations for forecast horizons up to 14 days ahead.
  • Starting the week of November 15, 2021 we used a weighted ensemble method for forecasts of incident and cumulative deaths. In this ensemble, the ten component models with the best performance as measured by their Weighted Interval Score (WIS) in the 12 weeks prior to the forecast date are included. These component models are assigned weights that are a function of their relative WIS during those 12 weeks, with models that have a stronger record of accuracy receiving higher weight.
  • Starting the week of January 29, 2023 we stopped generating forecasts of incident cases in the trained ensemble.
  • Starting the week of February 20, 2023 we stopped accepting forecasts of incident cases and no longer built an ensemble for case forecasts.
  • Starting the week of March 6, 2023 we stopped generating ensemble forecasts of incident or cumulative deaths.
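The mean-based and median-based combination methods described in the timeline above can be sketched as follows. This is a minimal illustration with made-up numbers for three hypothetical component models, not the Hub's actual implementation:

```python
import numpy as np

# Hypothetical predictions from three component models for one location:
# each row is a model, each column a quantile level (e.g., 0.25, 0.5, 0.75).
component_quantiles = np.array([
    [120.0, 150.0, 190.0],  # model A
    [100.0, 140.0, 200.0],  # model B
    [130.0, 170.0, 260.0],  # model C
])

# Method used April 13 - July 21, 2020: arithmetic mean at each quantile level.
mean_ensemble = component_quantiles.mean(axis=0)

# Method used from the week of July 28, 2020: median at each quantile level.
median_ensemble = np.median(component_quantiles, axis=0)
```

The median is less sensitive than the mean to a single outlying component forecast, which motivated the switch.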

Ensemble models

The forecast hub currently collaborates with CDC on the production of four ensemble forecasts each week:

1. COVIDhub-4_week_ensemble

This ensemble produces forecasts of incident cases (discontinued as of February 2023), incident deaths, and cumulative deaths (both discontinued as of March 2023) at horizons of 1 through 4 weeks ahead, and forecasts of incident hospitalizations at horizons of 1 through 28 days ahead. For all of these targets, the ensemble forecasts are computed as the equally-weighted median of all component forecasts at each location, forecast horizon, and quantile level.

In the past, forecasts of cases and hospitalizations produced by this ensemble have shown unreliable performance at longer horizons. This ensemble is produced for research purposes only, and the forecasts are not intended for use as inputs to public health decision making.

2. COVIDhub-trained_ensemble

This ensemble produces forecasts of incident cases (discontinued as of January 2023), incident deaths and cumulative deaths (both discontinued as of March 2023) at horizons of 1 through 4 weeks ahead, and forecasts of incident hospitalizations at horizons of 1 through 28 days ahead. For all of these targets, the ensemble forecasts are computed as a weighted median of the ten component forecasts with the best performance as measured by their WIS in the 12 weeks prior to the forecast date.
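The model-selection and weighting step can be sketched as below. The data and the top-k of 3 are made up, and the text above describes the weights only as a function of relative WIS, so the inverse-WIS weighting shown here is one simple illustrative choice, not the Hub's actual formula:

```python
# Hypothetical mean WIS over the prior 12 weeks for several models
# (lower WIS = better accuracy).
wis = {"modelA": 20.0, "modelB": 25.0, "modelC": 40.0, "modelD": 80.0}

top_k = 3  # the Hub used the ten best models; three here for brevity
best = sorted(wis, key=wis.get)[:top_k]

# Illustrative weighting: inverse of WIS, normalized to sum to 1, so that
# models with a stronger record of accuracy receive higher weight.
raw = {m: 1.0 / wis[m] for m in best}
total = sum(raw.values())
weights = {m: raw[m] / total for m in best}
```

The resulting weights would then be used in a weighted median of the component quantiles at each location, horizon, and quantile level.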

In the past, forecasts of cases and hospitalizations produced by this ensemble have shown unreliable performance at longer horizons. This ensemble is produced for research purposes only, and the forecasts are not intended for use as inputs to public health decision making.

3. COVIDhub-ensemble

This ensemble produces forecasts of incident cases at a horizon of 1 week ahead (discontinued as of February 2023), forecasts of incident hospitalizations at horizons up to 14 days ahead, and forecasts of incident and cumulative deaths at horizons of up to 4 weeks ahead (discontinued as of March 2023). As documented in this analysis, these are the horizons at which these forecasts have shown fairly reliable performance in past forecasts. The forecasts of hospitalizations (and previously those of cases) are calculated as an equally-weighted median of eligible forecasts, while the forecasts of deaths were calculated as a weighted median of eligible forecasts using the methodology described for the trained ensemble above. The forecasts of incident and cumulative deaths from the COVIDhub-ensemble were therefore identical to the corresponding forecasts from the COVIDhub-trained_ensemble.

4. COVIDhub_CDC-ensemble

This ensemble pulls forecasts of hospitalizations from the COVIDhub-4_week_ensemble. The set of horizons that are included is updated regularly using rules developed by CDC based on recent forecast performance. To be included in the COVIDhub_CDC-ensemble, models must also meet the eligibility criteria for the COVIDhub-4_week_ensemble (see “Detailed Eligibility Criteria” below for an example).

Detailed eligibility criteria

Forecasts submitted by 3pm ET on Monday are guaranteed consideration for inclusion in the ensemble for that week, as long as the forecast date falls on or after the previous Tuesday.

To be included in the ensemble, a team’s designated model must meet certain specified inclusion criteria. We require that forecasts include a full set of predictive quantiles (i.e., 23 for hospitalizations; see Technical README for details). Additionally, forecasts must include all forecast horizons that will be produced by a given ensemble to be included in that ensemble. For example, the COVIDhub-4_week_ensemble requires that a model provide hospitalization forecasts at all horizons from 1 through 28 days ahead to be included, but the COVIDhub-ensemble forecast of hospitalizations only requires forecasts of hospitalizations at horizons of 1 to 14 days ahead for inclusion.
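An eligibility check of this kind might look like the sketch below. This is an illustrative reconstruction, not the Hub's actual validation code; the 23 quantile levels follow the standard set used for hospitalization forecasts (0.01, 0.025, 0.05, 0.10, ..., 0.95, 0.975, 0.99):

```python
# All 23 required quantile levels for hospitalization forecasts.
REQUIRED_QUANTILES = {round(q, 3) for q in
                      [0.01, 0.025]
                      + [i / 100 for i in range(5, 100, 5)]
                      + [0.975, 0.99]}
# The 4-week ensemble requires all horizons of 1 through 28 days ahead.
REQUIRED_HORIZONS = set(range(1, 29))

def eligible(forecast_rows):
    """forecast_rows: iterable of (horizon, quantile) pairs for one model/location."""
    rows = list(forecast_rows)
    # every required horizon must carry the full set of quantile levels
    for h in REQUIRED_HORIZONS:
        qs = {round(q, 3) for hh, q in rows if hh == h}
        if not REQUIRED_QUANTILES <= qs:
            return False
    return True
```

A model missing even one horizon or one quantile level would be excluded from that ensemble under this rule.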

To be included in the COVIDhub_CDC-ensemble, models must also meet the eligibility criteria for the COVIDhub-4_week_ensemble. For example, the COVIDhub_CDC-ensemble may use hospitalization forecasts at horizons up to 25 days, but models are selected from the COVIDhub-4_week_ensemble that requires forecasts of incident hospitalizations at horizons of 1 through 28 days ahead. If a team submits a forecast of incident hospitalizations for 1 to 25 days ahead, it will not be included in the COVIDhub_CDC-ensemble (or the COVIDhub-4_week_ensemble), although it will be included in the COVIDhub-ensemble (which only requires horizons up to 14 days ahead).

There are also some eligibility criteria that have been used specifically for the cumulative deaths and incident hospitalizations targets:

Eligibility criteria for cumulative deaths

Note: Starting the week of March 6, 2023 we stopped accepting death forecasts and no longer build an ensemble for death forecasts.
For forecasts of cumulative deaths, we also perform two additional checks for internal consistency. By definition, cumulative deaths cannot decrease over time (other than possibly because of revisions to reporting). We therefore require that (1) a team assigns at most a 10% chance that cumulative deaths would decrease in their one-week ahead forecasts, and (2) at each quantile level of the predictive distribution, that quantile is constant or increasing over time. Additionally, models that project death values that are larger than the population size of the geographic location are not included. Before the week of July 28, 2020, we performed manual visual inspection checks to ensure that forecasts were in alignment with the ground truth data; this step is no longer a part of our weekly ensemble generation process. Details on which models were included each week in the ensemble are available on GitHub.
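The three checks above can be sketched as simple predicates. The example data are hypothetical, and these are illustrative versions of the checks rather than the Hub's code:

```python
import numpy as np

def prob_decrease_ok(prob_decrease_1wk):
    # (1) at most a 10% chance that cumulative deaths decrease
    # in the one-week-ahead forecast
    return prob_decrease_1wk <= 0.10

def quantiles_monotone_ok(quantile_paths):
    # (2) at each quantile level, the prediction is constant or
    # increasing over time; rows = quantile levels, columns = horizons
    return bool(np.all(np.diff(quantile_paths, axis=1) >= 0))

def below_population_ok(quantile_paths, population):
    # projected deaths may not exceed the location's population
    return bool(np.all(quantile_paths <= population))

# Hypothetical cumulative-death quantile paths over 1-4 week horizons.
paths = np.array([
    [100, 105, 110, 118],  # 0.25 quantile
    [120, 130, 145, 160],  # 0.50 quantile
    [150, 170, 200, 240],  # 0.75 quantile
])
```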

Eligibility criteria for incident hospitalizations

To be eligible for inclusion in the hospitalizations ensemble between December 7, 2020 and July 12, 2021, individual model forecasts had to meet a check for consistency with recent observed data. We have periodically made minor updates to the criteria since the introduction of the ensemble forecasts for hospitalizations, and stopped using these criteria with hospitalization forecasts generated the week of July 19, 2021:

  • On the weeks of December 7, 2020 through December 21, 2020, we required that the mean daily point prediction for a given location during the first seven days (e.g., covering Tuesday, December 8 through Monday, December 14 for forecasts submitted December 7) must be at least as large as the mean reported daily confirmed hospital admissions over the previous seven days minus four times the standard deviation of the reported daily confirmed hospital admissions data for that location over the most recent 14 days. This check was performed separately for each location, but a given model was included in the ensemble for all locations if it passed this check in at least 75% of jurisdictions, and excluded for all locations otherwise.
  • On the weeks of December 28, 2020 and January 4, 2021 we used the check described above, but inclusion was determined separately for each location.
  • Starting on the week of January 11, 2021 the check is based on the mean of the predictive median during the first seven days rather than the mean point prediction. Model inclusions are still determined separately for each location.
  • Starting on the week of February 8, 2021 we updated this check to include an upper bound. The new check requires that the mean of the predictive median during the first seven days after the forecast date is within the mean reported daily confirmed hospital admissions over the previous seven days plus or minus four times the standard deviation of the reported daily confirmed hospital admissions data for that location over the most recent 14 days. This check is performed separately for each location.
  • Starting on the week of February 15, 2021 we updated this check so that the minimum width of the acceptance band is 2 (i.e., we compare to the mean of the observed values over the last 7 days plus or minus 1 if the standard deviation is less than 0.25). This is necessary to produce a forecast for locations with all reported zeros in the last two weeks.
  • Starting on the week of July 19, 2021, these exclusion criteria were not used for the hospitalizations ensemble.

For all checks described above, daily reported hospital admissions are taken from HealthData.gov.
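In its final form (the version used from the week of February 15, 2021), the consistency check above amounts to testing whether the mean of the predictive median over the first seven days falls inside an acceptance band around recent observations. A minimal sketch, with hypothetical data and a simplified interface:

```python
import numpy as np

def passes_band_check(median_forecast_first7, observed_last14):
    """Illustrative version of the final hospitalization eligibility check.

    median_forecast_first7: predictive medians for the first 7 forecast days
    observed_last14: reported daily confirmed admissions, most recent 14 days
    """
    observed_last7 = observed_last14[-7:]
    center = np.mean(observed_last7)
    # half-width is 4 standard deviations of the last 14 observations,
    # floored at 1 so the band is at least 2 wide (the rule added the
    # week of February 15, 2021 for locations reporting all zeros)
    half_width = max(4 * np.std(observed_last14), 1.0)
    forecast_mean = np.mean(median_forecast_first7)
    return center - half_width <= forecast_mean <= center + half_width
```

For a location reporting all zeros over the last two weeks, the floor keeps the band at [-1, 1], so a near-zero forecast still passes while a clearly inconsistent one does not.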