COVID-19 Data Modeling

Free Tool for Smoothing COVID-19 Density Data

COVID-19 density data are generally too noisy to fit to multiple-component density models without pre-smoothing, since the noise spikes fit as individual peaks.
Even the China data can be rendered suitable for multiple component density modeling by an effective smoothing procedure.

From our perspective, most COVID-19 modeling that first processes the data with a smoothing algorithm is not being done especially well. Simple and exponential moving averages are widely used, yet they are the worst possible form of smoothing when identifying an apex or post-apex timing is part of the modeling, since such algorithms introduce lag. With an EMA, large spikes in the data propagate forward and take longer to wash out than smaller spikes. Algorithms that use regression with centrality, such as Loess, are better, but we found the Loess smoothing weak as well: component peaks were not identified, a consequence of both attenuation and the loss of higher-moment information.
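
The lag is easy to see on synthetic data. The following is a minimal sketch, not part of the tool: it builds a noise-free peak with its apex at day 50 (the peak shape, the 7-day window, and the EMA factor of 0.25 are all illustrative assumptions) and reports where each trailing average places the apex.

```python
import math

def trailing_sma(values, window):
    """Trailing simple moving average: each point averages itself and earlier points."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def ema(values, alpha):
    """Exponential moving average, seeded with the first value."""
    out = [values[0]]
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

def apex_index(values):
    """Index of the largest value (first occurrence)."""
    return max(range(len(values)), key=values.__getitem__)

# Synthetic, noise-free density peak with its apex at day 50 (illustrative only).
peak = [math.exp(-((t - 50) ** 2) / (2 * 8.0 ** 2)) for t in range(101)]

true_apex = apex_index(peak)
sma_apex = apex_index(trailing_sma(peak, 7))   # trailing 7-day window (assumed)
ema_apex = apex_index(ema(peak, 0.25))         # smoothing factor 0.25 (assumed)
print(true_apex, sma_apex, ema_apex)
```

For this symmetric peak the trailing 7-day average places the apex three days late, half the window length, and the EMA apex also falls after day 50. A centered filter does not shift a symmetric peak in this way.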

The description of the approach we use for density smoothing is here.

In multiple density fitting, models must be updated as new peaks are detected in the data. For this type of modeling to work effectively, the accuracy of the smoothing must be exceptional. Anything less, and the smaller non-local-maxima peaks (those without an apex of their own, which appear in the rise and decay of the larger local-maxima component peaks) will not be identified and fitted, and the models will not fit to statistical significance in all parameters.

We created our tool using the data furnished from the site:
https://ourworldindata.org/coronavirus-source-data

You will need to use either the GitHub link:
Download our complete COVID-19 dataset (CSV)
or the European CDC link with all four metrics:
https://covid.ourworldindata.org/data/ecdc/full_data.csv

The tool will process either of these CSV files and add eight additional columns to a new CSV file, which you specify once the data have been processed. Columns tcdens15, tcdens17, tcdens19, and tcdens21 contain the smoothed cases density from the three-pass modified Savitzky-Golay procedure with window lengths of 15, 17, 19, and 21 days. Columns tddens15, tddens17, tddens19, and tddens21 contain the smoothed deaths density for the same 15, 17, 19, and 21 day window lengths. The program generates these data for all countries in the CSV file in a single step.
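
For readers who want a feel for what a multi-pass Savitzky-Golay smoother does, here is a minimal sketch. It is a stand-in, not the tool's algorithm: the tool's modifications are not described on this page, so the sketch assumes three passes of a plain quadratic/cubic Savitzky-Golay filter with a window that shrinks near the ends of the series. The function names are hypothetical.

```python
def sg_weights(half_width):
    """Closed-form Savitzky-Golay smoothing weights for a quadratic/cubic fit
    over a centered window of 2*half_width + 1 points."""
    m = half_width
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom
            for i in range(-m, m + 1)]

def sg_pass(values, window):
    """One centered smoothing pass. Near the ends the window shrinks
    symmetrically (an assumption; the tool's edge handling is not stated)."""
    if window % 2 == 0 or window < 3:
        raise ValueError("window must be an odd number of at least 3")
    m = window // 2
    n = len(values)
    out = []
    for i in range(n):
        h = min(m, i, n - 1 - i)          # largest symmetric half-width available
        if h == 0:
            out.append(values[i])         # end points are left unchanged
            continue
        w = sg_weights(h)
        out.append(sum(wk * values[i - h + k] for k, wk in enumerate(w)))
    return out

def sg_smooth(values, window, passes=3):
    """Apply the same filter repeatedly (three passes assumed here)."""
    for _ in range(passes):
        values = sg_pass(values, window)
    return values

def smooth_all_windows(density, windows=(15, 17, 19, 21)):
    """Return one smoothed series per window length, keyed by the length."""
    return {w: sg_smooth(density, w) for w in windows}
```

Calling smooth_all_windows on a daily cases density would give four series that correspond in spirit to the tcdens15 through tcdens21 columns, although the values will not match the tool's output.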

Only the total cases and total deaths columns (the cumulative data) are processed to create the eight additional columns of smoothed densities. The columns with the daily cases and deaths are not used, nor does the program process the additional columns in the GitHub data.
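
As an illustration of how a density can be derived from the cumulative columns, the sketch below differences the running totals for each country. This is only one plausible reading; the tool may obtain the density differently. The column names location, total_cases, and total_deaths are assumptions about the CSV layout, and the sample rows are made-up values, not real counts.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in the assumed layout (not real counts).
SAMPLE_CSV = """date,location,new_cases,new_deaths,total_cases,total_deaths
2020-03-01,China,0,0,100,3
2020-03-02,China,0,0,100,3
2020-03-03,China,30,1,130,4
2020-03-01,United States,0,0,10,0
2020-03-02,United States,5,0,15,0
2020-03-03,United States,12,1,27,1
"""

def densities_by_location(csv_text, column):
    """Difference a cumulative column per location to get a daily density.

    Assumes rows are in date order within each location and treats blank
    cells as zero (both are assumptions for this sketch)."""
    totals = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["location"]].append(float(row[column] or 0))
    return {
        loc: [later - earlier for earlier, later in zip(series, series[1:])]
        for loc, series in totals.items()
    }

cases_density = densities_by_location(SAMPLE_CSV, "total_cases")
deaths_density = densities_by_location(SAMPLE_CSV, "total_deaths")
```

Each resulting series is one element shorter than the cumulative series it came from, since the first day has no earlier total to difference against.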

This is an Excel plot of the four tddens (deaths) columns for the US portion of the smoothed data. The 15-point window retains the greatest detail, but also the most noise (the lowest S/N). The 21-point window is exceptionally smooth, but detail is somewhat diminished.
This is an Excel plot of the four tcdens (cases) columns for the US portion of the smoothed data.

If you add your own data, use the European CDC data with its four columns. Import the file into Excel, add your own data to the total cases or total deaths columns, and re-save the file as a CSV. You can insert anything you like (hospital admissions, discharges, ICU cases, etc.) into the columns to be processed.

The download consists of a single 32-bit Windows exe file. You will probably see a browser warning during the download. The density.exe file is digitally signed and safe; it will not run if it has been tampered with in any way. You can place this file anywhere you wish on your machine. There are only two steps: the selection of the input data file and of the output data file.

We found that the 17-length filter was usually the best for multidensity fitting. To fit multiple densities to smoothed COVID-19 data, with a peak containing a third moment or skewness parameter, it was generally necessary to get in the vicinity of 20 ppm unaccounted variance (r²=0.999980) in order for all parameters in all peaks, including the smallest ones, to fit to a 95% statistical significance. We found this smoothing essential for multi-component density fits of the China data. An example is here.
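
The relation between r² and unaccounted variance in ppm quoted above is simple arithmetic: ppm = (1 - r²) × 1,000,000. A small sketch, with hypothetical function names, for checking your own fits:

```python
def r_squared(observed, fitted):
    """Coefficient of determination of a fit against the observed data."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def unaccounted_ppm(r2):
    """Unaccounted variance, in parts per million, for a given r-squared."""
    return (1.0 - r2) * 1e6

print(round(unaccounted_ppm(0.999980), 3))  # 20.0
```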

Please bear in mind density modeling only projects the current state of affairs in a kind of best-case scenario where no additional peaks are assumed to subsequently occur. This smoothing method, however, will allow newly appearing peaks to be swiftly detected, as a new peak must be added at the right of the data in order to bring the statistical fit into significance in all parameters.

Please email questions or comments to ron@aistsw.com.