In the following discussion we turn the clock one month forward and explore whether the patterns we found are indeed able to predict a higher than average growth in the COVID-19 death rates.
The initial analyses were conducted using data from May 10, 2020. On that day the US county-wise COVID-19 death rate average was 16.8 deaths per 100k population; on June 10 that number grew to 24.1.
The bar chart on the left compares these US-wide numbers with those obtained for the three patterns -- county sets #1-3 -- we discussed earlier. While these sets already had higher than average death rates in May, this discrepancy became even more dramatic in June.
The May to June increase of the death rate for all US counties was 7.3 deaths/100k on average. By contrast, the death rate increase for county sets #1-3 was consistently higher, at 2-3 times the US-wide growth.
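The arithmetic behind these figures can be sketched as follows. The US-wide averages are the ones quoted above; the per-set averages appear only in the bar chart, so the set values used here are hypothetical placeholders, and `growth_factor` is an illustrative helper name rather than part of our software.

```python
# US-wide county-average death rates (deaths per 100k), as quoted in the text.
US_MAY = 16.8
US_JUNE = 24.1


def growth_factor(set_may: float, set_june: float,
                  us_may: float = US_MAY, us_june: float = US_JUNE) -> float:
    """Return a county set's May-to-June increase as a multiple of the US-wide increase."""
    us_increase = us_june - us_may
    return (set_june - set_may) / us_increase


us_increase = round(US_JUNE - US_MAY, 1)
print(us_increase)  # 7.3

# Hypothetical set averages, for illustration only (not values from the analysis).
example_factor = growth_factor(set_may=30.0, set_june=48.25)
print(round(example_factor, 2))  # 2.5
```

A factor of 1.0 would mean a set grew exactly in line with the US-wide average; the sets discussed here fall between 2 and 3.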
The bar chart on the right visualizes this remarkable difference. It impressively demonstrates the power of our software to identify patterns in the data that will exhibit higher than average growth in the target variable. In the following we provide more detailed information for the three sets of counties we used as examples -- sets #1-3.
We continued to monitor the county COVID-19 death rates. The US-wide average on July 10 was 28.3 deaths per 100k. The bar chart above shows that July widened the gap between the US average death rate and that of our three sets of counties even further.
The bar chart above clearly shows that the three county sets we selected as examples also grew at 2-4 times the US average rate in July. While set #2 grew somewhat more slowly in June, it exploded in July. These charts confirm the long-term value of our predictions.
This pattern (set #1) contains 106 US counties, all characterized by an aging population with a high poverty rate living in rural areas (see this page for more detail). In May 2020, 45 (42%) of these counties had a COVID-19 death rate greater than the then-prevailing US county average. In June this number grew to 53 (50%), with 9 counties joining this subset and only one county leaving it, and that one only barely.
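The bookkeeping behind these counts can be sketched as below. This is a minimal illustration under assumptions: the function name, the county identifiers, and the toy rates are all hypothetical; only the US averages and the reported counts (45, 9, 1, 53, 106) come from the text.

```python
def above_average_transitions(may_rates, june_rates, us_may, us_june):
    """Count counties above the US average in each month and those that crossed it.

    may_rates and june_rates map a county identifier to its death rate per 100k.
    """
    above_may = {c for c, r in may_rates.items() if r > us_may}
    above_june = {c for c, r in june_rates.items() if r > us_june}
    return {
        "above_may": len(above_may),
        "above_june": len(above_june),
        "joined": len(above_june - above_may),  # below average in May, above in June
        "left": len(above_may - above_june),    # above average in May, below in June
    }


# Toy data with hypothetical counties and rates, for illustration only.
may = {"A": 20.0, "B": 10.0, "C": 18.0, "D": 5.0}
june = {"A": 35.0, "B": 30.0, "C": 23.0, "D": 6.0}
counts = above_average_transitions(may, june, us_may=16.8, us_june=24.1)
print(counts)

# Consistency check on the figures reported in the text: 45 + 9 - 1 = 53 of 106.
june_count = 45 + 9 - 1
print(june_count, round(100 * 45 / 106), round(100 * june_count / 106))  # 53 42 50
```

Each month is compared against its own US average, so a county can leave the subset even if its death rate rose, as county "C" does in the toy data.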
The figure on the left plots the June county death rates sorted by the May county death rates, with the respective US-wide county death rate as the zero line. We chose a line plot over a bar chart since it better shows the peaks (each tick on the x-axis is one county).
We call the plot 'adjusted' since the blue curve represents the May death rates minus the US average in May, and the red curve represents the June death rates minus the US average in June.
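The construction of the two adjusted curves can be sketched as follows. The function name and the toy county data are assumptions made for illustration; the US averages are the ones quoted earlier.

```python
def adjusted_curves(may_rates, june_rates, us_may, us_june):
    """Return (counties, may_adjusted, june_adjusted), ordered by May death rate.

    Each adjusted value is the county's death rate minus that month's US average,
    so zero on the y-axis corresponds to the US average of the respective month.
    """
    order = sorted(may_rates, key=may_rates.get)
    may_adj = [may_rates[c] - us_may for c in order]
    june_adj = [june_rates[c] - us_june for c in order]
    return order, may_adj, june_adj


# Toy data with hypothetical counties and rates, for illustration only.
may = {"A": 20.0, "B": 10.0, "C": 40.0}
june = {"A": 50.0, "B": 12.0, "C": 44.0}
order, may_adj, june_adj = adjusted_curves(may, june, us_may=16.8, us_june=24.1)

print(order)                            # ['B', 'A', 'C']
print([round(v, 1) for v in may_adj])   # [-6.8, 3.2, 23.2]
print([round(v, 1) for v in june_adj])  # [-12.1, 25.9, 19.9]
```

Because the x-axis order is fixed by the May values, the blue curve rises monotonically while the red curve shows where June departed from that ordering.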
In this plot it can be clearly observed that the increases in county death rates from May to June were far larger than the declines. The pattern clearly predicts these strong upward trends.
The figure on the right makes this even clearer. Here we plot the COVID-19 death rate growth for each county in the pattern, adjusted for the US-wide growth. This is the growth attributable to the pattern itself, corrected for the overall trend. While some counties trended slightly downward, a large number experienced strong upward growth.
This stark contrast is also expressed in the two one-sided standard deviations, where the standard deviation for the counties experiencing positive adjusted growth is 43.7 deaths/100k, while the standard deviation for the others is a mere 6.0 (13%).
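One way to compute such figures is sketched below. The text does not spell out how the one-sided standard deviation is defined, so this sketch assumes it is the root-mean-square deviation from zero, computed separately over the counties with positive and with non-positive adjusted growth. The function names and toy rates are hypothetical.

```python
import math


def adjusted_growth(may_rates, june_rates, us_may, us_june):
    """Per-county May-to-June growth minus the US-wide growth over the same period."""
    us_growth = us_june - us_may
    return {c: (june_rates[c] - may_rates[c]) - us_growth for c in may_rates}


def one_sided_std(values, positive=True):
    """Root-mean-square of the values on one side of zero (assumed definition)."""
    side = [v for v in values if (v > 0) == positive]
    if not side:
        return 0.0
    return math.sqrt(sum(v * v for v in side) / len(side))


# Toy data with hypothetical counties and rates, for illustration only.
may = {"A": 10.0, "B": 20.0, "C": 30.0}
june = {"A": 20.3, "B": 31.3, "C": 36.3}
growth = adjusted_growth(may, june, us_may=16.8, us_june=24.1)

up = one_sided_std(growth.values(), positive=True)
down = one_sided_std(growth.values(), positive=False)
print(round(up, 2), round(down, 2))  # 3.54 1.0
```

Measuring both sides against zero rather than against their own means keeps the two numbers directly comparable: each expresses the typical distance from the US-wide trend.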
The 115 US counties in this pattern (set #2) share three common attributes: their residents tend to sleep less than recommended by the CDC, they are under-educated, and they tend to lack health insurance (refer to this page on how we derived this).
In May 2020, 51 (44%) of these counties had a COVID-19 death rate greater than the then-prevailing US county average. In June this number grew to 56 (49%) with 25 counties joining this subset, but 20 leaving it. So this pattern is somewhat more diverse than the one in set #1 above.
The figure on the left plots the June county death rates sorted by the May county death rates, with the respective US-wide county death rate as the zero line.
Even though many counties appear to have recovered from their above-average COVID-19 death rates, those that have not, or that are only now seeing the impact of the spread, have been experiencing strong increases. This can be clearly observed in the strong upward spikes.
The plot of the adjusted growth rate makes this even clearer. While some counties trended downward (left side), a larger number of them experienced strong upward growth (right side).
The standard deviation for the counties experiencing positive adjusted growth is 54.4 deaths/100k and the standard deviation for the others is 21.6 (40%). While the difference between the two is not as impressive as for set #1, it is still substantial and again shows that our pattern can predict upward trends that go well beyond the overall trend.
This pattern (set #3) is composed of 86 counties whose common traits are a low Asian but high minority population in which an above-average number of black children live in poverty (this page provides more detail on how we derived this).
In May 2020, 44 (51%) of these counties had a COVID-19 death rate greater than the then-prevailing US county average. In June this number grew to 51 (59%) with 9 counties joining this subset and only 2 leaving it.
As before, the figure on the left plots the June county death rates sorted by the May county death rates, with the respective US-wide county death rate as the zero line.
It is clear that there is a strong upward trend from May to June. Counties that already had a high death rate grew even more vigorously, while those that declined did so only minimally. We can also see a significant uptick in the death rate for counties that were on the low end in May.
The plot of the adjusted growth rate makes this even clearer. Only two counties (left side of the plot) showed a significant downward trend in the death rate. About half of the counties showed modest growth or decline, while the other half exhibited strong or very strong growth.
The standard deviation for the counties experiencing positive adjusted growth is 26.5 deaths/100k, while the standard deviation for the others is 8.5 (32%). Again, this shows that our pattern mining engine can predict upward trends that are independent of the overall trend.
Our analysis in May identified a total of 279 sets of counties that all had COVID-19 death rates above the US average. In June, 273 of these sets experienced a death rate growth in excess of the US average. The remaining 6 sets grew at the US average.
In other words, our patterns were able to predict extraordinary growth 98% of the time, while the remaining 2% grew at the average pace, and none grew more slowly than the US average.
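The percentages follow directly from the counts given above, as this short check shows. The variable and function names are illustrative only.

```python
# Counts reported in the text for the May-to-June comparison.
TOTAL_SETS = 279
ABOVE_US_GROWTH = 273
AT_US_GROWTH = TOTAL_SETS - ABOVE_US_GROWTH  # the remaining 6 sets


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


above_pct = round(share(ABOVE_US_GROWTH, TOTAL_SETS))
at_pct = round(share(AT_US_GROWTH, TOTAL_SETS))
print(AT_US_GROWTH, above_pct, at_pct)  # 6 98 2
```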
In July the predictions continued to be very reliable. The patterns we found in May predicted growth above the US average 80% of the time in July. No pattern fell below the US average growth. We purposely did not refresh our predictions with new data in order to show this persistence.
The examples we have presented here clearly demonstrate that our software can identify patterns in high-dimensional data that have high predictive power. In the case of COVID-19 we identified counties that fit a certain profile but that the spread of the disease had not yet reached. Others that were already affected grew even more. Recognizing these trends early can be of tremendous help in planning and advocating for the appropriate set of resources to ease the impact of the disease.
It is important to note that these analyses are not specific to COVID-19 or even county-related data. Our software can give the same kinds of insights for any domain. There are always candidates that have potential but have not risen to that potential yet. They are prospects that fit a certain profile. But the hard question is what this critical profile actually is when there are dozens, or hundreds, or even thousands of parameters.
Our software can help analysts navigate this jungle, find what really matters, and explain it in simple terms. Contact us here to discuss your specific scenario.