We used the pattern mining engine of our software to analyze a prominent data set with about 500 attributes, covering demographics, economics, infrastructure, and more, for all 3,007 US counties. We were interested in why some counties experience higher COVID-19 death rates -- the number of deaths relative to population size.
Our findings yield a better understanding of this raging pandemic. They can help local authorities predict future COVID-19 death rates, inform health policy on important correlations, guide the allocation of resources such as testing kits and stations, and aid in targeted community information campaigns.
Our analysis reveals that it is rarely just one feature that exposes a county to a higher than average COVID-19 death rate. Rather, it is usually a combination of features that, when true at the same time, provides a vivid narrative of these fateful circumstances.
In most cases, the number of features required to sufficiently describe a pattern is just a few. This makes them easy to explain, leading to a better understanding of the hidden processes and relations. Essentially, each pattern is a knowledge facet told in the domain's language.
Our AI algorithm automatically identified 297 sets of counties. The COVID-19 Risk Alert Map on the left colors these counties in shades of blue; darker shaded counties appear in more than one pattern.
We found that 985 US counties are at high risk, and Mississippi, Louisiana, and Georgia have the highest density of high-risk counties at a coverage of 80-90%.
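The map shading and the per-state coverage described above can be computed from pattern membership alone. The following is a minimal sketch, not our production code: the pattern ids, the county identifiers, and the function names `membership_counts` and `state_coverage` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: pattern id -> set of county ids of the form "STATE-NNN".
patterns = {
    "p1": {"MS-001", "MS-002", "LA-001"},
    "p2": {"MS-001", "GA-001"},
    "p3": {"LA-001", "MS-001"},
}

# Hypothetical list of all counties under consideration.
all_counties = ["MS-001", "MS-002", "MS-003", "LA-001", "LA-002", "GA-001"]


def membership_counts(patterns):
    """Count in how many patterns each county appears (drives the map shading)."""
    counts = Counter()
    for members in patterns.values():
        counts.update(members)
    return counts


def state_coverage(counts, all_counties):
    """Fraction of each state's counties that appear in at least one pattern."""
    total = defaultdict(int)
    flagged = defaultdict(int)
    for county in all_counties:
        state = county.split("-")[0]
        total[state] += 1
        if counts[county] > 0:
            flagged[state] += 1
    return {state: flagged[state] / total[state] for state in total}


counts = membership_counts(patterns)
coverage = state_coverage(counts, all_counties)

for state, share in sorted(coverage.items()):
    print(f"{state}: {share:.0%} of counties appear in at least one pattern")
```

A county with a higher count would be drawn in a darker shade, and a state's coverage is the share of its counties with a count of at least one.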
In the following, we present three examples from the set of 297 patterns we found. In our analysis, we only considered counties that had at least one COVID-19 death.
This is a map of US counties that all have one thing in common -- on average they have a higher COVID-19 death rate than all of the US counties averaged together. Follow this link to learn why these counties are at high risk. What else do they have in common?
There is more than one set of features that could put a county at higher risk. On the left is another map of US counties for which our software has identified a feature pattern that is associated with a higher than usual COVID-19 death rate on average. Follow this link to learn why these counties are at high risk. What are their common properties?
Here is a third map of counties unified by a set of features that indicate an unusually high COVID-19 death rate on average. Click this link to learn what these critical features and their value ranges are.
Correlation can reveal a linear association between two variables, such as exercise and health. But important correlations often stay hidden from conventional correlation analysis, which uses all data points (here, the counties) indiscriminately. In contrast, our pattern mining engine can reveal sets of counties where certain important correlations hold and so enable more targeted COVID-19 testing and health policy making.
The scatterplot on the left suggests that there is no correlation between severe housing cost burden and COVID-19 death rate; the correlation is a mere 10%. But in fact there are county patterns where such a correlation holds. Click this link to find out what they are.
Similarly, per the scatterplot on the left, there is no apparent correlation between a county's unemployment rate and its COVID-19 death rate; the correlation factor is just 13%. But do not rush to this conclusion; we found that there IS a correlation but only for counties that fulfill certain population criteria, as elaborated on this page.
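The effect described above, a weak correlation over all counties that becomes strong within a subgroup, can be illustrated with a small sketch. The data below is made up and deliberately contrived, the field names and the population threshold are assumptions, and the Pearson coefficient is used here only as a standard stand-in; it is not a description of how our engine finds such subgroups.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical counties: (population, unemployment rate in %, death rate).
counties = [
    (900_000, 2.0, 1.0),
    (750_000, 4.0, 2.0),
    (820_000, 6.0, 3.0),
    (600_000, 8.0, 4.0),
    (20_000, 2.0, 3.0),
    (35_000, 4.0, 4.0),
    (15_000, 6.0, 1.0),
    (40_000, 8.0, 2.0),
]

# Assumed population criterion that defines the subgroup.
POPULATION_THRESHOLD = 500_000

# Correlation over all counties, used indiscriminately.
r_all = pearson([c[1] for c in counties], [c[2] for c in counties])

# Correlation restricted to counties that meet the population criterion.
subgroup = [c for c in counties if c[0] >= POPULATION_THRESHOLD]
r_subgroup = pearson([c[1] for c in subgroup], [c[2] for c in subgroup])

print(f"all counties: r = {r_all:.2f}")
print(f"subgroup:     r = {r_subgroup:.2f}")
```

In this toy data the relation between unemployment and death rate is clear among the populous counties but is masked when the small counties, which follow a different trend, are mixed in.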
Prediction is the ultimate goal of data analytics. For the ongoing pandemic, health officials are highly interested in identifying the US counties where the COVID-19 death rate might spike next. It would allow authorities to direct test kits, allocate hospital care, increase contact tracing, alert the community, and so on. The patterns we find are essentially predictions of the response variable -- higher than average COVID-19 death rate.
We followed the COVID-19 county data over time. We found that the pattern descriptions did not change much -- solid evidence that our pattern mining engine delivers statistically robust and reliable results. The three maps to the left show the three sets of counties, sets 1-3, at the time of initial analysis (May 10) and one month later (June 10).
We see that for quite a few of these counties the COVID-19 death rate has markedly increased; the light blue coloring has turned to dark blue. Other counties previously not affected, but fitting the respective pattern profile, have now seen their first COVID-19 fatalities. They were previously invisible (grey) but are now shaded in light blue.
This shows our software's capability to predict a county's future fate in the ongoing pandemic. The three sets highlighted here were randomly selected from the patterns we found; we did not 'cherry pick' the best results. For all patterns, the average COVID-19 death rate increased 2-3 times the US average during this time frame. Click here to learn more.
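One simple way to check a pattern against later data is to compare how much the death rate rose inside the pattern with how much it rose across all counties. The sketch below shows this comparison on made-up numbers; the county ids, the rates, and the function name `mean_increase` are hypothetical, and this is only one possible way to quantify the change, not necessarily the computation behind the figures above.

```python
def mean_increase(county_ids, rate_before, rate_after):
    """Average change in death rate between two snapshots for the given counties."""
    changes = [rate_after[c] - rate_before[c] for c in county_ids]
    return sum(changes) / len(changes)


# Hypothetical death rates (deaths per 100,000 residents) at two snapshots.
rate_may = {"c1": 5.0, "c2": 0.0, "c3": 2.0, "c4": 1.0, "c5": 1.0, "c6": 3.0}
rate_june = {"c1": 11.0, "c2": 4.0, "c3": 10.0, "c4": 2.0, "c5": 3.0, "c6": 6.0}

# Hypothetical pattern: counties that matched the pattern at the first snapshot.
pattern = {"c1", "c2", "c3"}

pattern_increase = mean_increase(pattern, rate_may, rate_june)
overall_increase = mean_increase(rate_may.keys(), rate_may, rate_june)

# A ratio above 1 means the pattern's counties worsened faster than average.
ratio = pattern_increase / overall_increase
print(f"pattern: +{pattern_increase:.1f}, overall: +{overall_increase:.1f}, ratio: {ratio:.2f}")
```

Note that county "c2" had no recorded deaths at the first snapshot but already matched the pattern, which mirrors the case of counties that turn from grey to blue on the maps.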
We can think of every US county as an (observational) experiment; each has certain characteristics that make it unique, and similar to some others at the same time. Our pattern mining engine looks for regions in this feature space that are occupied by similar counties that all respond in a similar way to a given target variable of interest -- the COVID-19 death rate.
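To make the idea of a region in feature space concrete, the sketch below represents a pattern as a small set of feature ranges that must all hold at once, selects the counties inside that region, and compares their average death rate with the overall average. It only shows how a given pattern could be evaluated, not how patterns are discovered. The feature names, the values, and the names `Condition`, `matches`, and `pattern_lift` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One feature range of a pattern: low <= value <= high."""
    feature: str
    low: float
    high: float

    def holds(self, county):
        return self.low <= county[self.feature] <= self.high


def matches(county, conditions):
    """A county lies in the pattern's region only if every condition holds."""
    return all(cond.holds(county) for cond in conditions)


def pattern_lift(counties, conditions, target):
    """Return (matching counties, pattern mean of target / overall mean of target)."""
    selected = [c for c in counties if matches(c, conditions)]
    if not selected:
        return selected, None
    overall_mean = sum(c[target] for c in counties) / len(counties)
    pattern_mean = sum(c[target] for c in selected) / len(selected)
    return selected, pattern_mean / overall_mean


# Hypothetical counties with two made-up features and a made-up death rate.
counties = [
    {"pct_over_65": 22.0, "median_income": 38_000, "death_rate": 30.0},
    {"pct_over_65": 24.0, "median_income": 41_000, "death_rate": 26.0},
    {"pct_over_65": 21.0, "median_income": 70_000, "death_rate": 8.0},
    {"pct_over_65": 12.0, "median_income": 39_000, "death_rate": 10.0},
    {"pct_over_65": 14.0, "median_income": 65_000, "death_rate": 6.0},
]

# Hypothetical pattern: an older population AND a low median income.
conditions = [
    Condition("pct_over_65", 20.0, 100.0),
    Condition("median_income", 0, 45_000),
]

selected, lift = pattern_lift(counties, conditions, "death_rate")
print(f"{len(selected)} counties match; lift over the overall average: {lift:.2f}")
```

Neither condition alone separates the high death rate counties in this toy data; only the combination of the two does, which is the point made earlier about combinations of features.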
The criteria that determine ‘similarity’ are grounded in sophisticated statistical pattern mining -- a core technology we market. It can be applied to any domain, not just to predict the outcomes for a pandemic disease. Contact us to find out how we can help you to find important features in your data. Chances are high that we can help you.
If you would like to know more about our approach to pattern mining in high-dimensional feature spaces, please visit this page. It offers a gentle introduction to the subject. For a more rigorous treatment, we plan to post a few technical briefs on our technology in the near future.
Copyright © 2020 Akai Kaeru, LLC - All Rights Reserved.