COVID-19 ANALYTICS: WHO IS AT RISK? WHO WILL BE NEXT?

We used the pattern mining engine of our software to analyze a prominent data set with about 500 attributes covering demographics, economics, infrastructure, and more for all of the 3,007 US counties. We wanted to understand why some counties experience higher COVID-19 death rates, where the death rate is the number of deaths relative to population size.

Our findings yield a better understanding of this raging pandemic. They can help local authorities predict future COVID-19 death rates, inform health policy about important correlations, guide the allocation of resources such as testing kits and stations, and aid targeted community information campaigns.

Pattern analysis -- finding the critical factors that identify counties at risk

Our analysis reveals that it is rarely just one feature that exposes a county to a higher-than-average COVID-19 death rate. Rather, it is usually a combination of features that, when true at the same time, provides a vivid narrative of these fateful circumstances.

In most cases, only a few features are needed to describe a pattern sufficiently. This makes the patterns easy to explain and leads to a better understanding of the hidden processes and relations. Essentially, each pattern is a knowledge facet told in the domain’s language.
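The idea of a pattern as a small combination of feature conditions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our engine: the counties, the feature names (`pct_over_65`, `median_income`), the thresholds, and the choice to average per-county rates are all invented for the example.

```python
from statistics import mean

# Toy data, invented for illustration; feature names are hypothetical.
counties = [
    {"name": "A", "pct_over_65": 22.0, "median_income": 31_000, "deaths": 10, "population": 10_000},
    {"name": "B", "pct_over_65": 20.0, "median_income": 35_000, "deaths": 16, "population": 20_000},
    {"name": "C", "pct_over_65": 12.0, "median_income": 60_000, "deaths": 10, "population": 50_000},
    {"name": "D", "pct_over_65": 25.0, "median_income": 70_000, "deaths": 8, "population": 40_000},
    {"name": "E", "pct_over_65": 10.0, "median_income": 33_000, "deaths": 30, "population": 100_000},
]


def death_rate(county, per=100_000):
    """Deaths relative to population size, scaled to deaths per `per` residents."""
    return county["deaths"] * per / county["population"]


def matches(county, pattern):
    """A pattern is a list of (feature, low, high) ranges that must all hold."""
    return all(low <= county[feature] <= high for feature, low, high in pattern)


def evaluate(counties, pattern):
    """Compare the mean death rate inside the pattern with the mean over all counties."""
    members = [c for c in counties if matches(c, pattern)]
    overall = mean(death_rate(c) for c in counties)
    inside = mean(death_rate(c) for c in members)  # raises if no county matches
    return {
        "members": [c["name"] for c in members],
        "pattern_rate": inside,
        "overall_rate": overall,
        "lift": inside / overall,
    }


# Two single-feature conditions and their combination (assumed thresholds).
older = [("pct_over_65", 18.0, float("inf"))]
poorer = [("median_income", 0, 40_000)]
combined = older + poorer

for label, pattern in [("older", older), ("poorer", poorer), ("combined", combined)]:
    result = evaluate(counties, pattern)
    print(label, result["members"], round(result["lift"], 2))
```

In this toy data each single condition raises the average death rate only moderately, while the two conditions together single out the counties with the highest rates, which is the kind of effect described above.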

In May, our AI algorithm automatically identified 297 patterns, each describing a set of US counties. We found that 985 US counties are at high risk, and that Mississippi, Louisiana, and Georgia have the highest density of high-risk counties, at a coverage of 80-90%. These numbers have changed somewhat as the pandemic raged on.
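A coverage figure of this kind can be computed as the share of a state's counties that fall into at least one risk pattern. The sketch below assumes exactly that definition; the county identifiers, state labels, and pattern memberships are made up for illustration.

```python
from collections import Counter

# Hypothetical mapping of county id to state, and hypothetical pattern memberships.
county_state = {
    "c1": "S1", "c2": "S1", "c3": "S1", "c4": "S1",
    "c5": "S2", "c6": "S2", "c7": "S2", "c8": "S2",
}
pattern_members = [{"c1", "c2"}, {"c2", "c3", "c5"}]

# Assumption: a county counts as high risk if it belongs to at least one pattern.
high_risk = set().union(*pattern_members)


def state_coverage(county_state, high_risk):
    """Fraction of each state's counties that are in the high-risk set."""
    totals = Counter(county_state.values())
    hits = Counter(county_state[c] for c in high_risk)
    return {state: hits[state] / totals[state] for state in totals}


coverage = state_coverage(county_state, high_risk)
print(coverage)
```

Patterns may overlap, so the union is taken first; counting memberships per pattern would count some counties twice.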

In the next blog pieces, we present three examples from the set of 297 patterns we found in our analysis in May. We only considered counties that had recorded at least one COVID-19 death. Then we present two examples of correlation patterns, and finally show how well our patterns were able to predict death rates in later months.

OUR COVID-19 RISK EXPLORER DASHBOARD

We designed the COVID-19 Risk Explorer shown on the left to allow users to explore the counties and their risk patterns, if any. This particular image shows the top three socio-economic risk factors for Chickasaw County, MS, with its death rate curve on the top right. We see that this county matches many risk patterns, which can be selected in the bottom panel of the dashboard and viewed in the middle panel on the right.

Other counties that have the selected risk pattern are also shaded (according to death rate). This enables users to make predictions and draw analogies for possible interventions. To start exploring the risk patterns of your county, simply click here and a new tab or window with our dashboard will open.

DETAILED ANALYSES OF SOME EXAMPLE PATTERNS

In the following, we present three examples from the set of 297 patterns we found in our analysis in May. We only considered counties that had recorded at least one COVID-19 death. Then we present two examples of correlation patterns, and finally show how well our patterns were able to predict death rates in later months.

Counties at high risk: set #1

This is a map of US counties that all have one thing in common: as a group, their average COVID-19 death rate is higher than the average across all US counties. Follow this link to learn why these counties are at high risk. What else do they have in common?

Counties at high risk: set #2

There is more than one set of features that could put a county at higher risk. On the left is another map of US counties for which our software has identified a feature pattern that is associated with a higher-than-usual COVID-19 death rate on average. Follow this link to learn why these counties are at high risk. What are their common properties?

Counties at high risk: set #3

Here is a third map of counties unified by a set of features that indicate an unusually high COVID-19 death rate on average. Click this link to learn what these critical features and their value ranges are.

Correlation analysis -- finding factors that amplify risk

Correlation can reveal a linear association between two variables, such as exercise and health. But important correlations often stay hidden in conventional correlation analysis, which uses all data points (here, the counties) indiscriminately. In contrast, our pattern mining engine can reveal sets of counties where certain important correlations hold, and so enable more targeted COVID-19 testing and health policy making.
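A small numeric sketch shows how a correlation can be nearly invisible over all data points yet strong within a subset. The numbers and the `in_pattern` flag below are invented for illustration; how our engine actually finds such subsets is not shown here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented rows: (housing cost burden in %, death rate per 100k, in_pattern flag).
rows = [
    (10, 20, True), (12, 25, True), (14, 29, True), (16, 36, True), (18, 40, True),
    (10, 50, False), (12, 20, False), (14, 45, False), (16, 15, False), (18, 30, False),
]

overall = pearson([r[0] for r in rows], [r[1] for r in rows])
subset = [r for r in rows if r[2]]
within = pearson([r[0] for r in subset], [r[1] for r in subset])

print(f"all counties: r = {overall:.2f}")
print(f"pattern only: r = {within:.2f}")
```

Here the coefficient over all ten rows is close to zero, while the five rows flagged as belonging to the pattern show a nearly linear relationship.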

Counties where risk correlates with a factor: set #4

The scatterplot on the left suggests that there is no apparent correlation between severe housing cost burden and COVID-19 death rate; the correlation coefficient is a mere 10%. But in fact there are county patterns in which such a correlation holds. Click this link to find out what they are.

Counties where risk correlates with a factor: set #5

Similarly, per the scatterplot on the left, there is no apparent correlation between a county’s unemployment rate and its COVID-19 death rate; the correlation coefficient is just 13%. But do not rush to this conclusion: we found that there IS a correlation, but only for counties that fulfill certain population criteria, as elaborated on this page.

Predictive analysis -- identifying counties that will be impacted soon

Prediction is the ultimate goal of data analytics. For the ongoing pandemic, health officials are highly interested in identifying the US counties where the COVID-19 death rate might spike next. This would allow authorities to direct test kits, allocate hospital care, increase contact tracing, alert the community, and so on. The patterns we find are essentially predictions of the response variable: a higher-than-average COVID-19 death rate.

Our patterns can predict COVID-19 death rate!

We followed the COVID-19 county data over time. We found that the pattern descriptions did not change much, which we take as solid evidence that our pattern mining engine delivers statistically robust and reliable results. The three maps to the left show the three sets of counties, sets 1-3, at the time of initial analysis (May 10) and one month later (June 10).
We see that for quite a few of these counties the COVID-19 death rate has markedly increased; the light blue coloring has turned to dark blue. Other counties that were previously not affected, but fit the respective pattern profile, have now seen their first COVID-19 fatalities. They were previously invisible (grey) but are now shaded in light blue.

This shows our software’s capability to predict a county’s future fate in the ongoing pandemic. The three sets highlighted here were randomly selected from the patterns we found; we did not ‘cherry pick’ the best results. For all patterns, the average COVID-19 death rate increased 2-3 times the US average during this time frame. Click here to learn more.
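A comparison between two snapshots like this can be sketched as follows. The county labels and rates are invented, and the summary statistics chosen here (change in mean death rate inside the pattern versus across all counties, plus counties that went from zero to a nonzero rate) are just one possible way to quantify the shift, not necessarily the one behind the figures quoted above.

```python
from statistics import mean

# Invented rows: (county, fits_pattern, deaths per 100k on May 10, on June 10).
snapshots = [
    ("P1", True, 10.0, 30.0),
    ("P2", True, 0.0, 8.0),
    ("P3", True, 20.0, 40.0),
    ("Q1", False, 5.0, 8.0),
    ("Q2", False, 0.0, 0.0),
    ("Q3", False, 4.0, 7.0),
]


def summarize(rows):
    """Return (pattern increase, overall increase, newly affected pattern counties)."""
    members = [r for r in rows if r[1]]
    pattern_increase = mean(r[3] for r in members) - mean(r[2] for r in members)
    overall_increase = mean(r[3] for r in rows) - mean(r[2] for r in rows)
    # Counties that fit the pattern, had no deaths in May, and have some in June.
    newly_affected = [r[0] for r in members if r[2] == 0 and r[3] > 0]
    return pattern_increase, overall_increase, newly_affected


pattern_increase, overall_increase, newly_affected = summarize(snapshots)
print(f"pattern counties: +{pattern_increase:.1f} per 100k")
print(f"all counties:     +{overall_increase:.1f} per 100k")
print("newly affected:", newly_affected)
```

The `newly_affected` list corresponds to the counties that would switch from grey to light blue on the maps.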

The bigger picture

We can think of every US county as an (observational) experiment; each has certain characteristics that make it unique and, at the same time, similar to some others. Our pattern mining engine looks for regions in this feature space that are occupied by similar counties that all respond in a similar way to a given target variable of interest, here the COVID-19 death rate.

The criteria that determine ‘similarity’ are grounded in sophisticated statistical pattern mining, a core technology we market. It can be applied to any domain, not just to predicting the outcomes of a pandemic disease. Contact us to find out how we can help you find important features in your data. Chances are high that we can.

Interested in learning more about our overall approach to pattern mining?

If you would like to know more about our approach to pattern mining in high-dimensional feature spaces, please visit this page. It offers a gentle introduction to the subject. For a more rigorous treatment, we plan to post a few technical briefs on our technology in the near future.

Would you like to see our software in action?