Examples of Akai Kaeru at work analyzing COVID-19 for the Public Health sector

COVID-19 Analytics: Who Is at Risk? Who Will Be Next?

Back in May 2020, when the COVID-19 pandemic was just starting, we used the pattern mining engine of our software to analyze a prominent dataset with about 500 attributes, covering demographics, economics, infrastructure, and more, for all of the 3,007 US counties. The dataset is a curated collection of over 200 publicly available COVID-19-related datasets from sources like Johns Hopkins, the WHO, the World Bank, the New York Times, and many others.

We were interested in why some counties experience higher COVID-19 death rates, that is, a higher number of deaths relative to population size.
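As a minimal sketch of the target variable, the death rate can be computed as deaths divided by population. The per-100,000 scaling, the function name, and the sample numbers below are assumptions made for illustration; the source dataset may normalize differently.

```python
def death_rate_per_100k(deaths: int, population: int) -> float:
    """Return deaths per 100,000 residents (the scaling is an assumed convention)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 100_000


# Hypothetical county: 12 deaths among 17,000 residents.
rate = death_rate_per_100k(12, 17_000)
print(f"{rate:.1f} deaths per 100,000 residents")
```

Normalizing by population is what makes a small rural county comparable to a large urban one.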

Our Findings Can:

Pattern Analysis:

Finding the critical factors that identify counties at risk

Our analysis reveals that it is rarely just one feature that exposes a county to a higher than average COVID-19 death rate. Rather, it is usually a combination of features that, when true at the same time, provide a vivid narrative of these fateful circumstances.

In most cases, the number of features required to sufficiently describe a pattern is just a few. This makes them easy to explain, leading to a better understanding of the hidden processes and relations. Essentially, each pattern is a knowledge facet told in the domain’s language.
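To make the idea of a pattern concrete, here is a minimal sketch that treats a pattern as a small set of feature ranges that must all hold at once. It is not our engine's actual algorithm; the feature names, thresholds, and county values are invented for illustration.

```python
from statistics import mean

# Hypothetical county records; all names and values are invented.
counties = [
    {"name": "A", "pct_over_65": 22.0, "median_income": 31_000, "death_rate": 48.0},
    {"name": "B", "pct_over_65": 12.0, "median_income": 64_000, "death_rate": 6.0},
    {"name": "C", "pct_over_65": 20.5, "median_income": 35_500, "death_rate": 39.0},
    {"name": "D", "pct_over_65": 21.0, "median_income": 72_000, "death_rate": 9.0},
    {"name": "E", "pct_over_65": 14.0, "median_income": 33_000, "death_rate": 11.0},
]

# A pattern maps each feature to an inclusive (low, high) range.
pattern = {"pct_over_65": (20.0, 100.0), "median_income": (0, 40_000)}


def matches(county, pattern):
    """True only if every feature condition of the pattern holds for the county."""
    return all(low <= county[feature] <= high
               for feature, (low, high) in pattern.items())


members = [c for c in counties if matches(c, pattern)]
pattern_mean = mean(c["death_rate"] for c in members)
overall_mean = mean(c["death_rate"] for c in counties)

print([c["name"] for c in members], pattern_mean, overall_mean)
```

In this toy data, neither condition alone isolates the hardest-hit counties (county D is old but wealthy, county E is poor but young); only the combination does, which mirrors the point made above.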

In May 2020, our AI algorithm automatically identified 297 patterns, each describing a set of US counties. We found that 985 US counties were at high risk, and that Mississippi, Louisiana, and Georgia had the highest density of high-risk counties, at a coverage of 80–90%. These numbers changed somewhat as the pandemic raged on.

Our COVID-19 Risk Explorer Dashboard

We designed the COVID-19 Risk Explorer shown on the right to allow users to explore the counties and their risk patterns, if any. This particular image shows the top three socio-economic risk factors for Chickasaw County, MS, and its death rate curve on the top right. This county participated in many risk patterns, which can be selected in the bottom panel of the dashboard and viewed in the middle panel on the right. Other counties that have the selected risk pattern are also shaded (according to death rate). This enables users to make predictions and draw analogies for possible interventions. To start exploring the risk patterns of your county, contact us for a free demo.

Detailed Analyses of Some Example Patterns

In the following, we present three examples from the set of 297 patterns we found in our analysis in May 2020. We only considered counties that had at least one COVID-19 death. Then we present two examples of correlation patterns, and finally show how well our patterns were able to predict death rates in later months.

Counties at high risk: Set #1

This is a map of US counties that all have one thing in common: on average, they have a higher COVID-19 death rate than the average across all US counties. Follow this link to learn why these counties are at high risk. What else do they have in common?

Counties at high risk: Set #2

There is more than one set of features that could put a county at higher risk. On the right is another map of US counties for which our software has identified a feature pattern that is associated with a higher than usual COVID-19 death rate on average. Follow this link to learn why these counties are at high risk. See if you can find what their common properties are.

Counties at high risk: Set #3

Here is a third map of counties unified by a set of features that indicate an unusually high COVID-19 death rate on average. Click this link to learn what these critical features and their value ranges are.

Correlation Analysis: Finding Factors That Amplify Risk

Correlation can reveal a linear association between two variables, such as exercise and health. But important correlations are often hidden when data is subjected to conventional correlation analysis that uses all data points indiscriminately. In contrast, our pattern mining engine can reveal sets of data points (in this case, counties) where certain important correlations hold. When used by public health officials, it can enable more targeted COVID-19 testing and health policy-making.
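The following sketch illustrates how a correlation can be weak across all data points yet strong within a subgroup. It uses a plain Pearson coefficient on invented numbers; the subgroup flag is given by hand here, whereas finding such subgroups automatically is the job of the pattern mining engine, which this example does not reproduce.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical rows: (factor value, death rate, belongs to subgroup?)
rows = [
    (1, 2, True), (2, 3, True), (3, 5, True), (4, 6, True),
    (1, 5, False), (2, 2, False), (3, 6, False), (4, 1, False),
]

overall = pearson([x for x, _, _ in rows], [y for _, y, _ in rows])
subgroup = pearson([x for x, _, s in rows if s], [y for _, y, s in rows if s])

print(f"all counties: r = {overall:.2f}")    # about 0.18
print(f"subgroup only: r = {subgroup:.2f}")  # about 0.99
```

Pooling the subgroup with unrelated counties dilutes the signal, which is why a scatterplot over all counties can look uncorrelated.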

Counties where risk correlates with a factor: Set #4

The scatterplot on the left might make one think that there is no apparent correlation between severe housing cost burden and COVID-19 death rate, as the correlation is a mere 10%. But in fact, there are county patterns where such a correlation holds. Click this link to find out what they are.

Counties where risk correlates with a factor: Set #5

Similarly, per the scatterplot on the right, there is no apparent correlation between a county’s unemployment rate and its COVID-19 death rate; the correlation coefficient is just 13%. But do not rush to that conclusion: we found that a correlation does exist, but only for counties that fulfill certain population criteria, as elaborated on this page.

Predictive Analysis:

Identifying Counties That Will Be Impacted Soon

Prediction is the ultimate goal of data analytics. During the pandemic, health officials were highly interested in identifying the US counties where the COVID-19 death rate might spike next. Prediction allows authorities to direct test kits, allocate hospital care, increase contact tracing, alert the community, and so on. The patterns we found are essentially predictions of the response variable: higher than average COVID-19 death rate.

Akai Kaeru’s patterns can predict COVID-19 death rate

We followed the COVID-19 county data over time, and found that the pattern descriptions did not change much. We also found solid evidence that our pattern mining engine delivers statistically robust and reliable results. The three maps below show the three sets of counties, Sets 1–3, at the time of initial analysis (May 10, 2020) and one month later (June 10, 2020). In fact, our prediction stayed on track for later months as well.

We see that for quite a few of these counties, the COVID-19 death rate markedly increased. This is signified by the light blue turning to dark blue. Other counties that were previously not affected but fit the respective pattern profile have now seen their first COVID-19 fatalities. They were previously invisible (gray) but are now shaded in light blue.

This shows our software’s capability to predict a county’s future as the pandemic raged on. The three sets highlighted here were randomly selected from the patterns we found; we did not “cherry-pick” the best results. For all patterns, the average COVID-19 death rate increased 2–3 times the US average during this time frame. Click here to learn more.
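One way to check a pattern over time is to compare how much the average death rate rose among the pattern's counties with how much it rose across all counties. The sketch below does this for two snapshots; the county names, rates, and the choice of comparing absolute increases are assumptions for illustration and not necessarily the metric behind the figures quoted above.

```python
from statistics import mean

# Hypothetical death rates (per 100,000) at two snapshot dates; values invented.
may = {"A": 10.0, "B": 4.0, "C": 0.0, "D": 2.0, "E": 4.0}
june = {"A": 30.0, "B": 6.0, "C": 12.0, "D": 3.0, "E": 5.0}
pattern_members = {"A", "C"}  # counties that fit one hypothetical pattern


def mean_rate(snapshot, names):
    """Average death rate over the named counties in one snapshot."""
    return mean(snapshot[name] for name in names)


pattern_increase = mean_rate(june, pattern_members) - mean_rate(may, pattern_members)
overall_increase = mean_rate(june, june.keys()) - mean_rate(may, may.keys())
ratio = pattern_increase / overall_increase

# Pattern counties with no deaths in May that reported deaths by June.
newly_affected = [n for n in sorted(pattern_members)
                  if may[n] == 0 and june[n] > 0]

print(f"pattern increase is {ratio:.1f}x the overall increase")
print("newly affected:", newly_affected)
```

A ratio above 1 means the pattern's counties worsened faster than the country as a whole, and the `newly_affected` list corresponds to counties that would switch from gray to blue on the maps.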

The Bigger Picture

We can think of every US county as an (observational) experiment; each has certain characteristics that make it unique and similar to some others at the same time. Our pattern mining engine looks for regions in this feature space that are occupied with similar counties that all respond in a similar way to a given target variable of interest: the COVID-19 death rate.

The criteria that determine “similarity” are grounded in sophisticated statistical pattern mining, which is a core technology we market. It can be applied to any domain, not just to predicting the outcomes of a pandemic disease. Contact us to find out how we can help you find important features in your data. The data strongly suggests that we can help you.

Interested in learning more about our overall approach to pattern mining?

If you would like to know more about our approach to pattern mining in high-dimensional feature spaces, visit this page. It offers a basic introduction to the subject.