Hello. In this video, I will show you how to transform volunteer data into actionable geoinformation using spatio-temporal analytics. More precisely, I will go through two use cases. I will first briefly explain how to derive and mine risk factors for tick bites, and then I will show you how to map tick abundance. For each of these two use cases, I will go through three main steps. One, we will identify explanatory factors that are relevant for the problem at hand, and we will enrich our volunteer data. Two, we will model the phenomenon using data mining and machine learning methods. And three, we will visualize and interpret the results.
Let's start with the risk factors for tick bites. After an extensive literature review, we found that several environmental and human factors influence tick bites. The table here summarizes the main factors according to the literature. Broadly speaking, these factors belong to the categories temperature, precipitation, vegetation, distance-based metrics, and soil types. Using spatio-temporal layers for each of these factors, we enriched the volunteer reports on tick bites. Then we applied frequent pattern mining. Frequent pattern mining is a method for analyzing big data sets and is widely used in business intelligence. Companies record all the products that individuals buy. Later, they mine all these shopping lists to identify products that are bought together.
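The enrichment step described above can be sketched as a lookup from each volunteer report into gridded layers. This is only an illustration: the 1 km grid, the layer names (`warm_days`, `dist_forest_m`), and every value below are assumptions made for the example, not the data or layers used in the study.

```python
# Hypothetical sketch: attach environmental values to volunteer reports.
# Grid layout, cell size, layer names, and all values are illustrative.

CELL_SIZE_M = 1000  # assumed 1 km grid cells

# Each layer maps (year, row, col) -> value; the numbers are made up.
warm_days = {(2013, 0, 0): 12, (2013, 0, 1): 9}
dist_forest_m = {(2013, 0, 0): 300.0, (2013, 0, 1): 1500.0}


def cell_of(x_m, y_m):
    """Map projected coordinates (metres) to a (row, col) grid index."""
    return int(y_m // CELL_SIZE_M), int(x_m // CELL_SIZE_M)


def enrich(report, layers):
    """Return a copy of the report with one extra field per layer."""
    row, col = cell_of(report["x"], report["y"])
    key = (report["year"], row, col)
    enriched = dict(report)
    for name, layer in layers.items():
        enriched[name] = layer.get(key)  # None if the layer has no value here
    return enriched


reports = [
    {"id": 1, "x": 250.0, "y": 400.0, "year": 2013},
    {"id": 2, "x": 1800.0, "y": 100.0, "year": 2013},
]
layers = {"warm_days": warm_days, "dist_forest_m": dist_forest_m}
enriched_reports = [enrich(r, layers) for r in reports]
```

After this step every report carries the environmental context of its location and year, which is the form that the pattern mining and regression steps need.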
So using this approach, we applied an algorithm called Apriori to find subsets of explanatory variables that co-occur with the tick bites. This is an innovative approach to identifying risk factors for tick bites. Here, we see some of our results. On the left side, there is a ring map. The ring map is formed by three concentric rings, which represent three factors identified by the frequent pattern mining algorithm. The colors of the ring map indicate the type of factor that was identified: red for temperature factors and yellow for distance-based factors. And as you can see, these are the two most important types of factors that explain the occurrence of tick bites.
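To make the idea concrete, here is a minimal pure-Python sketch of the Apriori algorithm applied to tick bite reports. It assumes each report has already been turned into a set of discretized conditions; the item names (`hot_year`, `near_forest`, and so on), the four toy reports, and the support threshold are invented for illustration and are not the variables or settings used in the study.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(items): support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


# Toy "transactions": one set of conditions per tick bite report (made up)
bites = [
    {"hot_year", "near_forest", "near_recreation"},
    {"hot_year", "near_forest", "near_recreation"},
    {"hot_year", "near_forest"},
    {"wet_year", "near_forest"},
]
patterns = apriori(bites, min_support=0.5)
```

In the shopping analogy, each report plays the role of a shopping list and each condition plays the role of a product. Itemsets with high support are the candidate risk-factor combinations.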
Now, let's zoom in and focus on the white dot inside the rectangle. This white dot represents one of the frequent patterns found in the tick bites dataset. In this case, it represents tick bites that occurred in years that had many days with temperatures above 30 degrees and in locations within about half a kilometer of recreational and forested areas. The map on the right shows all the tick bites that fulfill these criteria for a particular year, in this case 2013. This was a relatively warm year, and with these three simple rules, we were able to explain about 50% of all the tick bites that occurred that year.
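Checking how many tick bites a pattern explains amounts to filtering the enriched reports with the pattern's rules and taking a ratio. The sketch below assumes hypothetical field names and a hypothetical cut-off for "many days above 30 degrees"; only the half-kilometer distance comes from the video, and the five records are made up.

```python
# Hypothetical thresholds: only the 500 m distance is taken from the video.
MIN_HOT_DAYS = 5      # assumed cut-off for "many days above 30 degrees"
MAX_DIST_M = 500.0    # "within about half a kilometer"


def matches(bite):
    """True if a tick bite report satisfies all three rules of the pattern."""
    return (
        bite["hot_days"] >= MIN_HOT_DAYS
        and bite["dist_recreation_m"] <= MAX_DIST_M
        and bite["dist_forest_m"] <= MAX_DIST_M
    )


# Made-up enriched reports for a single year
bites = [
    {"hot_days": 8, "dist_recreation_m": 120.0, "dist_forest_m": 300.0},
    {"hot_days": 8, "dist_recreation_m": 450.0, "dist_forest_m": 90.0},
    {"hot_days": 8, "dist_recreation_m": 900.0, "dist_forest_m": 200.0},
    {"hot_days": 8, "dist_recreation_m": 60.0, "dist_forest_m": 480.0},
    {"hot_days": 8, "dist_recreation_m": 300.0, "dist_forest_m": 2500.0},
]

matching = [b for b in bites if matches(b)]
coverage = len(matching) / len(bites)  # share of bites the pattern explains
```

The `matching` list is what would be drawn on the map, and `coverage` is the share of that year's reports the pattern accounts for.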
Now, let's move to the creation of a tick abundance map. Similar to the previous case study, we start by identifying the main environmental factors that explain tick ecology. Broadly speaking, these factors are temperature, precipitation, vegetation, and wildlife. We found spatio-temporal environmental layers for all of these factors, and we enriched our volunteer data. This enriched data set was used to build a regression model in which tick abundance at a particular location and time is expressed as a function of the various spatio-temporal environmental factors. The regression model was then trained and tested using the available data. And then we used it to predict tick abundance at unseen locations and timestamps.
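The train, test, and predict workflow can be sketched with a small stand-in model. The example below uses k-nearest-neighbour regression written in plain Python purely so that it runs without extra libraries; it is not the method used in the study. The two features (a scaled temperature and a vegetation index), the synthetic target, and the split sizes are all assumptions.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=5):
    """Average the targets of the k training rows closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


# Synthetic data: features are (scaled temperature, vegetation index) in [0, 1).
# The target is a made-up non-linear "abundance" signal, not real tick counts.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
X = [(rng.random(), rng.random()) for _ in range(200)]
y = [10 * veg * math.sin(math.pi * temp) for temp, veg in X]

# Hold out the last 50 rows for testing
train_X, test_X = X[:150], X[150:]
train_y, test_y = y[:150], y[150:]

# Predictions for the held-out rows, to be compared against test_y
test_pred = [knn_predict(train_X, train_y, x) for x in test_X]

# Prediction for an unseen location and time, described by its features
new_prediction = knn_predict(train_X, train_y, (0.5, 0.8))
```

Swapping in a different regression method changes only `knn_predict`; the split into training and test data and the prediction at unseen feature values stay the same.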
For this case study, we decided to use machine learning because we have many potential environmental factors, because we are dealing with a non-linear phenomenon, and because there are several methods, for instance random forest regression, support vector regression, or Gaussian processes, that can be used to build different types of regression models. Here, you see the first results of our experiments. On the left side, we see a cross plot of model predictions on the x-axis versus tick abundance on the y-axis. As you can see, the model is able to capture this phenomenon, but there are still large errors that require additional work.
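A cross plot like this is usually accompanied by summary error measures. As a minimal sketch, the root mean squared error and the coefficient of determination can be computed as follows; the four observed and predicted values are invented for illustration and are not results from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

Points far from the diagonal of the cross plot are the ones that contribute most to the error.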
However, just for demonstration purposes, we used this model to predict tick abundance for a particular day in the Netherlands, in this case the 15th of March 2014. As you can see, this model predicts a large variation in the number of ticks per hundred square meters. It is also important to realize that this model can be applied on a daily basis, and this will produce a series of maps that can be used to study the spatio-temporal dynamics of tick populations.
Summarizing: in this video, we have seen how spatio-temporal analytics can be used to derive health-relevant geoinformation. In particular, we have seen how volunteer data can be used to identify tick bite risk factors and to map ticks in space and time. We have also seen that volunteer data can be used by scientists to co-create knowledge and hopefully to co-design, together with citizens, public health interventions in space and time.
Spatio-temporal machine learning and data mining
In the video we took a brief tour of the universe of spatio-temporal analytics. We introduced the use of frequent pattern mining and of machine learning methods to build a regression model using spatio-temporal data collected by volunteers. Frequent pattern mining was used to extract common patterns in the environmental and human conditions associated with tick bites, and machine learning methods were used to predict tick abundance in nature.
© University of Twente