This content is taken from the University of Twente's online course, Geohealth: Improving Public Health through Geographic Information. Join the course to learn more.

Skip to 0 minutes and 7 seconds Hello, and welcome to this walkthrough of the exercise “Mapping Q-fever Instances in the Netherlands.” I will walk you through the exercise, which you see on the screen at this moment. It is a step-through showing you how to use GeoDa to calculate smoothed disease rates and to determine whether observed spatial patterns of disease are actually clustered, using the Moran’s I statistic. For this exercise, we’re going to use a software package called GeoDa, which is either already installed on your PC or, if it’s not, the instructions will tell you how to install it. Once it is installed, you’ll find this little icon on your desktop, and you can double-click it to open the software.

Skip to 0 minutes and 53 seconds The software looks like this, with a basic overview of all the functions that we have inside it.

Skip to 1 minute and 14 seconds So once we’ve opened the software, the first thing you need to do is to start a new project. You start a new project by directing the software to a spatial data file. This is the data file we are going to use to create a disease map, smooth the rates, and test for clustering. We do that by going to File. We open File, New Project From, and we select the option ESRI Shapefile. We’re going to open a new shapefile which has some preloaded data with it. In my case, it will directly open the correct folder where the data is stored.

Skip to 1 minute and 49 seconds In your case, if you have just downloaded the data, you’ll have to look this up on your own machine. Once found, you can open this layer here, which is called NLPC4QF: the Netherlands postal code 4 (PC4) areas with Q-fever. We’re going to look at the Q-fever incidence rate by postal code area in the Netherlands. We select the file, and we open it. By default, it will open a map of the Netherlands showing all these PC4 areas. For now, it’s still a blank map. It’s not visualizing any incidence rates yet. So the first step we’re going to take now is actually visualizing the incidence rates.

Skip to 2 minutes and 26 seconds We do that by going to the main menu, selecting the option Map, and then selecting Percentile Map. It will open a dialogue asking which attribute of the PC4 areas you want to visualize. We want to visualize the Q-fever incidence rate, so we select that and click OK. It will give you a map of the Q-fever incidence rate by PC4 area in the Netherlands. This map is classified by percentiles: the lowest category is the lowest 1% of values, and the highest category is the highest 1% of values. You can select a category, and it will show on the map where these areas are located. It’s a nice, convenient little feature. So now we have this map.
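The percentile classification described here can be sketched in a few lines of Python. This is a minimal illustration, not GeoDa's own code: the break points at the 1st, 10th, 50th, 90th and 99th percentiles are an assumption about how the six classes are formed (the walkthrough only mentions the bottom and top 1%), the interpolation rule is one common choice, and the `rates` list is made-up data.

```python
from bisect import bisect_right

# Assumed break points: bottom and top 1% get their own classes.
BREAK_PERCENTILES = [1, 10, 50, 90, 99]
LABELS = ["<1%", "1-10%", "10-50%", "50-90%", "90-99%", ">99%"]


def percentile(sorted_values, p):
    """Linear-interpolation percentile of an already sorted list."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def percentile_classes(values):
    """Return the percentile class label for each value, in input order."""
    ordered = sorted(values)
    breaks = [percentile(ordered, p) for p in BREAK_PERCENTILES]
    return [LABELS[bisect_right(breaks, v)] for v in values]


# Hypothetical incidence rates for 200 areas (not the course data).
rates = [float(i) for i in range(1, 201)]
classes = percentile_classes(rates)
print(classes.count("<1%"), classes.count(">99%"))
```

With 200 areas, two of them end up in each of the extreme 1% classes, which is why those classes pick out only a handful of areas on the map.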

Skip to 3 minutes and 11 seconds We might want to smooth this pattern. We know that, because these areas are quite small, the populations from which these incidence rates were derived are also quite small. As you might remember from the intro lecture, that could lead to an inflation of these rates. There is a high chance of inflation if an unlikely event occurs in a very small population. So we’re going to try to reduce this noise by smoothing it out using the empirical Bayes smoother technique, which was introduced in the introduction lecture. What that does, for each area, is try to correct its rate towards the mean of the whole study area.

Skip to 3 minutes and 51 seconds So if you have a small area with a very high rate, it will try to pull it downwards towards the global mean. And if you have a very low rate, it will try to pull it upwards towards the global mean. The amount of correction depends on the size of the population from which the incidence rate was derived. So how do we do this in GeoDa? We go to the main menu and select Map. This time we do not select Percentile Map, but Rates-Calculated Map, and then Empirical Bayes. It will ask you for two input variables.

Skip to 4 minutes and 26 seconds One of them is the event variable, the count of some type of event, in this case Q-fever cases. The other is the base variable, which is the population from which these events were derived. So here, that’s the normal population, and the event variable is the number of cases. As map theme we again select Percentile. We set the number of categories to 5 and click OK. And this is what the output looks like. This is the same map of the Netherlands, again showing Q-fever incidence by postal code area, but this time smoothed rates are shown. These are the rates derived from the empirical Bayes smoothing. You can see the noisy pattern.

Skip to 5 minutes and 7 seconds But what you see here is an already smoothed-out pattern, like this example here. To save these rates, we right-click on the map and select the option Save Rates. It will give a default name, R_EBS, for the empirical Bayes rate. We just leave it at the default and click OK.
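To make the shrinkage idea concrete, here is a minimal Python sketch of a global empirical Bayes smoother using a method-of-moments estimator. It is an illustration under assumptions, not GeoDa's implementation: the function name and the four example areas are hypothetical, and the estimator shown is one standard formulation whose details may differ from what the software does internally.

```python
def empirical_bayes_rates(events, population):
    """Shrink raw rates towards the global rate; small populations shrink more."""
    total_pop = sum(population)
    global_rate = sum(events) / total_pop
    raw = [e / p for e, p in zip(events, population)]
    mean_pop = total_pop / len(population)

    # Population-weighted variance of the raw rates around the global rate.
    variance = sum(
        p * (r - global_rate) ** 2 for p, r in zip(population, raw)
    ) / total_pop
    # Method-of-moments estimate of the prior variance, floored at zero.
    prior_var = max(variance - global_rate / mean_pop, 0.0)

    smoothed = []
    for r, p in zip(raw, population):
        denominator = prior_var + global_rate / p
        # Weight on the area's own rate: close to 1 for large populations.
        weight = prior_var / denominator if denominator else 0.0
        smoothed.append(weight * r + (1 - weight) * global_rate)
    return raw, smoothed, global_rate


# Hypothetical event counts (cases) and base populations for four areas.
events = [2, 0, 30, 45]
population = [100, 150, 10000, 12000]
raw, smoothed, global_rate = empirical_bayes_rates(events, population)
for r, s, p in zip(raw, smoothed, population):
    print(f"population {p:>6}: raw {r:.5f} -> smoothed {s:.5f}")
```

In this made-up data the first area has a raw rate of 2% based on only 100 people, so it is pulled strongly down towards the global rate, while the two large areas barely move.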

Skip to 5 minutes and 31 seconds Now that we have saved these rates, the next thing we want to know is whether the pattern is clustered. If you remember from the introduction lecture, if we want to measure clustering, we first need to conceptualize what the spatial neighborhood might be. In GeoDa, we do this using, again, the main menu bar. We go to Tools, select Weights Manager, and select the option Create. It will open a new dialogue box. First, it will ask for an ID variable, an attribute which can be used to uniquely identify each of these PC4 areas. You can select one of the attributes which is already there.

Skip to 6 minutes and 7 seconds Or you can add a new ID variable. In this case, it will be called polygon ID. We’ll leave the default and click Add. The conceptualization of the neighborhood in this case is by contiguity, so touching borders. We can choose either Rook, which uses only the cardinal directions (up and down, left and right), or Queen, which also includes diagonals. We choose Rook, and we leave the order of contiguity at 1, so only first-order neighbors. Then we just click Create. It will ask us to save the file. We’re just going to overwrite one of the previous outputs I’ve already created. So yes, we’re going to overwrite that.
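The difference between the two contiguity criteria is easiest to see on a regular grid. Under the usual definitions, rook contiguity counts only shared edges and queen contiguity also counts shared corners. The sketch below is a simplified stand-in: real postal code polygons are irregular, and the `grid_neighbors` helper is hypothetical rather than anything GeoDa exposes.

```python
def grid_neighbors(rows, cols, criterion="rook"):
    """Map each cell (r, c) of a regular grid to the set of its contiguous cells."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # cells sharing an edge
    if criterion == "queen":
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # plus cells sharing a corner
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            neighbors[(r, c)] = {
                (r + dr, c + dc)
                for dr, dc in steps
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            }
    return neighbors


rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")
print(len(rook[(1, 1)]), len(queen[(1, 1)]))  # prints: 4 8
```

The centre cell of a 3 by 3 grid has four first-order rook neighbors and eight queen neighbors. In both cases the relation is symmetric: if one cell is a neighbor of another, the reverse also holds.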

Skip to 6 minutes and 48 seconds It will give us another warning that there are some areas which do not have any neighbors. These are the islands, where PC4 areas do not have any neighbors with touching borders. That’s OK; we can just ignore that. We click OK, and the weights file is successfully created. We can close the dialogue. So this is what we’ve just done: we’ve created a new spatial contiguity representation of type rook. It’s symmetric. This is its name, first-order contiguity. And we can close this. So now that we have calculated the smoothed rates and defined the spatial neighborhood, we can actually go and calculate whether the pattern we observe is clustered. We do that by, again, going into the main menu.

Skip to 7 minutes and 34 seconds We click the option Space and select Univariate Moran’s I. It’ll ask us which variable we want to test for spatial clustering. We want to test the empirical Bayes smoothed rates, which we have just calculated. By default, it will choose the correct weights matrix, which is the one we just made. We click OK. The output looks like this. Here we see a Moran scatter plot, which was introduced in the introduction lecture, with the x-axis showing the local smoothed rate for each of the postal code areas and the y-axis showing the average of the rates in its neighboring areas.

Skip to 8 minutes and 15 seconds We see there’s a correlation between the two, meaning that there is spatial autocorrelation, with a Moran’s I of about 0.51. To test whether it is significant, we can right-click in the plot, select Randomization, and choose the number of permutations for the test, which checks how often a pattern at least this strong would be found by random chance. We run it with 999 permutations. Here in the output, these are the correlations expected under random chance, and this is what we find in our data. This tells us that what we see is very unlikely to have happened by mere random chance. So the pattern is statistically significantly clustered. We’ll close this.
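The statistic and the randomization test can be sketched in plain Python. This is a minimal illustration under stated assumptions: weights are row-standardized (each area's lag is the plain average over its neighbors), every area has at least one neighbor, the data are a made-up chain of ten areas with a steadily rising rate, and the pseudo p-value formula (M + 1) / (R + 1) is the common convention for permutation tests. None of the function names come from GeoDa.

```python
import random


def morans_i(values, neighbors):
    """Moran's I under row-standardized weights.

    neighbors[i] lists the indices of area i's neighbors; every area is
    assumed to have at least one neighbor (no islands).
    """
    mean = sum(values) / len(values)
    z = [v - mean for v in values]
    # Spatial lag: average deviation from the mean among each area's neighbors.
    lag = [sum(z[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]
    # Slope of the lag (y-axis) on the local value (x-axis) in the scatter plot.
    return sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)


def permutation_test(values, neighbors, permutations=999, seed=0):
    """Return the observed Moran's I and a one-sided pseudo p-value."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign the values to areas at random
        if morans_i(shuffled, neighbors) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)


# Hypothetical data: ten areas in a row, each touching the previous and next one.
n = 10
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
values = [float(i) for i in range(1, n + 1)]

observed, p_value = permutation_test(values, neighbors)
print(f"Moran's I = {observed:.3f}, pseudo p-value = {p_value:.3f}")
```

Because each permutation scatters the same values over the areas at random, the share of permutations that reach the observed statistic estimates how often such a pattern would arise by chance. With 999 permutations the smallest possible pseudo p-value is 0.001.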

Skip to 9 minutes and 0 seconds Another feature of this plot is that you can select points by clicking and dragging, and it will show on the map which areas you have just selected. These are the points which have high rates and high-rate neighbors, so they correspond to these areas on the map. So that’s it. That’s how you use GeoDa to calculate smoothed disease rates and to calculate the Moran’s I statistic for spatial autocorrelation. I hope you enjoyed the step-through instructions, and good luck practicing. Thank you. Bye-bye.

Investigating disease clusters using disease rates in specified spatial areas

The main objective of this case study is to identify spatial patterns in Q-fever infections during the 2009 outbreak in the Netherlands.

You will analyze case incidence rates per 4-digit postal code area in the Netherlands.

Software for this exercise

For this step we are going to use the software called GeoDa. GeoDa is a free and open-source software tool that serves as an introduction to spatial data analysis. It is simple yet powerful, and it allows you to analyze spatial data using a number of statistical tools.

The software requires you to provide a file of spatial locations (for example a point or polygon shapefile) and allows you to explore the data by mapping, plotting, and statistically analyzing the data.

The GeoDa software is available for Windows, Mac and Linux. Please go to the GeoDa software website to get more information and to download the software.

The software is not available for smartphone operating systems, so you will need a desktop computer for this exercise. Please check out the common questions and answers on using this software properly. Unfortunately, we cannot provide technical assistance in using this software in this course.

Exercise instructions

The PDF Exercise Spatial Smooth in GeoDa that you can download below explains the exercise. You should also download the data that you can use for this exercise.
