This content is taken from the University of Twente's online course, Geohealth: Improving Public Health through Geographic Information. Join the course to learn more.

Skip to 0 minutes and 9 seconds Welcome to this lecture on spatial statistics. In this lecture, we are going to look at the concept of spatial dependence. This is the key concept in spatial statistics. We are going to look at how to measure spatial dependence with the semivariance and at what the variogram is. Then, we are going to complete this micro-lecture with an example of mapping the malaria parasite rate in Africa from the research of Hay and others in 2009. The material presented in this lecture is also available from the articles that you can find on the FutureLearn website of GeoHealth. We acknowledge the inspiring contribution of Dr. Nicholas Hamm, who was the previous lecturer of this in-depth topic.

Skip to 1 minute and 0 seconds So as you can see, this is the image of the global land surface temperature obtained from WorldClim 1.4, a global climate data set. This is the surface temperature in July, averaged over a period of 30 years. The yellow areas have the highest surface temperature in the world, whereas the blue areas have the lowest. What we notice is that high values tend to occur close to each other, and low values tend to occur close to each other. That reflects our understanding of spatial dependence: similar values tend to occur close to each other, whereas dissimilar values tend to occur further apart in geographical space.

Skip to 1 minute and 54 seconds Looking at the values of the temperature along a north-south transect in Africa, in the figure at the bottom right, we can see that the temperatures at two nearby locations are very similar, for example at locations 50 and 52. If we compare the temperature at location 50 with the temperature at location 150, the values are quite different: the temperature at location 150 is much higher than at location 50. This is because the two locations are geographically far apart. So this colored map is a visualisation of spatial dependence. But how can we quantify spatial dependence scientifically? Let’s recall how statisticians measure the relationship between two variables. We may all be very familiar with the covariance and the correlation.

Skip to 2 minutes and 55 seconds They are measures of dependence. There are three kinds of relationship between two variables, as we can see from the scatter plots. The first one shows no dependence, the second one positive dependence, and the third one negative dependence. There is no dependence when the two variables do not vary together, negative dependence when one variable increases as the other decreases, and positive dependence when both variables increase together. This is dependence between attributes. According to Tobler’s first law of geography, spatial dependence depends on distance. When the distance between two locations is small, the attribute values at the two locations are positively dependent. When the distance between two locations is large, they show no dependence.
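The three kinds of relationship described here can be reproduced with a few lines of Python. This is a minimal sketch on synthetic data, not the data behind the lecture's scatter plots; the variable names, the sample size, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the lecture's data).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
noise = rng.normal(scale=0.5, size=500)

# One second variable for each kind of relationship with x.
pairs = {
    "no dependence": rng.normal(size=500),  # generated independently of x
    "positive dependence": x + noise,       # increases as x increases
    "negative dependence": -x + noise,      # decreases as x increases
}

# The correlation is the covariance rescaled to lie between -1 and +1.
covariances = {name: np.cov(x, y)[0, 1] for name, y in pairs.items()}
correlations = {name: np.corrcoef(x, y)[0, 1] for name, y in pairs.items()}

for name in pairs:
    print(f"{name}: cov = {covariances[name]:+.2f}, r = {correlations[name]:+.2f}")
```

A correlation near zero corresponds to the first scatter plot, a clearly positive value to the second, and a clearly negative value to the third.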

Skip to 4 minutes and 1 second Let’s look at the temperature at two locations, s1 and s2. We can measure the Euclidean distance h12 between them when we know the coordinates. The temperatures at s1 and s2 might be very much the same when h12 is smaller than 1 kilometre. But the temperatures at s1 and s2 may be very different when h12 is larger than 1,000 kilometres. So once again, we can see that spatial dependence describes the influence of nearby locations on each other. This results in similar attribute values. We measure spatial dependence by the semivariance.

Skip to 4 minutes and 49 seconds We also call it the semivariogram. So, what do we need to know to quantify spatial dependence between two locations, s1 and s2? What we need to know is the distance between them and the attribute values at each location. Let’s look at the only equation we have here. Gamma indicates the semivariance, or the semivariogram. We see that Gamma is a function of the distance h between two locations: s and a second location at a distance h from it. This second location can lie in any direction from s. z is the attribute; it can be the temperature, the relative risk of getting flu, or the malaria parasite rate. When h is small, we expect the attribute values to be similar.
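The equation itself appears on the lecture slide and is not reproduced in this transcript. The standard definition of the semivariance, which we assume is the one shown, together with its usual sample estimate, can be written as follows. The symbol N(h), the number of pairs of locations separated by h, is notation introduced here.

```latex
\gamma(h) = \frac{1}{2}\,\mathrm{E}\!\left[\bigl(z(s) - z(s+h)\bigr)^{2}\right],
\qquad
\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \bigl(z(s_i) - z(s_i+h)\bigr)^{2}
```

The squared difference is small when z(s) and z(s+h) are similar and large when they are dissimilar.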

Skip to 5 minutes and 49 seconds So, the value of the semivariance, Gamma, will be low. When there is a large separation, we might expect the values of z to be dissimilar, and the value of the semivariance, Gamma, will be high. And we can go from there to this plot. In the figure on the right-hand side, the green dots are the semivariances. The plot of all semivariances versus all lag distances h is called the sample variogram. Loosely speaking, when we successfully draw a line that goes through all these dots, we obtain a variogram model. Statistically speaking, the variogram model is a function that is best fitted to the semivariances.
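As a rough illustration of how the dots of a sample variogram are obtained, the sketch below computes semivariances for a regularly spaced transect. The function name `sample_variogram` and the synthetic transect (a moving average of random noise, so that values closer than about 20 steps are related) are assumptions for this example; they are not the temperature data from the lecture.

```python
import numpy as np

def sample_variogram(z, max_lag):
    """Semivariance at integer lags 1..max_lag for a regularly spaced transect.

    For each lag h, take half the mean squared difference between all
    pairs of values that are exactly h steps apart.
    """
    z = np.asarray(z, dtype=float)
    lags = np.arange(1, max_lag + 1)
    gamma = np.array([0.5 * np.mean((z[h:] - z[:-h]) ** 2) for h in lags])
    return lags, gamma

# Synthetic transect (assumption): smoothing white noise over a window of
# 20 steps makes neighbouring values similar and distant values unrelated.
rng = np.random.default_rng(42)
window = 20
noise = rng.normal(size=2000 + window - 1)
z = np.convolve(noise, np.ones(window) / np.sqrt(window), mode="valid")

lags, gamma = sample_variogram(z, max_lag=40)
for h in (1, 5, 10, 20, 40):
    print(f"lag {h:2d}: semivariance = {gamma[h - 1]:.3f}")
```

Plotting `gamma` against `lags` gives the sample variogram: the semivariance is low at short lags and levels off at longer lags.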

Skip to 6 minutes and 46 seconds The variogram model has three main parameters. They are the total sill, the nugget, and the range. The range is the limit of spatial dependence. When the separation distance is larger than the range, the semivariance is equal to the sill. And there is no spatial dependence anymore beyond the range. The nugget captures the random micro-variation and the measurement error. The total sill is the total variation of the attribute. The partial sill is the difference between the total sill and the nugget. OK, that’s interesting. But what can we do with this variogram? Well, the variogram is already informative, because it gives us information about the spatial structure of our data. More importantly, we can use the variogram for spatial interpolation.
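To make the three parameters concrete, here is a minimal sketch of one commonly used variogram model, the spherical model. The lecture does not say which model is shown in the figure, so the choice of the spherical form is an assumption, as are the function name and the parameter values.

```python
import numpy as np

def spherical_variogram(h, nugget, partial_sill, range_):
    """Spherical variogram model (one common choice; assumed here).

    The semivariance is 0 at h = 0, jumps to the nugget just above 0,
    rises with distance, and equals the total sill
    (nugget + partial_sill) for every h at or beyond the range.
    """
    h = np.asarray(h, dtype=float)
    ratio = np.minimum(h / range_, 1.0)  # capped at 1 beyond the range
    gamma = nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)
    return np.where(h > 0, gamma, 0.0)

# Hypothetical parameter values for illustration only.
nugget, partial_sill, range_ = 0.1, 0.9, 50.0
total_sill = nugget + partial_sill

for h in (0.0, 1.0, 25.0, 50.0, 100.0):
    value = spherical_variogram(h, nugget, partial_sill, range_)
    print(f"h = {h:5.1f}: semivariance = {float(value):.3f}")
```

Beyond the range the model stays flat at the total sill, which is the behaviour the lecture describes as having no spatial dependence anymore.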

Skip to 7 minutes and 48 seconds So, spatial interpolation is the process of using locations with known attribute values to estimate the attribute values at other locations where they are unknown.
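Kriging, the geostatistical interpolation method used for the malaria example in this lecture, combines the samples with a variogram model to make such estimates. The sketch below shows ordinary kriging at a single target location under strong simplifying assumptions: a spherical variogram with no nugget, four made-up sample points, and hypothetical function names. It is a toy illustration and not the procedure used to produce the published malaria map.

```python
import numpy as np

def spherical(h, nugget=0.0, partial_sill=1.0, range_=10.0):
    """Spherical variogram model with assumed, illustrative parameters."""
    h = np.asarray(h, dtype=float)
    ratio = np.minimum(h / range_, 1.0)
    gamma = nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)
    return np.where(h > 0, gamma, 0.0)

def ordinary_kriging(coords, values, target, variogram):
    """Estimate the attribute at `target` as a weighted sum of the samples.

    The weights solve the ordinary kriging system, which is built from the
    semivariances between the samples and between samples and target, plus
    the constraint that the weights sum to 1.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)

    # Distances between every pair of sample locations.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Left-hand side: semivariances, bordered by ones for the constraint.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(dists)
    A[n, n] = 0.0

    # Right-hand side: semivariances between each sample and the target.
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - target, axis=1))

    solution = np.linalg.solve(A, b)
    weights = solution[:n]  # the last entry is the Lagrange multiplier
    return float(weights @ values), weights

# Made-up samples at the corners of a square (assumption for illustration).
coords = [(0.0, 0.0), (0.0, 5.0), (5.0, 0.0), (5.0, 5.0)]
values = [10.0, 12.0, 14.0, 20.0]

estimate, weights = ordinary_kriging(coords, values, (2.5, 2.5), spherical)
print("weights:", np.round(weights, 3))
print("estimate at the centre:", round(estimate, 3))
```

Because the target sits at the centre of the square, all four samples are equally far away and receive equal weight, so the estimate is simply their mean. Moving the target closer to one corner shifts weight towards that sample.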

Skip to 8 minutes and 1 second The malaria parasite rate map, as we can see in the figure, was created by geostatistical interpolation, or kriging, given the samples and the variogram. So, that brings us to the end of this lecture on spatial dependence. We’ve learned about the concept of spatial dependence. We’ve seen how we can describe a model of spatial variation using the variogram. And we learned about the key parameters of the variogram. They are the sill, the nugget, and the range. Thank you very much for watching this lecture. And if you would like to know more about these topics, you can look at the articles that are there to support this module on the FutureLearn website. Thank you very much.

Spatial dependency

You are going to learn more about the concept of spatial dependence.

You will learn how to model spatial dependence using the variogram. You are going to look at three key parameters of the variogram: the sill, the nugget, and the range. These will be illustrated with examples in environment and health.
