This content is taken from the University of Twente's online course, Geohealth: Improving Public Health through Geographic Information.

## University of Twente

Hello. Welcome to this micro lecture on spatial statistics. It is part of the spatial statistics module in this MOOC. So, what are we going to learn about? First, we're going to look at the concept of spatial dependence.

We're then going to learn how to describe and model spatial variation and spatial dependence using the variogram, and we're going to look at three key parameters of the variogram: the sill, the nugget, and the range. We'll round off with some examples from environment and health. The material presented here is also supported by the articles that you can find on the FutureLearn website.

This image is from Tuz Gölü, a lake in Turkey. It shows band 4 of a Landsat image: the green areas indicate high values of reflectance, and the pink-orange areas indicate low values of reflectance. What we see is that the high values tend to occur close to each other, and the low values tend to occur close to each other. That reflects what we understand by spatial dependence: similar values tend to occur close to each other in space, whereas dissimilar values tend to occur further apart. On the right side, we've taken a transect across the lake.

Along the transect, you can see that the high values of reflectance tend to occur close to each other, and so do the low values. At a separation of one or two pixels, the values are quite similar; at a larger separation of, say, 50 pixels, they can be quite dissimilar. That gives us a visualization of spatial dependence. But how can we quantify it scientifically? For that, we use a tool called the variogram.
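The visual argument can be mimicked numerically. The sketch below builds a hypothetical transect (a moving average of random noise, standing in for the Landsat reflectance values, which we don't have) and averages the absolute difference between pixels that are 1, 2, and 50 pixels apart. The function names and the 30-pixel smoothing window are illustrative choices, not part of the course material.

```python
import numpy as np


def moving_average_transect(n_pixels, window, seed=0):
    """Hypothetical transect: a moving average of white noise.

    Neighboring pixels share most of their inputs, so they are similar;
    pixels more than `window` apart share none, so they are unrelated.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n_pixels + window - 1)
    kernel = np.ones(window) / window
    return np.convolve(noise, kernel, mode="valid")  # length n_pixels


def mean_abs_difference(y, h):
    """Average |y(s) - y(s + h)| over all pixel pairs h pixels apart."""
    return float(np.mean(np.abs(y[:-h] - y[h:])))


y = moving_average_transect(n_pixels=2000, window=30)
for h in (1, 2, 50):
    print(f"separation {h:>2} px: mean |difference| = {mean_abs_difference(y, h):.4f}")
```

The average difference grows with separation, which is the pattern the variogram puts on a formal footing.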

This is the only equation in this module, so let's look at it in more detail. Gamma, γ, is the variogram value, which we call the semi-variance.

y is the attribute. In this case it's the reflectance value, but it could be something else: an air pollution concentration, a malaria parasite rate, or a disease prevalence.

s indicates location. In this example, we have a transect of values running from east to west, and the units of distance are pixels.

h indicates the separation. What we want to know is the variogram value, the semi-variance, for pairs of observations measured at a given separation. So h might be one pixel, two pixels, three pixels, 50 pixels, and so forth.

Finally, n indicates the number of pairs at that separation. We average over all these pairs to get the variogram value.
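The equation itself was shown on the slide rather than read out. Assuming it is the standard empirical (method-of-moments) semivariogram estimator, which uses exactly the symbols described above, it can be written as follows, with n(h) the number of pairs separated by h. The factor 1/2 is why γ is called the semi-variance.

$$
\gamma(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[ y(s_i) - y(s_i + h) \right]^2
$$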

You'll notice that the equation is based on the difference y(s) − y(s + h), which is squared. When h is small (for example, one pixel), we expect the attribute values to be similar, so the value of the variogram, the semi-variance γ, will be low. At a larger separation, we might expect the values of y to be dissimilar, and the semi-variance will be high. From there we can go to the plot on the right side, which we call a sample variogram. Its horizontal axis shows the lag, that is, the geographic separation.

At short lags, pairs of points have similar values, so the variogram value is low. As we increase the separation, the variogram value increases until it reaches a plateau. So the semi-variance increases with increasing lag.

That's expected, because an increasing semi-variance means increasing dissimilarity between observations taken at pairs of points, and we expect that dissimilarity to be higher at larger separations.
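For a regularly spaced transect like the one in the example, the estimator can be computed directly. This is a minimal Python sketch assuming equally spaced values, with lags measured in pixels; `empirical_variogram` is a hypothetical helper, not a function from a particular geostatistics library, and the transect is synthetic.

```python
import numpy as np


def empirical_variogram(y, max_lag):
    """Sample semivariogram of an equally spaced 1-D transect.

    For each lag h = 1..max_lag (in pixels), average half the squared
    difference over all n(h) = len(y) - h pairs (y[i], y[i + h]).
    Requires max_lag < len(y). Returns (lags, gamma, n_pairs).
    """
    y = np.asarray(y, dtype=float)
    lags = np.arange(1, max_lag + 1)
    gamma = np.empty(max_lag)
    n_pairs = np.empty(max_lag, dtype=int)
    for k, h in enumerate(lags):
        diffs = y[:-h] - y[h:]  # y(s) - y(s + h) for every valid s
        n_pairs[k] = diffs.size
        gamma[k] = 0.5 * np.mean(diffs ** 2)
    return lags, gamma, n_pairs


# Synthetic transect (moving average of noise over 30 pixels), not real data.
rng = np.random.default_rng(1)
y = np.convolve(rng.normal(size=5029), np.ones(30) / 30, mode="valid")
lags, gamma, n_pairs = empirical_variogram(y, max_lag=60)
for h, g, n in zip(lags[::10], gamma[::10], n_pairs[::10]):
    print(f"lag {h:>2} px: semi-variance {g:.4f} from {n} pairs")
```

For this synthetic transect the semi-variance should rise roughly linearly and flatten out near a lag of 30 pixels, the width of the smoothing window, giving a small-scale version of the sample variogram described above.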

What is the next step? The next step is to fit a curve through these points, and that's what we've done here. This curve has three key parameters. The first is the range, which is the limit of spatial dependence. The range occurs where the sample variogram flattens out; here it is around 1,000 meters. At short distances, less than the range (say 500 meters), the semi-variance is low, and it increases until we reach the range. So pairs of points separated by 500 meters would be expected to be correlated, because they're within the limit of spatial dependence. Pairs of points separated by 1,500 meters would be expected to be uncorrelated, because they're beyond that limit.
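The lecture does not say which curve was fitted. A common choice is the spherical model, assumed here, in which the nugget is the jump at the origin, the total sill (nugget plus partial sill) is the plateau, and the range is the lag at which the plateau is reached. The sketch fits it by least squares to a hypothetical, noise-free sample variogram generated with a range of 1,000 meters, then converts the fitted variogram into an implied correlation, 1 − γ(h)/sill, which assumes the field is second-order stationary. All function names and parameter values are illustrative.

```python
import numpy as np


def spherical_shape(h, a):
    """Rises from 0 at h = 0 to exactly 1 for h >= a (the range)."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return 1.5 * r - 0.5 * r ** 3


def spherical_variogram(h, nugget, partial_sill, a):
    """Spherical model: 0 at h = 0, nugget + partial_sill * shape for h > 0.
    The plateau (total sill) is nugget + partial_sill."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + partial_sill * spherical_shape(h, a), 0.0)


def fit_spherical(lags, gamma, candidate_ranges):
    """Unweighted least squares. For a fixed range the model is linear in
    (nugget, partial_sill), so solve that exactly and keep the best range."""
    best = None
    for a in candidate_ranges:
        X = np.column_stack([np.ones(len(lags)), spherical_shape(lags, a)])
        coef, *_ = np.linalg.lstsq(X, gamma, rcond=None)
        sse = float(np.sum((X @ coef - gamma) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(coef[0]), float(coef[1]), float(a))
    return best[1:]  # (nugget, partial_sill, range)


def implied_correlation(h, nugget, partial_sill, a):
    """Assuming stationarity, C(h) = sill - gamma(h), so rho(h) = 1 - gamma(h) / sill."""
    sill = nugget + partial_sill
    return 1.0 - spherical_variogram(h, nugget, partial_sill, a) / sill


# Hypothetical, noise-free sample variogram: lags in meters, true range 1,000 m.
lags = np.arange(100.0, 2001.0, 100.0)
gamma_sample = spherical_variogram(lags, 0.1, 0.9, 1000.0)

nugget_hat, psill_hat, range_hat = fit_spherical(lags, gamma_sample, np.arange(200.0, 3001.0, 50.0))
print(f"nugget {nugget_hat:.3f}, sill {nugget_hat + psill_hat:.3f}, range {range_hat:.0f} m")
for d in (500.0, 1500.0):
    rho = float(implied_correlation(d, nugget_hat, psill_hat, range_hat))
    print(f"pairs {d:.0f} m apart: implied correlation {rho:.3f}")
```

With these inputs the implied correlation works out to about 0.28 at 500 meters and zero at 1,500 meters, matching the within-range and beyond-range argument above. Because the model is linear in the nugget and partial sill once the range is fixed, a grid search over the range plus ordinary least squares suffices for this sketch; dedicated geostatistics software often uses weighted least squares and constrains parameters to be non-negative, which this sketch does not do.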

The next parameter is the sill. The sill is the value at which the variogram levels off, and it describes the overall variability in our data. That variability reaches its maximum at the range: because the range is the limit of spatial dependence, that's where we expect the maximum difference between pairs of observations. The final parameter is the nugget. The nugget is the variability in our data that does not have a spatial component; it appears as a jump of the variogram at the origin, where the fitted curve meets the vertical axis above zero. It is typically measurement error, for example due to your instruments, together with variation over distances shorter than the spacing between observations.
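To see why measurement error shows up as a nugget, the sketch below adds independent noise with an assumed standard deviation of 0.1 to a hypothetical smooth transect. Independent errors add their variance, here 0.1² = 0.01, to the semi-variance at every lag h > 0, while the semi-variance at h = 0 stays zero; that jump at the origin is what the nugget describes. The signal, the error level, and the helper name are illustrative assumptions.

```python
import numpy as np


def semivariance_at_lag(y, h):
    """Half the mean squared difference between values h pixels apart."""
    return 0.5 * float(np.mean((y[:-h] - y[h:]) ** 2))


n = 10_000
s = np.arange(n)
signal = np.sin(2 * np.pi * s / 200)  # hypothetical smooth "true" transect
rng = np.random.default_rng(42)
error_sd = 0.1  # assumed instrument error (standard deviation)
measured = signal + rng.normal(0.0, error_sd, size=n)

for h in (1, 5, 25):
    clean = semivariance_at_lag(signal, h)
    noisy = semivariance_at_lag(measured, h)
    print(f"h={h:>2}: true {clean:.4f}, measured {noisy:.4f}, difference {noisy - clean:.4f}")
```

The difference should hover around 0.01 at each lag: the measured variogram is the true one shifted up by the error variance, so a curve fitted to it meets the vertical axis at roughly that value rather than at zero.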

What can we do with this variogram? The variogram is informative in itself, because it tells us about the spatial structure in our data. For example, we might now know that high values of disease prevalence tend to occur within a radius of perhaps 50 kilometers. But we can go further and use the variogram for interpolation. In this illustration, there are six black points. These are locations where we took a measurement, so we have real data at those points: values of air pollution, disease prevalence, or reflectance, for example.

Now imagine a location s0, the red diamond, where we haven't taken a measurement but where we want to predict the value. Using the variogram, we can say how strongly we expect the value at that location to be correlated with the values at all the locations where we have data. We then use that information to predict, or interpolate, the value at the red diamond. In geostatistics, this is called kriging. I'll now show you some examples of where this has been done. The first case is from our own research: on the left side are measurements of air pollution.
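The lecture does not specify which kriging variant was used. The sketch below assumes ordinary kriging, one common form, with a spherical variogram. The six coordinates and values, the target s0, and the variogram parameters are all hypothetical. The weights come from solving a linear system built from the variogram, with a Lagrange multiplier enforcing that they sum to one.

```python
import numpy as np


def spherical_variogram(h, nugget, partial_sill, a):
    """Spherical model (assumed): 0 at h = 0, plateau nugget + partial_sill at h >= a."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    return np.where(h > 0, nugget + partial_sill * (1.5 * r - 0.5 * r ** 3), 0.0)


def ordinary_kriging(points, values, target, nugget, partial_sill, a):
    """Predict at `target` from observations at `points`.

    Solves [Gamma 1; 1^T 0] [w; mu] = [gamma_0; 1], where
    Gamma[i, j] = gamma(|s_i - s_j|) and gamma_0[i] = gamma(|s_i - s0|).
    The constraint sum(w) = 1 keeps the prediction unbiased.
    Returns (prediction, kriging_variance, weights).
    """
    n = len(points)
    d_ij = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d_i0 = np.linalg.norm(points - target, axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical_variogram(d_ij, nugget, partial_sill, a)
    A[n, n] = 0.0
    b = np.append(spherical_variogram(d_i0, nugget, partial_sill, a), 1.0)
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    prediction = float(weights @ values)
    variance = float(weights @ b[:n] + mu)
    return prediction, variance, weights


points = np.array([[0.0, 0.0], [1.0, 0.2], [0.3, 1.1],
                   [1.4, 1.3], [2.0, 0.1], [0.8, 2.0]])  # hypothetical, km
values = np.array([12.0, 15.0, 11.0, 18.0, 20.0, 14.0])  # hypothetical measurements
s0 = np.array([1.0, 1.0])  # the unmeasured "red diamond"

pred, var, w = ordinary_kriging(points, values, s0, nugget=0.0, partial_sill=10.0, a=2.5)
print(f"prediction at s0: {pred:.2f}, kriging variance: {var:.2f}")
print("weights:", np.round(w, 3), "sum:", round(float(w.sum()), 6))
```

Two properties are worth checking: the weights sum to one, and with a zero nugget, predicting at a measured location returns the measured value with zero kriging variance, so ordinary kriging honors the data.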

These are concentrations of particulate matter with a diameter of 10 micrometers or less (PM10) at different places in Europe, with high values shown in red and low values in blue. We've gone through the steps I just showed: we fitted a variogram to these data and used it to support the interpolation. That leads to the map on the right side, where red indicates higher and blue lower air pollution concentrations. Such a map might be used to estimate personal exposure in support of an environmental epidemiological study. The last example is from the Malaria Atlas Project.

I suggest you look at the middle of the map, at Africa. There we have survey data on the malaria parasite rate: red indicates high values, and yellow indicates low values. They've done a geostatistical analysis of these data, which leads to the map where, once again, red indicates high parasite rates and yellow indicates low ones. That brings us to the end of this micro lecture. We've learned about the concept of spatial dependence, seen how to describe and model spatial variation using the variogram, and learned about its key parameters: the sill, the nugget, and the range.

We also looked at some examples from environment and health: a remote sensing image, air pollution concentrations, and the malaria parasite rate. Thank you very much for watching this micro lecture. If you would like to know more about these topics, take a look at the articles that support this module on the FutureLearn website. Thank you very much.