## Want to keep learning?

This content is taken from the University of Twente's online course, Geohealth: Improving Public Health through Geographic Information. Join the course to learn more.
## University of Twente

Hi, my name is Ente Rood, and welcome to this lecture about disease mapping and the use of spatial analysis to visualize disease data. This lecture will cover the basic concepts of spatial epidemiology and introduce some methods which you might want to consider when mapping your disease data. So, disease mapping: what is its relevance to epidemiologists or policy makers? What are the benefits of mapping your data? Mapping your data will give you instant insight into any geographic pattern of disease occurrence. As health outcomes are closely linked to geographic variations in social and climatic conditions, pathogen abundance, or environmental exposures, health outcomes will vary over space.

Exploring these patterns can provide new insights into potential causes, or into interventions a health program might consider. It allows you to disseminate your findings and convey essential health care messages to the relevant target audience. This process will also allow you to hypothesize about causal relationships, which could lead to the collection and mapping of new data to investigate these hypotheses.

So, coming back to what you know about disease clustering, what do we actually mean by spatial clustering? Very loosely defined, a cluster is a geographically bounded group of occurrences of sufficient size and concentration to be unlikely to have occurred by chance. Statistically speaking, we consider two types of data which can be used to determine whether a disease is clustered. The first is point data. This would represent individual cases of a certain infectious disease, for example. When point data are used, clustering is measured by the average distance between points. If the distance between neighboring points is smaller than would be expected by chance, the point distribution is considered to be clustered.

By analogy, you could think of the following experiment. If you throw 20 rice grains on a checkerboard, what would you expect to be the average distance between each grain and its nearest neighbor? In the very unlikely event that all of the grains land on a single square, what would you then expect the average distance to be? In this latter case, the grains would be much closer together than you would expect based on mere chance. Hence, we consider this pattern to be clustered. The second type of data consists of a measure of magnitude for a certain area, for example the rate of disease per municipality or district.

In this case, clustering is measured as the similarity of rates in nearby municipalities. How this works will be the topic of the remainder of this presentation. Now that we know what basic clustering means, a question arises: what causes diseases to cluster in space? In essence, there are two different processes which you might consider. First, there are variations in the external environment which cause a disease to cluster. For example, diseases can cluster because people cluster. Or respiratory syndromes might cluster in space because the air pollution which causes these syndromes is clustered. In contrast, it can also be that there is interdependence between the points or areas themselves.

In this case, for example, we mean that diseases cluster because people catch the disease from other people who have it. This is an intrinsic property of the disease itself. Hence, this is called an endogenous process. In practice, it is very difficult to distinguish which of these two processes is causing your disease to cluster.

Let’s consider an example to show a bit further how this works. Here, we see a map of the Netherlands showing the incidence rate of leptospirosis per municipality. Dark brown colors indicate high incidence rates, while light colors represent low incidence rates. Looking at the map, we can see that the pattern of leptospirosis across the Netherlands is clearly clustered. So the question arises: what is causing this? First, we considered an external environmental factor, namely soil clay content. As can be seen, the observed distribution of leptospirosis closely matches the distribution of soil clay content across the Netherlands. This pattern could, however, not be fully explained by the clay content of the soil.

It appeared that the observed pattern of leptospirosis could also be explained by the occurrence of leptospirosis in the vicinity of a given municipality. As leptospirosis is not transmittable from human to human, this is likely the result of persons living in an area being exposed to a nearby source of leptospires, causing cases to cluster.

# Spatial clustering

The first type of spatial analysis we will discuss is cluster analysis. A cluster can be defined as a geographically bounded group of occurrences of sufficient size and concentration that is unlikely to have occurred by chance.

There are many different techniques and algorithms that can be used to find clusters in disease datasets. Clusters can be found using 1) point data, showing every disease case as a point, or 2) areal (polygon) data, showing the disease expressed as a population rate.

## Measuring clusters

Point data clustering is measured by calculating the average distance between points. When this average distance is less than would be expected for a random distribution, the point dataset displays clustering.
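As a rough illustration of this measure, the Python sketch below computes the average nearest-neighbor distance for a set of points and compares it with points scattered at random over the same area. It assumes a rectangular study area and a simple Monte Carlo comparison. The function names, the 8 x 8 board and the simulated grain positions are hypothetical choices for the example and do not come from the course material.

```python
import numpy as np

def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its nearest other point."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore the distance from a point to itself
    return dist.min(axis=1).mean()

def clustering_p_value(points, bounds, n_sim=999, seed=0):
    """Share of random patterns that are at least as tightly packed as the observed one."""
    rng = np.random.default_rng(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    observed = mean_nearest_neighbor_distance(points)
    n = len(points)
    count = 0
    for _ in range(n_sim):
        # Scatter the same number of points uniformly over the study area
        sim = np.column_stack([rng.uniform(xmin, xmax, n),
                               rng.uniform(ymin, ymax, n)])
        if mean_nearest_neighbor_distance(sim) <= observed:
            count += 1
    return observed, (count + 1) / (n_sim + 1)

# Rice-grain analogy: 20 grains on an 8 x 8 board, all landing in one square
grain_rng = np.random.default_rng(42)
grains = grain_rng.uniform(0, 1, size=(20, 2))  # every grain inside one unit square
observed, p = clustering_p_value(grains, bounds=((0, 8), (0, 8)))
print(f"mean nearest-neighbor distance: {observed:.2f}, p-value: {p:.3f}")
```

A small p-value here means that very few random patterns were as tightly packed as the observed one, which is the sense in which the grains are "closer than expected by chance". Real study areas are rarely rectangular, so the random points would normally be drawn within the actual boundary.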

When using data expressing the rate of disease per area (for example per municipality or district), clustering is measured as the similarity of rates in nearby areas, which can be visualized with a Moran scatterplot.
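To make the idea of "similar rates in nearby areas" concrete, here is a minimal Python sketch of Moran's I, the statistic summarized by a Moran scatterplot. It assumes that areas form a regular grid and that two areas are neighbors when they share an edge. The grid, the rates and the function names are made up for illustration; real municipalities would need a neighbor matrix built from their actual boundaries.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Row-standardized weights for a grid: cells sharing an edge are neighbors."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i, rr * n_cols + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)

def morans_i(rates, w):
    """Return Moran's I, the centered rates, and their spatial lag."""
    z = np.asarray(rates, dtype=float).ravel()
    z = z - z.mean()
    lag = w @ z  # average centered rate among each area's neighbors
    # With row-standardized weights, I reduces to (z . lag) / (z . z)
    return (z @ lag) / (z @ z), z, lag

w = rook_weights(4, 4)

# High rates in the west, low rates in the east: similar values sit together
rates_clustered = np.array([[10, 10, 2, 2]] * 4, dtype=float)
# Alternating high and low rates: every neighbor is dissimilar
rates_checker = np.array([[10, 2, 10, 2], [2, 10, 2, 10]] * 2, dtype=float)

i_clustered, z, lag = morans_i(rates_clustered, w)
i_checker, _, _ = morans_i(rates_checker, w)

# In a Moran scatterplot (lag plotted against z), the fitted slope equals Moran's I
slope = np.polyfit(z, lag, 1)[0]
print(f"clustered: {i_clustered:.2f}, checkerboard: {i_checker:.2f}")
```

Values well above zero indicate that neighboring areas have similar rates, while values below zero indicate that neighbors tend to differ. Whether an observed value is larger than chance would produce is usually judged by comparing it with values obtained after randomly shuffling the rates over the areas.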

## What causes disease to cluster in space?

We can distinguish between two different processes that may explain disease clustering in space.

First, there are variations in the external environment, i.e. the so-called exogenous factors. For example, diseases can cluster because people cluster, or respiratory syndromes might cluster in space because the air pollution, which causes these syndromes, is clustered in space.

Second, clustering may also be explained by endogenous factors through interdependence between points or areas themselves. In this case, for example, we mean that diseases may cluster because people may catch this disease from other people who have the disease. This is an intrinsic property of the disease itself, hence this is called an endogenous process.
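One common way to probe these two explanations, sketched below in Python, is to first account for a measured exogenous factor and then check whether the remaining variation is still spatially clustered. The example uses simulated data on a hypothetical 6 x 6 grid in which rates depend only on one covariate plus independent noise. All names and numbers are assumptions for illustration, and the simple linear fit stands in for whatever model suits the real data.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Row-standardized weights for a grid: cells sharing an edge are neighbors."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i, rr * n_cols + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)

def morans_i(values, w):
    """Moran's I for row-standardized weights."""
    z = np.asarray(values, dtype=float).ravel()
    z = z - z.mean()
    return (z @ (w @ z)) / (z @ z)

rng = np.random.default_rng(1)
n_rows, n_cols = 6, 6
w = rook_weights(n_rows, n_cols)

# Hypothetical exogenous factor that increases from west to east
covariate = np.tile(np.arange(n_cols, dtype=float), (n_rows, 1))
# Simulated rates driven only by the covariate plus independent noise
rates = 1.0 + 2.0 * covariate + rng.normal(0.0, 0.5, size=covariate.shape)

# Remove the part of the rates explained by the covariate
slope, intercept = np.polyfit(covariate.ravel(), rates.ravel(), 1)
residuals = rates.ravel() - (intercept + slope * covariate.ravel())

i_rates = morans_i(rates, w)
i_residuals = morans_i(residuals, w)
print(f"Moran's I of rates: {i_rates:.2f}, of residuals: {i_residuals:.2f}")
```

In this simulation the clustering largely disappears once the covariate is accounted for, because the covariate is the only source of spatial structure. If the residuals of a real dataset remain clustered, that does not by itself prove an endogenous process: an unmeasured exogenous factor could produce the same picture.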