1.3

# Spatial data

The first element we need to understand in GeoHealth is spatial data. Spatial data, also called geographic data, is data that can be related to a location on the Earth's surface. This may seem easy, but there are many different ways to indicate where you are.

When someone asks me: “Where are you?” my answer could be: “In the Netherlands”. I indicate my position via a country name. The Netherlands is an area, and if a GIS had to place me at a single point in this country, that point would most likely be the centroid. The centroid of an irregular shape like the Netherlands can best be visualized as the point on which the country would balance if I placed it on the tip of my finger.
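To make the balance-point idea concrete, here is a minimal sketch of how a centroid can be computed for a simple polygon. It assumes the outline is given as planar (x, y) coordinates; the function name `polygon_centroid` and the L-shaped example are made up for illustration, and a real GIS may use a different method, especially for coordinates on a globe.

```python
def polygon_centroid(vertices):
    """Return the area-weighted centroid (x, y) of a simple polygon.

    `vertices` is a list of (x, y) tuples in order around the outline.
    The polygon is closed implicitly: the last vertex connects to the first.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # signed contribution of this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # The standard formula divides by 6 * area, which equals 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Hypothetical irregular (L-shaped) outline, not a real country.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
x, y = polygon_centroid(l_shape)
print(f"{x:.3f}, {y:.3f}")  # prints 1.357, 1.357
```

For this L-shaped outline the balance point falls outside the shape itself, which shows how poorly a centroid can represent the actual position of something inside an irregular area.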

My actual location is the city of Enschede. If you are somewhat familiar with the Netherlands, you know that the centroid of the country is not close to Enschede. But even this spatial information may not be precise enough for the analysis I have in mind, because it does not specify where in Enschede I am located.

## Using coordinates to indicate a spatial location

A more precise way of pinpointing a location on the Earth's surface is by providing coordinates. An easy way to get my coordinates is via a “Where am I” app that uses Google Earth. Several of these services are available for both mobile phones and desktop computers, and most mobile devices are aware of their own location. This position on the Earth's surface is expressed in a system called latitude and longitude. Longitude refers to lines running from pole to pole (north-south) over the globe. The longitude you retrieve (in my case 6.88) tells you which of these vertical lines you are on, that is, how far east or west you are. Latitude refers to circles running around the globe horizontally (east-west), and tells you how far north or south of the equator you are. We often express latitude and longitude in degrees and minutes. For me the latitude is about 52.22 in decimal degrees, which is roughly 52° (degrees) and 13′N (minutes North).
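Coordinates are sometimes given as a decimal number and sometimes in degrees and minutes. As a small sketch of how the two relate, the snippet below converts the longitude of 6.88 mentioned above into degrees and minutes. The function name `to_degrees_minutes` and its hemisphere-letter parameters are chosen for this illustration only.

```python
def to_degrees_minutes(value, positive, negative):
    """Split a decimal-degree value into whole degrees and decimal minutes.

    `positive` and `negative` are the hemisphere letters to use, e.g.
    "E"/"W" for longitude or "N"/"S" for latitude.
    """
    hemisphere = positive if value >= 0 else negative
    magnitude = abs(value)
    degrees = int(magnitude)                  # whole degrees
    minutes = (magnitude - degrees) * 60.0    # one degree is 60 minutes
    return degrees, minutes, hemisphere


degrees, minutes, hemisphere = to_degrees_minutes(6.88, "E", "W")
print(f"{degrees}° {minutes:.1f}' {hemisphere}")  # prints 6° 52.8' E
```

The fractional part of the decimal value is not the number of minutes: it has to be multiplied by 60 first, so 0.88 of a degree is almost 53 minutes.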

## Different types of Spatial Data

We have just answered the question of how to geo-reference data (find a spatial location), but there is another equally relevant question: what should we geo-reference?

Data used in the GeoHealth domain can be split into two different groups:

1. Primary data
2. Secondary data

Primary data is disease (or health) surveillance data, e.g. the location of patients or health facilities. To use this data in a GIS, it needs to be geo-referenced, that is, linked to a location.

In many studies we link primary data to other environmental and social datasets such as census data and administrative data (boundaries of districts or provinces), but also rainfall, land use and other information that can help us gain a deeper understanding of our primary data. This type of data is often referred to as “secondary data”. A GIS allows the integration of primary and secondary data.
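The linking step can be sketched in a few lines. The example below assumes that both datasets share a district name as their common location reference. All records, district names and population figures are invented for illustration, and `attach_population` is a hypothetical helper, not part of any GIS package.

```python
# Hypothetical primary data: one record per patient, geo-referenced by district.
patients = [
    {"patient_id": 1, "district": "North"},
    {"patient_id": 2, "district": "South"},
    {"patient_id": 3, "district": "East"},
]

# Hypothetical secondary data: census population per district.
population_by_district = {"North": 12000, "South": 8000}


def attach_population(patient_records, population_lookup):
    """Return copies of the patient records with the district population added.

    Districts missing from the secondary dataset get a population of None.
    """
    linked = []
    for record in patient_records:
        enriched = dict(record)  # copy, so the primary data stays unchanged
        enriched["population"] = population_lookup.get(record["district"])
        linked.append(enriched)
    return linked


for row in attach_population(patients, population_by_district):
    print(row)
```

The link only works where both datasets use the same location reference. The patient in the district "East" ends up without a population value because the invented census table has no entry for that district.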

## Granularity of spatial data

Besides the type of data, we should also consider the granularity of the data. Data can be aggregated or refer to a single patient. Health data is often aggregated to a certain administrative zone, e.g. the number of patients per district. The reason for this aggregation can be functional (you want to show that certain districts have more or fewer patients), but aggregation is often performed for privacy reasons. The level of aggregation can have an impact on the analysis that can be performed.
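As a rough sketch of what aggregation does to the data, the snippet below reduces individual patient records to a count per district. The coordinates and district names are invented, and `patients_per_district` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical patient-level records with a precise location and a district.
patient_records = [
    {"lat": 52.21, "lon": 6.88, "district": "North"},
    {"lat": 52.23, "lon": 6.90, "district": "North"},
    {"lat": 52.24, "lon": 6.87, "district": "North"},
    {"lat": 52.18, "lon": 6.91, "district": "South"},
]


def patients_per_district(records):
    """Aggregate patient-level records to the number of patients per district."""
    return Counter(record["district"] for record in records)


counts = patients_per_district(patient_records)
print(dict(counts))  # prints {'North': 3, 'South': 1}
```

Once only the counts are shared, the individual coordinates cannot be recovered. This is what protects privacy, and also why the level of aggregation limits the analysis that can be done afterwards.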

Trends: We are collecting more and more data, and this will only increase in the near future. We have better surveillance systems in place for communicable diseases that function as early warning systems. We collect data in a more systematic way so that datasets collected in different countries can be integrated and compared. We are also using new datasets, for example data from social media. We no longer collect data only during a single survey, but repeat the data collection periodically so that we build up a spatio-temporal dataset that can really help us answer important questions.

What trends do you see in relation to primary and secondary data?