Skip to 0 minutes and 9 seconds Before we start on self-organizing maps, I would like to take you on a small tour and tell you a little bit about data mining and machine learning. So before we start on self-organizing maps, let us zoom out a bit and check out machine learning and data mining. Data mining is a technique aimed at discovering previously unknown patterns, something that is hidden within your data set but has not been found or revealed yet. It's a technique often used by, for example, grocery stores or retail companies. What they try to find is a link between a customer buying product A and buying product B. Now, data mining is so important because we are facing a growing volume of data.
Skip to 1 minute and 6 seconds Our data sets are getting larger and larger. This is perhaps, on one hand, because of electronic patient records. So we get more and more information about patients. But on the other hand, we also have far more information on other variables, for example, census data or environmental variables. And we want to integrate these data sets. Now, this makes the traditional tools less suitable. And we’re trying to find a new way of analyzing this data. If we focus a bit on data mining, we’ll find that it’s used for different purposes, classification, clustering, finding associations, sequential patterns, and prediction. We can extend this a bit into spatial.
Skip to 2 minutes and 5 seconds And what we, for example, see is that it can be used for spatial segmentation, identifying areas with similar values, maybe to uncover the geographical distribution and variation of diseases.
Skip to 2 minutes and 26 seconds We can also mine for spatial associations. Perhaps we find that when we find a variable with the value blue, light blue, that indicates that in that same area, we will also find the dark blue. This we could, for example, use to find associations between patients and other spatial variables. Now, we can extend this further also in the temporal domain. We may want to understand spatial-temporal diffusion patterns of diseases or maybe forecast epidemics.
Skip to 3 minutes and 10 seconds We can also split data mining in two groups, being supervised or unsupervised. Supervised methods assume that we already have a certain understanding and are looking for a certain pre-defined class. You see that supervised methods are especially used in classification and prediction.
Skip to 3 minutes and 37 seconds Now, what is so special about spatial data mining? Is it more than just the fact that we are using spatial data? Yes, in fact it is. In many cases, we will have a spatial auto-correlation. So we have a situation where neighbouring objects may actually influence each other, so they’re not independent. We can also face situations where the spatial relationships are not explicit in our data. So it is not stored that two locations are actually close. Or perhaps, or in most cases, we’re trying to find spatial patterns.
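The idea that neighbouring objects are not independent can be put into a number. One common measure is global Moran's I, which is not named in the video and is used here only as an illustration. The sketch below is a minimal version under stated assumptions: six areas lying in a row, each area neighbouring only the one before and after it, binary weights, and made-up values. The names `morans_i` and `chain_neighbours` are hypothetical helpers, not part of any library.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights.

    values     -- one observed value per area
    neighbours -- dict mapping an area index to the indices of its neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    # Sum of all weights: w_ij = 1 whenever j is listed as a neighbour of i.
    weight_sum = sum(len(js) for js in neighbours.values())

    # Cross-products of deviations over every neighbouring pair (i, j).
    cross = sum(dev[i] * dev[j] for i, js in neighbours.items() for j in js)

    # Total squared deviation, used to normalise the result.
    squared = sum(d * d for d in dev)

    return (n / weight_sum) * (cross / squared)


def chain_neighbours(n):
    """Assumed layout: areas in a row, each touching only its direct neighbours."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


# Illustrative values only, not real health data.
clustered = [1, 1, 1, 5, 5, 5]     # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]   # neighbours always differ

nb = chain_neighbours(6)
print("clustered:  ", morans_i(clustered, nb))    # positive: neighbours are alike
print("alternating:", morans_i(alternating, nb))  # negative: neighbours differ
```

A clearly positive value suggests that nearby areas tend to have similar values, which is the situation described above. The neighbour dictionary is also where the spatial relationships become explicit: if it is not stored that two locations are close, it has to be derived before a measure like this can be computed.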
Skip to 4 minutes and 26 seconds So when we zoom out a bit, what we see is that self-organizing maps, SOMs, belong to a group called machine learning, which we can say is part of the group of data mining techniques, although this group also has other methods, maybe spatial statistics or visualization.
Skip to 4 minutes and 49 seconds So if we could make a small comparison between data mining on one hand and machine learning on the other hand, what we see is that data mining is more generic. And when we talk about machine learning, we’re expecting the machine to learn, or actually not a machine, but an algorithm.
Skip to 5 minutes and 14 seconds I’ve been looking on the web to find some, well, applications of machine learning within the health domain. And one of the examples I found was this one by Barbara Han. She’s actually trying to apply machine learning in order to understand Ebola, to identify disease-carrying species in order to predict epidemics. What I also found is that the domains of computer science, machine learning, and health care are trying to get together, for example, in this conference that is going to take place later this year, to develop better methods together for machine learning within the health domain.
Skip to 6 minutes and 11 seconds Now, we see that machine learning is being applied using several different types of algorithms. You see here artificial neural networks or support vector machines. Many others are available besides the SOMs that we are applying in this MOOC.
Skip to 6 minutes and 34 seconds Now that you understand a little bit more about the data mining and machine learning, we’ll go on and apply self-organizing maps later on using a case study of measles in Iceland in order to find similarities between health units, but also to find similarities in time between different measles outbreaks.
Big data and spatial data mining
Our data sets are getting larger and larger; we are storing much more data nowadays than a few years ago.
In this video we will explain the concepts of data mining, machine learning and Self-Organizing Maps (SOM). It is an introductory step before we start on the case study of this in-depth topic.
One of the topics addressed is spatial data mining. Is it different from mining non-spatial data, and if so, why?
© University of Twente