Hi and welcome to this course, Process Mining with ProM. I’m Joos Buijs. I’m Assistant Professor here at Eindhoven University of Technology and I’ve been doing process mining for over five years now. And I would like to make you as enthusiastic as I am about this nice analysis technique. And in this course we will show you what process mining is, what it can do, but also how you can do it. So we will learn concrete skills on how you can apply process mining on the data that we provide but also after this course on data that you have around you because data is everywhere.
For instance, if you look at what happens in one Internet minute in 2011, so five years ago or more, it is staggering. So for instance on YouTube, they process 1.3 million video views per minute. Google processes more than two million search queries, et cetera, et cetera. And of course, each of these queries or views creates data, event data even. So in 2011 it was projected that by 2015 the number of network devices would be twice the global population. There’s so much data that some even call it the Big Data Crisis. The amount of data companies collect keeps growing but they cannot analyze it all.
It is claimed that only half a percent of big data is actually being analyzed. But at the same time, in 2013 $31 billion was already spent on analyzing big data. And, of course, growth is projected. And you see that, for instance, in two days in 2015 as much data was produced as in all of history until 2003. So more and more data is created and we need to make sense of this. And in process mining we focus on event data because event data is also everywhere. Let me give you a couple of examples. For instance, whenever you use your bank card to pay for groceries or whatever, an event is triggered.
A particular amount was paid at a particular store at a particular time and place. But also whenever you make a phone call or send an email, data and particularly event data is written. What route does your email take, et cetera, et cetera. Smart TVs now also store when they are on and which channels are watched when. Smartphones record where they are and when, which apps are used, et cetera, et cetera. And you can do useful stuff with this: when I come home, turn the Wi-Fi on, for instance.
And also public transport, of course. With a public transport card you can check in and out and this all creates data: when did I check in and where. And, of course, when you browse the Internet, even on FutureLearn, data is recorded: which website and which page was visited, when and for how long. And this can be used to improve. For instance, for this course we can analyze which lectures were more difficult than others. Let’s go into a bit more detail. So let’s take the public transport example. So what does a process related to public transport look like? Well, you start with buying a public transport card.
And where you buy this, when, and how much you put on your card can all be recorded. Once you have this card, you can check in on the bus, another event. And then you check out on the bus, another event. You know exactly who checked out where and how long it took, for instance, between check in and check out.
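To make this a bit more concrete, here is a minimal sketch in Python of what such recorded events could look like and how the time between check in and check out can be computed. The card id, locations, and timestamps are made up for illustration, and the sketch assumes the events are already sorted by time.

```python
from datetime import datetime

# Hypothetical event records: (card_id, activity, location, timestamp).
events = [
    ("card-1", "buy card", "station kiosk", "2016-03-01 08:00"),
    ("card-1", "check in", "bus stop A", "2016-03-01 08:10"),
    ("card-1", "check out", "bus stop B", "2016-03-01 08:35"),
]


def trip_durations(events):
    """Pair each check in with the next check out on the same card.

    Returns a list of (card_id, minutes) tuples. Assumes the events
    are ordered by timestamp.
    """
    open_check_in = {}  # card_id -> time of the pending check in
    durations = []
    for card, activity, _location, stamp in events:
        time = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        if activity == "check in":
            open_check_in[card] = time
        elif activity == "check out" and card in open_check_in:
            delta = time - open_check_in.pop(card)
            durations.append((card, delta.total_seconds() / 60))
    return durations


print(trip_durations(events))
```

Every record carries who (the card), what (the activity), where, and when. Those are the ingredients that process mining needs later on.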
And after you’ve checked out on the bus you can check in on the bus or the subway or train or whatever and then you can check out again. So there’s a whole journey. You’re traveling from one place to another, checking in and out multiple times. At some point you also top up which is a different type of activity which is also recorded. And then you can check in and check out again. And finally, of course, you stop. You reach your destination and this process for now stops. But at any point you can start this process again. At every step event data is recorded. Second example, whenever you apply for a loan, this application has to be handled by the bank.
So the first step of the bank is to register your application. They have to store it somewhere in an IT system with all the details that you provide. After this, several checks have to be performed but in arbitrary order. So your case is virtually split and three people can work on this together. For instance, your credit has to be checked. Haven’t you already borrowed too much? They also have to calculate the capacity. How much can you borrow at maximum? And they want to do some other checks in the system. Once all these three checks are performed the process merges again and a decision can be made. Do we approve or reject this loan application?
And whatever the decision, an email is sent to you informing you of the decision. So although this may be a simplistic view of a process, I hope you can imagine that handling a loan application follows this type of process and is usually supported by an IT system, therefore creating a lot of data that can be analyzed afterwards. Let’s look at a third example. Whenever you order something online, this order has to be delivered. So you have your order and it contains three packages: red, blue, and green. And, of course, once you place the order, at some point you have to pay so this needs to be tracked. Only once you have paid will your order be shipped.
So after you’ve paid, for instance, the red and the green packages can already be shipped. So they’re loaded on the truck and moved to your house and they are delivered. And maybe a couple of days later blue is in stock so the blue item is also shipped and delivered at your house. Again, a very simplistic view on the process but I hope you can see that one order has one payment but might contain multiple shipments. And if you’re not at home a shipment might have several attempts. You’re not at home, it’s returned and shipped another time. And in a process a lot of exceptions have to be handled. And, again, each step is recorded.
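As a rough illustration of this one order, one payment, multiple shipments structure, here is a small Python sketch. The field names and the order itself are hypothetical, not taken from any real system. It flattens the nested order into the list of steps that would be recorded.

```python
# Hypothetical order: one payment, two shipments, and the outcome of
# each delivery attempt per shipment ("return" means nobody was home).
order = {
    "order_id": "order-1",
    "shipments": [
        {"items": ["red", "green"], "attempts": ["deliver"]},
        {"items": ["blue"], "attempts": ["return", "deliver"]},
    ],
}


def to_events(order):
    """Flatten an order into (order_id, activity, items) records."""
    oid = order["order_id"]
    events = [(oid, "place order", ""), (oid, "pay", "")]
    for shipment in order["shipments"]:
        items = "+".join(shipment["items"])
        for outcome in shipment["attempts"]:
            # Every attempt is a ship step followed by its outcome.
            events.append((oid, "ship", items))
            events.append((oid, outcome, items))
    return events


for event in to_events(order):
    print(event)
```

The blue package needs two attempts, so the same order produces three ship events but only one pay event.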
So process mining can assist in analyzing all the data that’s generated. And I’ve shown you many examples of processes and software systems that actually support processes. So the bank cards, the public transport card, websites, they’re all software systems that in some way interact with the world– you, me, and other systems. And they have to know how to interact with this world. Therefore, they’re configured using process models. Process models describe what input can be received and how the software system should react. So this process model in some sense describes the world and using that knowledge they configure the software system. So the software system is actually executing a process. When this happens I have to do that.
In this state I can expect this or that. Well, during this execution event data is recorded. Every step, every check in, every order, everything is recorded in databases. And this is the data that process mining looks at. And process mining bridges the gap between event logs, that is event data, and process models. Process mining techniques can be roughly divided into three categories. Discovery: using solely the event data, we can discover a process model that describes how the software system, for instance, is behaving. Secondly, we can do conformance checking.
Using either the discovered process model or a process model that was actually used to configure the system, we can check using the data whether the system or the users of the system comply with what the process model describes. Finally, once we have a process model, again either discovered or provided, we can enhance this. Since we can relate the data to the process model, we can project timing information on top of this. Where are people waiting? Which path is most frequently executed? But also how are users in the system collaborating? Process mining can provide answers to all these questions.
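To give a feeling for these three categories, here is a minimal sketch in Python. It is not how ProM works internally. It assumes a very simple "model", namely the set of activity pairs that directly follow each other in the log, and the traces are made up to resemble the loan example.

```python
from collections import Counter

# Hypothetical event log: one trace (ordered activities) per loan application.
log = [
    ["register", "check credit", "calculate capacity", "check system",
     "decide", "send email"],
    ["register", "calculate capacity", "check credit", "check system",
     "decide", "send email"],
    ["register", "check system", "check credit", "calculate capacity",
     "decide", "send email"],
]


def discover_directly_follows(log):
    """Discovery: count how often activity a is directly followed by b."""
    counts = Counter()
    for trace in log:
        counts.update(zip(trace, trace[1:]))
    return counts


def deviations(trace, allowed):
    """Conformance: list the steps in a trace that the model never saw."""
    return [pair for pair in zip(trace, trace[1:]) if pair not in allowed]


dfg = discover_directly_follows(log)

# Enhancement: the counts tell us which step is executed most frequently.
print(dfg.most_common(1))

# A case that skips all three checks deviates from the discovered model.
print(deviations(["register", "decide", "send email"], dfg))
```

The same counts serve all three purposes here: they are the discovered model, they define what is allowed for the conformance check, and they show which paths are frequent. Real process mining techniques produce richer models, but the idea of relating the data back to the model is the same.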
In this course we will cover several process mining activities and they cover several phases of a process mining project. So usually you start with the initialisation phase, followed by the analysis iterations, and then you summarize and you implement. For instance, you start with planning. What process am I looking at? What questions do I want answered? And this gives input for the extraction. Depending on the process and the question you want to answer, you need to extract certain data. Once you have the data you can process it. You can filter out particular cases or events to focus further. Using this filtered data, you can do process mining.
There are several techniques that can help you and we will discuss the main ones. Then a very important step comes: evaluation. Given the process mining results, for instance a process model, you have to be able to evaluate how well this process model describes the data. And this gives input for changing, for instance, parameters or going back to the data processing phase and applying process mining techniques again. Once, after evaluation, you believe that the results are correct, you can summarize the results and this gives input for process improvement. So based on your summary of the results you can provide concrete process improvements to the process owner.
And in this course we will cover the steps from extraction to summarizing these results and we’ll mainly focus on the different types of process mining techniques there are and how you can evaluate the results. So in the next lectures, we will install the process mining tool ProM which we will use throughout this course to apply whatever we learn on real data immediately. So I hope to see you again in the next lecture soon.