
[0:10] Hi and welcome to this course, Process Mining with ProM. I'm Joos Buijs. I'm Assistant Professor here at Eindhoven University of Technology and I've been doing process mining for over five years now. And I would like to make you as enthusiastic as I am about this nice analysis technique. And in this course we will show you what process mining is, what it can do, but also how you can do it. So we will learn concrete skills on how you can apply process mining on the data that we provide but also after this course on data that you have around you because data is everywhere.

[0:45] For instance, if you look at what happens in one Internet minute in 2011-- so also five years ago or more-- what happens in an Internet minute is staggering. So for instance on YouTube, they process 1.3 million video views per minute. Google processes more than two million search queries, et cetera, et cetera. And of course, each of these queries or views creates data, event data even. So in 2011 it was projected that by 2015 the number of network devices would be twice the global population. There's so much data that some even call it the Big Data Crisis. The amount of data companies collect keeps growing but they cannot analyze it all.

[1:33] They claim that only half a percent of big data is actually being analyzed. But at the same time in 2013 $31 billion was already spent on analyzing big data. And, of course, growth is projected. And you see that for instance in two days in 2015 as much data was produced as in all of history until 2003. So more and more data is created and we need to make sense of this. And in process mining we focus on event data because event data is also everywhere. Let me give you a couple of examples. For instance, whenever you use your bank card to pay for groceries or whatever, an event is triggered.

[2:21] A particular amount was paid at a particular store at a particular time and place. But also whenever you make a phone call or send an email, data and particularly event data is written. What route does your email take, et cetera, et cetera. Also now smart TVs store when they are on and which channels are watched when. Smartphones record where they are and when, which apps are used, et cetera, et cetera. And you can do useful stuff with this. When I come home, turn the Wi-Fi on, for instance.

[2:58] And also public transport, of course. With a public transport card you can check in and out and this all creates data-- when did I check in and where. And, of course, when you browse the Internet, even on FutureLearn, data is recorded-- which website and which page was visited, when and for how long. And this can be used to improve. For instance, for this course we can analyze which lectures were more difficult than others. Let's go into a bit more detail. So let's take the public transport example. So what does a process related to public transport look like? Well, you start with buying a public transport card.

[3:39] And where you buy this, when, and how much you put on your card can all be recorded. Once you have this card, you can check in on the bus, another event. And then you check out on the bus, another event. You know exactly who checked out where and how long it took, for instance, between check in and check out.
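
The check in and check out events described above can be sketched as rows of event data. This is only an illustration: the card identifiers, stop names, timestamps and the function name `trip_durations` are assumptions made up for the example, not data or code from the course.

```python
from datetime import datetime

# Hypothetical event data: one row per event (card, activity, place, timestamp).
events = [
    ("card-1", "buy card",  "station kiosk", "2015-03-02 08:00"),
    ("card-1", "check in",  "stop A",        "2015-03-02 08:10"),
    ("card-1", "check out", "stop B",        "2015-03-02 08:35"),
    ("card-2", "check in",  "stop A",        "2015-03-02 08:12"),
    ("card-2", "check out", "stop C",        "2015-03-02 08:52"),
]

def trip_durations(events):
    """Pair each check in with the next check out of the same card."""
    open_trips = {}  # card -> (place, time) of the pending check in
    trips = []
    # The timestamp strings sort chronologically because of their format.
    for card, activity, place, stamp in sorted(events, key=lambda e: e[3]):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        if activity == "check in":
            open_trips[card] = (place, when)
        elif activity == "check out" and card in open_trips:
            start_place, start = open_trips.pop(card)
            minutes = (when - start).total_seconds() / 60
            trips.append((card, start_place, place, minutes))
    return trips

trips = trip_durations(events)
for trip in trips:
    print(trip)
```

Each resulting tuple says which card travelled from where to where and how many minutes passed between check in and check out.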

[4:01] And after you've checked out on the bus you can check in on the bus or the subway or train or whatever and then you can check out again. So there's a whole journey. You're traveling from one place to another, checking in and out multiple times. At some point you also top up, which is a different type of activity which is also recorded. And then you can check in and check out again. And finally, of course, you stop. You reach your destination and this process for now stops. But at any point you can start this process again. At every step event data is recorded. Second example, whenever you apply for a loan, this application has to be handled by the bank.

[4:46] So the first step of the bank is to register your application. They have to store it somewhere in an IT system with all the details that you provide. After this, several checks have to be performed but in arbitrary order. So your case is virtually split and three people can work on this together. For instance, your credit has to be checked: haven't you already borrowed too much? They also have to calculate the capacity: how much can you borrow at maximum? And they want to do some other checks in the system. Once all these three checks are performed the process merges again and a decision can be made. Do we approve or reject this loan application?

[5:29] And whatever the decision, an email is sent to you informing you of the decision. So although this may be a simplistic view of a process, I hope you can imagine that handling a loan application follows this type of process and is usually supported by an IT system, therefore creating a lot of data that can be analyzed afterwards. Let's look at a third example. Whenever you order something online, this order has to be delivered. So you have your order and it contains three packages-- red, blue, and green. And, of course, once you place the order, somewhere you have to pay so this needs to be tracked. Only once you pay will your order be shipped.

[6:14] So after you've paid, for instance, the red and the green packages can already be shipped. So they're loaded on the truck and moved to your house and they are delivered. And maybe a couple of days later blue is in stock so the blue item is also shipped and delivered at your house. Again, a very simplistic view on the process but I hope you can see that one order has one payment but might contain multiple shipments. And if you're not at home a shipment might have several attempts. You're not at home, so it's returned and shipped another time. And in a process a lot of exceptions have to be handled. And, again, each step is recorded.

[6:59] So process mining can assist in analyzing all the data that's generated. And I've shown you many examples of processes and software systems that actually support processes. So the bank cards, the public transport card, websites, they're all software systems that in some way interact with the world-- you, me, and other systems. And they have to know how to interact with this world. Therefore, they're configured using process models. Process models describe what input can be received and how the software system should react. So this process model in some sense describes the world and using that knowledge they configure the software system. So the software system is actually executing a process. When this happens I have to do that.

[7:48] In this state I can expect this or that. Well, during this execution event data is recorded. Every step, every check in, every order, everything is recorded in databases. And this is the data that process mining looks at. And process mining bridges the gap between event logs, event data, and process models. Process mining techniques can be roughly divided into three categories. Discovery-- using solely the event data, we can discover a process model that describes how the software system, for instance, is behaving. Secondly, we can do conformance checking.

[8:27] Using either the discovered process model or a process model that was actually used to configure the system, we can check using the data whether the system or the users of the system comply with what the process model describes. Finally, once we have a process model, again either discovered or provided, we can enhance this. Since we can relate the data to the process model, we can project timing information on top of this. Where are people waiting? Which path is most frequently executed? But also how are users in the system collaborating? Process mining can provide answers to all these questions.
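
The three categories can be sketched on a tiny event log of the loan example. This is a deliberately simple stand-in that counts which activity directly follows which; it is not presented as the technique ProM uses. The activity names, the three traces and the function names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical event log: each trace lists the activities of one loan application.
log = [
    ["register", "check credit", "calculate capacity", "check system", "decide", "send email"],
    ["register", "calculate capacity", "check credit", "check system", "decide", "send email"],
    ["register", "check system", "check credit", "calculate capacity", "decide", "send email"],
]

def discover_directly_follows(log):
    """Discovery: count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

def conforms(trace, allowed_pairs, start, end):
    """Conformance: the trace starts and ends correctly and uses only allowed steps."""
    if not trace or trace[0] != start or trace[-1] != end:
        return False
    return all(pair in allowed_pairs for pair in zip(trace, trace[1:]))

dfg = discover_directly_follows(log)
allowed = set(dfg)

# Enhancement: the frequencies annotate the model, e.g. the most frequent step.
most_common_pair, frequency = dfg.most_common(1)[0]

# A recorded application conforms; one that skips all three checks does not.
ok = conforms(log[0], allowed, "register", "send email")
skipped = conforms(["register", "decide", "send email"], allowed, "register", "send email")
print(most_common_pair, frequency, ok, skipped)
```

Because the three checks appear in a different order in each trace, the counts already hint that they are performed in arbitrary order between registration and decision.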

[9:12] In this course we will cover several process mining activities and cover several phases of a process mining project. So usually you start with the initialisation phase, followed by the analysis iterations, and then you summarize and you implement. For instance, you start with planning. What process am I looking at? What questions do I want answered? And this gives input for the extraction. Depending on the process and the question you want to answer, you need to extract certain data. Once you have the data you can process it. You can filter out particular cases or events to focus further. Using this filtered data, you can do process mining.
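
The data processing step, filtering out particular cases or events, can be sketched as follows. The order log, the activity names and the helper names `keep_completed` and `keep_activities` are hypothetical and only serve the example; they are not functions of ProM.

```python
# Hypothetical event log: case id -> ordered activities of one online order.
log = {
    "order-1": ["place order", "pay", "ship", "deliver"],
    "order-2": ["place order", "pay"],  # still running, nothing shipped yet
    "order-3": ["place order", "pay", "ship", "deliver", "ship", "deliver"],
}

def keep_completed(log, start_activity, end_activity):
    """Filter cases: keep only those that begin and finish as expected."""
    return {
        case: trace
        for case, trace in log.items()
        if trace and trace[0] == start_activity and trace[-1] == end_activity
    }

def keep_activities(log, wanted):
    """Filter events: keep only the activities of interest within every case."""
    return {case: [a for a in trace if a in wanted] for case, trace in log.items()}

completed = keep_completed(log, "place order", "deliver")
shipments = keep_activities(completed, {"ship", "deliver"})
print(shipments)
```

The first filter removes the order that is still running, and the second focuses the remaining cases on shipping, which makes the orders with multiple shipments easy to spot.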

[9:58] There are several techniques that can help you and we will discuss the main ones. Then a very important step comes-- evaluation. Given the process mining results, for instance a process model, you have to be able to evaluate how well this process model describes the data. And this gives input for changing, for instance, parameters or going back to the data processing phase and applying process mining techniques again. Once, after evaluation, you believe that the results are correct, you can summarize the results and this gives input for process improvement. So based on your summary of the results you can provide concrete process improvements to the process owner.

[10:42] And in this course we will cover the steps from extraction to summarizing these results and we'll mainly focus on the different types of process mining techniques there are and how you can evaluate the results. So in the next lectures, we will install the process mining tool ProM which we will use throughout this course to apply whatever we learn on real data immediately. So I hope to see you again in the next lecture soon.

Introduction

In this video we discuss that since data is everywhere, event data is also everywhere. We show several examples where event data is recorded in everyday life. We also detail three example processes to demonstrate this. This video is concluded with an overview of how process mining uses the event data that is created, as well as the several process mining activities that are covered in this course.


This video is from the free online course:

Introduction to Process Mining with ProM

Eindhoven University of Technology

Course highlights

Get a taste of this course before you join:

  • Installing ProM lite
    video

    In this step we show how to find and install the free and open source process mining tool ProM lite.

  • Using ProM lite
    video

    In this lecture we show the basic concepts and usage of ProM (lite): the resource, action and visualization perspectives.

  • Event logs
    video

    In this lecture we explain what an event log is and how it is structured. We also explain the most common attributes found in an XES event log.

  • Event logs in ProM
    video

    In this lecture we show you how you can load an event log in ProM and how you can get initial insights into its contents.

  • Converting a CSV file to an event log
    video

    Most data is not recorded in event log format. In this video we explain how a CSV file can be converted to an event log.

  • Exploring event logs with the dotted chart
    video

    After loading an event log into ProM it is important to apply the dotted chart to get initial process insights before process models are discovered.

  • Filtering event logs
    video

    Before good quality process models can be discovered, the event log data needs to be filtered, for instance to contain only completed cases.