This content is taken from the Eindhoven University of Technology's online course, Introduction to Process Mining with ProM. Join the course to learn more.

Skip to 0 minutes and 7 seconds Hi, and welcome back. In today’s lecture, I will show you how you can filter event logs. Since every real-life event log will contain some noise or erroneous data, you want to remove that before you do your real analysis. So this really fits in activity three, data processing. Using, for instance, the dot chart or the log dialog, you can gain insights into the data, and then you can decide what you want to remove because it is noise, or what you want to focus on. In general, there are two ways to filter your data. So let’s look at the event log from a dot chart perspective. You can decide to keep or filter out particular traces.

Skip to 0 minutes and 48 seconds So you decide which cases you want to keep, for instance, gold customers, or cards obtained or purchased in a particular time period. For those cases you keep the whole sequence of events, and you remove all the other sequences. Another choice you can make is to filter the events themselves. Options here are, for instance, to only look at a particular time frame. Then you cut away everything on the left and the right from the dot chart perspective. Or you can choose to filter particular activities, so to keep, for instance, particular colors in the dot chart. I will show you how you can execute both ways of filtering in the ProM Lite tool.
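The two ways of filtering described above can be sketched in a few lines of Python. This is only a minimal illustration, not how ProM implements its plugins: the toy log, the `customer` attribute, and the function names `filter_traces` and `filter_events` are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical toy log: each trace has a case id, trace attributes, and a list of events.
log = [
    {"case": "c1", "attrs": {"customer": "gold"}, "events": [
        {"activity": "A", "time": datetime(2012, 1, 1)},
        {"activity": "W", "time": datetime(2012, 1, 5)},
    ]},
    {"case": "c2", "attrs": {"customer": "silver"}, "events": [
        {"activity": "A", "time": datetime(2012, 1, 2)},
        {"activity": "O", "time": datetime(2012, 1, 3)},
    ]},
]

def filter_traces(log, keep):
    """Trace-level filter: keep whole traces for which keep(trace) is true."""
    return [trace for trace in log if keep(trace)]

def filter_events(log, keep):
    """Event-level filter: keep every trace, but drop events for which keep(event) is false."""
    return [{**trace, "events": [e for e in trace["events"] if keep(e)]} for trace in log]

# Keep only the cases of gold customers (whole sequences stay intact).
gold_only = filter_traces(log, lambda t: t["attrs"]["customer"] == "gold")

# Keep only events before 4 January 2012 (cuts away the right side of the dot chart).
first_days = filter_events(log, lambda e: e["time"] < datetime(2012, 1, 4))
```

Note that an event-level filter can leave a trace with very few events, or none at all, whereas a trace-level filter never changes the events inside the traces it keeps.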

Skip to 1 minute and 29 seconds So let’s look at which plugins can assist you in filtering your event log. With ProM Lite open, let’s again import the BPI 2012 event log, so that I can show you what filtering techniques are available in ProM Lite. We can open the action view, and if you just search for filter, we see several plugins. In this lecture, I will explain the top three filtering plugins. I will start with the Filter Log using Simple Heuristics plugin. This plugin consists of several wizard screens. The first one asks you which types of events you want to include.

Skip to 2 minutes and 10 seconds So what do you want to keep: the events that indicate the start of an activity, the completion of an activity, and/or the scheduling of an activity? If you click on, for instance, schedule, it turns orange, which means you remove this event, or red, which means you discard the whole trace. For this example, let’s put it on orange, so we remove all schedule event types. Here at the bottom, you have the event log name, so you can give it another name. It’s always good practice to give it a meaningful name. So, for instance, for now, let’s call this one filtered for Mark. Now you can press Next and you go to the next dialog screen.
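The three choices per event type (keep it, remove the event, or discard the whole trace) can be sketched as follows. This is a hedged illustration of the behavior described in the lecture, not ProM's code; the constants, the `lifecycle` key, and the function name `filter_event_types` are hypothetical.

```python
KEEP, REMOVE, DISCARD = "keep", "remove", "discard"

def filter_event_types(traces, actions):
    """Apply a per-event-type action; event types not listed in `actions` are kept."""
    result = []
    for trace in traces:
        # "Red": one event of a discarded type removes the whole trace.
        if any(actions.get(e["lifecycle"], KEEP) == DISCARD for e in trace):
            continue
        # "Orange": events of a removed type are dropped, the rest of the trace stays.
        result.append([e for e in trace if actions.get(e["lifecycle"], KEEP) == KEEP])
    return result

# Hypothetical two-trace log; each trace is a list of event dicts.
traces = [
    [{"activity": "W", "lifecycle": "schedule"},
     {"activity": "W", "lifecycle": "start"},
     {"activity": "W", "lifecycle": "complete"}],
    [{"activity": "A", "lifecycle": "complete"}],
]

orange = filter_event_types(traces, {"schedule": REMOVE})   # both traces remain
red = filter_event_types(traces, {"schedule": DISCARD})     # only the second trace remains
```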

Skip to 2 minutes and 52 seconds The next question it asks is which start events you want to keep, that is, traces starting with which particular events you want to keep. But there’s only one activity that is recorded as a start event, so let’s press Next. These are all the events that the traces end with. Automatically, it has selected those events that together account for 80% of the trace endings. So by selecting these five activity names, you keep 80% of the traces. You can change this if you want to: if you keep the Control button pressed, you can deselect, for instance, Declined, or select all cases that end with Registered, or whatever you think would make a logical selection.
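One plausible reading of the 80% preselection is "take the most frequent end activities until they cover at least 80% of the traces". The sketch below implements that reading; whether ProM computes its default selection exactly this way is an assumption, and the function names and toy traces are hypothetical.

```python
from collections import Counter

def frequent_values(values, threshold=0.8):
    """Pick the most frequent values until they cover at least `threshold` of all occurrences."""
    counts = Counter(values)
    total = sum(counts.values())
    selected, covered = [], 0
    for value, count in counts.most_common():
        if covered >= threshold * total:
            break
        selected.append(value)
        covered += count
    return selected

def keep_traces_ending_with(traces, allowed):
    """Keep only the traces whose last activity is in `allowed`."""
    return [t for t in traces if t and t[-1] in allowed]

# Hypothetical log: traces are lists of activity names.
traces = [["A", "B"]] * 6 + [["A", "C"]] * 3 + [["A", "D"]] * 1

end_activities = frequent_values([t[-1] for t in traces])
kept = keep_traces_ending_with(traces, set(end_activities))
```

Changing the selection by hand, as done with Control-click in the wizard, corresponds to passing a different `allowed` set.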

Skip to 3 minutes and 47 seconds If you press Next, you can now filter all the events. Again, this slider is put on 80%. So with the selected events, you will keep 80% of the events: 80% of the events have one of these activity names. Again, using Control, you can say I don’t want this, I want this. Or you can just click, scroll down, and when you press Shift and click, you select this whole range. So now I’m keeping the A and O activities, and I’m removing the W activities. Now if you press Finish (it can take a while because it’s going through the whole log), you have this event log.
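Keeping the A and O activities while removing the W activities amounts to an event filter on the activity name. The sketch below is an illustration only: the activity names are made up in the style of the log, selecting by name prefix is an assumption (the wizard selects individual names from a list), and `keep_activities` and `log_size` are hypothetical helpers.

```python
def keep_activities(traces, keep):
    """Drop events whose activity name fails keep(); traces stay in place, even if they become empty."""
    return [[a for a in trace if keep(a)] for trace in traces]

def log_size(traces):
    """Return (number of traces, number of events) to compare a log before and after filtering."""
    return len(traces), sum(len(t) for t in traces)

# Illustrative traces as lists of activity names.
traces = [
    ["A_SUBMITTED", "W_Afhandelen leads", "A_DECLINED"],
    ["A_SUBMITTED", "O_SELECTED", "W_Completeren aanvraag", "O_SENT"],
]

# Keep the A and O activities, remove the W activities.
filtered = keep_activities(traces, lambda a: a.startswith(("A_", "O_")))

before = log_size(traces)
after = log_size(filtered)
```

Comparing the two size tuples is the same sanity check as looking at the trace and event counts of the resulting log in ProM.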

Skip to 4 minutes and 34 seconds And just to recall, we now have fewer traces, 8,000, while in the original log there were 13,000 traces. We also have far fewer events, close to 67,000, while in the original log there were 262,000 events. So we can open the original event log again, and I can show you one of the other two filtering techniques, for instance, the Filter Log on Trace Attribute Values plugin. When you start this plugin, you get one tab for each attribute that a trace has, in this example the amount requested, the registration date, and the concept name. For instance, you can select to keep only the traces that have a requested amount of at least 32,000.

Skip to 5 minutes and 25 seconds So again, press Finish, and now you see that you keep only 1,000 traces, those that request at least 32,000 euros. If we go back to the Object View, we can again look for filter plugins. Similar to the trace attribute value filter plugin, you also have the event attribute value filter plugin, which shows you a tab for every event attribute, and with which you can remove all the events that do or don’t have particular values. It takes a while for the plugin to calculate and find what all the attributes and values are.
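The trace attribute filter described above selects whole traces based on a value stored on the trace itself. A minimal sketch, assuming a hypothetical attribute key `amount_requested` and a hypothetical helper `filter_on_trace_attribute` (neither is taken from ProM or from the log):

```python
def filter_on_trace_attribute(log, attribute, predicate):
    """Keep whole traces whose attribute exists and satisfies the predicate."""
    return [
        trace for trace in log
        if attribute in trace["attrs"] and predicate(trace["attrs"][attribute])
    ]

# Hypothetical log with one trace-level attribute per case.
log = [
    {"case": "1", "attrs": {"amount_requested": 5000}, "events": ["A", "O"]},
    {"case": "2", "attrs": {"amount_requested": 32000}, "events": ["A"]},
    {"case": "3", "attrs": {"amount_requested": 45000}, "events": ["A", "W"]},
]

# Keep only the cases that request at least 32,000.
large = filter_on_trace_attribute(log, "amount_requested", lambda v: v >= 32000)
```

The events of the kept traces are untouched; only the number of traces goes down.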

Skip to 6 minutes and 11 seconds Once it has gone through the whole event log, you again see several tabs: the concept name of the event, the lifecycle transition, the resource, and the timestamp. For instance, similarly to the Filter Log using Simple Heuristics plugin, you can decide to only keep the complete and start events. Now, if you press Finish, you see that we still have all the cases from the original event log, but we removed a couple of thousand events in total. So using these three plugins, you can filter your event log to only contain the behavior you actually want to look at.
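The event attribute filter works on individual events, so every case remains while some of its events disappear. The sketch below uses the standard XES attribute keys `concept:name` and `lifecycle:transition`; the function name, the `keep_matching` flag, and the toy log are assumptions for illustration.

```python
def filter_on_event_attribute(log, attribute, values, keep_matching=True):
    """Keep (or, with keep_matching=False, remove) events whose attribute value is in `values`."""
    return [
        [e for e in trace if (e.get(attribute) in values) == keep_matching]
        for trace in log
    ]

# Hypothetical log: each trace is a list of event dicts.
log = [
    [{"concept:name": "W", "lifecycle:transition": "schedule"},
     {"concept:name": "W", "lifecycle:transition": "start"},
     {"concept:name": "W", "lifecycle:transition": "complete"}],
    [{"concept:name": "A", "lifecycle:transition": "complete"}],
]

# Keep only start and complete events.
kept = filter_on_event_attribute(log, "lifecycle:transition", {"start", "complete"})

# Equivalent here: remove the events that do have the value "schedule".
removed = filter_on_event_attribute(log, "lifecycle:transition", {"schedule"}, keep_matching=False)
```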

Skip to 6 minutes and 51 seconds So now I hope you have a good understanding of which plugins you can use, and how you can filter event logs, to focus on particular behavior that you want to analyze further. I hope to see you again in the next lecture.

Filtering event logs

In this lecture we explain how event logs can be filtered within ProM. Filtering is important for discovering high-quality process models later on. It also allows you to focus on a particular part of the process, or on a particular type of case.
