Hi, and welcome back. In today’s lecture, I will show you how to filter event logs. Every real-life event log contains some noise or erroneous data, and you want to remove that before you do your real analysis. This fits in activity three, data processing. Using, for instance, the dotted chart or the log dialog, you can gain insight into the data, and then you can decide what you want to remove because it’s noise, or what you want to focus on. In general, there are two ways to filter your data. Let’s look at the event log from a dotted chart perspective. First, you can decide to keep or filter out particular traces.
You decide which cases you want to keep, for instance, the cases of gold customers, or the cases in which cards were obtained or purchased in a particular time period. For those cases you keep the whole sequence of events, and you remove the other sequences. The other choice you can make is to filter the events themselves. One option is to look only at a particular time frame; then you cut away everything to the left and to the right in the dotted chart. Or you can choose to filter particular activities, so to keep, for instance, only particular colors in the dotted chart. I will show you how to do both kinds of filtering in the ProM Lite tool.
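To make the difference between the two kinds of filtering concrete, here is a minimal Python sketch. It is not how ProM Lite implements filtering; it assumes a toy in-memory log in which each trace is a dict with case attributes and a list of events, using XES-style keys (`concept:name`, `time:timestamp`). The `customer` attribute, the helper names, and all data are made up for illustration.

```python
from datetime import datetime

# Toy event log (hypothetical data): each trace has case attributes and events.
log = [
    {"attrs": {"concept:name": "case1", "customer": "gold"},
     "events": [
         {"concept:name": "register", "time:timestamp": datetime(2012, 1, 2)},
         {"concept:name": "approve", "time:timestamp": datetime(2012, 1, 9)},
     ]},
    {"attrs": {"concept:name": "case2", "customer": "silver"},
     "events": [
         {"concept:name": "register", "time:timestamp": datetime(2011, 12, 30)},
         {"concept:name": "decline", "time:timestamp": datetime(2012, 1, 3)},
     ]},
]

def filter_traces(log, keep_trace):
    """Trace-level filtering: keep or drop whole cases, with all their events."""
    return [t for t in log if keep_trace(t)]

def filter_events(log, keep_event):
    """Event-level filtering: keep every case, but only the events that pass.
    In this sketch a case whose events are all removed stays as an empty trace."""
    return [{"attrs": t["attrs"],
             "events": [e for e in t["events"] if keep_event(e)]}
            for t in log]

# Keep only gold customers: their whole sequences survive.
gold = filter_traces(log, lambda t: t["attrs"]["customer"] == "gold")

# Cut the log to a time window (the left/right cut in the dotted chart).
start, end = datetime(2012, 1, 1), datetime(2012, 1, 31)
window = filter_events(log, lambda e: start <= e["time:timestamp"] <= end)

# Keep only one activity (one color in the dotted chart).
registers = filter_events(log, lambda e: e["concept:name"] == "register")
```

Here `window` still contains both cases, but the event of `case2` dated before the window is gone, whereas `gold` contains only `case1`, with both of its events.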
So let’s look at which plugins can help you filter your event log. With ProM Lite open, let’s again import the BPI 2012 event log so that I can show you which filtering techniques are available in ProM Lite. We again open the Actions view, and if you search for filter, you see several plugins. In this lecture, I will explain the top three filtering plugins, and I will start with the Filter Log using Simple Heuristics plugin. This plugin consists of several wizard screens. The first one asks you which types of events you want to include.
Do you want to keep the events that indicate the start of an activity, the completion of an activity, and/or the scheduling of an activity? If you click on, for instance, schedule, it turns orange, which means these events are removed, or red, which means the whole trace is discarded. For this example, let’s set it to orange, so we remove all events of type schedule. At the bottom, you see the name of the event log, and you can give it another name. It’s always good practice to give logs meaningful names, so for now, let’s call this one filtered for MOOC. Now press Next to go to the next dialog screen.
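The three states of this first wizard screen can be sketched as a small function. This is a hypothetical Python illustration of the behavior described above, not the plugin’s code: it assumes lifecycle values written as in the XES lifecycle model (`schedule`, `start`, `complete`), and the function name, the `"keep"`/`"remove"`/`"discard"` labels, and the data are invented.

```python
# Per-lifecycle choice, mirroring the wizard screen described in the lecture:
#   "keep"    (default) keep these events
#   "remove"  (orange)  drop only these events
#   "discard" (red)     drop every trace that contains such an event
def filter_by_lifecycle(log, choice):
    result = []
    for trace in log:
        kinds = [e["lifecycle:transition"] for e in trace["events"]]
        if any(choice.get(k, "keep") == "discard" for k in kinds):
            continue  # red: the whole trace is discarded
        kept = [e for e in trace["events"]
                if choice.get(e["lifecycle:transition"], "keep") != "remove"]
        result.append({"attrs": trace["attrs"], "events": kept})
    return result

# Toy log with hypothetical activity names.
log = [
    {"attrs": {"concept:name": "1"}, "events": [
        {"concept:name": "W_Check", "lifecycle:transition": "schedule"},
        {"concept:name": "W_Check", "lifecycle:transition": "start"},
        {"concept:name": "W_Check", "lifecycle:transition": "complete"},
    ]},
    {"attrs": {"concept:name": "2"}, "events": [
        {"concept:name": "A_SUBMITTED", "lifecycle:transition": "complete"},
    ]},
]

orange = filter_by_lifecycle(log, {"schedule": "remove"})   # both cases remain
red = filter_by_lifecycle(log, {"schedule": "discard"})     # only case "2" remains
```

Orange keeps both cases and only drops the schedule event from case 1; red drops case 1 entirely because it contains a schedule event.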
The next screen asks which start events you want to keep, that is, which traces starting with a particular event you want to keep. There’s only one start activity recorded here, so let’s press Next. These are all the events that the traces end with. Automatically, it has selected the events that together cover 80% of the traces, so by selecting these five activity names, you keep 80% of the traces. You can change this if you want: keep the Control button pressed, and then you can unselect Declined and, for instance, select all cases that end with Registered, or whatever you think makes a logical selection.
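One plausible way to compute such an 80% default is to take the most frequent end activities until they cover 80% of the traces. The Python sketch below assumes exactly that rule; the lecture does not say how ProM Lite computes its default selection, and the helper names and counts are hypothetical.

```python
from collections import Counter

def default_end_selection(log, coverage=0.8):
    """Assumed rule: pick the most frequent final activities until they
    cover at least `coverage` of the (non-empty) traces."""
    ends = Counter(t["events"][-1]["concept:name"] for t in log if t["events"])
    total = sum(ends.values())
    selected, covered = [], 0
    for activity, count in ends.most_common():
        if covered >= coverage * total:
            break
        selected.append(activity)
        covered += count
    return selected

def keep_traces_ending_with(log, allowed):
    """Trace-level filter: keep whole traces whose last activity is allowed."""
    return [t for t in log
            if t["events"] and t["events"][-1]["concept:name"] in allowed]

def trace(*names):
    return {"attrs": {}, "events": [{"concept:name": n} for n in names]}

# Hypothetical log: 5 declined, 3 cancelled, 2 registered cases.
log = ([trace("A_SUBMITTED", "A_DECLINED")] * 5
       + [trace("A_SUBMITTED", "A_CANCELLED")] * 3
       + [trace("A_SUBMITTED", "A_REGISTERED")] * 2)

selected = default_end_selection(log)                # declined + cancelled = 80%
allowed = (set(selected) - {"A_DECLINED"}) | {"A_REGISTERED"}
kept = keep_traces_ending_with(log, allowed)         # the manual change
```

With these made-up counts, the default picks Declined and Cancelled (8 of 10 traces); swapping Declined for Registered, as in the lecture, keeps 5 traces.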
If you press Next, you can filter on all the events. Again, the slider is set to 80%: with the selected event names you keep 80% of the events, because 80% of the events have one of these activity names. Again, using Control, you can deselect some names and select others. Or you can click one name, scroll down, and Shift-click another to select the whole range. Now I’m keeping the A and O activities, and I’m removing the W activities. If I press Finish, it can take a while because it goes through the whole log, and now you have this event log.
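Keeping the A and O activities and dropping the W activities is an event-level filter on the activity name. A minimal Python sketch, assuming the toy log format from the earlier sketches; the activity names follow the A_/O_/W_ prefix pattern mentioned in the lecture, but the particular trace is invented.

```python
def keep_activity_prefixes(log, prefixes=("A_", "O_")):
    """Keep only events whose activity name starts with one of `prefixes`."""
    return [{"attrs": t["attrs"],
             "events": [e for e in t["events"]
                        if e["concept:name"].startswith(prefixes)]}
            for t in log]

def event_count(log):
    return sum(len(t["events"]) for t in log)

# One hypothetical trace mixing A_, O_ and W_ activities.
log = [{"attrs": {"concept:name": "1"}, "events": [
    {"concept:name": "A_SUBMITTED"},
    {"concept:name": "W_Afhandelen leads"},
    {"concept:name": "A_PARTLYSUBMITTED"},
    {"concept:name": "O_SELECTED"},
    {"concept:name": "W_Completeren aanvraag"},
]}]

filtered = keep_activity_prefixes(log)
print(event_count(log), "->", event_count(filtered))  # prints 5 -> 3
```

`str.startswith` accepts a tuple of prefixes, so a single call covers both the A and O activities.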
To recap, we now have fewer traces, 8,000, while the original log had 13,000 traces. We also have far fewer events, close to 67,000, while the original log had 262,000 events. Now we can open the original event log again to show you the other two filtering techniques, starting with filtering the log on trace attribute values. When you start this plugin, you get one tab for each attribute that a trace has, in this example the amount requested, the registration date, and the concept name. For instance, you can choose to keep only the traces that request an amount of at least 32,000.
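Filtering on a trace attribute keeps or drops whole cases based on a case-level value. The Python sketch below is a hypothetical illustration, not the plugin: it assumes the amount is stored under the key `AMOUNT_REQ` (check the attribute name in your own log) and that the value may be stored as a string, so it is converted before comparing. The cases and amounts are invented.

```python
def filter_on_trace_attribute(log, key, keep_value):
    """Keep whole traces whose case attribute `key` satisfies `keep_value`.
    Traces that lack the attribute are dropped."""
    return [t for t in log
            if key in t["attrs"] and keep_value(t["attrs"][key])]

# Hypothetical cases; amounts stored as strings on purpose.
log = [{"attrs": {"concept:name": str(i), "AMOUNT_REQ": amount}, "events": []}
       for i, amount in enumerate(["5000", "32000", "45000", "10000"])]

big = filter_on_trace_attribute(log, "AMOUNT_REQ", lambda v: float(v) >= 32000)
print([t["attrs"]["concept:name"] for t in big])  # prints ['1', '2']
```

Converting to a number matters here: compared as strings, "5000" would sort after "32000".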
Again, use Shift to select the range, then press Finish, and now you see that you keep only 1,000 traces, those that request at least 32,000 euros. If we go back to the Object View, we can again look for filter plugins. Similar to the trace attribute filter plugin, there is also an event attribute filter plugin, which shows you a tab for every event attribute and lets you remove all the events that do or don’t have particular values. It takes a while for the plugin to go through the log and find all the attributes and their values.
Once it has gone through the whole event log, you again see several tabs: the concept name of the event, the lifecycle transition, the resource, and the timestamp. For instance, similar to the Filter Log using Simple Heuristics plugin, you can decide to keep only the complete and start events. If you press Finish, you see that all the cases from the original event log remain, but a couple of thousand events have been removed in total. Using these three plugins, you can filter your event log so that it contains only the behavior you actually want to look at.
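The observation that every case remains while events disappear follows from filtering at the event level. Here is a minimal Python sketch of an event attribute filter under the same toy-log assumptions as before; the helper names and data are hypothetical, and whether ProM keeps a case whose events are all removed is not shown in the lecture (this sketch keeps it).

```python
def filter_on_event_attribute(log, key, allowed):
    """Keep only events whose attribute `key` has one of the `allowed` values.
    Every trace is kept, even one that ends up with no events."""
    return [{"attrs": t["attrs"],
             "events": [e for e in t["events"] if e.get(key) in allowed]}
            for t in log]

def counts(log):
    """(number of traces, number of events)"""
    return len(log), sum(len(t["events"]) for t in log)

def ev(name, transition):
    return {"concept:name": name, "lifecycle:transition": transition}

log = [
    {"attrs": {"concept:name": "1"},
     "events": [ev("W_Check", "schedule"), ev("W_Check", "start"),
                ev("W_Check", "complete")]},
    {"attrs": {"concept:name": "2"},
     "events": [ev("A_SUBMITTED", "complete"), ev("A_DECLINED", "complete")]},
    {"attrs": {"concept:name": "3"}, "events": [ev("W_Call", "schedule")]},
]

filtered = filter_on_event_attribute(log, "lifecycle:transition",
                                     {"start", "complete"})
print(counts(log), "->", counts(filtered))  # prints (3, 6) -> (3, 4)
```

The trace count stays the same while the event count drops, matching the behavior described for the event attribute filter.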
I hope you now have a good understanding of which plugins you can use and how to filter event logs to focus on the behavior you want to analyze further. I hope to see you again in the next lecture.