Hi, and welcome to this lecture on event logs. So the input for process mining in ProM is usually an event log. And in the extraction phase, you decide what data to include to create an event log. In this lecture, I’ll first explain what an event log is, what its ingredients are, and what its structure is, so that later on you know what data to include in an event log and how to create one.
So as I’ve shown you in earlier lectures, event data is everywhere. And it’s recorded in many shapes and forms, so usually you don’t get an event log on a silver platter; you have to transform the data. And therefore, you have to recognize how to get the right data attributes and put them in an event log. So let’s look at the public transport example. Again, this is an example I’ve shown you before. You buy a public transport card. You travel on the bus. And you top up and you travel some more. And then you stop. So just imagine what type of data could be recorded for each event.
For instance, when you buy a card, it could be recorded when you bought it, January 1, 2016; where you bought it, Main Street; what the card ID is; how much money you put on the card; et cetera, et cetera. So for the buy public transport card event, many data attributes could be recorded. So another event that could be recorded is when you check in on the bus. For instance, here, after you bought a card, you check in on the same day. And on this event, you can record where you checked in, Main Street, with which card, and on which bus, the 404.
So each event or activity in this process can contain a lot of data that you can use, where, when, how, et cetera, et cetera. And all this together forms what we call a trace, which is a sequence of events recorded for, in this example, a particular card.
And this trace can contain a lot of different attributes. For instance, of course, the card number, but also that it is a prepaid card. For this card, as I said, several events are recorded. And as I’ve shown you before, for instance, when did you buy the card, and when did you check in on the bus or the subway or the train or whatever. And so you form a whole trace, or sequence of events and observed activities, for this card, not only on a single day but throughout the year, for instance. And you can imagine that you have similar traces for other cards. For instance, this card is a business type card, and not a prepaid card.
So you get many different sequences of different activities being recorded. And this all together constitutes an event log. What are the event log ingredients? You have a case, and this case can have a description, for instance, and other attributes, as I said before: the card number, the type of card, et cetera, et cetera. And for a particular case, you record several events: what happened, when did it happen, who did what, and what was the state change of the activity. And you have this type of information for many events in the same trace. To be able to store all this, we started the XES, or eXtensible Event Stream, event log format, which records exactly these traces, or sequences of events.
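To make these ingredients a bit more concrete, here is a minimal Python sketch of a log that holds cases, each with its own attributes and a list of events. The class and field names, the card ID, and the times of day are made up for illustration; this is not the XES format itself and not how ProM stores logs internally.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    activity: str                     # what happened
    timestamp: datetime               # when it happened
    resource: Optional[str] = None    # who did it (not always recorded)
    transition: Optional[str] = None  # state change, e.g. "start" or "complete"
    attributes: dict = field(default_factory=dict)  # any extra data


@dataclass
class Trace:
    case_id: str                      # here: the card number
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


@dataclass
class EventLog:
    traces: list = field(default_factory=list)


# Hypothetical card; the source only gives the date, so the times are invented.
card = Trace(case_id="card-001", attributes={"card_type": "prepaid"})
card.events.append(Event("buy card", datetime(2016, 1, 1, 9, 0),
                         attributes={"location": "Main Street"}))
card.events.append(Event("check in", datetime(2016, 1, 1, 9, 30),
                         attributes={"location": "Main Street", "bus": "404"}))

log = EventLog(traces=[card])
print(len(log.traces), [e.activity for e in card.events])
```

The point of the sketch is only the nesting: a log contains traces, a trace contains events, and both traces and events can carry extra attributes.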
So a trace usually has one attribute that is always there, and that’s concept:name, or concept colon name. So every trace has an attribute called concept:name that indicates the name of this trace. In our example, for instance, the card number. Then, each trace contains several events. And each event, again, has a name, which usually is the activity that was executed: check-in, check-out, top up, et cetera. When did it happen? January 1, 2016, for instance. Who did it? In a process, it’s not always the customer that does something; it could be the clerk, or the secretary, or the manager. And additionally, you usually have the lifecycle transition. So for instance, one particular activity can be started and completed. That would trigger two events: at 10:00, something was started, and at 11:00, the same activity was finished.
Using all these attributes, you build up a trace of the events, which you can analyze using process mining techniques. However, you usually don’t get your data already in an event log format. You usually get data in a tabular form like this. You have several columns. And in this example, the first column is, for instance, the card ID. The second column, the action, what happened. The third column, the timestamp. The fourth column, the location. And the fifth column, the card balance after the activity. But here, we can recognize the key concepts that we discussed earlier.
For instance, the first column is actually the trace ID: for which card did the particular action happen. The second column is the event or the activity that happened: check-in, check-out, buying a card. The third column is the timestamp: when did this happen. The other columns can be considered extra attributes. And you can use these in your analysis later on if you want, but you can also leave them out. So it’s important that you recognize the first three columns: the trace identifier, the activity that happened, and when it happened. These three columns are the minimal requirement for an event log. All the rest, you can use, but it’s not required.
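As a rough sketch of that mapping, the Python below reads a small table, uses the first three columns as trace ID, activity, and timestamp, keeps the remaining columns as extra attributes, and orders each trace by time. The column names, card IDs, and rows are invented sample data, not the table from the slide.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Invented sample rows in the five-column layout described above.
RAW = """card_id,action,timestamp,location,balance
1001,buy card,2016-01-01T09:00:00,Main Street,20.00
1002,buy card,2016-01-01T09:05:00,Station,50.00
1001,check in,2016-01-01T09:30:00,Main Street,20.00
1001,check out,2016-01-01T09:50:00,Park Lane,17.50
"""


def table_to_traces(text):
    """Group rows by trace ID and sort each group by timestamp."""
    traces = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        case_id = row.pop("card_id")                         # trace identifier
        activity = row.pop("action")                         # what happened
        when = datetime.fromisoformat(row.pop("timestamp"))  # when it happened
        traces[case_id].append((when, activity, row))        # row = extra attributes
    for events in traces.values():
        events.sort(key=lambda e: e[0])
    return dict(traces)


traces = table_to_traces(RAW)
for case_id, events in traces.items():
    print(case_id, [activity for _, activity, _ in events])
```

Dropping the `location` and `balance` columns from the input would still give valid traces, which is what it means for the first three columns to be the minimal requirement.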
So to recap, event logs are usually recorded in the XES event log format, which is what ProM works with. An XES event log consists of traces, and a trace consists of events. The trace can have a name and other attributes. And each event has a name, the timestamp when it happened, the resource that executed it (but this is optional), and the lifecycle transition that the event records, for instance, the start or completion of an activity. This attribute is also optional. So, now you know the key ingredients of an event log, and this will help you in the extraction phase. Don’t worry, however: in later lectures, we will cover this in a bit more detail.
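To give an impression of how this recap looks on disk, here is a small Python sketch that nests events inside a trace inside a log, using the standard XES attribute keys concept:name, time:timestamp, org:resource, and lifecycle:transition. The function name, card ID, activity, and resource are hypothetical. The sketch also leaves out the header attributes and extension declarations that a full XES file carries, so take it as an illustration of the structure rather than a file that is guaranteed to import.

```python
import xml.etree.ElementTree as ET


def build_xes(traces):
    """Build a simplified XES-style tree: log > trace > event."""
    log = ET.Element("log")
    for case_id, events in traces.items():
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", key="concept:name", value=case_id)
        for ev in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", key="concept:name", value=ev["activity"])
            ET.SubElement(event, "date", key="time:timestamp", value=ev["timestamp"])
            if "resource" in ev:      # optional: who executed it
                ET.SubElement(event, "string", key="org:resource", value=ev["resource"])
            if "transition" in ev:    # optional: start, complete, ...
                ET.SubElement(event, "string", key="lifecycle:transition",
                              value=ev["transition"])
    return log


# One activity started at 10:00 and completed at 11:00 gives two events.
example = {
    "1001": [
        {"activity": "top up", "timestamp": "2016-01-01T10:00:00",
         "resource": "clerk", "transition": "start"},
        {"activity": "top up", "timestamp": "2016-01-01T11:00:00",
         "resource": "clerk", "transition": "complete"},
    ]
}

log = build_xes(example)
xml_text = ET.tostring(log, encoding="unicode")
print(xml_text)
```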
But now you know what key ingredients to expect in an event log. Hope to see you again in the next lecture.