Hi, and welcome to the closing lecture of week two. In this lecture, I would like to briefly summarize what we have discussed this week. So this week we focused on particular process mining algorithms and how you can evaluate their results. We were mainly focusing on the discovery part of process mining, so how you get from event data to a process model describing the behavior that’s in this data. First, we explained how a process model can be described, and for that we introduced the Petri net notation. So given this example, a token flows through the process. And whenever a transition fires, it consumes a token for each incoming arrow and produces one on each outgoing arrow.
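The firing rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the course's tooling: the net, its place names (`start`, `p1`, `p2`, `end`) and its transition names are all made up for the example, and a marking is simply a count of tokens per place.

```python
from collections import Counter

# Hypothetical toy net (names are illustrative, not from the lecture slide).
# Each transition maps to (input places, output places).
TRANSITIONS = {
    "register": (["start"], ["p1"]),
    "check":    (["p1"], ["p2"]),
    "accept":   (["p2"], ["end"]),   # "accept" and "reject" compete for the
    "reject":   (["p2"], ["end"]),   # same token in p2: this is a choice
}

def enabled(marking, transition):
    """A transition is enabled if every input place holds enough tokens."""
    inputs, _ = TRANSITIONS[transition]
    return all(marking[place] >= n for place, n in Counter(inputs).items())

def fire(marking, transition):
    """Consume one token per incoming arrow, produce one per outgoing arrow."""
    if not enabled(marking, transition):
        raise ValueError(f"transition {transition!r} is not enabled")
    inputs, outputs = TRANSITIONS[transition]
    new_marking = Counter(marking)
    new_marking.subtract(inputs)   # consume
    new_marking.update(outputs)    # produce
    return +new_marking            # drop places that now hold zero tokens
```

Starting from `Counter({"start": 1})` and firing `register`, `check` and then either `accept` or `reject` moves the single token to `end`, which corresponds to the process having ended.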
And this process model is now animated, and you see how the token moves through the process. At certain points, you can make a choice. And when the token is in the final place, the process has ended. Some of you might already know the BPMN notation. So the process model on the top describes the same behavior as the Petri net on the bottom. In this course, however, as in most process mining techniques, we use the Petri net notation. But you can see the similarities with the BPMN notation. In another lecture, we explained when a process model is sound, or in other words, bug free. There are three properties, and if all of them are satisfied, the process model is sound.
So the process model should have the option to complete. When it completes, it should complete properly, with no tokens left behind, and there should be no dead transitions, so every transition can fire in some execution. When all this is satisfied, the process model is sound and therefore contains no bugs. Then in the second half of this week we showed you how you can relate process models and behavior. For clarity, we abbreviated the activity names to letters of the alphabet, and then, for instance, this process model allows for particular traces, or sequences of activities. And there are multiple activity sequences that can be generated by this Petri net, mainly because of the choice between e and f and the parallelism between b, c, and d.
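Both ideas, the three soundness properties and the set of traces a model allows, can be checked by brute force on a small net. The sketch below is only an illustration under stated assumptions: the net is a hypothetical one (not the net on the slide) in which b, c and d run in parallel and e and f form a choice, it is assumed to be acyclic and to have finitely many reachable markings, and the place names are invented.

```python
from collections import Counter

# Hypothetical net: transition -> (input places, output places).
NET = {
    "a": (["start"], ["p1", "p2", "p3"]),   # split into three parallel branches
    "b": (["p1"], ["q1"]),
    "c": (["p2"], ["q2"]),
    "d": (["p3"], ["q3"]),
    "e": (["q1", "q2", "q3"], ["end"]),     # e and f both join the branches,
    "f": (["q1", "q2", "q3"], ["end"]),     # so only one of them can fire
}
INITIAL = Counter({"start": 1})
FINAL = Counter({"end": 1})

def successors(marking):
    """Yield (transition, next marking) for every enabled transition."""
    for name, (inputs, outputs) in NET.items():
        if all(marking[p] >= n for p, n in Counter(inputs).items()):
            nxt = Counter(marking)
            nxt.subtract(inputs)
            nxt.update(outputs)
            yield name, +nxt

def play_out(marking=INITIAL, prefix=()):
    """Yield every complete trace; terminates because the net is acyclic."""
    if marking == FINAL:
        yield prefix
        return
    for name, nxt in successors(marking):
        yield from play_out(nxt, prefix + (name,))

def reachable(start):
    """Return all markings reachable from start and all transitions fired."""
    seen = {tuple(sorted(start.items())): start}
    stack, fired = [start], set()
    while stack:
        for name, nxt in successors(stack.pop()):
            fired.add(name)
            key = tuple(sorted(nxt.items()))
            if key not in seen:
                seen[key] = nxt
                stack.append(nxt)
    return list(seen.values()), fired

def is_sound():
    markings, fired = reachable(INITIAL)
    # 1. Option to complete: the final marking is reachable from everywhere.
    option_to_complete = all(FINAL in reachable(m)[0] for m in markings)
    # 2. Proper completion: a token in "end" means no tokens anywhere else.
    proper_completion = all(m == FINAL for m in markings if m["end"] > 0)
    # 3. No dead transitions: every transition fires in some execution.
    no_dead_transitions = fired == set(NET)
    return option_to_complete and proper_completion and no_dead_transitions
```

For this hypothetical net, `set(play_out())` contains 12 traces: the six orderings of b, c and d, each ending in either e or f.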
What we just did was simulate the process model. We played out the process model and saw which activity sequences were possible. Discovery, which is actually the main point of process mining, is doing this the other way around. Given only the observations, what process model describes this behavior? And in the next week, we will focus on replay. So when you have both the process model and the data, you can put them together, and you can see where the deviations are. However, this week we mainly focused on process discovery. You can also compare the quality of a process model that you discovered with the data that you used to discover it.
And there are actually four quality dimensions, and they are opposing forces. Whenever you try to optimize for one, you usually reduce the quality in at least one of the other dimensions. The first quality dimension is replay fitness. Replay fitness is actually the main quality dimension, and it evaluates how well the observed data can be replayed by the process model. The second quality dimension is precision. You don’t want your process model to allow for any behavior, because then it’s too wide or too loose a description of what you have seen. The third quality dimension is generalization. The event log usually contains only a snapshot of all possible behavior.
And although you don’t want your process model to be imprecise, you do want to have some generalization or interpretation of what you have seen. Finally, the fourth dimension is simplicity. You and I should be able to read the process model, so the process model should be sufficiently simple. The trick for many process discovery algorithms, but also for you as a process miner, is to balance these forces. And usually, as I mentioned before, you cannot really optimize all four at once. You have to pick your priorities right. So given this example data and the process model on the bottom, we can see that we can replay all the traces in this process model. It generalizes.
So it has not seen all possible combinations and all possible behavior of this process model. But given these example traces, it makes correct assumptions on parallelism and choice behavior. And finally, it is rather simple to read. However, it’s not perfectly precise. So this process model allows for more behavior than you have seen in your log. And this is actually the trade-off between precision and generalization. We also introduced the process model checklist. One of the first things you have to check is whether the process model is sound. Then you can check the four quality dimensions to see how well the process model and the data fit together. In this course, we also discussed several process mining algorithms.
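To make replay fitness and precision a bit more concrete, here is a deliberately simplified sketch that compares whole traces only. It is an assumption-laden illustration: real tools measure these dimensions with more refined techniques, and the model language `MODEL_TRACES`, the event log `LOG` and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical set of traces the model allows (b and c parallel, e or f).
MODEL_TRACES = {
    ("a", "b", "c", "e"),
    ("a", "c", "b", "e"),
    ("a", "b", "c", "f"),
    ("a", "c", "b", "f"),
}

# Hypothetical event log: trace -> number of times it was observed.
LOG = Counter({
    ("a", "b", "c", "e"): 5,
    ("a", "c", "b", "e"): 3,
    ("a", "b", "c", "f"): 1,
    ("a", "c", "e"): 1,        # skips b, so the model cannot replay it
})

def trace_fitness(log, model_traces):
    """Fraction of observed cases that the model can replay completely."""
    fitting = sum(count for trace, count in log.items() if trace in model_traces)
    return fitting / sum(log.values())

def trace_precision(log, model_traces):
    """Fraction of the model's traces that were actually observed."""
    observed = sum(1 for trace in model_traces if trace in log)
    return observed / len(model_traces)
```

With these made-up numbers, 9 of the 10 cases fit and 3 of the 4 model traces were observed. The one unobserved model trace, a, c, b, f, is exactly the kind of extra behavior that costs precision but may be a reasonable generalization.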
And the alpha miner was actually the first discovery algorithm that we discussed. However, the alpha miner is the simplest algorithm and has some issues. For instance, it can produce unsound models, and it cannot really handle noisy behavior in the data. The next algorithm, which is able to handle noisy behavior, was the heuristics miner. But it gave this result on the example data, which is also not sound and does not have a very good replay fitness. The inductive miner, however, guarantees to discover sound process models. And for all the data that we put in, it actually discovers the process model that we expect to find.
Then we also showed another process discovery algorithm, the fuzzy miner, which merely shows a process graph and where soundness is also given. But at the same time, the model is of a different nature than a Petri net. So comparing all these algorithms, I hope you know what the pros and cons of each are, and that this allows you to select a proper process discovery algorithm for your data. In the next week, we will focus on conformance and enhancement techniques and detail this further. So we’re still in the process mining activity phase, but we also already cover evaluation and interpretation of the data. So I hope to see you in the next week, and good luck with the quiz.