Contact FutureLearn for Support
Skip main navigation
We use cookies to give you a better experience, if that’s ok you can close this message and carry on browsing. For more info read our cookies policy.
We use cookies to give you a better experience. Carry on browsing if you're happy with this, or read our cookies policy for more information.

Skip to 0 minutes and 10 secondsHi. Welcome back. In this lecture, I will explain how the alpha miner works. So the alpha miner is actually the very first algorithm that bridges the gap between event logs, or observed data, and the discovery of a process model. And since it was the very first algorithm to be created, it also has its flaws, but it was a good starting point to continue on. So the alpha miner has a few main steps. And I will, in the remaining of this lecture, briefly and on a high level explain how these work. So the first step that the alpha miner does is it scans the traces for ordering relations between activities. And using the relations, it builds a footprint matrix.

Skip to 0 minutes and 54 secondsIn the next step, it takes this footprint matrix and converts it to a Petri net. So there are four ordering relations that the alpha miner can detect. There's a directly follows, indicated by the greater than sign. So if A is directly followed by B in a trace, it records this using this notation. And it does so for every pair. Now, it can deduce a sequence of A sequence B, if it sees that A is sometimes directly followed by B, but never that B is directly followed by A. Hence, you can conclude that A and B are in a sequence. You always see first A and then B, and never the other way around. You can also see them in parallel.

Skip to 1 minute and 44 secondsSo if you've seen both A being directly followed by B, as well as B being directly followed by A, you can conclude that A and B are in parallel. They are both executed, but in arbitrary order. Finally, there is no direct relation. So if you've never seen A being directly followed by B or B being directly followed by A, you indicate it with this symbol. And using these for ordering relations, we can construct a footprint matrix. So let's look at again these five traces that we've seen before. And the alpha miner will construct the following footprint matrix. So let's look at it in a bit more detail.

Skip to 2 minutes and 30 secondsAnd I'll explain a couple of cells, and invite you to see if you can validate the other cells. So for instance, if you look at this cell, you see that A is never directly followed by A. That is, you never see A and then A. If you look at the cell A and B, you see that A is a sequence with B. So you see A, and you can see B straight after. However, you never see B followed by A. So let's look at the data. You indeed have some observations where you see A and then B, but nowhere there's a trace where you see B which is directly followed by A. Let's look at another cell.

Skip to 3 minutes and 12 secondsFor instance, this one, B and C, how are they related? You see that B and C are in parallel. How do we see this? There are some observations where B is directly followed by C. And there's also an observation where C is directly followed by B. And according to the rules above, this has been a parallel relation. Let's look at one other cell. When you look at the cell C and then A, you see that they are in the reverse sequence relation. So it can be that C is preceded by A, but never that A is after C. You see that this footprint matrix is mirrored over the diagonal.

Skip to 3 minutes and 52 secondsSo I've just discussed C and then A, but in the cell A and then C, you see the normal sequence operator. And again, I invite you to validate the rest of the table. So using the footprint matrix, we can build a Petri net. So we've observed that you always start with A, so we create a place with a token, which is connected to A. But now, we need to decide what comes after A. We can find it in this part of the footprint matrix. So you can see that A is always followed by B, C, and D. So we can add the connections. But we also saw that B, C, and D are in parallel.

Skip to 4 minutes and 35 secondsTherefore, you create this parallel split. Now, we look at another part of the footprint matrix. How is E related to B, C, and D? Well, the alpha miner observers that B is never followed by E, but D and C are. So therefore, it creates this relation. So we'd observed the sequence between D and E, and between C and E, and therefore, connected them like this. Similarly, for the activity F, it observes that B and F and D and F are observed. And hence, it connects the places like this. Now finally, activity G, it's in sequence with both E and F. So you've joined the choice and you conclude with G.

Skip to 5 minutes and 25 secondsSo given this data, this is the Petri net that is discovered by the alpha miner. So let's see how well this Petri net actually describes this data using our process model checklist. First, let's check soundness. Is this process model sound? No, it's not. Look at this state. So after you've executed A and B, C, and D in any order, you are in this state of the Petri net. Now, you can do two things. You can fire either E or F. So what happens if you fire F and then G? You're in this state. The token in between C and E is remaining, and its violating one of our soundness rules, the proper completion rule.

Skip to 6 minutes and 10 secondsWe are in a state where the final place is marked, where there's work left behind. So actually, this process although discovered by the alpha miner is not sound. So let's validate the order four quality dimensions. Replay fitness. Can it replay all the observed traces? Yes, it can It cannot replay perfectly or everything that we have seen. Is the precision good? Well, it's OK. It's not perfect. The model allows for a bit more behavior if you correct for the unsoundness, it will allow for some more behavior, but it's not that. It's also generalizing. It's introducing behavior that you have not directly seen, but that you can conclude from what you've observed. And also, it's a rather simple model.

Skip to 6 minutes and 58 secondsBut the main problem here is that the process model that it discovered is not sound.

Skip to 7 minutes and 6 secondsSo in this lecture I've explained to you how to alpha miner works, and in the next lecture, I will show you how to use it in ProM and what the results are, starting with the example that I've just discussed. I hope to see you again in the next lecture.

Alpha miner

In this step we explain how the very first process discovery algorithm, the Alpha miner, works.

We show the main steps taken by the Alpha miner to get from the input data to a process model. We also show the limitations of the Alpha miner.

Share this video:

This video is from the free online course:

Introduction to Process Mining with ProM

Eindhoven University of Technology

Course highlights Get a taste of this course before you join:

  • Introduction

    Introduction to process mining: recognizing event data, what is process mining and what can process mining analyse.

  • Installing ProM lite
    Installing ProM lite

    In this step we show how to find and install the free and open source process mining tool ProM lite.

  • Using ProM lite
    Using ProM lite

    In this lecture we show the basic concepts and usage of ProM (lite): the resource, action and visualization perspectives.

  • Event logs
    Event logs

    In this lecture we explain what an event log is and how it is structured. We also explain the most common attributes found in an XES event log.

  • Event logs in ProM
    Event logs in ProM

    In this lecture we show you how you can load an event log in ProM and how you can get initial insights in the contents.

  • Converting a CSV file to an event log
    Converting a CSV file to an event log

    Most data is not recorded in event log format. In this video we explain how a CSV file can be converted to an event log.

  • Exploring event logs with the dotted chart
    Exploring event logs with the dotted chart

    After loading an event log into ProM it is important to apply the dotted chart to get initial process insights before process models are discovered.

  • Filtering event logs
    Filtering event logs

    Before good quality process models can be discovered the event log data needs to be filtered to contain only completed cases for instance.