Hi, and welcome back. In this lecture, I would like to show you a particular case study on real data. In the next article there are also links to other case studies that you can look to for inspiration. One particularly nice source of public data is the BPI Challenge. This is a yearly open challenge that has run since 2011, and both the data and the submitted reports are public. So the challenge works like this: data is published, there are some questions, people submit their reports, and the best one wins. But all the reports are made public. The links to all the BPI Challenges, and also to some other sources, are provided in the article.
In this lecture I would like to focus on the BPI Challenge of 2012. It's actually the real-life event log that I showed in all the previous lectures. This data comes from a Dutch financial institute providing consumer loans. So consumers can ask the institute: I want to borrow this amount of money. And then the institute says yes, or no, or we need more information. The event log contains roughly 260,000 events, spread over roughly 13,000 cases. It also contains data attributes, such as the amount requested, which makes it interesting for data analysis purposes. The application process starts at the web page, followed by some automated checks. And the activities are divided into three types.
So events starting with A denote states of the application. Events starting with O denote states of the offer belonging to the application. And events starting with W are work items belonging to the application. A and O are rather structured, because they are particular states that follow a clear procedure, but W is a bit unstructured because it's mainly manual activities. So I would like to go through three submissions for the BPI Challenge 2012. In the first submission, the applications have been classified. At the top node all the applications are present, and then a distinction has been made between approved, declined, cancelled, or undecided; offer or no offer; and fraud or no fraud detected.
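A classification like this one can be sketched in a few lines of Python. This is only a minimal sketch, not the method used in the submission: it assumes the log is available as (case id, activity) pairs, and the activity names `A_APPROVED`, `A_DECLINED` and `A_CANCELLED`, as well as the toy events, are assumptions that should be checked against the actual log. The fraud dimension is left out.

```python
from collections import Counter

# Assumed activity names; verify them against the real event log before use.
OUTCOME_ACTIVITIES = {
    "A_APPROVED": "approved",
    "A_DECLINED": "declined",
    "A_CANCELLED": "cancelled",
}


def classify_cases(events):
    """Map each case id to (outcome, has_offer) from (case_id, activity) pairs."""
    outcome = {}
    has_offer = {}
    for case_id, activity in events:
        # Every case starts as undecided and without an offer.
        outcome.setdefault(case_id, "undecided")
        has_offer.setdefault(case_id, False)
        if activity in OUTCOME_ACTIVITIES:
            outcome[case_id] = OUTCOME_ACTIVITIES[activity]
        if activity.startswith("O_"):  # any offer event counts as "offer"
            has_offer[case_id] = True
    return {case_id: (outcome[case_id], has_offer[case_id]) for case_id in outcome}


def count_classes(events):
    """Count how many cases fall into each (outcome, has_offer) class."""
    return Counter(classify_cases(events).values())


# Hypothetical toy log with four cases.
toy_events = [
    ("c1", "A_SUBMITTED"), ("c1", "O_SENT"), ("c1", "A_APPROVED"),
    ("c2", "A_SUBMITTED"), ("c2", "A_DECLINED"),
    ("c3", "A_SUBMITTED"), ("c3", "O_SENT"), ("c3", "A_CANCELLED"),
    ("c4", "A_SUBMITTED"),
]
class_counts = count_classes(toy_events)
```

On the real log, `class_counts` would give the size of each branch of such a classification tree.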
So this already gives a categorisation, or classification, of all the 13,000 applications. Another analysis that they did was the number of resources active on a particular day. The red dots are the number of resources active, and you see that it is usually between 20 and 30. Except on Saturdays, that's the green line, and on Sundays even fewer resources are active. They also looked at particular resources, and at when they start and end their working days. So for instance for this resource, 10138, you can see that they usually start between 8:00 and 9:00 in the morning and then finish around 4:00. Or they have a late shift, where they start in the afternoon and finish around 8 o'clock in the evening. Here you see another plot of the start and end times of another resource. But here you see that this resource is working shorter hours. So for instance this resource usually works from 5:00 to 9:00.
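Both resource analyses, the number of active resources per day and the start and end times of one resource, can be approximated directly from the event timestamps. The following is a minimal sketch, assuming the log is available as (resource, timestamp) pairs; the function names, the second resource id and the toy timestamps are hypothetical, and the first and last event of a day are only a proxy for the real start and end of a working day.

```python
from collections import defaultdict
from datetime import datetime


def active_resources_per_day(events):
    """Count the distinct resources with at least one event on each day."""
    seen = defaultdict(set)
    for resource, timestamp in events:
        seen[timestamp.date()].add(resource)
    return {day: len(resources) for day, resources in sorted(seen.items())}


def working_hours(events, resource):
    """Return {day: (first event time, last event time)} for one resource."""
    spans = {}
    for res, timestamp in events:
        if res != resource:
            continue
        day, moment = timestamp.date(), timestamp.time()
        first, last = spans.get(day, (moment, moment))
        spans[day] = (min(first, moment), max(last, moment))
    return spans


# Hypothetical toy events: (resource, timestamp).
toy_events = [
    ("10138", datetime(2012, 1, 2, 8, 30)),
    ("10138", datetime(2012, 1, 2, 12, 0)),
    ("10138", datetime(2012, 1, 2, 16, 5)),
    ("11000", datetime(2012, 1, 2, 17, 0)),
    ("11000", datetime(2012, 1, 2, 21, 0)),
    ("10138", datetime(2012, 1, 7, 9, 0)),
]
active = active_resources_per_day(toy_events)
hours = working_hours(toy_events, "10138")
```

Plotting `active` over the calendar, or the values of `hours` per day, gives charts similar in spirit to the ones shown here.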
In this report there are also some main observations. For instance, there's an automated resource, resource number 112, which is involved in the approval of 3 loan applications. And since this is an automated resource, this is suspicious and should be investigated further. Also, in 2 cases a customer was called after the application was already cancelled. So this is work that could have been prevented. And in 74 cases the completion checks are performed after the application is already accepted. So this might be something that the process owner wants to investigate further, to see whether this is an issue and how it can then be prevented. There are also some data ambiguities discovered, which is usual when you're analyzing your data.
But it's important to know about them, because they might be fixed. And then when you get a new data set some time later, you might be able to do a better analysis.
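Ordering observations such as work being done after an application was already cancelled can be checked with a simple scan over the log. This is a minimal sketch, assuming the events are (case id, activity) pairs that are already in time order; the function name and the activity names in the toy log are hypothetical.

```python
def cases_with_activity_after(events, marker, later_prefix):
    """Return the case ids where an activity starting with `later_prefix`
    occurs after the first occurrence of `marker` in the same case.

    `events` is a list of (case_id, activity) pairs in time order.
    """
    marker_seen = set()
    flagged = set()
    for case_id, activity in events:
        if activity == marker:
            marker_seen.add(case_id)
        elif case_id in marker_seen and activity.startswith(later_prefix):
            flagged.add(case_id)
    return sorted(flagged)


# Hypothetical toy log: only case c2 has a work item after the cancellation.
toy_events = [
    ("c1", "A_SUBMITTED"), ("c1", "W_call_customer"), ("c1", "A_CANCELLED"),
    ("c2", "A_SUBMITTED"), ("c2", "A_CANCELLED"), ("c2", "W_call_customer"),
]
late_work = cases_with_activity_after(toy_events, "A_CANCELLED", "W_")
```

The flagged cases are candidates for a closer look, not proof of a problem, because they can also point to data ambiguities.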
Then in another report this diagram is proposed. What they do here is look at the runtime of the case, so the number of days since the application was received. They plot cumulatively how many applications were approved, cancelled, or declined. And what you see is that after around 30 days suddenly a lot of applications are cancelled. This seems to be an automated activity. However, it also increases the work time, the person-days spent, significantly. They also see that around 20 days after receiving the application, the majority of cases was already approved. So the recommendation in this report was to see whether this term of 30 days could be moved a bit earlier to prevent waste of manpower.
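The cumulative curves in such a diagram can be computed with a running sum per outcome. This is a minimal sketch, assuming each case has already been reduced to an (outcome, days until outcome) pair; the function name and the toy numbers are hypothetical and do not come from the report.

```python
from itertools import accumulate


def cumulative_outcomes(cases, horizon_days):
    """Return {outcome: cumulative case counts for day 0..horizon_days}.

    `cases` is a list of (outcome, days_until_outcome) pairs, where the
    days are counted from the moment the application was received.
    """
    per_day = {}
    for outcome, days in cases:
        counts = per_day.setdefault(outcome, [0] * (horizon_days + 1))
        if 0 <= days <= horizon_days:
            counts[days] += 1
    # Turn the per-day counts into running totals.
    return {outcome: list(accumulate(counts)) for outcome, counts in per_day.items()}


# Hypothetical toy cases with a three-day horizon.
toy_cases = [
    ("approved", 1), ("approved", 2),
    ("cancelled", 3), ("cancelled", 3),
    ("declined", 0),
]
curves = cumulative_outcomes(toy_cases, horizon_days=3)
```

A sudden jump in one curve, like the cancellations around day 30 in the report, shows up as a large difference between two consecutive values of that list.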
In another report they analyze when the cancel activity was executed. And you see that after receiving the application, cancellation can happen at almost any time. But there's a particular line visible, which is again the 30 days, now analyzed from the dotted chart view. Another dotted chart that they made plots the system activities. And then you clearly see that the system resource, user 112, is mainly executing the first activities, so the submission through the web form and the first checks, but also the automated cancelling and sometimes activities after this period. So this shows that particular activities are done by the system resource.
As I mentioned, more cases are available. For the BPI Challenge 2012 there were several submissions, and you can look at them. Since you know the data, it might be interesting to look at the other submissions. But the article that follows also contains links to other case studies. For instance, the IEEE task force website also lists several case studies where process mining has been successfully applied within companies on real data. So I hope this inspires you for new types of analysis and shows you the value of process mining in practice. I hope to see you again in the next lecture of this week.