Optional: Is Kruskal’s algorithm correct?
If you would like to develop your understanding of Kruskal’s algorithm further, you may want to do this optional step. (You will not need to do this for the assessment.)
Kruskal’s algorithm is simple but it will always find a minimum spanning tree for a connected graph. When computer scientists design algorithms two things are important:
- Is it correct?
- How long does it take to execute?
Is it correct?
Mathematics and logic can be used to demonstrate and reason why an algorithm is correct. This is more robust than demonstrating that the algorithm works in a handful of situations, because it demonstrates nothing about the situations that weren’t tested. While the correctness of Kruskal’s algorithm requires much more formalism than it is possible to introduce in this course, it is still interesting. Being able to demonstrate that a solution is optimal is an important process in computer science.
You can try to develop an intuition about how you would go about demonstrating the correctness of Kruskal’s algorithm.
To show that Kruskal’s algorithm is correct you have to demonstrate two things:
1. The output is connected and has no cycle.
The input graph to Kruskal’s algorithm is connected. The output of Kruskal’s algorithm can’t contain cycles by definition because the algorithm states that an edge won’t be included if it creates a cycle. Consequently, the output of Kruskal’s algorithm does not contain a cycle. The output of Kruskal’s algorithm is connected since the first edge encountered by the algorithm which connects together two part of the graph is included.
2. The set of edges to include have a minimum sum – the hard bit.
This is the tricky part. You see that the included edges have minimum distance by showing that the following statement is true throughout the execution of Kruskal’s algorithm: “If (F) is the set of edges chosen at any stage of the algorithm, then there is some minimum spanning tree that contains (F) and none of the edges rejected by the algorithm”. The statement essentially says that at each point while executing Kruskal’s algorithm you can always extend the set of edges and eventually get to an optimal solution.
Clearly the statement is true when you start to execute the algorithm as there are no edges and a set of no edges can always be extended into an optimal solution.
Now consider a set of edges (F) given to us by Kruskal’s algorithm, and let (S) be a minimum spanning tree that contains (F). Let us assume that the statement is true for (F).
If the next edge, let us call it eee, added by Kruskal’s algorithm is also in (S), then the statement is also true for (F+e) (this is the set of edges (F) with the edge (e) added). If eee is not in (S) then when we add the edge (e) to (S) we will create a cycle, and the cycle contains edges which can’t be in (F). Let (f) be an edge that is part of the cycle in (S+e) but not in (F+e). Note that the edge (f) in part of the set (S), and as a consequence of the statement, it has not been considered by Kruskal’s algorithm yet (it must be further down the list of edges). Therefore, (f) must have a distance value which is greater than or equal to the distance value of (e). The set of edge (S-f+e) is connected and contains no cycles and has the same minimum sum or less as (S). So (S-f+e) is a minimum spanning tree that contains (F+e) and consequently the statement holds.
When Kruskal’s algorithm finishes executing the output is a connect graph without cycles. If the statement “If (F) is the set of edges chosen at any stage of the algorithm, then there is some minimum spanning tree that contains (F) and none of the edges rejected by the algorithm” is also true then the output is a minimum weight connected graph without cycles – a minimum spanning tree.
How long does it take to execute?
The amount of time it takes for an algorithm to run is important. For problems in the real world there is often a time constraint on obtaining a solution. The amount of time it take for an algorithm to execute is often dependent on the size of the problem. For example, for Kruskal’s algorithm you would expect it to take more time to execute the algorithm if there are more cities and less time to execute if there are fewer cities. As a result of this observation we can express the amount of time it take an algorithm to execute as a function of the size of the input.
For example, if the input of an algorithm has size nnn then the amount of time it takes for the algorithm to execute can be written as (f(n)). The function (f(n)) could be any function. In the table below are some example functions and the amount of time it would take for an algorithm to execute. To provide some context for the large number in the table, the universe is estimated to be (4.3times 10^{23}) seconds old. You can see that even for relatively small sized inputs the execution time would be longer than the universe is estimated to have been in existed for some of the functions.
(f(n)) | (n=1) | (n=2) | (n=5) | (n=10) | (n=100) | (n=1000) |
---|---|---|---|---|---|---|
(f(n) = 1) | 1 | 1 | 1 | 1 | 1 | 1 |
(f(n) = log n) | 0 | 0 | 1 | 1 | 2 | 3 |
(f(n) = n) | 1 | 2 | 5 | 10 | 100 | 1000 |
(f(n)=n^2) | 1 | 4 | 25 | 100 | 10000 | 1000000 |
(f(n)=n^3) | 1 | 8 | 125 | 1000 | 1000000 | 1000000000 (31.7 years) |
(f(n)=2^n) | 1 | 4 | 32 | 1024 | (1.26 times 10^{30}) | (1.07 times 10^{301}) |
(f(n)=n!) | 1 | 2 | 120 | 3628800 | (9 times 10^{157}) | (4 times 10^{2567}) |
Let’s look at the time to execute Kruskal’s algorithm:
Kruskal’s algorithm
Take the edges of the graph and write them in a list ordering them by the number associated with each edge from lowest number to highest number. The lowest number should be at the top of the list an the highest number at the bottom of the list.
Starting from the top of the list add the edge with the lowest number if by adding the edge it does not introduce a cycle into the graph. Cross the edge you have added off the list.
Repeat step 2 until you have crossed all of the edges off of your list.
Step 1 of Kruskal’s algorithm requires you to sort a set of edges based on the distance between the cities. In the GigaFibre scenario the maximum number of edges that you might have the sort is (frac{n(n-1)}{2}), where (n) is the number of cities. Let (m) be the number of edge in the grade. There are lots of different sorting algorithms such as Insertion sort, bubble sort and merge sort. The best of them take approximately (n log n) time to execute, where nnn is the number of items you want to sort. Therefore, we can sort the edges in approximately (m log m) time to execute.
Steps 2 and 3 of Kruskal’s algorithm executed at most once for each edge in the graph, and then the algorithm will terminate. Step 2 of Kruskal’s algorithm requires us to check if adding the edge creates a cycle. There are a few ways of doing this either using Depth first search, Breadth first search, or using a clever way of structure the data called the union-find data structure. The best of these methods takes a constant amount of time to decide if there is a cycle. Therefore steps 2 and 3 of Kruskal’s algorithm takes approximately (m) time to execute.
Therefore, overall the algorithm executes in approximately (m + m log m) time.
Introduction to Technology-Assisted Decision-Making
Introduction to Technology-Assisted Decision-Making
Reach your personal and professional goals
Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.
Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.
Register to receive updates
-
Create an account to receive our newsletter, course recommendations and promotions.
Register for free