Comparing efficiency

Hopefully you saw in the last section that bubble sort is a pretty inefficient algorithm. In almost every case, merge sort will sort a list much quicker than bubble sort. The only exception is when the list is already sorted or very nearly sorted, in which case sometimes the bubble sort will be faster, as it has an inbuilt method of checking if the list is sorted or not, unlike merge sort which will perform the same operations regardless of the state of the list.

Computer scientists and mathematicians have developed a way of describing how efficient an algorithm is. This is called big O notation. This is a fairly advanced topic, so we won’t go in to too much depth, but the algorithms can be described as follows.

  • Bubble sort’s efficiency is О(n^2)
  • Merge sort’s efficiency is O(n log n)

What does this mean?

Well for Bubble sort, the time taken to complete a sort is proportional to the square of the size of the list. That means doubling the size of a list will cause the number of operations that need to be performed to quadruple. This is called quadratic time.

For merge sort, the O(n log n) means that it runs in what is called Linearithmic time. You can see from the graph below that, although the number of operations does increase as the size of the list to be sorted increases, the rate of increase is significantly smaller than for quadratic time.

A chart labelled "Big-O Complexity Chart". The x-axis is labelled "Elements" while the y-axis is labelled "Operations", Moving clockwise from the y-axis, the chart is split into five colours - pink, orange, yellow, light green, and dark green. A legend describes these as corresponding to "Horrible" "Bad", "Fair", "Good" and "Excellent". In the pink region of the graph are three lines labelled "O(n!)", "O(2^n)" and "O(n^2)" with the former steepest and the latter the least steep. In the orange region is a line labelled "O(n log n)". In the yellow region is a straight line labelled "O(n)". In the green regions are the labels "O(log n)" and "O(1)".

Case Study: Timsort

Python has built a built in function for sorting lists. This example program uses the Python sorted function to create a sorted list from an unsorted list of randomly generated data.

#!/bin/python3

import random

data = []

for x in range (0, 100):
  data.append(random.randint(0, 100))

print(data)

print(sorted(data))

Sample output:

[92, 79, 31, 28, 72, 94, 23, 40, 82, 79, 51, 19, 20, 86, 23, 64, 100, 8, 53, 13, 28, 5, 40, 48, 26, 95, 89, 88, 26, 58, 5, 17, 80, 83, 50, 32, 22, 53, 5, 9, 41, 23, 57, 42, 3, 66, 30, 38, 28, 1, 21, 86, 29, 30, 70, 2, 97, 56, 84, 20, 67, 61, 46, 33, 6, 91, 37, 48, 14, 95, 48, 53, 62, 56, 9, 60, 68, 36, 68, 50, 57, 13, 40, 4, 46, 25, 76, 58, 56, 17, 60, 24, 80, 81, 45, 92, 95, 62, 13, 99]
[1, 2, 3, 4, 5, 5, 5, 6, 8, 9, 9, 13, 13, 13, 14, 17, 17, 19, 20, 20, 21, 22, 23, 23, 23, 24, 25, 26, 26, 28, 28, 28, 29, 30, 30, 31, 32, 33, 36, 37, 38, 40, 40, 40, 41, 42, 45, 46, 46, 48, 48, 48, 50, 50, 51, 53, 53, 53, 56, 56, 56, 57, 57, 58, 58, 60, 60, 61, 62, 62, 64, 66, 67, 68, 68, 70, 72, 76, 79, 79, 80, 80, 81, 82, 83, 84, 86, 86, 88, 89, 91, 92, 92, 94, 95, 95, 95, 97, 99, 100]

For sorted, Python uses an algorithm called Timsort, which was developed by Tim Peters based on techniques by Peter McIlroy. The algorithm is designed to work well on a variety of input data because it is used in many different situations. It uses a combination of merge sort and another algorithm called insertion sort, and adapts its choice of algorithm based on properties of the input data.

Which sort is best?

You might be wondering why we, or students, should care about how a sort works it there’s already a function in Python (and other languages) for sorting. This is one area where we can make the distinction between simply programming and thinking more like a computer scientist.

Often when programming, we know we need to sort something, but are not very concerned with how it’s done. All we need is a general-purpose “good enough” solution. However, as we’ve discovered with the merge and bubble sorts, different algorithms are better suited to different datasets and scenarios. Computer scientists concern themselves with selecting (or creating) the right algorithm to suit the data being processed.

Challenge

Can you add a Timsort function to your earlier program comparing the completion times of algorithms? Share your code and findings in the comments.

Share this article:

This article is from the free online course:

Programming 102: Think Like a Computer Scientist

Raspberry Pi Foundation