K-means clustering

Do you remember clustering? In this article, we discuss K-means clustering, one of the most popular clustering techniques.
Do you remember clustering? Clustering groups a set of objects so that objects in the same cluster are more similar to each other than to objects in other clusters. To form clusters, we need a way to measure similarity, and Euclidean distance, which we talked about earlier, is the most common choice. K-means clustering is one of the most popular clustering techniques. Each cluster has a center, called a centroid. A centroid can be an actual data point, but it can also be a computed point that does not appear in the data. The name K-means comes from the fact that there are k centroids and k clusters in the data. How do we know what k is? We usually do not know the exact k in advance; instead, we make an educated guess, mostly based on our prior domain knowledge.
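As a quick refresher on the similarity measure, here is a minimal Python sketch of the Euclidean distance between two 2D points (the points themselves are made up for illustration):

```python
import math

# Euclidean distance between two points a and b:
# sqrt((ax - bx)^2 + (ay - by)^2)
a = (0.0, 0.0)
b = (3.0, 4.0)

dist = math.dist(a, b)  # math.dist computes Euclidean distance (Python 3.8+)
print(dist)  # 5.0
```

The same formula extends to any number of dimensions, which is why it works for feature vectors as well as points on a plot.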

K-means clustering works by the following steps. Let’s assume we decided k = 3.
1. Select three initial centroids (cluster centers). In the picture below, these are the three dotted stars.
2. Calculate the distance between each centroid and each of the other data points.
3. Assign each data point to the cluster whose centroid is closest. Since we chose three centroids, we end up with three clusters.
4. Update each centroid to the center of the points assigned to it, then repeat steps 2-3. Each time you repeat these steps, the centroids move. Eventually you find the best centroids, and as you can see in the picture below, they have shifted away from the original ones. ("Best" here means shorter distances within the groups and longer distances between the groups.)
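The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation; the sample points and the random-seed choice are made up for the example:

```python
import random

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means: random initial centroids, then alternate
    assignment (steps 2-3) and centroid-update (step 4) until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: pick k initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # steps 2-3: assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # step 4: move each centroid to the mean of its assigned points
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stopped moving: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Usage: two well-separated groups of three points each, with k = 2.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

On data this clearly separated, the algorithm recovers the two groups; on messier data, the result can depend on which initial centroids happen to be chosen, which is why real libraries typically run k-means several times with different starting points.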