Skip main navigation

Dimensional Measures

What is the best way to measure distance? In this article, we discuss several methods.

You might have seen Euclidean distance d(x,y)= √((∑(x-y )^2 )) in high school math. It is summing up the squares of two points, after then square rooting the summation. For example, in the XY plane, the distance between x=(1,2) and y=(3,4) is √(〖(1-3)〗^2 + 〖(2-4)〗^2 ) = √8 which is roughly 2.828. We call this kind of distance measure the Euclidean distance. Euclidean distance is probably the most commonly used distance. However, that does not mean it is the only way to measure distance.

There are other ways to measure distance: Manhattan distance, Minkowski distance, standardization distance are some of those. First of all, the Manhattan distance uses absolute values instead of the sum of squares. Let’s use our example to explain this. If we use Manhattan distance instead of Euclidean distance, we will have -(1-3)+(-(2-4)) = 4.

Do you know that the square root of x ( √x ) is, by definition, (x)^(1/2)? (x)^(1/2) is the same expression for √x. The square of (x)^(1/2) is x; it is the definition of the square root. Minkowski distance is something similar, but in Minkowski distance, the exponent is not necessarily 1/2. It can be any number. That is why the exponent is generalized to 1/m in Minkowski distance. m can be any natural number. It can be 1,2,3,4,5,.. and so on.

Standardization distance uses standard deviation. Standard deviation is a measure of how far away a point is from the average of all data. If the standard deviation is high, that means some of the data are far away from the mean. Standardization distance divides Euclidean distance by standard deviation (we denoted using si). It allows us to avoid skewness arising from the difference in variance and scale.

Mahalanobis distance is similar to standardization distance except that Mahalanobis distance also uses correlation along with standard deviation. Correlation measures how closely related the attributes are. You can obtain the correlation by dividing the covariance of two variables with two standard deviations from each variable. The higher the correlation, the more closely related the variables are. Mahalanobis, the renowned statistician, wanted to consider correlation when measuring distance.

This article is from the free online

Artificial Intelligence and Machine Learning for Business

Created by
FutureLearn - Learning For Life

Reach your personal and professional goals

Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.

Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.

Start Learning now