Transformations

We have seen that the NumPy library in Python provides useful functions for summarising and counting values in a dataset.

Another common operation is to transform a variable to a new scale so that variables on different scales can be meaningfully compared.

For example, one variable might only have tiny numbers on a scale of 0.001 to 0.005, whereas another variable might have huge numbers on a scale of 1 billion to 5 billion. To compare them, it helps to transform the two variables to the same scale. In a transformation, each value in the array is mapped to a new value.

Here are some examples of different transformations commonly applied to data.

This content is taken from the Coventry University online course Get ready for a Masters in Data Science and AI.

Scaling to values between zero and one

For this type of transformation, we transform the maximum value to one and the minimum value to zero. All other values are transformed proportionally in between, so a value halfway between the minimum and maximum is transformed to 0.5. Consider the heights (in metres) example below. Here 0.86 is the minimum value and is therefore transformed to zero, and 2.02 is the maximum value and is therefore transformed to one.

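import numpy as np

height = np.array([0.86, 2.02, 1.87, 1.44, 1.80])
# Subtract the minimum, then divide by the range (maximum minus minimum)
scaled_height = (height - np.min(height)) / (np.max(height) - np.min(height))
print(scaled_height)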
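The same calculation can be wrapped in a reusable function. The sketch below is one possible way to do it; the function name min_max_scale and the choice to return zeros when every value is identical are assumptions made for illustration, not part of the course material.

```python
import numpy as np

def min_max_scale(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    lowest = np.min(values)
    value_range = np.max(values) - lowest
    if value_range == 0:
        # Assumption: if all values are identical the scaling is undefined,
        # so this sketch returns zeros rather than dividing by zero.
        return np.zeros_like(values)
    return (values - lowest) / value_range

height = np.array([0.86, 2.02, 1.87, 1.44, 1.80])
scaled_height = min_max_scale(height)
print(scaled_height)
```

Guarding against a zero range matters because an array of identical values would otherwise produce a division by zero.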

Centring and standardising

Centring is the process of subtracting the average from each value so that the transformed values have an average of zero. After centring, values above the original average are positive and values below it are negative.

Standardising is the process of dividing the centred values by the standard deviation so that the transformed values have an average of zero and a standard deviation of one. Standardised values are commonly known as z-scores.

Consider the weights (kg) example below. Notice how the whole array is transformed in one line of code.

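import numpy as np

weight = np.array([15, 112, 106, 91, 85])
# Centring: subtract the mean from every value
centered_weight = weight - np.mean(weight)
# Standardising: divide the centred values by the standard deviation
standardised_weight = centered_weight / np.std(weight)
print(np.mean(weight))
print(centered_weight)
print(standardised_weight)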
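As a quick check, we can confirm that the standardised values really do have an average of zero and a standard deviation of one, allowing for floating-point rounding. The sketch below also shows that np.std divides by n by default (the population standard deviation); passing ddof=1 gives the sample standard deviation instead. The variable name sample_z is an assumption for illustration.

```python
import numpy as np

weight = np.array([15, 112, 106, 91, 85])
standardised_weight = (weight - np.mean(weight)) / np.std(weight)

# The mean should be 0 and the standard deviation 1, up to rounding error
print(np.isclose(np.mean(standardised_weight), 0.0))
print(np.isclose(np.std(standardised_weight), 1.0))

# np.std uses ddof=0 by default (divide by n); ddof=1 divides by n - 1
sample_z = (weight - np.mean(weight)) / np.std(weight, ddof=1)
print(np.std(sample_z, ddof=1))
```

Which version of the standard deviation to use depends on whether the data are treated as a whole population or as a sample from one.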

Construct new variables from old

Body mass index (BMI) is a measure designed to roughly classify people as underweight, normal weight, overweight or obese. It is calculated as a person's weight (in kg) divided by the square of their height (in metres). We can do this calculation in Python. Note that ** is the ‘to the power of’ operator in Python.

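import numpy as np

height = np.array([0.87, 2.02, 1.87, 1.78, 1.80])  # metres
weight = np.array([15, 112, 106, 91, 85])          # kilograms
# Element-wise: each weight is divided by the square of the matching height
bmi = weight / (height ** 2)
print(bmi)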
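The BMI values can then be turned into the categories mentioned above. The sketch below assumes the commonly used adult cut-offs of 18.5, 25 and 30, which are not stated in this article; the names cutoffs, labels and category are also chosen for illustration.

```python
import numpy as np

height = np.array([0.87, 2.02, 1.87, 1.78, 1.80])  # metres
weight = np.array([15, 112, 106, 91, 85])          # kilograms
bmi = weight / (height ** 2)

# Assumed cut-offs: below 18.5, 18.5 to under 25, 25 to under 30, 30 and over
cutoffs = np.array([18.5, 25.0, 30.0])
labels = np.array(["underweight", "normal weight", "overweight", "obese"])

# np.digitize returns, for each BMI, how many cut-offs it has reached
category = labels[np.digitize(bmi, cutoffs)]
print(category)
```

This is another example of constructing a new variable from an old one: a numeric array is transformed into an array of labels of the same length.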

We have seen how to use basic mathematical ideas in Python to calculate summary statistics and to transform variables. Data analysis therefore involves both transformations (each value in an array is transformed to a new value) and summary values (one value is calculated to summarise all the values in an array). Notice that these calculations and transformations involve mathematical expressions (formulas), so we need to be sure exactly how Python will evaluate them. We’ll look at this in the next step.
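To see why the order of evaluation matters, consider the BMI formula again. In Python, ** is evaluated before /, so the brackets around height**2 are optional, but putting brackets in the wrong place changes the result. This is a small sketch; the names no_brackets and wrong_brackets are chosen for illustration.

```python
import numpy as np

height = np.array([0.87, 2.02, 1.87, 1.78, 1.80])
weight = np.array([15, 112, 106, 91, 85])

bmi = weight / (height ** 2)
no_brackets = weight / height ** 2        # ** is applied before /
wrong_brackets = (weight / height) ** 2   # squares the whole ratio instead

print(np.allclose(bmi, no_brackets))      # the two expressions agree
print(np.allclose(bmi, wrong_brackets))   # these do not agree
```

Adding brackets, as the original example does, makes the intended order explicit even when Python would evaluate the expression the same way without them.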
