Skip main navigation

Excel Functions: How to Wrangle Data with Spreadsheets

This article explores some common functions of Excel in data wrangling and examples of how you might use them.
There’s a variety of tools and techniques for importing and processing data. Traditionally, and for relatively small, simple data sets, we do this manually with text editors or spreadsheets. For larger volumes of data or more complex scenarios, we can use programmatic solutions. Python, SQL and R are often used for this type of data wrangling and have built-in functions (libraries) to speed up the process or perform common tasks.
Here, we’re going to focus on how to wrangle data in Excel. In some cases, Excel might not be able to convert columns into the format you want. In this case, you can use a series of functions to manipulate your data. The table below shows some common functions and examples of how you might use them.
 
Function Example Output
CLEAN
Removes all non-printable characters from text
=CLEAN(CHAR(9)&”Monthly report”) ‘Monthly report’
CONCAT
Returns combined text from multiple ranges and / or strings without a delimiter
=CONCAT(“hello”, “world”) ‘helloworld’
FIXED
Returns a number rounded to the specified number of decimal places. Note: If you omit the number of decimals, FIXED will round to 2 decimal places by default.
=FIXED(1.1234,2) 1.12
LEFT
Returns the left-most character(s) of a string
=LEFT(“hello”)
=LEFT(“hello”, 4)
‘h’
‘hell’
LOWER
Returns a text in lowercase
=LOWER(“Hello, World!”) ‘hello, world!”
MID
Returns the character(s) between specified start and end points
=MID(“Hello, World!”,8,12) ‘World’
PROPER
Returns text with the first letter capitalised, as well as any other letters that follow a non-letter character
=PROPER(“hello, world!”) ‘Hello, World!”
REPLACE
Returns a text string with replacement text between points as specified
=REPLACE(“Hello, World!”,8,12, “everyone”) “Hello, everyone!
RIGHT
Returns the right-most character(s) in a string
=RIGHT(“hello”)
=RIGHT(“hello”, 4)
‘o’
‘ello’
SUBSTITUTE
Returns text with one or more text substitutions
=SUBSTITUTE(“Hello, World!”, “World”, “everyone”)
=SUBSTITUTE (“Hello, World!”, “I”, “L”)
‘Hello, everyone!’
‘HeLLo, WorLd!’
TEXTJOIN
Returns a string combining text from multiple ranges and / or strings including a delimiter
=TEXTJOIN(“, “, TRUE, “Hello”, “World!”) ‘Hello, World!’
TRIM
Removes all spaces other than those between words
=TRIM(“ Hello, World! “) ‘Hello, World!’
UPPER
Returns text in upper case
=UPPER(“Hello, World!”) ‘HELLO, WORLD!’
VALUE
Returns a number from a text string representing a number
=VALUE(“$1000”) 1000
 
When you need to tackle duplicate values, Excel can find unique values and remove duplicate rows. These are two closely related procedures.
 
To filter for unique rows, click Data > Sort & Filter > Advanced and select the appropriate options. Note: This only hides duplicate values from view.
 
Screenshot of the ‘Sort & Filter’ section of the menu ribbon in Excel
 
To remove duplicate values entirely, use the Remove Duplicate functionality. Find this under Data > Data Tools > Remove Duplicates.
 
Screenshot of the ‘Remove Duplicates’ button in the Excel menu ribbon
 
Finding inconsistent values, or partial duplicates, is a little harder. A good technique is to highlight cells using conditional formatting. This allows you to specify rules for what is highlighted. To highlight duplicates using conditional formatting, select the range of cells and click Style > Conditional Formatting > Highlight Cells Rules > Duplicate Values.
 
Screenshot of the ‘Conditional Formatting’ menu in Excel with ‘Duplicate Values’ highlighted.
 
You can combine multiple conditions and different comparisons (eg, ‘contains’) to find some partial matches.
 
 
Note: The comparison of duplicate values depends on what appears in the cell – not the underlying value stored in the cell. For example, if you have the same date value in different cells, one formatted as ‘3/8/2006’ and the other as ‘Mar 8, 2006’, the values are unique.
 
 
Go to: Filter for unique values or remove duplicate values [1]
 
This Microsoft Support page has more about removing, filtering, or highlighting duplicates.
 
We know that if there’s missing data, we often need to manually remove blank rows, or rows with blanks. That can be tedious, but you can speed up the process using the Go To Special feature.
 
To remove blank rows and rows with blank cells:
 
 
    1. Select your range of data
 
    1. Click Home > Find & Select > Go To Special
 
    1. In the dialog box, select Blanks and then press OK. Excel will select all blanks in your selection
 
    1. To delete these rows, click Home > Delete > Delete Sheet Rows
 
 

Enriching

 
Excel doesn’t provide specific functionality to create dummy variables from data, but you can use the Find and Replace functionality on a selection, or create an extra column to hold your coded categorical data. If you create a new column, use a formula such as =IF(A1=’Yes’, 1, 0) to check whether the value in a cell (A1) is equal to ‘Yes’. If the expression is true, the result is 1; otherwise the result is 0.
Formulas and built-in functions are also useful when you calculate values to populate new columns.
Go to: Excel functions (alphabetical) [2]
Check this Microsoft Support page for a full list of functions in Excel.

Validating

Validation often involves checking the distribution of data, and you’ve already learned how to do this by calculating the five-number summary. In the next activity, we’ll see how to create charts in Excel.

Publishing

When you save your Excel workbooks, you can preserve formatting, calculations, and information across multiple sheets. It might be useful to convert your data to a text file, but this depends on where your data is going next.
You can convert an Excel worksheet to a text file by using the Save As command.
  1. Go to File > Save As.
  2. Click Browse.
  3. In the Save As dialog box, under the Save as type box, choose the text file format for the worksheet (eg, CSV (Comma delimited)).
  4. Go to the location where you want to save the new text file, and then click Save.
  5. A dialog box appears, reminding you that only the current worksheet will be saved to the new file. Check you have the right worksheet open and then click OK. You can save other worksheets as separate text files by repeating this procedure for each worksheet.
Go to: Save a workbook in another file format [3]
Check this Microsoft Support page for more information about export options and file types.

References

  1. Filter for unique values or remove duplicate values [Internet]. Microsoft Support. Available from: https://support.microsoft.com/en-us/office/filter-for-unique-values-or-remove-duplicate-values-ccf664b0-81d6-449b-bbe1-8daaec1e83c2?ui=en-us&rs=en-us&ad=us
  2. Excel functions (alphabetical) [Internet]. Microsoft Support. Available from: https://support.microsoft.com/en-us/office/excel-functions-alphabetical-b3944572-255d-4efb-bb96-c6d90033e188?ui=en-us&rs=en-us&ad=us
  3. Excel functions (alphabetical) [Internet]. Microsoft Support. Available from: https://support.microsoft.com/en-us/office/excel-functions-alphabetical-b3944572-255d-4efb-bb96-c6d90033e188?ui=en-us&rs=en-us&ad=us
This article is from the free online

Excel for Data Analysis

Created by
FutureLearn - Learning For Life

Our purpose is to transform access to education.

We offer a diverse selection of courses from leading universities and cultural institutions from around the world. These are delivered one step at a time, and are accessible on mobile, tablet and desktop, so you can fit learning around your life.

We believe learning should be an enjoyable, social experience, so our courses offer the opportunity to discuss what you’re learning with others as you go, helping you make fresh discoveries and form new ideas.
You can unlock new opportunities with unlimited access to hundreds of online short courses for a year by subscribing to our Unlimited package. Build your knowledge with top universities and organisations.

Learn more about how FutureLearn is transforming access to education

close