Skip main navigation

Excel Functions: How to Wrangle Data with Spreadsheets

This article explores some common functions of Excel in data wrangling and examples of how you might use them.

There’s a variety of tools and techniques for importing and processing data. Traditionally, and for relatively small, simple data sets, we do this manually with text editors or spreadsheets. For larger volumes of data or more complex scenarios, we can use programmatic solutions. Python, SQL and R are often used for this type of data wrangling and have built-in functions (libraries) to speed up the process or perform common tasks.

Here, we’re going to focus on how to wrangle data in Excel. In some cases, Excel might not be able to convert columns into the format you want. In this case, you can use a series of functions to manipulate your data. The table below shows some common functions and examples of how you might use them.

 

Function Example Output
CLEAN
Removes all non-printable characters from text
=CLEAN(CHAR(9)&”Monthly report”) ‘Monthly report’
CONCAT
Returns combined text from multiple ranges and / or strings without a delimiter
=CONCAT(“hello”, “world”) ‘helloworld’
FIXED
Returns a number rounded to the specified number of decimal places. Note: If you omit the number of decimals, FIXED will round to 2 decimal places by default.
=FIXED(1.1234,2) 1.12
LEFT
Returns the left-most character(s) of a string
=LEFT(“hello”)
=LEFT(“hello”, 4)
‘h’
‘hell’
LOWER
Returns a text in lowercase
=LOWER(“Hello, World!”) ‘hello, world!”
MID
Returns the character(s) between specified start and end points
=MID(“Hello, World!”,8,12) ‘World’
PROPER
Returns text with the first letter capitalised, as well as any other letters that follow a non-letter character
=PROPER(“hello, world!”) ‘Hello, World!”
REPLACE
Returns a text string with replacement text between points as specified
=REPLACE(“Hello, World!”,8,12, “everyone”) “Hello, everyone!
RIGHT
Returns the right-most character(s) in a string
=RIGHT(“hello”)
=RIGHT(“hello”, 4)
‘o’
‘ello’
SUBSTITUTE
Returns text with one or more text substitutions
=SUBSTITUTE(“Hello, World!”, “World”, “everyone”)
=SUBSTITUTE (“Hello, World!”, “I”, “L”)
‘Hello, everyone!’
‘HeLLo, WorLd!’
TEXTJOIN
Returns a string combining text from multiple ranges and / or strings including a delimiter
=TEXTJOIN(“, “, TRUE, “Hello”, “World!”) ‘Hello, World!’
TRIM
Removes all spaces other than those between words
=TRIM(“ Hello, World! “) ‘Hello, World!’
UPPER
Returns text in upper case
=UPPER(“Hello, World!”) ‘HELLO, WORLD!’
VALUE
Returns a number from a text string representing a number
=VALUE(“$1000”) 1000

 

When you need to tackle duplicate values, Excel can find unique values and remove duplicate rows. These are two closely related procedures.

 

To filter for unique rows, click Data > Sort & Filter > Advanced and select the appropriate options. Note: This only hides duplicate values from view.

 

Screenshot of the ‘Sort & Filter’ section of the menu ribbon in Excel

 

To remove duplicate values entirely, use the Remove Duplicate functionality. Find this under Data > Data Tools > Remove Duplicates.

 

Screenshot of the ‘Remove Duplicates’ button in the Excel menu ribbon

 

Finding inconsistent values, or partial duplicates, is a little harder. A good technique is to highlight cells using conditional formatting. This allows you to specify rules for what is highlighted. To highlight duplicates using conditional formatting, select the range of cells and click Style > Conditional Formatting > Highlight Cells Rules > Duplicate Values.

 

Screenshot of the ‘Conditional Formatting’ menu in Excel with ‘Duplicate Values’ highlighted.

 

You can combine multiple conditions and different comparisons (eg, ‘contains’) to find some partial matches.

 

 

Note: The comparison of duplicate values depends on what appears in the cell – not the underlying value stored in the cell. For example, if you have the same date value in different cells, one formatted as ‘3/8/2006’ and the other as ‘Mar 8, 2006’, the values are unique.

 

 

Go to: Filter for unique values or remove duplicate values [1]

 

This Microsoft Support page has more about removing, filtering, or highlighting duplicates.

 

We know that if there’s missing data, we often need to manually remove blank rows, or rows with blanks. That can be tedious, but you can speed up the process using the Go To Special feature.

 

To remove blank rows and rows with blank cells:

 

 

    1. Select your range of data

 

    1. Click Home > Find & Select > Go To Special

 

    1. In the dialog box, select Blanks and then press OK. Excel will select all blanks in your selection

 

    1. To delete these rows, click Home > Delete > Delete Sheet Rows

 

 

Enriching

 

Excel doesn’t provide specific functionality to create dummy variables from data, but you can use the Find and Replace functionality on a selection, or create an extra column to hold your coded categorical data. If you create a new column, use a formula such as =IF(A1=’Yes’, 1, 0) to check whether the value in a cell (A1) is equal to ‘Yes’. If the expression is true, the result is 1; otherwise the result is 0.

Formulas and built-in functions are also useful when you calculate values to populate new columns.

Go to: Excel functions (alphabetical) [2]

Check this Microsoft Support page for a full list of functions in Excel.

Validating

Validation often involves checking the distribution of data, and you’ve already learned how to do this by calculating the five-number summary. In the next activity, we’ll see how to create charts in Excel.

Publishing

When you save your Excel workbooks, you can preserve formatting, calculations, and information across multiple sheets. It might be useful to convert your data to a text file, but this depends on where your data is going next.

You can convert an Excel worksheet to a text file by using the Save As command.

  1. Go to File > Save As.
  2. Click Browse.
  3. In the Save As dialog box, under the Save as type box, choose the text file format for the worksheet (eg, CSV (Comma delimited)).
  4. Go to the location where you want to save the new text file, and then click Save.
  5. A dialog box appears, reminding you that only the current worksheet will be saved to the new file. Check you have the right worksheet open and then click OK. You can save other worksheets as separate text files by repeating this procedure for each worksheet.

Go to: Save a workbook in another file format [3]

Check this Microsoft Support page for more information about export options and file types.

References

  1. Filter for unique values or remove duplicate values [Internet]. Microsoft Support. Available from: https://support.microsoft.com/en-us/office/filter-for-unique-values-or-remove-duplicate-values-ccf664b0-81d6-449b-bbe1-8daaec1e83c2?ui=en-us&rs=en-us&ad=us
  2. Excel functions (alphabetical) [Internet]. Microsoft Support. Available from: https://support.microsoft.com/en-us/office/excel-functions-alphabetical-b3944572-255d-4efb-bb96-c6d90033e188?ui=en-us&rs=en-us&ad=us
  3. Excel functions (alphabetical) [Internet]. Microsoft Support. Available from: https://support.microsoft.com/en-us/office/excel-functions-alphabetical-b3944572-255d-4efb-bb96-c6d90033e188?ui=en-us&rs=en-us&ad=us
This article is from the free online

Excel for Data Analysis

Created by
FutureLearn - Learning For Life

Reach your personal and professional goals

Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.

Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.

Start Learning now