Excel Functions: How to Wrangle Data with Spreadsheets
There’s a variety of tools and techniques for importing and processing data. Traditionally, and for relatively small, simple data sets, we do this manually with text editors or spreadsheets. For larger volumes of data or more complex scenarios, we can use programmatic solutions. Python, SQL and R are often used for this type of data wrangling and have built-in functions (libraries) to speed up the process or perform common tasks.
Here, we’re going to focus on how to wrangle data in Excel. In some cases, Excel might not be able to convert columns into the format you want. In this case, you can use a series of functions to manipulate your data. The table below shows some common functions and examples of how you might use them.
Function | Example | Output |
---|---|---|
CLEAN Removes all non-printable characters from text |
=CLEAN(CHAR(9)&”Monthly report”) |
‘Monthly report’ |
CONCAT Returns combined text from multiple ranges and / or strings without a delimiter |
=CONCAT(“hello”, “world”) |
‘helloworld’ |
FIXED Returns a number rounded to the specified number of decimal places. Note: If you omit the number of decimals, FIXED will round to 2 decimal places by default. |
=FIXED(1.1234,2) |
1.12 |
LEFT Returns the left-most character(s) of a string |
=LEFT(“hello”) =LEFT(“hello”, 4) |
‘h’ ‘hell’ |
LOWER Returns a text in lowercase |
=LOWER(“Hello, World!”) |
‘hello, world!” |
MID Returns the character(s) between specified start and end points |
=MID(“Hello, World!”,8,12) |
‘World’ |
PROPER Returns text with the first letter capitalised, as well as any other letters that follow a non-letter character |
=PROPER(“hello, world!”) |
‘Hello, World!” |
REPLACE Returns a text string with replacement text between points as specified |
=REPLACE(“Hello, World!”,8,12, “everyone”) |
“Hello, everyone! |
RIGHT Returns the right-most character(s) in a string |
=RIGHT(“hello”) =RIGHT(“hello”, 4) |
‘o’ ‘ello’ |
SUBSTITUTE Returns text with one or more text substitutions |
=SUBSTITUTE(“Hello, World!”, “World”, “everyone”) =SUBSTITUTE (“Hello, World!”, “I”, “L”) |
‘Hello, everyone!’ ‘HeLLo, WorLd!’ |
TEXTJOIN Returns a string combining text from multiple ranges and / or strings including a delimiter |
=TEXTJOIN(“, “, TRUE, “Hello”, “World!”) |
‘Hello, World!’ |
TRIM Removes all spaces other than those between words |
=TRIM(“ Hello, World! “) |
‘Hello, World!’ |
UPPER Returns text in upper case |
=UPPER(“Hello, World!”) |
‘HELLO, WORLD!’ |
VALUE Returns a number from a text string representing a number |
=VALUE(“$1000”) |
1000 |
When you need to tackle duplicate values, Excel can find unique values and remove duplicate rows. These are two closely related procedures.
To filter for unique rows, click Data > Sort & Filter > Advanced and select the appropriate options. Note: This only hides duplicate values from view.
To remove duplicate values entirely, use the Remove Duplicate functionality. Find this under Data > Data Tools > Remove Duplicates.
Finding inconsistent values, or partial duplicates, is a little harder. A good technique is to highlight cells using conditional formatting. This allows you to specify rules for what is highlighted. To highlight duplicates using conditional formatting, select the range of cells and click Style > Conditional Formatting > Highlight Cells Rules > Duplicate Values.
You can combine multiple conditions and different comparisons (eg, ‘contains’) to find some partial matches.
Note: The comparison of duplicate values depends on what appears in the cell – not the underlying value stored in the cell. For example, if you have the same date value in different cells, one formatted as ‘3/8/2006’ and the other as ‘Mar 8, 2006’, the values are unique.
Go to: Filter for unique values or remove duplicate values [1]
This Microsoft Support page has more about removing, filtering, or highlighting duplicates.
We know that if there’s missing data, we often need to manually remove blank rows, or rows with blanks. That can be tedious, but you can speed up the process using the Go To Special feature.
To remove blank rows and rows with blank cells:
-
- Select your range of data
-
- Click Home > Find & Select > Go To Special
-
- In the dialog box, select Blanks and then press OK. Excel will select all blanks in your selection
-
- To delete these rows, click Home > Delete > Delete Sheet Rows
Enriching
Excel doesn’t provide specific functionality to create dummy variables from data, but you can use the Find and Replace functionality on a selection, or create an extra column to hold your coded categorical data. If you create a new column, use a formula such as =IF(A1=’Yes’, 1, 0)
to check whether the value in a cell (A1) is equal to ‘Yes’. If the expression is true, the result is 1; otherwise the result is 0.
Formulas and built-in functions are also useful when you calculate values to populate new columns.
Go to: Excel functions (alphabetical) [2]
Check this Microsoft Support page for a full list of functions in Excel.
Validating
Validation often involves checking the distribution of data, and you’ve already learned how to do this by calculating the five-number summary. In the next activity, we’ll see how to create charts in Excel.
Publishing
When you save your Excel workbooks, you can preserve formatting, calculations, and information across multiple sheets. It might be useful to convert your data to a text file, but this depends on where your data is going next.
You can convert an Excel worksheet to a text file by using the Save As command.
- Go to File > Save As.
- Click Browse.
- In the Save As dialog box, under the Save as type box, choose the text file format for the worksheet (eg, CSV (Comma delimited)).
- Go to the location where you want to save the new text file, and then click Save.
- A dialog box appears, reminding you that only the current worksheet will be saved to the new file. Check you have the right worksheet open and then click OK. You can save other worksheets as separate text files by repeating this procedure for each worksheet.
Go to: Save a workbook in another file format [3]
Check this Microsoft Support page for more information about export options and file types.
References
- Filter for unique values or remove duplicate values [Internet]. Microsoft Support. Available from: https://support.microsoft.com/en-us/office/filter-for-unique-values-or-remove-duplicate-values-ccf664b0-81d6-449b-bbe1-8daaec1e83c2?ui=en-us&rs=en-us&ad=us
- Excel functions (alphabetical) [Internet]. Microsoft Support. Available from: https://support.microsoft.com/en-us/office/excel-functions-alphabetical-b3944572-255d-4efb-bb96-c6d90033e188?ui=en-us&rs=en-us&ad=us
- Excel functions (alphabetical) [Internet]. Microsoft Support. Available from: https://support.microsoft.com/en-us/office/excel-functions-alphabetical-b3944572-255d-4efb-bb96-c6d90033e188?ui=en-us&rs=en-us&ad=us
Reach your personal and professional goals
Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.
Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.
Register to receive updates
-
Create an account to receive our newsletter, course recommendations and promotions.
Register for free