Skip to Main Content
UNH Library home

Introduction to Excel and Spreadsheets

Cleaning Data

To prepare data for later analysis, it is important to have a clean data table.  Depending on the origin of the data, you may need to do some of the following steps to ensure that the data are as complete and consistent as possible:

Data Structure

Even when you have a clean data table, the data structure may not be precisely right for the kind of analysis you want to do. You may need to:

  • Split one column into multiple
  • Calculate new columns
  • Reshape a table from wide format to long
    Say you have a dataset where you have a measurement that is taken every year for many years, like CO2 Emissions. If you have one column for every year of the measurement, the table is in wide format. If, instead, you have a column that says "Year" and a column that says "CO2 Emissions," the table is in long format.


     

Some people call this kind of data arrangement tidy data. This arrangement is useful for many tools beyond just PivotTables -- Tableau, ggplot2, pandas, etc. all work well with tidy data.

Excel has many functions for extracting and combining data from columns, calculating new columns based on old columns, and even using conditional statements to tailor the output of functions. What's more important than knowing every function up front is deciding how specific your data need to be. Here are some questions you can ask yourself:

  • Will I ever need to analyze the data based on a piece of information that is currently combined with other information in a single cell?
  • Are there any other categories I could create to group my rows in a meaningful way?
  • Are the values within each column inconsistent? For example, do my numerical columns also have text in them that might cause errors when I try to perform a calculation?

When you identify something that might need to change, you can browse or search for an Excel function that will help.

Converting data from a wide format to a long format, on the other hand, is trickier to do in Excel. You may want to try another tool, like OpenRefine.

This LibGuide was adapted with permission from Duke Libraries Excel by Angela Zoss