Florida Tech Evans Library

Data Cleaning

An introductory guide to data cleaning concepts, tools, and methods.

Common Functions Used in R

R has many packages that are useful for data cleaning, and many of the most widely used ones belong to the Tidyverse, a collection of packages that share a common design. The Tidyverse packages contain many functions that help with data cleaning tasks. Some of these functions are easier for beginners to pick up than others, and some are used much more frequently. The following lists break down a few of the most popular data cleaning functions used in R.

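Note: If you load the Tidyverse all at once with library(tidyverse) (the package name is lowercase, and R is case-sensitive), the core Tidyverse packages are all attached. These include dplyr, tidyr, stringr, and, since tidyverse 2.0.0, lubridate. As a result, you will not have to remember which function belongs to which package.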
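The following is a minimal sketch that assumes the tidyverse package is installed. A single library() call attaches the core packages, and after that, functions from different packages can be mixed freely. The people data frame and its values are made up for illustration.

```r
# Attach all core Tidyverse packages with one call.
# The package name is lowercase; library(Tidyverse) would fail.
library(tidyverse)

# Made-up example data with stray whitespace around the names
people <- tibble(name = c("  Ada ", "Grace  "), age = c(36, 45))

# mutate() comes from dplyr and str_trim() comes from stringr,
# but both are available after the single library() call above
people <- mutate(people, name = str_trim(name))

print(people$name)  # "Ada" "Grace"
```

If you only need one or two packages, you can also load them individually (for example, library(dplyr)). This approach attaches fewer functions.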

Exploratory Functions: 
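The sketch below shows a few widely used base R functions for taking a first look at a data set. These functions were chosen for illustration. The survey data frame is hypothetical; it deliberately contains a missing value and a duplicated row so that the checks have something to find.

```r
# Hypothetical survey data with one duplicated row and missing ages
survey <- data.frame(
  id    = c(1, 2, 3, 3, 4),
  age   = c(34, 27, NA, NA, 51),
  group = c("a", "b", "a", "a", "b")
)

print(head(survey, 3))          # first three rows
str(survey)                     # column names, types, and sample values
print(summary(survey))          # per-column summaries, including NA counts
print(colSums(is.na(survey)))   # missing values per column
print(sum(duplicated(survey)))  # number of fully duplicated rows
```

In the Tidyverse, dplyr's glimpse() gives a compact, column-by-column overview that is similar to str().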

 

Functions for Changing the Format of a Data Set: 
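A common reshaping task is converting a data set between wide and long layouts. The sketch below assumes the tidyr package (part of the Tidyverse) is installed. It uses pivot_longer() and pivot_wider() to move the scores of a hypothetical gradebook between the two layouts.

```r
library(tidyr)

# Hypothetical gradebook in "wide" format: one column per test
wide <- data.frame(
  student = c("A", "B"),
  test1   = c(80, 90),
  test2   = c(85, 95)
)

# Wide to long: one row per student-test combination
long <- pivot_longer(wide,
                     cols = c(test1, test2),
                     names_to = "test",
                     values_to = "score")
print(long)

# Long back to wide: the test names become column names again
wide_again <- pivot_wider(long, names_from = test, values_from = score)
print(wide_again)
```

The long layout is usually easier to filter, group, and plot. The wide layout is often easier to read as a table.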

 

Functions for Coercing Data Types: 

Note: The prefix is. can be used in place of as. to check whether the object passed to the function is of that specific data type. For example, is.numeric(x) returns TRUE or FALSE, while as.numeric(x) converts x to a numeric vector.

***There are other options for coercing data types. See the full documentation for more information. 
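The as. and is. functions are part of base R, so the sketch below needs no packages. The raw_ages vector is a made-up example of numbers that were read in as text, which is a common result of importing messy files.

```r
# Made-up ages that were imported as text, including one bad entry
raw_ages <- c("34", "27", "forty", "51")

# Checking: is.* returns a single TRUE/FALSE for the whole vector
print(is.character(raw_ages))  # TRUE
print(is.numeric(raw_ages))    # FALSE

# Coercing: as.numeric() converts each element; "forty" cannot be
# converted, so it becomes NA (R normally also emits a warning)
ages <- suppressWarnings(as.numeric(raw_ages))
print(ages)                    # 34 27 NA 51
print(is.numeric(ages))        # TRUE

# Other common coercions
print(as.character(3.5))                      # "3.5"
print(as.logical(c("TRUE", "false", "yes")))  # TRUE FALSE NA
print(as.factor(c("low", "high", "low")))     # factor with levels high, low
```

Coercion failures turn into NA values rather than errors, so it is worth counting NAs before and after a conversion.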

 

Functions for Coercing Date-Time Data:

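Note: In lubridate functions such as ymd() and hms(), the letters in the function name give the order in which the components appear in the string being parsed. For dates, y stands for year, m for month, and d for day. For times, h stands for hour, m for minute, and s for second. lubridate provides versions for the common orderings and subsets of these letters. For example, mdy(), dmy(), and my() are alternatives to ymd(), and hm() and ms() are alternatives to hms(). Choose the function whose letter order matches the format of the string passed as an argument. Not every combination of letters exists as a function, so check the lubridate documentation for the full set.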
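The sketch below assumes the lubridate package is installed. It parses the same date written in three different orders, along with a few time strings. The values are made up for illustration. The time parsers return Period objects, so the example converts them with period_to_seconds() to compare them as plain numbers.

```r
library(lubridate)

# The same calendar date written in three different orders
iso_date <- ymd("2024-03-15")  # year-month-day
us_date  <- mdy("03/15/2024")  # month/day/year
eu_date  <- dmy("15/03/2024")  # day/month/year

# All three parse to the same Date value
print(c(iso_date, us_date, eu_date))

# Time parsers follow the same letter-order naming pattern
t1 <- hms("10:30:15")  # hours:minutes:seconds
t2 <- hm("10:30")      # hours:minutes
t3 <- ms("05:30")      # minutes:seconds

# Convert the periods to seconds to compare them numerically
print(period_to_seconds(t1))  # 37815
print(period_to_seconds(t2))  # 37800
print(period_to_seconds(t3))  # 330
```

If the letter order does not match the string, for example when ymd() is called on "15/03/2024", lubridate returns NA with a warning instead of guessing.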

 

Functions for Working with Strings:
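The sketch below assumes the stringr package (part of the Tidyverse) is installed. It standardizes a hypothetical vector of city names that differ only in spacing, case, and punctuation, which is a typical string cleaning task.

```r
library(stringr)

# Hypothetical messy entries that refer to only two cities
cities <- c("  New York", "new york ", "NEW-YORK", "Boston")

clean <- str_trim(cities)                  # remove leading/trailing spaces
clean <- str_to_lower(clean)               # make the case consistent
clean <- str_replace_all(clean, "-", " ")  # replace hyphens with spaces

print(clean)          # "new york" "new york" "new york" "boston"
print(unique(clean))  # "new york" "boston"

# str_detect() returns TRUE wherever the pattern is found
print(str_detect(clean, "york"))  # TRUE TRUE TRUE FALSE
```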