Evans Library of Florida Tech
Data Cleaning
An introductory guide to data cleaning concepts, tools, and methods.
Common Functions used in Python
Pandas Functions:
pd.read_csv()
Read a comma-separated value file (.csv) into Python as a DataFrame.
pd.melt()
Unpivot a DataFrame from wide to long format: several columns are gathered into two, one holding the former column names and one holding their values.
pd.pivot_table()
Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the resulting DataFrame.
pd.concat()
Concatenate pandas objects along a particular axis.
pd.DataFrame.merge()
Merge DataFrame objects with a database-style join, matching rows on columns or on indexes.
pd.notnull()
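Detect non-missing values in a Pandas object. The result is True where a value is present and False where it is missing (the inverse of pd.isnull()).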
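The Pandas functions above can be seen working together in a short sketch. The data, column names, and variable names below are hypothetical and chosen only for illustration; an in-memory string stands in for a .csv file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file; Orlando's "feb" value is missing.
csv_text = """city,jan,feb
Melbourne,61,63
Orlando,60,
"""
wide = pd.read_csv(io.StringIO(csv_text))

# melt: wide -> long, one row per (city, month) pair.
tidy = pd.melt(wide, id_vars="city", var_name="month", value_name="temp")

# notnull: True where a value is present, False where it is missing.
present = pd.notnull(tidy["temp"])

# pivot_table: long -> wide again, averaging temp for each city and month.
table = pd.pivot_table(tidy, values="temp", index="city",
                       columns="month", aggfunc="mean")

# concat: stack two DataFrames that share the same columns, row-wise.
extra = pd.DataFrame({"city": ["Tampa"], "jan": [62], "feb": [64.0]})
stacked = pd.concat([wide, extra], ignore_index=True)

# merge: database-style left join on the shared "city" column.
states = pd.DataFrame({"city": ["Melbourne", "Orlando"], "state": ["FL", "FL"]})
merged = wide.merge(states, on="city", how="left")
```

With a real file, the pd.read_csv() call would take a file path instead of the io.StringIO object.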
Regex Functions:
re.compile()
Compile a regular expression pattern into a pattern object that can be reused for matching.
re.findall()
Return all non-overlapping matches of a pattern in a string, as a list of strings.
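As a minimal sketch of the two regex functions, the example below pulls phone numbers out of a messy string. The sample text and the pattern are assumptions made for illustration, not a general-purpose phone number matcher.

```python
import re

# Hypothetical messy text with phone numbers in mixed formats.
text = "Call 321-555-0101 or 321.555.0102; fax: 3215550103"

# compile: build the pattern once so it can be reused on many strings.
phone = re.compile(r"\d{3}[-.]?\d{3}[-.]?\d{4}")

# findall (method on the compiled pattern): all non-overlapping matches.
matches = phone.findall(text)

# findall (module-level function): same result in a single call.
same = re.findall(r"\d{3}[-.]?\d{3}[-.]?\d{4}", text)
```

Compiling first is mainly useful when the same pattern is applied repeatedly, for example to every value in a column.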
Commonly used Python Methods:
.head()
Return the first n rows in an object. The n defaults to 5.
.tail()
Return the last n rows in an object. As with .head(), the n defaults to 5.
.info()
Print a concise summary of a data frame, including the index and column data types, non-null counts, and memory usage.
.value_counts()
Return an object containing counts of unique values for chosen data.
.describe()
Return summary statistics about chosen data, such as the count, mean, standard deviation, and quartiles for numeric columns.
.split()
Split each string in the chosen values around a delimiter or pattern. On a Pandas Series this is reached through the .str accessor, as .str.split().
.astype()
Coerce a Pandas object to a specific data type.
.apply()
Apply a function to each row or column in a data frame.
.replace()
Replace the values passed to the to_replace argument with the specified new values.
.drop_duplicates()
Return a data frame with duplicate rows removed. By default all columns are compared; the subset argument limits the comparison to specified columns.
.fillna()
Fill in NA / NaN values with a specified value, such as a constant or a per-column mapping.
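The methods above are often chained into a small cleaning pass. The following is a sketch only: the records, column names, and replacement values are hypothetical, and a real dataset would need its own choices at each step.

```python
import pandas as pd

# Hypothetical messy records: a duplicate row, ages stored as text,
# and a missing department.
df = pd.DataFrame({
    "name": ["Ada Lovelace", "Alan Turing", "Alan Turing", "Grace Hopper"],
    "age": ["36", "41", "41", "85"],
    "dept": ["math", "cs", "cs", None],
})

# drop_duplicates: remove the repeated "Alan Turing" row.
clean = df.drop_duplicates().reset_index(drop=True)

# astype: convert the text ages to integers.
clean["age"] = clean["age"].astype(int)

# fillna, then replace: fill the missing department, expand an abbreviation.
clean["dept"] = clean["dept"].fillna("unknown").replace("cs", "computer science")

# .str.split: break each name into first and last name columns.
parts = clean["name"].str.split(" ", expand=True)
clean["first"] = parts[0]
clean["last"] = parts[1]

# apply: run a function on each row (axis=1) to build initials.
clean["initials"] = clean.apply(lambda row: row["first"][0] + row["last"][0], axis=1)

# Inspection methods.
first_two = clean.head(2)                   # first 2 rows (default is 5)
last_two = clean.tail(2)                    # last 2 rows (default is 5)
clean.info()                                # prints dtypes and non-null counts
dept_counts = clean["dept"].value_counts()  # count of each unique value
age_stats = clean["age"].describe()         # count, mean, std, min, quartiles, max
```

Running the inspection methods both before and after cleaning is a simple way to confirm that each step did what was intended.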