End Coding Colab Session (Day 4)
import pandas as pd
import numpy as np
url = 'https://bit.ly/messy_csv'
messy_df = pd.read_csv(url)
Cleaning Pandas
In this collaborative coding exercise, you will work together to apply your new data cleaning skills to a simple dataframe that has a surprising number of problems.
Here's our course cheatsheet on cleaning data:
Feel free to refer to this cheatsheet throughout the exercise if you need a quick reminder about syntax or functionality.
First, let's import the necessary libraries and load an example messy dataframe.
Let's apply what we've learned so far to clean the messy environmental dataset.
Your task is to clean this dataframe by:
Removing duplicates
Handling missing values (either fill them with fillna or use dropna to remove rows with missing data)
Ensuring consistent data types (dates, strings)
Formatting the "site" column for consistency
Making sure all column names are lowercase and contain no whitespace
Try to implement these steps using the techniques we've learned.
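If you get stuck, the five steps listed above can be sketched on a small stand-in dataframe. This is only an illustration under stated assumptions, not the intended solution: demo_df, its values, and the column names "Sample Date" and "Temp C" are hypothetical (only the "site" column is named in this exercise), and the choices to fill missing numbers with the median and to replace spaces in column names with underscores are just one reasonable option.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for messy_df. Apart from a "site" column, the column
# names and values here are assumptions made up for illustration.
demo_df = pd.DataFrame({
    " Site ": ["  north ", "NORTH", "South", "South", None],
    "Sample Date": ["2023-01-05", "2023-01-06", "2023-01-07",
                    "2023-01-07", "2023-01-08"],
    "Temp C": [12.5, np.nan, 14.0, 14.0, 13.2],
})


def clean_demo(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the exercise's cleaning steps to the hypothetical demo frame."""
    out = df.copy()

    # Column names: strip outer whitespace, lowercase, and replace inner
    # spaces with underscores so no whitespace remains.
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )

    # "site" column: trim whitespace and use one capitalization style.
    out["site"] = out["site"].str.strip().str.title()

    # Data types: parse date strings; unparseable values become NaT.
    out["sample_date"] = pd.to_datetime(out["sample_date"], errors="coerce")

    # Missing values: drop rows with no site, fill missing temperatures
    # with the column median (an assumed strategy, not the only valid one).
    out = out.dropna(subset=["site"]).copy()
    out["temp_c"] = out["temp_c"].fillna(out["temp_c"].median())

    # Duplicates: removed last, because rows that differ only in formatting
    # become identical once the steps above have run.
    return out.drop_duplicates().reset_index(drop=True)


clean_df = clean_demo(demo_df)
print(clean_df)
print(clean_df.dtypes)
```

The exercise lists duplicate removal first, and on the real messy_df you may want to call drop_duplicates both before and after the formatting steps. Adapt the column names to whatever messy_df.columns actually shows.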