End Activity Session (Day 4)
url = 'https://ucsb.box.com/shared/static/dnnu59jsnkymup6o8aaovdywrtxiy3a9.csv'
Reading, Filtering, and Visualizing Data in Pandas
This end-of-day session focuses on using pandas to load, visualize, and analyze marine microplastics data. It is designed to help you become more comfortable with the pandas library and to build the skills you need for effective data analysis.
The National Oceanic and Atmospheric Administration, via its National Centers for Environmental Information, has an entire section related to marine microplastics (that is, microplastics found in water) at https://www.ncei.noaa.gov/products/microplastics.
We will be working with a recent download of the entire marine microplastics dataset. The URL for this data is assigned to the url variable at the top of this page.
Objective: Write your own notebook that contains a simple DataFrame exploration as well as some basic grouping, filtering, aggregation, and visualization, all within the pandas library.
Objective: Learn to load data into a pandas DataFrame and display the first few records.
Load the data from the provided URL into a pandas DataFrame named df.
I've already taken a look at this dataset and noticed there is a column of sample dates called Date. We can use the parse_dates option of the read_csv() function to convert the values in the Date column of the CSV into datetime objects in pandas while reading the file.
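As a minimal sketch of the loading step, the snippet below reads a small in-memory CSV with read_csv() and parse_dates, then shows the first records with head(). The three sample rows are invented for illustration and only mimic the Date, Oceans, and Measurement columns mentioned in this activity; in your notebook you would pass the url variable to read_csv() instead of the StringIO object.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: three made-up rows with
# a few of the columns named in this activity.
sample_csv = io.StringIO(
    "Date,Oceans,Measurement\n"
    "2015-08-11,Atlantic Ocean,0.018\n"
    "2016-03-02,Pacific Ocean,0.0\n"
    "2017-06-21,Atlantic Ocean,1.25\n"
)

# parse_dates converts the listed columns to datetime values while reading.
# With the real data, replace sample_csv with the url variable.
df = pd.read_csv(sample_csv, parse_dates=["Date"])

# Display the first few records and confirm the dtype of the Date column.
print(df.head())
print(df["Date"].dtype)
```

Checking the dtype right after loading is a quick way to confirm that the dates were parsed rather than left as plain strings.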
Oceans
Using ~ to invert built-in function results
The ~ operator inverts a list of Boolean values (switches True to False and vice versa).
This operator isn't needed for most selection operations because you can just use == and != to invert selection criteria. However, the ~ operator becomes very handy when you need to invert the results of a built-in function.
For example, combining the ~ operator with isnull() gives an efficient way to keep only the rows of a dataframe where the value of df['column'] is not null.
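Note that it helps to wrap the result of the built-in function, df['column'].isnull(), in ( ) so that it is clear what the ~ operator applies to. The parentheses are strictly required only when you invert a comparison such as df['column'] > 0, because ~ binds more tightly than the comparison operators.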
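The inversion described above can be sketched as follows. The dataframe here is a made-up example with one missing value in an Oceans column; the variable names has_ocean and df_known are hypothetical and not part of the activity.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in the Oceans column.
df = pd.DataFrame(
    {
        "Oceans": ["Atlantic Ocean", np.nan, "Pacific Ocean"],
        "Measurement": [0.5, 1.2, 0.0],
    }
)

# isnull() returns a Boolean Series; ~ flips every True/False in it,
# so the mask is True only where Oceans has a value.
has_ocean = ~(df["Oceans"].isnull())

# Keep only the rows where Oceans is not null.
df_known = df[has_ocean]
print(df_known)
```

Storing the mask in its own variable before filtering makes it easy to inspect which rows will be kept.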
Create a variable named oceans that groups the data in df according to the value of the Oceans column.
pieces/m3
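A rough sketch of the grouping step, using invented rows rather than the real dataset: groupby() on the Oceans column returns a groupby object, which can then be aggregated. The names mean_by_ocean and counts are hypothetical helpers used only for this illustration.

```python
import pandas as pd

# Hypothetical data with two oceans and made-up measurements.
df = pd.DataFrame(
    {
        "Oceans": ["Atlantic Ocean", "Pacific Ocean",
                   "Atlantic Ocean", "Pacific Ocean"],
        "Measurement": [0.5, 3.0, 1.5, 0.0],
    }
)

# Group the rows of df by the value in the Oceans column.
oceans = df.groupby("Oceans")

# A groupby object does nothing until you aggregate it, for example
# with a mean per group or a row count per group.
mean_by_ocean = oceans["Measurement"].mean()
counts = oceans.size()
print(mean_by_ocean)
print(counts)
```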
Create a new dataframe (df3) from your filtered dataframe (df2) that contains only rows where Measurement is greater than zero.
Using .copy() when filtering a dataframe ensures that you're working with a new DataFrame, not a view of the original. This is especially important when you're filtering data and then modifying the result, which is common in data science workflows.
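The filter-then-copy pattern might look like the sketch below. The contents of df2 are invented for illustration, and the Checked column is a hypothetical addition used only to show that changes to df3 do not reach df2.

```python
import pandas as pd

# Hypothetical stand-in for the filtered dataframe df2.
df2 = pd.DataFrame(
    {
        "Oceans": ["Atlantic Ocean", "Pacific Ocean", "Atlantic Ocean"],
        "Measurement": [0.0, 0.5, 2.0],
    }
)

# The Boolean mask selects rows with a positive Measurement;
# .copy() makes df3 an independent DataFrame.
df3 = df2[df2["Measurement"] > 0].copy()

# Adding a column to df3 leaves df2 untouched.
df3["Checked"] = True
print(df3)
print(df2.columns.tolist())
```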
Add a new column to df3 that contains the log10 of the Measurement values.
The numpy library has a log10() function that you will find useful for this step!
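One way to sketch this step, assuming df3 holds only positive Measurement values as in the previous task. The data is made up, and the column name log10_Measurement is a hypothetical choice; use whatever name the task asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical df3 with strictly positive measurements.
df3 = pd.DataFrame({"Measurement": [0.01, 1.0, 100.0]})

# np.log10 works element-wise on a whole pandas column at once.
df3["log10_Measurement"] = np.log10(df3["Measurement"])
print(df3)
```

Filtering out zeros first matters here because the log10 of zero is negative infinity, which would distort later summaries and plots.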
df3
Congratulations, you're officially doing Python data science!
Be sure to save your notebook and add comments and reflections at the end of your notebook before heading out for the day.