End Activity Session (Day 3)
import pandas as pd
import numpy as np
Using Pandas Series for Data Analysis
A cartoon panda looking over a year’s worth of monthly class exams. The panda is doing great: A+! (Image: Midjourney5, https://www.midjourney.com/jobs/6b63c3ca-c64d-41b8-a791-7e4b2594c781?index=0)
In this end-of-day activity, we’ll practice using Pandas Series for data analysis and learn how to use NumPy’s random number generator. We’ll create a series of test scores using random numbers and explore how to make our random number generation reproducible.
First, let’s import the necessary libraries and set up our environment.
NumPy provides a powerful random number generation tool called Generator. Let’s explore how to use it and why it’s important in data science.
We can create a random number generator object like this:
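A minimal sketch, assuming NumPy is imported under the usual np alias. The variable name sample and the five-integer draw are illustrative additions, not part of the original activity.

```python
import numpy as np

# No seed is passed, so NumPy seeds the generator from OS entropy
rng = np.random.default_rng()

# Draw five integers from 70 up to, but not including, 101
sample = rng.integers(70, 101, size=5)
print(sample)  # values change from run to run
```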
This creates a generator without a fixed seed: NumPy seeds it from unpredictable entropy supplied by the operating system. Each time you run your code, you’ll get different random numbers.
In data science, it’s often crucial to be able to reproduce our results. We can do this by setting a seed for our random number generator. Here’s how:
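A minimal sketch of seeding, assuming an arbitrary seed value of 42 (any integer works). The names first_run and second_run are illustrative. Rebuilding the generator with the same seed replays the same sequence.

```python
import numpy as np

# Seed 42 is an arbitrary choice for illustration
rng = np.random.default_rng(seed=42)
first_run = rng.integers(70, 101, size=5)

# Re-creating the generator with the same seed restarts the same sequence
rng = np.random.default_rng(seed=42)
second_run = rng.integers(70, 101, size=5)

print(np.array_equal(first_run, second_run))  # True
```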
Now, every time we run our code from the start, this rng object will produce the same sequence of “random” numbers. This is extremely useful for debugging, sharing results, and ensuring consistency in our analyses.
Now let’s create a Pandas Series called scores that contains 10 elements representing monthly test scores. We’ll use random integers between 70 and 100 to generate the monthly scores, and set the index to be the month names from September to June.
Now that we have our test scores series, let’s analyze the data by answering the following questions:
Calculate the mean of all scores in the series.
Calculate the mean of the first five months’ scores.
Calculate the mean of the last five months’ scores.
Compare the average scores from the first and second half of the year.
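One possible way to build the scores series described above and work through the four questions is sketched below. It assumes a seed of 42, treats both 70 and 100 as attainable scores, and uses illustrative variable names (months, overall_mean, and so on). Treat it as one approach rather than the expected solution.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # seed 42 is an illustrative choice

months = ["September", "October", "November", "December", "January",
          "February", "March", "April", "May", "June"]

# endpoint=True makes the upper bound (100) inclusive
scores = pd.Series(rng.integers(70, 100, size=10, endpoint=True), index=months)

overall_mean = scores.mean()               # question 1
first_half_mean = scores.iloc[:5].mean()   # question 2: September to January
second_half_mean = scores.iloc[5:].mean()  # question 3: February to June
difference = second_half_mean - first_half_mean  # question 4

print(scores)
print(f"Overall mean: {overall_mean:.1f}")
print(f"First half mean: {first_half_mean:.1f}")
print(f"Second half mean: {second_half_mean:.1f}")
print(f"Second half minus first half: {difference:+.1f}")
```

Positional slicing with iloc keeps the halves correct regardless of the index labels. Because both halves hold five scores, the overall mean is the average of the two half-year means.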
To demonstrate the importance of seeding, try creating two series from two separate random number generators that share the same seed, then check whether the two series are equal:
True
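A minimal sketch of a comparison that prints True as shown above, assuming two separate generator objects built with the same seed. The seed 42 and the names rng_a, rng_b, series_a, and series_b are illustrative.

```python
import numpy as np
import pandas as pd

# Two distinct generator objects that share one seed
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

series_a = pd.Series(rng_a.integers(70, 101, size=10))
series_b = pd.Series(rng_b.integers(70, 101, size=10))

# Series.equals checks that both series hold the same values in the same order
print(series_a.equals(series_b))  # True
```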
Now try creating two series with random number generators that have different seeds:
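A sketch of the different-seed case, assuming the illustrative seeds 42 and 43. With different seeds the two generators follow unrelated sequences, so the comparison should almost certainly report that the series differ.

```python
import numpy as np
import pandas as pd

# Same setup as before, but each generator gets its own seed
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=43)

series_a = pd.Series(rng_a.integers(70, 101, size=10))
series_b = pd.Series(rng_b.integers(70, 101, size=10))

same = series_a.equals(series_b)
print(same)
```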
In this activity, you practiced creating and analyzing a Pandas Series representing test scores. You also learned about NumPy’s random number generator and the importance of seeding for reproducibility in data science. These skills are fundamental in data analysis and will be useful in more complex data science workflows.
Remember to document your code and results clearly in your Jupyter Notebook. Good luck!