End interactive session 3C
Arrays and Series
An illustration depicting the idea of arrays and series. MidJourney 5
Before we begin our interactive session, please follow these steps to set up your Jupyter Notebook:
1. Click the + button in the top left corner.
2. Select Python 3.10.0 from the Notebook options.
3. Rename the Untitled.ipynb tab to Session_XY_Topic.ipynb (replace X with the day number and Y with the session number).
Remember to save your work frequently by clicking the save icon or using the keyboard shortcut (Ctrl+S or Cmd+S).
In this interactive session, we'll explore the most essential aspects of NumPy arrays and Pandas Series. These fundamental data structures are crucial for data manipulation and analysis in Python. We'll focus on the key concepts that are most relevant for beginning data scientists. We assume that you will work primarily with Pandas DataFrames and Series, so we won't spend much time on the details of NumPy arrays.
Let's start by importing the necessary libraries:
import numpy as np
import pandas as pd
NumPy arrays are the building blocks for many data structures in Python, including Pandas Series and DataFrames. Let's explore their basic properties and operations.
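As a minimal sketch, the operations below are consistent with the output that follows, assuming the array holds the integers 1 through 5. The variable name `arr` is an illustrative choice, not taken from the session.

```python
import numpy as np

# Assumed data: the integers 1 through 5 (chosen to match the output shown below)
arr = np.array([1, 2, 3, 4, 5])

# Arithmetic with a scalar is applied element-wise to every entry
print("Array + 2:", arr + 2)
print("Array * 2:", arr * 2)

# Aggregations reduce the whole array to a single value
print("Mean:", arr.mean())
print("Sum:", arr.sum())
```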
Array + 2: [3 4 5 6 7]
Array * 2: [ 2 4 6 8 10]
Mean: 3.0
Sum: 15
Now it's your turn! Create a NumPy array of your favorite numbers and calculate its standard deviation (np.std()).
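One possible approach is sketched here with made-up numbers. Note that `np.std()` computes the population standard deviation by default (`ddof=0`); passing `ddof=1` gives the sample standard deviation instead. The values in `favorites` are hypothetical.

```python
import numpy as np

# Hypothetical favorite numbers, chosen only for illustration
favorites = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Population standard deviation (default ddof=0 divides by n)
pop_std = np.std(favorites)

# Sample standard deviation (ddof=1 divides by n - 1)
sample_std = np.std(favorites, ddof=1)

print("Population std:", pop_std)
print("Sample std:", sample_std)
```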
Pandas Series are one-dimensional labeled arrays built on top of NumPy arrays. They're like a column in a spreadsheet or a single column of a DataFrame.
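The following sketch shows operations consistent with the output below. The data values, index labels, and variable names (`s_default`, `s_labeled`) are assumptions chosen to match that output.

```python
import pandas as pd

# A Series built from a list gets a default integer index (0, 1, 2, ...)
s_default = pd.Series([1, 2, 3, 4, 5])
print("Series from list:")
print(s_default)

# Assumed data and labels, chosen to match the output shown below
s_labeled = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])
print("Series with custom index:")
print(s_labeled)

# Label-based access to one element, then positional slicing with .iloc
print("Element at index 'c':", s_labeled["c"])
print("First three elements:")
print(s_labeled.iloc[:3])

# Element-wise arithmetic and summary statistics
print("Series + 5:")
print(s_labeled + 5)
print("Mean:", s_labeled.mean())
print("Median:", s_labeled.median())
```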
Series from list:
0 1
1 2
2 3
3 4
4 5
dtype: int64
Series with custom index:
a 10
b 20
c 30
d 40
e 50
dtype: int64
Element at index 'c': 30
First three elements:
a 10
b 20
c 30
dtype: int64
Series + 5:
a 15
b 25
c 35
d 45
e 55
dtype: int64
Mean: 30.0
Median: 30.0
Your turn! Create a Pandas Series representing daily temperatures for a week. Use the days of the week as the index. Then, calculate and print the maximum temperature.
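One way to approach this exercise is sketched below. The temperature values are invented for illustration and the variable names are hypothetical.

```python
import pandas as pd

# Days of the week serve as the index labels
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical temperatures in degrees Celsius, invented for illustration
temps = pd.Series([18.5, 20.1, 22.3, 19.8, 21.0, 23.4, 17.9], index=days)

# max() returns the largest value in the Series
print("Maximum temperature:", temps.max())

# idxmax() returns the index label where that maximum occurs
print("Warmest day:", temps.idxmax())
```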
Let's explore some key differences and similarities between NumPy arrays and Pandas Series.
# Create a NumPy array and a Pandas Series with the same data
np_arr = np.array([1, 2, 3, 4, 5])
pd_series = pd.Series([1, 2, 3, 4, 5])
print("NumPy array:", np_arr)
print("Pandas Series:\n", pd_series)
# Demonstrate label-based indexing in Pandas Series
pd_series.index = ['a', 'b', 'c', 'd', 'e']
print("\nPandas Series with custom index:\n", pd_series)
print("Value at index 'c':", pd_series['c'])
# Show that NumPy operations work on Pandas Series
print("\nSquare root of Pandas Series:\n", np.sqrt(pd_series))
NumPy array: [1 2 3 4 5]
Pandas Series:
0 1
1 2
2 3
3 4
4 5
dtype: int64
Pandas Series with custom index:
a 1
b 2
c 3
d 4
e 5
dtype: int64
Value at index 'c': 3
Square root of Pandas Series:
a 1.000000
b 1.414214
c 1.732051
d 2.000000
e 2.236068
dtype: float64
Now it's your turn! Create a NumPy array and a Pandas Series, both containing the same data (use any numbers you like). Then, calculate the mean of both and compare the results. Are they the same? Why or why not?
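A minimal sketch of the comparison, using hypothetical data. The means agree because both libraries compute the same arithmetic average. The standard deviation is included as a contrast: NumPy's `std()` defaults to `ddof=0`, while the Pandas `Series.std()` defaults to `ddof=1`, so those results differ unless `ddof` is set explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical data shared by both structures
data = [1, 2, 3, 4, 5]
np_arr = np.array(data)
pd_series = pd.Series(data)

# The mean is the same arithmetic average in both libraries
print("NumPy mean:", np_arr.mean())
print("Pandas mean:", pd_series.mean())

# The default standard deviation differs: NumPy divides by n,
# Pandas divides by n - 1
print("NumPy std (ddof=0):", np_arr.std())
print("Pandas std (ddof=1):", pd_series.std())

# Setting ddof=1 in NumPy reproduces the Pandas result
print("NumPy std (ddof=1):", np_arr.std(ddof=1))
```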
In this session, we've covered the essential aspects of NumPy arrays and Pandas Series.
As you continue your journey in data science, you'll find these structures invaluable for efficient data manipulation and analysis.
End interactive session 3C