NumPy Cheatsheet for Pandas Users (Beginner-Friendly)
Importing NumPy
import numpy as np
Creating NumPy Arrays
From Python Lists
arr = np.array([1, 2, 3, 4, 5])
From Pandas Series or DataFrame
import pandas as pd

# From Series
s = pd.Series([1, 2, 3, 4, 5])
arr = s.to_numpy()

# From DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
arr = df.to_numpy()
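To make the conversion concrete, here is a minimal sketch of what `to_numpy()` returns for a Series and for a DataFrame. The variable names `series_arr` and `frame_arr` are illustrative only.

```python
import numpy as np
import pandas as pd

# A Series becomes a 1-D array; the index is not carried over.
s = pd.Series([1, 2, 3, 4, 5])
series_arr = s.to_numpy()
series_shape = series_arr.shape   # (5,)

# A DataFrame becomes a 2-D array: one row per record, one column per column.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
frame_arr = df.to_numpy()
frame_shape = frame_arr.shape     # (3, 2)
first_row = frame_arr[0]          # [1 4]
```

Column labels and the index are lost in the conversion, so keep a reference to `df.columns` and `df.index` if you need to rebuild a pandas object later.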
Basic Array Operations
Element-wise Operations
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Addition
result = arr1 + arr2

# Multiplication
result = arr1 * arr2

# Division
result = arr1 / arr2
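As a worked example, the sketch below runs the three operations on the same two arrays and notes the values they produce. The names `added`, `multiplied`, and `divided` are illustrative.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Each operation pairs up elements by position, like two aligned Series.
added = arr1 + arr2        # [5 7 9]
multiplied = arr1 * arr2   # [ 4 10 18]
divided = arr1 / arr2      # [0.25 0.4  0.5 ]
```

Unlike pandas, NumPy matches elements purely by position; there is no index alignment.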
Simple Mathematical Functions
# Square root
sqrt_arr = np.sqrt(arr)

# Exponential
exp_arr = np.exp(arr)

# Absolute value
abs_arr = np.abs(arr)
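A short sketch of these functions on concrete values follows. The input array is made up for illustration; note that `np.sqrt` returns `nan` (with a warning) for negative inputs, which is why the example takes the absolute value first.

```python
import numpy as np

arr = np.array([-4.0, 0.0, 9.0])

# Absolute value removes the sign of each element.
abs_arr = np.abs(arr)                    # [4. 0. 9.]

# Square root is applied element by element.
sqrt_arr = np.sqrt(abs_arr)              # [2. 0. 3.]

# exp(0) is 1 and exp(1) is Euler's number e.
exp_arr = np.exp(np.array([0.0, 1.0]))
```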
Statistical Operations
Basic Statistics
# Mean
mean = np.mean(arr)

# Median
median = np.median(arr)

# Standard deviation
std = np.std(arr)
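One difference worth knowing when moving between the two libraries: `np.std` uses `ddof=0` (population standard deviation) by default, while pandas' `Series.std()` uses `ddof=1` (sample standard deviation). The sketch below, with made-up values, shows how to get matching results.

```python
import numpy as np
import pandas as pd

values = [2, 4, 4, 4, 5, 5, 7, 9]
arr = np.array(values)
s = pd.Series(values)

# NumPy default: divide by n (ddof=0).
population_std = np.std(arr)          # 2.0

# pandas default: divide by n - 1 (ddof=1).
sample_std_pd = s.std()

# Pass ddof=1 to make NumPy agree with pandas.
sample_std_np = np.std(arr, ddof=1)
```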
Min, Max, and Sum
# Minimum
min_val = np.min(arr)

# Maximum
max_val = np.max(arr)

# Sum
total = np.sum(arr)
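On a 2-D array (for example, one produced by `df.to_numpy()`), these functions reduce over every element unless you pass `axis`. The following is a small sketch with illustrative data; `axis=0` works down the rows (one result per column) and `axis=1` works across the columns (one result per row).

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

total = np.sum(data)               # 21, over all elements
col_sums = np.sum(data, axis=0)    # [5 7 9], one value per column
row_max = np.max(data, axis=1)     # [3 6], one value per row
row_min = np.min(data, axis=1)     # [1 4]
```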
Array Manipulation
Reshaping
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped = arr.reshape(2, 3)
Transposing
transposed = arr.T
Flattening
flattened = arr.flatten()
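The three manipulations are easiest to follow when chained on one array. This sketch transposes the 2-D `reshaped` array rather than the 1-D original, since `.T` on a 1-D array returns it unchanged.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# 6 elements arranged as 2 rows x 3 columns.
reshaped = arr.reshape(2, 3)        # [[1 2 3], [4 5 6]]

# Rows become columns: shape goes from (2, 3) to (3, 2).
transposed = reshaped.T             # [[1 4], [2 5], [3 6]]

# Back to 1-D, reading the transposed array row by row.
flattened = transposed.flatten()    # [1 4 2 5 3 6]
```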
Random Number Generation
Random Sampling
# Generate 5 random numbers between 0 and 1
random_uniform = np.random.rand(5)

# Generate 5 random integers between 1 and 10
random_integers = np.random.randint(1, 11, 5)
Setting Random Seed
np.random.seed(42)  # For reproducibility
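A quick way to see what the seed does is to set it twice and compare the draws. The sketch below also shows `np.random.default_rng`, the generator interface available in NumPy 1.17 and later, as an alternative; the variable names are illustrative.

```python
import numpy as np

# Same seed, same sequence of "random" numbers.
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
same_draws = np.array_equal(first, second)   # True

# Generator-based alternative: the seed lives in the rng object,
# not in global state.
rng = np.random.default_rng(42)
ints = rng.integers(1, 11, size=5)           # 5 integers from 1 to 10
```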
Working with Missing Data
Handling NaN Values
# Check for NaN
np.isnan(arr)
# Replace NaN with a value
np.nan_to_num(arr, nan=0.0)
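The sketch below applies both calls to a small made-up array. It also shows `np.nanmean`, because plain `np.mean` returns `nan` when any element is missing, whereas pandas skips missing values by default.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0])

# Boolean mask marking the missing entries.
mask = np.isnan(arr)                    # [False  True False]

# Returns a new array with NaN replaced; arr itself is unchanged.
filled = np.nan_to_num(arr, nan=0.0)    # [1. 0. 3.]

# np.mean propagates NaN; np.nanmean ignores it.
plain_mean = np.mean(arr)               # nan
nan_mean = np.nanmean(arr)              # 2.0
```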
Useful NumPy Functions for Pandas Users
unique() and value_counts()
# Get unique values
unique_values = np.unique(arr)

# Get value counts (similar to pandas value_counts())
values, counts = np.unique(arr, return_counts=True)
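One difference from `value_counts()`: `np.unique` returns the values in sorted order, not ordered by frequency. The sketch below uses made-up data and reorders the result with `np.argsort` to get the most frequent value first.

```python
import numpy as np

arr = np.array(['b', 'a', 'b', 'c', 'b', 'a'])

# Values come back sorted, with a matching array of counts.
values, counts = np.unique(arr, return_counts=True)
# values -> ['a' 'b' 'c'], counts -> [2 3 1]

# Reorder by descending count, as value_counts() would.
order = np.argsort(-counts)
by_frequency = values[order]    # ['b' 'a' 'c']
sorted_counts = counts[order]   # [3 2 1]
```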
where()
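# Element-wise choice: takes x where condition is True, otherwise y.
# Related to pandas' where, but returns an array and takes both replacement values.
result = np.where(condition, x, y)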
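Here is a minimal sketch of `np.where` next to `Series.where`, using made-up values. `np.where` picks between two alternatives, while `Series.where` keeps the original value where the condition holds and substitutes only the rest.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 5, 10, 15])
condition = arr > 5

# Keep values above 5, replace the others with 0.
result = np.where(condition, arr, 0)          # [ 0  0 10 15]

# x and y can also be scalars, which is handy for labelling.
labels = np.where(condition, 'high', 'low')   # ['low' 'low' 'high' 'high']

# pandas equivalent of the first call.
s = pd.Series(arr)
kept = s.where(s > 5, 0)
```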
concatenate()
# Concatenate arrays (similar to pd.concat())
concatenated = np.concatenate([arr1, arr2, arr3])
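As with `pd.concat`, the `axis` argument controls the direction of the join for 2-D arrays. The arrays below are illustrative; the shapes must match along every axis other than the one being joined.

```python
import numpy as np

arr1 = np.array([1, 2])
arr2 = np.array([3, 4])
arr3 = np.array([5, 6])

# 1-D arrays are joined end to end.
concatenated = np.concatenate([arr1, arr2, arr3])   # [1 2 3 4 5 6]

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6]])
c = np.array([[7],
              [8]])

# axis=0 appends rows (like stacking DataFrames vertically).
stacked_rows = np.concatenate([a, b], axis=0)       # shape (3, 2)

# axis=1 appends columns.
stacked_cols = np.concatenate([a, c], axis=1)       # [[1 2 7], [3 4 8]]
```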
When to Use NumPy with Pandas
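- Performance: On large numeric datasets, operating on the underlying NumPy array can be faster than the equivalent pandas call, because it skips index alignment and other per-call overhead.
- Memory efficiency: A NumPy array stores only the values, while a pandas object also carries an index and metadata, so arrays usually take somewhat less memory.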
- Specific mathematical operations: Some mathematical operations are more straightforward in NumPy.
- Interfacing with other libraries: Many scientific Python libraries use NumPy arrays.
Remember, while these NumPy operations are useful, many have direct equivalents in pandas that work on Series and DataFrames. Always consider whether you can perform the operation directly in pandas before converting to NumPy arrays.
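As one illustration of staying in pandas, many NumPy functions can be applied to a Series directly and return a Series with its index intact. The sketch below compares that with the manual round trip through `to_numpy()`; the data and names are made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=['a', 'b', 'c'])

# Applying the NumPy function to the Series keeps the index.
roots = np.sqrt(s)

# The manual route: convert, compute, then rebuild the Series.
via_array = pd.Series(np.sqrt(s.to_numpy()), index=s.index)
```

Both give the same values; the first form is shorter and cannot lose the index along the way.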