Importing Pandas
import pandas as pd
Reading Data
# Read CSV file
df = pd.read_csv('filename.csv')
# Read Excel file
df = pd.read_excel('filename.xlsx')
Exploring the DataFrame
# Display first few rows
df.head()
# Display basic information about the DataFrame
df.info()
# Get summary statistics
df.describe()
# Check for missing values
df.isnull().sum()
Handling Missing Data
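As a small illustration of counting and then handling missing values, here is a sketch on a made-up DataFrame. The column names `age` and `city` and all values are invented for the example.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# Count missing values per column
missing_counts = df.isnull().sum()

# Fill the numeric gap with the column mean, assigning the result back
df["age"] = df["age"].fillna(df["age"].mean())

# Drop any rows that still contain a missing value
df_cleaned = df.dropna()

print(missing_counts)
print(df_cleaned)
```

Filling before dropping keeps the row whose only gap was numeric; the row with a missing `city` is still removed.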
# Drop rows with any missing values
df_cleaned = df.dropna()
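# Fill missing values with a specific value
# (assign the result back; inplace=True on a selected column may not update df in recent pandas versions)
df['column_name'] = df['column_name'].fillna(0)
# Fill missing values with the mean of the column
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())
Removing Duplicates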
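To show how `subset` changes which rows count as duplicates, here is a minimal sketch on invented data. The `keep="last"` call is an extra option of `drop_duplicates` that goes beyond the basic calls in this section.

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({
    "name": ["Ann", "Ann", "Bob", "Bob"],
    "dept": ["HR", "HR", "IT", "Ops"],
    "score": [1, 1, 2, 3],
})

# Rows 0 and 1 match in every column, so only one of them survives
df_unique = df.drop_duplicates()

# Only "name" is compared here; the first row per name is kept by default
df_by_name = df.drop_duplicates(subset=["name"])

# keep="last" retains the final occurrence of each name instead
df_last = df.drop_duplicates(subset=["name"], keep="last")

print(df_unique)
print(df_by_name)
print(df_last)
```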
# Remove duplicate rows
df_unique = df.drop_duplicates()
# Remove duplicates based on specific columns
df_unique = df.drop_duplicates(subset=['column1', 'column2'])
Renaming Columns
# Rename a single column
df = df.rename(columns={'old_name': 'new_name'})
# Rename multiple columns
df = df.rename(columns={'old_name1': 'new_name1', 'old_name2': 'new_name2'})
Changing Data Types
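A short sketch of what `errors='coerce'` does in practice may help here: entries that cannot be parsed become NaN instead of raising an error. The `price` and `date` columns and their values are invented for the example.

```python
import pandas as pd

# Made-up data: one price is not a number
df = pd.DataFrame({
    "price": ["10.5", "7", "n/a"],
    "date": ["2024-01-15", "2024-02-01", "2024-03-10"],
})

# Unparseable entries such as "n/a" become NaN rather than raising
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Parse the date strings into a datetime column
df["date"] = pd.to_datetime(df["date"])

# Count how many values failed to convert
bad = int(df["price"].isna().sum())

print(df.dtypes)
print(bad)
```

Checking the NaN count after a coercing conversion is a cheap way to see how many values were silently discarded.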
# Convert a column to float
df['column_name'] = df['column_name'].astype(float)
# Convert column to numeric type
df['column_name'] = pd.to_numeric(df['column_name'], errors='coerce')
# Convert column to datetime
df['date_column'] = pd.to_datetime(df['date_column'])
# Convert a column to string
df['column_name'] = df['column_name'].astype('string')
Filtering Data
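Boolean filters combine with `&` (and), `|` (or) and `~` (not), and each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`. A minimal sketch on invented columns `a` and `b`:

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({"a": [1, 6, 8, 12], "b": [3, 20, 4, 9]})

# Rows where both conditions hold
both = df[(df["a"] > 5) & (df["b"] < 10)]

# Rows where at least one condition holds
either = df[(df["a"] > 5) | (df["b"] < 10)]

# Rows where the condition does not hold
not_big = df[~(df["a"] > 5)]

print(both)
print(either)
print(not_big)
```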
# Filter rows based on a condition
df_filtered = df[df['column_name'] > 5]
# Filter rows based on multiple conditions
df_filtered = df[(df['column1'] > 5) & (df['column2'] < 10)]
Handling Outliers
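To make the two outlier strategies in this section concrete, here is a sketch on invented numbers with one obvious outlier. It computes the Z-score by hand with pandas instead of scipy, using `ddof=0` on the assumption that this matches the default of `scipy.stats.zscore`.

```python
import pandas as pd

# Made-up values: nineteen readings near 10 and one extreme value
s = pd.Series([10, 11, 9, 10, 12, 10, 11, 9, 10, 11,
               10, 9, 11, 10, 12, 10, 9, 11, 10, 100])

# Z-score: distance from the mean in units of (population) standard deviation
z = (s - s.mean()) / s.std(ddof=0)

# Strategy 1: drop values whose absolute Z-score is 3 or more
kept = s[z.abs() < 3]

# Strategy 2: keep every row but cap values at the 5th and 95th percentiles
lower, upper = s.quantile(0.05), s.quantile(0.95)
capped = s.clip(lower, upper)

print(len(kept), kept.max())
print(len(capped), capped.max())
```

Dropping shrinks the dataset, while capping preserves the row count at the cost of altering the extreme values; which is preferable depends on the analysis.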
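# Remove outliers using Z-score
import numpy as np
from scipy import stats
# Only keep rows whose absolute Z-score is below 3
# (by default zscore returns NaN for every row if the column contains NaN, so handle missing values first)
df_no_outliers = df[np.abs(stats.zscore(df['column_name'])) < 3]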
# Cap outliers at a specific percentile
lower = df['column_name'].quantile(0.05)
upper = df['column_name'].quantile(0.95)
df['column_name'] = df['column_name'].clip(lower, upper)
Resources for More Information
Remember, these are just some of the most common operations for cleaning DataFrames. As you become more comfortable with pandas, you’ll discover many more powerful functions and methods to help you clean and manipulate your data effectively.