import pandas as pd
import numpy as np
Image: A cartoon panda is in charge of imports and exports at a busy seaport (MidJourney 5)
Overview
In this session, we will explore data import using the read_csv() function in pandas. Live coding is a great way to learn programming, as it lets you see the process of writing code in real time, including how to deal with unexpected issues and debug errors.
Objectives
- Understand the fundamentals of importing data with read_csv() in pandas.
- Use read_csv() options to handle different .csv file structures.
- Learn how to parse dates and handle missing data during import.
- Learn how to filter columns and handle large files.
- Develop the ability to troubleshoot and debug in a live setting.
Getting Started
To get the most out of this session, please follow these guidelines:
Prepare Your Environment:
- Log into our server and start JupyterLab.
- Open a new Jupyter notebook where you can write your own code as we go along.
- Make sure to name the notebook something informative so you can refer back to it later.
Step 1: Create a New Notebook
- Open Jupyter Lab or Jupyter Notebook.
- Create a new Python notebook.
- Rename your notebook to pd_read_csv.ipynb.
Step 2: Import Required Libraries
In the first cell of your notebook, import the necessary libraries:
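A minimal sketch of that first cell, assuming the conventional aliases `pd` and `np`. The version printout is an optional addition, not something the session requires:

```python
import pandas as pd
import numpy as np

# Optional: confirm which versions the server provides.
print("pandas", pd.__version__)
print("numpy", np.__version__)
```

If either import fails, the library is probably not installed in the active environment, which is worth sorting out before the session starts.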
Step 3: Set Up Data URLs
To ensure we're all working with the same data, copy and paste the following URLs into a new code cell and run the cell (SHIFT-ENTER):
# URLs for different CSV files we'll be using
url_basic = 'https://bit.ly/eds217-basic'
url_missing = 'https://bit.ly/eds217-missing'
url_dates = 'https://bit.ly/eds217-dates'
url_no_header = 'https://bit.ly/eds217-noheader'
url_tsv = 'https://bit.ly/eds217-tabs'
url_large = 'https://bit.ly/eds217-large'
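Once the URLs are defined, one way to confirm that a file loads is to wrap the call in a small helper that reports failures instead of stopping the notebook. The following is a minimal sketch, not part of the session materials: `try_read` is a hypothetical helper name, and the exceptions it catches are an assumption about what commonly goes wrong (network or file access problems, empty or malformed files).

```python
import pandas as pd

url_basic = 'https://bit.ly/eds217-basic'


def try_read(source, **kwargs):
    """Return a DataFrame, or None if the source can't be fetched or parsed.

    Hypothetical helper: any extra keyword arguments are passed
    straight through to pd.read_csv().
    """
    try:
        return pd.read_csv(source, **kwargs)
    except (OSError, pd.errors.EmptyDataError, pd.errors.ParserError) as err:
        # OSError covers missing files and most network failures.
        print(f"Could not read {source!r}: {err}")
        return None
```

In the notebook, `df = try_read(url_basic)` followed by `df.head()` would show the first rows if the download worked. A `None` result points to a connection or parsing problem to debug.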
Step 4: Prepare Markdown Cells for Notes
Create several markdown cells throughout your notebook to take notes during the session. Here are some suggested headers:
- Basic Usage and Column Selection
- Handling Missing Data
- Parsing Dates
- Working with Files Without Headers
- Working with Tab-Separated Values (TSV) Files
- Handling Large Files: Reading a Subset of Data
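As a preview of what might go under each header, here is a rough sketch pairing each topic with one `read_csv()` option. The exact options covered in the session may differ. To keep it runnable offline, the data are small made-up strings read through `io.StringIO`, not the course files, and all column names are hypothetical.

```python
import io
import pandas as pd

# Made-up stand-in data (NOT the course files).
basic_csv = "site,temp_c,depth_m\nA,12.5,3\nB,14.1,7\nC,13.3,5\n"
missing_csv = "site,temp_c\nA,12.5\nB,-999\nC,NA\n"
dates_csv = "date,temp_c\n2024-01-01,12.5\n2024-01-02,14.1\n"
no_header_csv = "A,12.5\nB,14.1\n"
tsv_text = "site\ttemp_c\nA\t12.5\nB\t14.1\n"

# Basic usage and column selection: keep only the named columns.
df_basic = pd.read_csv(io.StringIO(basic_csv), usecols=["site", "temp_c"])

# Handling missing data: treat the sentinel "-999" as missing,
# in addition to the defaults such as "NA".
df_missing = pd.read_csv(io.StringIO(missing_csv), na_values=["-999"])

# Parsing dates: convert the "date" column to datetimes on import.
df_dates = pd.read_csv(io.StringIO(dates_csv), parse_dates=["date"])

# Files without headers: say there is no header row and supply names.
df_no_header = pd.read_csv(
    io.StringIO(no_header_csv), header=None, names=["site", "temp_c"]
)

# Tab-separated values: change the delimiter.
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Large files: read only the first two data rows.
df_subset = pd.read_csv(io.StringIO(basic_csv), nrows=2)
```

During the session, the same keyword arguments would be applied to the course URLs in place of the `io.StringIO(...)` stand-ins.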
Step 5: Create Code Cells for Each Topic
Under each markdown header, create empty code cells where you'll write and execute code during the live session.
Step 6: Final Preparations
- Ensure you have a stable internet connection to access the CSV files.
- Have the Pandas documentation page open in a separate tab for quick reference: https://pandas.pydata.org/docs/
Ready to Go!
You're now set up and ready to follow along with the live coding session. Remember to actively code along and take notes in your markdown cells. Don't hesitate to ask questions during the session!
Happy coding!
Session Format
Introduction
- Brief discussion about the topic and its importance in data science.
Demonstration
- I will demonstrate code examples live. Follow along and write the code into your own Jupyter notebook.
Practice
- You will have the opportunity to try exercises on your own to apply what you've learned.
Q&A
- We will have a Q&A session at the end where you can ask specific questions about the code, concepts, or issues encountered during the session.
After the Session
Review your notes and try to replicate the exercises on your own.
Experiment with the code by modifying parameters or adding new features to deepen your understanding.
Check out our class read_csv() cheatsheet.
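As one example of experimenting with parameters, the sketch below swaps a single read for the `chunksize` option, which makes `read_csv()` return an iterator of smaller DataFrames. Whether the session covers `chunksize` is an assumption. The data are made up and generated inline so the example runs offline.

```python
import io
import pandas as pd

# Made-up data: ten rows with ids 0..9 and values 0, 10, ..., 90.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

# With chunksize set, read_csv yields DataFrames of up to 4 rows each,
# so the whole file never has to sit in memory at once.
chunk_totals = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_totals.append(int(chunk["value"].sum()))

total = sum(chunk_totals)
print(chunk_totals, total)
```

Changing `chunksize` and watching how the number of chunks changes is a quick way to see what the parameter does.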