Day 5: Tasks & Activities

Image: a cartoon panda eating a banana (generated with Midjourney 5)

Introduction

In this activity, you’ll explore the “Banana Index” dataset, which compares the environmental impact of various food products to that of a banana. The data were compiled by The Economist in 2023 and published on GitHub. This exercise will help you practice working with pandas DataFrames, manipulating data, and visualizing results while learning about the environmental impacts of food production.

Reference:

The Economist and Solstad, S., 2023. The Economist’s Banana index. First published in the article “A different way to measure the climate impact of food”, The Economist, April 11, 2023.

Setup

First, let’s import the necessary libraries and load the data:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data
url = "https://github.com/TheEconomist/banana-index-data/releases/download/1.0/bananaindex.csv"
df = pd.read_csv(url)

# Display the first few rows:
print(df.head())

          entity  year  emissions_kg  emissions_1000kcal  \
0            Ale  2022      0.488690            0.317338   
1  Almond butter  2022      0.387011            0.067265   
2    Almond milk  2022      0.655888            2.222230   
3        Almonds  2022      0.602368            0.105029   
4    Apple juice  2022      0.458378            0.955184   

   emissions_100g_protein  emissions_100g_fat  land_use_kg  land_use_1000kcal  \
0                0.878525            2.424209     0.811485           0.601152   
1                0.207599            0.079103     7.683045           1.296870   
2               13.595512            4.057470     1.370106           2.675063   
3                0.328335            0.119361     8.230927           1.423376   
4               29.152212           19.754980     0.660629           1.382839   

   Land use per 100 grams of protein  Land use per 100 grams of fat  \
0                           1.577687                       3.065766   
1                           3.608433                       1.495297   
2                          12.687839                       4.600530   
3                           4.261040                       1.610136   
4                          43.232158                      26.246743   

   Bananas index (kg)  Bananas index (1000 kcalories)  \
0            0.559558                        0.362340   
1            0.443134                        0.076804   
2            0.751002                        2.537364   
3            0.689721                        0.119923   
4            0.524851                        1.090638   

   Bananas index (100g protein)  Chart?  type       Banana values  Unnamed: 16  
0                      0.113771    True     1              Per KG     0.873350  
1                      0.026885    True     1  Per 1000 kcalories     0.875803  
2                      1.760651    True     1    Per 100g protein     7.721869  
3                      0.042520    True     1                 NaN          NaN  
4                      3.775280    True     1                 NaN          NaN

# Display the dataframe info (df.info() prints its report itself and returns None, so no print() is needed):
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 160 entries, 0 to 159
Data columns (total 17 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   entity                             160 non-null    object 
 1   year                               160 non-null    int64  
 2   emissions_kg                       160 non-null    float64
 3   emissions_1000kcal                 160 non-null    float64
 4   emissions_100g_protein             158 non-null    float64
 5   emissions_100g_fat                 160 non-null    float64
 6   land_use_kg                        160 non-null    float64
 7   land_use_1000kcal                  160 non-null    float64
 8   Land use per 100 grams of protein  158 non-null    float64
 9   Land use per 100 grams of fat      160 non-null    float64
 10  Bananas index (kg)                 160 non-null    float64
 11  Bananas index (1000 kcalories)     160 non-null    float64
 12  Bananas index (100g protein)       160 non-null    float64
 13  Chart?                             160 non-null    bool   
 14  type                               160 non-null    int64  
 15  Banana values                      3 non-null      object 
 16  Unnamed: 16                        3 non-null      float64
dtypes: bool(1), float64(12), int64(2), object(2)
memory usage: 20.3+ KB

Tasks

1. Data Preparation

Set the index of the DataFrame to be the ‘entity’ column.
Remove the ‘year’, ‘Banana values’, ‘type’, ‘Unnamed: 16’, and ‘Chart?’ columns.
Display the first few rows of the modified DataFrame.
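
As a minimal sketch of the two operations involved, the toy DataFrame below shows set_index and drop in action. The values are invented and the names toy, food, score, and notes are assumptions made for illustration; they are not part of the Banana Index data.

```python
import pandas as pd

# Toy data with invented values; not the Banana Index dataset
toy = pd.DataFrame({
    "food": ["apple", "bread", "cheese"],
    "year": [2022, 2022, 2022],
    "score": [0.5, 1.2, 3.4],
    "notes": [None, None, "aged"],
})

# Use the 'food' column as the row labels
toy = toy.set_index("food")

# Drop columns that are not needed
toy = toy.drop(columns=["year", "notes"])

print(toy.head())
```

Both set_index and drop return a new DataFrame by default, which is why each result is assigned back to toy.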

2. Exploring Banana Scores

For each of the pre-computed banana score columns (kg, calories, and protein), show the 10 highest-scoring food products.
Edit the function below so that it returns the top 10 scores for a given column:

def return_top_ten(df, column):
    """ Return the top 10 values of a column """
    pass

Return values from functions

The pass in our function is a placeholder statement that lets the function run without doing anything. Remove the pass statement and add a return statement that provides the required functionality. For example, if the function were supposed to add 2 to every value of a column, you’d delete the pass statement and add return df[column] + 2
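
The difference between a pass placeholder and a real return statement can be seen in a small sketch. The functions stub and add_two and the toy data are hypothetical, chosen only to illustrate the idea; they are not a solution to the task.

```python
import pandas as pd

def stub(df, column):
    """ A placeholder: runs without error but returns None """
    pass

def add_two(df, column):
    """ Return the column's values with 2 added to each (hypothetical example) """
    return df[column] + 2

toy = pd.DataFrame({"score": [1.0, 2.5, 4.0]})  # invented values

print(stub(toy, "score"))     # a function with no return statement gives back None
print(add_two(toy, "score"))  # a new Series; the original DataFrame is unchanged
```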

Use your function to display the results for each of the three Banana index columns.

3. Common High-Scoring Foods

Identify which foods, if any, appear in the top 10 for all three banana score lists (kg, calories, and protein).

Unpacking iterables using the * operator

Python sets allow you to quickly determine intersections: in_all_three = set.intersection(seta, setb, setc), or you can use the * operator to unpack a list of sets directly: in_all_three = set.intersection(*list_of_sets)
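
Both forms of the call can be tried on a few small sets. The food names below are invented for illustration and are not results from the dataset.

```python
# Invented example sets; in the activity these would come from the top-10 lists
set_a = {"beef", "lamb", "coffee", "cheese"}
set_b = {"beef", "coffee", "shrimp"}
set_c = {"coffee", "beef", "chocolate"}

# Passing each set explicitly
in_all_three = set.intersection(set_a, set_b, set_c)

# Unpacking a list of sets with *; equivalent to the call above
list_of_sets = [set_a, set_b, set_c]
in_all_three_unpacked = set.intersection(*list_of_sets)

print(sorted(in_all_three))
```

When the top-10 results are pandas objects indexed by food name, set(result.index) is one way to turn the labels into a set.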

4. Land Use Analysis

Create a new column named ‘Bananas index (land use 1000 kcal)’ that expresses each food’s land use per 1,000 kcal relative to that of a banana.

Tip

The land_use_1000kcal value for bananas is the entry in that column for the Bananas row.

Display the 10 foods with the highest land use score.
Compare this list with the previous top 10 lists. Are there any common foods?
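
The general pattern of expressing a column relative to one reference row can be sketched on toy data. The column water_use, the row labels, and the values are all invented for illustration; the sketch assumes the reference item is a row label in the index.

```python
import pandas as pd

# Toy data with invented values; the index holds the item names
toy = pd.DataFrame(
    {"water_use": [10.0, 40.0, 25.0]},
    index=["reference food", "food a", "food b"],
)

# Look up the single reference value by row label and column name
reference_value = toy.loc["reference food", "water_use"]

# Dividing a column by a scalar rescales every row at once
toy["relative water use"] = toy["water_use"] / reference_value

print(toy)
```

By construction, the reference row scores exactly 1 in the new column, which is a quick sanity check on the calculation.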

5. Cheese Analysis

Identify the type of cheese with the highest banana score per 1,000 kcal. How does it compare to other cheeses in the dataset?
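
One possible approach to selecting rows whose label contains a given word is a string match on the index. The sketch below uses invented bread data rather than cheeses, and it assumes the item names are in the index; how cheeses are actually named in the dataset should be checked by inspecting the index first.

```python
import pandas as pd

# Toy Series with invented values, indexed by item name
toy = pd.Series(
    [1.0, 3.5, 2.0, 0.4],
    index=["Rye bread", "Sourdough bread", "Rice", "Bread rolls"],
    name="score",
)

# case=False so that "Bread rolls" matches as well as "Rye bread"
is_bread = toy.index.str.contains("bread", case=False)

# Keep only the matching rows and rank them from highest to lowest
breads = toy[is_bread].sort_values(ascending=False)

print(breads)
print(breads.idxmax())  # label of the highest-scoring match
```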

Conclusion

Summarize your findings from this analysis. What insights have you gained about the environmental impact of different foods? Which aspects of pandas do you want to practice more?

End Activity Session (Day 5)