End Activity Session (Day 5)
Exploring the Banana Index: Environmental Impact of Food Production
Image: A cartoon panda eating a banana (credit: Midjourney5)
In this activity, you'll explore the "Banana Index" dataset, which compares the environmental impact of various food products with that of a banana. The Economist compiled these data in 2023 and published them on GitHub. This exercise will help you practice working with pandas DataFrames, manipulating data, and building visualizations while learning about the environmental impacts of food production.
The Economist and Solstad, S., 2023. The Economist's Banana index. First published in the article "A different way to measure the climate impact of food", The Economist, April 11, 2023.
First, let's import the necessary libraries and load the data:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the data from The Economist's GitHub release
url = "https://github.com/TheEconomist/banana-index-data/releases/download/1.0/bananaindex.csv"
df = pd.read_csv(url)
# Preview the first five rows and the column summary
print(df.head())
# info() prints its own summary; wrapping it in print() also echoes its return value, None
print(df.info())
entity year emissions_kg emissions_1000kcal \
0 Ale 2022 0.488690 0.317338
1 Almond butter 2022 0.387011 0.067265
2 Almond milk 2022 0.655888 2.222230
3 Almonds 2022 0.602368 0.105029
4 Apple juice 2022 0.458378 0.955184
emissions_100g_protein emissions_100g_fat land_use_kg land_use_1000kcal \
0 0.878525 2.424209 0.811485 0.601152
1 0.207599 0.079103 7.683045 1.296870
2 13.595512 4.057470 1.370106 2.675063
3 0.328335 0.119361 8.230927 1.423376
4 29.152212 19.754980 0.660629 1.382839
Land use per 100 grams of protein Land use per 100 grams of fat \
0 1.577687 3.065766
1 3.608433 1.495297
2 12.687839 4.600530
3 4.261040 1.610136
4 43.232158 26.246743
Bananas index (kg) Bananas index (1000 kcalories) \
0 0.559558 0.362340
1 0.443134 0.076804
2 0.751002 2.537364
3 0.689721 0.119923
4 0.524851 1.090638
Bananas index (100g protein) Chart? type Banana values Unnamed: 16
0 0.113771 True 1 Per KG 0.873350
1 0.026885 True 1 Per 1000 kcalories 0.875803
2 1.760651 True 1 Per 100g protein 7.721869
3 0.042520 True 1 NaN NaN
4 3.775280 True 1 NaN NaN
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 160 entries, 0 to 159
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 entity 160 non-null object
1 year 160 non-null int64
2 emissions_kg 160 non-null float64
3 emissions_1000kcal 160 non-null float64
4 emissions_100g_protein 158 non-null float64
5 emissions_100g_fat 160 non-null float64
6 land_use_kg 160 non-null float64
7 land_use_1000kcal 160 non-null float64
8 Land use per 100 grams of protein 158 non-null float64
9 Land use per 100 grams of fat 160 non-null float64
10 Bananas index (kg) 160 non-null float64
11 Bananas index (1000 kcalories) 160 non-null float64
12 Bananas index (100g protein) 160 non-null float64
13 Chart? 160 non-null bool
14 type 160 non-null int64
15 Banana values 3 non-null object
16 Unnamed: 16 3 non-null float64
dtypes: bool(1), float64(12), int64(2), object(2)
memory usage: 20.3+ KB
None
Set the index of the DataFrame to be the "entity" column.
Remove the "year", "Banana values", "type", "Unnamed: 16", and "Chart?" columns.
Display the first few rows of the modified DataFrame.
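One possible way to carry out these three steps is sketched below. It assumes the raw df loaded above; the helper name tidy_banana_index is hypothetical, and the column names are copied from the df.info() listing.

```python
import pandas as pd

# Bookkeeping columns to remove (names copied from the df.info() listing above)
COLUMNS_TO_DROP = ["year", "Banana values", "type", "Unnamed: 16", "Chart?"]


def tidy_banana_index(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame indexed by food name, without bookkeeping columns."""
    # set_index and drop both return new objects, so `raw` is left unchanged
    return raw.set_index("entity").drop(columns=COLUMNS_TO_DROP)
```

With df loaded as above, df = tidy_banana_index(df) followed by print(df.head()) shows the first rows, now labelled by food name. Returning a new DataFrame rather than modifying in place keeps the original data available if you need to start over.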
For each of the pre-computed banana score columns (kg, calories, and protein), show the 10 highest-scoring food products.
Edit the function below so that it returns the top 10 scores for a given column:
Return values from functions
The pass statement in our function is a temporary placeholder that allows the function to run without doing anything. You need to remove the pass statement and add a return statement that provides the necessary functionality. For example, if the function were supposed to add 2 to every value of a column, you'd delete the pass statement and add return df[column] + 2
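The stub's actual name and parameters are not reproduced here, so the sketch below uses a hypothetical top_10(df, column) and assumes df has already been indexed by entity as in the previous step. It returns the column's 10 largest values with Series.nlargest, and a second hypothetical helper loops over the three pre-computed banana score columns.

```python
import pandas as pd

# The three pre-computed banana score columns (names from df.info() above)
BANANA_COLUMNS = (
    "Bananas index (kg)",
    "Bananas index (1000 kcalories)",
    "Bananas index (100g protein)",
)


def top_10(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the 10 largest values of `column`, highest first."""
    return df[column].nlargest(10)


def show_top_10_lists(df: pd.DataFrame) -> dict:
    """Print and collect the top-10 list for each banana score column."""
    results = {}
    for column in BANANA_COLUMNS:
        results[column] = top_10(df, column)
        print(f"Top 10 by {column}:")
        print(results[column], end="\n\n")
    return results
```

df[column].sort_values(ascending=False).head(10) would give the same list for these columns, which have no missing values according to df.info(); nlargest simply states the intent more directly.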
Identify which foods, if any, appear in the top 10 for all three banana score lists (kg, calories, and protein).
The * operator
Python sets allow you to quickly determine intersections: in_all_three = set.intersection(seta, setb, setc), or you can use the * operator to unpack a list of sets directly, as long as the list is not empty: in_all_three = set.intersection(*list_of_sets)
The land_use_1000kcal value for bananas is the entry in this column for the Bananas row.
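The pre-computed scores appear to divide each food's value by the banana's value in the same column. For example, Ale's emissions_kg of 0.488690 divided by 0.873350 (the Per KG entry in the Banana values column) is about 0.5596, matching its Bananas index (kg) of 0.559558. Assuming the same recipe applies to land use, the sketch below divides land_use_1000kcal by the Bananas row's value. It assumes df is indexed by entity and contains a row labelled "Bananas"; the function and new column names are hypothetical.

```python
import pandas as pd

# Hypothetical name for the new score column
LAND_USE_SCORE = "Bananas index (land use 1000 kcal)"


def add_land_use_score(df: pd.DataFrame, new_column: str = LAND_USE_SCORE) -> pd.DataFrame:
    """Return a copy of df with land use per 1000 kcal expressed in bananas."""
    # Reference value: the land_use_1000kcal entry in the Bananas row
    banana_land_use = df.loc["Bananas", "land_use_1000kcal"]
    scored = df.copy()
    # A score of 2.0 means twice the land use of bananas per 1000 kcal
    scored[new_column] = scored["land_use_1000kcal"] / banana_land_use
    return scored
```

Bananas itself should score exactly 1.0, which makes a quick sanity check.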
Display the 10 foods with the highest land use score.
Compare this list with the previous top 10 lists. Are there any common foods?
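If the land-use score divides land_use_1000kcal by a single positive number (the banana value), ranking by either column yields the same top 10, so this sketch ranks by land_use_1000kcal directly and needs no extra column. It prints that list and reports which foods it shares with each pre-computed top 10. The function name is hypothetical, and df is assumed to be indexed by entity.

```python
import pandas as pd

# The three pre-computed banana score columns (names from df.info() above)
BANANA_COLUMNS = (
    "Bananas index (kg)",
    "Bananas index (1000 kcalories)",
    "Bananas index (100g protein)",
)


def compare_with_land_use(df: pd.DataFrame, land_column: str = "land_use_1000kcal") -> dict:
    """Print the land-use top 10 and return its overlap with each banana top 10."""
    land_top = df[land_column].nlargest(10)
    print(f"Top 10 by {land_column}:")
    print(land_top)
    land_foods = set(land_top.index)
    # Set intersection (&) keeps only foods present in both top-10 lists
    return {
        column: land_foods & set(df[column].nlargest(10).index)
        for column in BANANA_COLUMNS
    }
```

Passing the name of a land-use score column you created yourself as land_column should give the same ranking.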
Identify the type of cheese with the highest banana score per 1,000 kcal. How does it compare to other cheeses in the dataset?
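A sketch for the cheese question, assuming every cheese entry contains the word "cheese" in its name. That assumption is worth checking against df.index, since a variety listed without the word would be missed. It also assumes df is indexed by entity; the function name is hypothetical.

```python
import pandas as pd

KCAL_SCORE = "Bananas index (1000 kcalories)"


def cheeses_by_kcal_score(df: pd.DataFrame) -> pd.Series:
    """Return cheese scores per 1000 kcal, highest first."""
    # Assumption: cheese entries have "cheese" somewhere in their name
    is_cheese = df.index.str.contains("cheese", case=False, na=False)
    return df.loc[is_cheese, KCAL_SCORE].sort_values(ascending=False)
```

The first label of the result is the highest-scoring cheese; dividing the result by its first value shows how the other cheeses compare in relative terms.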
Summarize your findings from this analysis. What insights have you gained about the environmental impact of different foods? What aspects of pandas do you want to practice more?