In this exercise, you'll analyze Eurovision Song Contest data using pandas. You'll practice various data manipulation techniques and explore trends in the contest's history.
Setup
First, import the necessary libraries and load the dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
url = "https://github.com/Spijkervet/eurovision-dataset/releases/download/2020.0/contestants.csv"
eurovision_df = pd.read_csv(url)
Task 1: Data Exploration and Cleaning
Display the first few rows of the dataset.
Check the data types of each column.
Identify and handle any missing values.
Convert the 'year' column to datetime type.
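The Task 1 steps can be sketched as follows. This is a minimal sketch that runs on a tiny hand-made DataFrame, so it works offline. The column names `year`, `to_country`, `points_final` and `points_sf` are assumptions about the CSV's schema (check `eurovision_df.columns`), and all values are invented. The code shows one possible way to handle missing values, not the only correct one. One pitfall is worth knowing: `pd.to_datetime` applied directly to integers treats them as nanosecond timestamps. The sketch therefore converts the years to strings and parses them with `format="%Y"`.

```python
import pandas as pd

# Hypothetical sample standing in for eurovision_df; all values are invented.
sample_df = pd.DataFrame({
    "year": [1985, 1995, 2005, 2015],
    "to_country": ["Norway", "Ireland", "Greece", "Sweden"],
    "points_final": [123.0, 148.0, None, 365.0],
    "points_sf": [None, None, 230.0, 217.0],
})


def explore_and_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Print a quick overview of df and return a cleaned copy."""
    print(df.head())        # first few rows
    print(df.dtypes)        # data type of each column
    print(df.isna().sum())  # number of missing values per column

    cleaned = df.copy()
    # One possible choice: keep NaN in the points columns, because pandas
    # skips NaN in mean()/max() by default, and drop only rows whose
    # country is unknown.
    cleaned = cleaned.dropna(subset=["to_country"])

    # Integers passed straight to pd.to_datetime are read as nanoseconds,
    # so convert to strings and parse them explicitly as four-digit years.
    cleaned["year"] = pd.to_datetime(cleaned["year"].astype(str), format="%Y")
    return cleaned


clean_df = explore_and_clean(sample_df)
print(clean_df["year"].dt.year.tolist())
```

With the real data, you would pass `eurovision_df` instead of `sample_df`.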
Task 2: Filtering and Transformation
Create a new dataframe containing only data from 1990 onwards.
Important
Use .copy() to make sure you create a new dataframe and not just a view.
Calculate the difference between final points and semi-final points for each entry, and make a histogram of these values using the built-in DataFrame .hist() method.
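Here is a minimal sketch of the filtering and difference steps. It assumes `year` has already been converted to datetime and that the points columns are named `points_final` and `points_sf` (an assumption about the schema). The sample values are invented.

```python
import pandas as pd

# Hypothetical, already-cleaned sample: 'year' is datetime, points are invented.
sample_df = pd.DataFrame({
    "year": pd.to_datetime(["1985", "1995", "2005", "2015"], format="%Y"),
    "to_country": ["Norway", "Ireland", "Greece", "Sweden"],
    "points_final": [123.0, 148.0, None, 365.0],
    "points_sf": [None, None, 230.0, 217.0],
})

# Boolean mask on the year component. .copy() gives an independent DataFrame,
# so adding a column below does not trigger SettingWithCopyWarning.
recent_df = sample_df[sample_df["year"].dt.year >= 1990].copy()

# Subtraction is element-wise; the result is NaN wherever either value is missing.
recent_df["points_diff"] = recent_df["points_final"] - recent_df["points_sf"]
print(recent_df[["to_country", "points_diff"]])
```

The histogram then takes one more call, `recent_df.hist(column="points_diff", bins=20)`, which draws the plot with matplotlib.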
Task 3: Sorting and Aggregation
Find the top 10 countries with the most Eurovision appearances (use the entire dataset for this calculation).
Calculate the average final points for each country across all years. Make a simple bar plot of these data.
Note
Use value_counts() for counting appearances and groupby() for calculating averages.
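A minimal sketch of Task 3 follows, using the two methods from the note. The column names `to_country` and `points_final` are assumptions about the dataset, and the sample rows are invented.

```python
import pandas as pd

# Hypothetical sample; the real analysis would use the full eurovision_df.
sample_df = pd.DataFrame({
    "to_country": ["Sweden", "Norway", "Sweden", "Ireland", "Norway", "Sweden"],
    "points_final": [365.0, 123.0, 372.0, 148.0, None, 112.0],
})

# value_counts() counts rows per country and sorts them in descending order.
top_countries = sample_df["to_country"].value_counts().head(10)

# groupby().mean() averages per country; NaN values are skipped by default.
avg_points = sample_df.groupby("to_country")["points_final"].mean()
print(top_countries)
print(avg_points.sort_values(ascending=False))
```

For the bar plot, `avg_points.sort_values(ascending=False).plot(kind="bar")` is one simple option. Sorting first makes the ranking easier to read.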
Task 4: Grouping and Analysis
Determine the country with the highest average final points for each decade.
Hint: Grouping Years in Pandas
When working with time series data, it's often useful to group years into larger intervals like decades, 5-year periods, etc. Here's a general approach using pandas:
For decades (10-year intervals):
df['decade'] = df['year'].dt.year // 10 * 10
For any N-year interval:
N = 5  # Change this to your desired interval (e.g., 2, 5, 10, 20)
df['year_group'] = df['year'].dt.year // N * N
Remember:
- // is integer division (it rounds down)
- Multiplying by the interval after the division gives the start year of each group (e.g., 1994 // 10 * 10 = 1990)
These methods create a new column that you can use with groupby() for aggregations across your chosen time intervals.
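Putting the hint to work for Task 4 could look like the sketch below. It assumes a datetime `year` column plus `to_country` and `points_final` columns (assumed names), with invented values. The approach averages per (decade, country) pair, then uses `idxmax()` to pick the best row within each decade. When two countries tie, `idxmax()` returns the first one.

```python
import pandas as pd

# Hypothetical sample with a datetime 'year' column; points are invented.
sample_df = pd.DataFrame({
    "year": pd.to_datetime(["1994", "1996", "1998", "2003", "2005", "2007"], format="%Y"),
    "to_country": ["Ireland", "Ireland", "Norway", "Norway", "Greece", "Greece"],
    "points_final": [226.0, 162.0, 79.0, 123.0, 230.0, 139.0],
})

# Integer division then multiplication maps e.g. 1994 -> 1990 and 2007 -> 2000.
sample_df["decade"] = sample_df["year"].dt.year // 10 * 10

# Average final points for every (decade, country) pair.
avg = (
    sample_df.groupby(["decade", "to_country"])["points_final"]
    .mean()
    .reset_index()
)

# idxmax() returns the row label of the highest average within each decade.
best_per_decade = avg.loc[avg.groupby("decade")["points_final"].idxmax()]
print(best_per_decade)
```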
Task 5: Joining Data
Read in a new dataframe that contains population data stored at this url:
population_url = 'https://bit.ly/euro_pop'
Join this data with the Eurovision dataframe.
Warning
Ensure that country names match exactly between the two dataframes before joining.
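One way to check name matching before joining is to compare the two sets of names, as in the sketch below. The population file's column names (`country`, `population`) are assumptions, so check `population_df.columns` after reading the real file. The population numbers are placeholders, and the spelling mismatch and its mapping are illustrative only.

```python
import pandas as pd

# Hypothetical frames; the population column names and values are placeholders.
eurovision_sample = pd.DataFrame({"to_country": ["Sweden", "North Macedonia", "Ireland"]})
population_sample = pd.DataFrame({
    "country": ["Sweden", "Macedonia", "Ireland "],
    "population": [1.0e6, 1.0e6, 1.0e6],
})

# Strip stray whitespace, a common cause of silent mismatches.
population_sample["country"] = population_sample["country"].str.strip()

# Names present in one frame but not the other will not match on a join.
euro_names = set(eurovision_sample["to_country"])
pop_names = set(population_sample["country"])
print("Only in Eurovision data:", sorted(euro_names - pop_names))
print("Only in population data:", sorted(pop_names - euro_names))

# Map the remaining differences onto one spelling (this mapping is illustrative).
population_sample["country"] = population_sample["country"].replace(
    {"Macedonia": "North Macedonia"}
)
```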
Calculate total entries per capita by country.
Substeps:
3a. Create a new dataframe containing the counts of entries for each country (use value_counts)
3b. Merge the dataframe of entry counts per country with the population dataframe.
3c. Calculate entries per million population (using entries per million to make the numbers easier to work with)
3d. Sort the results by entries per million, in descending order
3e. Print the top 10 values
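The substeps 3a to 3e can be sketched as follows. The Eurovision column `to_country` is an assumption about the schema. The population frame's `country`/`population` columns and all numbers are placeholders. An inner join is one choice here: it silently drops countries whose names do not match, which is why the name check above matters.

```python
import pandas as pd

# Hypothetical inputs: column names and values are placeholders.
eurovision_sample = pd.DataFrame(
    {"to_country": ["Sweden", "Sweden", "Ireland", "Malta", "Malta", "Malta"]}
)
population_sample = pd.DataFrame({
    "country": ["Sweden", "Ireland", "Malta"],
    "population": [10_000_000, 5_000_000, 500_000],
})

# 3a. Count entries per country and turn the Series into a DataFrame.
entry_counts = (
    eurovision_sample["to_country"]
    .value_counts()
    .rename_axis("country")
    .reset_index(name="entries")
)

# 3b. Merge on the shared country name; an inner join keeps matched names only.
merged = entry_counts.merge(population_sample, on="country", how="inner")

# 3c. Entries per million inhabitants.
merged["entries_per_million"] = merged["entries"] / (merged["population"] / 1_000_000)

# 3d./3e. Sort in descending order and keep the top 10.
top10 = merged.sort_values("entries_per_million", ascending=False).head(10)
print(top10)
```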
Task 6: Time Series Analysis
Plot the trend of maximum final points awarded over the years.
Identify any significant changes in the scoring system based on this trend.
(This step only requires visual interpretation of the plot, but you could also search online, e.g. using Google, to check whether actual rule changes underlie the patterns you observe.)
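A minimal sketch of the data preparation for Task 6 follows. It assumes a datetime `year` column and a `points_final` column (assumed names); the values are invented. Grouping by the integer year and taking `max()` gives one value per contest year, ready to plot.

```python
import pandas as pd

# Hypothetical sample; the real data has one row per entry per year. Values are invented.
sample_df = pd.DataFrame({
    "year": pd.to_datetime(["2014", "2014", "2015", "2016", "2016"], format="%Y"),
    "points_final": [150.0, 120.0, 180.0, 400.0, 350.0],
})

# Highest final score in each contest year, indexed by integer year.
max_points = sample_df.groupby(sample_df["year"].dt.year)["points_final"].max()
print(max_points)
```

Calling `max_points.plot(marker="o")` then draws the trend as a line, which makes sudden jumps between years easy to spot.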
Task 7: Choose your own analysis!
Come up with your own analysis of the Eurovision data that reveals some pattern across the data or through time. Feel free to discuss your ideas with others; often this leads to new ideas or refinement of ones you are already working on.
Reflection
Now that you've completed the Eurovision data analysis exercise, it's time to reflect on your experience. Add a new markdown cell to your notebook and answer the following questions:
Which tasks did you feel most comfortable with? Why do you think these were easier for you?
Which tasks did you find most challenging? What made these tasks difficult?
Are there any pandas commands or concepts that you'd like to explore further? List a few and briefly explain why you're interested in them.
How do you think the skills you practiced in this exercise could be applied to other datasets or real-world problems?
What was the most interesting insight you gained about the Eurovision contest from this analysis?
Note
Remember, reflection is a crucial part of the learning process. It helps you identify areas for improvement and reinforces what you've learned.
Remember to document your code, explain your reasoning, and interpret the results of your analysis throughout the exercise.