Authoring with Quarto: qmd python template workbook
tools
Published
August 4, 2022
Objective
This article serves as a template for Python only in a quarto workbook. It uses a .qmd file and a conda installation that has been configured previously (see article from the 2022-04-09). Differently from the Python + R template, this workbook doesn’t use reticulate to activate the virtual environment. To run the code in the chunks indicate the virtual environment for the workbook (in visual studio code use Ctrl+Shift+P and then Python: Select Interpreter and put the path - here is /home/joao/JR-IA/renv/python/condaenvs/renv-python). Important to note not to install IPkernel as it is often recommended by the pop ups in vsc.
Introduction
In this article we’re exploring another open data source, this time the World Bank. The topic selected for testing the data collection and visualization is the share of renewable energy for selected countries in the last 25 years. The article also confirms the compatibility of the python programming language in this website which is build with the R programming language.
Setup
%conda activate /home/joao/JR-IA/renv/python/condaenvs/renv-python%conda env list
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run
$ conda init <SHELL_NAME>
Currently supported shells are:
- bash
- fish
- tcsh
- xonsh
- zsh
- powershell
See 'conda init --help' for more information and options.
IMPORTANT: You may need to close and restart your shell after running 'conda init'.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Install wbdata
The World Bank website suggests several methods to obtain the data from its databases. After some reading we’ve opted here for the python module wbdata. The possibilities to install the module are described in wbdata documentation. In our website the python configuration has been done with conda which is not available. We’ve then opted for a direct download from github and saved the library folder directly in our project folder. This has been enough to make it available. The chunk below confirms that python sees the module.
We can then import it in a python chunk directly as below:
import wbdata
Load modules
Besides loading wbdata we also need to load the common python libraries for data wrangling and plotting:
import pandas as pdimport matplotlib.pyplot as pltimport seaborn as snsimport datetimeimport json
Now we’re ready for data loading and exploration.
Renewable electricity output
Search country by code
As in similar articles we’ve opted to call the API as least as possible and to store the data objects locally. This avoids loading the entity servers and ultimately to see one’s access blocked. Our starting point has been the selection of countries for our analysis. The wbdata API provides a method to get countries called get_country. In the next chunk we show how we have queried once the data base for all the countries and save it as a json object.
renewables_country_list=wbdata.get_country()withopen('data/renewables_country_list.json', 'w') as json_file: json.dump(renewables_country_list, json_file)
Now we can load anytime we like from our local folder:
withopen('data/renewables_country_list.json', 'r') as json_file: country_list = json.load(json_file)
The returned object is a list of dictionaries. We can check the entry for Switzerland with:
This was a possible approach but it requires knowing the countries codes upfront. In case they’re not know there’s another country search method that accepts various arguments the search_countries method shown below.
Search country by name
We’re using here the keyword argument to provide country names. We start by creating a list with the country names that we feed in the loop. Using a similar sub setting approach with the index [0] and then the [‘id’] we can extract the code and assign it to a list.
target_countries = ['United States', 'Portugal', 'Switzerland']country_codes = []for country in target_countries: country_entry = wbdata.search_countries(country) country_code = country_entry[0]['id'] country_codes.append(country_code)print(country_codes)
['USA', 'PRT', 'CHE']
This country_codes list is the one that we will use later in our query for renewable energy figures.
Search indicators
Now that our countries are selected we’re going to identify an indicator of interest on which to make the analysis. As before we do this only once and store the result to a json object.
indicators_list = wbdata.search_indicators('renewable energy')withopen('data/renewables_indicators_list.json', 'w') as json_file: json.dump(indicators_list, json_file)
Now for the purposes of analysis and investigation we reload the indicators object as often as we need:
withopen('data/renewables_indicators_list.json', 'r') as json_file: indicators_list = json.load(json_file)
For the sake of testing we’re exploring here a different way of looking into the json file. Less intuitive but easier than the for loop is to use the pandas method json_normalize:
There are various methods to get the data from the world bank database. Again there is a json approach and one directly with pandas. As we have only one indicator and three countries we’re opting for a simpler approach with pandas. This should be sufficient as we don’t expect deeply nested data which is one of the main benefits of the json format.
We query the database once and store the result in a local csv file.
renewables=wbdata.get_dataframe(indicators, country = country_codes, convert_date =True)renewables.to_csv("data/renewables.csv")
We’ve opted to use seaborn. In particular when we want to use the hue this is a good approach. We assign the plot to a variable to allow to call directly some matplotlib methods for axis configuration:
sns.set_theme(style="whitegrid")g=sns.relplot(x='date', y='renewable_energy_perc_total', hue='country', data=renewables, kind='line')g.set(xlabel ='Time', ylabel ='Renewable energy consumption \n(% of total final energy consumption)')
<seaborn.axisgrid.FacetGrid at 0x7f99caddae90>
g.fig.suptitle('Renewable energy in selected countries')plt.tight_layout()plt.show()