In this article we’re exploring another open data source, this time the World Bank. The topic selected for testing the data collection and visualization is the share of renewable energy for selected countries in the last 25 years. The article also confirms the compatibility of the python programming language in this website which is build with the R programming language.
python: /home/joao/JR-IA/renv/python/condaenvs/renv-python/bin/python
libpython: /home/joao/JR-IA/renv/python/condaenvs/renv-python/lib/libpython3.7m.so
pythonhome: /home/joao/JR-IA/renv/python/condaenvs/renv-python:/home/joao/JR-IA/renv/python/condaenvs/renv-python
version: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
numpy: /home/joao/JR-IA/renv/python/condaenvs/renv-python/lib/python3.7/site-packages/numpy
numpy_version: 1.21.6
NOTE: Python version was forced by RETICULATE_PYTHON
Install wbdata
The World Bank website suggests several methods to obtain the data from its databases. After some reading we’ve opted here for the python module wbdata. The possibilities to install the module are described in wbdata documentation. In our website the python configuration has been done with conda which is not available. We’ve then opted for a direct download from github and saved the library folder directly in our project folder. This has been enough to make it available. The chunk below confirms that python sees the module.
py_module_available("wbdata")
[1] TRUE
We can then import it in a python chunk directly as below:
import wbdata
Dependencies install
When trying to run the module in our setup, several dependencies were missing. We keep the code here of the installation for future reference.
Besides loading wbdata we also need to load the common python libraries for data wrangling and plotting:
import pandas as pdimport matplotlib.pyplot as pltimport seaborn as snsimport datetimeimport json
Now we’re ready for data loading and exploration.
Renewable electricity output
Search country by code
As in similar articles we’ve opted to call the API as least as possible and to store the data objects locally. This avoids loading the entity servers and ultimately to see one’s access blocked. Our starting point has been the selection of countries for our analysis. The wbdata API provides a method to get countries called get_country. In the next chunk we show how we have queried once the data base for all the countries and save it as a json object.
renewables_country_list=wbdata.get_country()withopen('posts/20220803/data/renewables_country_list.json', 'w') as json_file: json.dump(renewables_country_list, json_file)
Now we can load anytime we like from our local folder:
withopen('data/renewables_country_list.json', 'r') as json_file: country_list = json.load(json_file)
The returned object is a list of dictionaries. We can check the entry for Switzerland with:
This was a possible approach but it requires knowing the countries codes upfront. In case they’re not know there’s another country search method that accepts various arguments the search_countries method shown below.
Search country by name
We’re using here the keyword argument to provide country names. We start by creating a list with the country names that we feed in the loop. Using a similar sub setting approach with the index [0] and then the [‘id’] we can extract the code and assign it to a list.
target_countries = ['United States', 'Portugal', 'Switzerland']country_codes = []for country in target_countries: country_entry = wbdata.search_countries(country) country_code = country_entry[0]['id'] country_codes.append(country_code)print(country_codes)
['USA', 'PRT', 'CHE']
This country_codes list is the one that we will use later in our query for renewable energy figures.
Search indicators
Now that our countries are selected we’re going to identify an indicator of interest on which to make the analysis. As before we do this only once and store the result to a json object.
indicators_list = wbdata.search_indicators('renewable energy')withopen('data/renewables_indicators_list.json', 'w') as json_file: json.dump(indicators_list, json_file)
Now for the purposes of analysis and investigation we reload the indicators object as often as we need:
withopen('data/renewables_indicators_list.json', 'r') as json_file: indicators_list = json.load(json_file)
For the sake of testing we’re exploring here a different way of looking into the json file. Less intuitive but easier than the for loop is to use the pandas method json_normalize:
id name
0 2.1_SHARE.TOTAL.RE.IN.TFEC Renewable energy consumption(% in TFEC)
1 3.1_RE.CONSUMPTION Renewable energy consumption (TJ)
2 4.1.2_REN.ELECTRICITY.OUTPUT Renewable energy electricity output (GWh)
3 EG.FEC.RNEW.ZS Renewable energy consumption (% of total final...
Using this information we prepare the variable for the final query. In this case it is sufficient to do it manually.
There are various methods to get the data from the world bank database. Again there is a json approach and one directly with pandas. As we have only one indicator and three countries we’re opting for a simpler approach with pandas. This should be sufficient as we don’t expect deeply nested data which is one of the main benefits of the json format.
We query the database once and store the result in a local csv file.
renewables=wbdata.get_dataframe(indicators, country = country_codes, convert_date =True)renewables.to_csv("data/renewables.csv")
We’ve opted to use seaborn. In particular when we want to use the hue this is a good approach. We assign the plot to a variable to allow to call directly some matplotlib methods for axis configuration:
sns.set_theme(style="whitegrid")g=sns.relplot(x='date', y='renewable_energy_perc_total', hue='country', data=renewables, kind='line')g.set(xlabel ='Time', ylabel ='Renewable energy consumption \n(% of total final energy consumption)')
g.fig.suptitle('Renewable energy in selected countries')plt.tight_layout()plt.show()