Principles of Reproducible Analytical Pipelines for Official Statistics

Image
e-learning

e-Learning

06 to 31 May 2024

Online

SIAP

e-Learning

Download ICS

Print this page

close

Share

You can share the page by clicking the icon.

Overview

This e-learning course aims to build capacity in national statistical systems for the development and implementation of Reproducible Analytical Pipelines (RAPs) for Official Statistics.

What is a Reproducible Analytical Pipeline?

Simply put, reproducible analytical pipelines (RAPs) are automated statistical processes (data processing and analysis) that codify to the greatest extent possible the production of official statistics. Common tools that are used to develop RAP include software such as R or Python, and version control management tools such as Git.

Reproducibility is at the heart of the approach. It implies that the outputs can be generated again with any new or revised input datasets using the RAP developed. This also implies drafting documents explaining the RAP that make it possible to build institutional knowledge and use the RAP in the future by new staff.  

Why are Reproducible Analytical Pipelines  important for Official Statistics?

All national statistical systems are engaged in the regular, high frequency production of many official statistics. For example, most countries compile monthly consumer price index (CPI). The input data for the compilation of CPI is generally the same from month to month. By developing and implementing an RAP for the compilation of CPI, countries can improve the timeliness and quality of the CPI since automation reduces the time required to clean and analyze the data; it also reduces the chance of errors that could occur when relying on non-automated processes. 

Furthermore, the Sustainable Development Goals (SDGs) require that countries use more diverse data sources in the compilation of indicators. The COVID-19 crisis has shown that automated tools can facilitate data analysis and reporting when these sources are updated. These tools, including software such as R and sharing platforms such as GitHub, allow statisticians  to streamline data cleaning, compilation, and analysis. 

Implementing these new approaches to data processing and analysis and utilizing these modern tools can aid in the timelier production of statistics and lower the time staff need to produce statistics. This approach is also more transparent, easier to review and to share among staff and thus less prone to errors in the data and/or statistical analysis. Furthermore, implementing good practices based on RAP can contribute to building institutional knowledge by ensuring that work programmes can be more seamlessly transferred among different staff and over time.

Please click here to register for this course. Further information about the course will be provided to registered participants at the end of April.

Documents

Concept Note PDF

For more information, Please contact

escap-siapun.org