
Upcoming e-Learning Courses 2024
20 January to 21 February 2025
Using administrative data to produce official statistics
National statistical systems are increasingly using administrative data to compile official statistics. Such data can help meet the growing demand for new, highly disaggregated statistics and indicators. Because administrative data is not collected for the primary purpose of compiling official statistics, statisticians need to ensure that the data meets certain criteria before using it to produce official statistics. This course provides an overview of administrative data and discusses data quality issues and the institutional mechanisms that ensure administrative data can be used in the production of official statistics. The course builds on content developed for in-person training courses conducted by UNSD, to which members of the Collaborative on Administrative Data have provided valuable input.
17 February to 14 March 2025
Principles of Reproducible Analytical Pipelines for Official Statistics
This e-learning course aims to build capacity in national statistical systems for the development and implementation of Reproducible Analytical Pipelines (RAPs) for Official Statistics.
What is a Reproducible Analytical Pipeline?
Simply put, reproducible analytical pipelines (RAPs) are automated statistical processes (data processing and analysis) that codify, to the greatest extent possible, the production of official statistics. Common tools used to develop RAPs include software such as R or Python and version control tools such as Git.
Reproducibility is at the heart of the approach. It means that the outputs can be generated again from any new or revised input datasets using the RAP developed. It also means documenting the RAP, which builds institutional knowledge and makes it possible for new staff to use the RAP in the future.
Why are Reproducible Analytical Pipelines important for Official Statistics?
All national statistical systems are engaged in the regular, high-frequency production of many official statistics. For example, most countries compile a monthly consumer price index (CPI). The structure of the input data for the compilation of the CPI is generally the same from month to month. By developing and implementing a RAP for the compilation of the CPI, countries can improve its timeliness and quality: automation reduces the time required to clean and analyze the data, and it also reduces the chance of errors that can occur when relying on non-automated processes.
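As an illustration only, the idea of rerunning the same codified steps on each month's input can be sketched in a few lines of Python. This is a minimal sketch, not an official CPI methodology: the column names (item, weight, base_price, current_price), the function names, and the simple weighted average of price relatives are all assumptions made for the example.

```python
import csv
import io


def load_prices(csv_text):
    """Parse rows of item, weight, base_price, current_price from CSV text.

    The column names are hypothetical; a real pipeline would follow the
    layout of the national price collection files.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "item": row["item"].strip(),
            "weight": float(row["weight"]),
            "base_price": float(row["base_price"]),
            "current_price": float(row["current_price"]),
        })
    return rows


def validate(rows):
    """Fail early on inputs that would silently distort the index."""
    if not rows:
        raise ValueError("no input rows")
    for r in rows:
        if r["weight"] < 0 or r["base_price"] <= 0 or r["current_price"] <= 0:
            raise ValueError(f"invalid values for item {r['item']!r}")
    return rows


def compute_index(rows):
    """Weighted average of price relatives, scaled so the base period is 100.

    This formula is an assumption chosen for simplicity.
    """
    total_weight = sum(r["weight"] for r in rows)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(
        r["weight"] * r["current_price"] / r["base_price"] for r in rows
    )
    return 100.0 * weighted / total_weight


def run_pipeline(csv_text):
    """Run every step in a fixed order so the output can be regenerated."""
    return round(compute_index(validate(load_prices(csv_text))), 2)


# Made-up sample input for demonstration, not real price data.
SAMPLE = """item,weight,base_price,current_price
bread,0.5,2.0,2.2
fuel,0.5,1.0,1.0
"""

if __name__ == "__main__":
    print(run_pipeline(SAMPLE))
```

When next month's file arrives, or when this month's data is revised, the same run_pipeline call is simply run again on the new input. Keeping the script under version control (for example with Git) records exactly which code produced which release.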
Furthermore, the Sustainable Development Goals (SDGs) require that countries use more diverse data sources in the compilation of indicators. The COVID-19 crisis has shown that automated tools can facilitate data analysis and reporting when these sources are updated. These tools, including software such as R and sharing platforms such as GitHub, allow statisticians to streamline data cleaning, compilation, and analysis.