Learn how UNIDO leverages statistical metadata management to enhance industrial statistics, data collection, processing, and dissemination. Gain insights into the overall strategy and metadata principles for improved statistical outcomes.
Case Study: UNIDO Valentin Todorov UNIDO v.todorov@unido.org METIS 2008 (Luxembourg, 9-11 April 2008) METIS 2008, Luxembourg: Valentin Todorov
Outline • Introduction and Overview • Statistical Metadata Systems and the Statistical Cycle • Statistical Metadata in each phase of the Statistical Cycle • Systems and Design Issues • Organizational and Cultural Issues
About UNIDO • UNIDO was set up in 1966 • Became a specialized agency of the UN in 1985 • Promotes industrialization throughout the developing world • 172 Member States (as of 3 December 2007) • Headquarters in Vienna • Represented in 35 developing countries
About Statistics in UNIDO • Service Module “Industrial Governance and Statistics” helps countries to: • monitor, benchmark and analyse their industrial performance and capabilities • formulate, implement and monitor strategies, policies and programmes to improve the contribution of industry to productivity growth and the achievement of the UN Millennium Development Goals (MDGs) • Building capabilities in industrial statistics - providing technical assistance to: • introduce best practice methodologies and software systems • enhance the quality and consistency of the industrial statistics databases
About the Organisation All statistical activities are carried out by the Research and Statistics Branch – PCF/RST
Overall strategy and metadata management principles • Conceptual development was initiated in 1999 • An integrated data and data documentation (metadata) framework • A smooth migration policy - must not disrupt established UNIDO data services • Stepwise development in the context of a migration project of the statistical databases from an IBM mainframe to a client/server platform • Backed by the UNIDO Quality Assurance Framework
Overall strategy (cont.) • Following the International Recommendations for Industrial Statistics (2008) • Common formats and nomenclatures for exchange and sharing of statistical data and metadata: SDMX • Availability of the metadata in three languages (English, French and Spanish) • Based on a formal framework - the proposed information system architecture comprises two cubes, one for the statistical data and another for the metadata, interrelated by a set of shared dimensions - see Froeschl et al. (2002), Froeschl and Yamada (2000)
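The two-cube idea can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that both cubes are addressed by the same tuple of shared dimensions (country, ISIC code, variable, year); the dimension names, the sample values and the helper functions are hypothetical and are not taken from the UNIDO system.

```python
from typing import NamedTuple, Optional


class Key(NamedTuple):
    """Shared dimensions (hypothetical names) addressing a cell in both cubes."""
    country: str
    isic: str
    variable: str
    year: int


# Data cube: one numeric value per cell (None means missing).
data_cube: dict = {}
# Metadata cube: free-text notes per cell, addressed by the same key.
metadata_cube: dict = {}


def put(key: Key, value: Optional[float], *notes: str) -> None:
    """Store a value and, optionally, notes describing it."""
    data_cube[key] = value
    if notes:
        metadata_cube.setdefault(key, []).extend(notes)


def get(key: Key) -> tuple:
    """Return (value, notes) for one cell; notes default to an empty list."""
    return data_cube.get(key), metadata_cube.get(key, [])


# Made-up sample cells for a fictitious country "XX".
put(Key("XX", "1511", "employees", 2005), 1234.0, "1511 includes 1512")
put(Key("XX", "1512", "employees", 2005), None, "included in 1511")
```

Because the key is shared, a value and its documentation can always be retrieved together, which is the property the slide attributes to the shared dimensions.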
UNIDO Statistical Process • Initialisation • Pre-filling of the out-going UNIDO General Industrial Statistics Questionnaire with previously reported statistical data and metadata (non-OECD countries) • Excel format • In the appropriate language - English, French or Spanish • Automated using the available data and metadata • Data Collection • NSO (national statistical office): the questionnaires completed and returned to UNIDO by the NSOs (Excel format, rarely hard copy) are entered into the system and are ready for further validation and processing • OECD: data for OECD member countries (Excel format) are ready for further validation and processing
UNIDO Statistical Process • Transformation/Processing • The data collected from the primary or secondary sources are further transformed into ready-to-use data sets • The data transformation is done in five stages, which not only constitute an operational framework for UNIDO statisticians but also provide additional description of the statistics (generated metadata which are attributed to each data item) • After undergoing the complete processing phase, the incoming and generated data and metadata are stored in the databases • Dissemination • International Yearbook of Industrial Statistics • INDSTAT and IDSB CD products • Web Country Statistics (Country Brief) • Ad hoc requests by internal and external users
Mapping of the UNIDO cycle phases to those developed by the METIS group
Overall structure of ISDE
ISDE Applications • ADMIN – provides administrative services, like user and authorisation management, logging and auditing of the system, backup and restore management • outside of the life cycle • Nomenclature Explorer - maintenance of the core definitional metadata (not related to particular data sets or items) • outside of the life cycle • Questionnaire - management of the pre-filling and distributing of the questionnaires • used in the Initialisation phase
ISDE Applications • Data Wizard – the main data and metadata maintenance tool • Used in the Data Collection and Transformation phases • Provides services for • Reading in the data and metadata from the returned questionnaire (Excel) • Initial validation of the data read in and storing them in the database (at stage 1) • Maintenance of the data and metadata • Screening • Aggregation and further data validations and transformations
ISDE: Publication applications • Yearbook – a complex set of applications for production of the International Yearbook of Industrial Statistics • aggregation, layout • PDF file generation according to pre-defined templates and other tools • The final result is a publication-ready PDF file of about 700 pages • INDSTAT CD – produces the INDSTAT type of CD products • IDSB CD – produces the IDSB type of CD products • WEB – generates the necessary data and metadata for updating the WEB dissemination database • This database is outside of the ISDE system • Managed by the computer section
ISDE Applications • Presentation Wizard – mainly a visualization tool which can be used in the Dissemination phase for answering ad hoc requests, but because of its versatile functionality it is also widely used in the Data Transformation phase • Other applications – this category includes any other applications used in the process, like SAS, R, and tools for compilation of Production index numbers and National Accounts data (which are outside of the scope of this document)
Implementation Strategy • Developed in the context of migration from Mainframe to a Client/Server platform • A stepwise approach was chosen for the following reasons: • The project was not urgent • Testing and maintenance of the new system were to be done in-house • Only limited resources/funds were available • The staff was very willing to participate in the project • The goal was not only to migrate the system but rather to develop a completely new one, and the requirements were not yet completely specified • A key requirement was that the established UNIDO data services must not be disrupted
Implementation Steps • Step 1: High level architecture design, Data model, physical C/S database, definitional metadata tool • Rigorous analysis of the existing system and development of a data model - as generic as possible in order to be able to accommodate any subsequent changes • Based on the data model, a loader application was developed which allowed the data in the Mainframe and in the Sybase database to be synchronized at any moment • The development of the new metadata subsystem was initiated by implementing a tool for maintenance of the definitional metadata • Thus a kind of proof of concept was successfully completed
Implementation Steps • Step 2: Reference metadata, dissemination applications • A capture/maintenance tool was developed • The description/methodological metadata – Word, Excel - were entered into the system • The Mainframe footnote database (data-item level metadata) was imported • Thus the complete process of maintenance of the available metadata was migrated to the Client/Server platform • Data dissemination applications were developed which allowed the recurrent statistical publications/products to be produced from the Mainframe system and from the Client/Server platform in parallel - an ideal acceptance test for the new applications, done by simply comparing the results
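The parallel-run acceptance test described above amounts to comparing two result sets cell by cell. The following Python sketch shows one way such a comparison could look; the function name, the dictionary layout and the tolerance are assumptions made for illustration, not a description of the tooling UNIDO actually used.

```python
import math


def compare_runs(legacy: dict, new: dict, rel_tol: float = 1e-9) -> list:
    """Return readable differences between two result sets with identical key layout.

    Both arguments map a cell key (for example a (country, ISIC, year) tuple)
    to a numeric value. An empty list means the two runs agree.
    """
    diffs = []
    for key in sorted(set(legacy) | set(new)):
        if key not in new:
            diffs.append(f"{key}: missing in new system")
        elif key not in legacy:
            diffs.append(f"{key}: missing in legacy system")
        elif not math.isclose(legacy[key], new[key], rel_tol=rel_tol):
            diffs.append(f"{key}: legacy={legacy[key]} new={new[key]}")
    return diffs


# Made-up values for a fictitious country "XX".
mainframe = {("XX", "1511", 2005): 1234.0, ("XX", "1512", 2005): 50.0}
client_server = {("XX", "1511", 2005): 1234.0, ("XX", "1512", 2005): 51.0}
for line in compare_runs(mainframe, client_server):
    print(line)
```

An empty difference list for every recurrent product would correspond to the new applications passing the acceptance test.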
Implementation Steps • Example: International Yearbook of Industrial Statistics • From the Mainframe: produced as camera-ready line printer output which was glued together with many MS Word and MS Excel documents • From the Client/Server system: a page-numbered PDF file of about 700 pages is generated automatically • Step 3: Pre-filled questionnaire, data capturing and maintenance • Pre-filling of the questionnaire - for a second time from the new Client/Server data- and metadata-base • Development of the data capturing/maintenance tools - now in the phase of final acceptance testing • From June 2008 - only the Client/Server system will be used • Ultimate decoupling of the new system from the Mainframe
Metadata classification No formal metadata classification, but according to their usage and their role in the statistical production process we distinguish roughly between: • Structural or definitional metadata: refer to metadata that act as identifiers and descriptors of the data (and metadata) • Reference metadata: describe the properties and quality of the statistical data • System metadata: used to drive automated processing throughout the phases of the lifecycle
Metadata in the lifecycle • In each phase of the lifecycle the structural/definitional metadata are used • The structural metadata are created/updated relatively independently from the lifecycle • Add a new country (e.g. Serbia and Montenegro recently) • Currency change (e.g. Slovenia, Malta and Cyprus recently) • Country groupings: two more countries joined the EU (Bulgaria and Romania) • No metadata are created in the first and last phases (Initialisation and Dissemination), but corrections may be performed
Metadata in the life cycle: Initialisation • Pre-filling of the out-going UNIDO General Industrial Statistics Questionnaire with previously reported statistical data and metadata • System metadata: drive the automated processing • Template for the questionnaire • Language • ISIC revision • Output format (unit exponent, digits) • Operational metadata: stage 1 data used for pre-filling • Descriptive, methodological, implicit metadata used for pre-filling into the questionnaire
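How system metadata such as the output format (unit exponent, digits) might drive the pre-filling of a questionnaire cell can be sketched in a few lines of Python. The class, the field names and the "..." marker for missing values are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputFormat:
    """Hypothetical system-metadata record for one questionnaire variable."""
    unit_exponent: int  # values are reported in units of 10**unit_exponent
    digits: int         # decimal places shown in the questionnaire cell


def prefill_cell(value: Optional[float], fmt: OutputFormat) -> str:
    """Render a stored value for the out-going questionnaire.

    Missing values are rendered as '...' (an assumed convention).
    """
    if value is None:
        return "..."
    scaled = value / 10 ** fmt.unit_exponent
    return f"{scaled:.{fmt.digits}f}"


# A value stored in base units, reported in thousands with one decimal place.
thousands = OutputFormat(unit_exponent=3, digits=1)
print(prefill_cell(1234567.0, thousands))
print(prefill_cell(None, thousands))
```

Keeping the format in metadata rather than in code means the same routine can serve every variable and every language version of the template.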
Metadata in the life cycle: Data Collection • After the completed questionnaires are received back, they are entered (automatically) into the system for validation and further processing • Together with the data, the received metadata are entered into the system • The provided metadata are sometimes not described from the viewpoint of international comparability but rather from the viewpoint of national standards. In such cases the UNIDO statistical staff re-describes/rearranges the provided metadata into explicit information on the deviation from the international standard
Metadata in the life cycle: Data Collection (cont.) • Metadata can be attached to each data item • “Missing because of confidentiality reasons” or • combinations of ISIC codes like “1511 includes 1512” • Data for OECD member countries • collected through the joint OECD/UNIDO questionnaire and • transmitted to UNIDO (Excel format) • do not contain metadata (these are extracted from other OECD publications)
Metadata in the life cycle: Transformation • The metadata collected from the NSOs together with the data undergo the same transformation process as the data and are complemented by metadata generated by the transformation process • The data transformation is done in five stages - additional description of the data • At the same time Source and Method metadata are maintained for each data item • If appropriate, re-description of the provided metadata from the viewpoint of international comparability is performed
Metadata in the life cycle: Dissemination • International Yearbook of Industrial Statistics • the main UNIDO statistical product • the latest yearbook, released in 2008, covered the data for the period from 1995 to the latest year • The country data were updated for 74 countries and are compiled from Stage 1 and Stage 2 • CD products, which might include data from all stages described earlier - www.unido.org/statistics • Country Brief - statistics by selected variables from the different UNIDO databases for each member state, which are posted on the UNIDO website: http://www.unido.org/statistics
Metadata in the life cycle: Dissemination
Systems and Design Issues • Client/Server architecture built on .NET technology • Centralised database: • Sybase ASE 12.5 on Linux • Test and production databases • Client (desktop) applications developed using MS Visual Studio in C# • Commonality through using shareable component libraries – C# • Other tools: • SAS, R, Stata • Development tools
Organizational and Cultural Issues • No specialised metadata roles are necessary • processing of metadata and data is tightly coupled • responsibilities are organized by country • No special training for the staff was necessary • all statisticians participated actively in the specification and the development of the system • the system testing was performed by parallel runs on the Client/Server and Mainframe • Nevertheless a complete set of documentation and training materials is being prepared • unifying the terminology and the information about the system • induction training of new colleagues • operational and maintenance concept documents
THE END
Examples
Example: Nomenclature Explorer
Example: ADMIN - Topics
Example: Data Wizard View/Edit Questionnaire
Example: Data Wizard View/Edit Metadata
Example: R Graphics
Example: Implicit metadata • For example, several industry categories can be combined and reported together by a given country for a given indicator and years • In the questionnaire returned by the NSOs such a combination is expressed in the following way:
…
1511 Processing/preserving of meat 1234 a/
1512 Processing/preserving of fish … a/
1513 Processing/preserving of fruit & vegetables … a/
…
REMARKS: a/ 1511 includes 1512 and 1513
• ‘Exclude’ for other country specific classification discrepancies • ‘Substitute’ for synonyms • Aggregations
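The "includes" remark in the questionnaire example lends itself to automatic parsing. The Python sketch below is only an illustration: the regular expression, the function names and the wording of the generated notes are assumptions, and it handles only the "includes" form, not ‘Exclude’, ‘Substitute’ or aggregations.

```python
import re

# Matches remarks such as "a/ 1511 includes 1512 and 1513" (assumed layout).
REMARK = re.compile(r"^(?P<flag>\w+)/\s+(?P<parent>\d{4})\s+includes\s+(?P<rest>.+)$")


def parse_remark(line: str) -> tuple:
    """Split an 'includes' remark into (flag, reporting code, included codes)."""
    match = REMARK.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised remark: {line!r}")
    children = re.findall(r"\d{4}", match.group("rest"))
    return match.group("flag"), match.group("parent"), children


def implicit_metadata(line: str) -> dict:
    """Derive one note per ISIC code affected by the combination."""
    _, parent, children = parse_remark(line)
    notes = {parent: f"{parent} includes {' and '.join(children)}"}
    for child in children:
        notes[child] = f"included in {parent}"
    return notes


print(implicit_metadata("a/ 1511 includes 1512 and 1513"))
```

Each affected code receives its own note, so the combination stays visible on the data items for 1512 and 1513 even though their values are reported under 1511.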
Example: System metadata in the Initialisation phase - I
Example: System metadata in the Initialisation phase - II
Example: Descriptive and methodological metadata used in the Initialisation/Data Collection phase
Example: Metadata attached to each data item used or created in the Initialisation and Data Collection phase
Backup slides
Operational Framework: Stages • Stage 1 – responses to national questionnaires. Detection and, if possible, correction of obvious reporting errors • Used for pre-filling the following edition of the questionnaire • Data are considered official • Stage 2 – incorporation of published national data. Inconsistent data are corrected using supplementary information from national publications • Published in the International Yearbook of Industrial Statistics • Data are considered official
Operational Framework: Stages (cont.) • Stage 3 – disaggregation of data. Data are adjusted to eliminate the departures from the level of ISIC aggregation • using national and international sources • using supplementary data • Stage 4 – automatic disaggregation and interpolation. Missing data are estimated by applying related proportions or interpolation wherever applicable • For ISIC 3-digit only • Stage 5 – estimation of provisional data for the latest years • Selected variables only
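The slides do not specify the Stage 4 formulas, so the following Python sketch shows only one plausible reading of "related proportion" and "interpolation": a combined value is split according to the shares observed in a reference period, and interior gaps in a time series are filled linearly. Function names, inputs and sample values are hypothetical.

```python
from typing import Optional


def disaggregate(total: float, reference: dict) -> dict:
    """Split a combined value using the shares seen in a reference period."""
    ref_sum = sum(reference.values())
    if ref_sum == 0:
        raise ValueError("reference values sum to zero; shares are undefined")
    return {code: total * value / ref_sum for code, value in reference.items()}


def interpolate(series: dict) -> dict:
    """Fill interior gaps linearly; return year -> (value, was_estimated).

    Gaps at the start or end of the series are left missing, since this
    sketch does not extrapolate.
    """
    known = sorted(year for year, value in series.items() if value is not None)
    result = {}
    for year in sorted(series):
        value: Optional[float] = series[year]
        if value is not None:
            result[year] = (value, False)
            continue
        before = [y for y in known if y < year]
        after = [y for y in known if y > year]
        if before and after:
            y0, y1 = before[-1], after[0]
            weight = (year - y0) / (y1 - y0)
            estimate = series[y0] + weight * (series[y1] - series[y0])
            result[year] = (estimate, True)
        else:
            result[year] = (None, False)
    return result


print(disaggregate(100.0, {"1511": 3.0, "1512": 1.0}))
print(interpolate({2000: 10.0, 2001: None, 2002: 20.0}))
```

The `was_estimated` flag mirrors the idea that each estimated item carries generated metadata recording how it was obtained.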
Reference metadata • Implicit metadata – a special class of metadata arising through the specific usage of other metadata. A typical example is the ISIC combinations • Operational Metadata – generated by the process of data transformation and attributed to the respective data items • a stage indicator reflecting the data item’s credibility • “Source” and “Methods” metadata, describing the source of the data item and the methods applied for its generation
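Operational metadata attached to a data item can be pictured as a small record. The Python sketch below uses hypothetical field names and made-up sample values; the selection helper reflects the statement elsewhere in these slides that the Yearbook is compiled from Stage 1 and Stage 2 data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """One data value with its operational metadata (field names are hypothetical)."""
    value: float
    stage: int    # 1..5: the processing stage that produced the value
    source: str   # where the value came from
    method: str   # how the value was obtained

    def __post_init__(self) -> None:
        if not 1 <= self.stage <= 5:
            raise ValueError(f"stage must be between 1 and 5, got {self.stage}")


def select_up_to_stage(items: list, max_stage: int) -> list:
    """Keep only items produced at or below the given stage."""
    return [item for item in items if item.stage <= max_stage]


# Made-up sample items.
items = [
    DataItem(10.0, 1, "questionnaire", "reported"),
    DataItem(12.0, 4, "estimate", "interpolation"),
]
yearbook_items = select_up_to_stage(items, max_stage=2)
print(yearbook_items)
```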
Reference metadata (cont.) • Descriptive and Methodological metadata – received from the primary data reporters and then further processed together with the data. • During this processing additional metadata can be added. • Can be attached to all possible levels, ranging from the complete data set down to individual data items.
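Attaching metadata at several levels implies that the notes relevant to one data item have to be gathered from every level that covers it. A minimal Python sketch, assuming a hypothetical scope tuple (country, ISIC code, year) in which None acts as a wildcard:

```python
def applies(scope: tuple, item: tuple) -> bool:
    """True if every non-wildcard position of the scope equals the item's value."""
    return all(s is None or s == i for s, i in zip(scope, item))


def collect_notes(item: tuple, notes: dict) -> list:
    """Return all notes covering the item, from the broadest scope to the narrowest."""
    matching = [
        (sum(part is not None for part in scope), text)
        for scope, text in notes.items()
        if applies(scope, item)
    ]
    # sorted() is stable, so notes of equal specificity keep insertion order.
    return [text for _, text in sorted(matching, key=lambda pair: pair[0])]


# Made-up notes for fictitious countries "XX" and "YY".
notes = {
    (None, None, None): "data set note",
    ("XX", None, None): "country note",
    ("XX", "1511", 2005): "item note",
    ("YY", None, None): "note for another country",
}
print(collect_notes(("XX", "1511", 2005), notes))
```

Ordering the result from the broadest to the narrowest scope lets a reader see the general methodology first and the item-specific footnote last.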