170 likes | 262 Vues
Database Management. 2nd EUKLEMS Consortium Meeting, 9-11 June 2005, Helsinki This project is funded by the European Commission, Research Directorate General as part of the 6th Framework Programme, Priority 8, "Policy Support and Anticipating Scientific and Technological Needs".
 
                
                E N D
Database Management 2nd EUKLEMS Consortium Meeting, 9-11 June 2005, Helsinki This project is funded by the European Commission, Research Directorate General as part of the 6th Framework Programme, Priority 8, "Policy Support and Anticipating Scientific and Technological Needs".
EUKLEMS Database requirements • Interface with Consortium inputs • Produce ‘consistent’ KLEMS data from disparate data • Store EUKLEMS database • Generate Productivity estimates • Store and document ‘Assumptions’ and Methods • Interface with EUKLEMS public output
Examples of Practical Problems (1) • Aggregation: • Chain aggregate 2-digit output by industry to a 1-digit level • Disaggregation: • Estimate computer mfg. and other machinery output given output of total machinery mfg. • Ensure that the results aggregate to the observed total. • Employ available indicators (“pattern data”) of how to split the two. • Balancing • Estimate investment by industry and by asset type. • Observe detail in only 1 dimension at a time. • Estimates aggregated by type to equal the observed totals by industry. • Estimates aggregated by industry to equal the observed totals by type. • Employ available indicators (e.g. from other countries) to estimate the detailed breakdown.
Examples of Practical Problems (2) • Concording • Converting data for timeseries breaks in national industry codes • Converting data from national industry code to EUKLEMS definitions • Cross-classification (technically the same as concording) • Estimating labor by ‘quality-level’ using data by occupation.
Productivity Research Data System • Programmed as application layer on top of SAS, designed to overcome practical data hurdles • Data to be manipulated are coded in relational databases • Identities coded in meta data • SAS macros to manipulate the data make use of the meta data • Aggregation, disaggregation, balancing, and concording • Specialized tools to estimate Capital stocks, labour services, capital services, TFP, Domar weighting, alternative deflation • In addition: • Documentation • reference manual (also on-line) • users guide • User community (in statu nascendi)
What is a relational database? • Seperate data and meta-data • Meta-data has number of dimensions, e.g. • Year: 1970, 1971,….. • Country: A, B,C,.. • Industry: 1, 2, 3, …. • Unit: nominal, real, deflator • Type: investment in building, machinery,…, • 1970_B_3_nom_mach: 5,870
General lay-out of Analytical module • 1. Collection of basic data • A. Basic data by country • 2. Making meta-data (hierarchies, concordances) • B. Data in Prodsys readable format • 3. Combining datasources: (dis)agg, concor, balancing • C. Input data KLEMS: full time-series and industries • 4. Applying productivity tools: harmonization and experiment • D. Output data KLEMS • 5. Selection of series for analytical database • E. Public analytical database • Statistical module is same as analytical, except only data and procedures which are backed by NSO
Partner Involvement • Use SAS/Prodsys or Do not use SAS/Prodsys? Advantages of using SAS/Prodsys is full flexibility to experiment, but steep learning curve and expensive • Importantly: SAS/Prodsys is not needed for data delivery, only Excel!
Data delivery standard • Each table neeeds the following information • Description of source (publisher, NSO) and version (release date) • Classification Meta-data associated with each dimension of national database • Concordance with EUK dimension lists • Document with guidelines will be prepared
Tree structure of data hierarchy: A as Grand Total with nodes (‘children’) B and Ceach subdivided further down A B C D E F G H
Highest node on top, children indented,each node always immediately followed by itsown children: A .B ..D ..E ..F .C ..G ..H
Sequence of processing raw data • Prepare metadata (hierarchy) and concordance • Metadata and concordance are read into ProdSys • Then data is read into ProdSys • Metadata determine how original raw data gets ‘translated’ to SAS (table) version of original raw data • Metadata + already present EU KLEMS metadata + concordance determine how SAS version of original raw data gets transformed into EU KLEMS data • You don’t have to rearrange the actual data into this hierarchical structure; ProdSys assigns the data to the correct category in SAS once the category is part of the metadata.
Concordances • A raw data category may have to be split into several EU KLEMS categories • We need to be able to determine how the split is made: weights • Use pattern data to calculate these weights
NACE1 (4-digit)-to-EUK concordance nace1-4-digit-to-euk72.xls
Next steps • Document will be prepared on data delivery • User guide for Prod Sys • Prod Sys is still in development