Explore the Common Structure of Statistical Information (CoSSI) for efficient dissemination of statistical data, publications, and metadata. Learn about XML-based formats, metadata structure, and implementation across various data formats. Ensure standardized information exchange and seamless data processing.
Dissemination of Statistical Data, Publications and Metadata - Process Based on Common Structure of Statistical Information (CoSSI)
Harri Lehtinen (harri.lehtinen@stat.fi)
CoSSI (Common Structure of Statistical Information)
• The starting point of CoSSI was an infological analysis of the information in question.
• The analysis concluded that, although in practice the definition of statistical information has varied with the situation and application, statistical information in fact has a simple, generally acceptable universal structure.
• CoSSI describes this general structure, which does not depend on the situation or on the format in which the statistical information is presented.
=> CoSSI defines the structures of statistical data, metadata and publications.
XML based dissemination - CoSSI
• Modules:
  • Document metadata
  • Statistical metadata
  • Processing metadata
  • Publications
• Data:
  • Matrices (XDF)
  • Tables (CALS)
  • Sparse matrix (KEYS)
CoSSI: www.stat.fi/cossi
Implementation
• Modular DTD system
  • Document Type Definitions
• Use of standards
  • CALS, XDF, Dublin Core...
• Statistical matrix (statinfo_xdf.dtd): statmeta.dtd, docmeta.dtd, xdf.dtd
• Statistical table (statinfo_cals.dtd): statmeta.dtd, docmeta.dtd, cals.dtd
• Publications and documents (publication.dtd): docmeta.dtd, statmeta.dtd, statinfo_cals.dtd, figure.dtd...
XML
• One XML file -> data and metadata
• Multi-lingual documents
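The "one XML file -> data and metadata" idea, with several language versions in the same file, can be sketched as follows. This is only an illustration: the element names (statinfo, docmeta, statmeta, data, title, variable, row) and the sample values are hypothetical and are not taken from the actual CoSSI DTDs.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_statinfo(titles, variables, rows):
    """Build one XML tree holding document metadata, statistical metadata and data.

    All element names are hypothetical placeholders, not real CoSSI DTD names.
    """
    root = ET.Element("statinfo")

    # Document metadata: one <title> per language version.
    docmeta = ET.SubElement(root, "docmeta")
    for lang, text in titles.items():
        title = ET.SubElement(docmeta, "title", {XML_LANG: lang})
        title.text = text

    # Statistical metadata: one entry per variable.
    statmeta = ET.SubElement(root, "statmeta")
    for name, unit in variables:
        ET.SubElement(statmeta, "variable", {"name": name, "unit": unit})

    # Data part: one <row> per statistical unit.
    data = ET.SubElement(root, "data")
    for values in rows:
        row = ET.SubElement(data, "row")
        row.text = " ".join(str(v) for v in values)

    return root


# Made-up sample content, for illustration only.
doc = build_statinfo(
    titles={"fi": "Tulot", "en": "Income"},
    variables=[("year", "year"), ("income", "EUR")],
    rows=[(2005, 100), (2006, 110)],
)
xml_text = ET.tostring(doc, encoding="unicode")
```

Because metadata and data travel in a single tree, any consumer that parses the file gets both, and the language versions can be selected by the xml:lang attribute.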
Metadata
• Statistical metadata
  • Information vital for the interpretation of numerical statistical information
• Document metadata: information about
  • The producer of the document
  • The document's content
• Processing metadata
  • Information that software needs to process the data
Content model of statistical metadata
[Figure: content model of statistical metadata, shown next to document metadata. Elements named in the figure:]
• Variable name
• Concept definition
• Operational definition
• Description
• Calculation formula
• Measurement unit
• Classification
• ID
• Type
• Author
• Date
• Values
• Figure
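As a rough illustration of this content model, the metadata of one variable could be held in a record like the one below. The class name, the field names and the sample unit are assumptions made for the sketch; only the element names come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariableMetadata:
    """Statistical metadata for one variable (illustrative field names)."""
    name: str
    concept_definition: Optional[str] = None
    operational_definition: Optional[str] = None
    description: Optional[str] = None
    calculation_formula: Optional[str] = None
    measurement_unit: Optional[str] = None
    classification_id: Optional[str] = None
    values: List[str] = field(default_factory=list)

    def missing_fields(self):
        """Return the names of fields that have not been filled in yet."""
        return [key for key, value in vars(self).items() if value in (None, [])]


# Hypothetical example: the unit "EUR" is an assumption, not from the source.
income = VariableMetadata(name="Disposable income", measurement_unit="EUR")
todo = income.missing_fields()
```

A check such as missing_fields() is one simple way to see which parts of the metadata still need to be written before publication.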
Content model of document metadata
• Creator: person
• Publisher: organisation
• Contributor: person
• Date: published, modified
• Language: main and other language
• Document information: SVT and category
• Identifier: URN, URL, ISBN, ISSN, DOI, number
• Subject: keywords
• Content description
• Type
• Format
• Rights
• Coverage
• Relations
• Source
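Content model of statistical data matrix
The data matrix has one row per statistical unit a_1, ..., a_n and one column per variable x_1, ..., x_p:

$$
\begin{array}{c|cccccc}
 & x_1 & x_2 & \cdots & x_j & \cdots & x_p \\ \hline
a_1 & x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1p} \\
a_2 & x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
a_i & x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{ip} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
a_n & x_{n1} & x_{n2} & \cdots & x_{nj} & \cdots & x_{np}
\end{array}
$$

• Statistical data
  • Title
  • Document metadata
  • Statistical metadata
  • Processing metadata
  • Statistical data matrix (XDF)
    • Variables
    • Class values
    • Statistical units
    • Footnotes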
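The units-by-variables layout can be sketched as a small data structure. The class and method names are hypothetical, and the sample numbers are made up; the sketch only mirrors the indexing of the matrix, where values[i][j] corresponds to x_ij.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataMatrix:
    variables: List[str]        # x_1 ... x_p (column labels)
    units: List[str]            # a_1 ... a_n (row labels)
    values: List[List[float]]   # values[i][j] holds x_ij

    def __post_init__(self):
        # The matrix must have one row per unit and one value per variable.
        if len(self.values) != len(self.units):
            raise ValueError("expected one row per statistical unit")
        for row in self.values:
            if len(row) != len(self.variables):
                raise ValueError("expected one value per variable in every row")

    def cell(self, unit, variable):
        """Return x_ij for the given statistical unit and variable."""
        i = self.units.index(unit)
        j = self.variables.index(variable)
        return self.values[i][j]

    def column(self, variable):
        """Return all observations of one variable, in unit order."""
        j = self.variables.index(variable)
        return [row[j] for row in self.values]


# Made-up sample data for illustration.
matrix = DataMatrix(
    variables=["income", "household_size"],
    units=["a1", "a2", "a3"],
    values=[[100.0, 2.0], [250.0, 4.0], [80.0, 1.0]],
)
```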
Content model of statistical table
• Statistical table
  • Table title
  • Document metadata
  • Statistical metadata
  • Processing metadata
  • Statistical table (CALS)
    • Column headings
    • Row headings
    • Numerical data
    • Table footnotes
Documents and publications
• Document
  • Document metadata
  • Document main title
  • Ingress
  • Introduction
  • Abstract
  • Headnote
  • Product specification
  • Chapters
    • Title
    • Sections
      • Title
      • Paragraphs
  • Summary
  • Footnotes
  • Bibliography
  • Appendix
  • Definition lists
Paragraph
• Paragraph
  • List (unordered / ordered)
  • Statistical table
  • Figure
  • Link
  • Footnote reference
  • Bibliographical reference
  • Emphasis
Implementation for PC-Axis
• Need for an XML format for PC-Axis
  • The CoSSI matrix format is close to the PC-Axis data format and also supports multi-lingual data
• Processing metadata for PC-Axis (pxmeta)
• Mapping of PC-Axis metadata to the statistical, document and processing metadata of the CoSSI model
• Three data formats
  • Matrix (XDF)
  • Table (CALS)
  • Keys (PC-Axis)
  => but the same metadata for all formats!
• Allows more metadata than the original PC-Axis format
• Automatic conversion between data formats
CoSSI for the PC-Axis
The data part is in different formats but everything else stays the same.
• Matrix
  • Docmeta
  • Procmeta
  • Statmeta
  • Data -> XDF
• Table
  • Docmeta
  • Procmeta
  • Statmeta
  • Data -> CALS
• Keys
  • Docmeta
  • Procmeta
  • Statmeta
  • Data -> Keys
The information is the same in all formats!
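The principle that only the data part changes between formats can be sketched as below. This is a simplified illustration, not the actual converter: the dictionary layout, the function names and the sample values are assumptions, and the "keys" form is modelled simply as a sparse mapping from class-value combinations to cell values.

```python
from itertools import product


def matrix_to_keys(dimensions, cells, missing=None):
    """Dense, row-major cell list -> sparse {key tuple: value} mapping."""
    keys = product(*dimensions.values())
    return {k: v for k, v in zip(keys, cells) if v is not missing}


def keys_to_matrix(dimensions, keyed, missing=None):
    """Sparse mapping -> dense, row-major cell list; absent keys become `missing`."""
    return [keyed.get(k, missing) for k in product(*dimensions.values())]


def convert_data(dataset, target):
    """Re-encode only the data part; the metadata modules are passed through."""
    dims = dataset["statmeta"]["dimensions"]
    if dataset["format"] == target:
        data = dataset["data"]
    elif target == "keys":
        data = matrix_to_keys(dims, dataset["data"])
    elif target == "matrix":
        data = keys_to_matrix(dims, dataset["data"])
    else:
        raise ValueError(f"unsupported target format: {target}")
    return {**dataset, "format": target, "data": data}


# Made-up sample dataset; None marks an empty cell.
dataset = {
    "format": "matrix",
    "docmeta": {"title": "Example table"},
    "procmeta": {"decimals": 0},
    "statmeta": {"dimensions": {"region": ["A", "B"], "year": ["2005", "2006"]}},
    "data": [10, None, 30, 40],
}
as_keys = convert_data(dataset, "keys")
round_trip = convert_data(as_keys, "matrix")
```

Since docmeta, procmeta and statmeta are carried over unchanged, converting to the sparse form and back reproduces the original cells, which is what makes automatic conversion between the formats safe.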
Dissemination process - Office97 / PC-Axis tables
[Figure: flow diagram of the dissemination process. Elements named in the diagram:]
• Statistical application; SuperStar to PX; SAS to PX
• Metadata: statistical metadata, classifications, processing metadata
• PX-Edit, manual or batch processing: checking, edit metadata
• .PX files
• Automatic publishing, timer controlled
• Database services; PX-Web
• PX-Edit or PC-Axis, manual or batch processing: exclusion, save as Excel or txt
• PX-Edit publication production (monthly and quarterly publications, publication tables...); PX-templates
• Publication editor: Word, Excel,...
• FastWeb, timer controlled: conversion to XHTML, conversion to PDF
• Outputs: HTML, PDF, XLS
• Web site: www.stat.fi
What we need:
• More and better metadata
• Validation
• Language versions
• All information in a single file
• Archiving
• Automatic conversion to different dissemination channels
• Structured searches
• SVG
• Vendor-free solution
• Ability to add new dissemination channels
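As one example of what "structured searches" could mean once data and metadata are marked up in XML, the sketch below finds variables by their measurement unit. The element and attribute names and the sample document are hypothetical, not part of CoSSI.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are placeholders.
SAMPLE = """
<statinfo>
  <statmeta>
    <variable name="Disposable income" unit="EUR"/>
    <variable name="Household size" unit="persons"/>
  </statmeta>
</statinfo>
""".strip()


def variables_with_unit(xml_text, unit):
    """Return the names of all variables measured in the given unit."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset, including attribute tests.
    matches = root.findall(f".//variable[@unit='{unit}']")
    return [v.get("name") for v in matches]


in_euros = variables_with_unit(SAMPLE, "EUR")
```

The search addresses the structure of the document rather than its text, so it does not depend on where the variable appears in a table or a publication.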
XML based dissemination process - XML and PC-Axis
[Figure: flow diagram of the XML based dissemination process. Elements named in the diagram:]
• Statistical application
• PX-Edit -> PX&CoSSI; SuperStar -> PX&CoSSI; SAS -> PX&CoSSI
• Metadata: statistical metadata, classifications, processing metadata
• .PX files
• Publication editor: Arbortext (monthly and quarterly publications, publication tables...)
• eXist, XML database
• Publishing and preview
• Database services; PX-Web: PC-Axis tables
• FastWeb-XML; conversion
• Dissemination database
• Outputs: HTML, PDF, RSS, SDMX
• Web site: www.stat.fi
• Printing house
XML based dissemination process - integration completed
[Figure: flow diagram of the XML based dissemination process after integration. Elements named in the diagram:]
• Statistical application
• PX-Edit -> PX&CoSSI; SuperStar -> PX&CoSSI; SAS -> PX&CoSSI
• Metadata: statistical metadata, classifications, processing metadata
• .xml files
• Publication editor: Arbortext (monthly and quarterly publications, publication tables...)
• eXist, XML database
• Publishing and preview
• Database services; PX-Web: matrices (PXML)
• FastWeb-XML; conversion
• Dissemination database
• Outputs: HTML, PDF, RSS, SDMX
• Web site: www.stat.fi
• Printing house
XML Database and Statistical Information
eXist XML database: statistical metadata, statistical publications, statistics, statistical tables
Statistical publication in the Arbortext editor
Statistical metadata for a variable in a table: statistical metadata for the variable "Disposable income"
HTML output of a statistical publication with statistical metadata: link to the statistical metadata
User interface for publishing and preview