Assessing the feasibility of micro-data access
Atle Alvheim, Assistant Director
Norwegian Social Science Data Services
Luxembourg, 26-27 October 2006
Norwegian Social Science Data Services (NSD)
“Much more time went into finding or obtaining information than into digesting it.” (Dr. J.C.R. Licklider)
Maximize: (time spent on digesting and thinking) / (time spent on finding and accessing)
There is a lack of “tools for thought”.
Situation today
• Only a fraction of data resources available online
• Lack of standardization
• Poor integration between data and metadata
• Institutional, legal, and commercial obstacles
Situation tomorrow?
• All empirical data available online
• An integrated gateway for locating and integrating relevant resources
• The ability to browse, visualize, and analyze data online
• Hyperlinks from data to relevant scientific publications and resources
• An empirical feedback system to build the collective memory of a data collection
Data sharing: what are we looking for?
• Data
• Metadata
• Tools
Why are metadata important?
[Figure: “Labeled stuff” vs. “Unlabeled stuff”]
The bean example is taken from: A Manager’s Introduction to Adobe eXtensible Metadata Platform, http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf
The functions of metadata
• Finding
• Assessing
• Understanding
• Sharing
Data Documentation Initiative (DDI)
An international XML-based standard for the content, presentation, transport, and preservation of documentation.
Benefits of the DDI approach
• Interoperability: codebooks can be exchanged and transported seamlessly, and applications can be written to work with these homogeneous documents.
• Richer content: provides the data analyst with broader knowledge about a given collection.
• Single document, multiple purposes: the codebook contains all of the information necessary to produce several different types of output.
• Online subsetting and analysis: DDI documents are easily imported into online analysis systems, making datasets more readily usable for a wider audience.
• Precision in searching: field-specific searches across documents and studies are enabled.
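The benefits “single document, multiple purposes” and “precision in searching” can be illustrated with a small Python sketch. It parses a hypothetical, heavily simplified codebook whose element names are modeled loosely on DDI Codebook (version 2) tags such as `var`, `labl`, and `qstnLit`; namespaces and required elements are omitted, so the fragment is not schema-valid DDI. The same parsed document feeds both a plain-text codebook listing and a search restricted to one field.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified codebook. Element names loosely follow DDI Codebook
# (2.x); namespaces and most required elements are deliberately omitted.
CODEBOOK_XML = """
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt><titl>Example Attitudes Survey</titl></titlStmt>
    </citation>
  </stdyDscr>
  <dataDscr>
    <var name="imm1">
      <labl>Attitude towards immigrants</labl>
      <qstn><qstnLit>Would you say immigrants make the country a better or worse place to live?</qstnLit></qstn>
      <catgry><catValu>1</catValu><labl>Better</labl></catgry>
      <catgry><catValu>2</catValu><labl>Worse</labl></catgry>
    </var>
    <var name="age">
      <labl>Age of respondent</labl>
      <qstn><qstnLit>How old are you?</qstnLit></qstn>
    </var>
  </dataDscr>
</codeBook>
"""


def study_title(xml_text):
    """Read the study title from the citation block."""
    root = ET.fromstring(xml_text.strip())
    return root.findtext("stdyDscr/citation/titlStmt/titl", default="")


def load_variables(xml_text):
    """Parse the codebook into a list of plain dicts, one per variable."""
    root = ET.fromstring(xml_text.strip())
    variables = []
    for var in root.iter("var"):
        variables.append({
            "name": var.get("name"),
            # Direct child <labl> only, so category labels are not picked up here.
            "label": var.findtext("labl", default=""),
            "question": var.findtext("qstn/qstnLit", default=""),
            "categories": [
                (c.findtext("catValu"), c.findtext("labl"))
                for c in var.findall("catgry")
            ],
        })
    return variables


def text_codebook(title, variables):
    """Output 1: a human-readable codebook listing."""
    lines = [title]
    for v in variables:
        lines.append(f"{v['name']}: {v['label']}")
        for value, label in v["categories"]:
            lines.append(f"    {value} = {label}")
    return "\n".join(lines)


def search_field(variables, field, term):
    """Output 2: a search restricted to one field (e.g. only question text)."""
    term = term.lower()
    return [v["name"] for v in variables if term in v[field].lower()]


if __name__ == "__main__":
    variables = load_variables(CODEBOOK_XML)
    print(text_codebook(study_title(CODEBOOK_XML), variables))
    print(search_field(variables, "question", "old"))
    print(search_field(variables, "label", "old"))
```

Here, searching the `question` field for "old" finds `age`, while the same search on `label` finds nothing; because each piece of documentation sits in its own element, a search can be aimed at exactly the field the user cares about.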
A life-cycle model of data
[Figure: Combined life cycle model, with the stages Study Concept, Data Collection, Data Processing, Data Archiving, Data Distribution, Data Discovery, Data Analysis, and Repurposing]
A common European data portal
• Metadata is all about communication
• MADIERA: a set of tools, plus an idea: data is a kernel that facilitates a “discussion”
• Perhaps future libraries will consist of datasets with linked or derived knowledge products: books, papers, tables, wikis, etc.
• Could we imagine libraries of hypotheses?
• Libraries of questions and discussions more than of answers?
What were the specified MADIERA objectives?
• The development of an integrated and effective distributed social science portal to facilitate access to a range of data archives and disparate resources (WP3)
• The development of specific add-ons to existing virtual data library technologies, in particular data location technology (WP4)
• The use of a multilingual thesaurus to break down the language barriers to the discovery of key resources (WP5)
• An extensive programme to add content, at both the data/information and the knowledge levels (WP6)
• Extensive training of data providers and users to encourage the continuous growth of the infrastructure (WP7)
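The multilingual thesaurus objective (WP5) can be sketched as concept-based query expansion: a term entered in any language is mapped to a language-independent concept, and the search then uses that concept's terms in every language. The thesaurus entries, concept IDs, and record titles below are hypothetical; this is an illustration of the general technique, not the MADIERA thesaurus itself.

```python
# Hypothetical miniature multilingual thesaurus: concept ID -> term per language.
THESAURUS = {
    "IMMIGRATION": {"en": "immigration", "no": "innvandring",
                    "fi": "maahanmuutto", "de": "Einwanderung"},
    "UNEMPLOYMENT": {"en": "unemployment", "no": "arbeidsledighet",
                     "fi": "työttömyys", "de": "Arbeitslosigkeit"},
}


def build_index(thesaurus):
    """Map every term, in any language, back to its concept ID."""
    index = {}
    for concept, terms in thesaurus.items():
        for term in terms.values():
            index[term.casefold()] = concept
    return index


def expand_query(query, thesaurus, index):
    """Return all language variants of the query's concept, or the query itself."""
    concept = index.get(query.casefold())
    if concept is None:
        return {query}
    return set(thesaurus[concept].values())


def search(titles, query, thesaurus, index):
    """Return titles containing any language variant of the query (substring match)."""
    terms = {t.casefold() for t in expand_query(query, thesaurus, index)}
    return [title for title in titles if any(t in title.casefold() for t in terms)]


if __name__ == "__main__":
    index = build_index(THESAURUS)
    # Hypothetical study titles in Norwegian, Finnish, and English.
    titles = ["Holdninger til innvandring", "Maahanmuuttoasenteet", "Survey on unemployment"]
    print(search(titles, "immigration", THESAURUS, index))
```

An English query for "immigration" thus also retrieves the Norwegian and Finnish titles. Substring matching is a simplification; a real system would more likely index records by concept ID.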
A Web of the Social Sciences
[Figure: end users and data providers]
• Building on a distributed model where data and resources are stored and maintained locally
• To the end user, the system will appear as one integrated system
• A virtual data library offering global access to locally supported data holdings
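The distributed model, in which each archive keeps its own holdings while the end user sees one integrated system, can be sketched as a federated search: the portal sends the same query to every local catalog in parallel and merges the answers. The archive names, holdings, and the in-memory `make_local_catalog` and `unreachable_catalog` helpers are hypothetical stand-ins; a production portal would call each archive's remote search interface instead.

```python
from concurrent.futures import ThreadPoolExecutor


def make_local_catalog(holdings):
    """Return a search function over one archive's (hypothetical) local holdings."""
    def search(term):
        return [title for title in holdings if term.lower() in title.lower()]
    return search


def federated_search(catalogs, term):
    """Query every local catalog in parallel and merge hits as (archive, title) pairs.

    A catalog that raises an error is skipped, so one unavailable node
    does not break the search for the end user.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max(1, len(catalogs))) as pool:
        futures = {name: pool.submit(search, term) for name, search in catalogs.items()}
        for name, future in futures.items():
            try:
                hits = future.result()
            except Exception:
                continue  # Assumed policy: silently ignore failing nodes.
            results.extend((name, title) for title in hits)
    return sorted(results)


def unreachable_catalog(term):
    """Simulates an archive node that is down."""
    raise ConnectionError("archive not reachable")


if __name__ == "__main__":
    catalogs = {
        "archive_a": make_local_catalog(["National election study", "Household panel"]),
        "archive_b": make_local_catalog(["Regional election study"]),
        "archive_c": unreachable_catalog,
    }
    for archive, title in federated_search(catalogs, "election"):
        print(f"{archive}: {title}")
```

Data never leave the local servers until a user asks for them; only the query and the result list travel through the portal, which is what lets each archive keep maintaining its own holdings.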
What, then, is necessary to develop useful procedures?
• Metadata standards that lift data from digits to research information
• Technical solutions and software: information and access systems, plus analysis and download facilities
• Political agreements on the conditions for data access
• Economic agreements, logging, and audits
Example: a common resource
European Social Survey (ESS): europeansocialsurvey.org, ess.nsd.uib.no
An academically driven social survey designed to chart and explain the interaction between Europe's changing institutions and the attitudes, beliefs and behaviour patterns of its diverse populations.
Another example: aggregate data
The determinants of active civic participation at European and national level (CIVICACTIVE): nsd.uib.no/civicactive
A third example: a common entry point
madiera.net
The MADIERA project has developed an effective infrastructure for the European social science community by integrating data with other tools, resources and products of the research process.
A scheme
[Figure: an Irish, a Finnish, and a Swiss researcher searching by problem area (Attitudes towards Immigrants), by geographic area (Data on Finland), and by data collection (Eurobarometer)]
• A registration procedure: each user registers with their home archive
• Look up and access data across holdings
A “data-archive political context” for 20+ national archives
• There may be money involved
• Are the data a free or a commercial good?
• There are categories of users: what about non-academic use and non-CESSDA use?
• Who is to set the prices?
• Access rules vary, and data cross national borders
• Which laws apply? Who sets the rules?
• Who is responsible? What sanctions are available?
• There are some “common good” data
• Eurobarometers, Value studies, ISSP, ESS, comparative collections
• Could these best be provided from one single point (?)
• Charging? Access conditions? Double storage?
• It is a good thing to have national archives: they increase the amount of data available and improve accessibility
• Need for justification and visibility
MADIERA: a common portal for all of Europe ++
[Figure: a central portal linked to the national archives NSD, FSD, SSD, ZA, DDA, DANS, and UKDA]
Portal functionality:
• Link many local servers
• Search and browse facilities
• Standardised software and standardised documentation
• Translation facilities
Politics:
• Coordinated access rules
• Money