What Makes a Data Archive Tick: Marrying Content and User Support
Steven Worley
National Center for Atmospheric Research, Computational and Information Systems Laboratory
May 17-21, 2010
Summer Institute for Data Curation for Earth and Environmental Science
Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign
• How to make and keep the archive content relevant to the users?
• How to engage the users?
How to make and keep the archive content relevant to the users?
Know your users
• Define your focus community
  • You cannot serve everyone
  • Design the service so it does not limit others
• At decision points (e.g. changes in service), ask:
  • “Is this a significant benefit for my users?”
• The case @ NCAR
  • Atmospheric, oceanic, and some related geoscience research
  • Graduate students and higher education
  • NCAR scientists, and researchers at universities with graduate degree programs in meteorology and oceanography
  • Over 50% of the 6,000+ unique users each year are outside the focus group
How to make and keep the archive content relevant to the users?
Understand their science: current work and trends
• Attend seminars, symposia, and meetings where they present their work
• Corollary: have scientifically educated staff
• The case @ NCAR: the Research Data Archive
  • All staff have MS degrees or higher
    • meteorology (6)
    • oceanography (2)
    • computing science (1)
    • exception: administration (1)
How to make and keep the archive content relevant to the users?
Understand their science: current work and trends
• Routinely review journals, bulletins, and relevant newsletters
  • Search for science that depends strongly on your data focus
  • Contact the authors and offer a data-sharing service
• @ NCAR
How to make and keep the archive content relevant to the users?
Understand their science: current work and trends
• Develop close contacts with a few key users
  • Seek ‘honest’ opinions about your service
• Make your service known through presentations and publications
• @ NCAR
How to make and keep the archive content relevant to the users?
Know how your users work
• How do they prefer to handle data?
  • Digital files: write and run program code to evaluate the content
  • Digital files: specific formats that are application friendly
    • e.g. netCDF, GIS, WMO formats
  • ASCII text, convenient for worksheets
  • Images of analyses (charts, line graphs, 2D/3D contoured plots)
• @ NCAR
  • Digital files are key
  • Some images are used for discovery, but they are not critical
• Design the systems to deliver what users want
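For users who prefer ASCII text in worksheets, an archive can flatten gridded digital files into plain rows. A minimal sketch, assuming the grid is already in memory as nested lists with latitude and longitude axes; `grid_to_csv` is a hypothetical helper, and a real service would first read the source format (e.g. netCDF).

```python
import csv
import io

def grid_to_csv(lats, lons, values):
    """Flatten a 2D grid (values[i][j] at lats[i], lons[j]) into CSV text.

    Hypothetical helper: one row per grid point, which spreadsheet
    users can open directly.
    """
    if len(values) != len(lats) or any(len(row) != len(lons) for row in values):
        raise ValueError("grid shape does not match the coordinate axes")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["lat", "lon", "value"])
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            writer.writerow([lat, lon, values[i][j]])
    return buf.getvalue()

# Example: a 2 x 2 grid of made-up temperatures in kelvin.
text = grid_to_csv([10.0, 20.0], [100.0, 110.0], [[280.5, 281.0], [275.2, 276.4]])
print(text)
```

One row per point trades file size for the simplest possible layout, which is usually what worksheet users want.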
How to make and keep the archive content relevant to the users?
Choosing the content
• At decision points (e.g. adding a new dataset), ask:
  • “Can we handle this efficiently?”
  • Does it supplement or extend the central data foci?
  • Does it address a new need or trend?
  • Are the formats aligned with user preferences?
    • If not, can we make a cost-effective conversion?
  • Do we have staff (data scientists/stewards) who can understand the scientific content?
• @ NCAR
  • Atmospheric, oceanic, and related geoscience observations, or analyses derived from observations, that support climate and weather research
How to make and keep the archive content relevant to the users?
Choosing the content
• Evaluate user metrics
  • Which datasets are most popular?
  • Who is using what? Can you distinguish your focus group?
  • Are there any trends?
  • Caution: this is only part of the story
• @ NCAR
  • Our user registration allows us to track this
  • Examples
Unique users by service path
Users in four service categories:
• MSS: to the CISL HPC environment
• Web: to the worldwide community
• Orders: one-off, consulting-assisted data preparation
• TIGGE
About 6,000 users annually. FY09: MSS = 266, Web = 5,649, Orders = 196, TIGGE = 44
Amount of data by service path
Users in four service categories:
• MSS: to the CISL HPC environment
• Web: to the worldwide community
• Orders: one-off, consulting-assisted data preparation
• TIGGE
162 TB in FY09: MSS = 31 TB, Web = 120 TB, Orders = 9 TB, TIGGE = 2 TB
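Putting the two FY09 breakdowns side by side shows how differently the service paths are used. A small worked calculation using only the figures from the two slides; `shares` is an illustrative helper, not part of any NCAR system.

```python
# FY09 figures from the two slides: unique users and terabytes
# delivered, broken down by service path.
users = {"MSS": 266, "Web": 5649, "Orders": 196, "TIGGE": 44}
terabytes = {"MSS": 31, "Web": 120, "Orders": 9, "TIGGE": 2}

def shares(counts):
    """Return the total and each category's percentage, rounded to 0.1."""
    total = sum(counts.values())
    return total, {k: round(100 * v / total, 1) for k, v in counts.items()}

user_total, user_pct = shares(users)
tb_total, tb_pct = shares(terabytes)
print(user_total, user_pct)
print(tb_total, tb_pct)
```

The web path carries about 91.8% of the 6,155 users but about 74.1% of the 162 TB, while MSS users (about 4.3% of users) take about 19.1% of the volume, so per-user volume is much higher on the HPC path.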
User-ranked popular datasets: top 10 datasets/groups, FY09 (~6,000 unique users annually)
NCAR-CSM Symposium on Climate and Energy
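Because registration ties each download to a user, a ranking like this one can be computed by counting distinct users per dataset rather than raw downloads. A minimal sketch, assuming a hypothetical download log of `(user_id, dataset_id, in_focus_group)` tuples; the dataset names and the focus-group flag are made up for illustration.

```python
from collections import defaultdict

# Hypothetical download log: (user_id, dataset_id, in_focus_group).
downloads = [
    ("u1", "dataset-A", True),
    ("u1", "dataset-A", True),   # repeat download by the same user
    ("u2", "dataset-A", False),
    ("u2", "dataset-B", False),
    ("u3", "dataset-B", True),
    ("u4", "dataset-C", False),
    ("u3", "dataset-A", True),
]

def rank_by_unique_users(records, top_n=10):
    """Rank datasets by distinct users, also counting focus-group users."""
    users = defaultdict(set)
    focus = defaultdict(set)
    for user, dataset, in_focus in records:
        users[dataset].add(user)
        if in_focus:
            focus[dataset].add(user)
    # Most users first; ties broken alphabetically by dataset name.
    ranked = sorted(users, key=lambda d: (-len(users[d]), d))
    return [(d, len(users[d]), len(focus[d])) for d in ranked[:top_n]]

for dataset, n_users, n_focus in rank_by_unique_users(downloads):
    print(f"{dataset}: {n_users} unique users ({n_focus} in focus group)")
```

Counting sets of users keeps repeat downloads from inflating a dataset's rank, and the separate focus-group count addresses the question of who is using what.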
How to make and keep the archive content relevant to the users?
Remain flexible: expect constant change
• Be ready to take opportunities when they come along
  • Re-adjust priorities
  • Resist ‘tight’ mission control
• Take advice from advisory groups, but don’t depend on them exclusively
  • Use a holistic approach
• @ NCAR, an unplanned example
  • Arctic System Reanalysis: NSF-sponsored research critical to assessing the changes happening in the Arctic
  • Needed controlled access to the first prototype data. We do this!
How to make and keep the archive content relevant to the users?
Sustaining for the long term
• Richness and data value grow over time
  • Data assets tend to complement each other and add value to many different research questions
  • Scientific publications lead to broader and increased interest
    • Definitive data citation is a work in progress
• Staffing needs to be base/core funded
  • Grant-directed funding can lead to a fractured, ad hoc, incomplete archive
  • This can be a major frustration for users
• @ NCAR: the Research Data Archive
  • Began 40+ years ago
  • Today sustained by 9 people
How to make and keep the archive content relevant to the users?
Collaborations
• Participate in or volunteer for committees and panels that tackle data issues (all sorts)
  • Learn from others, share knowledge
• Share efforts and data with other organizations
  • No one group can do it all (no group has all the resources and expertise required)
• @ NCAR (e.g. conferences like SIDC for EES)
  • Volunteerism: NAS, AMS, NOAA, WMO, NASA
  • National and international data agreements with:
    • European Centre for Medium-Range Weather Forecasts
    • Japan Meteorological Agency
    • U.S. National Weather Service, National Centers for Environmental Prediction
How to Engage the Users?
Data discovery: how can people find you?
• All 600+ RDA datasets have metadata in GCMD
  • Exported automatically via OAI-PMH
• Similarly: RDA > CDP@NCAR > BADC in the UK
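OAI-PMH is a simple HTTP protocol: a harvester sends requests such as `verb=ListRecords` and receives XML records. A minimal sketch of the harvesting side, assuming a hypothetical endpoint at `archive.example.org` and the `oai_dc` (Dublin Core) format that every OAI-PMH repository must support; the slide does not say which metadata format the RDA actually exports, and no network call is made here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request for a (hypothetical) endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def record_titles(xml_text):
    """Return the Dublin Core title of each record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        title = record.find(f".//{{{DC_NS}}}title")
        if title is not None:
            titles.append(title.text)
    return titles

# Trimmed, made-up response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://archive.example.org/oai"))
print(record_titles(SAMPLE))
```

Because the archive only has to answer these standard requests, any catalog that speaks OAI-PMH can pick up its metadata without custom integration work.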
How to Engage the Users?
Design your portal to evolve (it will, and it should)
• 2002
  • Search
  • Navigation
  • List of menus
  • Unique layout of links
  • Picture of people
How to Engage the Users?
• 2008
  • Search
    • Two ways
  • Navigation
  • Links
  • News
  • Text
  • People
How to Engage the Users?
• Primary design feature for the web portal
  • Data discovery: find data!
• 2010
  • All about search
  • Gone from the top: people, text, news
How to Engage the Users?
Navigation once they arrive
• Working principles
  • Uniform across the web portal
  • Keep organizational elements out of prime visual territory
• @ NCAR
  • User registration is required only to get data
  • All discovery metadata is open: unlimited searching
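The registration policy amounts to a single access rule: discovery is always open, and only data files require a registered user. A minimal sketch, assuming a hypothetical `/data/` URL prefix for file downloads; the `Request` type and `is_allowed` function are illustrative, not the RDA's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    path: str
    user_id: Optional[str] = None  # None means an anonymous visitor

# Hypothetical URL prefix for data files; everything else (search,
# dataset description pages, discovery metadata) stays open.
DATA_PREFIXES = ("/data/",)

def is_allowed(request, registered_users):
    """Allow all discovery traffic; require registration only for data files."""
    if not request.path.startswith(DATA_PREFIXES):
        return True
    return request.user_id is not None and request.user_id in registered_users

print(is_allowed(Request("/search?q=sea+surface+temperature"), set()))
print(is_allowed(Request("/data/example/file.nc"), set()))
print(is_allowed(Request("/data/example/file.nc", "alice"), {"alice"}))
```

Keeping the rule this narrow means anonymous visitors can search everything, while the archive still learns who actually uses each dataset.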
How to Engage the Users?
The complete data knowledge package, and the data cycle
• What is a complete data knowledge package?
  • Rich metadata plus the data files!
• One example
  • http://dss.ucar.edu/datasets/ds277.0/
How to Engage the Users?
The pieces that make rich metadata
• Dataset navigation (Access, Documentation, Software)
• Title
• Summary
How to Engage the Users?
The pieces that make rich metadata
• Period of data record
• Update cycle
• Scientific parameters (variables)
• Earth reference levels
How to Engage the Users?
The pieces that make rich metadata
• Times: temporal increment
• Data types: points or grids
• Geospatial coverage
• Source organizations
How to Engage the Users?
The pieces that make rich metadata
• Related Internet sites
• Publications
• Acknowledgement statement
How to Engage the Users?
The pieces that make rich metadata
• Volume: size of the dataset
• Data formats
• Related datasets in the NCAR collection
• Consulting contact (email and phone)
• A second pointer to Data Access
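The pieces of rich metadata listed above map naturally onto one record per dataset, which also makes it easy to check which pieces are still missing before a dataset is published. A minimal sketch; the `DatasetMetadata` class, its field names, and `missing_fields` are hypothetical and do not reflect the RDA's actual schema.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional, Tuple

@dataclass
class DatasetMetadata:
    """Hypothetical record mirroring the rich-metadata pieces on the slides."""
    title: str = ""
    summary: str = ""
    period: Optional[Tuple[str, str]] = None         # (start, end) of record
    update_cycle: str = ""                           # e.g. "monthly"
    variables: List[str] = field(default_factory=list)
    levels: List[str] = field(default_factory=list)  # Earth reference levels
    temporal_increment: str = ""                     # e.g. "6-hourly"
    data_types: List[str] = field(default_factory=list)  # "grid", "point"
    spatial_coverage: Optional[Tuple[float, float, float, float]] = None  # S, N, W, E
    source_organizations: List[str] = field(default_factory=list)
    related_sites: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)
    acknowledgement: str = ""
    volume_bytes: Optional[int] = None
    formats: List[str] = field(default_factory=list)
    related_datasets: List[str] = field(default_factory=list)
    contact: str = ""

def missing_fields(record):
    """Names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in (None, "", [])]

rec = DatasetMetadata(title="Example reanalysis", summary="Made-up summary.",
                      variables=["air temperature"], formats=["netCDF"],
                      contact="archive@example.org")
print(missing_fields(rec))
```

A completeness check like this could run whenever a dataset is added, so gaps in the knowledge package are caught before users see them.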
How to Engage the Users?
The complete data knowledge package, and the data cycle
Data cycle facts
• Datasets are re-published as new versions
• Datasets are corrected and extended in time or space
• Scientific analysis and publication occur at random points along the data cycle
Because of the data cycle, referencing data is more challenging than referencing traditional publications. How can you accurately trace and recover exactly what was used for a publication?
How to Engage the Users?
The complete data knowledge package, and the data cycle
• @ NCAR
  • We don’t have a systematic (organization-wide) way to handle the data cycle
    • We do not discard/delete old versions of data
    • Ad hoc approach
  • We are currently building version-tracking software
  • Versioning will be included in the DOI implementation
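One way to answer the tracing question is an append-only version history: each re-publication adds a version, old versions are never deleted, and a reference names a specific version so its exact files can be recovered. A minimal sketch under those assumptions; the classes and the `"<id> v<n>"` citation format are hypothetical, and in a DOI scheme each version might instead receive its own identifier.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(frozen=True)
class Version:
    number: int
    note: str               # what changed, e.g. "corrected 1991 values"
    files: Tuple[str, ...]  # files that make up this exact version

@dataclass
class VersionedDataset:
    """Append-only history: publishing adds a version, nothing is deleted."""
    dataset_id: str
    versions: List[Version] = field(default_factory=list)

    def publish(self, note, files):
        version = Version(len(self.versions) + 1, note, tuple(files))
        self.versions.append(version)
        return version

    def get(self, number):
        for version in self.versions:
            if version.number == number:
                return version
        raise KeyError(f"{self.dataset_id} has no version {number}")

    def citation(self, number=None):
        """Version-specific reference (default: latest); the format is hypothetical."""
        if not self.versions:
            raise ValueError(f"{self.dataset_id} has not been published")
        version = self.versions[-1] if number is None else self.get(number)
        return f"{self.dataset_id} v{version.number}"

ds = VersionedDataset("ds-example")
ds.publish("initial release", ["1990.nc", "1991.nc"])
ds.publish("corrected 1991 values, extended to 1992", ["1990.nc", "1991.nc", "1992.nc"])
print(ds.citation(1), ds.get(1).files)
print(ds.citation())
```

A paper that cited `ds-example v1` can still be traced to the original two files even after the correction and extension.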
How to Engage the Users?
Consultation: critical two-way communication
1. Benefits to the user
  • Guidance to the best available datasets
  • Turning research ideas into a list of required data sources
  • Software assistance
  • Customized data preparation if necessary
2. Benefits to archive stewardship
  • Detect ways to improve our search process
  • Learn about data requirement trends
  • Occasionally, acquire new data resources from scientific efforts
  • Learn about data problems we might have
How to Engage the Users?
Provide research tool support and documentation
• Provide users a starting point for data evaluation
  • Simple access programs in the languages used by the focus community
  • Pointers to applications (IDL, MATLAB, NCL, NCO, etc.)
  • Specific examples are VERY helpful!
• Software/applications and documentation must be maintained for the long term
  • This guarantees that users will understand the meaning of the data and have access to it
How to Engage the Users?
Provide research tool support and documentation
• @ NCAR
  • Remain aware of proprietary software traps
    • e.g. for documents: will .xls be viable 50 years from now, now that .xlsx is the standard? Is .pdf any better?
  • Prefer data file formats that define everything down to the byte/bit level
    • Computer code could always be written to access these
  • All kinds of reports, project descriptions, and documents that explain the intent of the data are vital for the long term
    • Use a dedicated document directory for each dataset
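When a format is documented down to the byte level, a short access program can be written in almost any language, long after the original software is gone. A minimal sketch, assuming a hypothetical fixed-length 16-byte record (8-byte ASCII station ID, big-endian int32 date, big-endian float32 temperature); the layout and values are made up, and the point is that the documentation alone is enough to decode it.

```python
import struct

# Hypothetical fixed-length record, fully specified at the byte level:
#   bytes 0-7   station identifier, ASCII, space padded
#   bytes 8-11  date as YYYYMMDD, big-endian signed 32-bit integer
#   bytes 12-15 temperature in kelvin, big-endian IEEE 754 32-bit float
RECORD = struct.Struct(">8sif")

def read_records(blob):
    """Decode every 16-byte record in a byte string."""
    if len(blob) % RECORD.size:
        raise ValueError("data length is not a multiple of the record size")
    out = []
    for station, date, temp in RECORD.iter_unpack(blob):
        out.append((station.decode("ascii").rstrip(), date, temp))
    return out

# Build two made-up records, then read them back.
blob = RECORD.pack(b"STN001  ", 20100517, 288.5) + RECORD.pack(b"STN002  ", 20100518, 290.25)
print(read_records(blob))
```

The leading `>` in the format string fixes byte order and disables padding, so the code matches the documented layout exactly instead of the host machine's defaults.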
How to Engage the Users?
Follow-up aid
• Notification service for significant dataset changes
  • If an error is corrected, all users of the data should be notified
• Subscription service
  • Inform users when new data is available
  • Prepare special products based on a user-determined template (e.g. past requests)
• @ NCAR
  • We have an automated notification service
    • Provided users register accurately
  • We do not have a subscription service yet
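An automated correction notice needs two things from registration: who downloaded the affected dataset, and a usable address for each of them. A minimal sketch, assuming a hypothetical download log of `(user_id, dataset_id)` pairs and a registry mapping user IDs to email addresses; the crude `"@"` check stands in for whatever validation a real service would apply.

```python
def users_to_notify(download_log, registry, dataset_id):
    """Sorted, de-duplicated emails of registered users who downloaded dataset_id."""
    emails = set()
    for user_id, ds in download_log:
        if ds != dataset_id:
            continue
        email = registry.get(user_id, "")
        # Crude check: registrations without a usable address are skipped,
        # which is why accurate registration matters.
        if "@" in email:
            emails.add(email)
    return sorted(emails)

def correction_notice(dataset_id, description):
    """Hypothetical message text sent to each affected user."""
    return (f"Subject: Correction to {dataset_id}\n\n"
            f"A dataset you downloaded has been corrected: {description}\n")

log = [("u1", "dataset-A"), ("u2", "dataset-A"), ("u1", "dataset-A"),
       ("u3", "dataset-B"), ("u4", "dataset-A")]
registry = {"u1": "a@example.org", "u2": "b@example.org",
            "u3": "c@example.org", "u4": "not-an-email"}
print(users_to_notify(log, registry, "dataset-A"))
print(correction_notice("dataset-A", "1991 values reprocessed"))
```

User u4 downloaded the data but cannot be reached, which illustrates the "provided users register accurately" caveat.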
• How to make and keep the archive content relevant to the users?
• How to engage the users?
http://dss.ucar.edu/