CEOS IDN Task Team May 9, 2002
IDN Agenda – 9 May 2002 • IDN Minutes from Darmstadt and IDN Profile are at http://idn.ceos.org • Data Policy • IDN Metrics (from GCMD node) • IDN Content History • Content Strategies • IDN Keywords • Authoring Tools • MD8 Status • Break
IDN Agenda – 9 May 2002 • MD8 Status – continued • MD8 Software Waiting List • IDN Collaborations • Collaborations: Operational Portals • IDN’s Use of ZOPE for Communications • MD9 and ISO 19115 • Lorant Czaran on ISO 19115 • Issues/Concerns
Data Policy Issues • Global Change Research Policy Statements from the Executive Office of the President - OSTP in 1991 • U.S. Global Change Research Program requires an early and continuing commitment to the establishment, maintenance, validation, description, accessibility and distribution of high-quality, long-term data sets. • Full and open sharing of the full suite of global data sets for all global change researchers is a fundamental objective. • Preservation of data needed for long-term global change research is required. • Data archives must include easily accessible information about the data holdings, including quality assessments, supporting ancillary information, and guidance and aids for locating and obtaining data. • National and international standards should be used to the greatest extent possible for media and for processing and communication of global data sets. • Data should be provided at the lowest possible cost to global change researchers in the interest of full and open access to data. • For those programs in which selected principal investigators have initial periods of exclusive data use, data should be made openly available as soon as they become widely useful.
Data Policy Issues • National Academy of Sciences (U.S.) • National Research Council • U.S. National Committee for CODATA • CODATA 2002 - Frontiers of Scientific and Technical Data (29 September - 3 October) • CGED – Dr. Anne Linn • Dr. Bernard Minster, Chairman • Upcoming workshop on Carbon Cycle data. • International Policies
Combined NSC, DPC, NEC Climate Change Policy Panel (Program Review) • Committee on Climate Change Science and Technology Integration (Chair: Secretary of Commerce; Vice-Chair: Secretary of Energy; Executive Director: OSTP Director) • Interagency Working Group on Climate Change Science and Technology (Chair: Deputy/Under Secretary of DOE; Vice-Chair: Deputy/Under Secretary of DOC; Secretary: OSTP AD for Climate Science & Technology) • Climate Change Science Program Office (Director: Commerce Detailee) • Climate Change Technology Program (Department of Energy)
Redirects to Other Data • Top redirects from DIFs (2001 / 2000) • NASA data/web pages: 631 / 244 • NOAA data/web pages: 355 / 389 • EOSDIS DAAC data/web pages: 349 / 290 • USGS data/web pages: 329 / 972 • CDIAC data/web pages: 40 / 90 • CCRS/CEONET/GeoConnections: 34 / n/a • International data/web pages: 118 / 193 • Other data/web pages (various): 466 / 2185
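The counts on this slide can be cross-checked by adding up each year's column. This Python sketch only re-adds the rows listed; it assumes the first figure in each row is 2001 and the second 2000, as the column header indicates, and it treats "n/a" as missing rather than zero. Because the slide lists top redirect destinations only, the sums are not necessarily all redirects from DIFs.

```python
# Redirect counts from DIFs, transcribed from the slide as (2001, 2000).
# None marks the "n/a" cell (no 2000 figure for CCRS/CEONET/GeoConnections).
REDIRECTS = {
    "NASA data/web pages": (631, 244),
    "NOAA data/web pages": (355, 389),
    "EOSDIS DAAC data/web pages": (349, 290),
    "USGS data/web pages": (329, 972),
    "CDIAC data/web pages": (40, 90),
    "CCRS/CEONET/GeoConnections": (34, None),
    "International data/web pages": (118, 193),
    "Other data/web pages (various)": (466, 2185),
}


def yearly_totals(table):
    """Sum each year's column, skipping n/a (None) cells."""
    total_2001 = sum(c2001 for c2001, _ in table.values() if c2001 is not None)
    total_2000 = sum(c2000 for _, c2000 in table.values() if c2000 is not None)
    return total_2001, total_2000


if __name__ == "__main__":
    t2001, t2000 = yearly_totals(REDIRECTS)
    print(f"2001: {t2001}, 2000: {t2000}")
```

Most of the drop between the two totals comes from the "Other data/web pages (various)" row.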
Decline in Usage? • GCMD web usage has been roughly flat over the past year. • Prior to 9-11, usage was showing a 1.6% increase for the year. • Since 9-11, usage has declined; overall GCMD usage is down 3% from the past year. • Numeric domains have increased by 18% over 2000, but .gov domains have declined by almost 47% since 2000. • FGDC Clearinghouse changed its filtering of Isite queries. • Is the decline due to more information being available (information saturation), 9-11, declining interest in climate issues, or other factors?
Decline in Usage? Web page hits have increased since Jan 2001, while the number of unique hosts has decreased. Possible reasons: • Domain contraction: more users on fewer hosting domains. AOL has 13.58% of the global ISP market; more government agencies are using a single domain (e.g., usgs.gov). • More users behind firewalls. • More hits by robots? (We block robots from DIFs but not from web pages.) • Fewer users making more hits.
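Tracking hits per unique host alongside the raw totals makes the "fewer users making more hits" pattern visible. This Python sketch is hypothetical: it assumes access logs in Common Log Format (client host as the first field) and uses made-up sample lines with documentation-reserved addresses, not GCMD data. A rising ratio is consistent with several of the reasons listed (shared domains, firewalls, robots), so on its own it cannot tell them apart.

```python
from collections import Counter


def host_stats(log_lines):
    """Return (hits, unique_hosts, hits_per_host) for access-log lines.

    Assumes Common Log Format, where the client host is the first
    whitespace-separated field; blank lines are skipped.
    """
    hosts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:
            hosts[fields[0]] += 1
    hits = sum(hosts.values())
    unique_hosts = len(hosts)
    hits_per_host = hits / unique_hosts if unique_hosts else 0.0
    return hits, unique_hosts, hits_per_host


# Made-up sample lines (hypothetical hosts and paths), for illustration only.
SAMPLE = [
    '192.0.2.10 - - [01/Apr/2002:10:00:00 -0500] "GET /index.html HTTP/1.0" 200 1024',
    '192.0.2.10 - - [01/Apr/2002:10:00:05 -0500] "GET /keywords.html HTTP/1.0" 200 2048',
    '192.0.2.10 - - [01/Apr/2002:10:01:00 -0500] "GET /difs.html HTTP/1.0" 200 512',
    '198.51.100.7 - - [01/Apr/2002:11:30:00 -0500] "GET /index.html HTTP/1.0" 200 1024',
]

if __name__ == "__main__":
    print(host_stats(SAMPLE))  # (4, 2, 2.0)
```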
Who Links to the GCMD? GCMD is #1 on Google search for “global change” (week of April 1, 2002) • Top 10 sites that link to GCMD (from Google) • PODAAC • NSIDC • WWW Virtual Library – Meteorology • AADC Metadata page • LBNL Energy Crossroads Climate Change Page • WHOI COFDL Laboratory • The Weather Pointers Page • NOAA/PFEL • Yahoo! Environment and Pollution (French) • NASADAACS page • Quaternary Web Resources (Colby College)
Who Links to the GCMD? • Google’s top 10 sites that link to GCMD (pt 1) (week of April 15, 2002): the highest Google-ranked sites among those linking to GCMD • GES DAAC Direct Links to MODIS Data • http://acdisx.gsfc.nasa.gov/data/dataset/MODIS/nofrills.html • GES DAAC MODIS Overview • http://daac.gsfc.nasa.gov/MODIS/overview.shtml • BakerHughes Industry Links-Labs, Research, Gov • http://www.bakerhughes.com/bakerhughes/resources/labs.htm • PCI Geomatics Industry Links • http://www.pcigeomatics.com/corpinfo/ind_links.html • VGL Data Links • http://www.umich.edu/~vgl/booksdata/data.html • SeaWiFS Evaluation Products • http://daac.gsfc.nasa.gov/data/dataset/SEAWIFS/06_New_Products/
Who Links to the GCMD? • Google’s top 10 sites that link to GCMD (pt 2) • DAAC Alliance Products & Services Page • http://nasadaacs.eos.nasa.gov/data/path12.html • Harvard University Environment and Sustainable Development • http://www.cid.harvard.edu/esd/esdlinks/esdlinks.html • Blackwell Publishers - Geospatial Datasets • http://www.blackwellpublishers.co.uk/geog/data.asp • RPI/Rensselaer Research Libraries • http://www.lib.rpi.edu/dept/library/html/resources/subjects/science/earth.html • RSMAS Library Internet Resources • http://www.rsmas.miami.edu/support/lib/library_links.html
Past Content Struggle • Non-existent to poor authoring tools. • Inadequate operations facility for interacting with the database. • Science Coordinators had to gather all data set info through an intensive, laborious process. • Little interest or cooperation by data centers or data set producers. • Result: prior to 1994, 3 DIFs were written per coordinator per month.
Present Content Strategy • Make improved authoring tools available. • Provide effective operations facility for validation/loading entries into database. • Provide capability to update “on the spot”. • Unsolicited entries now arriving - from data set producers, data center personnel, portal representatives, other international and interagency groups. • (Although still time-consuming to gather all information.) • Result: 35.3 DIFs/coordinator/month (April 2001 - March 2002)
Future Content Strategy • Make further improvements of authoring tools. • Further enhance Operations Client and QA facility for quality control and loading of entries. • Provide ownership of entries through portals and distributed nodes, and thus expect more contributions from partners. • Distribute final validation QA function beyond GCMD node - providing even more sense of ownership and responsibility. • Increase interest - sometimes initially by software developers and later by content providers.
Reasons for MD8 Operations Client – A Client User’s Perspective • One person performed all database administration tasks • Increased interest by partners to write and share metadata • Clumsy text-based interface introduced errors and increased maintenance
How MD8 Has Changed Our Mode of Operation for the Better • Database administration tasks shared by science coordinators. • Decreased time between submission of metadata and its entry into the database. • Allows users to perform tasks that previously required knowledge of command-line Oracle SQL. • Eases the process of managing personnel and valids. • Graphical User Interface.
Operations (OPS): Extracting content from the database
Content Strategy - Using the QA Ops • You gotta’ know when to hold ‘em, • Know when to load ‘em, • Know when to walk away, • Know when to run. • You never count your DIFs when they’re only in the table... • There’ll be time enough for countin’ when the loadin’s done. • You gotta’ know when to bold ‘em, • Know when to fold ‘em, ...
Data Center Bucket Revision • Original list created for HCIL Interface. • Buckets not adequate for Science Keyword Interface. • Overlapping Buckets • Minimal Quality Control of Original Buckets • Science coordinators created new bucket list. • Staff is in the process of matching each Data Center valid to a new bucket.
Data Center Bucket Revision • New buckets: Academic; Commercial; Consortia/Institutions; Multinational; Non-Government Agencies; Non-US Government; US Federal Agencies; US State and Regional Agencies • Old buckets: Commercial; DOC; DOD; DOE; DOI; EPA; Federal Agencies; Institutions; International; International Agencies; NASA; NOAA; Non-Profit Organizations; NSF; Regional Agencies; Universities; USDA; USGS; World Data Centers
Data Center Bucket Revision • US Federal Agencies: DOC, DOD, DOE, DOI, EPA, NASA, NSF, USDA, USGS, DOT
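The regrouping amounts to a lookup from each old entry to exactly one new bucket. The Python sketch below fills in only the US Federal Agencies group shown on this slide. Because matching the remaining Data Center valids is still in progress, anything else is reported as unmatched rather than guessed. The names `new_bucket` and `unmatched` are illustrative, not part of the GCMD software.

```python
# Agencies grouped under the new "US Federal Agencies" bucket, per the slide.
US_FEDERAL_AGENCIES = ("DOC", "DOD", "DOE", "DOI", "EPA",
                       "NASA", "NSF", "USDA", "USGS", "DOT")

# Hypothetical lookup table; the other groups are still being matched.
NAME_TO_BUCKET = {name: "US Federal Agencies" for name in US_FEDERAL_AGENCIES}


def new_bucket(name):
    """Return the new bucket for a name, or None if not yet matched."""
    return NAME_TO_BUCKET.get(name)


def unmatched(names):
    """List the names that still need a bucket assigned by staff."""
    return [n for n in names if new_bucket(n) is None]
```

For example, `unmatched(["NASA", "World Data Centers", "DOT"])` returns `["World Data Centers"]`. Keeping the table one-to-one avoids reintroducing the overlapping buckets that the revision set out to fix.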
Keyword Changes Guiding Principles: Follow the Rules! • Earth science parameters are a 4-tier controlled vocabulary for indexing and retrieving metadata. • Parameter hierarchy includes a 5th level uncontrolled “detailed variable”. • CATEGORY > TOPIC > TERM > VARIABLE > detailed variable • Example: EARTH SCIENCE > Solid Earth > Geochemistry > Chemical Weathering
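Here is a minimal Python sketch of the rule. It splits a keyword string on ">", requires the four controlled levels to match an entry in the controlled list, and accepts the optional fifth level as free text. It assumes every indexed keyword carries all four controlled levels. The one-entry `CONTROLLED` set and the function names are hypothetical stand-ins for the real GCMD valids.

```python
from typing import NamedTuple, Optional


class ScienceKeyword(NamedTuple):
    category: str
    topic: str
    term: str
    variable: str
    detailed_variable: Optional[str] = None  # 5th level, uncontrolled


# Hypothetical excerpt of the controlled list: only the slide's example.
CONTROLLED = {
    ("EARTH SCIENCE", "Solid Earth", "Geochemistry", "Chemical Weathering"),
}


def parse_keyword(path: str) -> ScienceKeyword:
    """Split 'CATEGORY > TOPIC > TERM > VARIABLE [> detailed variable]'."""
    parts = [p.strip() for p in path.split(">")]
    if len(parts) not in (4, 5) or not all(parts):
        raise ValueError(f"expected 4 or 5 non-empty levels: {path!r}")
    return ScienceKeyword(*parts)


def is_controlled(kw: ScienceKeyword) -> bool:
    """Check only the four controlled levels; the detailed variable is free text."""
    return (kw.category, kw.topic, kw.term, kw.variable) in CONTROLLED
```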
Keyword Process • Keywords requiring modification can usually be modified through database operations so that all affected DIFs are modified at the same time. • New keywords are simply added to the database and to the list of controlled keywords available in tools and interfaces. • Ensuring that existing DIFs are indexed with a new keyword is usually a manual process.
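Doing modifications as database operations means one statement updates every affected DIF together. This Python sketch uses an in-memory SQLite table with a hypothetical `dif_keywords` schema. The production database was Oracle, and its actual schema is not shown in these slides. For the example it uses the singular-to-plural forest/forests change from the keyword summary, assuming those Variables sit under a Term named Terrestrial Ecosystems. The DIF IDs are made up.

```python
import sqlite3


def make_demo_db():
    """In-memory stand-in for the keyword table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dif_keywords (entry_id TEXT, topic TEXT, term TEXT, variable TEXT)"
    )
    conn.executemany(
        "INSERT INTO dif_keywords VALUES (?, ?, ?, ?)",
        [
            ("DIF_A", "Biosphere", "Terrestrial Ecosystems", "forest"),
            ("DIF_B", "Biosphere", "Terrestrial Ecosystems", "forest"),
            ("DIF_C", "Solid Earth", "Geochemistry", "Chemical Weathering"),
        ],
    )
    conn.commit()
    return conn


def rename_variable(conn, term, old_variable, new_variable):
    """Rename a Variable under one Term for every DIF in a single transaction."""
    with conn:  # commits on success, rolls back if the UPDATE raises
        cur = conn.execute(
            "UPDATE dif_keywords SET variable = ? WHERE term = ? AND variable = ?",
            (new_variable, term, old_variable),
        )
    return cur.rowcount  # number of keyword rows changed
```

Scoping the `WHERE` clause by Term as well as Variable keeps a rename from touching an identically named Variable elsewhere in the hierarchy.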
Summary of Science Keyword Changes • Added 54 new Variables and 4 new Terms • Modified 39 Variables and 2 Terms • Currently: 1199 Variables in GCMD • Modified Marine Geophysics and Bathymetry Terms and Variables • Many keywords were not being used or could be reclassified under better Terms • Modified Terrestrial Ecosystem Variables from singular to plural (e.g., forest to forests) • Modified Marine Sediments Variables • Suggested by C. Moore at NOAA/NGDC/MGG • Changed Term Solar-Terrestrial Interactions to Sun-Earth Interactions • Sun-Earth is a more recognizable Term • More compatible with the home page redesign: takes up less “real estate” in the keyword hierarchy.
Keywords Added • Added Marine Biology, Marine Geochemistry, Marine Tectonics, Marine Volcanism, and Sea Surface Topography Terms and Variables to Oceans • Added Land Use/Land Cover Term and Variables to Human Dimensions • Added Geomorphology Term and Variables to Solid Earth • Added Natural Hazards Term and Variables to Human Dimensions • Added Aquatic Habitat and Demersal Habitat Variables to Biosphere • Added Forest Science/Conservation Variables to Biosphere (Canada) • Added Snow Chemistry (NSIDC)
Who Suggested Keyword Changes in 2001? • GCMD Staff • EOSDIS DAAC/DAAC Alliance data providers • MSFC/GHRC • NSIDC DAAC • GSFC DAAC • SEDAC • ORNL DAAC • ECS Science Office • NOAA/NGDC (marine geophysics) • Canada/CCRS (forest science) • IODE (marine biology, oceans)
Community Usage of GCMD Keywords • CEOS Interoperability Protocol (CIP) • uses Category > Topic > Term • EOSDIS Data Gateway (EDG) • uses Topic > Term > Variable • EOSDIS Core System (ECS) • uses all 5 levels, including detailed variable • Other Communities using GCMD Keywords • FGDC (although not required); many agencies using FGDC metadata use GCMD keywords as “theme thesaurus” • Canada and GeoConnections • Mercury • U. Cal. Natural Reserve System • NOAA • Semantic web • NASA’s Visible Earth (part of Earth Observatory) • DODS
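Each system consumes a different slice of the same hierarchy, so one GCMD keyword yields a different string in each. This Python sketch renders the levels each system uses, as listed on this slide. The `PROFILES` table and `project` function are illustrative only. The sketch also shows why a change at the Term level reaches all three systems while a Variable change does not affect CIP.

```python
LEVELS = ("category", "topic", "term", "variable", "detailed_variable")

# Levels each system uses, as listed on the slide.
PROFILES = {
    "CIP": ("category", "topic", "term"),
    "EDG": ("topic", "term", "variable"),
    "ECS": LEVELS,
}


def project(keyword, system):
    """Render only the levels a given system uses, skipping empty ones."""
    parts = [keyword[level] for level in PROFILES[system] if keyword.get(level)]
    return " > ".join(parts)


EXAMPLE = {
    "category": "EARTH SCIENCE",
    "topic": "Solid Earth",
    "term": "Geochemistry",
    "variable": "Chemical Weathering",
}

if __name__ == "__main__":
    for system in PROFILES:
        print(system, "->", project(EXAMPLE, system))
```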
Keyword Process ECS and EDG Notification Policy • ECS and EDG are notified of GCMD-approved science keyword changes prior to implementation • This gives ECS and EDG time to notify science and data teams of potential software changes. New keywords added to the GCMD are usually not a problem; modification of existing keywords is more problematic.
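The distinction between additions and modifications can be made mechanical by diffing two keyword releases before notifying ECS and EDG. In this hypothetical Python sketch, additions appear only in the new release, while a modification such as forest to forests shows up as a removal plus an addition. The removed paths are the ones existing software may still reference. It illustrates the idea and is not the GCMD's actual notification procedure.

```python
def classify_changes(old_keywords, new_keywords):
    """Compare two keyword releases and return (added, removed).

    A rename shows up as one removal plus one addition; removed paths are
    where existing DIFs or software may still hold the old keyword.
    """
    old, new = set(old_keywords), set(new_keywords)
    return sorted(new - old), sorted(old - new)


# Illustrative releases (hypothetical excerpts, not the full GCMD list).
OLD = {
    "EARTH SCIENCE > Biosphere > Terrestrial Ecosystems > forest",
    "EARTH SCIENCE > Solid Earth > Geochemistry > Chemical Weathering",
}
NEW = {
    "EARTH SCIENCE > Biosphere > Terrestrial Ecosystems > forests",
    "EARTH SCIENCE > Solid Earth > Geochemistry > Chemical Weathering",
}

if __name__ == "__main__":
    added, removed = classify_changes(OLD, NEW)
    print("added:", added)
    print("removed:", removed)
```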
Authoring Tools Current Authoring Tools include: • DIFbuilder • DIFbuildlet • ModDIFbuilder • SERFbuilder • ModSERFbuilder • ESIP DIFbuilder • JCADM DIFbuilder • Usage of the Authoring Tools has increased from outside partners (DAACs, GLOBEC and AMD)