Ecoinformatics Workshop Summary. SEEK, LTER Network Main Office University of New Mexico Albuquerque, NM. Topics covered. Grid networks – Ecogrid Workflow systems – Kepler / Ptolemy II Metadata compilers – Morpho Databases – MySQL, Metacat, DBDesigner QA/QC – SAS, S-Plus, Access, Excel
Ecoinformatics Workshop Summary SEEK, LTER Network Main Office University of New Mexico Albuquerque, NM
Topics covered • Grid networks – Ecogrid • Workflow systems – Kepler / Ptolemy II • Metadata compilers – Morpho • Databases – MySQL, Metacat, DBDesigner • QA/QC – SAS, S-Plus, Access, Excel • Interactive & Dynamic web sites – Dreamweaver
SEEK EcoGrid • Goal: standardize interfaces (using web and grid services) • We have standardized data via EML • Integrate diverse data networks from ecology, biodiversity, and environmental sciences • Grid-standardized interfaces • Uniform interface to: • Metacat, SRB, DiGIR, Xanthoria, etc. • Anyone can implement these interfaces • Hides complexity of underlying systems • Metadata-mediated data access • Supports multiple metadata standards • EML, Darwin Core as foci • Computational services • Pre-defined analytical services • On-the-fly analytical services
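The idea of a uniform interface that hides the complexity of the underlying systems can be sketched in a few lines. This is only an illustration under stated assumptions: the `EcoGridNode` protocol, the `query` method, and both node classes are hypothetical names invented here, not the actual EcoGrid, Metacat, or SRB APIs, and the data are made up.

```python
from typing import Protocol


class EcoGridNode(Protocol):
    """Hypothetical uniform interface; not the actual EcoGrid API."""

    def query(self, keyword: str) -> list[str]: ...


class MetacatNode:
    """Stand-in for a Metacat-backed node (in-memory data, for illustration)."""

    def __init__(self, docs: dict[str, str]) -> None:
        self._docs = docs  # document id -> title

    def query(self, keyword: str) -> list[str]:
        return [doc_id for doc_id, title in self._docs.items()
                if keyword.lower() in title.lower()]


class SrbNode:
    """Stand-in for an SRB-backed node with a different internal layout."""

    def __init__(self, records: list[tuple[str, str]]) -> None:
        self._records = records  # (identifier, description) pairs

    def query(self, keyword: str) -> list[str]:
        return [ident for ident, desc in self._records
                if keyword.lower() in desc.lower()]


def search_all(nodes: list[EcoGridNode], keyword: str) -> list[str]:
    """Query every node through the same interface and merge the results."""
    results: list[str] = []
    for node in nodes:
        results.extend(node.query(keyword))
    return results


nodes = [
    MetacatNode({"doc.1": "Plant cover survey", "doc.2": "Soil moisture"}),
    SrbNode([("srb.7", "Rodent trapping"), ("srb.9", "Plant phenology")]),
]
matches = search_all(nodes, "plant")
print(matches)
```

The client code in `search_all` never needs to know how each node stores its records, which is the point of standardizing the interface rather than the storage.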
EcoGrid client interactions • Modes of interaction • Client-server • Fully distributed • Peer-to-peer • EcoGrid Registry • Node discovery • Service discovery • Aggregation services • Centralized access • Reliability • Data preservation
Kepler: scientific workflows • EML provides semi-automated data binding • Scientific workflows represent knowledge about the process; Kepler captures this knowledge
Metadata: what are they, and why should they be created?
Metadata Example In front of you are two tuna cans. How do you decide which one to buy?
Metadata Example Metadata helps you decide which one to get!
Ecological Metadata Language • Adopted by LTER Information Management • Metadata specification developed by the ecology discipline for the ecology discipline • Based on prior work of the Ecological Society of America and others (Michener et al., 1997) • Seven years in development – 14 versions • EML 2.0.1 • Implemented as an XML Schema • Supports four separate modules • Dataset • Citation • Software • Protocol
Associated Metadata • Data Set • Data Table • XML files
Morpho • provides a way for ecologists to share data by defining a common structure to document their data • uses an XML format to create the common structure.
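To make the "common structure in XML" concrete, here is a minimal sketch that reads a small EML-style fragment with Python's standard library. The fragment is simplified for illustration, has not been validated against the EML 2.0.1 schema, and every value in it (package ID, title, creator) is made up. It is not output from Morpho.

```python
import xml.etree.ElementTree as ET

# Simplified, EML-style fragment for illustration only. It has not been
# validated against the EML 2.0.1 schema, and all values are made up.
EML_TEXT = """\
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1"
         packageId="example.1.1" system="example">
  <dataset>
    <title>Example vegetation cover data</title>
    <creator>
      <individualName><surName>Doe</surName></individualName>
    </creator>
  </dataset>
</eml:eml>
"""

root = ET.fromstring(EML_TEXT)
dataset = root.find("dataset")  # child elements carry no namespace here
title = dataset.findtext("title")
creator = dataset.findtext("creator/individualName/surName")
package_id = root.get("packageId")
print(f"{package_id}: {title} (creator: {creator})")
```

Because the structure is shared, the same few lines would work on any data package that follows it, which is what lets tools search and exchange metadata automatically.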
Morpho – entering metadata Again, choose from the earlier entries or another data package, or enter new information
Morpho – metadata Once data are uploaded to Morpho, you can edit the data or metadata. This is the window that appears after you press Finish in the Morpho wizard.
Databases • Small scale, on a local computer – Access • Larger scale, on a server – MySQL
Example - why use a database? • Coordinate field data collection and data entry forms

DATE      SITE  WEB  PLOT  QD  SPECIES  OBS  COVER  HEIGHT  COUNT  PHEN  COMMENTS
2/3/1999  FPC   1    E     1   ERPU8    1    0.5    4       13     V     NA
2/3/1999  FPC   1    E     1   ERPU8    2    0.1    2       16     V     NA
2/3/1999  FPC   1    E     1   GUSA2    1    0.01   4       2      V     NA
2/3/1999  FPC   1    E     1   GUSA2    2    0.1    5       1      V     NA
2/3/1999  FPC   1    E     1   GUSA2    3    0.5    12      1      V     NA
Database example • Divide the flat table from the previous slide into four tables: • Location table • Species table • Visit table • Observation table
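The split into four tables can be sketched with SQLite from Python's standard library. The slides name the four tables but do not say which columns go where, so the column assignment, the key columns, and the column name `cnt` (used for COUNT) are assumptions made for this illustration.

```python
import sqlite3

# Rows copied from the example table on the slide.
ROWS = [
    ("2/3/1999", "FPC", 1, "E", 1, "ERPU8", 1, 0.5, 4, 13, "V", "NA"),
    ("2/3/1999", "FPC", 1, "E", 1, "ERPU8", 2, 0.1, 2, 16, "V", "NA"),
    ("2/3/1999", "FPC", 1, "E", 1, "GUSA2", 1, 0.01, 4, 2, "V", "NA"),
    ("2/3/1999", "FPC", 1, "E", 1, "GUSA2", 2, 0.1, 5, 1, "V", "NA"),
    ("2/3/1999", "FPC", 1, "E", 1, "GUSA2", 3, 0.5, 12, 1, "V", "NA"),
]

# Assumed split of columns across the four tables (not from the slides).
SCHEMA = """
CREATE TABLE location (id INTEGER PRIMARY KEY, site TEXT, web INTEGER,
                       plot TEXT, qd INTEGER, UNIQUE (site, web, plot, qd));
CREATE TABLE species (code TEXT PRIMARY KEY);
CREATE TABLE visit (id INTEGER PRIMARY KEY, date TEXT,
                    location_id INTEGER REFERENCES location(id),
                    UNIQUE (date, location_id));
CREATE TABLE observation (id INTEGER PRIMARY KEY,
                          visit_id INTEGER REFERENCES visit(id),
                          species_code TEXT REFERENCES species(code),
                          obs INTEGER, cover REAL, height REAL,
                          cnt INTEGER, phen TEXT, comments TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

for date, site, web, plot, qd, sp, obs, cover, height, cnt, phen, com in ROWS:
    # Store each location, visit, and species only once.
    conn.execute("INSERT OR IGNORE INTO location (site, web, plot, qd) "
                 "VALUES (?, ?, ?, ?)", (site, web, plot, qd))
    loc_id = conn.execute(
        "SELECT id FROM location WHERE site=? AND web=? AND plot=? AND qd=?",
        (site, web, plot, qd)).fetchone()[0]
    conn.execute("INSERT OR IGNORE INTO visit (date, location_id) "
                 "VALUES (?, ?)", (date, loc_id))
    visit_id = conn.execute(
        "SELECT id FROM visit WHERE date=? AND location_id=?",
        (date, loc_id)).fetchone()[0]
    conn.execute("INSERT OR IGNORE INTO species (code) VALUES (?)", (sp,))
    # Every row of the flat table becomes one observation.
    conn.execute(
        "INSERT INTO observation (visit_id, species_code, obs, cover, "
        "height, cnt, phen, comments) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (visit_id, sp, obs, cover, height, cnt, phen, com))

counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("location", "species", "visit", "observation")}
print(counts)
```

With this layout the date, site, web, plot, and quadrat are stored once instead of being retyped on every row, which removes a common source of data entry errors.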
Database example [Figure: diagram of the Location, Visit, Observation, and Species tables]
QA/QC QC • Designing data sheets • Data entry using • Validation rules • Filters • Lookup tables • Validate entered data • Double entry • Prior data • Filters
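The QC steps listed above (lookup tables, validation rules, double entry) can be sketched as follows. The species lookup set, the accepted COVER range, and the function names are hypothetical choices for this example, not rules taken from the workshop.

```python
# Hypothetical lookup table and validation rule, for illustration only.
VALID_SPECIES = {"ERPU8", "GUSA2"}  # lookup table of accepted codes
COVER_RANGE = (0.0, 100.0)          # assumed plausible range for COVER


def validate(record: dict) -> list[str]:
    """Return a list of problems found in one entered record."""
    problems = []
    if record["species"] not in VALID_SPECIES:
        problems.append(f"unknown species code {record['species']!r}")
    low, high = COVER_RANGE
    if not low <= record["cover"] <= high:
        problems.append(f"cover {record['cover']} outside {low}-{high}")
    return problems


def double_entry_diff(first: list[dict], second: list[dict]) -> list[int]:
    """Return the row indices where two independent entries disagree."""
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


entry_a = [{"species": "ERPU8", "cover": 0.5}, {"species": "GUSA2", "cover": 0.1}]
entry_b = [{"species": "ERPU8", "cover": 0.5}, {"species": "GUSA2", "cover": 1.0}]

issues = validate({"species": "GUSA3", "cover": 0.5})
mismatches = double_entry_diff(entry_a, entry_b)
print(issues, mismatches)
```

In Access or MySQL the same checks would normally live in the database itself (validation rules, foreign keys to lookup tables), so that bad values are rejected at entry time.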
QA/QC QA • Graphics • Box plots • Scatterplots • Normal probability plots • Formal statistical methods • Grubbs' test (Edwards 2000)
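As a rough illustration of the formal approach, the Grubbs' test statistic is the largest absolute deviation from the sample mean divided by the sample standard deviation, G = max|x − mean| / s. The sketch below computes only G and the most suspect value, using the HEIGHT column from the example table. The critical value, which depends on the sample size and the chosen significance level, is not computed here and would have to come from a published table or a statistics package.

```python
import statistics


def grubbs_statistic(values):
    """Return (G, suspect), where G = max|x - mean| / s."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    suspect = max(values, key=lambda x: abs(x - mean))
    return abs(suspect - mean) / sd, suspect


heights = [4, 2, 4, 5, 12]  # HEIGHT column from the example table
g, suspect = grubbs_statistic(heights)
print(f"G = {g:.3f} for suspect value {suspect}")
```

A large G only marks the value as worth a second look; whether it is an error or a genuine extreme still has to be decided from the field sheets.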
QA/QC The goal of QA is NOT to eliminate outliers! Rather, we wish to detect unusual & extreme values.
[Figure: distribution with the mean µ and the limits µ − 3σ and µ + 3σ marked]
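The µ ± 3σ limits can be used as a simple screening rule, sketched below with made-up measurements. In keeping with the goal of detecting rather than eliminating outliers, the code only reports the flagged values and leaves the data unchanged. Note that an extreme value inflates the estimated σ, so in small samples this rule can fail to flag it.

```python
import statistics

# Hypothetical measurements; the last value is deliberately extreme.
values = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10,
          10, 11, 9, 10, 12, 8, 10, 11, 9, 50]

mu = statistics.mean(values)
sigma = statistics.stdev(values)  # sample standard deviation
lower, upper = mu - 3 * sigma, mu + 3 * sigma

# Flag values outside the band; the data themselves are left untouched.
flagged = [(i, v) for i, v in enumerate(values) if not lower <= v <= upper]
print(f"mean={mu:.2f}, sd={sigma:.2f}, flagged={flagged}")
```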
What did I learn? • Know your subject. Have a plan. • A little planning in advance will save a lot of headaches (and time, money, and missed opportunities) later. • Unorganized data can quickly wall you off from the increasingly collaborative and computerized research world.