Data Models for Ecological Databases
John Porter, Department of Environmental Sciences, University of Virginia
DBMS Types
• File system-based
• Hierarchical
• Network
• Relational
• Object-oriented
You’ve seen these before; now let’s go into more detail.
File-System Based
[Diagram: a directory containing files]
• very simple and easy to set up
• inefficient
• few capabilities
Hierarchical
[Diagram: a tree of Project, Datasets, Investigators, Variables, Locations, Codes, Methods]
• efficient
• not very general
• e.g. phylogenetic structures, geographical images
Network Database
[Diagram: Projects, Datasets, and Locations connected by links]
Links are hard-coded into the database; they are not a property of the data.
• very flexible
• unwieldy to modify
• not widely used
Relational Database
[Diagram: Projects, Datasets, and Locations tables linked through shared Location_id and Data_id attributes]
Linkages are through the properties of the data itself, not hard-coded.
• widely used, mature
• table-oriented
• restricted range of structures
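To show what "linkage through the properties of the data" can look like in practice, here is a minimal sketch using Python's built-in sqlite3 module. The table layout is a simplified assumption (only Datasets and Locations, joined on Location_id), and the site names and dataset titles are made up for illustration.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (location_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE datasets ("
    " data_id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " location_id INTEGER REFERENCES locations(location_id))"
)

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO locations VALUES (?, ?)",
    [(1, "Site A"), (2, "Site B")],
)
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?)",
    [(10, "Tide levels", 1), (11, "Bird counts", 2), (12, "Soil cores", 1)],
)

# The link between the tables is not stored anywhere as a pointer:
# it exists only because both tables hold matching location_id values.
rows = conn.execute(
    "SELECT d.title, l.name"
    " FROM datasets AS d"
    " JOIN locations AS l ON d.location_id = l.location_id"
    " ORDER BY d.data_id"
).fetchall()

for title, site in rows:
    print(title, "@", site)
```

Because the relationship is expressed in the query rather than in the storage, a new kind of link (say, datasets to investigators) only needs a shared attribute, not a change to hard-coded pointers.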
Object Oriented
[Diagram: an object combining a data structure with its methods]
Complex data structures, along with the methods to use the data, are in the database.
• developing, few commercial implementations
• diverse structures
• extensible
Data Modeling
• Database Management Systems are highly flexible
• Good: they can do a lot!
• Bad: they have to be told how to do it!
• A Database Management System is the CANVAS; the DATA MODEL is the painting.
Data Modeling
• Data modeling is used to develop the structures used in a database
• Your data model affects:
  • the reliability of the data
  • the efficiency and speed of queries
  • the complexity of the database
• Data modeling is an art, not a science!
Some Terminology: Tables contain attributes or fields (columns) and multiple observations or tuples (rows)
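As a small illustration of these terms, the sketch below builds one table with Python's built-in sqlite3 module and then lists its attributes (columns) and tuples (rows). The table name `observation` and the sample rows are made up; the column names are borrowed from the species observation example in this presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation (genus TEXT, species TEXT, observer TEXT, date TEXT)"
)
# Hypothetical observations, for illustration only.
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?)",
    [
        ("Quercus", "alba", "J. Smith", "2001-05-14"),
        ("Quercus", "rubra", "J. Smith", "2001-05-14"),
    ],
)

cursor = conn.execute("SELECT * FROM observation")
attributes = [col[0] for col in cursor.description]  # attribute (column) names
tuples = cursor.fetchall()                           # one Python tuple per row

print("attributes:", attributes)
for row in tuples:
    print("tuple:", row)
```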
Flat-file
[Diagram: a single Species Observation table with attributes Genus, Species, Observer, CommonName, and Date]
Notation: tables in boxes, attributes in ovals
Normalization
• One widely used approach for reducing errors within a database is to normalize your data structures
• Normalization is the process of eliminating duplicate or redundant information
Two-table Relational Database
[Diagram: an Observation table (Spec_code, Date, Observer) linked through Spec_code to a Species table (Spec_code, Genus, Species, CommonName)]
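The move from the flat file to two tables can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sample rows, the `normalize` helper, and the `SP001`-style code format are hypothetical, not taken from the presentation.

```python
# Flat file: genus, species, and common name are repeated on every row.
flat_rows = [
    {"genus": "Quercus", "species": "alba", "common_name": "white oak",
     "date": "2001-05-14", "observer": "J. Smith"},
    {"genus": "Quercus", "species": "alba", "common_name": "white oak",
     "date": "2001-06-02", "observer": "A. Jones"},
    {"genus": "Acer", "species": "rubrum", "common_name": "red maple",
     "date": "2001-06-02", "observer": "A. Jones"},
]


def normalize(rows):
    """Split flat rows into a species table and an observation table."""
    codes = {}         # (genus, species) -> spec_code
    species = {}       # spec_code -> species attributes, stored once
    observations = []  # one entry per observation, referring to spec_code
    for row in rows:
        key = (row["genus"], row["species"])
        if key not in codes:
            code = f"SP{len(codes) + 1:03d}"  # hypothetical code format
            codes[key] = code
            species[code] = {
                "genus": row["genus"],
                "species": row["species"],
                "common_name": row["common_name"],
            }
        observations.append(
            {"spec_code": codes[key], "date": row["date"], "observer": row["observer"]}
        )
    return species, observations


species, observations = normalize(flat_rows)
print(len(species), "species rows,", len(observations), "observation rows")
```

After the split, a misspelled common name can only occur (and be corrected) in one place, which is the error-reduction benefit normalization is aiming for.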
Complex Data Model
[Diagram: Species, Observations, Specimens, Images, Locations, Observers, and Internet Links tables]
Notation: linkages are one-to-one or one-to-many
Data Model for Metadata at the VCR/LTER
[Diagram labels: Personnel, Projects, Mailing Lists, Dataset, Dataset Locations, Variable, Variable Codes]
Notation: optional linkage, mandatory linkage
“Beanstalk” & “String of Pearls”
• Metadata
• methods
• units
• Location Table
• Lat/Lon
Beanstalk / String of Pearls
• Highly normalized
• Extremely flexible, capable of handling many different kinds of data
• Inefficient
• Queries can be very slow
• Can require large amounts of space
Why is there no perfect data model for ecological data?
• One of the reasons data modeling is an ART, not a SCIENCE, is that ecologists use data in many different ways
• Data that is perfectly formed for one kind of analysis may be unusable for another
• Different analytical software may be used
Why No Perfect Model?
• Generally, ecologists want to use data in “flat file” formats that combine all the tables containing data into a single, denormalized “spreadsheet”-type format, but even that format can vary between researchers
• ClimDB needed to support single-parameter and multiple-parameter formats to meet researcher needs
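The difference between a single-parameter and a multiple-parameter layout can be illustrated with a short Python sketch. This shows only the general idea, not ClimDB's actual formats: the station name, parameter names, values, and the `to_wide` helper are all hypothetical.

```python
from collections import defaultdict

# Single-parameter ("long") layout: one value per row.
long_rows = [
    ("Station1", "2001-05-14", "air_temp", 18.2),
    ("Station1", "2001-05-14", "precip", 0.0),
    ("Station1", "2001-05-15", "air_temp", 19.5),
    ("Station1", "2001-05-15", "precip", 4.1),
]


def to_wide(rows):
    """Pivot long rows into one spreadsheet-style row per station and date."""
    grouped = defaultdict(dict)
    for station, date, parameter, value in rows:
        grouped[(station, date)][parameter] = value
    return [
        {"station": station, "date": date, **params}
        for (station, date), params in sorted(grouped.items())
    ]


# Multiple-parameter ("wide") layout: one column per parameter.
wide_rows = to_wide(long_rows)
for row in wide_rows:
    print(row)
```

The same measurements sit behind both layouts; which one is convenient depends on the analysis and the software a researcher uses, which is why a single stored model often has to feed several export formats.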