Outline of presentation • Data Management • Compare database • Versus spreadsheets, word processor docs, … • Relational Databases • Parts & Terms • tables, forms, queries, reports (we’ll skip reports) • fields, records, keys • Relationships • Linking Tables
[Figure: sampling layout with three panels, Mature Forest, Old Growth, and Clear Cut, each containing units labeled A, B, C with items numbered 1, 2, 3]
Data Management Issues • Organization! • Data Entry (error-prone process) • Quality Control – Quality Assurance • Metadata (possible data values, how collected, etc.) • Tracking specimens, samples • Data retrieval
Spreadsheet vs. Relational Databases • Relational Database • Data entry • Data storage • Data retrieval • Spreadsheet • Manipulating Data (e.g. Pivot tables) • Summarizing & Presenting Data (e.g. graphing) • (Formatting data for statistics programs)
Embedded Information Spreadsheets = “flat files” Databases = “multi-dimensional”
It is possible to sort and filter records in the spreadsheet (look under DATA in the menu bar). Filtering temporarily removes all unwanted records from view. This is also possible in a database, with some more sophisticated options available.
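As a rough sketch of what sorting and filtering look like in a database query, the example below uses SQLite through Python's built-in sqlite3 module (a stand-in for MS Access, chosen only because it runs anywhere). The table name, field names, and values are hypothetical and invented for illustration.

```python
import sqlite3

# Hypothetical table and values, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collections ("
    "id INTEGER PRIMARY KEY, species_code TEXT, habitat TEXT, n_individuals INTEGER)"
)
conn.executemany(
    "INSERT INTO collections (species_code, habitat, n_individuals) VALUES (?, ?, ?)",
    [
        ("ABC", "Old Growth", 4),
        ("XYZ", "Clear Cut", 1),
        ("ABC", "Clear Cut", 7),
        ("DEF", "Old Growth", 2),
    ],
)

# Filter: keep only the Old Growth records (WHERE).
# Sort: largest number of individuals first (ORDER BY ... DESC).
rows = conn.execute(
    "SELECT species_code, n_individuals FROM collections "
    "WHERE habitat = ? ORDER BY n_individuals DESC",
    ("Old Growth",),
).fetchall()
print(rows)
```

As with a spreadsheet filter, the unwanted records are only hidden from the result; nothing is removed from the table itself.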
It is very easy to copy cells or entire rows of data in spreadsheets, but more difficult in databases (one of the few advantages of spreadsheets over databases). However, if one needs to copy down a lot of data in a database, that is usually a sign that the database is not well “normalized” (discussed later).
It is easy to search for and replace words in spreadsheets. This is also possible in databases, but with more sophisticated search and replace options.
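One way a database search and replace can be more selective is sketched below, again using SQLite via Python's sqlite3 module rather than MS Access. The `sites` table and its values are hypothetical. The replacement is applied only to records that match a pattern, and the database reports how many records were changed.

```python
import sqlite3

# Hypothetical table and values, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (name TEXT)")
conn.executemany(
    "INSERT INTO sites VALUES (?)",
    [("Old Growth A",), ("Old-Growth B",), ("Clear Cut C",)],
)

# Replace the hyphenated spelling, but only in records that start with it.
cur = conn.execute(
    "UPDATE sites SET name = REPLACE(name, 'Old-Growth', 'Old Growth') "
    "WHERE name LIKE 'Old-Growth%'"
)
changed = cur.rowcount  # number of records the UPDATE modified

names = [row[0] for row in conn.execute("SELECT name FROM sites ORDER BY name")]
print(changed, names)
```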
The “auto-fill” option in a spreadsheet completes a word it recognizes from entries immediately above the current one. In databases one can use a “lookup table” (discussed later) for a full list of values (e.g. names), which might not yet appear in the data set.
Word processor files are the least capable of filtering, finding & replacing, and assisting data entry compared with spreadsheets and databases.
Although with proper formatting a word processor document can look like a spreadsheet or database table, one cannot manipulate the rows and columns in the same way.
Archiving Data • As an aside, the most durable formats for archiving data are plain text files: tab-delimited (.txt) or comma-separated values (.csv) • Although programs and formats come and go, virtually all database, spreadsheet, and word processor programs can read .txt and .csv files
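A minimal sketch of exporting a table to comma-separated text is shown below, using Python's csv and sqlite3 modules (not MS Access, which has its own export command). The `species` table and its values are hypothetical. The text is written to an in-memory buffer here; to write a real file one would instead open it with `open(path, "w", newline="")`.

```python
import csv
import io
import sqlite3

# Hypothetical table and values, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (species_code TEXT PRIMARY KEY, common_name TEXT)")
conn.executemany(
    "INSERT INTO species VALUES (?, ?)",
    [("ABC", "Example beetle"), ("XYZ", "Example moth, large")],
)

cur = conn.execute("SELECT species_code, common_name FROM species ORDER BY species_code")

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([col[0] for col in cur.description])  # header row: field names
writer.writerows(cur)                                 # one line per record
csv_text = buffer.getvalue()

# Reading the text back shows that nothing was lost, including the value
# that contains a comma (the csv module quotes it automatically).
restored = list(csv.reader(io.StringIO(csv_text)))
print(restored)
```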
Comparing databases, spreadsheets, and documents. Notes: (1) Auto-complete is done in very different ways. (2) Not linking in a true relational sense, except through a database. (3) Properly set up (normalized) data can be back-filled.
Tabulations • Matrix-style synopsis of data • “crosstab query” in MS Access • “pivot table” in MS Excel
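The idea of a matrix-style tabulation can be sketched in plain SQL with one conditional sum per column. This is not the Access crosstab query or the Excel pivot table itself, only an illustration of the same result using SQLite via Python; the table, fields, and values are hypothetical.

```python
import sqlite3

# Hypothetical table and values, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collections (species_code TEXT, habitat TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO collections VALUES (?, ?, ?)",
    [
        ("ABC", "Old Growth", 4),
        ("ABC", "Clear Cut", 7),
        ("XYZ", "Clear Cut", 1),
        ("ABC", "Old Growth", 2),
    ],
)

# Rows = species, columns = habitats, cells = total individuals.
crosstab = conn.execute(
    """
    SELECT species_code,
           SUM(CASE WHEN habitat = 'Old Growth' THEN n ELSE 0 END) AS old_growth,
           SUM(CASE WHEN habitat = 'Clear Cut'  THEN n ELSE 0 END) AS clear_cut
    FROM collections
    GROUP BY species_code
    ORDER BY species_code
    """
).fetchall()
print(crosstab)
```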
Relational Databases • Four major components • Tables – these are where ALL data reside • Queries – select subsets of data (retrieve data) • Forms – “windows” into data tables (views of data) • Reports – summaries of data (formatted synopses)
Tables All data in relational databases reside in tables. Queries, forms, and reports are just convenient ways of looking at the data in the tables. As we shall soon see, the size and type of data that can be entered into a table can be restricted to improve efficiency and catch errors. Two or more tables that share a field can also be linked to draw information from all of the related tables.
Some terminology: Each square is a “cell” of data
Another way to enter design view is to click on the table name once (so that it is highlighted), then click on the design view icon. Or right-click the table name and choose design view.
The DESIGN VIEW of a table is where one dictates the type and range of data that can be entered into each field. This can include formatting (such as capitalization), default values, and valid/non-valid entries.
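The kind of rules set in a table's design can be sketched as a table definition with a default value and validity checks. The example uses SQLite via Python's sqlite3 module as a stand-in for Access's DESIGN VIEW; the table, fields, and allowed values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the habitat field has a default value and a list of
# valid entries; the count field must be a non-negative number.
conn.execute(
    """
    CREATE TABLE collections (
        id INTEGER PRIMARY KEY,
        habitat TEXT NOT NULL DEFAULT 'Mature Forest'
            CHECK (habitat IN ('Mature Forest', 'Old Growth', 'Clear Cut')),
        n_individuals INTEGER NOT NULL CHECK (n_individuals >= 0)
    )
    """
)

# No habitat given, so the default value is filled in.
conn.execute("INSERT INTO collections (n_individuals) VALUES (3)")

# An entry outside the list of valid values is refused.
try:
    conn.execute("INSERT INTO collections (habitat, n_individuals) VALUES ('Swamp', 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT habitat, n_individuals FROM collections").fetchall()
print(rejected, stored)
```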
“Lookup” is used to create a list of possible values that a field can take. This example uses a list of values in the field’s properties settings (in DESIGN VIEW). In DATA VIEW the field will have a drop down list of values (“Combo Box”). The full value will be filled in when the first letter is typed.
In this example the “lookup” is set to the list of species codes in the table “Species”
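A lookup that draws its list from another table, and fills in the full value from the first letters typed, might be sketched as follows. This only imitates the behavior of a combo box; it is not how Access implements it. The `complete` helper and the species codes are hypothetical.

```python
import sqlite3

# Hypothetical lookup table, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (species_code TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO species VALUES (?)", [("ABC",), ("DEF",), ("XYZ",)])

# The drop-down list: every code in the lookup table, whether or not it
# has been used in the data set yet.
choices = [
    row[0]
    for row in conn.execute("SELECT species_code FROM species ORDER BY species_code")
]


def complete(typed, choices):
    """Return the first choice that starts with the typed text, or None."""
    typed = typed.lower()
    for choice in choices:
        if choice.lower().startswith(typed):
            return choice
    return None


print(complete("d", choices))
print(complete("q", choices))
```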
Miscellaneous • New records are always added at the end of the table (many people find this annoying) • “Esc” once to undo current typing • “Esc” twice to undo the whole record • Changes are saved when you move off the cell • No need to save the data in a database after any changes (formatting changes must be saved)
Linking Tables • Fields common between two or more tables can link • Keyed fields prevent duplicate entries • Keyed fields determine relationships between tables • Linked tables can reduce data entry and storage needs (using an idea called data normalization)
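The saving from linked tables can be sketched with a query that joins two tables on their common field. Each species name is stored once in one table, and the collection records carry only the short code. SQLite via Python's sqlite3 module is used here in place of Access; all table names, field names, and values are hypothetical.

```python
import sqlite3

# Hypothetical tables and values, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (species_code TEXT PRIMARY KEY, common_name TEXT)")
conn.execute(
    "CREATE TABLE collections (id INTEGER PRIMARY KEY, species_code TEXT, habitat TEXT)"
)
conn.executemany(
    "INSERT INTO species VALUES (?, ?)",
    [("ABC", "Example beetle"), ("XYZ", "Example moth")],
)
conn.executemany(
    "INSERT INTO collections (species_code, habitat) VALUES (?, ?)",
    [("ABC", "Old Growth"), ("ABC", "Clear Cut"), ("XYZ", "Clear Cut")],
)

# Link the tables on the shared field to pull the full name into each record.
linked = conn.execute(
    """
    SELECT c.id, s.common_name, c.habitat
    FROM collections AS c
    JOIN species AS s ON s.species_code = c.species_code
    ORDER BY c.id
    """
).fetchall()
print(linked)
```

The name "Example beetle" is typed and stored once, yet appears in every linked record that uses the code ABC.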
Keys and relationships • A keyed field is one that does not allow repeated values. • For example, if the field “Code Name” is keyed in a table, then the user would not be allowed to enter the same Code Name more than once (a “key violation” error would appear). In this way, one constructs a list of unique values (e.g. Code Names).
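A key violation can be sketched by declaring a field as the primary key and trying to enter the same value twice. The example uses SQLite via Python's sqlite3 module, where the error surfaces as `sqlite3.IntegrityError` (Access words its message differently). The table and value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table in which code_name is the keyed field.
conn.execute("CREATE TABLE species (code_name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO species VALUES ('ABC')")

# Entering the same Code Name a second time is refused.
try:
    conn.execute("INSERT INTO species VALUES ('ABC')")
    key_violation = False
except sqlite3.IntegrityError as err:
    key_violation = True
    print("Refused:", err)

count = conn.execute("SELECT COUNT(*) FROM species").fetchone()[0]
print(key_violation, count)
```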
One-to-many relationship Because each Species Code is unique in the “keyed” Species table but can be repeated many times in the Collections table, a “one-to-many” relationship is created between the two (indicated by the “1” and the infinity symbol). Referential integrity means that a Species Code cannot be entered into the Collections table unless it already exists in the Species table. Cascade Update lets one change a species code once in the Species table and propagates that change through the Collections table. Cascade Delete removes every related record that uses a species code, in all connected tables, when that code is deleted from the Species table, so use this feature cautiously. This is the “relationships view” of the database, which allows the user to define which tables are linked and how. Keyed fields are shown in bold.
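The three behaviors described here (referential integrity, cascade update, cascade delete) can be sketched with a foreign key in SQLite via Python's sqlite3 module. This is a stand-in for the Access relationships view, not a description of it. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; the tables and codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Hypothetical "one" side (species) and "many" side (collections).
conn.execute("CREATE TABLE species (species_code TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE collections (
        id INTEGER PRIMARY KEY,
        species_code TEXT NOT NULL
            REFERENCES species (species_code)
            ON UPDATE CASCADE
            ON DELETE CASCADE
    )
    """
)
conn.executemany("INSERT INTO species VALUES (?)", [("ABC",), ("XYZ",)])
conn.executemany(
    "INSERT INTO collections (species_code) VALUES (?)",
    [("ABC",), ("ABC",), ("XYZ",)],
)

# Referential integrity: a code missing from species cannot be entered.
try:
    conn.execute("INSERT INTO collections (species_code) VALUES ('QQQ')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Cascade update: change the code once; the collections records follow.
conn.execute("UPDATE species SET species_code = 'ABD' WHERE species_code = 'ABC'")
after_update = [
    row[0] for row in conn.execute("SELECT species_code FROM collections ORDER BY id")
]

# Cascade delete: removing the species also removes its collection records.
conn.execute("DELETE FROM species WHERE species_code = 'ABD'")
after_delete = [
    row[0] for row in conn.execute("SELECT species_code FROM collections ORDER BY id")
]

print(orphan_rejected, after_update, after_delete)
```

The final step shows why cascade delete calls for caution: one deletion in the species table silently removed two collection records.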