380 likes | 586 Vues
Database Tables. Review of Definitions. Database: A collection of interrelated data items that are managed as a single unit. Database Management System: The software application that organizes and retrieves data. Common DBMS’s are Oracle, Microsoft SQL Server, DB2, MySQL , and Access.
E N D
Review of Definitions • Database: A collection of interrelated data items that are managed as a single unit. • Database Management System: The software application that organizes and retrieves data. Common DBMS’s are Oracle, Microsoft SQL Server, DB2, MySQL, and Access. • Relational Database: A database consisting of tables which follow the rules specified by E.F. Codd.
More Definitions • Record: A horizontal row in a table. “Record” and “row” are used interchangeably. • Field: A vertical column in a table. “Field” and “column” are used interchangeably.
Table: A technical Definition • The term “table” is loosely used to refer to any spreadsheet-like grid of data. • In a relational database, however, “table” refers to the place in memory where the data is actually stored. • All DBMS’s offer different ways to look at the data stored in tables, and these frequently look like tables. • But officially, a table is the place where the data is actually stored—not just a view of the data.
Table Example • Here’s a table of the teams in this year’s Tour de France. • It contains all of the teams, and four fields of information about them.
View Example • A view looks like a table, but it is not the place where the data is stored: • In this view, we are only showing three of the 22 teams, and only two of the four fields from the Teams table. • It would be wasteful and confusing to store this information directly in the database along with the table it came from.
To Repeat • In a relational database, a table is where the data is actually stored. • Other displays of data may look like tables, but if they are not actually the data containers, they are views (or queries), not tables.
Entities • Definition: An entity is a person, place, thing, event, or concept about which data is collected. • Examples: • Person: Student, Employee, Voter • Place: City, County, State, Country • Thing: Vehicle, Part, Computer • Event: Game, Concert, Class • Concept: Political System, Team,
Attributes • Definition: An attribute is a unit fact that characterizes or describes an entity. • Suggest attributes for some of the entities listed earlier.
Entities, Attributes, Tables, Fields • In good database design, entities become tables, and attributes become the fields (columns) in those tables. • However, deciding whether something is an entity or an attribute isn’t always easy.
Entity or attribute? • Suppose a particular company assigns a company car to each employee. • That car might be considered an attribute of the employee. • However, a car is itself a thing about which data is collected: Make, model, license number, etc. This makes it an entity. • So: is car an attribute or an entity?
It depends • If you are just recording one identifier about the cars, such as license plate number, you can consider it an attribute. • If your database will include other information about the cars, you should treat “car” as an entity. • In general, entities have attributes; attributes do not.
Review Definitions • Entity: Person, place, thing, event, or concept about which data is collected. • Attribute: A unit fact that characterizes or describes an entity.
Primary Keys • I said before that a table in a database is where the data is actually stored. • However, it takes more than just storing data to make a proper database table.
It’s not easy being a table • Just containing data isn’t sufficient to be a table. • Consider the grid below. What might be wrong? • There is no way to tell the first three rows apart. • In a proper database table, each row (record) is different from all others. • The rows do not need to be unique in every column; rows can have identical attributes.
Good tables have primary keys • Definition: A primary key is a column or set of columns that uniquely identifies each row in a table. • A key is a way to identify a particular record in a table. The primary key is a field, or collection of fields, that allows for quick access to a record.
Candidate Keys • For some entities, there may be more than one field or collection of fields that can serve as the primary key. • These are called candidate keys.
Candidate Keys • All candidate keys can open the lock; • that is, they can uniquely identify a record in a table. • Consider a table containing U of M students. • What are some possible candidate keys?
Compound Keys • The primary key doesn’t have to be a single column. In many cases, the combination of two or more fields can serve as the primary key. For example, consider a table of all of the courses offered at the University. • The course number would not be sufficient to identify a single course; there are 101’s in many departments, and probably plenty of 373’s as well. • Department won’t work either; there are many courses in IOE, as well as Psychology, German, and Economics. • But when you combine department and course number, you have uniquely identified a course. • Therefore, those two fields together can serve as the primary key for the Course table. • Definition--Compound Key: A primary key consisting of two or more fields. • Definition—Simple Key: A primary key consisting of one field.
Choosing a Key • Picking/assigning a primary key can be tricky, and is a matter of some controversy. • Some books recommend that you always create a surrogate or artificial key; by default, Access does this for (or to?) you. • The Databases Demystified/Goodsell approach: • if the table already has a natural primary key use it; • if not, then use a combination of fields that guarantees uniqueness; • only if you can’t find a natural simple or compound key should you resort to creating a surrogate key.
“Natural” Keys • Many entities already have widely accepted and enforced keys that you can use in your database—these are termed “natural” keys. • Definition: A “natural key” is a pre-existing or ready-made field which can serve as the primary key for a table. • For employees, social security number is widely used. You can have two Tim Joneses working for you, but only one 123-45-6789. • For vehicles, you can use Vehicle Identification Number (VIN), a unique 17-character code that every vehicle manufactured is required to have. • Most “natural” keys aren’t really natural; they are simply artificial or “surrogate” keys that someone else created and that are now widely recognized.
Election Day • Databases Demystified spells out how to choose a primary key: • If there is only one candidate, choose it. • Choose the candidate least likely to have its value change. Changing primary key values once we store the data in tables is a complicated matter because the primary key can appear as a foreign key in many other tables. Incidentally, surrogate keys are almost always less likely to change compared with natural keys. • Choose the simplest candidate. The one that is composed of the fewest number of attributes is considered the simplest. • Choose the shortest candidate. This is purely an efficiency consideration. However, when a primary key can appear in many tables as a foreign key, it is often worth it to save some space with each one.
Surrogate Keys • Occasionally, you will come across a table that has NO candidate keys. • In this case, you must resort to a surrogate key (Databases Demystified refers to this as an “act of desperation”) • Definition: A “surrogate key” is a field (usually numeric) added to a table as the primary key when no natural keys are available. • When you have to resort to a surrogate key, don’t try to put any meaning into it. • The values in surrogate key fields frequently become important identification numbers: Social Security Numbers, Student IDs, VINs, SKUs (in stores), etc. They are frequently associated with some sort of ID card or tag.
Surrogate Key Example • Suppose you run a chicken farm with lots of chickens. • You want to be able to track the chickens—when they were hatched, weights at certain ages, how many eggs they lay, etc. • The chickens don’t have names, social security numbers, uniqnames, or any other simple way to tell them apart. • Therefore, you put a numbered tag on each chicken. This number becomes the surrogate key in your Chickens table.
Natural vs. Surrogate Keys • Using a surrogate key when a natural key is available can defeat the whole purpose of having a primary key. This example shows how using an AutoNumber primary key allows duplicate entries to be made (StaffID is the primary key): • If SocialSecurityNumber were the primary key, this duplicate entry would not be allowed by the DBMS.
Midterm Material! Formatting Conventions • Tables should generally be named as plural nouns, with the first letter capitalized: Customers, Orders, Products, etc. • Table names should not have any spaces or punctuation in them. The following are bad table names: Bob’s Customers, Back Orders, hours/day. (Details later.) • When designing a table, the primary key field or fields should be at the far left of the table. • It is common to see the primary key field(s) highlighted in some way—underlined, bold, with an asterisk, etc. • In online quiz 2, we highlight the primary key in orange, like this:
Rules for Table and Field Names • No Spaces! Access will allow you to name a table “My Company’s Employees”, but this causes many problems down the road, especially when interfacing with VB. • “Employees” would be a much better name • The whole database probably relates to “My Company”, so there’s no need to include that in the table name. Even if there were—NO SPACES!!!!!
Rules for Table and Field Names • No punctuation: Most DBMS’s (except for Access) won’t allow any punctuation in the name of a field or table except the underscore (_). • I’m not a fan of underscores, either—they tend to be obscured by hyperlinks and such. • For this class, if your table or field needs a multi-word name, use InteriorCapitals.
Access Keywords • Access has many keywords with special meaning that cannot be used as field or table names. • A complete list is here.
Basic Table Design • “Designing a table” means defining the fields (columns) in the table. • Adding rows (data) to a table isn’t designing it—the proper term for adding rows is “populating” the table.
Defining Fields • In Excel, designing an column is easy—you just type a word or two in the cell at the top of a column. • For a true relational database, designing columns takes more work. You must give a column a legal name (which varies somewhat between DBMS’s), a data type, and sometimes some constraints as well.
Designing a “Uniqname” Column • If you were creating a table of UM students in Excel, you might just type the word “Uniqname” in cell A1 to get started.
Relational Database Column Design • In a true relational database, you would define the column this way: • The name, “Uniqname”, is valid in every DBMS I know about, so you can use that. (Valid field names are discussed later.) • The data type would be Text in Access (some DBMS’s might call it something else—String or Varchar, for example) • Constraints could include: • Minimum of 1 character • Maximum of 8 characters • Letters only; no numbers, spaces, or punctuation
Practice • Which one of the following is a good table name? • IOE 373 Students • University • Stores • Miles/Gallon • Which one of the following is NOT a good table name? • Women • Order Details • OrderDetails • Locations
Foreign Keys • Definition: A foreign key is a field (or fields) in a table that is not the primary key in that table, but IS the primary key in another table. • Foreign keys are used for creating relationships (linking tables together), something we’ll look at in the next lecture.
Don’t Forget! • Good relational database design is about optimizing how the data is STORED, not how it is DISPLAYED. • Most “tables” you have seen—in books, in lectures, on the web—were probably optimized for display, not for storage. • Relational database tables are designed for consistency and to reduce redundancy. They are not designed for appearance. • When we learn SQL and Visual Basic, we will look at various ways to display the data stored in relational database tables.