Fundamentals of Relational Databases: Entities and Tables Schema

SQL Unit 1An Introduction to Relational Databases Kirk Scott

1.1 Entities, Tables, and Primary Keys • 1.2 One-to-Many Relationships and Foreign Keys • 1.3 One-to-One and Many-to-Many Relationships • 1.4 An Introduction to Data Types • 1.5 Nulls and Integrity

1.1 Entities • 1.1.1 Definition of an Entity • The term "entity" refers to any individual item that can have information stored about it in a database. • This may be a person, a thing, or some sort of abstraction that doesn't have a physical existence.

1.1.2 Describing Entities by Means of their Characteristics • In order to have information about an entity stored in a database, it has to be possible to describe it using values which may be words or phrases or numeric quantities.

1.1.3 Storing Information about Entities in Tables • Information about entities is stored in rectangular tables. • A table may contain information about more than one entity of the same kind. • If there are multiple rows in a table, then you are storing information on that many different entities. • The information about each individual entity is contained in a single row in the table. • The values which describe that entity are contained in the columns of that row.

Suppose you decide that people are entities that you want to store information about. • This is an illustration of the general idea:

1.1.4 Table Schema Notation • Table schema notation can be used to specify the name of a table and its fields without providing sample data. • Sometimes people are tempted to give tables plural names. • It turns out to be less confusing to talk about tables and their contents if the table names are singular. • The name of the table should describe a single instance of the kind of entity stored in a row of the table.

For example, if you want to store information about people, you would give the table the name "Person". • Here is the table specification using this naming convention and schema notation: • Person(SSN, name, dob)

1.1.5 Parallel Sets of Terminology for Relational Tables • Table, Row, Column • File, Record, Field • Relation, Tuple, Attribute

There are three parallel sets of terminology when referring to the structure of data mentioned above. • One set has already been used above. • Each row in the table contains information about one entity. • Each column contains a value describing a particular characteristic of the entity. • The characteristic described by a given column is the same for all of the entities in the table.

An older set of terminology, which is still used, refers to files, records, and fields instead of tables, rows, and columns. • Sometimes the term flat file is used. • This means that each record in the file has the same number of fields.

A more theoretical set of terminology refers to relations, tuples, and attributes, respectively. • The terms relation and tuple may be somewhat obscure, but the term attribute is very descriptive. • Each column in the table contains a value which describes an attribute of the entity in question.

In these notes all of these sets of terminology may be used at one time or another, and they may be mixed up, referring to attributes of records, or rows of files, for example. • You may also use the terminology indiscriminately when answering questions. • It's simply important that you know what the different terms mean.

1.1.6 Deciding on Entities in a Database • The designer of a database has to determine what entities will have information stored about them. • Entities are conceptual in nature. • For any given situation, there is not necessarily just one correct set of entities which will describe the situation. • Suppose you want to record information about mothers and children, for example.

You could regard both mothers and children as instances of persons, and decide that the underlying entity is a person. • You could also decide that mothers and children are distinct entities in your view of the world. • This second approach would have to be taken if the attributes you store for mothers and children are different.

1.1.7 Incorrect Design 1: Mixing Types of Entities in Tables • Suppose that you have decided to record different attributes for mothers and children and that they are conceptually different entities. • Suppose also that you show the relationship between mothers and children by putting the records for children after their mother's record. • It may be convenient or customary to show information in this way, but this is not a correct table design:

This is not allowed under the relational model. • The theory of relational databases is similar to set theory in math. • The rows in the tables are like elements in a set. • All of the elements have to be of the same kind.

If two kinds of entities are different, then they belong in different tables, even if they have the same number of attributes. • Each table has to contain information about one kind of entity only. • One table can't contain records for two different kinds of entities.

It is also true that the order of the elements of a set does not have any meaning. • Likewise, the order of the rows of a table has no meaning. • It may be convenient for users to see the rows sorted in some particular order, and it will be possible to display information in this way. • But intrinsically the order is immaterial.

If you enter data in a particular order, that can imply no relationship between rows; • also, if you happen to see the data displayed in a particular order, you can infer no relationship between the rows. • It is not permissible to store relationships between entities by means of their relative positions in tables. • Relationships can only be captured by means of attribute values.

1.1.8 Incorrect Design 2: Repeating Fields or Multi-valued Fields • It may also be convenient to show information as outlined below, but this is also incorrect:

This is not allowed under the relational model. • Different mothers may have different numbers of children. • This means that the number of attributes in a row for a mother could vary. • You may think that it would be possible to set a maximum number of children per mother and use that to set a fixed number of columns per row.

However, for any number you choose there are two undesirable results: • You may encounter a mother who has more children than the maximum; • and for all mothers who have less than the maximum you have lots of wasted space for information about children.

This example emphasizes the idea of a flat file. • The rows of a table can't be jagged. • They all have to contain the same set of attributes.

1.1.9 Incorrect Design 3: Concatenating Related Records • There is at least one more alternative design, which is incorrect. • It is worth taking a look at because it illustrates a different set of problems. • Suppose for the purposes of illustration that each mother had 4 attributes and each child had 2 attributes. • Then let the table be designed so that each row contained all of the information about one mother and one of her children.

For example:

The fundamental problem remains that information about more than one different kind of entity is being stored in one table. • The practical problem with this design is redundancy. • If a mother has more than one child, then the mother's data is repeated for as many children as she has.

This is bad for at least three reasons. • 1. It wastes space. • 2. If the mother's information ever has to be updated, more than one record may have to be updated, rather than just one.

3. And the most insidious problem of all is this: • By recording the same information more than one time, you open up the possibility that different values will be recorded for the same attribute in different places. • If this happens, it is clear that at least one of the entries is wrong, but it's impossible to tell which one. • This is known as a data integrity problem.

1.1.10 No Duplicate Records; Primary Keys • Relations are like sets in another way. • Duplicate elements are not allowed in a set, and duplicate rows are not allowed in tables. • This makes perfect sense. • What purpose would it serve to store the information about a given entity more than one time?

Another way of saying that there can be no duplicate records is that all of the records in a file are unique. • In other words, when taking the values in all fields of each record into account, no two records in the table contain exactly the same set of values.

It is also customary to have a single field which uniquely identifies each record. • That is to say, there are no duplicate values for that field in the whole table. • When recording information about people, their social security number is a good example of this. • No two people are supposed to have the same social security number. • If you know their number, you have positive identification and you can look them up and find out other information about them.

It is customary, but not required, to have the unique identifier be the first field in the table. • This field is called the primary key of the table. This picture, given previously, illustrates the general idea:

The schema notation for a table can be expanded to show which field is the primary key. One alternative is to underline the primary key field: • Person(SSN, name, dob) • Another alternative is to explicitly mark the primary key field by following it with the abbreviation p.k.: • Person(SSN p.k., name, dob)

1.2 One-to-Many Relationships and Foreign Keys

1.2.1 All Tables in a Database are Related • In theory, you could have a database consisting of one table, containing information about just one type of entity. • In practice, a database will store information about more than one type of entity and will consist of more than one table. • Each table in a database has to be related in some way to at least one of the other tables in the database.

Collectively, all of the tables have to be related to each other. • Informally, the first step in database design is determining the entities and attributes involved. • The second step is determining the relationships among the entities.

1.2.2 The Three Kinds of Relationships • There are three kinds of relationships that can exist between entities: • one-to-one (1-1), • one-to-many (1-m), • and many-to-many (m-n). • Each of these kinds of relationships can be captured in a relational database design.

1.2.3 The One-to-Many Relationship; ER Notation • It turns out that the 1-m relationship is the most basic one. • The mother-child relationship is of this type, and it can be illustrated using an entity-relationship (ER) diagram, shown below. • In this kind of diagram the tables are represented as rectangles and the relationship between the tables is represented by a line between them. • One end of the line is forked. • This is known as a crow's foot, and it is this end of the relationship which is "many":

Mother Child

It is also possible to include field names using this notation. • In this example mid stands for mother id and kid stands for child id. • These fields are the primary keys and they are indicated with the notation p.k.

Mother Child mid p.k. name … kid p.k. name …

1.2.4 Foreign Keys • Capturing the relationship between two tables depends on the use of what is called a foreign key. • A foreign key is a field in one table which happens to be the primary key field of another table. • Foreign key can be abbreviated f.k. • The way to capture a 1-m relationship is to embed the primary key of the "one" table as a foreign key in the "many" table.

Continuing to use ER notation, the example above can be expanded to show the 1-m, or primary key to foreign key relationship. • The mid, the primary key of the mother table, is embedded as a foreign key (with the same name) in the child table:

Mother Child mid p.k. name … kid p.k. name mid f.k. …

In some books the list of fields in the Child table wouldn't explicitly show the mid field. • They are relying on the crow's foot notation to indicate that the p.k. of the Mother table would be a f.k. in the Child table. • It is redundant, but probably clearer, to show all of the fields in each table explicitly.

The foreign key field doesn't have to have the same name as the corresponding primary key field. • As an illustration, in this example it would be possible to have mid p.k. in the Mother table and motheridf.k. in the Child table. • It is the notations p.k. and f.k. which represent the roles of the fields in the tables, not the names of the fields themselves.

Mother Child mid kid mid • Another way of illustrating this relationship is as follows:

In this representation the arrow graphically shows the primary key field being embedded as a foreign key field. • In some books the arrow might be shown going in the opposite direction. • When done in that way it represents the fact that values in the "many" table refer back to values in the "one" table. • The idea remains the same. • To capture a 1-m relationship, the primary key of the "one" table is embedded as a foreign key in the "many" table.

Fundamentals of Relational Databases: Entities and Tables Schema