Data Management: Databases and Organizations Richard Watson

Data Management: Databases and OrganizationsRichard Watson Summary of Chapter 7 and Basic Structures prepared by Kirk Scott

Data Modeling and SQL • Chapter 7. Data Modeling • Reference: Basic Structures

Chapter 7. Data Modeling • The building blocks of data modeling should be familiar to you: • Entities • Attributes • Relationships • Identifiers (keys) • The next five overheads taken from chapter 7 review the ER notation for these things

A model is the starting point for creating a database • No table need be created before the model is complete • Quality of the data model is essential • The model should be well formed: It should follow the basic rules for entities, attributes, relationships, and keys • The following overhead summarizes the characteristics of a well formed model

A quality data model should be high-fidelity • This means that it has to accurately and completely model the situation in the problem domain • A model which is well formed but does not model the problem domain is useless from a practical point of view

The phrase “quality improvement” in the context of data models means this: • It is unrealistic to assume that a good data model can be created on the first try • A data model will evolve as technical mistakes are caught • More importantly, it will evolve as a result of interaction with users as the problem domain and requirements are more completely understood

The Stock Example • A simple data model for nations and stocks is given on the next overhead • Superficially, it seems OK • It could be verbally summarized as “Nations have stocks”

The book now introduces the following additional textual information • Stocks are listed on stock exchanges (a new entity) • A nation may have >1 stock exchange • A given stock may be listed on >1 exchange, but it has 1 home exchange

Stocks can be listed on the exchanges of >1 country • Notice that the abstraction of a listing is repeated in this description • That suggests that a listing itself will be an entity • The next overhead shows a revised model that takes into account the new assumptions

The Geography Example • Next the book gives a simple example that’s supposed to model the relationships between nations, administrative units (states), and cities • See the next overhead for a straightforward model of this

The book next observes that exceptions are the bane of a good model • If you presume to model these globally, then your model should accommodate all possible situations • The book asks, “How many errors can you find in the initial data model?” • See the table on the following overheads for answers

The next overhead shows a nations, administrative units, cities data model that has been revised to take into account these exceptional cases/errors in the initial model • This revised model may seem needlessly complex • However, the complexity is not needless • This is an accurate model of the situation that covers all cases • The initial model was insufficiently complex • It was wrong

The Women, Men, Marriage, and People Examples • This topic was brushed on all the way back in unit one • Capturing the relationships among people is a very common problem that leads to some familiar challenges and design/model choices • On the following overhead is an ER diagram of the relationship between married men and women

The foregoing model is obviously hilariously limited in the kind of relationship it can capture • In addition, the book points out the following characteristics of the model which might indicate that a different model would be better • 1. The labeling of the model indicates that this is a marriage, but there is nothing in the fields that spells this out • In particular, you might think that there would be a date field, a marriage license number, something among the fields that was specific to marriage

2. The Man and Woman tables have the same set of attributes, different only in their being name manX or womanX • This might suggest that we are dealing with one entity type, person, rather than two distinct entity types, man and woman

3. The last observation concerns the fields manoname and womanoname • These stand for “other” name • As the model stands, a person can only have one other name • Alternatively, if the other name field is text, it might be filled with multiple values—not an ideal solution • A complete treatment of people and other names might introduce another table so that there could be a one-to-many relationship between people and their various names

The book doesn’t solve all of these problems, but it does come up with a second model • If there were two types to begin with and you combine into one, you frequently get a new field in the result • A person now has a gender field • Also, the labeling of the relationship could be made more generic • See the following overhead

Next the book tackles the topic of multiple marriages • If you’re dealing with a Person table, then the table is in a many-to-many relationship with itself • To distinguish between multiple marriages, potentially between the same partners, beginning and ending date fields can be added to the table in the middle • See the next overhead for the third version of the model

In the long run, some sort of arbitrary numbering scheme might be desirable • A marriage license number might work, but the book points out that legally speaking it might also be desirable to record common law marriages • Notice in general that a lot of data integrity questions start to arise with a model like this • See the next overhead for the fourth version of the model

Next the book considers adding children to the model • Children are modeled as the result of marriage • Of course, this is not always the case • As long as the marriageno field in the person table can be null, the model accommodates that • Still, it doesn’t allow you to record who a person’s parents are if the person wasn’t the result of marriage • See the next overhead for the fifth version of the model

The person model could be developed even further • This model barely scratches the surface of the variety of human relationships • It is already moderately complex but could become more complex

A model is complete when it contains everything needed in practice for a given problem • The model is unsuitable if it isn’t complex enough • It is also unsuitable if it contains detail that isn’t needed

The Book Example • The book entitles this “When’s a book not a book?” • In other words, the example is an invitation to clarify what you mean when you refer to entities in a design • Are you referring to individual objects? • Are you referring to kinds of objects? • What elements of a design make it possible to distinguish between these meanings? • A simplistic initial design is given on the next overhead

The book observes that a library may have more than one copy of a book • You might be tempted to model this by adding a copy number to the book record • The problem with that solution is that the basic book information would be repeated for every copy • The solution is to treat a “book” as an abstract entity and a copy as a separate, concrete entity • Such a design is shown on the next overhead

You may have noticed that although the ISBN should be a unique identifier for a book (not a copy) it is not used as a primary key in these designs • The problem is that books before a certain date did not have ISBN’s • Also, you may have hand-crafted modern books that weren’t commercially published and don’t have ISBN’s

The Employment History Example • This example starts out simply enough • A given company has divisions • The divisions have departments • Departments have employees • This is shown in the ER diagram on the next overhead

Next, the author observes that over time a given employee may hold different positions • These positions may be in different departments • Like marriages, the distinguishing features of positions may include a beginning and ending date • This is shown in the ER diagram on the next overhead

Next the author introduces the concept of a payslip into the record-keeping that the model includes • It’s not fully fleshed out in the next example, but when you look at the diagram you may have an inkling that the treatment of payslips is reminiscent of the treatment of line items • This is shown in the ER diagram on the next overhead

Data Management: Databases and Organizations Richard Watson