Chapter 6: Relational Data Modeling -- CONCEPTS, PRINCIPLES & APPROACHES--

Chapter 6: Relational Data Modeling-- CONCEPTS, PRINCIPLES & APPROACHES-- Data Warehouse Fundamentals Paul Chen www.cs522.com (Please reference white papers on Data Modeling at Seattle U teaching materials website)

Topics • Levels of Modeling • Relational Data Modeling—What is it? Types of Models and Pre-Modeling Activities • Understand Terms and Terminology –Tool Demonstration • Conceptual Data Modeling: What, Why, When, Who? Activity Description • Logical Data Modeling: What, Why, When, Who? Activity Description • Physical Data Modeling- An Overview 7 Prototypingand RAD

Databases & Modeling Databases & Modeling Type of Database New Trend Constructs Characteristics Relational Database ERD & EER Row/ Column Dimensional Modeling OLAP DW Multi-Dimensional Database Cube Distributed Component Object Model Distributed Database Client Object (DCOM) XML UML Object-Oriented Database Object Class Diagram Object = Data + Operations(Services); Entity = Data only

Topic 1: Level of Modeling Descriptive: The dealer sold 200 cars last month. Primarily Two Dimensional Database System Operational (OLTP) Explanatory: For every increase in 1 % in the interest, auto sales decrease by 5 %. Star Schema Cube Traditional DW (OLAP) Predictive: Predictions about future buyer behavior. Data Mining Cube + sophisticated analytical tools

Level of Analytical Processing Explanatory “WHAT IF” PROCESSING ANALYZE WHAT HAS PREVIOUSLY OCCURRED TO BRING ABOUT THE CURRENT STATE OF THE DATA Predictive Descriptive SIMPLE QUERIES & REPORTS DETERMINE IF ANY PATTERNS EXIST BY REVIEWING DATA RELATIONSHIPS Statistical Analysis/Expert System/ Artificial Intelligence Normalized Tables + Dimensional Tables Classification & Value Prediction Roll-up; Drill Down Query

DESCRIPTIVE MODELING Relational Data Modeling using ER Diagram • Conceptual Data Model (Analysis - Requirements Gathering; What’s it?) • Logical Data Model (Design-How is it?) • Physical Data Model (Implementation)

EXPLANATORYMODELING • Also calledDimensional Modelling (to be discussed in chapter 7) • Ways to derive the database component of a data warehouse • Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.

PREDICTIVE MODELING (to be discussed in chapter 10) • Similar to the human learning experience • Uses observations to form a model of the important characteristics of some phenomenon. • Uses generalizations of ‘real world’ and ability to fit new data into a general framework. • Can analyze a database to determine essential characteristics (model) about the data set.

Topic 2: Relational Data Modeling-What’s it? A data model is a collection of constructs, business rules and sample data which together supports a dynamic representation of real world objects and events. • Constructs: entity-relationship diagrams (conceptual & logical or functional) and tables • Business rules: constraints such as referential integrity rules and operators (add, update, delete) • Sample Data for verification and prototyping: Verifying the accuracy of the model.

Objectives • Testing the real system before building it. • Assisting in understanding an organization’s data requirements. • Facilitating physical data base design.

Types of Models • Conceptual Data Model (Analysis - Requirements Gathering; What’s it?) • Logical Data Model (Design-How is it?) • Physical Data Model (Implementation)

Pre-modeling Activities • Data collection phase • Facilitation techniques (for ex. JAD session) • Roles and responsibilities • Tools and repository • Naming standards • Modeling convention (What methods to use) • Data protection/backup and recovery procedures

Data Collection Phase * • Sampling and Investigating Hard Data • The Needs for Sampling: Containing costs; Speeding up the data gathering; Improving effectiveness; Reducing bias • Sampling Design -Four steps: 1. Determine the data to be collected or described 2. Determine the population to be sampled 3. Choose the type of sample 4. Decide on the sample size * Please review chapter 5

Kinds of Information Sought in Investigation Types of hard Data (other than interviewing and Observation) • Memos • Signs on bulletins boards or in work areas • Corporate Web sites • Manuals • Policy handbooks • Record layout

Retina Scan “That recent Tom Cruise movie, Minority Report, shows advertising that targets each individual consumer as they pass by the signage. That’s the extreme, but I can see it going that way,” said St. Denis.

Five Steps in Interview Preparation • Reading background material • Establishing interview objectives • Deciding when to interview • Preparing the interviewee • Deciding on question type and structure

Two Types of Questions – Open-End Questions vs. Closed Questions Closed interview questions Such as “ How many subordinates do you have? Benefits: • Getting to relevant data • Keeping control over the interview Drawbacks: • Failing to obtain rich detail • Intimidating the interviewee

Three Basic ways of Structuring Interviews • Pyramid Structure: Starting from closed questions, then gradually expand into open territory. • Funnel Structure: The reverse of pyramid structure approach. • Diamond-Shaped: A combination of the two above structures.

JAD (Joint Application Development) • JAD sessions (also called facilitated session) are used to gather information and feedback and confirm the results of requirements gathering. • JAD sessions replace the traditional way of conducting a series of interviews on a one-to-one basis with the users. Advantages: Achieving consensus during the session when multiple sources of information exist, raising and addressing issues or assigning them for resolution, and immediately confirming information.

Topic 3: Understand Terms and Terminology • Independent entity • Dependent entity • Associative Entity • Identifying relationship • Non-identifying relationship

Understand Terms and Terminology • Identifier An attribute distinctly identifies each occurrence of an entity. For ex., bank account Id. , and student Id. • Association (relationships) An association is a relationship between two or more entities. Employee works for company Part has item

Understand Terms and Terminology • Cardinality(the form of relationship) Associations occur in there forms: one-to-one; one-to-many; many-to-many • Tables A table is a two-dimensional representation of data consisting of columns and rows. • Primary Key Used to identify entities. Unique identification for a row in a table. Allow no nulls and no duplicates. May be system assigned.

Understand Terms and Terminology • Foreign key A foreign key is one or more data elements whose value is based on the primary identifier of another entity, thus allowing the system to ‘join’ and get related information from other entities. The ‘joining’ of different entities in this manner eliminates the need of data repetition and redundancy. • Normalization A technique to make sure that the data in a logical model is defined once and only once. Normalization helps minimum data redundancy, and minimize update abnormalities.

Topic 4: Conceptual Data Modeling: What? What is it? 1. It is a conceptual representation of data without concern for its logical (functional) or physical aspects. 2. It is a set of high-level business data models which provides a framework for the data modeling activities at the next level.

Conceptual Data Modeling: When? When should it be done? 1. In support of the data requirements of a process model under development at the corresponding level. or 2. Outside the system application lifecycle on a department, division, or company wide basis.

Conceptual Data Modeling: Who? Who should do it? 1. The group responsible for assuring that data structure reflects business policies and rules. 2. It should be a joint effort between the owners and custodians of the data, the users of the data, and the analysts.

Conceptual Data Modeling: Why? • Documents the type of data (information) which must be represented in a system independent of specific application, organizations, or technology. • Maximizes data sharing; minimizing redundancy. • Provides foundations for physical database design. • Describes the unique business enterprise specifically.

Conceptual Data Modeling: Why? • Outside of application life cycle on a company-wide basis. • Data modeling expresses inherent associations which are the most part, independent of anyone one application. • Data entities change very little even through the way they are used can change for each application. • A complete maintained conceptual data model should shorten the requirements definition phases of system development life cycle.

Data Modeling: Approach • Data partitioning Use a top-down approach to define the data requirements of a system. The purpose is to divide and conquer (from subject to entity), and to evolve from the conceptual level to logical level until physical database is derived. • Standard deliverables For each of the levels, there is a set of standard deliverables that must be produced. The documentation items must be well defined so that the data at each level is well understood.

Data Partitioning Via Modeling (How) (What, Why, Who, Where) Subjects Conceptual Level Technical considerations Entities Relationships Logical Level Data Elements Frequencies Physical Level Data Definition Language -DDL(create, Alter, drop tables) Data Manipulation Language (select, insert, delete, update)

Conceptual Data Modeling --Activity Description 1. Define the system boundary – Data Context Diagram. 2. Partition a subject into entities. 3. Discover entities -super-type and subtype; part and whole. 4. Define associations between entities. 5. Define major attributes. 6. Define unique identifier for the entity. 7. Assign cardinalities and set up relationships between the entities. 8. Validate the model with the users.

1. Define the System Boundary by Subject Context Diagram • A subject (a group of entities with strong affinity) is a class of data objects representing the mission and resources of the organization. • “Subjects” provides mechanisms for controlling how much of a model a reader (user, analyst, manager) is able to consider and comprehend at a time. For a small system, go directly to define entities. For ex: Library (subject) decomposed into book, member, and account. (entities).

ATM ERD –Subject Context Level Customer ATM uses Has Owns Bank Consortium Account Affiliated Bank Holds Consists of

Data (Subject) Context Diagram • A Data context diagram is a special case of ERD in which a single diagram represents the problem domain in terms of data requirements. For example: The ATM diagram illustrates and highlights Several important characteristics of the system: • The people and organization with which the system communicates. • It documents the significant connections (relationships) between the data objects within as well as outside the problem domain.

2. Partition a Subject into Entities Criteria for partitioning a subject into entities are: 1. The entities included in a subject all tend to describe the subject and have a strong affinity with the subject. 2. The entities of a subject should be of equal importance, as measured by the range, complexities or importance of their data. 3. Each entity should belong to only one subject.

3. Discover Entities • Identifying data objects via business event analysis An individual stimulus from one data object to another is an event. Events are discovered by investigating the external influences that act upon a system, and the data transformation that occurs within a system that converts inputs into outputs.Thus, source and target data objects that interact with an event can be identified. For example, an event “customer buys a product” as depicted below is initiated by the data object “customer” who requires a response from the data object “sale”, “product”, and “payment”.

Customer Sale Customer buys A Product Payment Product

Types of Entities To find potential entities (entities are nouns), look for: • Objects can be generalized or specialized • The entities of a subject should be of equal importance, as measured by the range, complexities or importance of their data. • Each entity should belong to only one subject

Subtype and Super-type Faculty Full Time Faculty Part Time Faculty Part (day) Time Faculty Part (Night)Time Faculty

Generalization & Specialization Aircraft Specialization Generalization Commercial Military 747 777 B52 B-1B

4. Define AssociationsBetween Entities Use a verb to describe associations between two or more entities that the user wants to keep track of. Each relationship must be specific as possible so that its meaning is clear. An individual owns a building. Owns: possession, rental; or management?

5. Define Major Attributes All the attributes of an entity must have meaning for each and every one of the occurrence of the entity. Only elementary data are included in the model. Attributes resulting from process algorithms should not be included in the model. For ex. Entity ‘individual’ has attributes such as name, address; sex (male or female); no. of dependents, etc. But Derived attributes (such as percentage) are not attributes.

6. Define Unique Identifier for the Entity This is an attribute that unambiguously identifies each occurrence of each entity. Some entities do no have their own identifier. These entities are qualified as dependent or week entities. The identifier of the entity to which the dependent entities are associated with must be used to uniquely identify their occurrences. For ex., a ‘child’ entity must have his or her parent identifier to be uniquely identified.

7. Assign Cardinalities & Set Up Relationships Between the Entities Cardinality is the minimum and maximum number of times an occurrence of an entity occurs in relationship to another entity. The minimum number: 0 or 1 The maximum number: 1 or M There are three types of associations: One-to-one; one-to-many; many to many.

Types of Relationships By Degree (# of Attributes the Relation Contains) The relationships (representing business rules) could be either as binary, recursive, or ternary. Doctor Patient Binary Part Order Ternary Part/Order Recursive Organization

Recursive Association A recursive association is one in which there is a relationship between An entity and itself. Information Dev Accounting Engineering Payable Product Receivable Facility Nuclear Coal

8. Validate the Model with the Users • The approach requires that the users be involved at the outset of the data modeling activity until the end of its implementation. They work side-by-side with system developers, providing input and validating the accuracy of the data requirements. This will ensure that the delivered end-product meets the users’ need.

Validate the Model with the Users • To identify and document the integrity constraints given in the user’s view of the enterprise. This includes identifying: • Required data • Referential integrity • Attribute domain constraints • Enterprise constraints • Entity integrity

Validate the Model With These Check Points • Attribute allocation • Rules followed • Cardinality necessity • Relationship necessity • Achievement of organization goals • Contributions to expected benefits

Topic 5: Logical Data Modeling: What, Why, When, Who? And Activity Description What is it? • It is a representation of data required to support the complete business needs for a particular business area, system or project. • It is a set of data models that provides a framework for the physical database construction activities. • It is a graphical representation of data objects that shows the relationship between the tables, views and functional core services used by modules in the application system.

Chapter 6: Relational Data Modeling -- CONCEPTS, PRINCIPLES & APPROACHES--