Chapter 6: The Relational Data Model

Chapter 6:The Relational Data Model Data Modeling and Database Design

The Motivation for Logical Data Modeling • Completion of conceptual modeling phase results in reasonably clear picture of data requirements for application system at high level of abstraction • Note: conceptual data model is technology-independent • During conceptual modeling, analysis and design activity is not constrained by boundaries of anticipated technology that will be used for implementation lest the richness of design will be compromised • Conceptual schema may contain constructs not directly compatible with technology intended for implementation

The Motivation for Logical Data Modeling (continued) Sometimes further refinement may be required to eliminate data redundancy in design Transforming conceptual schema to state better compatible with implementation technology of choice is achieved via logical data modeling Logical data modeling phase serves as transition from technology-independent conceptual schema to technology-dependent design 3 Chapter 6 – The Relational Data Model

The Relational Data Model • E. F. Codd in 1970 used the concept of mathematical relations to define the relational data model • Database = collection of relations • Relation = two-dimensional table • Row in the table = related data values = tuple • Column in the table = attribute • Set of all tuples in the table = relation • Relation consists of a heading and a body • Heading = relation schema, schema, intension, relvar • Body = extension • Domain = set of possible atomic values of an attribute

A Technical Definition of a Relational Data Model • If r is relation whose structure is defined by set of attributes A1, A2, …, An, then R(A1, A2, …, An) is called relation schema of relation r • An attribute, A, is ordered pair (N, D) where N is the name of attribute and D is the domain that named attribute represents • Alternatively, an attribute represents a ‘named role’ of a domain • Domain of Ai (i = 1, 2, …, n) is often denoted as Dom (Ai) • Relation schema, R, is named collection of attributes (R, {C}) where R is name of relation schema and {C} is set { (N1, D1), (N2, D2), . . . . . (Nn, Dn) } where N1, N2. …, Nn are distinct names • r is relation (or relation state) over schema R

A Technical Definition of a Relational Data Model (continued) • Relation state r of relation schema R(A1, A2. …, An), also denoted as r(R), is set of n-tuples {t1, t2. …, tm} • Each n-tuple tj (j = 1, 2, …, m) in r(R) is ordered list of n values <v1j, v2j, …, vnj> where each vij (i = 1, 2, …, m) is element of Dom (Ai) (i = 1, 2, …, n) or when allowed, a missing value represented by special value called null • Number of attributes (n) in R is called degree (or arity) of R • Number of tuples, m, in relation state is called cardinality of relation

A Technical Definition of a Relational Data Model (continued) Relation schema is sometimes loosely (and incorrectly) referred to as a relation C. J. Date (2004) has coined the term relvar (abbreviation for relation variable) to distinguish relation schema from relation Relational schema defines set of relation schemas in relational data model Important to note difference between relation and relation schema as well as relation schema and relational schema (also known as database schema) 7 Chapter 6 – The Relational Data Model

Characteristics of a Relation • A relation: • Is equivalent to a two-dimensional table • Has a heading and a body • Attributes of relation schema have unique names • Values of attribute in relation come from same domain that has no null values • Attributes of relation cannot have null values • Order of arrangement of tuples is immaterial

Characteristics of a Relation (continued) Each attribute value in tuple is atomic; hence, composite and multi-valued attributes are not allowed in relation Order of arrangement of attributes in relation schema is immaterial as long as correspondence between attributes and their values in relation is maintained Derived attributes are not captured in relation schema All tuples in relation must be distinct (i.e., relation schema must have unique identifier) 9 Chapter 6 – The Relational Data Model

Characteristics of a Relation (continued) Example

Universal Relation Schema Defined • Universal Relation Schema (URS) assumption dictates that every attribute name must be unique because attributes have global meaning in database schema • Therefore, if attribute name appears in several relation schemas, all of these denote same meaning • In ER model, however, same attribute name is allowed to appear in different entity types since they imply different roles for the attribute name • Thus, mapping of attributes from ER model to logical schema requires careful attention in order to ensure unique attribute names in logical schema • Note: referencing foreign key and corresponding referenced primary (or alternate) key having the same attribute name in a logical schema does not violate the URS assumption

Naming Conventions • Since relational database theory stipulates that attribute names must be unique over entire relational schema, the following guidelines represent the approach used in this chapter for developing attribute names: • Each attribute name begins with up to a three-letter prefix that represents a meaningful abbreviation of the name of relation schema to which attribute belongs • This prefix is followed by an underscore character • Only first letter of prefix is capitalized • Following the underscore character is suffix that corresponds to attribute name itself • This suffix may contain only lowercase letters, a pound sign (#), and underscore characters, and corresponds to name of the attribute in conceptual data model • Examples: Pl_name, Pl_p#, Pl_budget

Data Integrity Constraints • Data integrity constraints: • Are rules that govern the behavior of data at all times in a database • Generally referred to as just integrity constraints • Are technical expressions of business rules that emerge from user requirement specifications for database application • The source of integrity constraints is the business rules • Prevail across all tiers of data modeling – conceptual, logical, and physical • Are considered to be part of the schema in that they are declared along with structural design of data model (conceptual, logical, and physical) and hold for all valid states of a database that correctly model an application (Ullman and Widom, 1997)

Classification of Data Integrity Constraints • Inherent Model-based Constraints: • Constraints driven by modeling grammar • Schema-based or Declarative Constraints: • Domain constraints • Key constraints • Relationship structural constraints • Entity integrity constraints • Referential integrity constraints • Functional dependency constraints • Semantic Integrity Constraints: • Application-based procedural constraints • DBMS-based procedural constraints (e.g., assertions, triggers, etc.)

Types of Data Integrity Constraints • State Constraints: • All declarative and procedural constraints that every valid state of a database must satisfy • Transition Constraints: • Procedural constraints that define legal transitions of state

The Concept of Unique Identifiers • Superkey: • A set of one or more attributes, which taken collectively, uniquely identifies a tuple of a relation {uniqueness property} • Candidate Key: • A superkey with no proper subset that uniquely identifies a tuple of a relation {uniqueness property + irreducibility} • Primary Key: • A candidate key with no missing values for the constituent attributes {uniqueness property + irreducibility + entity integrity constraint} • Alternate Key: • Any candidate key that is not serving the role of the primary key

Graphically… Primary Key Candidate Key Superkey

Subsets and Proper Subsets

A Sample Relation Instance Example

The Concept of Key/NON-KEY Attribute • Key attribute: • Any attribute that is a proper subset of a candidate key • Non-key attribute: • Any attribute that is not a subset of a candidate key • Note: An attribute is either a keyattribute, a non-key attribute, or a candidate key! • Example: • Rx_rx# is not a key attribute of PRESCRIPTION-A since it is not a proper subset of a candidate key. • Rx_rx# is not a non-key attribute of PRESCRIPTION-A since it is a subset of a candidate key. • Rx_rx# is a candidate key of PRESCRIPTION-A since it is an irreducible superkey of PRESCRIPTION-A.

The Concept of Key/NON-KEY Attribute (continued)

Superkey and Candidate Key: Worksheet

Superkey and Candidate Key: Examples

Referential Integrity Constraint • While the key constraints (superkey and candidate key) and entity integrity constraint (primary key) pertain to individual relation schemas, a referential integrity constraint is specified between two relation schemas, R1 and R2. • Specifically, a referential integrity constraint is specified between two relations in order to maintain consistency across tuples of the two relations. • Informal definition: A tuple in one relation that refers to another relation must refer to an existing tuple in that relation. • Foreign key constraint: A specific (special) form of referential integrity constraint specification

Foreign Key Constraint • Foreign Key ConstraintEstablishes an explicit association between two relation schemas and maintains the integrity of such an association • Foreign key: An attribute(s) set, A2, in a relation schema R2 that shares the same domain with a candidate key (A1) of another relation schema R1; A2 is said to reference or refer to the relation schema R1. Note: R2 is known as the referencing relation and R1 is called the referenced relation. The attribute(s) doing the referencing (A2 in R2) is the foreign key, while the candidate key being referenced (A1 in R1) is the referenced attribute(s). • Referred to as Inclusion Dependency, this constraint is algebraicallyexpressed as:R2.{ A2} C R1.{ A1}

Foreign Key Constraint: Example (Source Schema)

Example of a Foreign Key (Version 1)

Example of a Foreign Key (Version 2)

Naming Convention for Foreign Keys • The name of a foreign key attribute in the referencing relation schema consists of: • The prefix used for the attribute names in the referencing relation schema, • An underscore, and • The referenced attribute name

A Brief Introduction to Relational Algebra Selection (σ): Selects all tuples that satisfy the selection condition from a relation R. Projection (π): Produces a new relation with only some of the attributes of R, and removes duplicate tuples. Union (U): Produces a relation that includes all the tuples in R1 or R2 or both R1 and R2; R1 and R2 must be union compatible. .

Introduction to Relational Algebra (continued) Intersection (∩): Produces a relation that includes all the tuples in both R1 and R2; R1 and R2 must be union compatible. Difference (-): Produces a relation that includes all the tuples in R1 that are not in R2; R1 and R2 must be union compatible. Natural Join (*): Produces all the combinations of tuples from R1 and R2that satisfy a join condition; R1 and R2 must be join compatible. . 31 Chapter 6 – The Relational Data Model

Example Relational Schema

Relation Instances of the Example Relational Schema

Selection Operation • The SELECT operator is used to select a horizontal subset of the tuples that satisfy a selection condition from a relation. • <selection condition>(R) • Example: Which award-winning plants have a budget that exceeds $2,000,000? <R_aw_pl_budget > 2000000> (AW_PLANT)

Projection Operation • The PROJECT operator selects certain attributes from the relation and discards other attributes. In other words, a Projection operation forms a new relation by taking a vertical subset of an existing relation. • <attribute list>(R)

Projection Operation (continued) • Example: What is the plant number and budget of each award-winning plant? <R_aw_pl_p#, R_aw_pl_budget> (AW_PLANT)

Set-Theoretic Operators

Union Compatibility in Set-Theoretic Operators Two relations R(A1, A2, …, An) and S(B1, B2, …, Bn) are said to be union compatible: • If they have the same degree (i.e., have the same number of attributes) and • If the domain of Ai is equal to the domain of Bi for 1  i  n (i.e., corresponding attributes in R and S share the same domain)

Set-Theoretic Operators Defined • Union (R  S)Yields a relation that includes all tuples that are either exclusively in R or exclusively in S or in both R and S. That is, duplicate tuples are eliminated. • Intersection (R  S)Yields a relation that includes all tuples that are in both R and S. • Difference (R – S)Yields a relation that includes all tuples that are in R but not in S.

Union Operation • Example: What plants are located either in Texas or are award-winning plants? AW_PLANT  TX_PLANT Note: AW_PLANT and TX_PLANT are union-compatible

Intersection Operation Example: Which award-winning plants are located in Texas? AW_PLANT  TX_PLANT Note: AW_PLANT and TX_PLANT are union-compatible

Difference Operation • Example: Which Texas plants are not award-winning plants? TX_PLANT - AW_PLANT Note: AW_PLANT and TX_PLANT are union-compatible

Natural Join Operation • Example: Perform a natural join of the award-winning plant and project relations. The plant attributes needed are only P# and P_name. • A natural join between PROJECT and a projection from AW_PLANT PROJECT * (<R_aw_pl_p#, R_aw_pl_namet> AW_PLANT)

Views • A viewis a named ‘virtual’ relation schema constructed from one or more relation schemas. • Unlike a relation schema, a view does not store data. • A view is just a logical window used to ‘view’ selected data (attributes and tuples) from one or a set of relations. • The value of a view at any given time is a ‘derived’ relation state resulting from the evaluation of a specified relational expression (e.g., join, project) at that time.

Advantages of Views • Views allow the same data to be seen by different users in different ways at the same time. • Views provide security by restricting user access to a predetermined set of tuples and attributes from predetermined relations. • Views hide data complexity from the user.

Mapping: ER Model → Logical Schema • More specifically: Mapping a Fine-granular Design-Specific ER model into a relational schema • The goal of any schema transformation method ought to be the preservation of the information capacity of the source data model (e.g., ER model) in the target data model (e.g., relational data model).

Mapping Entity Types • Create a relation schema for each base entity type in the ER diagram. • Create an attribute for every stored attribute. This implies: • For composite attributes only their constituent atomic components are recorded • Derived attributes are not recorded • Multi-valued attributes do not exist in a Fine-granular Design-Specific ER model • Choose a primary key from among the candidate keys by underlining the attribute(s) constituting the primary key. • For each weak entity type, add the primary key of the identifying parent entity type as attribute(s) in the relation schema. • The attribute(s) thus added plus the partial key of the weak entity type form the primary key of the relation schema representing the weak entity type.

Example Figure 6.2 Excerpt from Fine-granular Design-Specific ER diagram of Figure 3.12

Relation Schemas for the Entity Types in Figure 6.2 Note 1: Only the atomic attributes constituting Emp# and Name in Figure 6.2 are recorded in EMPLOYEE. Note 2: The derived attributes, No_of_dependents in EMPLOYEE and No_of_employees in PLANT (Figure 6.2) are not captured here.

Mapping Relationship Types • Characteristic of the Design-Specific ER model: • Only binary or recursive relationships are present • Only 1:1 or 1:n cardinality constraints are present • Relationship type mapped by enforcing a foreign key constraint between the relation schemas participating in the relationship type. • Remember: • A referential integrity constraint requires that the referenced attribute(s) exist in the referenced relation schema • The foreign key constraint requires that the referenced attribute(s) be a candidate key of the referenced relation. • Note: Participation constraints are not mapped.

Chapter 6: The Relational Data Model

Chapter 6: The Relational Data Model

Presentation Transcript

DATABASE DESIGN USING THE REA DATA MODEL

Chapter 2: Data Preprocessing

Chapter 3: Relational Model

Chapter 3: Relational Model

DataBase Data Modeling Using the Entity-Relationship Model

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation

Chapter 4. Making Decisions

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation

CSE544 Query Execution

Chapter 25: Advanced Transaction Processing

Chapter 6 Relational Database Design

Chapter 3: Data Transmission

Chapter 5 Relational Database Management Systems and SQL

Chapter 3: Relational Model

Chapter 2

Database Systems The Relational Data Model

Chapter 23

Explain the four primary traits that determine the value of information.

Chapter 22: Distributed Databases

Data Base Management System Unit -2

Chapter 2 The Simple Linear Regression Model: Specification and Estimation