230 likes | 340 Vues
Explore database design concepts from E-R models to query processing, covering normalization, relational algebra, SQL, update operations, and index performance. Understand key constraints, integrity, schema design, and query optimization techniques. Learn about relational operators, database normalization forms, and database storage hierarchy. Enhance your database design skills for efficient data management.
E N D
CS405G: Introduction to Database Systems Final Review
Database Design Jinze Liu @ University of Kentucky
E-R model • E-R model • Entities • Attributes • Relationships
Database Design 8/21/2014 8/21/2014 5 5
From E-R Diagram to Relations • Relations • Schemas • Converting E-R diagram to relations • Keys • Super keys • Candidate keys • Primary keys • Relational integrity constraints
Key Constraints • Superkey: • (Uniqueness constraints) A set of attributes where no two distinct tuples can have the same values • Every relation has at least one superkey: • The set of all attributes. • Key: A minimal superkey • Uniqueness constraint (superkey) • Minimum Constraint • No attribute can be removed and still satisfy the uniqueness constraints. Jinze Liu @ University of Kentucky
Relational Integrity Constraints Constraints are conditions that must hold on all valid relation instances. There are four main types of constraints: Domain constraints The value of a attribute must come from its domain Key constraints Entity integrity constraints Referential integrity constraints 8/21/2014 8/21/2014 8 8
Database Normalization • Functional Dependency • Functional Closure • Keys • Redefined • Based on functional dependency • DB Norm Form • 1st, 2nd, 3rd, BCNF
Database Query Luke Huan Univ. of Kansas
Relational Algebra and SQL • Relational algebra • SQL query • SFW • Group by …, Having • Subqueries • Relationship between R.A. and SQL
RelOp RelOp Relational algebra A language for querying relational databases based on operators: • Core set of operators: • Selection, projection, cross product, union, difference, and renaming • Additional, derived operators: • Join, natural join, intersection, etc. • Compose operators to make complex queries Jinze Liu @ University of Kentucky
Summary of core operators σpR πLR RXS RS R- S ρS(A1, A2, …)R • Selection: • Projection: • Cross product: • Union: • Difference: • Renaming: • Does not really add “processing” power Jinze Liu @ University of Kentucky
Summary of derived operators RpS RS RS • Join: • Natural join: • Intersection: Jinze Liu @ University of Kentucky
Selection: σpR Projection: πLR Cross product: RXS Join: RpS Natural join: RS Union: RUS Difference: R-S Intersection: R∩S Monotone Monotone Monotone Monotone Monotone Monotone Monotone w.r.t. R; non-monotone w.r.t S Monotone Classification of relational operators Jinze Liu @ University of Kentucky
Update Operations on Relations • Update operations • INSERT a tuple. • DELETE a tuple. • MODIFY a tuple. • Constraints should not be violated in updates Jinze Liu @ University of Kentucky
Basic queries: SFW statement • SELECT A1, A2, …, AnFROM R1, R2, …, RmWHERE condition; • Also called an SPJ (select-project-join) query • (almost) Equivalent to relational algebra query π A1, A2, …, An (σ condition (R1XR2 X … X Rm)) Luke Huan Univ. of Kansas
Semantics of SFW • SELECT E1, E2, …, EnFROM R1, R2, …, RmWHERE condition; • For each t1 in R1: For each t2 in R2: … … For each tm in Rm: If condition is true over t1, t2, …, tm: Compute and output E1, E2, …, En as a row • t1, t2, …, tm are often called tuple variables • Not 100% correct, we will see Luke Huan Univ. of Kansas
Operational semantics of GROUPBY SELECT … FROM … WHERE … GROUP BY …; • Compute FROM • Compute WHERE • Compute GROUPBY: group rows according to the values of GROUPBY columns • Compute SELECT for each group • For aggregation functions with DISTINCT inputs, first eliminate duplicates within the group • Number of groups = number of rows in the final output Jinze Liu @ University of Kentucky
Database Design Jinze Liu @ University of Kentucky
physical data organization • Storage hierarchy (DC vs. Pluto) !count I/O’s • Disk geometry: three components of access cost; random vs. sequential I/O • Data layout • Record layout (handling variable-length fields, NULL’s) • Block layout (NSM, PAX) ! inter-/intra-record locality • Access paths • Primary versus secondary indexes • Tree-based indexes: ISAM, B+-tree !Again, reintroduce redundancy to improve performance ! Fundamental trade-off: query versus update cost
Performance Issues on Indexes • Indexes • ISAM • B+ Tree • Metrics • Storage • IO-costs • Operations • Single value query & range query • Insertion and deletion
Query Processing Implementation • Typical Query Processings • Selection • Join • Set operations. • Typical approaches • Sequential scans in unsorted database • Sorted database • What are the tradeoffs.