120 likes | 242 Vues
Join us for an insightful talk from Google on the vital role of constraints in database management. Don't miss the midterm review next Monday; please send in your questions beforehand. Wednesday office hours are canceled. We'll explore the best practices for data integrity, including keys and foreign keys, and their connections to data quality concepts like entropy. Learn how constraints can optimize database performance while maintaining data integrity. This seminar is essential for ensuring a successful outcome for your data projects!
E N D
Announcements • Awesome talk from Google on Wednesday. • Please don’t miss it! • Next Monday is midterm review. • Send questions! • Wednesday office hours are canceled
Why study Constraints? • Practical: Understanding the role of constraints important for data quality and performance. • Immediately: you need it do a good job on the project! • Interesting: Can actual formalism rigorously some elements of good design of data. • Shocking you need such little information! • Deep connections to data quality and notions like entropy. • Connections: • Graphical Models: MVDs are Conditional Independence. • Used In data integration & Optimization.
Key constraints • A key is a minimal subset of attributes that acts as a unique identifier for tuples Again: If two tuples agree on the value of attributes in the key, then they must be the same tuple. • Students(sid: string, name: string, gpa: float) 1. Which would you select as a key? 2. Is a key always guaranteed to exist? 3. Can we have more than one key?
Foreign Key constraints • Students(sid: string, name: string, gpa: float) • Enrolled(studid: string, cid: string, grade: string “only bona fide students may enroll in courses”, i.e., students must appear in the course table. Studid is not a key. Why? What is? We say that Studid is a foreign key that refers to Students
Declaring Foreign Keys • Students(sid: string, name: string, gpa: float) • Enrolled(studid: string, cid: string, grade: string CREATE TABLE Enrolled ( studid CHAR(20), cid CHAR(20), grade char(10), PRIMARY KEY (studid, cid), FOREIGN KEY (studid) REFERENCES Students )
Foreign Keys and update operations • Students(sid: string, name: string, gpa: float) • Enrolled(studid: string, cid: string, grade: string 1. What if we insert a tuple into Enrolled, but no corresponding student? 2. What if we delete a student? • Disallow the delete • Remove all of the courses for that student • SQL allows a third via NULL (not yet covered) DBA chooses (syntax on the web)
NULL and NOT NULL • To say “don’t know the value” NULL • NULL has (sometimes painful) semantics, more detail later • Students(sid: string, name: string, gpa: float) Say, Jim just enrolled in his first class. In SQL, we may constrain a column to be NOT NULL, e.g., name
General Constraints • We can actually specify arbitrary assertions • “There cannot be 25 people in the DB class” • We learned about triggers and check constraints. • In practice, we don’t specify many such constraints. Why? Performance! NB: Whenever we do something ugly, it’s for the sake of performance
Summary of Constraints • Constraints are how databases understand the semantics (meaning) of data • Assertion: they are also useful optimization • SQL supports general constraints, • Keys and foreign keys are most important