1 / 48

CS411 Database Systems

CS411 Database Systems. 05: Relational Algebra   π   ≤ ≠ = ρ   ≥ ⋈ ⋉⋊. How do we query (specify what info we want from) the database?. Find all the employees who earn more than $50,000 and pay taxes in Champaign-Urbana . Could write in C++/Java, but who would want to?

tarak
Télécharger la présentation

CS411 Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS411Database Systems 05: Relational Algebra   π ≤ ≠ = ρ≥⋈ ⋉⋊

  2. How do we query (specify what info we want from) the database? Find all the employees who earn more than $50,000 and pay taxes in Champaign-Urbana. • Could write in C++/Java, but who would want to? • Instead use high-level query languages: • Theoretical: Relational algebra, Datalog • Practical: SQL

  3. Relational algebra has 5 operations Input = relation(s), output = relations • Set union:  • Set difference:  • Selection: s • Projection: p • Cartesian product:  Can add some syntactic sugar and/or define new operators in terms of these

  4. Union takes the set union of two relations Input and output relations need to all have the same schema (either attribute names, or attribute numbers) OldDiagnosis NewDiagnosis OldDiagnosisNewDiagnosis Reminder: sets have no duplicates

  5. Sometimes we’d like to name or rename the output (syntactic sugar) AllDiag = OldDiagnosis NewDiagnosis or AllDiag(Patient, Disease) = OldDiagnosis NewDiagnosis or ρPatient,Disease(OldDiagnosisNewDiagnosis)

  6. Employee Name SSN John 999999999 Tony 777777777 • LastName, SocSecNo(Employee) LastName SocSecNo John 999999999 Tony 777777777

  7. Difference takes the set difference of two relations Input and output relations need to all have the same schema (either attribute names, or attribute numbers) OldDiagnosis NewDiagnosis WrongDiagnosis = OldDiagnosis NewDiagnosis

  8. Using attribute numbers (instead of domains) looks almost the same OldDiagnosis NewDiagnosis WrongDiagnosis = OldDiagnosis NewDiagnosis

  9. Selection keeps only the tuples that satisfy a particular condition Diagnosis Find all patients who have a fever sTemperature > 98.6(Diagnosis) Better for everyone’s sake if we write this as • [Temperature > 98.6] (Diagnosis) or even Diagnosis [Temperature > 98.6] You can write it any of these three ways in this class.

  10. Selection conditions can be relatively complex Attribute names/numbers Patient Disease Salary Constants Winslett Meningitis 40,000 = < > ≤ ≠ Patient = “Winslett” Salary < 40,000 and or not Patient = “Winslett” and Temperature > 98.6

  11. Find all employees with salary more than $40,000. s Salary > 40000(Employee)

  12. Projection eliminates all but the listed columns, and puts them in the listed order For convenience, we may write π[Disease, Patient] (Diagnosis) or even Diagnosis [Disease, Patient] Diagnosis List all the patients and their diagnoses πDisease, Patient (Diagnosis)

  13. The columns you project onto have to actually exist Diagnosis [Salary, Town] π[Disease] Employee Formally, πA1, …, An(R) is a legal relational algebra expression if each of A1, …, An is an attribute of R

  14. PSSN, Name (Employee)

  15. The cartesian product of two relations is usually enormous Diagnosis RareDiseases Take each possible combination of one tuple from the first relation and one tuple from the second relation (may need to rename some attributes) Diagnosis RareDiseases

  16. Relational algebra = every expression you can make using these 5 operators Any relation name is a relational algebra expression. If R and S are relational algebra expressions, then so are R – S, RS and RS. If R is a relational algebra expression and is a selection condition, then []R is a relational algebra expression. If R is a relational algebra expression and L is a list of attributes of R, then π[L]R is a relational algebra expression. Nothing else is a relational algebra expression.

  17. Derived RA Operations1) Intersection2) Most importantly: Join

  18. Intersection can be defined in terms of difference OldDiagnosis NewDiagnosis RightDiagnosis = OldDiagnosisNewDiagnosis = OldDiagnosis – (OldDiagnosis – NewDiagnosis) More generally, R  S = R – (R – (S)).

  19. A join is a cartesian product followed immediately by a selection OldDiagnosis NewDiagnosis Who has an old diagnosis that is different from one of their new diagnoses? π[#1] [#1 = #3 and #2 ≠ #4] (OldDiagnosisNewDiagnosis) A join

  20. How does that work? NewDiagnosis OldDiagnosis Temp(Pat1, Dis1, Pat2, Dis2) = OldDiagnosisNewDiagnosis

  21. BothDiagnoses = [Pat1 = Pat2 and Dis1 ≠ Dis2] (Temp) Temp

  22. BothDiagnoses = [Pat1 = Pat2 and Dis1 ≠ Dis2] (Temp) BothDiagnoses

  23. FinalAnswer=π[Pat1] BothDiagnoses BothDiagnoses FinalAnswer

  24. There is a convenient shorthand for joins R ⋈q S = sq(RS) Relational algebra expressions I’ll let you write it as R ⋈[q] S or simply R [] S A selection condition This is called a -join, or an equijoin when  is =.

  25. Natural joins join on attributes with the same name … Employees Managers Employees ⋈Managers

  26. A natural join is an equijoin on all attributes with the same name, followed by removal of the duplicate attributes R ⋈S = πeverything but the duplicate attributessR.A1=S.A1 and…and R.An=S.An(RS) A1 through An are all the attributes R and S have in common

  27. Natural joins don’t always make sense Emp(name, dept)    Dept (name, mgr) Emp ⋈ Dept is nonsensical

  28. Your first real query: who makes more than their manager? E(emp, dept, sal)         M(mgr, dept) ESM(emp, sal, mgr) = (E ⋈ M)[emp, sal, mgr]   (ESM [mgr = E.emp & ESM.sal > E.sal] E) [ESM.emp] Why???

  29. E M ESM = (E ⋈ M)[emp, sal, mgr]   E ⋈ M ESM[mgr = E.emp & ESM.sal > E.sal]E

  30. DIag TODIag OUTcome

  31. 1.  Who has what disease?   Diag 2.  Who has a disease they have been tested for?   (Diag⋈ ToDiag ⋈ Outcome)[pat] 3.  Who has a disease they tested positively for?   (Diag ⋈ ToDiag ⋈ (Outcome [result = ‘T’]))[pat]

  32. 4.  Who has a disease that they tested both positively & negatively for?    (Han)   TestedPositive(pat, dis) = (Diag ⋈ ToDiag⋈ (Outcome[result =‘T’]))[pat, dis] TestedNegative(pat, dis) = ....   (TestedPositiveTestedNegative)[pat]  

  33.  5. Who tested both positively and negatively for a disease, whether or not they have it?   TestedPos(pat, dis) = ToDiag⋈ (Outcome  [result = ‘T’]) [pat, dis]   TestedNeg(pat, dis) = ....   (TestedPosTestedNeg) [pat]   6. Who tested both positively & negatively for the same test?   (Winslett)   (Outcome⋈ [pat = pat & test = test & result ≠result] Outcome) [pat]

  34. 7. What testable disease does no one have?       (encephalitis)   ToDiag [dis]   -  Diag [dis] Note technique: compute opposite of what you want, take a difference.  Use in hard queries with negation (‘no one’) 8. What diseases does more than one person have?   (Diag [pat≠pat & dis = dis] Diag) [dis]  

  35.  9. What disease does everyone have?  clue that query is very hard DiagnosedDiseases(dis) = diag[dis]   Patients(pat) = diag[pat]   DiseasesNotEveryoneHas(dis) = ((Patients  Disease) - Diag)[dis]   DiagnosedDiseases - DiseasesNotEveryoneHas Note technique used! A very hard query might require taking the difference several times.

  36. Every relation has at least one key Key = minimal subset of the relation’s attributes such that if two tuples agree on those attributes, then they agree on all the attributes of the relation.   CourseInfo(C#, semester, room, time, instructor) If each course has at most one section and one instructor key = {C#, semester} If each course has at most one section,  possibly > 1 instructor key = {C#, semester, instructor} If each course has one section, one instructor, and a room different from that of any other course  at that time &  semester key = {semester, room, time} Key depends on the application! 

  37. You can define relational algebra on bags instead of sets (closer match to SQL) • Union: {a,b,b,c} U {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f} • add the number of occurrences • Difference: {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b} • subtract the number of occurrences • Intersection: {a,b,b,b,c,c} {b,b,c,c,c,c,d} = {b,b,c,c} • minimum of the two numbers of occurrences • Selection: preserve the number of occurrences • Projection: preserve the number of occurrences (no duplicate elimination) • Cartesian product, join: no duplicate elimination Read the book for more detail

  38. Relational algebra doesn’t include everything you might want to do Example: cannot compute transitive closure. “Find all direct and indirect relatives of Fred”

  39. Summary of relational algebra Basic primitives: E ::= R | sC(E) | pA1, A2, ..., An (E) | E1 E2 | E1 È E2 | E1 – E2 | rS(A1, A2, …, An)(E) Abbreviations: | E1 ⋈E2 | E1 ⋈CE2 | E1 Ç E2

  40. For practice at home or in your groups Extra problems

  41. BarInfobarbeerpricenameaddr Joe’s Bud 2.50 Joe’s Maple St. Joe’s Miller 2.75 Joe’s Maple St. Sue’s Bud 2.50 Sue’s River Rd. Sue’s Coors 3.00 Sue’s River Rd. BarInfo = Sells⋈[Sells.bar = Bars.name]Bars Sellsbarbeer price Barsname addr Joe’s Bud 2.50 Joe’s Maple St. Joe’s Miller 2.75 Sue’s River Rd. Sue’s Bud 2.50 Sue’s Coors 3.00

  42. Test your understanding (at home or in your groups) • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ? • Given schemas R(A, B, C), S(D, E), what is another way that you can write R ⋈S ? • Given schemas R(A, B), S(A, B), what is another way that you can write R ⋈ S ?

  43. UNION RENAMER(name) PROJECTname PROJECTbar SELECTaddr = “Maple St.” SELECT price<3 AND beer=“Bud” Find the names of all bars that are on Maple St. or sell Bud for less than $3. Bars(name, addr) Sells(bar, beer, price) Bars Sells

  44. PROJECTbar SELECTbeer != beer1 JOIN RENAMES(bar, beer1, price) Find the bars that sell two different beers at the same price Sells(bar, beer, price) S(bar, beer1, price) = Sells ((S ⋈ Sells) [beer != beer1]) [bar] Sells Sells

  45. Another example schema Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city)

  46. seller-ssn=ssn pid=pid buyer-ssn=ssn Find phone numbers of people who bought gizmos from Fred Person Purchase Person Product Pname PIDs =Product[name = “gizmo”][pid] WhoBoughtFromWho = Purchase ⋈[seller_ssn = ssn] (Person[name = “fred”] [ssn]) PIDs ⋈ [pid = pid] (WhoBoughtFromWho⋈ [buyer_ssn = ssn] Person) [name] Pssn Ppid sname=fred sname=gizmo

  47. Exercises to do before your midterm Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Ex #1: Find people who bought telephony products. Ex #2: Find names of people who bought American products Ex #3: Find names of people who bought American products and did not buy French products Ex #4: Find names of people who bought American products and live in Champaign. Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock price is more than $50.

More Related