960 likes | 1.12k Vues
Relational Algebra and views. Father of Relational Model Edgar F. Codd (1923-2003). PhD from U. of Michigan, Ann Arbor Received Turing Award in 1981. More see http://en.wikipedia.org/wiki/Edgar_Codd. Relational Algebra Defined: Tuple. An ordered set of data values.
E N D
Father of Relational Model Edgar F. Codd (1923-2003) • PhD from U. of Michigan, Ann Arbor • Received Turing Award in 1981. • More see http://en.wikipedia.org/wiki/Edgar_Codd
Relational Algebra Defined:Tuple • An ordered set of data values. { a1 , a2 , a3 , …, an }
Relational Algebra Defined:Relation • A set of tuples. { { a1, a2, a3, … , an }, { b1, b2, b3, … , bn }, { c1, c2, c3 , … , cn }, …………. …………. }
Relational Algebra Defined:Algebra • Any formal mathematical system consisting of a set of objects and operations on those objects. • Based on operators and a domain of values • Operators map arguments from domain into another domain value • Example: x = 3.5 * y
Relational Algebra Defined:Relational Algebra • An algebra whose objects are relations and whose operators transform relations into other relations. • Domain: set of relations, i.e., result is another relation • Basic operators: select, project, union, set difference, Cartesian product (or cross product)
Relational Algebra Defined:Where is it in DBMS? Optimized Relational algebra expression Relational algebra expression Query execution plan Executable code SQL Code generator parser Query optimizer DBMS
Relational Query Languages • Languages for describing queries on a relational database • Structured Query Language (SQL) • Predominant application-level query language • Declarative • Relational Algebra • Intermediate language used within DBMS • Procedural
Motivation • SQL is a declarative language that allows the user to express what information they require from the database. • It does not allow the user to specify how that information will be extracted. This has the benefit from the user's point of view of making SQL much more usable because s/he does not need to get involved in the detailed mechanisms involved in satisfying the query.
Because of this, the responsibility for developing procedures for satisfying queries devolves to the Query Optimiser, which forms part of the DBMS. • Relational algebra is a mathematical formalism that is used to express queries. It forms the basis of the procedural language of the execution plans, which are generated by the Query Optimiser. Execution plans specify how to satisfy the query. They list a sequence of procedures which can be applied to the database in order to satisfy the query.
The Relational Algebra consists of a set of fundamental operators that take relations as their operands and return a relation as their result. This means that operators can be composed, ie the output of one operation may be used as input to another operation. • The relational algebra is a language of expressions that can be evaluated to yield relations. In a similar way, normal algebraic expressions, for example x+5*y, are evaluated to yield numbers. In normal algebra, one expression can be equivalent to another expression, for example, x+5*y = y*5+x.
The operators defined in the Relational Algebra enable the Query Processor of the DBMS to analyse the structure of queries, to generate alternative strategies for solving them, to evaluate the efficiency of competing strategies, and to manipulate the data in the tables in order to generate query results.
What is an Algebra? • A language based on operators and a domain of values • Operators map values taken from the domain into other domain values • Hence, an expression involving operators and arguments produces a value in the domain • When the domain is a set of all relations (and the operators are as described later), we get the relational algebra • We refer to the expression as a query and the value produced as the queryresult
Relational Algebra • Domain: set of relations • Basic operators: select, project, union, setdifference, Cartesianproduct • Derived operators: set intersection, division, join • Procedural: Relational expression specifies query by describing an algorithm (the sequence in which operators are applied) for determining the result of an expression
Relational Algebra Operations Basic Operations: • SELECTION, σ - Selects the rows, or tuples, from a relation, which satisfy the specified constraints or restrictions • PROJECTION, π - Selects the specified columns, or attributes, from a relation • CROSS PRODUCT, X - Combines the tuples from two relations to create new tuples containing attributes from both original relations. Every tuple from one relation is combined with every tuple from the other relation.
Relational Algebra Operations • UNION, U - creates a new relation containing all the tuples from two relations which have the same structure. • SET DIFFERENCE, \ - creates a new relation containing tuples from two relations which have the same structure, and where the tuples exist in the first relation but not in the second relation.
Relational Algebra Operations Some additional operations are: • Intersection “∩ “ • Join “ “ • Renaming “ρ “ Operations take relations as input, and output relations.
Operations (Unary):Selection, Projection Selection: <condition(s)> (<relation>) • Picks tuples from the relation Projection: <attribute-list>(<relation>) • Picks columns from the relation
Operations (Set):Union, Set Difference Union: (<relation>) U (<relation>) • New relation contains all tuples from both relations, duplicate tuples eliminated. Set Difference: R – S • Produces a relation with tuples that are in R but NOT in S.
Operations (Set):Cartesian Product, Intersect Cartesian Product: R x S • Produces a relation that is concatenation of every tuple of R with every tuple of S • The Above operations are the 5 fundamental operations of relational algebra. Intersection: R S • All tuples that are in both R and S
Operations (Join):Theta Join, Natural Join Theta Join: R F S = F (R x S) • Select all tuples from the Cartesian product of the two relations, matching condition F • When F contains only equality “=“, it is called Equijoin • Natural Join: R S • Equijoin with common attributes eliminated
Operations:Outer Join, Semi Join (left) Outer Join: R S • Natural join relations while preserving all tuples from the “outer” side -> NULL values incurred. Semi Join: R F S = A(R F S) • Join two relations and only keeps the attributes seem in relation R • There are Semi-Theta Join, Semi-Equijoin and Semi-Natural Join
Operations:Division • Division: R ÷ S • Produce a relation consist of the set of tuples from R that matches the combination of every tuple in S R S R÷S • T1 ← c (R) • T2 ← c ((SxT1)–R) • T ← T1 – T2
Translation to SQL • FROM clause produces Cartesian product (x) of listed tables • WHERE clause assigns rows to C in sequence and produces table containing only rows satisfying condition ( sort of like ) • SELECT clause retains listed columns ( )
Translation to SQL (Cont.) • SELECT C.CrsName • FROM Course C, Teaching T • WHERE C.CrsCode=T.CrsCode AND T.Sem=‘F2003’ • List CS courses taught in F2003 • Tuple variables clarify meaning. • Join condition “C.CrsCode=T.CrsCode” • eliminates garbage • Selection condition “ T.Sem=‘F2003’ ” • eliminates irrelevant rows • Equivalent (using natural join) to: CrsName(Course Sem=‘F2003’ (Teaching) ) CrsName (Sem=‘F2003’ (Course Teaching) )
Produce table containing subset of rows of argument table satisfying condition condition (relation) Example: Person Hobby=‘stamps’(Person) Select Operator Id Name Address Hobby Id Name Address Hobby 1123 John 123 Main stamps 1123 John 123 Main coins 5556 Mary 7 Lake Dr hiking 9876 Bart 5 Pine St stamps 1123 John 123 Main stamps 9876 Bart 5 Pine St stamps
Selection Condition • Operators: <, , , >, =, • Simple selection condition: • <attribute> operator <constant> • <attribute> operator <attribute> • <condition> AND <condition> • <condition> OR <condition> • NOT <condition>
Selection Condition - Examples • Id>3000OR Hobby=‘hiking’ (Person) • Id>3000 AND Id <3999(Person) • NOT(Hobby=‘hiking’)(Person) • Hobby‘hiking’(Person)
Project Operator • Produces table containing subset of columns of argument table attribute list(relation) • Example: Person Name,Hobby(Person) IdName AddressHobbyNameHobby John stamps John coins Mary hiking Bart stamps 1123 John 123 Main stamps 1123 John 123 Main coins 5556 Mary 7 Lake Dr hiking 9876 Bart 5 Pine St stamps
Project Operator • Example: • Person Name,Address(Person) Id Name Address Hobby Name Address John 123 Main Mary 7 Lake Dr Bart 5 Pine St 1123 John 123 Main stamps 1123 John 123 Main coins 5556 Mary 7 Lake Dr hiking 9876 Bart 5 Pine St stamps Result is a table (no duplicates); can have fewer tuples than the original
Expressions Id, Name(Hobby=’stamps’OR Hobby=’coins’(Person) ) Id Name Address Hobby Id Name 1123 John 9876 Bart 1123 John 123 Main stamps 1123 John 123 Main coins 5556 Mary 7 Lake Dr hiking 9876 Bart 5 Pine St stamps Result Person
Set Operators • Relation is a set of tuples, so set operations should apply: , , (set difference) • Result of combining two relations with a set operator is a relation => all its elements must be tuples having same structure • Hence, scope of set operations limited to union compatible relations
Union Compatible Relations • Two relations are union compatible if • Both have same number of columns • Names of attributes are the same in both • Attributes with the same name in both relations have the same domain • Union compatible relations can be combined using union, intersection, and setdifference
Example Tables: Person(SSN, Name, Address, Hobby) Professor(Id, Name, Office, Phone) are not union compatible. But Name(Person)and Name(Professor) are union compatible so Name(Person) -Name(Professor) makes sense.
A B C D A B C D x1 x2 y1 y2 x1 x2 y1 y2 x3 x4 y3 y4 x1 x2 y3 y4 x3 x4 y1 y2 RS x3 x4 y3 y4 RS Cartesian Product • If Rand Sare two relations, RS is the set of all concatenated tuples <x,y>, where x is a tuple in R and y is a tuple in S • R and S need not be union compatible. • But R and S must have distinct attribute names. Why? • RS is expensive to compute. But why?
Renaming • Result of expression evaluation is a relation • Attributes of relation must have distinct names. This is not guaranteed with Cartesian product • e.g., suppose in previous example A and C have the same name • Renaming operator tidies this up. To assign the names A1, A2,… An to the attributes of the n column relation produced by expression expr use expr [A1, A2, … An]
Example Transcript (StudId, CrsCode, Semester, Grade) Teaching (ProfId, CrsCode, Semester) StudId, CrsCode(Transcript)[StudId, CrsCode1] ProfId, CrsCode(Teaching) [ProfId, CrsCode2] This is a relation with 4 attributes: StudId, CrsCode1, ProfId, CrsCode2
Derived Operation: Join A (general or theta) join of R and S is the expression RcS where join-condition c is a conjunction of terms: Aioper Bi in which Ai is an attribute of R;Bi is an attribute of S; and operis one of =, <, >, , . Q: Any difference between join condition and selection condition? The meaning is: c(R S) Where join-condition c becomes a select condition c except for possible renamings of attributes (next)
Join and Renaming • Problem: R and S might have attributes with the same name – in which case the Cartesian product is not defined • Solutions: • Rename attributes prior to forming the product and use new names in join-condition´. • Qualify common attribute names with relation names (thereby disambiguating the names). For instance: Transcript.CrsCode or Teaching.CrsCode • This solution is nice, but doesn’t always work: consider Rjoin_conditionR In R.A, how do we know which R is meant?
Theta Join – Example Employee(Name,Id,MngrId,Salary) Manager(Name,Id,Salary) Output the names of all employees that earn more than their managers. Employee.Name(EmployeeMngrId=Id AND Employee.Salary> Manager.SalaryManager) The join yields a table with attributes: Employee.Name, Employee.Id, Employee.Salary, MngrId Manager.Name, Manager.Id, Manager.Salary
Equijoin Join - Example Equijoin: Join condition is a conjunction of equalities. Name,CrsCode(Student Id=StudId Grade=‘A’ (Transcript)) Student Transcript Id Name Addr Status 111 John ….. ….. 222 Mary ….. ….. 333 Bill ….. ….. 444 Joe ….. ….. StudId CrsCode Sem Grade 111 CSE305 S00 B 222 CSE306 S99 A 333 CSE304 F99 A The equijoin is used very frequently since it combines related data in different relations. Mary CSE306 Bill CSE304
Natural Join • Special case of equijoin: • join condition equates all and only those attributes with the same name (condition doesn’t have to be explicitly stated) • duplicate columns eliminated from the result Transcript (StudId, CrsCode, Sem, Grade) Teaching (ProfId, CrsCode, Sem) Teaching = Transcript StudId, Transcript.CrsCode, Transcript.Sem, Grade, ProfId (Transcript Transcipt.CrsCode=Teaching.CrsCode ANDTranscirpt.Sem=Teaching.SemTeaching ) [StudId, CrsCode, Sem, Grade, ProfId] Q: but why natural join is a derived operator? Because…
Natural Join (cont’d) • More generally: R S = attr-list(join-cond(R × S) ) where attr-list = attributes (R) attributes (S) (duplicates are eliminated) and join-cond has the form: R.A1 = S.A1AND … ANDR.An = S.An where {A1 … An} = attributes(R) attributes(S)
Natural Join Example • List all Ids of students who took at least two different courses: StudId( CrsCode CrsCode2 ( Transcript Transcript[StudId, CrsCode2, Sem2, Grade2] )) We don’t want to join on CrsCode, Sem, and Grade attributes, hence renaming!
Division • Goal: Produce the tuples in one relation, r, that match all tuples in another relation, s • r (A1, …An, B1, …Bm) • s (B1 …Bm) • r/s, with attributes A1, …An, is the set of all tuples <a> such that for every tuple <b> ins,<a,b> is in r • Can be expressed in terms of projection, set difference, and cross-product
Division - Example • List the Ids of students who have passed all courses that were taught in spring 2000 • Numerator: • StudId and CrsCode for every course passed by every student: StudId, CrsCode (Grade ‘F’ (Transcript) ) • Denominator: • CrsCode of all courses taught in spring 2000 CrsCode(Semester=‘S2000’ (Teaching) ) • Result is numerator/denominator
Schema for Student Registration System Student (Id, Name, Addr, Status) Professor (Id, Name, DeptId) Course (DeptId, CrsCode, CrsName, Descr) Transcript (StudId, CrsCode, Semester, Grade) Teaching (ProfId, CrsCode, Semester) Department (DeptId, Name)