270 likes | 356 Vues
Dive into the world of relational algebra, the theoretical basis for designing queries in databases. Learn about unary and binary operators, and how to write queries accurately. Explore the rules of relations, relational databases, and SQL for database querying.
E N D
Relational Algebra “The beginning…” What is a query? Why use a theoretical basis for designing ‘queries’? What is a relation? What is relational algebra (RA)? Start with some unary operators in RA Then the basic binary operators
A query • What do we mean by a query to a database?
A query • What do we mean by a query to a database? • Given one or more relations, apply operations that ultimately return a new relation. • Usually we assume the new relation will have different information than the original input relations and will combine the original relations in meaningful ways. • We can write a query accurately in English • We can accurately name the resulting relation in English • We can draw queries as a way of planning them out
SQL is the language used for talking to a relational database management system • Allows • changing the schema (“data definition”), • changing the data (“data modification”), and • querying • But the querying part of SQL isn't very good • Confusing syntax that is hard to read • Major differences from one DBMS to the next • We teach you relational algebra (RA) for querying • We use a graphical representation of RA • It's easy to convert RA into SQL
A relational database is a set of relations • A relation is a set of rows and a set of columns • A relation has a name • A relation has an identifier which is a set of columns • We consistently underline columns in the identifier • Rules: • A row has a value for each column in the identifier • (A row may or may not have values for columns that are not in the identifier) • No two rows have the same values in all the identifying columns
Is this a relation? • Oscar
Is this a relation? • Oscar
Is this a relation? • Oscar
More about relations • No two rows of data are duplicated • implied by our rule about identifying columns • No two columns have the same name • No two relations in a given database have the same name
Relational Algebra • Inputs: • one or two relations • Operators: • work on those relations (or more explicitly columns or rows) • Output: • one relation
RA Operators • We'll look at unary operators first These take a relation and return a relation • project • filter • reduce • Group
Project • Project removes and adds non-identifying columns • Adds computed columns – arithmetic, string manipulation, etc. • result: • as wide as number of columns named, • which must include the identifying columns • exactly as tall as original relation • identifier is same • SQL: • Select columns from relation;
Project example oscar Project Id: ??? Carry: Movie Compute: age = currentYear - BirthYear ?????
Project example oscar Project Id: First_name, Last_name, Award Carry: Movie Compute: age = currentYear - BirthYear First_name, Last_name, Award, Movie, Actor_age of Oscar
Filter • Filter removes rows based on a condition • result: • as wide as the original relation • shorter or exactly as tall as the original relation • identifier is same • SQL: • Select * from relation where condition;
Filter example: • Find each oscar award for the movie “The Blind Side”
What algorithm do you think Filter uses? • Trick question: depends on the DBMS and the configuration!
We often combine project and filter • An activity you could try on your own • Go to sqlzoo.net • Try tutorial 0 • Reverse engineer the example queries into relational algebra
Reduce • Reduce changes the identifier, • result: • narrower or the same width as the original relation • possibly shorter than the original relation • unsafe! • SQL: • Select distinctnew_identifier from original relation • Let’s try it with the tiny database: • Practice Query #3
Group • Group changes the identifier, combining duplicate rows using an aggregate function • result: • narrower, the same width, or wider than the original relation • possibly shorter than the original relation • unsafe! • SQL (2 basic types): Select distinct new_identifier, aggregate_function from relation group by new_identifier Select aggregate_function from relation
Tiny database SQL (2 basic types): Select distinct new_identifier, aggregate_function from relation group by new_identifier Select aggregate_function from relation Practice queries 12, 16