1 / 94

732A54 Big Data Analytics 6hp

732A54 Big Data Analytics 6hp. http://www.ida.liu.se/~732A54. Relational databases. Literature. Elmasri, Navathe, Fundamentals of Database Systems, 7 th edition, Addison Wesley, 2016. Chapters 3-6 and 9; section 7.1. Database methods. 1. Representation and storage of data. Databases.

cipriani
Télécharger la présentation

732A54 Big Data Analytics 6hp

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 732A54 Big Data Analytics 6hp http://www.ida.liu.se/~732A54

  2. Relational databases

  3. Literature • Elmasri, Navathe, Fundamentals of Database Systems, 7th edition, Addison Wesley, 2016. Chapters 3-6 and 9; section 7.1.

  4. Database methods 1. Representation and storage of data

  5. Databases • One (of many) way(s) to store data in electronic form • Used in every-day life: bank, reservation of hotel or journey, library search, shops

  6. Databases • Database management system (DBMS): a collection of programs that supports a user to create and maintain a database • database system = database + DBMS

  7. Data Queries Answers Model Data source Processing of queries and updates Data source management system Access to stored data Physical Data source Data Sources

  8. Persons • Data source administrator • Data source designer • ’end user’ • application programmer • DBMS designer • tool developer • operator, maintenance

  9. Issues - this course • What information is stored? • How is the information stored? (high level and low level) • How is the information accessed? (user level and system level)

  10. Other issues • How to optimize performance of a data source? • How to recover a data source after crash? • How to access information from multiple data sources? • How to allow multiple users to access a data source?

  11. Data Queries Answers Model Data source Processing of queries and updates Data source management system Access to stored data Physical Data source Data Sources

  12. Which information is stored? • Model of reality - Entity-Relationship model (ER) - Unified Modeling Language (UML)

  13. ER/EER diagram • structured way to model data, independent of type of data source • notions: - entities and entity types - attributes - key attributes - relationships and cardinality constraints - sub-types (EER)

  14. DEFINITION Homo sapiens adrenergic, beta-1-, receptor ACCESSION NM_000684 SOURCE ORGANISM human REFERENCE 1 AUTHORS Frielle, Collins, Daniel, Caron, Lefkowitz, Kobilka TITLE Cloning of the cDNA for the human beta 1-adrenergic receptor REFERENCE 2 AUTHORS Frielle, Kobilka, Lefkowitz, Caron TITLE Human beta 1- and beta 2-adrenergic receptors: structurally and functionally related receptors derived from distinct genes

  15. protein-id article-id Reference title definition author source PROTEIN ARTICLE Entity-relationship accession m n

  16. protein-id definition source PROTEIN Entity-relationship attribute key attribute accession entity type Weak entity type: type without key attribute

  17. article-id title author ARTICLE Entity-relationship multi-valued attribute

  18. Person-id address PERSON Entity-relationship city composite attribute zip street number birthdate age derived attribute

  19. protein-id article-id Reference title definition author source PROTEIN ARTICLE Entity-relationship accession m m-n relationship binary relationship Also: 1-1, 1-n relationships Also: n-ary relationships n

  20. protein-id article-id Reference title definition author source PROTEIN ARTICLE Entity-relationship accession m partial participation no relationship attributes Also: relationship attributes Also: total participation n

  21. gene-id version-id length locus source GENE Entity-relationship 1 VersionOf Weak entity type n GENE VERSION

  22. name PERSON Birthdate Enhanced Entity-relationship P-id address sub-type relationship department TEACHER

  23. Exercise The university wants to create a database for teaching. The database needs to contain information about the different courses, the different versions of the courses given during different occasions, the teachers, and the students taking the courses. A course contains a unique course code and a name. It also has one or more examination moments such written exam, lab work, project. Each version of a course has a date (when the course is given) and information about course literature. A course version has exactly one examiner which should be a teacher, and one or more teachers teaching the classes. Teachers can be examiner for several course versions as well as teach in several course versions. Students can sign up for many course versions. The database should also store the final grades for students for the different courses.

  24. Data Queries Answers Model Data source Processing of queries and updates Data source management system Access to stored data Physical Data source Data Sources

  25. structure precision How is information stored? (high level) How is information accessed? (user level) • Text (IR) • Semi-structured data • Data models (DB) • Rules + Facts (KB)

  26. Databases • Relational databases: - model: tables + relational algebra - query language (SQL) • Object-oriented, extended-relational, NoSQL databases

  27. Relational Databases • Tables = relations (~ entity type) - row = tuple (~ entity) - column = attribute (~ attribute) • primary keys • foreign keys

  28. ER/EER to database schema

  29. protein-id article-id Reference title definition author source PROTEIN ARTICLE Entity-relationship accession m n

  30. ER/EER to database schema Step 1 For each (strong) entity type E create a table R with the same simple attributes as the entity.

  31. protein-id definition source PROTEIN accession PROTEIN ( PROTEIN-ID, ACCESSION, SOURCE, DEFINITION )

  32. article-id title ARTICLE Author is multi-valued attribute. ARTICLE-TITLE ( ARTICLE-ID, TITLE )

  33. gene-id source GENE GENE ( GENE-ID, SOURCE)

  34. ER/EER to database schema Step 2 For each weak entity type W med owner entity type E, create a table R with the same simple attributes as W and add the primary key attributes from the relation that corresponds to E.

  35. gene-id version-id length locus source GENE GENE ( GENE-ID, SOURCE) 1 VersionOf GENEVERSION ( GENE-ID, VERSION-ID, LENGTH, LOCUS) n GENE VERSION

  36. ER/EER to database schema Step 3 For each binary 1:1 relationship between S and T, add the primary key of the table corresponding to one of S or T as a foreign key to the table corresponding to the other.

  37. emp-id dept-id Manages D-name P-name MANAGER DEPARTMENT MANAGER ( EMP-ID, P-NAME, M-DEPT-ID) MANAGER ( EMP-ID, P-NAME) 1 DEPARTMENT (DEPT-ID, D-NAME) DEPARTMENT (DEPT-ID, D-NAME) OR MANAGER ( EMP-ID, P-NAME) DEPARTMENT (DEPT-ID, D-NAME, M-EMP-ID) 1

  38. ER/EER to database schema Step 4 For each binary 1: n relationship between S and T (every S is related to many T, every T is related to one S), add the primary key of the table corresponding to S as a foreign key to the table corresponding to T.

  39. emp-id dept-id Works-for D-name P-name EMPLOYEE DEPARTMENT EMPLOYEE ( EMP-ID, P-NAME, WORKS-DEPT-ID) EMPLOYEE ( EMP-ID, P-NAME) n DEPARTMENT (DEPT-ID, D-NAME) DEPARTMENT DEPT-ID, D-NAME) 1

  40. ER/EER to database schema Step 5 For each binary m : n relationship between S and T create a table R with the primary keys of the tables corresponding to S and T as foreign keys. If the relationship has attributes, then add these to R.

  41. ARTICLE (ARTICLE-ID) PROTEIN m Reference n ARTICLE REFERENCE ( PROTEIN-ID, ARTICLE-ID ) PROTEIN (PROTEIN-ID)

  42. ER/EER to database schema Step 6 For every multi-valued attribute A in R, create a new table that contains an attribute corresponding to A and the primary key of R as foreign key.

  43. author ARTICLE ARTICLE (ARTICLE-ID) ARTICLE-AUTHOR ( ARTICLE-ID, AUTHOR )

  44. REFERENCE PROTEIN PROTEIN-ID ARTICLE-ID PROTEIN-ID ACCESSION DEFINITION SOURCE 1 1 1 Homo sapiens adrenergic, beta-1-, receptor human NM_000684 1 2 ARTICLE-TITLE ARTICLE-ID TITLE Cloning of the cDNA for the human beta 1-adrenergic receptor 1 Human beta 1- and beta 2-adrenergic receptors: structurally and functionally related receptors derived from distinct genes 2 Relational databases ARTICLE-AUTHOR ARTICLE-ID AUTHOR 1 1 1 1 1 1 2 2 2 2 Frielle Collins Daniel Caron Lefkowitz Kobilka Frielle Kobilka Lefkowitz Caron

  45. EER to database schema Sub-type relationship Assume type C has subtypes Si option 1: Create a table R for C with all attributes in C. Then create table Ri for each Si with all attributes from Si and the primary key from R.

  46. name PERSON birthdate P-id address PERSON ( P-ID, BIRTHDATE, ADDRESS, NAME ) STUDENT TEACHER ( P-ID, DEPARTMENT) program STUDENT ( P-ID, PROGRAM ) TEACHER department

  47. EER to database schema Sub-type relationship Assume type C has subtypes Si option 2: Create a table Ri for each Si with as attributes all attributes from Si and from C. Only works if every C belongs to a Si.

  48. name PERSON birthdate P-id address TEACHER ( P-ID, BIRTHDATE, ADDRESS, NAME, DEPARTMENT) STUDENT STUDENT ( P-ID, BIRTHDATE, ADDRESS, NAME, PROGRAM ) program TEACHER department

  49. EER to database schema Sub-type relationship Assume type C has subtypes Si option 3: Create a table R with all attributes from C and all attributes from the Si. Add an attribute to discriminate between the subtypes. Only works for disjoint subtypes.

  50. name PERSON birthdate P-id address PERSON ( P-ID, BIRTHDATE, ADDRESS, NAME, PERSON-SUB-TYPE, DEPARTMENT, PROGRAM ) STUDENT program TEACHER department

More Related