1 / 29

Multimedia Information Systems

Multimedia Information Systems. CS 4570. Outlines. Introduction to DMBS Relational database and SQL B + - tree index structure. Database Management System (DBMS). Collection of interrelated data Set of programs to access the data DBMS contains information about a particular enterprise

london
Télécharger la présentation

Multimedia Information Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multimedia Information Systems CS 4570

  2. Outlines • Introduction to DMBS • Relational database and SQL • B+- tree index structure

  3. Database Management System (DBMS) • Collection of interrelated data • Set of programs to access the data • DBMS contains information about a particular enterprise • DBMS provides an environment that it both convenient and efficient to use

  4. Purpose of Database Systems • To overcome the disadvantages of the typical file–processing systems: • Data redundancy and inconsistency • Difficulty in accessing data • Data isolation  multiple files and formats • Integrity problems • Atomicity of updates • Concurrent access by multiple users • Security problems

  5. Data and Data Models • Data is really the “given facts” • A data model is a collection of tools for describing: • Data • Data relationships • Data semantics • Data constraints • Object-based logical models: • object-oriented model, semantic model… • Record-based logical models: • relational model (e.g. SQL, DB2)…

  6. Entity-Relationship (ER) Model • A database can be modeled as: • a collection of entities, • relationships among entities

  7. Relational Model

  8. Relational Databases • A relational database is a database that is perceived be its users as a collection of tables. ( and nothing but tables ).

  9. Relation Instance • The current values (relation instance) of a relation are specified by a table. • An element t of r is a tuple; represented by a row in a table. tuple attribute

  10. Structured Query Language (SQL) • SQL is based on set and relational operations with certain modifications and enhancements • A typical SQL query has the form: selectA 1 , A 2 , ..., A n fromr 1 , r 2 , ..., r m whereP • Ais represent attributes • ris represent relations • P is a predicate. • The result of an SQL query is a relation.

  11. Banking Example branch(branch-name, branch-city, assets) customer(customer-name, customer-street, customer-city) account(branch-name, account-number, balance) loan(branch-name, loan-number, amount) depositor(customer-name, account-number) borrower(customer-name, loan-number)

  12. The Select Clause • The select clause corresponds to the projection operation of the relational algebra. It is used to list the attributes desired in the result of a query. • Find the names of all branches in the loan relation selectbranch-name fromloan • An asterisk in the select clause denotes”all attributes” select * fromloan

  13. The Select Clause(Cont.) • SQL allows duplicates in relations as well as in query results. • To force the elimination of duplicates, insert the keyword distinct after select. Find the names of all branches in the loan relation, and remove duplicates selectdistinctbranch-name fromloan • The keyword all specifies that duplicates not be removed. Selectallbranch-name fromloan

  14. The Select Clause(Cont.) • The select clause can contain arithmetic expressions involving the operators,+,-,*,/,and operating on constants or attributes of tuples. • The query: selectbranch-name,loan-number,amount * 100 fromloan would return a relation which is the same as the loan relation, except that the attribute amount is multiplied by 100

  15. borrower (customer-name, loan-number) loan (branch-name, loan-number, amount) borrower × loan ( borrower.customer-name, borrower.loan-number, loan.branch-name, loan.loan-number, loan.amount) The from Clause • The from clause corresponds to the Cartesian product operation of the relational algebra. It lists the relations to be scanned in the evaluation of the expression. • Find the Cartesian product borrower loan select * fromborrower, loan • Find the name and loan number of all customers having a loan at the Perryridge branch. selectdistinctcustomer-name,borrower.loan-number fromborrower,loan whereborrower.loan-number=loan.loan-numberand branch-name=“Perryridge”

  16. The where Clause • The where clause corresponds to the selection predicate of the relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause. • Find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200. selectloan-number from loan wherebranch-name=“Perryridge” and amount>1200 • SQL uses the logical connectives and, or, and not. It allows the use of arithmetic expressions as operands to the comparison operators.

  17. The where Clause(Cont.) • SQL includes a between comparison operator in order to simplify where clauses that specify that a value be less than or equal to some value and greater than or equal to some other value. • Find the loan number of those loans with loan amounts between $90,000 and $100,000(that is, $90,000 and $100,000) selectloan-number fromloan where amountbetween 90000 and 100000

  18. Example Query • Find all branches that have greater assets than some branch located in Brooklyn. selectdistinctT.branch-name frombranchasT, branchasS whereT.assets > S.assetsand S.branch-city =“Brooklyn”

  19. B+ -Tree Index Files B+ -Tree indices are an alternative to indexed-sequential files. • Disadvantage of indexed-sequential files: performance degrades as file grows, both for index lookups and for sequential scans through the data. • Advantage of B+ -Tree index files: automatically reorganizes itself with small, local, changes, in the face of insertions and deletions. • Disadvantage of B+ -Trees: extra insertion and deletion overhead, space overhead. • Advantages of B+ -Trees outweigh disadvantages, and they are used extensively.

  20. B+ -Tree Index Files (Cont.) A B+ -Tree is a rooted tree satisfying the following properties: • All paths from root to leaf are of the same length • Each node that is not a root or a leaf has between n/ 2 and n children • A leaf node has between (n-1)/2 and n-1 values • Special cases:if the root is not a leaf it has at least 2 children; if the root is a leaf (that is there are no other nodes in the tree) it can have between 0 and (n-1) values.

  21. B+ -Tree Node Structure • Typical node • Ki are the search-key values • Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes). • The search-keys in a node are ordered K1 < K2 < K3 < ...

  22. Leaf Nodes in B+ -Trees Properties of a leaf node: • For i = 1, 2, . . . , n-1, i either points to a file record with search-key value Ki , or to a bucket of pointers to file records, each record having search-key value Ki . Only need bucket structure if search-key does not form a primary key. • If Li , Lj are leaf nodes and i < j , Li‘s search-key values are less than Lj’s search-key values • Pn points to next leaf node in search-key order

  23. Non-Leaf Nodes in B+ -Trees • The nonleaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with n pointers: • All the search-keys in the subtree to which P1 points are less than K1 • For 2  i  n-1 all the search-key in the subtree to which Pi points have values greater than or equal to Ki-1 and less than Ki • All the search-keys in the subtree to which Pn points are greater than or equal to Kn-1

  24. Example of a B+-tree

  25. Queries on B+-Trees (Cont.) • In processing a query, a path is traversed in the tree from the root to some leaf node. • If there are K search-key values in the file, the path is no longer than log n/ 2 (K ). • A node is generally the same size as a disk block, typically 4 kilobytes, and n is typically around 100 (40 bytes per index entry). • With 1 million search key values and n = 100, at most log50 (1, 000, 000) = 4 nodes are accessed in a lookup. • Contrast this with a balanced binary tree with 1 million search key values --- around 20 nodes are accessed in a lookup • above difference is significant since every node access may need a disk I/O, costing around 30 millisecond!

  26. Updates on B+-Trees: Insertion (Cont.) • Splitting a node: • take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first n/ 2 in the original node, and the rest in a new node. • let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split. If the parent is full, split it and propagate the split further up. • The splitting of nodes proceeds upwards till a node that is not full is found. In the worst case the root node may be split increasing the height of the tree by 1.

  27. Updates on B+-Trees: Insertion (Cont.)

  28. Self-Practice • Construct a B+-tree for the following set of key values: (2, 3, 5, 7, 11, 17, 19, 23, 29, 31) with n = 6

More Related