1 / 12

CSC 485E/SENG 480D/CSC 571 Advanced Databases Introduction

CSC 485E/SENG 480D/CSC 571 Advanced Databases Introduction. DB and DBMS. Database (DB): a collection of information that exists over a long period of time. Database Management System (DBMS): a complex software for handling Large data efficiently and safely. DBMS.

amos-baker
Télécharger la présentation

CSC 485E/SENG 480D/CSC 571 Advanced Databases Introduction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSC 485E/SENG 480D/CSC 571 AdvancedDatabases Introduction

  2. DB and DBMS • Database (DB): a collection of information that exists over a long period of time. • Database Management System (DBMS): a complex software for handling • Large data efficientlyand safely.

  3. DBMS • Allows users to create new databases and specify their schema, using a data-definition language. • Enables users to query and modify the data, using a query and data-manipulation language. • Supports intelligent storage of very large amounts of data. • Protects data from accident or not proper use. • Example: We can require from the DBMS to not allow the insertion of two different people with the same SIN. • Allows efficient access to the data for queries and modifications. • Example: Indexes over specified fields • Controls access to data from many users at once (concurrency), without allowing “bad” interactions that can corrupt the data accidentally. • Recovers from software failures and crashes.

  4. Database Studies • Design of databases. • What kinds of information go into the database? • How is the information structured? • How do data items connect? • Database programming. • How does one express queries on the database? • How is database programming combined with conventional programming? • Database system implementation. • How does one build a DBMS, including such matters as query processing, transaction processing and organizing storage for efficient access? We’ll focus on this part

  5. Fictitious Megatron DBMS • Stores relations as Unix files • Students(name, sid, dept) is stored in the file /home/megatron/students as Smith#123#CS Jones#533#EE • Schemas are stored in /home/megatron/schemas e.g. Students#name#STR#id#INT#dept#STR Depts#name#STR#office#str

  6. Megatron sample session mayne$ megatron WELCOME TO MEGATRON 2006 megaSQL% SELECT * FROM Students; Name id dept ------------------------------------- Smith 123 CSC Johnson 522 ECE megaSQL%

  7. Megatron sample session II megaSQL% SELECT * FROM Students WHERE id >= 500; Johnson#522#EE megaSQL% quit THANK YOU FOR USING MEGATRON 2006 mayne$

  8. Megatron Implementation • To execute SELECT * FROM R WHERE <COND> • Read file schema to get attributes of R • Check that the <COND> is semantically valid for R • Read file R, • for each line • check condition • if OK, display

  9. What’s wrong with Megatron? • Tuple layout on disk: no flexibility for DB modifications. • Change CSC to ECON and the entire file has to be rewritten. • Search Expensive: no indexes; always read entire relation. • Brute­force query processing. • No buffer manager: everything comes off of disk all the time. • No concurrency control: several users can modify a file at the same time with unpredictable results. • No reliability: can lose data in a crash or leave operations half done.

  10. Architecture of a DBMS • The “cylindrical” component contains not only data, but also metadata, i.e.info about the structure of data. • If DBMS is relational, metadata includes: • names of relations, • names of attributes of those relations, and • data types for those attributes (e.g., integer or character string). • A database also maintains indexes for the data. • Indexes are part of the stored data. • Description of which attributes have indexes is part of the metadata.

  11. What will be covered • Secondary Storage Management • Disks Mechanics • Disk Computation Model • Handling disk failures • Index Structures • B-Trees, Extensible Hash Tables, etc. • Multidimensional Indexes (for GIS and OLAP) • Query Execution • Algorithms for relational operators • Join methods. • Query Compiler • Algebraic laws for improving query plans. • Cost based plan selection • Join orders

  12. What will be covered • Parallel and Distributed Databases • Parallel algorithms on relations • Distributed query processing • Distributed transactions • Google’s Map-Reduce framework • Peer-to-peer distributed search • Data Mining • Frequent-Itemset Mining • Finding similar items • Clustering of large-scale data • Databases and the Internet • Search engines • PageRank • Data streams • Data mining of streams

More Related