1 / 43

DATABASE SYSTEMS -- Course Outline

DATABASE SYSTEMS -- Course Outline. Preliminaries Introduction - Architectures Data Modeling and Database Design The Entity-Relationship (E-R) Model The Relational Model and Systems - SQL Physical Database Organization

trang
Télécharger la présentation

DATABASE SYSTEMS -- Course Outline

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DATABASE SYSTEMS -- Course Outline • Preliminaries • Introduction - Architectures • Data Modeling and Database Design The Entity-Relationship (E-R) Model • The Relational Model and Systems - SQL • Physical Database Organization • Other Classical Data Models (Hierarchical, DBTG-network) and Systems (IMS, Total,etc.) • Database Administration and Maintenance

  2. Outline - continued • Database Internals and Operational Issues • Integrity and Security • Query Processing • Concurrency Control,Recovery • Emerging Technologies and Systems • Object-Oriented, Multidatabases, Parallel Systems • Logic-Based, Active, Intelligent Systems • The emergence of PC-based DBMSs

  3. Introduction -- Definitions • DATA Known facts that can be recorded and have an implied meaning • DATABASE An integrated collection of related data, stored on secondary storage • MINIWORLD Some part of a real world about which data is stored in a database • DATABASE MANAGEMENT SYSTEM (DBMS) A collection of software modules (a generalized software package) for creating, manipulating, and maintaining the database • DATABASE SYSTEM The DBMS Software together with the data itself

  4. Introduction -- Historical Perspective (1) • 1950s (First Generation or File Systems on Tape) • batch processing, cards and tapes (sequential processing) • 1960s (Second generation or File Systems on Disk) • expanded use of random-access disk technology • database field • early file systems • generalized sorting packages • beginnings of generalized software systems • data definition incorporated into programming language • COBOL • development of in-house database systems

  5. FILE PROCESSING SCENARIO P1 P2 P3 P4 REMINDERS FOR LATE BOOKS ACQUISITION OF BOOKS RECORDING OF READERS BORROWING BOOKS BORROWED BOOKS BOOKS READERS

  6. Introduction -- Historical Perspective (2) • 1970s (Third generation or Pre-Relational) - movement towards standardization with CODASYL • DBTG (Data Base Task Group) - reports in 1969, 1971, 73, 78, 81, 85..., - STORED data definitions AND data - embed general access routines in a HOST language(COBOL) - NETWORK and HIERARCHICAL SYSTEMS defined - RELATIONAL model proposed by Codd (in theory) - Computer Science Interest - Clear separation between “logical” and “physical” organization - Operational Issues examined in a more general and theoretical way - First Relational prototype systems created (SYSTEM-R, INGRESS) - Data Models become prevalent -- 3-level architectures

  7. DATABASE PROCESSING SCENARIO Logical Files (VIEWS) User/Group Application Programs . F1 P1 P2 F2 DBMS DATABASE (Software) (Integrated) F3 P3 F4 P4 . . . .

  8. Example Database • Mini-World for the Example - Part of the University Environment INFORMATION SOME MINI WORLD ENTITIES Students Courses Instructors Departments SOME MINI WORLD RELATIONSHIPS Students Take Courses Courses have Prerequisite Courses Instructors Teach Courses Courses are Offered by Departments

  9. Example Database -- Instance (1) .

  10. Example Database -- Instance (2) .

  11. Introduction -- Historical Perspective (3) • 1980s (Fourth generation or Relational) • Relational Database Systems - Powerful Languages and Interfaces - Established Theory in Databases - Set-oriented vs. Record-oriented management and processing of data - Database Systems integrated into large Transactional Systems (networks, etc.) - Appearance of object-oriented, “intelligent”, and other models/systems

  12. Introduction -- Historical Perspective (4) • 1990s (Fifth generation or Post-Relational) - Emergence of COMPLEX OBJECTS in databases (engineering objects, multimedia, software objects) -- not only structured data! • Object-Relational Database Systems - Multidatabases, Active and Extensible Systems, Massively Parallel - Multimedia Database Systems - A strong showing of PC-based DBMSs. Threat to Capture the Market? - Web Database Systems - Servers

  13. Database System Features and Characteristics • Controlling Redundancy by reduced duplication + Data Consistency • SHARING of data among multiple users • Enforcing Integrity Constraints • Uniform Access and Control of Data • Restricting unauthorized or malicious access to data (Security) • Centralized Control for better operation (Database Administration) • Providing MultipleInterfaces to different Classes of Users • Concurrency Control and Recovery Facilities • Potential for Enforcing Standards • Reduced Application Development Time • Economies of Scale

  14. Simplified Picture of Database System VIEWS of the Database USERS or APPL PROGRAMS . DBMS DATABASE V1 U1 P2 V2 P3 V3 U4 V4 . .

  15. The ANSI/SPARC 3-level DBMS Architecture . . USER INTERFACE EXTERNAL SCHEMA 1 EXTERNAL SCHEMA n ........ INTERFACE: External to Conceptual Schema CONCEPTUAL SCHEMA INTERFACE: Conceptual to Internal Schema Database Internal to INTERNAL SCHEMA DBMS: Responsible for all INTERFACES Database Interface

  16. What Is a DBMS? • A very large, integrated collection of data. • Models real-world enterprise. • Entities (e.g., students, courses) • Relationships (e.g., Madonna is taking CS564) • A Database Management System (DBMS)is a software package designed to store and manage databases.

  17. Why Use a DBMS? • Data independence and efficient access. • Reduced application development time. • Data integrity and security. • Uniform data administration. • Concurrent access, recovery from crashes.

  18. Database System Features and Characteristics • Self-Contained Nature of a Database System - A DBMS CATALOG stores the DESCRIPTIONof the database (called, META-DATA). With that, the DBMS works on different databases • Insulation between Programs and Data - Called PROGRAM-DATA independence. This feature allows to change the data storage structures without having to change the DBMS access programs • Data Abstraction - A Data Model is used to hide storage details and present the user with a conceptual view of the database. • Support of Multiple Views of the Data - Each view describes only the data of interest to that user

  19. Levels of Abstraction • Many views, single conceptual(logical) schemaand physical schema. • Views describe how users see the data. • Conceptual schema defines logical structure • Physical schema describes the files and indexes used. View 1 View 2 View 3 Conceptual Schema Physical Schema • Schemas are defined using DDL; data is modified/queried using DML.

  20. Example: University Database • Conceptual schema: • Students(sid: string, name: string, login: string, age: integer, gpa:real) • Courses(cid: string, cname:string, credits:integer) • Enrolled(sid:string, cid:string, grade:string) • Physical schema: • Relations stored as unordered files. • Index on first column of Students. • External Schema (View): • Course_info(cid:string,enrollment:integer)

  21. The Architecture of a DBMS DBMS DATA MANAGER . DATABASE Dictionary Manager Output Generator Applic. Transaction Manager Recovery Manager I/O Processor LOG USER Query Parser Data Dictionary (Schemas) Precompiler Generation of Executable Code Optimizer Authorization Control Integrity Checker Update Processor Query Processor

  22. DBMS Components versusDatabase Interfaces .

  23. Layered Organization of DBMS Components .

  24. Concurrency Control • Concurrent execution of user programs is essential for good DBMS performance. • Because disk accesses are frequent, and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently. • Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is being computed. • DBMS ensures such problems don’t arise: users can pretend they are using a single-user system

  25. Transaction: An Execution of a DB Program • Key concept is transaction, which is an atomicsequence of database actions (reads/writes). • Each transaction, executed completely, must leave the DB in a consistent stateif DB is consistent when the transaction begins. • Users can specify some simple integrity constraintson the data, and the DBMS will enforce these constraints. • Beyond this, the DBMS does not really understand the semantics of the data. (e.g., it does not understand how the interest on a bank account is computed). • Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user’s responsibility!

  26. Scheduling Concurrent Transactions • DBMS ensures that execution of {T1, ... , Tn} is equivalent to some serial execution T1’ ... Tn’. • Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2PL locking protocol.) • Idea: If an action of Ti (say, writing X) affects Tj (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions. • What if Tj already has a lock on Y and Ti later requests a lock on Y? (Deadlock!) Ti or Tj is abortedand restarted!

  27. Ensuring Atomicity • DBMS ensures atomicity(all-or-nothing property) even if system crashes in the middle of a Xact. • Idea: Keep a log(history) of all actions carried out by the DBMS while executing a set of Xacts: • Before a change is made to the database, the corresponding log entry is forced to a safe location. (WAL protocol; OS support for this is often inadequate.) • After a crash, the effects of partially executed transactions are undone using the log. (Thanks to WAL, if log entry wasn’t saved before the crash, corresponding change was not applied to database!)

  28. The Log -- RECOVERY • The following actions are recorded in the log: • Ti writes an object:the old value and the new value. • Log record must go to disk beforethe changed page! • Ti commits/aborts:a log record indicating this action. • Log records chained together by Xact id, so it’s easy to undo a specific Xact (e.g., to resolve a deadlock). • Log is oftenduplexedand archived on “stable” storage. • All log related activities (and in fact, all CC related activities such as lock/unlock, dealing with deadlocks etc.) are handled transparently by the DBMS.

  29. Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB Simplified Structure of a DBMS These layers must consider concurrency control and recovery • A typical DBMS has a layered architecture. • The figure does not show the concurrency control and recovery components. • This is one of several possible architectures; each system has its own variations.

  30. CLASSES OF DATABASE USERS • Database Administrators - Responsible for managing the database (helping users to define views, choosing alternative storage structures and access strategies, authorizing users, validating data, backup and recovery functions, monitoring performance, etc.) • Database Designers - Responsible for designing the database (could be the administrators) • Application Programmers / Systems Analysts - design and implement “canned” transactions (programs) for parametric users • End-Users - use the database for querying, updating, generating reports, etc.. CASUAL USERS (occasional), PARAMETRIC (use pre-programmed transactions to interact -- e.g.., a bank teller), and SOPHISTICATED (use full DBMS capabilities to implement complex applications.)

  31. OTHER IMPORTANT DATABASE USERS • Database Designers and Implementors - The Systems programmers that develop the SOFTWARE PACKAGE itself • Tool Developers - The people that design and implement tools that facilitate the use of the DBMS software (design tools, performance tools, special interfaces, etc.) • Operators and Maintenance Personnel - The people that work on running and maintaining the hardware and systems software environment for the database system.

  32. LANGUAGES ASSOCIATED WITH A DBMS (1) • Data Definition Language (DDL) - Used to express the conceptual schema of the database -- This schema is stored in the Data Dictionary (CATALOG). Often, a DDL is used to express also the internal and external schemas. In some DBMSs, two separate languages are used: SDL - Storage Definition Language (for internal schemas) VDL - View Definition Language (for external schemas) • Data Manipulation Language (DML) - Used to retrieve information and modify the database (insert, delete, update) - There are two major types of DML:Procedural DML, Declarative DML

  33. LANGUAGES ASSOCIATED WITH A DBMS (2) • Query Language - The subset of the DML which is used for RETRIEVAL • Data Sublanguage - The UNION of DML and DDL • Host Language - A programming language (COBOL, C, etc..) in which data Sublanguage statements are embedded

  34. DBMS USER INTERFACES • Stand-alone Query language Interface • Programming Interfaces for embedded DML • Pre-Compiler Approach • Procedure Call Approach (subroutines) • Non-technical User Interfaces • Menu-based, graphics-based, forms-based, natural language, combinations • Parametric Interfacesusing Function Keys • Report Generation Language Interfaces • Interfaces for the DBA • Creating accounts, granting authorizations • setting system parameters • changing schemas and access paths

  35. DBMS UTILITIES • Functions • Loading Data from files into the database • Backing up the database periodically on tape • Reorganizing database file structures • Report Generation Utilities • Performance Monitoring Utilities • Other functions (sorting, user monitoring, data compression, etc.) • Data Dictionary Utilities • For storing schema descriptions, design decisions, user information, usage standards, application program descriptions, etc. • Active data dictionary is accessed by the DBMS software and users • Passive data dictionary is accessed by users only

  36. Want to Use Database Management? • Purchase a DBMS • Train the staff to use it • Define the Schemas for the database • Load the database • Write Application Programs • Continuously Evolve the Database

  37. When NOT to Use Database Management • Main Costs of Using a DBMS - High initial investment and likely need for additional hardware - Overhead for providing generality, security, recovery, integrity, and concurrency control • When a DBMS is unnecessary for your application - The database and application are simple and very stable - There are pressing time requirements which may not be met because of the database system overhead - Access to data by multiple users is not required

  38. Database Management Systems: The OLD Actors • These Systems dominated the market in the before the mid-eighties • Many installations STILL exist, but no new sales happen • IMS (IBM) -- Hierarchical Model (with the language DL/1) • I-D-S (Honeywell) -- Network DBTG (Integrated Data Store) • IDMS (Cullinane) - Network (Integrated Data Mgmnt System) • TOTAL (Cincom) - Network • IMAGE (Hewlett-Packard) - Network • SYSTEM 2000 (Intel-MRI) - Inverted (ad-hoc model) Other Inverted: ADABAS (Software AG), Model 204 (CCA) • ...

  39. Database Management Systems: The Main Actors • DB2, running on all IBM or IBM-compatible platforms • ORACLE8 • SYBASE • INFORMIX • INGRES currently called, Computer Associates-Ask Group • The Other Players: • Rdb, Gupta Quadbase, Ralma, Watcom, XDB, ... • The MPP players (massively parallel): • Terradata (biggest), Tandem (NonStop SQL), Oracle Parallel Server, Informix, Sybase (Navigator), DB2, DEC,… • The Modern Players: ILLUSTRA, O2, etc.

  40. Database Management Systems: The Main Actors (b) • The PC Giants (coming along BIGGER) • Microsoft SQL Server • Powersoft • Gupta • These systems: (a) Bring-in SQL access (gateways) (b) Suitable for Client-Server (DBMS) (c) Look exactly like “bigger” DBMS

  41. Database Management Systems: The Main Actors (c) • The PC-based (coming along BIGGER but still on PC) • Paradox (Borland) • Microsoft Access 2 • Q&A (Symantec) • FileMaker Pro (Claris Corp.) • DataEase Express • Approach (Lotus) • Alpha Four • OLDER: xBASE, dBASE, FoxPro, MicroRIM... • Those systems are either “access packages” or they have minimal (rudimentary) facilities of a traditional DBMS

  42. Database Management Systems: The Main Actors (d) • Another Way to Look at them (PRICE) • 1 - xBASE, dBASE, ... • 2 - MS Access, Alpha, Approach, Paradox, Q&A, ... • 3 - MS SQL Server, Gupta, Powersoft... • 4 - Oracle, Informix, Sybase, DB2/6000, ... • 5 - DB2, Rdb, Tandem, Terradata... The number (1, 2, ...) implies the NUMBER of ZEROS in the list price of the database management system (one license)

  43. SUMMARY • DBMS used to maintain, query large datasets. • Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security. • Levels of abstraction give data independence. • A DBMS typically has a layered architecture. • DBAs hold responsible jobs and are well-paid! • DBMS R&D is one of the broadest, most exciting areas in CS.

More Related