Introduction to Databases

Presentation Transcript


  1. Introduction to Databases Week 1, Day 1 (based on Ch 1 of Connolly and Begg) CMPT 355 Sept-Dec 2010 - w1d1

  2. Introduction to Databases - Outline
     • Before Databases
       • Some history not in the text
     • File Based Approach
       • Illustrated with real world problems
     • Database Approach
       • With simplified advantages & disadvantages

  3. Before Databases - Outline
     • Some Basic Concepts
       • Record
       • File
       • Field
     • Accessing Data
       • Sequential Access
       • Direct Access
       • Record Keys
       • Indexed Sequential Access
       • Random Access
     • Some Problem Scenarios

  4. Before Databases – Some Basic Concepts: Record
     • The name “record” is based on traditional “recorded” documents.
     • The earliest “records” were 80-column punch cards.
       • “The card is often called a unit record, because data is restricted to the 80 columns, and the card is read or punched as a unit of information.” [1]
     • Other definitions
       • “A stored record is an identifiable collection of data elements.” [2]
       • “A record is some collection of attributes that describe some entity or event.” [3]
     [1] Introduction to IBM Data Processing Systems, 1964, IBM, F22-6517-2
     [2] Introduction to Data Management, 1970, IBM, SC20-8096-0
     [3] J. Carter, Developing e-Commerce Systems, Prentice-Hall, 2002

  5. Before Databases – Some Basic Concepts: Record (cont.)
     • A record is the basic unit of stored data that a user recognizes.
       • E.g. customer record, sales slip record.
     • Problems
       • Different users may
         • have different records for the same data
         • use different versions of the same record
     • Currently dealt with as records / rows / views.

  6. Before Databases – Some Basic Concepts: File
     • The name “file” is based on traditional file folders and filing cabinets.
     • “A named collection of occurrences of logical records which may be of more than one logical record type; a set of application record values, pertaining to one or more record formats.” [1]
     • Other definitions
       • “Stored records are grouped on storage volumes as data sets.” [2]
       • “A collection of similar records that may be used individually or together” [3]
     [1] Data Base Concepts, 1971, IBM, ZR20-4219-0
     [2] Introduction to Data Management, 1970, IBM, SC20-8096-0
     [3] J. Carter, Developing e-Commerce Systems, Prentice-Hall, 2002

  7. Before Databases – Some Basic Concepts: File (cont.)
     • The basic unit of stored data that an operating system recognizes
       • E.g. customer file, sales slip file
     • Currently dealt with as tables.

  8. Before Databases – Some Basic Concepts: Field
     • The name “field” is based on traditional “fields” that need to be filled in on forms.
     • “A field is the smallest meaningful unit of information of interest.” [1]
     • Other related definitions
       • “The smallest unit of logical data of concern to a programmer.” [2]
     [1] Introduction to Data Management, 1970, IBM, SC20-8096-0
     [2] Data Base Concepts, 1971, IBM, ZR20-4219-0

  9. Before Databases – Some Basic Concepts: Field (cont.)
     • The basic unit of stored data that a program recognizes
       • E.g. customer name, sales slip id number
     • Problems
       • Does a name {first + last} require 1 or 2 fields?
       • How many fields do you use for an address?
       • How many fields are needed on a sales slip to record all items purchased?
     • Currently dealt with as a data attribute.

  10. Before Databases – Accessing Data: Sequential Access
      • The method used with tape storage. (Consider accessing a song on a cassette tape.)
      • Easiest to use if the file is sorted on some field of information (usually a record key)
      • “Each file is made up of records, each containing information required to describe completely a single item. The sequence may be by item number, name, account number, or man number, but all files in a single application must be in the same sequence.” [1]
      • Updates + Old File = New File
      [1] Introduction to IBM Data Processing Systems, 1964, IBM, F22-6517-2
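
The "Updates + Old File = New File" step can be sketched as a merge of two key-sorted sequences. This is only an illustration: the function name `sequential_update`, the `(key, data)` tuple layout, and the sample records are assumptions, not taken from the slides or the IBM manual.

```python
def sequential_update(old_master, updates):
    """Merge a key-sorted old master file with key-sorted updates.

    Both inputs are lists of (key, data) tuples sorted by key.
    The old file is never modified; a new file is produced instead.
    """
    new_master = []
    i = j = 0
    while i < len(old_master) and j < len(updates):
        old_key, upd_key = old_master[i][0], updates[j][0]
        if old_key < upd_key:
            new_master.append(old_master[i])   # unchanged record is copied over
            i += 1
        elif old_key > upd_key:
            new_master.append(updates[j])      # record that did not exist before
            j += 1
        else:
            new_master.append(updates[j])      # update replaces the old record
            i += 1
            j += 1
    new_master.extend(old_master[i:])          # whatever remains on either input
    new_master.extend(updates[j:])
    return new_master


old_file = [(101, "Ada"), (103, "Grace"), (107, "Edgar")]
update_file = [(103, "Grace H."), (105, "Alan")]
new_file = sequential_update(old_file, update_file)
```

Each input is read once from front to back, which is why both files have to be in the same key sequence.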

  11. Before Databases – Accessing Data: Record Keys
      • According to IBM [1]
        • “The data element chosen to order the (sequential) data set is called the key.
        • “The sequence of data may be changed by selecting a different data element to be the key and sorting the stored records according to the values of the new key.
        • “In some cases, using one data element as a key is not sufficient to identify a given stored record. In this case, one or more additional data elements would be concatenated to form the key.”
      [1] Introduction to Data Management, 1970, IBM, SC20-8096-0
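
A small sketch of the three points quoted above: ordering by a key, re-sequencing by a different key, and concatenating data elements when one is not enough to identify a record. The field names `account`, `branch`, and `name` and the sample values are hypothetical.

```python
from operator import itemgetter

records = [
    {"account": 20, "branch": "B", "name": "Lee"},
    {"account": 10, "branch": "B", "name": "Kim"},
    {"account": 10, "branch": "A", "name": "Roy"},
]

# The data element chosen to order the data set is the key.
by_account = sorted(records, key=itemgetter("account"))

# Choosing a different key and re-sorting changes the sequence.
by_name = sorted(records, key=itemgetter("name"))

# "account" alone does not identify a record here (two records share 10),
# so a second data element is concatenated to form the key.
by_composite = sorted(records, key=itemgetter("account", "branch"))

account_keys = {r["account"] for r in records}
composite_keys = {(r["account"], r["branch"]) for r in records}
```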

  12. Before Databases – Accessing Data: Direct Access
      • The first access method designed to make use of the ability to go quickly to any location on a disk.
      • Records are stored in fixed locations based on the values of key fields that can be directly mapped to a physical location on disk.
      • There must be space for records with each possible record key value.
      • Usually record key values are allocated sequentially to ensure that all storage locations are used (at least initially).
      • Records are updated in their original location.
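
A minimal sketch of direct access, assuming fixed-length records whose location is computed as key times record size. The record layout (a 4-byte key plus a 20-byte name), the capacity of 100 slots, and the function names are all assumptions made for illustration; an in-memory `io.BytesIO` stands in for the disk file.

```python
import io
import struct

RECORD = struct.Struct("<I20s")   # hypothetical layout: 4-byte key, 20-byte name
CAPACITY = 100                    # one slot for each possible key value 0..99


def create_file(capacity=CAPACITY):
    # Space is reserved up front for every possible key value.
    return io.BytesIO(bytes(RECORD.size * capacity))


def write_record(f, key, name):
    f.seek(key * RECORD.size)     # the key maps directly to a location
    f.write(RECORD.pack(key, name.encode("ascii")))


def read_record(f, key):
    f.seek(key * RECORD.size)
    stored_key, raw = RECORD.unpack(f.read(RECORD.size))
    return stored_key, raw.rstrip(b"\0").decode("ascii")


f = create_file()
write_record(f, 42, "Ada")
write_record(f, 42, "Ada L.")     # the update overwrites the original location
size_after = len(f.getvalue())
```

The file size never changes after creation, which shows both the benefit (no searching) and the cost (unused slots still take space).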

  13. Before Databases – Accessing Data: Indexed Sequential Access
      • Balances access speed against storage space utilization, a major improvement over direct access.
      • Records stored in first-come, first-served (FCFS) order are quickly accessed by using an index of pointers from record keys to the locations of the records.
      • The index needs to be re-sorted each time it is updated.
      • Records are updated in their original location.
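
The idea can be sketched with a data area kept in arrival order and a separate sorted index of (key, position) pairs. The class name `IndexedFile` and its methods are hypothetical, and integer keys are assumed. Here `bisect.insort` keeps the index in key order on every insert, which stands in for re-sorting it.

```python
import bisect


class IndexedFile:
    def __init__(self):
        self.records = []   # data area: records in arrival (FCFS) order
        self.index = []     # sorted (key, position) pairs pointing into records

    def insert(self, key, data):
        position = len(self.records)
        self.records.append((key, data))
        bisect.insort(self.index, (key, position))   # index stays sorted by key

    def _position(self, key):
        # Positions are never negative, so (key, -1) sorts before any entry for key.
        i = bisect.bisect_left(self.index, (key, -1))
        if i < len(self.index) and self.index[i][0] == key:
            return self.index[i][1]
        return None

    def find(self, key):
        pos = self._position(key)
        return None if pos is None else self.records[pos][1]

    def update(self, key, data):
        pos = self._position(key)
        if pos is None:
            raise KeyError(key)
        self.records[pos] = (key, data)   # updated in its original location

    def scan(self):
        # Sequential access in key order, driven by the index.
        return [self.records[pos] for _, pos in self.index]


ifile = IndexedFile()
for k, v in [(30, "c"), (10, "a"), (20, "b")]:
    ifile.insert(k, v)
ifile.update(20, "B")
```

Unlike direct access, no slot is reserved for keys that are never used; only the index grows with the number of records.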

  14. Before Databases – Accessing Data: Random Access
      • Balances access speed against storage space utilization, a major improvement over direct access.
      • Records are stored at particular locations based on hashing the value of the record key.
      • If multiple records hash to the same location, they must be handled as small chains of records.
      • Records are updated in their original location.
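
A minimal sketch of hashed storage with chaining, assuming integer keys and a simple modulo hash. The class name `HashedFile`, the bucket count of 8, and the hash function are illustrative choices only.

```python
class HashedFile:
    def __init__(self, buckets=8):
        # Each location holds a small chain (list) of records.
        self.buckets = [[] for _ in range(buckets)]

    def _location(self, key):
        return key % len(self.buckets)   # hypothetical hash for integer keys

    def put(self, key, data):
        chain = self.buckets[self._location(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, data)   # existing record updated in place
                return
        chain.append((key, data))        # new record joins the chain

    def get(self, key):
        for k, data in self.buckets[self._location(key)]:
            if k == key:
                return data
        return None


hfile = HashedFile()
hfile.put(3, "a")
hfile.put(11, "b")    # 11 % 8 == 3, so this collides with key 3
hfile.put(3, "A")     # update, not a new chain entry
```

Lookups stay fast as long as the chains remain short, so the number of buckets has to be chosen with the expected number of records in mind.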

  15. Before Databases – Some Problem Scenarios
      • Me as a grad student moving from place to place
        • data redundancy
      • Me trying to get the Registrar's people to work with the residence halls
        • data availability
      • Me looking for a book in the library
        • data sharability
      • Me answering a survey about my favorite beer
        • data evolvability

  16. File Based Approach - Outline
      • Definition
      • Development
      • Disadvantages

  17. File Based Approach: Definition
      A file based system is
      • A collection of application programs that perform services for the end-users, such as the production of reports. Each program defines and manages its own data.
      Text p.7, Section 1.2.1

  18. File Based Approach: Development
      • Typically developed
        • bottom-up
        • to meet the needs of a small group of users
        • often on local departmental systems
      • Evolution may be limited by the initial design

  19. File Based Approach: Disadvantages
      • Separation and isolation of data
        • Hard to link data in several files - limiting → data sharability
      • Duplication of data
        • Waste and inconsistency - due to → data redundancy
      • Data dependence
        • Program and data structures are highly interdependent - limiting → data evolvability
      • Incompatible file formats
        • Between programs and programming languages - further limiting → data sharability
      • No standard for queries
        • You have to develop your own queries - to get → data availability
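
The duplication problem can be illustrated with two programs that each keep their own copy of the same data. This is a hypothetical sketch: the two "files" are plain dictionaries, and the student id, name, and addresses are invented.

```python
# Each application defines and manages its own file (hypothetical data).
registrar_file = {1001: {"name": "Pat Doe", "address": "12 Elm St"}}
residence_file = {1001: {"name": "Pat Doe", "address": "12 Elm St"}}


def registrar_change_address(student_id, new_address):
    # Only the registrar's copy is updated; the residence program
    # has no way of knowing that anything changed.
    registrar_file[student_id]["address"] = new_address


registrar_change_address(1001, "9 Oak Ave")

inconsistent = (
    registrar_file[1001]["address"] != residence_file[1001]["address"]
)
```

After one move the two copies disagree, and nothing in either program can say which one is correct.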

  20. Database Approach - Outline
      • Definitions
      • Advantages
      • Disadvantages

  21. Database Approach: Definitions
      A database is
      • A shared collection of logically related data, and a description of this data, designed to meet the information needs of an organization.
      Text p.14, Section 1.3.1
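
Both halves of the definition, the shared data and the description of that data, can be seen in a small sketch using Python's built-in `sqlite3` module. SQLite is chosen here only for convenience; the table, the column names, and the two application functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
conn.execute("INSERT INTO student VALUES (1001, 'Pat Doe', '12 Elm St')")

# The description of the data is stored in the database itself.
schema = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()


def registrar_change_address(student_id, new_address):
    # One application updates the single shared copy.
    conn.execute(
        "UPDATE student SET address = ? WHERE id = ?", (new_address, student_id)
    )


def residence_lookup(student_id):
    # A different application reads the same copy.
    row = conn.execute(
        "SELECT address FROM student WHERE id = ?", (student_id,)
    ).fetchone()
    return row[0] if row else None


registrar_change_address(1001, "9 Oak Ave")
current_address = residence_lookup(1001)
```

Because both functions work against one stored copy, the second application sees the change without any extra coordination.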

  22. Database Approach: Advantages
      • Data integrity
        • Ensuring the correctness, protection, and security of the data
      • Data sharability
        • Ensuring the ability to share data between applications and between users on a need-to-know basis
      • Data availability
        • Ensuring the ability to access the data when and where it is needed
      • Database evolvability
        • Ensuring that the database can be modified to meet changing needs
      • Avoiding redundancy
        • That occurs where multiple (often incompatible and inconsistently updated) copies of data are collected and used independently of one another

  23. Database Approach: Disadvantages
      • Complexity
        • Requires highly trained staff
        • Requires organizational infrastructure to handle costs, evolutionary planning, hardware and software support
      • Cost of (and dependence on) DBMS
        • Large high-performance DBMSs have very high costs
        • Large high-performance DBMSs are closed source
      • Cost of conversion
        • Interfacing with or ignoring legacy systems
      • Performance
        • Additions for new applications may slow down existing applications
      • High impact of failure
        • Moving from department-threatening to organization-threatening levels of risk