490 likes | 757 Vues
Database. A database is a collection of related data example: 1. collection of names, home address and telephone numbers 2. collection of words to make paragraph in a page A database represents some aspect of the real world, sometimes called the miniworld. Database Management System.
E N D
Database • A database is a collection of related data • example: • 1. collection of names, home address and telephone numbers • 2. collection of words to make paragraph in a page • A database represents some aspect of the real world, sometimes called the miniworld .
Database Management System • Is a collection of programs that enables users to create and maintain a database. • The DBMS is a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications • Defining: a database involves specifying the data types, structures, and constraints for the data to be stored in the database.
DBMS ….. • Constructing: the database is the process of storing the data itself on some storage medium that is controlled by the DBMS. • Manipulating: a database includes such functions as querying the database to retrieve specific data, updating the database to reflect changes in the miniworld, and generating reports from the data. • Database and DBMS software together a database system
Example • UNIVERSITY database for maintaining information concerning students, courses, and grades in a university environment • define: file (records), data elements, data type ( for each data element) • construct: store data in the appropriate files (note that records may be related between files) • Manipulation: querying, updating • informal queries and updates must be specified precisely in the database system language before they can be processed.
Purpose of Database Systems Drawbacks of using file systems to store data: Data redundancy and inconsistency Multiple file formats, duplication of information in different files Difficulty in accessing data Need to write a new program to carry out each new task Data isolation — multiple files and formats Integrity problems Integrity constraints (e.g. account balance > 0) become “buried” in program code rather than being stated explicitly Hard to add new constraints or change existing ones
Purpose of Database Systems • Drawbacks of using file systems • Atomicity of updates • Failures may leave database in an inconsistent state with partial updates carried out • Example: Transfer of funds from one account to another should either complete or not happen at all • Concurrent access by multiple users • Concurrent accessed needed for performance • Uncontrolled concurrent accesses can lead to inconsistencies • Example: Two people reading a balance and updating it at the same time • Security problems • Hard to provide user access to some, but not all, data • Database systems offer solutions to all the above problems
Characteristics • Self-describing nature of a database system: A DBMS catalog stores the description of the database. The description is called meta-data). This allows the DBMS software to work with different databases. • Insulation between programs and data: Called program-data independence. Allows changing data storage structures and operations without having to change the DBMS access programs. • Data Abstraction: A data model is used to hide storage details and present the users with a conceptual view of the database. • Support of multiple views of the data: Each user may see a different view of the database, which describes only the data of interest to that user.
DBMS components • Hardware The physical computer system that allows physical access to data • Software The actual program that allows users to access, maintain, and update physical data • Data: stored physically on the storage devices • User: include end users and application programs • Procedure A set of procedures (rules) that should be clearly defined and followed by the users of the database
Database Actors • Database Administrators • In a database environment, the primary resource is the database itself and the secondary resource is the DBMS and related software • authorizing access to the database • coordinating and monitoring its use • acquiring software and hardware resources as needed • Database Designers • identifying the data to be stored in the database • choosing appropriate structures to represent and store this data undertaken before the database is actually implemented and populated with data
Database Actors ….. • communicate with all prospective database users, in order to understand their requirements • develop a view of the database that meets the data and processing requirements for each group of users • These views are then analyzed and integrated with the views of other user groups. The final database design must be capable of supporting the requirements of all user groups • End Users • access to the database for querying, updating, and generating reports • Casual end users: • occasionally access the database • need different information each time • learn only a few facilities that they may use repeatedly.
Database Actors ….. • Naive or parametric end users • constantly querying and updating the database, using standard types of queries and updates called canned transactions that have been carefully programmed and tested • Eg :Bank tellers check account balances and post withdrawals and deposits • Reservation clerks for airlines, hotels, and car rental companies check availability for a given request and make reservations
Database Actors ….. • Sophisticated end users • Engineers, scientists, business analysts, and others who thoroughly familiarize themselves with the facilities of the DBMS so as to implement their applications to meet their complex requirements • Stand-alone users • Maintain personal databases by using ready-made program packages that provide easy-to-use menu- or graphics-based interfaces. • An example is the user of a tax package that stores a variety of personal financial data for tax purposes
Database Actors ….. • System Analysts and Application Programmers • Determine the requirements of end users, especially naive and parametric end users, and develop specifications for canned transactions that meet these requirements • Application programmers implement these specifications as programs; then they test, debug, document, and maintain these canned transactions • Workers behind the Scene • Do not use the database for their own purposes • DBMS system designers and implementers • design and implement the DBMS modules (for implementing the catalog, query language, interface processors, data access, concurrency control, recovery, and security. ) and interfaces as a software package
Database Actors ….. • Tool developers • Tools are optional packages that are often purchased separately • include packages for database design, performance monitoring, natural language or graphical interfaces, prototyping, simulation, and test data generation. • Operators and maintenance personnel • system administration personnel who are responsible for the actual running and maintenance of the hardware and software environment for the database system
Advantages of Using DBMS 1. Controlling Redundancy • Problems: • there is the need to perform a single logical update (duplication of effort) • storage space is wasted when the same data is stored repeatedly files that represent the same data may become inconsistent(typos) • in DBMS there has the ability to control redundancy which improve the performance of the query. • the DBMS should have the capability to control this redundancy so as to prohibit inconsistencies among the files
2. Restricting Unauthorized Access users or user groups are given account numbers protected by passwords, which they can use to gain access to the database through security and authorization subsystem. 3. Providing Persistent Storage for Program Objects and Data Structures Programming languages have complex data structures, such as record types in PASCAL or class definitions in C++. The values of program variables are discarded once a program terminates, unless the programmer explicitly stores them in permanent files, which often involves converting these complex structures into a format suitable for file storage When the need arises to read this data once more, the programmer must convert from the file format to the program variable structure.
4. Providing Multiple User Interfaces • DBMS provides a variety of user interfaces:query languages for casual users; programming language interfaces for application programmers; forms and command codes for parametric users; and menu-driven interfaces and natural language interfaces for stand-alone users. • Capabilities for providing World Wide Web access to a database 5. Representing Complex Relationships Among Data: Variety of data interrelated in many ways Enforcing Integrity Constraints • The simplest type involves specifying a data type for each data item • A more complex type involves specifying that a record in one file must be related to records in other files. • Another type specifies uniqueness on data item values • It is the database designers’ responsibility to identify integrity constraints during database design. Providing Backup and Recovery
DBA(Database Administrator) A DBA is a full time manager responsible for maintaining DBMS and ensuring accuracy and the integrity of data. What data should go into the database • What relationships should exist between different data items • Who has permission to read the database information (allocate passwords) • Who has the authority to update the database. • Be involved in the design of the database and make changes if necessary • Keep users informed of any changes • Maintain the data dictionary • Provide training to users
Brief History Early Database Applications: The Hierarchical and Network Models were introduced in mid 1960’s and dominated during the seventies. A bulk of the worldwide database processing still occurs using these models. Relational Model based Systems: The model that was originally introduced in 1970 was heavily researched and experimented with in IBM and the universities. Relational DBMS Products emerged in the 1980’s. Object-oriented applications: OODBMSs were introduced in late 1980’s and early 1990’s to cater to the need of complex data processing in CAD and other applications. Their use has not taken off much. Data on the Web and E-commerce Applications: Web contains data in HTML (Hypertext markup language) with links among pages. This has given rise to a new set of applications and E-commerce is using new standards like XML (eXtended Markup Language)
Data Models Database System Concepts and Architecture • Data Model is a collection of concepts that can be used to describe the structure of a data. • The structure of data is the data types, relationships, and constraints • Data Model may include a set of basic operations (update and retrieve) • Object oriented Data Model can include set of user defined operations within the Data Model [Dynamic behavior of the DB applications]
Categories of Data Models • Many Data Models have been proposed and according to the types of data structure they can be categorized to these three categories: • High-level (Conceptual Data Model): it is very close to how users perceive data. It uses concepts as: • Entities: real world objects as students, courses, projects, etc. • Attributes: properties of the entity as names, scores, Tel, etc. • Relationships: relations among some entities to represent the interaction among them as enrolled-in relation between student and courses entities.
Categories of Data Models • Low-level (Physical Data Model): it provides concepts of how data is stored in computer. • It is used by computer specialists • It is interested in record formats, record orderings and access path to mange the data efficiently. • Representation (Implementation) Data Model: it is a mid point between the previous two data models. • It includes concepts that can be understood by end user but not far from some details of the physical representations. • It hides some physical details but can be directly implemented. • It is used by most of the traditional DBMSs.
Categories of Data Models • Representational Data Models include the widely-used “Relational Data Model” and the legacy Network and Hierarchical Data Models. • It represents data using record structure “record-based Data Model”. • Object oriented Data Models is a higher-level implementation data model and are closer to the conceptual data model.
Hierarchical database model In the hierarchical model, data is organized as an inverted tree. Each entity has only one parent but can have several children. At the top of the hierarchy, there is one entity, which is called the root. An example of the hierarchical model representing a university
Network database model In the network model, the entities are organized in a graph, in which some entities can be accessed through several paths (Figure 14.4). An example of the network model representing a university
Relational database model In the relational model, data is organized in two-dimensional tables called relations. The tables or relations are, however, related to each other, as we will see shortly. An example of the relational model representing a university
Schemas The description of a database is called the DB Schema. • It is specified during the DB design • It is not expected to be changed frequently • It is usually represented by diagrams (Schema diagram) • Each object of the schema is called a schema construct • It specifies some aspect (names of records and data items) and data type and relationships are not specified. • The schema is called the “intension”. • The stored description of the schema is called the “meta-data”
Schema Diagram for the University DB Name StudentNumber Class Major STUDENT COURSE PREREQUISITE SECTION GRADE_RPEORT CourseName CourseNumber CreditHours Department CourseNumber PrerequisiteNumber SectionIdentifier CourseNumber Semester Year Instructor StudentNumber SectionIdentifier Grade
The data in the DB at a particular moment in time is called a Database State or Snapshot or the Current Set of Occurrence or Instances • Database state • The data in the database at a particular moment in time is called a database state • The distinction between database schema and database state is very important • When we define a new database, we specify its database schema only to the DBMS • the corresponding database state is the empty state with no data • initial state of the database when the database is first loaded • From then on, every time an update operation is applied to the database, we get another database state. • Valid State: a state that satisfies the structure and constrains specified in the schema. • The database schema changes very infrequently. • The database state changes every time the database is updated
DBS Architecture and Data Independence The Three-Schema Architecture was proposed to achieve the DBMS characteristics (program-data independence, Multiusers capability , self-description nature of data). • The Internal Level: • Has an internal schema which describes the physical storage of the DB. • Internal schema has a Physical Data Model(describe detail data storage & access paths for database).
DBS Architecture and Data Independence • The conceptual Level: • has a conceptual schema which describes the structure of the DB • hides the details of the physical storage structure. • describes the entities, data types, relationships, constraints and user operations. • high-level conceptual data model or implementation data model can be used at this level. • The External or View Level: • has a number of external schemas or user views. • provides each group of users by a view which describes a part of the DB which is interested to that group & hide rest database to that user. • also high-level conceptual data model or implementation data model can be used at this level.
Three-Schema architecture • Tool for the user to understand and visualize the schema levels of the DB system. • Most of the DBMSs do not separate the three levels completely. • Some DBMSs may include physical details in the conceptual schema. • Most of the DBMSs include external views in the conceptual data model. • DBMSs perform mappings to transform requests and results between the levels. • The processes of transforming requests and results between levels are called mapping. • Mappings between levels is a time-consuming process, so some DBMSs do not support the external views. • Database server only accessible via middle tier Clients cannot directly access database server • The presentation layer displays information to the user. • The intermediate layer or middle layer is sometimes called the application server or Web server • Three-tier Architecture Can Enhance Security:
End Users External Level External View External View Conceptual Level Conceptual Schema Internal Level Internal Schema DB DB DB The Three-Schema Architecture External/Conceptual mapping Conceptual / Internal mapping
Data Independence It is the capacity to change the schema at one level of a database without having to change the schema at the next level. The mapping between the levels is only changed to reflect the change of the schema. • Data dictionary / repository: • Used to store schema descriptions and other information such as design decisions, application program descriptions, user information, usage standards, etc.
There are two type of Data Independence which are Logical data independence and Physical data independence. • Logical data independence: • is the ability of changing the conceptual Schema without having to change the external schema or the application programs. Physical data independence: is the ability of changing the internal Schema without having to change the conceptual schema. Internal schema may be changed due to some file reorganization to meet certain system performance.
DBMS Languages • DDL (data definition language) • It is used by the DBA and the DB designers to define their schemas. • DDL compiler is exist to process the DDL statements and store the schema into the DBMS catalog. • SDL (storage definition language) • It is used in some DBMSs to define the internal schema. • VDL (view definition language) • It is used in some true three-level schema system to define the external schema. • DML (data manipulation language) • It is used in retrieving, inserting, deleting and manipulating data. • Current DBMSs have one comprehensive language contains DDL, VDL and DML. SDL is usually kept separate.
DBMS Languages • SQL (Structured Query Language) is a relational DB language includes a combination of DDL, VDL and DML and also statements for schema updating. • There two types of DMLs: • Nonprocedural DML (high-level)DML) • It is used in constructing a complex DB operations concisely. • It can be entered interactively from terminal or embedded in a general-purpose language. • It can specify and retrieve many records in a single statement (set-at-a-time or set oriented). • If it is used in a stand-alone interactive manner, it is called Query language. • Procedural DML (low-level DML) • It is embedded in a general-purpose-language. • It retrieves individual records and processes each separately (record-at-a time) In both low and high level languages the general-purpose language is called the host language and DML is called sublanguage.
DBMS Interfaces • Menu-based interfaces: • They present the user with lists of menus to formulate request. • The menus generate the request and send it to the DBMS. • Form-based interfaces: • They display forms to users to help them in entering new data or retrieving required data. • DBMS will retrieve the matching data for the remaining entries. • Forms specification language are special languages that help programmers to specify such forms • Graphical user interfaces: • They display the schema in diagrammatic form where users can specify their queries by manipulating these diagrams.
Natural Language interface Refers to words in its schema to interprete the request. If interpretation success,interface generates a high level query . Parametric Users Small set of abbrevation command is included to minimizing the number of key stokes.eg bank teller. Interface for the DBA Commands that can be used only by the DBA’s staff. Eg creating account,granting account authorization etc.
Database System Environment DBMS is a complex software which includes many modules that support all user and system DB functions. • OS controls the disk access functions • DDL compiler performs schema definitions and loads the data description information into the DB catalog. • Run-time DB processor handles the DB accesses at run time.
The Database System Environment • Query Compiler handles high-level queries by parsing, analyzing and compiling their statements. Then it calls the Run-time processor to execute it. • Pre-compiler extracts the DML commands from the host language. • DML compiler compiles the DML to build the code which is passed to the Run-time processor for execution.
DB systems Utilities • Loading data stored in files into a database. Includes data conversion tools. • Backing up the database periodically on tape. • Reorganizing database file structures. • Report generation utilities. • Performance monitoring utilities. • Other functions, such as sorting, user monitoring, data compression, etc
DB systems Utilities Backup: it is the process of creating a backup copy (dump or incremental) in case of failure. File reorganization: it is used to reorganize the database files to improve the performance. Performance monitoring: it is used in monitoring the usage of the database to provide the DBA by statistics to help in the performance analysis studies.
Tools supporting DB users • CASE tools: are used in the design phase of the DB. There are also tools to manage the dictionary. • Application development environment: are used in developing DB applications (DB design, GUI development, querying and updating, and application programs development . • Communication software: are used in providing DB access to users at locations remote from the DB.
Several criteria which are used to classify DBMSs are : • Data Model i) Relational data model ii) Object data model iii) Hierarchical data model iv) Network data model 2. Number of users i) Single user ii) Multi user 3.Number of sites i) Centralized ii) Distributed iii) Homogenous iv) Heterogenous 4. Cost 5. Types of access path : General or Special Purpose