Overview of Data Processing Systems at St. Xavier's College

Chapter – 3 Introduction to Data Base Management System
DEPT. OF INFORMATION TECHNOLOGY, ST. Xavier's College

Data Processing System
• A data processing system takes raw data and, through computer automation, produces information that a set of application programs has validated. The information may include text, arithmetic calculations, formulas and other kinds of data, depending on the computer system.
• A data processing system is also called an automated data processing (ADP) unit or an electronic data processing (EDP) unit.

• Serial processing is a system in which only one step happens at a time (and so the steps go in a series).
• Batch processing is used when a lot of transactions affect a high percentage of master file records and the response is not needed immediately, often not until the end of the week or month. A good example in a large, national business is payroll processing, where nearly every master file record will be affected.
• The data is collected over a period of time, then input and verified by clerks (verified means that the data is input a second time by someone else and the computer compares the two inputs) and processed centrally. The transactions are entered in batches by keyboard and stored in transaction files. Each batch consists of thirty or so records and is given a batch control ID.
• The batches are then run through a validation process. To make sure the batches balance, a computed total is compared with a manually produced total. This helps to ensure that all data is entered without error or omission.
• The actual updating of master files only takes place after verification and validation are complete. This means batch processing is often run overnight, unattended. A new master file is produced as a result of a batch-processing run. The original master file is kept along with a previous version.
• After processing, the output is produced. It is usually printed media such as payslips or invoices, although this is changing with the advent of the web.
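The control-total check and the "new master file" rule can be illustrated with a minimal sketch. The names Transaction, validate_batch and run_batch, and the use of hours worked as the balanced field, are assumptions made for this example only; a real payroll system would differ.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    employee_id: int
    hours: float  # field used for the control total (assumed for this sketch)


def validate_batch(batch, manual_total):
    """Return True when the computed control total matches the manual total."""
    computed_total = sum(t.hours for t in batch)
    return abs(computed_total - manual_total) < 1e-9


def run_batch(master, batch, manual_total):
    """Apply a validated batch and return a NEW master file.

    The old master (a dict of employee_id -> accumulated hours) is not
    modified, so it can be kept as the previous generation.
    """
    if not validate_batch(batch, manual_total):
        raise ValueError("batch does not balance; master file not updated")
    new_master = dict(master)  # copy, so the original master survives
    for t in batch:
        new_master[t.employee_id] = new_master.get(t.employee_id, 0.0) + t.hours
    return new_master
```

The update is refused outright when the totals disagree, which mirrors the rule that master files are only updated once validation is complete.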

• Real-time processing: the waiting time from input to response is minimal. Such fast systems are used in critical applications, such as systems that control aircraft or the manufacture of sensitive or dangerous compounds.
• Online processing means that users enter information directly online. (Here "online" usually means online to a central processor rather than its modern connotation of the Internet, but it could mean both!) The information is validated and updated directly onto the master file. No new file is created in this case. There is therefore near-immediate input, processing and output. Imagine a cash dispenser transaction, or booking a holiday at a travel agent's or over the Internet. Compared with batch processing, the number of transactions will be few.
• Centralized processing is processing performed in one computer or in a cluster of coupled computers in a single location. Access to the computer is via "dumb terminals," which only send input and receive output, or "smart terminals," which add screen formatting. All data processing is performed in the central computer.

Distributed Processing
• Distributed processing is the distribution of applications and business logic across multiple processing platforms. It implies that processing will occur on more than one processor in order for a transaction to be completed.
• In other words, processing is distributed across two or more machines, and the processes are most likely not running at the same time, i.e. each process performs part of an application in a sequence.
• Often the data used in a distributed processing environment is also distributed across platforms.

Advantages of DBMS (Database Management Systems)
• A true DBMS offers several advantages over file processing. The principal advantages of a DBMS are the following:
• Flexibility: Because programs and data are independent, programs do not have to be modified when types of unrelated data are added to or deleted from the database, or when physical storage changes.
• Fast response to information requests: Because data are integrated into a single database, complex requests can be handled much more rapidly than if the data were located in separate, non-integrated files. In many businesses, faster response means better customer service.
• Multiple access: Database software allows data to be accessed in a variety of ways (such as through various key fields) and often by using several programming languages (both 3GL and nonprocedural 4GL programs).
• Lower user training costs: Users often find it easier to learn such systems, and training costs may be reduced. Also, the total time taken to process requests may be shorter, which would increase user productivity.
• Less storage: Theoretically, all occurrences of data items need be stored only once, thereby eliminating the storage of redundant data. System developers and database designers often use data normalization to minimize data redundancy.

FILE ORGANIZATION
• Concept: File organization is the methodology applied primarily to the logical arrangement of data in a file system (the data can itself be organized in a system of records with correlation between the fields/columns). It should not be confused with the physical storage of the file on some type of storage media. There are certain basic types of computer file, which can include files stored as blocks of data and streams of data, where the information streams out of the file while it is being read until the end of the file is encountered.
• We will look at two components of file organization here:
• The way the internal file structure is arranged, and
• The external file as it is presented to the O/S or program that calls it.
• Files are presented to the application as a stream of bytes followed by an EOF (end of file) condition. A program that uses a file needs to know the structure of the file and needs to interpret its contents.
• There are four methods of organizing files: sequential, relative, indexed-sequential, and direct or hashed access organization.

Sequential Organization
• A sequential file contains records organized in the order they were entered. The order of the records is fixed. The records are stored in physically contiguous blocks, and within each block the records are in sequence.
• Records in these files can only be read or written sequentially.
• Once stored in the file, a record cannot be made shorter, or longer, or deleted. However, the record can be updated if the length does not change. (This is done by replacing the records by creating a new file.) New records will always appear at the end of the file.
• If the order of the records in a file is not important, sequential organization will suffice, no matter how many records you may have. Sequential output is also useful for report printing or for the sequential reads which some programs prefer to do.

Relative Organization
• A relative record file contains records ordered by their relative key, that is, the record number that represents the record location relative to where the file begins. For example, the first record in the file has a relative record number of 1, the tenth record has a relative record number of 10, and so forth. The records can have fixed length or variable length.
• The record transmission modes allowed for relative files are sequential, random, or dynamic. When relative files are read or written sequentially, the sequence is that of the relative record number.
• In this file organization, the records of the file are stored one after another both physically and logically. That is, the record with sequence number 16 is located just after the 15th record.
• ADVANTAGES of RELATIVE FILES
• They are quite easy to process.
• If you know the key value of the record that you need to find, there is no need for a search and you can access the record almost instantaneously.
• DISADVANTAGE of RELATIVE FILES
• They can only be used in conjunction with consecutive numerical keys. This disadvantage (only numerical and consecutive values for the key value) is overcome with a completely different file structure, namely the INDEXED SEQUENTIAL FILE.
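The reason no search is needed can be seen in a small sketch for the fixed-length case: the byte position of a record follows directly from its relative record number. RECORD_LENGTH, write_record and read_record are names assumed for this illustration, and an in-memory io.BytesIO object stands in for a disk file.

```python
import io

RECORD_LENGTH = 16  # bytes per record; fixed length is assumed in this sketch


def write_record(f, record_number, text):
    """Store text as record `record_number` (numbering starts at 1)."""
    data = text.encode("ascii").ljust(RECORD_LENGTH)[:RECORD_LENGTH]
    f.seek((record_number - 1) * RECORD_LENGTH)  # offset computed from the key
    f.write(data)


def read_record(f, record_number):
    """Read record `record_number` directly, without scanning earlier records."""
    f.seek((record_number - 1) * RECORD_LENGTH)
    return f.read(RECORD_LENGTH).decode("ascii").rstrip()


f = io.BytesIO()
write_record(f, 1, "alpha")
write_record(f, 10, "kappa")
tenth = read_record(f, 10)
```

Writing record 10 before records 2 to 9 exist leaves a gap of unused space, which hints at why this organization suits consecutive numerical keys.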

Indexed-Sequential Organization
• Key searches are improved by this system too. The single-level indexing structure is the simplest one: the index is a file whose records are (key, pointer) pairs. The pointer is the position in the data file of the record with the given key. A subset of the records, evenly spaced along the data file, is indexed in order to mark intervals of data records.
• This is how a key search is performed: the search key is compared with the index keys to find the highest index key that does not come after the search key. A linear search is then performed from the record that this index key points to, until the search key is matched or until the record pointed to by the next index entry is reached. Although this sort of search requires double file access (index + data), the access time reduction is significant compared with sequential file searches.
• Primary Area: Contains file records stored by key or ID numbers.
• Overflow Area: Contains records that cannot be placed in the primary area.
• Index Area: Contains the keys of records and their locations on the disc.
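The key search described above can be sketched as follows. This is an in-memory illustration only: the data file is assumed to be a list of (key, value) records sorted by key, and build_sparse_index and indexed_search are hypothetical helper names. The index lookup uses binary search from Python's bisect module as a convenience.

```python
from bisect import bisect_right


def build_sparse_index(records, step):
    """Index every `step`-th record as a (key, position) pair."""
    return [(records[pos][0], pos) for pos in range(0, len(records), step)]


def indexed_search(records, index, search_key):
    """Find the record with `search_key`, or return None if it is absent."""
    index_keys = [key for key, _ in index]
    # Highest index entry whose key does not come after the search key.
    i = bisect_right(index_keys, search_key) - 1
    if i < 0:
        return None  # search key is smaller than every key in the file
    start = index[i][1]
    # Stop at the record pointed to by the next index entry (or end of file).
    end = index[i + 1][1] if i + 1 < len(index) else len(records)
    for pos in range(start, end):  # linear search within one interval
        if records[pos][0] == search_key:
            return records[pos]
    return None


records = [(k, f"rec{k}") for k in range(10, 110, 10)]  # keys 10, 20, ..., 100
index = build_sparse_index(records, step=4)             # keys 10, 50, 90
found = indexed_search(records, index, 70)
```

With a step of 4, at most four data records are examined per search, instead of up to ten in a plain sequential scan.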

Direct or Hashed Access
• With direct or hashed access, a portion of disk space is reserved and a “hashing” algorithm computes the record address. Additional storage space is therefore required for this kind of file.
• Records are placed randomly throughout the file and are accessed by addresses that specify their disc location. This type of file organization therefore requires disk storage rather than tape.
• It has excellent search and retrieval performance, but care must be taken to maintain the indexes. If the indexes become corrupt, what is left may as well go to the bit-bucket, so it is as well to have regular backups of this kind of file, just as for all valuable stored data!
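A minimal sketch of hashed access follows. The class HashedFile, the hashing rule (key modulo the number of reserved slots) and the collision strategy (linear probing to the next free slot) are all assumptions chosen for illustration; the text above does not prescribe a particular algorithm.

```python
class HashedFile:
    """In-memory stand-in for a hashed file with a fixed reserved area."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots  # the reserved space

    def _address(self, key):
        # Hashing algorithm (assumed): remainder of the integer key.
        return key % len(self.slots)

    def put(self, key, record):
        """Store the record and return the slot number it was placed in."""
        addr = self._address(key)
        for step in range(len(self.slots)):
            slot = (addr + step) % len(self.slots)
            if self.slots[slot] is None or self.slots[slot][0] == key:
                self.slots[slot] = (key, record)
                return slot
        raise RuntimeError("file is full")

    def get(self, key):
        """Return the record stored under key, or None if it is absent."""
        addr = self._address(key)
        for step in range(len(self.slots)):
            slot = (addr + step) % len(self.slots)
            if self.slots[slot] is None:
                return None  # an empty slot ends the probe sequence
            if self.slots[slot][0] == key:
                return self.slots[slot][1]
        return None
```

Keys 10 and 17 both hash to address 3 in a seven-slot file, so the second one is placed in the next free slot. This sketch does not support deletion, which would need extra care with probing.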

TYPES OF DATABASES
• Hierarchical Model
• The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. Data is held in a series of records, each of which has a set of field values attached to it. The model collects all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, with the individual records being the equivalent of rows. To create links between these record types, the hierarchical model uses Parent Child Relationships. These are a 1:N mapping between record types. This is done by using trees, which, like the set theory used in the relational model, are "borrowed" from maths.
• For example, an organization might store information about an employee, such as name, employee number, department and salary. The organization might also store information about an employee's children, such as name and date of birth. The employee and children data form a hierarchy, where the employee data represents the parent segment and the children data represents the child segment. If an employee has three children, then there would be three child segments associated with one employee segment. In a hierarchical database the parent-child relationship is one to many. This restricts a child segment to having only one parent segment.
• Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM's Information Management System (IMS) DBMS, through the 1970s.
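The employee and children example can be sketched as a small tree. The Segment class and all field values below are hypothetical and are not taken from IMS or any other product; the point is only that each child segment holds exactly one parent while a parent may hold many children.

```python
class Segment:
    """A data segment in a hierarchy: one parent, any number of children."""

    def __init__(self, data, parent=None):
        self.data = data
        self.parent = parent   # a single parent enforces the 1:N rule
        self.children = []

    def add_child(self, data):
        child = Segment(data, parent=self)
        self.children.append(child)
        return child


# Made-up sample data for illustration only.
employee = Segment({"name": "A. Rao", "employee_number": 101,
                    "department": "IT", "salary": 50000})
for child_name, born in [("Meera", "2010-04-02"),
                         ("Arjun", "2012-09-15"),
                         ("Tara", "2016-01-30")]:
    employee.add_child({"name": child_name, "date_of_birth": born})
```

Three child segments are now attached to one employee segment, and navigating from any child back to its employee follows the single parent link.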

Network Model
• The basic data modeling construct in the network model is the set construct. A set consists of an owner record type, a set name, and a member record type. A member record type can have that role in more than one set, hence the multiparent concept is supported. An owner record type can also be a member or owner in another set.
• The data model is a simple network, and link and intersection record types (called junction records by IDMS) may exist, as well as sets between them. Thus, the complete network of relationships is represented by several pairwise sets; in each set one record type is owner (at the tail of the network arrow) and one or more record types are members (at the head of the relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted.
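A rough sketch of the set construct follows, assuming a supplier/part example that is not in the text above. Record and SetOccurrence are hypothetical names; the sketch shows one link (junction) record acting as a member in two different sets, which is the multiparent idea.

```python
class Record:
    def __init__(self, record_type, data):
        self.record_type = record_type
        self.data = data


class SetOccurrence:
    """One occurrence of a named set: a single owner and many members."""

    def __init__(self, set_name, owner):
        self.set_name = set_name
        self.owner = owner
        self.members = []

    def connect(self, member):
        self.members.append(member)


supplier = Record("SUPPLIER", {"name": "Acme"})
part = Record("PART", {"name": "Bolt"})
supply = Record("SUPPLY", {"quantity": 500})  # link (junction) record

supplier_supply = SetOccurrence("SUPPLIER-SUPPLY", owner=supplier)
part_supply = SetOccurrence("PART-SUPPLY", owner=part)

# The same member record takes part in two sets, so it has two "parents".
supplier_supply.connect(supply)
part_supply.connect(supply)
```

Each set on its own is still a 1:M relationship; the many-to-many link between suppliers and parts comes from combining the two pairwise sets.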

Relational Model
• (RDBMS - relational database management system) A database based on the relational model developed by E.F. Codd. A relational database allows the definition of data structures, storage and retrieval operations and integrity constraints. In such a database the data and the relations between them are organised in tables. A table is a collection of records, and each record in a table contains the same fields.
• Properties of Relational Tables:
• Values Are Atomic
• Each Row is Unique
• Column Values Are of the Same Kind
• The Sequence of Columns is Insignificant
• The Sequence of Rows is Insignificant
• Each Column Has a Unique Name
• Certain fields may be designated as keys, which means that searches for specific values of that field will use indexing to speed them up. Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching values in those fields. Often, but not always, the fields will have the same name in both tables.
• For example, an "orders" table might contain (customer-ID, product-code) pairs and a "products" table might contain (product-code, price) pairs. To calculate a given customer's bill, you would sum the prices of all products ordered by that customer by joining on the product-code fields of the two tables. This can be extended to joining multiple tables on multiple fields.
• Because these relationships are only specified at retrieval time, relational databases are classed as dynamic database management systems. The RELATIONAL database model is based on the Relational Algebra.
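The orders/products example can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column names customer_id and product_code (underscores instead of hyphens), the sample rows, and the helper customer_bill are all made up for illustration.

```python
import sqlite3


def customer_bill(conn, customer_id):
    """Sum the prices of all products ordered by one customer."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON o.product_code = p.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price INTEGER);
    CREATE TABLE orders (customer_id INTEGER, product_code TEXT);
    INSERT INTO products VALUES ('P1', 250), ('P2', 400), ('P3', 100);
    INSERT INTO orders VALUES (1, 'P1'), (1, 'P3'), (2, 'P2');
""")
bill = customer_bill(conn, 1)
```

The relationship between the two tables appears only in the JOIN clause of the query, not in the table definitions, which is what "specified at retrieval time" refers to.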

Object/Relational Model
• Object/relational database management systems (ORDBMSs) add new object storage capabilities to the relational systems at the core of modern information systems. These new facilities integrate management of traditional fielded data, complex objects such as time-series and geospatial data, and diverse binary media such as audio, video, images, and applets. By encapsulating methods with data structures, an ORDBMS server can execute complex analytical and data-manipulation operations to search and transform multimedia and other complex objects.
• As an evolutionary technology, the object/relational (OR) approach has inherited the robust transaction- and performance-management features of its relational ancestor and the flexibility of its object-oriented cousin. Database designers can work with familiar tabular structures and data definition languages (DDLs) while assimilating new object-management possibilities.
• Query and procedural languages and call interfaces in ORDBMSs are familiar: SQL3, vendor procedural languages, and ODBC, JDBC, and proprietary call interfaces are all extensions of RDBMS languages and interfaces. And the leading vendors are, of course, quite well known: IBM, Informix, and Oracle.
