File Management Chapter 12
Files and File systems • From the user’s point of view, the file system is one of the most important parts of the OS. • The file system provides the resource abstractions typically associated with secondary storage. • It permits users to create data collections, called files, with the following properties: • Long-term existence • Files are stored on disk or other secondary storage and do not disappear when a user logs off. • Sharable between processes • Files have names and can have associated access permissions that permit controlled sharing.
Files and File systems • Structure • A file can have an internal structure that is convenient for particular applications. Files can also be organized into a hierarchical structure. • A collection of functions can be performed on files: • Create • A new file is defined and positioned within the structure of files • Delete • A file is removed from the file structure and destroyed
Files and File systems • Open • An existing file is declared to be “opened” by a process, allowing the process to perform functions on the file • Close • The file is closed with respect to a process, so the process may no longer perform functions on the file • Read • A process reads all or a portion of the data in a file • Write • A process updates a file, either by adding new data or by changing existing values.
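The six functions above can be tried out with Python's built-in `open()` and the `os` module. This is a minimal sketch, not a description of any particular OS interface; the helper name `demo_file_operations` and the file name `example.txt` are made up for illustration.

```python
import os
import tempfile

def demo_file_operations():
    """Exercise create, open, write, read, close, and delete on one file."""
    directory = tempfile.mkdtemp()              # scratch directory for the demo
    path = os.path.join(directory, "example.txt")

    # Create + Open + Write: mode "w" creates the file and opens it for writing.
    # Leaving the "with" block performs the Close.
    with open(path, "w") as f:
        f.write("first line\n")

    # Open + Write again: mode "a" adds new data at the end of the file.
    with open(path, "a") as f:
        f.write("second line\n")

    # Open + Read: read all of the data in the file.
    with open(path) as f:
        contents = f.read()

    # Delete: remove the file from the file structure.
    os.remove(path)
    exists_after_delete = os.path.exists(path)

    os.rmdir(directory)                         # clean up the scratch directory
    return contents, exists_after_delete
```

Once the `with` block has closed the file, the process has to open it again before it can read or write.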
File structure • Terms commonly used when discussing files: • Field • Basic element of data • An individual field contains a single value, e.g. an employee’s name • It is characterized by its length and data type • Can be fixed or variable length, depending on the file design • Can contain subfields
File structure • Record • Collection of related fields • Can be treated as a unit by some application program • E.g.: an employee record has fields such as name, social security number, date hired, etc. • Can be fixed or variable length • File • A collection of similar records • Treated as a single entity by users and applications and may be referenced by name • May be created and deleted • Access control usually applies at the file level
File structure • Database • Collection of related data. • The essential aspects of a database are that the relationships among elements of data are explicit and that the database is designed for use by a number of different applications. • May contain all of the information related to an organization. • Consists of one or more types of files
File structure • Operations that must be supported when using files: • Retrieve_All • Retrieve all the records of a file. • Required for an application that must process all of the information in the file at one time. • E.g.: an application that produces a summary of the information in the file • This operation is often equated with the term sequential processing, since all records are accessed in sequence.
File structure • Retrieve_One • Retrieve just a single record • E.g.: interactive, transaction-oriented applications need this operation. • Retrieve_Next • Retrieve the record that is “next” in some logical sequence to the most recently retrieved record. • E.g.: interactive applications such as filling in forms or performing a search.
File structure • Retrieve_Previous • Retrieve the record that is “previous” to the currently accessed record. • Insert_One • Insert a new record into the file. • Delete_One • Delete an existing record. • Update_One • Retrieve a record, update one or more of its fields, and rewrite the updated record back into the file.
File structure • Retrieve_Few • Retrieve a number of records. • The nature of the operations most commonly performed on a file influences the way the file is organized.
File Management Systems (FMS) • A set of system software that provides services to users and applications in the use of files. • Users and applications access files through the FMS. • Objectives: • To meet the data management needs and requirements of the user, which include storage of data and the ability to perform the required operations. • To guarantee, to the extent possible, that the data in the file are valid.
File Management Systems (FMS) • To optimize performance, both from the system point of view in terms of overall throughput and from the user’s point of view in terms of response time. • To provide I/O support for a variety of storage device types. • To minimize or eliminate the potential for lost or destroyed data • To provide a standardized set of I/O interface routines to user processes • To provide I/O support for multiple users.
File Management Systems (FMS) • For objective 1, meeting user requirements: • Requirements depend on the variety of applications and the environment in which the computer system will be used. • For an interactive, general-purpose system, the following constitute a minimal set of requirements: • Each user should be able to create, delete, read, write, and modify files. • Each user may have controlled access to other users’ files
File Management Systems (FMS) • Each user may control what types of accesses are allowed to the user’s files • Each user should be able to restructure the user’s files in a form appropriate to the problem. • Each user should be able to move data between files • Each user should be able to back up and recover the user’s files in case of damage • Each user should be able to access the user’s files by using symbolic names
File System Architecture • To understand file management, we need to look at how the software is organized. • Figure 12.1 shows the file system software architecture. • Lowest level: • Device drivers communicate directly with peripheral devices • A device driver is responsible for starting I/O operations on a device and processing the completion of an I/O request. • E.g.: disk and tape drives. • Part of the OS.
File System Architecture • Basic file system/physical I/O: • Primary interface with the environment outside of the computer system. • It deals with blocks of data that are exchanged with disk or tape • Concerned with the placement of those blocks on secondary storage • And with the buffering of those blocks in main memory • Part of the OS
File System Architecture • Basic I/O supervisor • Responsible for all file I/O initiation and termination • Maintains control structures that deal with device I/O, scheduling, and file status • Part of the OS
File System Architecture • Logical I/O • Enables users and applications to access records • Deals with file records. • Provides a general-purpose record I/O capability and maintains basic data about files. • Access method • The level closest to the user • Provides a standard interface between applications and the file systems and devices that hold the data • Different access methods reflect different file structures and different ways of accessing and processing the data
File Management Functions • Another way of viewing the functions of a file system is shown in Figure 12.2 • Users and application programs interact with the file system by means of commands for creating and deleting files and for performing operations on files. • Before performing any operation, the file system must identify and locate the selected file • This requires a directory that describes the location of all files, plus their attributes
File Management Functions • On a shared system, user access control is enforced • Only authorized users are allowed to access files. • The basic operations that a user or application may perform on a file are performed at the record level • Files are viewed as having some structure that organizes the records • E.g.: a sequential structure, with employee records stored alphabetically by last name • Thus, to translate user commands into specific file manipulation commands, the access method appropriate to this file structure must be employed.
File Management Functions • I/O is done on a block basis. • The records of a file must be blocked for output and unblocked after input. • To support block I/O of files: • Secondary storage must be managed • Files must be allocated to free blocks • Free storage must be tracked so that available blocks are known.
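Blocking and unblocking can be illustrated with a small sketch. It assumes fixed-length records and a fixed block size; the sizes and the function names `block_records` and `unblock_records` are hypothetical choices made for this example, not part of any real file system.

```python
BLOCK_SIZE = 64    # assumed block size in bytes
RECORD_SIZE = 16   # assumed fixed record length in bytes

def block_records(records):
    """Pack fixed-length records into fixed-size blocks for output."""
    per_block = BLOCK_SIZE // RECORD_SIZE
    blocks = []
    for i in range(0, len(records), per_block):
        chunk = b"".join(records[i:i + per_block])
        # Pad the last block so every block has the same size.
        blocks.append(chunk.ljust(BLOCK_SIZE, b"\x00"))
    return blocks

def unblock_records(blocks, count):
    """Recover the first `count` records from blocks read as input."""
    records = []
    for block in blocks:
        for offset in range(0, BLOCK_SIZE - RECORD_SIZE + 1, RECORD_SIZE):
            if len(records) < count:
                records.append(block[offset:offset + RECORD_SIZE])
    return records
```

With these sizes, four records fit in a block, so five records occupy two blocks and the second block is mostly padding.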
File Organization and access • File organization refers to the logical structuring of the records as determined by the way in which they are accessed. • Criteria to consider when choosing a file organization: • Short access time • Ease of update • Economy of storage • Simple maintenance • Reliability
File Organization and access (continued) • Focus on five organizations: • The pile • The sequential file • The indexed sequential file • The indexed file • The direct/hashed file
The pile • Least complicated form of file organization • Data are collected in the order in which they arrive • Each record consists of one burst of data • Purpose: simply to accumulate the mass of data and save it • Records may have different fields, or similar fields in different orders • Each field should be self-describing, including a field name as well as a value • The length of each field must be implicitly indicated by delimiters • There is no structure to the pile file, so record access is by exhaustive search.
The pile (continued) • E.g.: to find a record that contains a particular field with a particular value, it is necessary to examine each record in the pile until the desired record is found or the entire file has been searched. • Pile files are encountered when data are collected and stored prior to processing, or when data are not easy to organize • Uses space well when the stored data vary in size and structure • Perfectly adequate for exhaustive searches and easy to update • Unsuitable for most applications.
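The exhaustive search over a pile can be sketched as follows. The sketch assumes each record is held as a Python dict of self-describing fields (field name mapped to value); the function name `pile_search` and the sample field names in the usage note are hypothetical.

```python
def pile_search(pile, field, value):
    """Examine each record in arrival order until a match is found.

    Returns the first record whose `field` equals `value`, or None if
    the whole pile has been searched without a match.
    """
    for record in pile:
        # Records need not share the same fields, so a missing field
        # simply counts as "no match" for this record.
        if record.get(field) == value:
            return record
    return None
```

For example, in a pile `[{"name": "Lee"}, {"dept": "IT", "name": "Kim"}]`, a search for `dept == "IT"` has to examine both records. The cost grows linearly with the size of the pile.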
The sequential file • Most common form of file structure • A fixed format is used for records • All records are of the same length, consisting of the same number of fixed-length fields in a particular order • The first field in each record is referred to as the key field • The key field uniquely identifies the record • Usually used for batch applications • Easily stored on tape or disk • Poor performance for interactive applications that involve queries
The sequential file (continued) • New records are placed in a log file or transaction file • A batch update is performed to merge the log file with the master file
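The batch update can be sketched as a merge of two key-ordered sequences. The sketch assumes records are `(key, data)` tuples, that the master file is already in key order, and that the log file holds new records in arrival order; `batch_update` is a hypothetical name. It uses `heapq.merge` from the Python standard library.

```python
import heapq

def batch_update(master, log):
    """Merge new records from the log file into the key-ordered master file."""
    def key_of(record):
        return record[0]

    # The log holds records in arrival order, so sort it by key first.
    log_sorted = sorted(log, key=key_of)
    # Merging two key-ordered sequences yields a key-ordered result.
    return list(heapq.merge(master, log_sorted, key=key_of))
```

The result is a new master file in key sequence, after which the log file can be emptied.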
The indexed sequential file • Maintains the key characteristic of the sequential file: • Records are organized in sequence based on a key field. • Adds two features: • An index to the file to support random access • An overflow file • The index provides a lookup capability to quickly reach the vicinity of the desired record • The overflow file is similar to the log file used with a sequential file, but is integrated so that a record in the overflow file is located by following a pointer from its predecessor record.
The indexed sequential file (continued) • Comparison of sequential and indexed sequential • Example: a file contains 1 million records • On average, 500,000 accesses are required to find a record in a sequential file • If an index contains 1000 entries, each entry covers 1000 records of the main file: it takes on average 500 accesses to find the key in the index, followed by 500 accesses in the main file • The average drops from 500,000 to 1000 accesses
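The arithmetic in this comparison can be checked with a few lines of code. The sketch assumes, as the example does, that both the index and the main file are searched sequentially and that the index entries divide the file evenly; the function names are hypothetical.

```python
def avg_accesses_sequential(n_records):
    """Average accesses to find a record by scanning a sequential file."""
    return n_records / 2

def avg_accesses_indexed(n_records, n_index_entries):
    """Average accesses with a single-level index searched sequentially."""
    records_per_entry = n_records / n_index_entries
    index_search = n_index_entries / 2    # scan half the index on average
    file_search = records_per_entry / 2   # scan half of one section on average
    return index_search + file_search
```

For 1,000,000 records, the first function gives 500,000 accesses, and the second gives 500 + 500 = 1000 accesses with a 1000-entry index.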
Indexed File • Uses multiple indexes for different key fields • May contain an exhaustive index that contains one entry for every record in the main file • May contain a partial index that contains entries only for records where the field of interest exists • When a new record is added to the main file, all of the index files must be updated. • Used in applications where timeliness of information is critical, e.g. airline reservation systems and inventory control systems.
The Direct or Hashed File • Directly accesses a block at a known address • A key field is required for each record • Makes use of a hashing function on the key value • Often used when very rapid access is required, where fixed-length records are used, and where records are always accessed one at a time • E.g.: directories, pricing tables
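A hashed file can be sketched with an in-memory stand-in for disk blocks. The sketch assumes integer keys and a simple modulo hashing function; the bucket count, the class name `HashedFile`, and the collision handling (records with the same hash share a bucket) are illustrative assumptions, not a prescribed design.

```python
NUM_BUCKETS = 8   # assumed number of addressable blocks

def bucket_address(key):
    """Hashing function: map an integer key to a block address."""
    return key % NUM_BUCKETS

class HashedFile:
    def __init__(self):
        # Each bucket stands in for one block at a known address.
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key, record):
        self.buckets[bucket_address(key)].append((key, record))

    def retrieve(self, key):
        """Go straight to the block for this key, then scan only that block."""
        for stored_key, record in self.buckets[bucket_address(key)]:
            if stored_key == key:
                return record
        return None
```

Retrieving a record touches a single bucket regardless of how many records the file holds, which is why this organization suits one-at-a-time lookups.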
File Directories • Contains information about files • Attributes • Location • Ownership
Directory Structure • Directory itself is a file owned by the operating system • Provides mapping between file names and the files themselves
Simple Structure for a Directory • List of entries, one for each file • Sequential file with the name of the file serving as the key • Provides no help in organizing the files • Forces user to be careful not to use the same name for two different files
Types of operations on the Directory • Search • When a user references a file, the directory must be searched to find the entry corresponding to that file. • Create file • When a new file is created, an entry must be added to the directory. • Delete file • When a file is deleted, an entry must be removed from the directory. • List directory • All or a portion of the directory may be requested. • Update directory • A change in one of the file’s attributes requires a change in the corresponding directory entry.
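The five directory operations can be sketched on a simple, flat directory. In this sketch a Python dict keyed by file name stands in for the list of entries; the class name `Directory`, the method names, and the attribute names used in the usage note are hypothetical.

```python
class Directory:
    """A flat directory: one entry per file, keyed by file name."""

    def __init__(self):
        self.entries = {}   # file name -> dict of attributes

    def search(self, name):
        """Find the entry for a file, or None if there is no such file."""
        return self.entries.get(name)

    def create_file(self, name, **attributes):
        """Add an entry; names must be unique within the directory."""
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = dict(attributes)

    def delete_file(self, name):
        """Remove the entry for a deleted file."""
        del self.entries[name]

    def list_directory(self):
        """Return the names of all files in the directory."""
        return sorted(self.entries)

    def update(self, name, **changes):
        """Change one or more attributes in an existing entry."""
        self.entries[name].update(changes)
```

For example, `create_file("a.txt", owner="alice", location=12)` adds an entry, and a second `create_file("a.txt")` raises an error because a flat directory cannot hold two files with the same name.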
Two-level Scheme for a Directory • One directory for each user and a master directory • The master directory contains an entry for each user • Provides address and access control information • Each user directory is a simple list of files for that user • Still provides no help in structuring collections of files
Hierarchical, or Tree-Structured Directory • Master directory with user directories underneath it • Each user directory may have subdirectories and files as entries
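A tree-structured directory can be sketched with nested dicts, where a dict is a directory and any other value stands in for a file. The function name `resolve`, the "/" separator, and the sample names are assumptions made for this illustration.

```python
def resolve(master, path):
    """Follow a path such as "/alice/docs/notes.txt" down from the master directory.

    Returns the directory (a dict) or file found at the end of the path,
    or None if any component along the way does not exist.
    """
    node = master
    for part in [p for p in path.split("/") if p]:
        # Only directories can contain further entries.
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node
```

With `master = {"alice": {"docs": {"notes.txt": "file"}}, "bob": {}}`, the path `/alice/docs/notes.txt` passes through the master directory, the user directory `alice`, and the subdirectory `docs`. Two users can use the same file name because the full paths differ.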