Chapter 10: File-System Interface

Chapter 10: File-System Interface • Chapter 10.1 • File Concept • Access Methods • Chapter 10.2 • Directory Structure • File-System Mounting • File Sharing • Protection

Storage Management • New block – File Systems, a.k.a. “Storage Management” • An operating system is often described as a program that manages processes, processors, memory, and storage. • That is, operating systems control and manage: • Processes (both user and system) • Processors (the CPUs) • Memory (primary, cache, …) and • Storage (data, programs, directories used for access, etc.)

Storage Management - more • Disk storage – primary medium for secondary, online storage. • Contains files – collections of related items defined by the file’s creator. • Files are normally grouped into directories for ease of use and reference. • Organized in a variety of structures. • Disk Access – • sometimes a character at a time; often a block at a time. • sometimes sequential; sometimes random. • Some file systems are dedicated; some are shared. • Some transfer data asynchronously; others synchronously. • Devices differ greatly in speed, depending on the parameters cited above. • This chapter: the File System Interface.

Objectives of this Chapter: • To explain the function of file systems • To describe the interfaces to file systems • To discuss file-system design tradeoffs, including access methods, file sharing, file locking, and directory structures • To explore file-system protection

File Concept • A File System consists of two parts: • Files – the actual storage of data on a medium • Stored on a sequential or some kind of direct-access storage device. • Directory Structure – organizes the information needed for access • Size, location, logical record length, block size, format, ownership, security, paths to files / directories, etc. • A file may be defined as a contiguous logical address space, which the operating system maps onto some kind of physical device. • Note: ‘logical’ does not mean ‘physical.’ • Almost all secondary-storage devices are non-volatile (data remains when power is removed) • Magnetic tapes • Magnetic disks • Optical disks (CDs / DVDs) • Jump drives • … and others

File Concept • To a user, a file is the smallest allocation of logical secondary storage. All data is written to a ‘file.’ • Data may be numeric, alphabetic, alphanumeric, or binary. • Can be free form (text) • Can be rigidly formatted – records. • Fixed length records; variable length records: Bright Lights application? • Generally, a file is a sequence of • bits, • bytes, • lines, or • records • … whose meaning is defined by the creator of the file and by how it is used. • “One man’s program is another man’s data.”

File Concept (continued) • Data files – many forms and structures • Differentiate between a file’s organization and how it may be accessed. • These are not the same. • Program files – • Source programs • Object files • May not be directly executable • May be understandable by a ‘linker.’ • Executable files • May be ready for the loader to bring into memory. • Much of the distinction between program files and data files comes down to how they are used!

File Attributes • Name – typically the only information kept in human-readable form • Usually independent of the process and system that created it. • Save for possible extensions or types, such as .doc or .ppt. • But names are often constrained by the operational environment. • Example: NIHP00. Each position often means something very important in a commercial (non-academic) environment: source program ‘N’; system code ‘IH’; subsystem ‘P’; programs within the subsystem 00, 01, … • Identifier – unique tag (number) that identifies the file within the file system • Type – needed for systems that support different types • .c, .java, .cpp, .exe, .dll, .dat, .wpd, .doc, .xls, .css, … • The system can bring up certain ‘processes’ to handle these files by type. • Location – pointer to the file’s location on the device • Size – current file size, generally in bytes or blocks (especially blocks). • Protection – controls who can read, write, or execute the file • Time, date, and user identification – data for protection, security, and usage monitoring • E.g., date last accessed; OPR; security.

File Operations • File is an “abstract data type.” • This means it has data which will be unique to its implementation (realization – how organized, and use – how accessed and processed), and • File operations that can be performed on the data – dependent upon how it is implemented. • Accessed sequentially, randomly, etc. • Let’s look at the six basic functions that can be performed on most files.

Typical File Operations • Create – • Need to allocate space • Add an entry in the disk directory; load data onto the storage device. • Write – • A system call supplies the name of the file and the data to be written. • A pointer marks the place where the next ‘item’ is to be written; the pointer is updated after each write. • Read – • Another system call specifies the file name and the location in memory where the data read is to be placed and, using a pointer, locates the data to be read. • The pointer is updated to point to the ‘next’ item to be read. • Pointer for read and write: called a ‘current-file-position pointer.’

Typical File Operations – more • Reposition within file – • Moves the file pointer to a specific position / record in the file. • Really, this is a file seek. • Delete – • Using the directory, release the file space for reuse; • Clear the directory entry referring to this file. • Truncate – often used in recreating a file… • Deletes the contents of the file but keeps the file’s attributes. • The one attribute that changes is the file length: • File length is reset to zero and the file space is released.
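
The six operations above can be tried out with Java’s standard library. This is only a sketch: the class name FileOperationsSketch, the temporary file, and the sample text are assumptions made for illustration, and RandomAccessFile is just one of several Java APIs that expose a file pointer.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileOperationsSketch {
    /** Runs create, write, reposition, read, truncate, delete; returns what the read saw. */
    static String demo() {
        try {
            // Create: a new, empty file gets a directory entry.
            Path path = Files.createTempFile("fileops", ".dat");
            String seen;
            try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
                // Write: the file pointer advances past every byte written.
                raf.write("HELLO WORLD".getBytes(StandardCharsets.US_ASCII));
                // Reposition (seek): move the file pointer to byte offset 6.
                raf.seek(6);
                // Read: bytes are copied into a buffer in memory; the pointer advances.
                byte[] buffer = new byte[5];
                raf.readFully(buffer);
                seen = new String(buffer, StandardCharsets.US_ASCII);
                // Truncate: the length becomes zero, but the file itself remains.
                raf.setLength(0);
            }
            // Delete: remove the directory entry and release the space.
            Files.delete(path);
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that the read starts wherever the seek left the pointer, not at the beginning of the file.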

Typical File Operations • Other operations include: • Append data to the end of a file • Rename a file • Copy a file • Other file utilities: get length of file; get attributes, etc. • Many OS utilities such as printing files, allocating space, … • Some systems implicitly open() a file at first reference; others require an explicit open() or fopen() call. • Some systems automatically close() files when the program terminates; others expect an explicit close(). • My take: always close your files. Keep things clean. • open() usually validates the desired mode (read, write, append, …), permissions, and more. • Then, open() typically returns a pointer to the entry in the open-file table.

File Operations – The Process Itself • In a multiprogramming environment, there is usually a per-process table (kept with the process control block, PCB) for each running process. • It typically contains the current file pointer for each file the process has opened. • Interestingly, there is often a system-wide open-file table too, which contains a list of the open files for all running processes. • Honeywell – UNISYS • PAT (peripheral allocation table) overflow…

Open File Tables • So there’s an entry in a per-process table and an entry in a system-wide table. • The system-wide table contains additional information, including an ‘open count.’ • When a file is opened for a process, an entry in the open-file table for that process points to the entry in the system-wide table. • The system-wide table also keeps track of who has the same file open, should more than a single process be accessing the file. • open() increases this count; close() decreases it. When the open count reaches zero, the file’s entry is removed from the system-wide table.
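
The open-count bookkeeping can be sketched in a few lines of Java. This is a toy model, not how any particular operating system implements its tables: the class OpenFileTableSketch and its methods are hypothetical, and a real system-wide entry would also hold disk location, access dates, and so on.

```java
import java.util.HashMap;
import java.util.Map;

class OpenFileTableSketch {
    // System-wide table: file name -> number of opens currently outstanding.
    private final Map<String, Integer> openCounts = new HashMap<>();

    /** Called when any process opens the file; returns the new open count. */
    int open(String fileName) {
        return openCounts.merge(fileName, 1, Integer::sum);
    }

    /** Called when a process closes the file; the entry is removed at count zero. */
    int close(String fileName) {
        Integer count = openCounts.get(fileName);
        if (count == null) {
            throw new IllegalStateException(fileName + " is not open");
        }
        if (count == 1) {
            openCounts.remove(fileName); // last close: drop the system-wide entry
            return 0;
        }
        openCounts.put(fileName, count - 1);
        return count - 1;
    }

    /** True while at least one process still has the file open. */
    boolean hasEntry(String fileName) {
        return openCounts.containsKey(fileName);
    }

    public static void main(String[] args) {
        OpenFileTableSketch table = new OpenFileTableSketch();
        table.open("payroll.dat");   // process A opens the file
        table.open("payroll.dat");   // process B opens the same file
        table.close("payroll.dat");  // A closes; the entry stays
        System.out.println(table.hasEntry("payroll.dat"));
        table.close("payroll.dat");  // B closes; the entry is removed
        System.out.println(table.hasEntry("payroll.dat"));
    }
}
```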

Open File Basic Information • Data needed to manage open files: • File pointer: pointer to the last read/write location, kept per process that has the file open • Note: this is needed on systems that do not include a file offset as part of the read() and write() operations. • It serves as the current-file-position pointer. • File-open count: counter of the number of times a file is open, so that its entry can be removed from the open-file table when the last process closes it • Disk location of the file: cached information needed to locate the file’s data on disk • Access rights: per-process access-mode information. • Each process opens a file in some access mode.

“Open File Locking” • Provided by some operating systems and file systems • Particularly useful for files that can be accessed by multiple applications at the same time. • Mediates access to a file • Shared locks – used for reading; several processes can hold one concurrently. • Exclusive locks – needed for writing. • Only one process at a time can get the exclusive lock. • Some OSs only provide for exclusive locking – which makes sense.

File Locking Example – Java API

    import java.io.*;
    import java.nio.channels.*;

    public class LockingExample {
        // The third argument of FileChannel.lock() is the "shared" flag.
        public static final boolean EXCLUSIVE = false;
        public static final boolean SHARED = true;

        public static void main(String[] args) throws IOException {
            FileLock sharedLock = null;
            FileLock exclusiveLock = null;
            RandomAccessFile raf = null;
            try {
                raf = new RandomAccessFile("file.txt", "rw");
                // get the channel for the file
                FileChannel ch = raf.getChannel();
                long half = raf.length() / 2;

                // lock the first half of the file - exclusive
                // (arguments: start position, size in bytes, shared flag)
                exclusiveLock = ch.lock(0, half, EXCLUSIVE);
                /** Now modify the data . . . Needs exclusive access! */
                // release the lock
                exclusiveLock.release();

                // lock the second half of the file - shared
                sharedLock = ch.lock(half, raf.length() - half, SHARED);
                /** Now read the data . . . */
                // release the lock
                sharedLock.release();
            } catch (IOException ioe) {
                System.err.println(ioe);
            } finally {
                // release any lock still held, then close the file
                if (exclusiveLock != null && exclusiveLock.isValid()) exclusiveLock.release();
                if (sharedLock != null && sharedLock.isValid()) sharedLock.release();
                if (raf != null) raf.close();
            }
        }
    }

File Types – Name, Extension • We mentioned several file types earlier; here are more samples. • A common approach for implementing file types is to include the type as part of the name: name.extension. • The file type tells the operating system which operations can be performed on the file. • E.g., .com, .exe, and .bat files can be executed: .com and .exe files are binary executables; a .bat file is ASCII text consisting of a series of commands to the operating system. • Certain applications expect the files sent to them to be of a certain type, as in .c, .java, or .doc.

File Types – Name, Extension - more • We are very familiar with file types, as we use them all the time. • “These” notes have the extension .ppt, for PowerPoint. • When I open this file by double-clicking on an icon or hot link representing the file, the specific application (PowerPoint) is automatically invoked. • Windows has default associations of file types to applications. • Some OSs don’t require an extension and take an extension only as a ‘hint.’
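
The idea of associating a file type with an application can be sketched as a simple lookup on the extension. The table below is invented for illustration (the class FileTypeSketch and the application descriptions are assumptions); real systems keep such associations in their own configuration stores.

```java
import java.util.Locale;
import java.util.Map;

class FileTypeSketch {
    // Hypothetical association table: extension -> kind of application to invoke.
    static final Map<String, String> ASSOCIATIONS = Map.of(
            "ppt", "presentation program",
            "doc", "word processor",
            "java", "Java compiler",
            "c", "C compiler");

    /** Returns the text after the last '.', lower-cased, or "" if there is no extension. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            return ""; // no dot, a leading dot only, or a trailing dot
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Looks up the application for a file name; unknown types fall back to a default. */
    static String applicationFor(String fileName) {
        return ASSOCIATIONS.getOrDefault(extensionOf(fileName), "no association");
    }

    public static void main(String[] args) {
        System.out.println(applicationFor("Chapter10.ppt"));
        System.out.println(applicationFor("README"));
    }
}
```

Treating the extension as a key into a table is what makes it only a ‘hint’: nothing here inspects the file’s actual contents.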

File Structure • File types indicate the internal structure of the file. • These have structures expected by the programs that process them. • Typically, there is information (often up front in the file) needed by the processing program to properly handle (load, process, display, etc.) the file in question. • It might include where the program is to be loaded, key words, the location of the first instruction, external symbols, and more. • For any file type the operating system supports, it needs some code to recognize and support that file type. • But new applications may require information structured in ways not supported by the operating system, and problems may occur (book). • This presents some interesting problems.

File Structure – not recognizable formats… • We may develop an application that creates a file type not compatible with the file types supported by the operating system. • So, what to do? • Some operating systems support a very limited set of structures and interpret files very simply as, say, a sequence of 8-bit bytes. • So, ‘something’ must interpret these. • The OS allows these, but does not support them directly. • Thus, each application must include code to interpret such an input file… • Can you think of any? They are all around us! • If you are a Java person, look at all the various I/O options available! They are all a bit different.

Internal File Structure • Most systems have well-defined block sizes • These usually depend on the organization of the disk: sector size or some derivative of track size. • We always read and write in blocks – physical records. • For a specific file, all blocks are usually of the same size, and each block holds some whole number of ‘logical records.’ • This number is called the ‘blocking factor’: BF = 100 means one hundred logical records per physical record (block). • Discuss: why do we read/write ‘blocks’ in lieu of logical records?

Internal File Structure • Some operating systems define files as simply streams of data bytes. • Here, each byte is individually addressable by its offset from the front (or end) of the file. • Logical record size = 1 byte. • But the system packs and unpacks these bytes into physical disk blocks of, say, 512 bytes per block. • So, • the length of a logical record (a read() operation), • the physical block size (determined by sector size or track length), and • the packing technique determine the number of logical records in a physical block (record).

Internal File Structure • Files, nonetheless, are considered a series of blocks (whatever their size), and all I/O functions (logical read() and write()) take place with blocks. • The ‘first’ read() or write() does not transfer just one logical record. • It typically transfers from inter-record gap to inter-record gap (IRG to IRG; or inter-block gap, IBG to IBG) or from sector boundary to sector boundary. - much more later • Example: a 512-byte sector contains five 100-character records… • Subsequent read() or write() operations (typically) just move a pointer to the next logical record in the block, which is held in the process’s address space. • Thus only the physical read of a block results in a physical disk access. • Naturally, there is likely some internal fragmentation in the last block allocated to a file. • Data in a file can be accessed in several, but restricted, ways, often dependent upon the file’s ‘organization.’
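
The blocking arithmetic above is easy to check in code. A minimal sketch, assuming fixed-length records that may not span blocks; the class and method names are made up for the example.

```java
class BlockingSketch {
    /** Blocking factor: how many whole logical records fit in one physical block. */
    static int blockingFactor(int blockSize, int recordSize) {
        if (recordSize <= 0 || recordSize > blockSize) {
            throw new IllegalArgumentException("record must fit in one block");
        }
        return blockSize / recordSize;
    }

    /** Bytes left over in each block once it holds as many records as will fit. */
    static int unusedBytesPerBlock(int blockSize, int recordSize) {
        blockingFactor(blockSize, recordSize); // reuse the argument check
        return blockSize % recordSize;
    }

    /** Number of blocks needed for a file of recordCount records (rounded up). */
    static long blocksNeeded(long recordCount, int blockSize, int recordSize) {
        int bf = blockingFactor(blockSize, recordSize);
        return (recordCount + bf - 1) / bf;
    }

    public static void main(String[] args) {
        // The slide's example: 512-byte sector, 100-character records.
        System.out.println(blockingFactor(512, 100));      // records per block
        System.out.println(unusedBytesPerBlock(512, 100)); // wasted bytes per block
        System.out.println(blocksNeeded(12, 512, 100));    // blocks for 12 records
    }
}
```

For the 512-byte sector this gives a blocking factor of 5 with 12 bytes unused per block, and a 12-record file needs 3 blocks, the last of which is only partly filled.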

Access Methods • Sequential Access of a sequential file organization is the simplest form. • Information is processed in order – one logical record after another. • Operations are typically some form of read() or write() • read next – reads a record and advances the file pointer. • write next – appends to the end of the file and advances the file pointer to the new end of file. • reset – some files can be reset to a certain position (often the beginning) • Others…
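
The read next / write next / reset operations can be modeled with a list and a current-position counter. This is an in-memory toy (the class SequentialFileSketch is hypothetical), meant only to show how the pointer moves, not how a tape or disk file is stored.

```java
import java.util.ArrayList;
import java.util.List;

class SequentialFileSketch {
    private final List<String> records = new ArrayList<>();
    private int cp = 0; // current position: index of the next record to read

    /** write next: append at the end of the file; the pointer moves to the new end. */
    void writeNext(String record) {
        records.add(record);
        cp = records.size();
    }

    /** read next: return the record at the pointer and advance; null at end of file. */
    String readNext() {
        return cp < records.size() ? records.get(cp++) : null;
    }

    /** reset: rewind to the first record. */
    void reset() {
        cp = 0;
    }

    public static void main(String[] args) {
        SequentialFileSketch file = new SequentialFileSketch();
        file.writeNext("record A");
        file.writeNext("record B");
        file.reset();
        for (String r = file.readNext(); r != null; r = file.readNext()) {
            System.out.println(r);
        }
    }
}
```

There is deliberately no way to jump to record n: reaching it means resetting and reading everything before it.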

Access Methods • Direct Access – organization. • File typically consists of fixed-length logical records. • File is viewed as numbered sequence of blocks (records) • Access may be random; sometimes sequential. • Given the need for a retrieval, a ‘key’ of some sort is developed for a logical record and from this a block address is computed and the block (containing the logical record) is read. • Blocks are stored according to some kind of key (like SSAN or Account Number, and others) and the computation of the disk address is often done by a variety of algorithms. • Typically we can read a block randomly – given its disk address.

Access Methods • There are many ways that direct access can be effected. • Some direct access approaches allow the programmer to compute a CCTTRR number; • Others require the application to compute a ‘relative record number’ starting with record 0, the first record in the file. • IBM uses VSAM – Virtual Storage Access Method. Terms: • ESDS – Entry Sequenced Data Set – for sequential files • KSDS – Key Sequenced Data Set – for indexed sequential files • A KSDS uses a primary key such as SSAN or account number; keys are then mapped to physical disk addresses. • RRDS – Relative Record Data Set. • Here we compute algorithmically a relative record number – an integer. • More later.
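
With fixed-length records, a relative record number turns into a byte offset by a single multiplication, which is what makes direct access possible. A minimal sketch using RandomAccessFile; the 16-byte record size, the class name, and the sample records are assumptions for the example, and this is not VSAM.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class RelativeRecordSketch {
    static final int RECORD_SIZE = 16; // assumed fixed logical record length in bytes

    /** Byte offset of a record, counting from record 0 at the front of the file. */
    static long offsetOf(long relativeRecordNumber) {
        return relativeRecordNumber * RECORD_SIZE;
    }

    /** Writes text into the given record slot, padded with blanks to the record size. */
    static void writeRecord(RandomAccessFile file, long rrn, String text) throws IOException {
        byte[] record = new byte[RECORD_SIZE];
        Arrays.fill(record, (byte) ' ');
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(data, 0, record, 0, Math.min(data.length, RECORD_SIZE));
        file.seek(offsetOf(rrn));
        file.write(record);
    }

    /** Reads one record directly, without touching the records before it. */
    static String readRecord(RandomAccessFile file, long rrn) throws IOException {
        byte[] record = new byte[RECORD_SIZE];
        file.seek(offsetOf(rrn));
        file.readFully(record);
        return new String(record, StandardCharsets.US_ASCII).trim();
    }

    /** Writes three records, then reads them back out of order. */
    static String demo() {
        try {
            Path path = Files.createTempFile("relative", ".dat");
            try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
                writeRecord(file, 0, "record zero");
                writeRecord(file, 1, "record one");
                writeRecord(file, 2, "record two");
                return readRecord(file, 2) + "|" + readRecord(file, 0);
            } finally {
                Files.delete(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```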

Sequential-access File • [Figure: a sequential-access file, as on a tape drive; cp = current position] • For sequential files, access is always sequential.

Simulation of Sequential Access on a Direct-access File • Some, but not all, direct-access file organizations permit sequential processing. • File organizations that permit both support random queries for retrievals as well as sequential processing for other requirements such as reports. • Indexed Sequential Files (ahead) support both random and sequential access. • Direct Access files normally support only random access. (more ahead)

Example of Index Organization and Random Access • This organization requires an index, which contains pointers to the various blocks. • Access requires a search of the index followed by the retrieval of a record from the file. • Logical records are contained within a block, and blocks are what is read and written. • So, when a block is read from disk, this is followed by a sequential read of the logical records within the block to see whether the desired record lies within it. • Typically each index entry holds a key (a primary key) and a block number; the key is the highest key in that block. • So we are not certain that the desired logical record is actually in the block until the block is retrieved and searched.

Example of Index Organization and Random Access • An indexed sequential file is sorted (ordered) on some index or primary key, like name or account number (the key must be unique). • Then an index of primary keys and disk addresses (kept in memory when the file is active) is used to locate a logical record. (Actually, a disk address points to a block where the desired logical record ‘may’ be.) • Multiple keys may be used to search the file for a desired record. • Example: the file must be ordered on a unique primary key, such as account number. But we may also retrieve on a unique or non-unique secondary key, such as name (non-unique) or phone number (unique).

Example of Index Organization and Random Access • For very large files, we may have levels of indexes (coarse index and fine index; or index sets, sequence sets, data sets (IBM)). • One or more of these indices may be kept in primary memory to reduce I/Os when attempting to access a record. • The indices are searched via a binary search; the retrieved block is searched sequentially for the desired logical record.
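
The two-step lookup (binary search of the index, then a sequential search of one block) can be sketched as follows. The data layout is an assumption for illustration: integer keys, a single-level index holding the highest key of each block, and blocks held in memory rather than read from disk.

```java
import java.util.Arrays;

class IndexedLookupSketch {
    /** Binary-searches the index; returns the first block whose highest key is >= key, or -1. */
    static int findBlock(int[] highestKeyPerBlock, int key) {
        int pos = Arrays.binarySearch(highestKeyPerBlock, key);
        if (pos < 0) {
            pos = -(pos + 1); // not an exact hit: convert to the insertion point
        }
        return pos < highestKeyPerBlock.length ? pos : -1;
    }

    /** Fetches the candidate block and scans it sequentially for the key. */
    static boolean contains(int[][] blocks, int[] highestKeyPerBlock, int key) {
        int block = findBlock(highestKeyPerBlock, key);
        if (block < 0) {
            return false; // key is larger than every key in the file
        }
        for (int candidate : blocks[block]) { // stands in for one physical block read
            if (candidate == key) {
                return true;
            }
        }
        return false; // the index pointed here, but the record is not in the block
    }

    public static void main(String[] args) {
        int[][] blocks = { {3, 8, 15}, {21, 30, 42}, {57, 60, 77} };
        int[] index = {15, 42, 77}; // highest key in each block
        System.out.println(contains(blocks, index, 30));
        System.out.println(contains(blocks, index, 31));
    }
}
```

The second lookup shows why the index alone cannot confirm a hit: key 31 maps to the same block as 30, and only the scan of that block reveals it is absent.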

Example of Relative Files • Relative Files are another kind of direct-access file, one not normally processed sequentially. • Because of the way the records in the file are placed, sequential access, though possible, makes little sense. • This is because we typically use a field within a logical record and, based on that field, computationally determine (usually ‘hash’) a relative record number – an integer – that gives the ‘relative’ displacement of the logical record from the beginning of the file. • In a Relative File, the key to an individual record is usually computed and is an integer, such as 3, 25, 65, 234, etc., unrelated to the order in which the record is added to the file. • Again, there are other direct-access file types besides indexed sequential and relative files.
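
One simple way to turn a key field into a relative record number is the division-remainder method. The slide does not say which hash is used, so this particular function, the class name, and the table size of 101 slots are assumptions chosen for illustration.

```java
class RelativeHashSketch {
    /** Maps a numeric key (e.g., an account number) to a slot in 0 .. slots-1. */
    static int relativeRecordNumber(long key, int slots) {
        if (slots <= 0) {
            throw new IllegalArgumentException("slots must be positive");
        }
        return (int) Math.floorMod(key, (long) slots);
    }

    public static void main(String[] args) {
        int slots = 101; // assumed number of record slots in the file
        System.out.println(relativeRecordNumber(123456789L, slots));
        System.out.println(relativeRecordNumber(202L, slots));
        System.out.println(relativeRecordNumber(303L, slots));
    }
}
```

Keys 202 and 303 land in the same slot, so a real relative file also needs some policy for such collisions. The slot numbers also bear no relation to the order in which records were added, which is why reading the file front to back yields records in no useful order.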

End of Chapter 10.1
