
File Management


sumi

Presentation Transcript


  1. File Management • File: • Unit of related information (uniquely) named and accessed by users • Smallest unit of user storage • OS maps file to physical secondary storage device • Typically nonvolatile disk • Can be text or binary, free form or formatted • Can be organized as bits/bytes/lines or records • Can be code, data, or possibly links, devices, directories, etc. File Management Chapter 12

  2. File System Requirements • Persistence • Not terminated by process termination • Not volatile • Remains until owner deletes it • Storage of LARGE amounts of information • Controlled concurrent access by users • Shareable • Access control • Structure (internal and between files)

  3. File operations • Creating/destroying • Allocating/deallocating storage • Placing/removing from tables • Allocating/deallocating file control block • Reading/writing/executing • Maintaining cursor • Protection • Repositioning pointer (seek) • Appending/truncating files • Renaming • Open – return ptr; close
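
The operations listed above can be tried from a user program. The following is a minimal sketch using Python's standard library; the function name `demo_file_operations` and the file names are made up for the example, and the comments describe what an OS typically does rather than any particular implementation.

```python
import os
import tempfile

def demo_file_operations() -> bytes:
    """Create, write, seek, append, truncate, rename, and delete a scratch file."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "demo.dat")

    # Create and write: storage and a directory entry are allocated.
    with open(path, "wb") as f:
        f.write(b"0123456789")

    # Reposition the cursor (seek), then read from that position.
    with open(path, "rb") as f:
        f.seek(4)
        middle = f.read(3)  # bytes at offsets 4, 5, 6

    # Append to the end, then truncate the file back to 12 bytes.
    with open(path, "ab") as f:
        f.write(b"ABCDEF")
    os.truncate(path, 12)

    # Rename changes the directory entry, not the stored data.
    renamed = os.path.join(workdir, "renamed.dat")
    os.rename(path, renamed)
    with open(renamed, "rb") as f:
        final = f.read()

    # Destroy: remove the directory entry so the storage can be released.
    os.remove(renamed)
    os.rmdir(workdir)
    return middle + b"|" + final

result = demo_file_operations()
```

Each `open` call returns a handle that carries its own cursor, which corresponds to the "Open – return ptr" item on the slide.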

  4. Sample file attributes/properties/metadata • File name • Identifier (for OS) • Location and/or relative path in file system • Current Size • Dates of creation, modification, access • Protection • Owner ID. • File type and/or application association (in Windows- different in UNIX) • Count of users • OS allows multiple users of single file • OS handles locks and consistent usage • Access rights to file (to be verified) • Lock flags if multiple users are allowed • FLAGS: ASCII/binary; hidden; read-only; system/normal; archive; temp
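
Several of these attributes can be read back with `os.stat`. The sketch below is only illustrative: the helper name `describe` and the dictionary keys are chosen for this example, and some fields (such as the owner ID) carry less meaning on Windows than on UNIX-like systems.

```python
import os
import stat
import tempfile
import time

def describe(path: str) -> dict:
    """Collect a few of the attributes the OS keeps for a file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "identifier": info.st_ino,          # inode number on UNIX-like systems
        "size": info.st_size,               # current size in bytes
        "modified": time.ctime(info.st_mtime),
        "accessed": time.ctime(info.st_atime),
        "owner_id": info.st_uid,
        "link_count": info.st_nlink,
        "protection": stat.filemode(info.st_mode),
        "is_directory": stat.S_ISDIR(info.st_mode),
    }

# Build a small scratch file and inspect it.
with tempfile.NamedTemporaryFile(suffix=".txt") as tmp:
    tmp.write(b"hello")
    tmp.flush()
    attrs = describe(tmp.name)
```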

  5. Typical File types • Executable (.exe, .com, within bin library) • Object (.obj, .o) • Source code (.java, .c, .cc, .cxx, .asm, .ada) • Batch – to OS (.bat, shell scripts) • Text (.txt) • Word processor (.doc, .wp, .tex) • Compressed (.zip, .tar) • Multimedia (.mpeg) • Image (.gif) • Meaning to OS varies in different systems • Increasing numbers of different (incompatible) formats

  6. Linux Virtual File System • Provides uniform interface with multiple implementations

  7. Format compatibility • Of course, newer versions are almost always backward compatible • Discuss with class why software providers may not be interested in format compatibility in general • More and more issues of applications and different types of formats

  8. Organizing files • Disks are partitioned (organized as volumes) • A disk has at least one partition • Dual boot contains two partitions, each with separate OS; at boot time user decides which partition is active • Multiple partitions are possible • GUIs make partitioning user friendly • System maps logical drive to physical drive • Each partition has its own directory structure

  9. Directories • Each partition has a directory • For each file – keep name, size, date; physical location • Hierarchical directories • MULTICS • Two-level (user id at top level) • Tree structured – relative and absolute path names • Delete directory • UNIX rm -r * .o (comment on space) • Windows delete key (are you sure) • Acyclic-graph directories • Links (hard and soft) • Deletion in UNIX and Windows leaves soft link as invalid access • Hard links may be implemented with reference count for deletion or for creating a copy for the current users

  10. Question • What are the issues in deleting a file that has hard links to it? • The hard links (addresses) are alive. • If the storage is deallocated and reallocated to a new file, users will have access to that storage • If we do not deallocate the storage, will the owner still be charged for storage that he does not use? • When can we safely deallocate storage? • Maintain reference count of links to the file • Issue of circular links • Give user his own copy of the file
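
The reference-count answer can be observed on a UNIX-like system, where the link count is exposed as `st_nlink`. This is a sketch that assumes the temporary directory lives on a file system supporting hard links; the file names and the function name `hard_link_demo` are made up for the example.

```python
import os
import tempfile

def hard_link_demo() -> tuple:
    """Show that storage survives until the last hard link is removed."""
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "report.txt")
    alias = os.path.join(workdir, "alias.txt")
    with open(original, "w") as f:
        f.write("shared data")

    os.link(original, alias)                  # second name for the same file
    count_with_two_names = os.stat(original).st_nlink

    os.remove(original)                       # removes one name only
    count_after_delete = os.stat(alias).st_nlink
    with open(alias) as f:
        surviving = f.read()                  # data is still allocated

    os.remove(alias)                          # count reaches zero: storage can be freed
    os.rmdir(workdir)
    return count_with_two_names, count_after_delete, surviving

counts = hard_link_demo()
```

Storage is deallocated only when the count drops to zero, which is why deleting one name does not leave the other link dangling.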

  11. File system mounting • Windows 95 and later versions automatically locate and mount attached file systems (on each device) at boot time • Go to top level on Windows Explorer • If device is added, Windows reconfigures automatically • Error can occur if too many compressed file systems are already mounted, or that file system is already mounted, or you attempt to mount over nonempty file system • UNIX mount command is typically placed into system configuration file that is executed at boot time; others must be coded in • usr/sbin/mount [-d] [-r|-u|-w] [-o option, ...] [-t [no]type] file-system

  12. Mounting a disk • At mount time, OS verifies format • Places partition (logical disk) in tables, etc. • In Windows, each partition is given a logical name (ex: l:) • ScanDisk • If system was shut down “abruptly,” at boot time OS checks that files have been closed, checks metadata consistency

  13. File Sharing • Group permissions • Remote systems • Anonymous ftp sites • World Wide Web (files transferred via ftp) • Distributed file systems (DFS) • Remote file system can be mounted as if it were on a local drive • File stored on “server” or workstation • Sharing – Networks and Sharing Center in Windows 7 • User authentication • Extranets, Intranets • Consistency semantics more complex for remote systems • A shared file may be declared read only (immutable)

  14. Protection from improper access • Access rights • Files (r, w, x), append, delete, list • read/write/read & execute/modify/full control • directory access rights (also list, delete) • Access control • owner, group, world • Finer grained rights?? • Access matrices/capability rights
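
On UNIX the owner/group/world rights are stored as nine bits in the file mode. The sketch below tests one right at a time; the function name `access_allowed` and the sample mode are chosen for illustration, and real access checks also involve the caller's user and group IDs, which are left out here.

```python
import stat

# Map (class of user, right) to the corresponding UNIX mode bit.
MODE_BITS = {
    ("owner", "r"): stat.S_IRUSR, ("owner", "w"): stat.S_IWUSR, ("owner", "x"): stat.S_IXUSR,
    ("group", "r"): stat.S_IRGRP, ("group", "w"): stat.S_IWGRP, ("group", "x"): stat.S_IXGRP,
    ("world", "r"): stat.S_IROTH, ("world", "w"): stat.S_IWOTH, ("world", "x"): stat.S_IXOTH,
}

def access_allowed(mode: int, who: str, right: str) -> bool:
    """Return True if the mode grants `right` to `who`."""
    return bool(mode & MODE_BITS[(who, right)])

mode = 0o100644                      # regular file, owner rw, group r, world r
listing = stat.filemode(mode)        # the string form shown by `ls -l`
owner_can_write = access_allowed(mode, "owner", "w")
group_can_write = access_allowed(mode, "group", "w")
```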

  15. Question • What features in a file interface promote user friendliness? • Uniformity • Browser interface; GUI • Capabilities • Directories for organizing files • Links for navigating directories and sharing • Access methods, such as direct or indexed access • Transparency and abstraction • Virtual file system • SANs • Location transparency in general • Protection • authentication, authorization (access rights)

  16. File System Implementation • File system is located in secondary storage • Typically located on disks • Advantages of Size, Cost, Direct Access, Speed • Nonvolatile • More robust than flash memory • “Virtual disks” - temporary storage in RAM disks, memory mapped files, caches

  17. Storage Allocation • Files are mapped to disk locations • Minimal disk storage allocation is by sector (perhaps 512 bytes) • Operating System may allocate • Variable contiguous portions • Faster access; file allocation table is small • Not flexible (what if we have to increase file size) • Fixed small portions (a few sectors) – we’ll call these blocks • Easier allocation and deallocation as file changes size • Non-contiguous allocation • Larger allocation tables • Blocks may be grouped into clusters, chunks

  18. Some data structures utilized during file management • File control block • List of open files • File Allocation Table (where file is stored) • Process Control block • Contains pointers to files that the process has opened • Directory table

  19. Issues with file mapping to disk • If file is mapped to blocks on disks, internal fragmentation occurs for last part of last block • Mapping logical record to physical block • Packing logical records into a physical block • Numbers may be packed as well to reduce redundancies • OS must locate and unpack record in physical block • OS also supports user compression/decompression • Size of block (larger block gains faster I/O and smaller table size but fragmentation is greater)
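
The block-size trade-off can be made concrete with a little arithmetic. The sketch below uses a hypothetical 10,000-byte file; the function name `blocks_and_waste` is made up for the example.

```python
def blocks_and_waste(file_size: int, block_size: int) -> tuple:
    """Return (blocks needed, bytes wasted in the last block)."""
    blocks = -(-file_size // block_size)      # ceiling division
    wasted = blocks * block_size - file_size  # internal fragmentation
    return blocks, wasted

small_blocks = blocks_and_waste(10_000, 512)    # many table entries, little waste
large_blocks = blocks_and_waste(10_000, 4096)   # few table entries, more waste
```

With 512-byte blocks the file needs 20 table entries and wastes 240 bytes; with 4096-byte blocks it needs only 3 entries but wastes 2288 bytes.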

  20. File Storage • File addresses are mapped to (typically) disk addresses • Translation to base address + offset • Unit of disk allocation is a sector • Unit of file allocation is a block (or cluster) • Tables maintained of each file system • Ex: open file table (compare to ready list) • Table maintained for each file (FCB)
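
As a rough illustration of "translation to base address + offset", the sketch below assumes a contiguously stored file whose first block number is known. The function name `locate` and the sample numbers are assumptions for the example, not part of any real file system API.

```python
def locate(byte_offset: int, block_size: int, start_block: int) -> tuple:
    """Map a byte offset within a file to (disk block, offset inside that block)."""
    logical_block, offset_in_block = divmod(byte_offset, block_size)
    return start_block + logical_block, offset_in_block

# Byte 10,000 of a file that starts at disk block 200, with 4 KB blocks.
position = locate(10_000, 4096, 200)
```

Here byte 10,000 falls in the third block of the file (disk block 202), 1808 bytes into that block.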

  21. Magnetic Disks • Direct access media • Read/write/rewrite/ possibly append • Disk divided into sectors • Smallest unit of allocation • Perhaps 512 bytes; 2048 bytes for optical disks • OS groups sectors into blocks or larger storage units • Issues of allocation of blocks • Contiguous/linked/indexed • Speed versus flexibility

  22. New disk sector size recommended • 27 March 2006 • By Chris Mellor, Techworld • An industry committee has recommended increasing the disk block sector size from 512 bytes to 4096. IDEMA, the International Disk Drive, Equipment, and Materials Association, says the 30-year standard of 512B should be increased eightfold to 4KB. It will mean disk drives can hold more data and hosts access it faster. • IDEMA Long Block Committee member Dr. Martin Hassner of Hitachi GST said: "(The) increasing areal density of newer magnetic hard disk drives requires a more robust error correction code (ECC), and this can be more efficiently applied to 4096 byte sector lengths." • That is because, with the increasing areal density of disks, a bad area corrupts more bits. Having error-correction work over longer bit-strings makes more data recoverable.

  23. http://www.tomshardware.com/reviews/wd-4k-sector,2554.html • “Western Digital … has launched a new product line, the EARS-series, and moved 4KB sectors into the mainstream. The reason for this change is an increase in net storage capacity due to decreased amounts of ECC information resulting from the larger sector size.” 2010 • (that means that blocks will not be smaller than 4KB) • fsutil fsinfo ntfsinfo c: (command prompt as administrator) • wmic partition get BlockSize, StartingOffset, Name, Index

  24. File-System architecture • Application programs/ OS kernel • Logical file system • Maintains file control block (FCB, inode)- metadata • System file name, user file name, owner, access rights, time, size • File-Organization module • Maps logical block to physical unit; maintains the free-block list • Basic file system • Issues commands to device drivers • Read/write to location (drive, block) • I/O control • Device drivers and interrupt handlers • Devices

  25. Support of file system is typically part of kernel • Performance improvements from kernel support • During boot time, system checks all mounted devices • OS maintains partition table (table of mounted devices) • Table of recent accesses • Ptrs to partition table for physical locations • Open-File table • Ptrs to FCBs, counters for multiple users, cursor for current position, etc.

  26. Creation of file • User or application program calls OS • OS creates FCB • Information about owner, times, physical location, file name, etc. • Places it in system tables • Reads directory into memory if necessary • Updates directory • Storage is allocated for FCB, file

  27. “Raw” Disk • Standard disk formatting is not always appropriate • Disk may be formatted for specific use • Swap disk (no file system) • Databases (may provide their own structure for data) • RAID systems need additional formatting information • Boot disks have additional format information • Partitioned disks

  28. Directories • Path names • Absolute/ relative • Always use absolute names if you do not know what your working directory will be for a call • Operations • Create, delete, list (requires opening), rename • Link, assign access rights

  29. Directories • Entry storage and retrieval may be by: • Linear or structured list • Sequential (search, add, delete) • Sorted perhaps according to B-tree • Linked • Cache provides efficiency for most recently used data • Hash table • Direct access • Possibility of collisions • Directory size is limited by hash function

  30. File storage on disk • Contiguous • Advantage: Simple, fast retrieval, file table is minimal • Any block can be directly accessed following computation • Supports direct & sequential access • Used for ROM disks, swap space • How to handle appending data to file • Swap in/swap out if there is no room for file to expand, but there is room elsewhere on disk • Preallocation of extra blocks will allow file size to increase • May overestimate needs • Holes accumulate as files are deleted (external fragmentation) • Requires disk compaction (defragmentation)

  31. File storage allocation using chunks • Large contiguous chunks are allocated • First contiguous block set is allocated- more (extent) allocated on demand • Size of file is stored • Each extent linked to next extent • Used for DVDs/ Universal Disk Format • UDF limits the size of an extent to about 1GB, so a large file on a DVD requires 3-4 extents

  32. File storage – linked allocation • Blocks (small chunks) chained together • Blocks allocated on demand • No external fragmentation • Each block contains pointer to next block (or null ptr) • Slows down access, especially direct access • Sequential search along the chain is needed to achieve “direct access” • Note # of disk accesses • Pointers require storage (4-8 bytes) • Group the blocks into clusters for improved performance (more internal fragmentation) • Reliability issue if pointers are corrupted
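
To "note the number of disk accesses", the sketch below models a disk as a Python dictionary from block number to a (data, next block) pair and counts one access per block read. The block numbers, data values, and the function name `read_logical_block` are invented for the example.

```python
def read_logical_block(disk: dict, first_block: int, n: int) -> tuple:
    """Return (data, disk_reads) for logical block n of a linked file."""
    current = first_block
    reads = 0
    for _ in range(n):
        _, next_block = disk[current]   # must read a block to learn its successor
        reads += 1
        if next_block is None:
            raise IndexError("file has fewer blocks than requested")
        current = next_block
    data, _ = disk[current]
    reads += 1
    return data, reads

# A four-block file scattered over the disk: 7 -> 30 -> 12 -> 51.
disk = {7: ("A", 30), 30: ("B", 12), 12: ("C", 51), 51: ("D", None)}
first = read_logical_block(disk, 7, 0)
last = read_logical_block(disk, 7, 3)
```

Reaching the first block costs one access, while reaching the fourth costs four, so access time grows linearly with the position in the file.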

  33. Indexed Allocation with FAT • File-allocation table (FAT) • Directory table and FAT are stored at beginning of disk/partition; copied to memory upon first disk access • FAT is searched to find free blocks – entry of 0 • Directory table contains link into FAT for each file • FAT entries link file blocks • Supports direct access • Does not scale well- FAT and directory table are limited in size • For file stored at blocks 4, 88, 95, 10:
      Index (implicit):    4    10    88    95
      Link (cell value):   88   eof   95    10
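
The example file stored at blocks 4, 88, 95, 10 can be traced through a small in-memory table. This is a simplified sketch, not the on-disk FAT format: the markers `FREE` and `EOF`, the table size of 100, and the function names are assumptions made for the example.

```python
FREE = 0     # the slide's convention: an entry of 0 marks a free block
EOF = -1     # hypothetical end-of-file marker for this sketch

def file_blocks(fat: list, first_block: int) -> list:
    """Follow the FAT links from the block named in the directory entry."""
    chain = []
    block = first_block
    while block != EOF:
        if len(chain) > len(fat):
            raise ValueError("corrupted FAT: chain contains a cycle")
        chain.append(block)
        block = fat[block]
    return chain

def first_free(fat: list) -> int:
    """Search the table for a free block (block 0 is reserved in this sketch)."""
    for block in range(1, len(fat)):
        if fat[block] == FREE:
            return block
    return -1

fat = [FREE] * 100
fat[4], fat[88], fat[95], fat[10] = 88, 95, 10, EOF
chain = file_blocks(fat, 4)
free_block = first_free(fat)
```

Because the whole table is held in memory, following the links costs no disk accesses, which is how a FAT can support direct access despite its linked structure.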

  34. File storage with indices & links • Directory with entry for each file • Each file has individual index block that is brought into memory when file is opened • Large index block requires more memory space • Small index block cannot index large file • UNIX inode • File attributes are stored (size, time, ID, etc.) • 12 ptrs to direct blocks • 3 ptrs to 1, 2, 3 levels of indirection • Compromise between fast access for small files, # of levels of indirection (slower access) for larger files

  35. I-node • From D. A. Rusling

  36. Question • How many bytes would an i-node be able to access directly, indirectly, double indirectly, triple indirectly if each i-node contained 13 direct block pointers and 1 pointer each for indirect, double indirect, and triple indirect, each pointer was 4 bytes, and each block was 64 KB? • Each block holds 64 KB / 4 B = 16K pointers • Direct – 13 × 64 KB + • Indirect – 16K × 64 KB + • Double indirect – 16K × 16K × 64 KB + • Triple indirect – 16K × 16K × 16K × 64 KB • Total = almost 3 × 10^17 bytes
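
The arithmetic in this answer can be checked directly. The sketch below simply multiplies out the terms; the function name `inode_capacity` and the dictionary keys are chosen for the example.

```python
def inode_capacity(direct_ptrs: int, block_size: int, ptr_size: int) -> dict:
    """Bytes reachable through each kind of pointer in an i-node."""
    ptrs_per_block = block_size // ptr_size
    sizes = {
        "direct": direct_ptrs * block_size,
        "single": ptrs_per_block * block_size,
        "double": ptrs_per_block ** 2 * block_size,
        "triple": ptrs_per_block ** 3 * block_size,
    }
    sizes["total"] = sum(sizes.values())
    return sizes

# 13 direct pointers, 64 KB blocks, 4-byte pointers (16K pointers per block).
capacity = inode_capacity(13, 64 * 1024, 4)
```

The triple-indirect term is 2^58 bytes (about 2.88 × 10^17) and dominates the total, which is why the answer is "almost" 3 × 10^17 bytes.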

  37. Question • Storage allocation for files may be contiguous, linked, or indexed • Give reasons why each method might or might not be used • Contiguous • Efficient; Good for swap disk • Bad for dynamically increasing file size • Linked • Good for comparatively low cost of pointers • Bad for direct access, which requires a sequential walk through the blocks • Indexed • Good for flexibility of size allocated, (almost) direct access; bad because a possibly large index must be maintained in memory

  38. Free Space Management • DOS File Allocation Table – searched • Bit vector • Shift for first free block (0) • Bit vector is maintained in memory (consider size) • Free Portions are chained together • Pointer between blocks (units) • Multiple I/O accesses for multiple blocks • What if link is corrupted? • Grouping – store n ptrs in first free block
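
A bit-vector free list can be sketched with a Python integer standing in for the bitmap. Following the slide, a 0 bit means the block is free; the function names and the 8-block example disk are assumptions for illustration.

```python
def first_free_block(bitmap: int, total_blocks: int) -> int:
    """Shift through the bitmap until a 0 (free) bit is found."""
    for block in range(total_blocks):
        if ((bitmap >> block) & 1) == 0:
            return block
    return -1  # no free block

def allocate(bitmap: int, block: int) -> int:
    return bitmap | (1 << block)       # set the bit: block in use

def release(bitmap: int, block: int) -> int:
    return bitmap & ~(1 << block)      # clear the bit: block free

def bitmap_bytes(disk_bytes: int, block_size: int) -> int:
    """Memory needed for the bit vector: one bit per block."""
    return disk_bytes // block_size // 8

bitmap = 0b00010111                    # blocks 0, 1, 2, 4 in use
before = first_free_block(bitmap, 8)
bitmap = allocate(bitmap, before)
after = first_free_block(bitmap, 8)
vector_size = bitmap_bytes(2 ** 40, 4096)
```

For the "consider size" remark: a 1 TB disk with 4 KB blocks has 2^28 blocks, so the bit vector occupies 32 MB of memory.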

  39. Performance • Performance varies for different types of files and storage methods • Contiguous storage is best for direct access • Linked is best for expanding files • Indexed is good for small files • Can combine methods (e.g., i-nodes) • Changing disk types can hurt hashing performance • Increasing cluster/block size to maximum DMA transfer size can improve performance for sequential access by cutting down on the number of I/Os

  40. Improvements for Efficiency • Vary sizes of clusters to reduce internal fragmentation – larger clusters can be allocated if file grows • Keep i-node (FCB) near file to reduce seek time • Size of pointer to be considered (large ptrs needed if they must link to entire memory but this requires more storage) – perhaps increase size of clusters rather than size of ptrs

  41. Improvements for efficiency (cont.) • Use caches for indexes, directories, recent data • Maintain indexes, directories, data in memory • RAM disks, memory mapped I/O • Interleave storage on multiple disks (striping) • Asynchronous writes • Output is buffered so process can progress • Buffered reads and read-ahead • Anticipatory fetching

  42. Fault Tolerance • Ability of system to continue to function (perhaps in degraded mode) if a fault is activated • Chief mechanism- Redundant Components and Data • Duplication of data (file system backups, etc.) • RAID (mirrored, duplexed, etc) • Store in another location (propagation delay becomes a factor) • Determine frequency of back up • File may have bits indicating updates/ times • Maintain consistency • Consistency check – typically each block has checksum or ICV or ECC at end • Crash or corruption – logs, checkpoints and roll backs • Appropriate for network systems as well • Fast way to recover data, but we need to repeat all transactions since last checkpoint

  43. Fault tolerance for hardware failures – providing reliability and availability • Corruption of disk management information (metadata) or data • Maintain backup for reliability • Provide on-line redundancy for availability (RAID) • Use ECCs, logs, etc. for detecting failures • For remote systems, failure semantics must be defined • Ex: Crash or shutdown of server • Does client delay operations or close and try to recover from all operations? • System maintains state information, including authentication and checkpoints • System may ignore failures

  44. File System consistency • Scandisk and fsck check blocks of each file • Each i-node is used to form a list of blocks in use • Keep track of number of blocks in use • Compare to free block list • All blocks should appear exactly once in either table • Handle inconsistencies • If file block appears in both lists, remove from free list • If block appears in multiple i-nodes (block list contains a value of 2 or more for block), copy block and allocate to separate users, report problem to users • Directories are similarly checked
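
The block check described above can be sketched with two counters, one built from the i-nodes and one from the free list. This is a toy model, not how Scandisk or fsck is implemented: the function name `check_blocks`, the report keys, and the sample i-nodes are all invented for the example.

```python
from collections import Counter

def check_blocks(inodes: dict, free_list: list, total_blocks: int) -> dict:
    """Report blocks that do not appear exactly once in either table."""
    in_use = Counter(b for blocks in inodes.values() for b in blocks)
    free = Counter(free_list)
    report = {"missing": [], "used_and_free": [], "shared": []}
    for block in range(total_blocks):
        if in_use[block] == 0 and free[block] == 0:
            report["missing"].append(block)        # in neither table
        if in_use[block] >= 1 and free[block] >= 1:
            report["used_and_free"].append(block)  # fix: remove from free list
        if in_use[block] >= 2:
            report["shared"].append(block)         # fix: copy block for each file
    return report

# Block 2 belongs to two files, block 3 is both used and free, block 6 is lost.
inodes = {"a": [0, 1, 2], "b": [2, 3]}
report = check_blocks(inodes, free_list=[3, 4, 5], total_blocks=7)
```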

  45. File System Security • Access Matrix • Rows of users; columns of files, objects • Each user’s access rights to resource is defined • Access Matrices are sparse • Decompose by row or column

  46. Access Matrix • One such rule set is an Access Matrix

  47. Access Control Lists • Access Control Lists decompose access matrix by columns • Example:
      alpha.fdu.edu> getacl index.html
      #
      # file: index.html
      # owner: levine
      # group: faculty
      #
      user::rw-
      group::r--
      other::r--

  48. Capability Lists • Decomposition by rows yields capability lists (or ticket) • specifies authorized objects and operations for a user.
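
The two decompositions can be sketched by storing only the non-empty cells of a sparse access matrix and regrouping them by column or by row. The user `levine` and file `index.html` come from the slides; the user `student`, the file `grades.db`, and the function names are hypothetical.

```python
from collections import defaultdict

# Sparse access matrix: only non-empty (user, object) cells are stored.
matrix = {
    ("levine", "index.html"): {"r", "w"},
    ("student", "index.html"): {"r"},
    ("levine", "grades.db"): {"r", "w"},
}

def access_control_lists(matrix: dict) -> dict:
    """Decompose by column: for each object, who may do what."""
    acls = defaultdict(dict)
    for (user, obj), rights in matrix.items():
        acls[obj][user] = rights
    return dict(acls)

def capability_lists(matrix: dict) -> dict:
    """Decompose by row: for each user, which objects and operations."""
    caps = defaultdict(dict)
    for (user, obj), rights in matrix.items():
        caps[user][obj] = rights
    return dict(caps)

acls = access_control_lists(matrix)
caps = capability_lists(matrix)
```

An ACL is stored with the object and answers "who can use this file?", while a capability list is held for the user and answers "what can this user touch?".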

  49. NTFS • NTFS is the Windows NT equivalent of the Windows 95 file allocation table (FAT) and the OS/2 High Performance File System (HPFS). • NTFS offers a number of improvements over FAT and HPFS in terms of performance, extendibility, and security. References: (Book): MCSE Accelerated Windows 2000 Study Guide Exam 70-240 (Website): http://www.ntfs.com Adapted from presentation by Jenny Villa-Dominguez

  50. NTFS improvements • Microsoft created NTFS to improve on some FAT features, such as: • Increased fault tolerance • Enhanced security • File compression • Fragmentation
