
Jeff's Filesystem Papers Review Part II.


Presentation Transcript


  1. Jeff's Filesystem Papers Review Part II. Review of "The Design and Implementation of a Log-Structured File System"

  2. The Design and Implementation of a Log-Structured File System
  • By Mendel Rosenblum and John K. Ousterhout, UC Berkeley
  • Ousterhout introduced the idea in an earlier paper with a few different configurations; this paper describes the concept as it exists after they implemented it as Sprite LFS, a new file system for the Sprite operating system.
  • Some empirical research was done after the implementation, and the measurements bear out the claim that LFS is a good idea.
  • This presentation is an academic review; the ideas presented are either quotes or paraphrases of the reviewed paper.

  3. Intro
  • Why? (problem statement)
    • CPUs are getting faster
    • Memory is getting faster
    • Disks are not
    • Amdahl's Law: bottlenecks move around; as the CPU gets faster, the bottleneck moves to memory or disk, and so on (see the worked example after this slide)
    • We need to find a way to use disks more efficiently
  • Assumption
    • Caching files in RAM improves READ performance of a filesystem significantly more than WRITE performance.
    • Therefore disk activity will become more write-centric in the future.
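
A quick worked example of the Amdahl's Law point (my own numbers, not the paper's): if 90% of a workload is CPU-bound and CPUs get 10x faster while the 10% spent on disk I/O stays fixed, the overall speedup is capped well below 10x and disk becomes the dominant cost.

```latex
% Amdahl's Law: overall speedup when a fraction f of the work is sped up by a factor s
S_{\text{overall}} = \frac{1}{(1-f) + \frac{f}{s}}
% Illustrative numbers (assumed, not from the paper): f = 0.9, s = 10
S_{\text{overall}} = \frac{1}{0.10 + \frac{0.90}{10}} = \frac{1}{0.19} \approx 5.3
% and the unimproved disk I/O now accounts for 0.10 / 0.19, i.e. over half of the remaining time.
```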

  4. 2.1
  • Disk improvement is in the areas of price/capacity and physical size, not in seek time.
  • Even if I/O bandwidth improves, seek time will still be the killer.
  • Memory is getting cheaper and faster, so use a memory cache to ease the disk bottleneck.
  • Caches
    • What is the difference between a cache and a buffer?
    • A buffer sits between two devices of different speeds; a cache speeds up subsequent accesses to the same or nearby data.
    • Caches can reorder writes so they reach the disk more efficiently.

  5. 2.2 Workloads
  • Three classes of file access patterns (from a different paper):
    • scientific processing - read and write large files sequentially
    • transaction processing - many simultaneous requests for small chunks of data
    • engineering/office applications - access a large number of small files in sequence
  • Engineering/office is the killer, and that is what LFS is designed for.

  6. 2.3 Problems with Existing FSs
  • UNIX FFS (Fast File System, also developed at Berkeley)
    • puts a file's blocks sequentially on disk
    • inode data lives at a fixed location on disk
    • directory data lives at yet another location
    • Total of 5 seeks to create a new file (bad): two accesses to the new file's inode, plus one each for the file's data, the directory's data, and the directory's inode.
    • File data is written asynchronously so the program can continue without waiting for the FS, BUT
    • metadata is written synchronously, so the program is blocked when touching things like inode data.

  7. 3 LFS
  • Buffer a sequence of FS changes in the file cache and then write them sequentially to disk in one chunk to avoid seeks (see the sketch after this slide).
  • Essentially all data is merely a log entry.
  • This creates 2 problems:
    • How to read from the log
    • How to keep free space on the disk, i.e. if you just keep writing forward forever, eventually you will wrap around at the end of the disk.
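
A minimal sketch of the buffered-log idea above (my own illustration, not Sprite LFS code; the disk interface and constants are assumed): dirty blocks accumulate in memory and go to disk as one large sequential append instead of many small seek-and-write operations.

```python
SEGMENT_SIZE = 512 * 1024      # assumed segment size in bytes
BLOCK_SIZE = 4 * 1024

class WriteBuffer:
    """Accumulates dirty blocks and flushes them as one sequential log append."""

    def __init__(self, disk):
        self.disk = disk       # assumed to expose append(data) -> starting disk address
        self.pending = []      # list of (inode_no, block_no, data) tuples

    def write(self, inode_no, block_no, data):
        self.pending.append((inode_no, block_no, data))
        if len(self.pending) * BLOCK_SIZE >= SEGMENT_SIZE:
            self.flush()       # a segment's worth of changes -> one big write

    def flush(self):
        if not self.pending:
            return
        # One contiguous write replaces many small seek-then-write operations.
        self.disk.append(b"".join(data for _, _, data in self.pending))
        self.pending.clear()
```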

  8. 3.1 How to Read
  • Reads run at the same speed as FFS once the inode is located.
  • Getting to the inode is what costs FFS seeks and where LFS can do better.
  • FFS keeps inodes in a static portion of the disk
    • unrelated to the physical location of the data.
  • LFS stores the inode in proximity to its data, at the head (end) of the log.
  • Because of this, another (but much smaller) map of the inodes is needed, called the inode map (see the sketch after this slide).
    • It is small enough to be kept in cache all the time, so it does not cause excessive seeks.
    • A fixed checkpoint region on disk records where the inode map blocks live.
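
A sketch of the inode map idea (names and interfaces are my assumptions): once the map answers "where is inode i?", a read costs the same seeks as FFS.

```python
class InodeMap:
    """Maps inode number -> disk address of the newest copy of that inode."""

    def __init__(self):
        self.addr = {}

    def update(self, inode_no, disk_addr):
        # Called every time an inode is re-written to the end of the log.
        self.addr[inode_no] = disk_addr

    def lookup(self, inode_no):
        return self.addr[inode_no]


def read_block(disk, imap, inode_no, block_no):
    # One read to fetch the inode from wherever it last landed in the log,
    # then one read for the data block it points to -- the same cost as FFS
    # once the inode has been located.
    inode = disk.read(imap.lookup(inode_no))       # assumed: disk.read(addr) -> object
    return disk.read(inode.block_addrs[block_no])  # assumed: inode carries block addresses
```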

  9. 3.2 Free Space Management
  • The log wraps around.
  • Choices:
    • Don't defragment, just write to the next free block
    • GC-style: stop everything and copy
    • Incremental: copy continuously
  • Solution - Segments
    • Divide the disk into segments
    • The segment size is chosen for optimal usage.
    • A segment is written contiguously, and the disk is compacted segment by segment to avoid fragmentation.
    • This defragmentation is known as segment cleaning (see the bookkeeping sketch after this slide).
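
A sketch of the segment bookkeeping this implies (sizes and structure are assumptions of mine): the disk is a fixed array of segments, each written only as a whole, and the writer always takes the next clean segment from a free list.

```python
NUM_SEGMENTS = 1024            # assumed; the real value depends on disk and segment size

class SegmentedDisk:
    def __init__(self):
        self.clean = list(range(NUM_SEGMENTS))   # segments holding no live data
        self.live_bytes = [0] * NUM_SEGMENTS     # utilization, used later by the cleaner

    def next_clean_segment(self):
        if not self.clean:
            raise RuntimeError("no clean segments: the cleaner must run first")
        return self.clean.pop()
```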

  10. 3.3 Segment Cleaning
  • It should be fairly obvious how to do it - 3 steps (see the sketch after this slide):
    • Read a number of non-clean segments into memory
    • Keep only the live data (the portion of each segment still in use)
    • Write the live data back to disk in clean segments
  • Other logistical considerations in segment cleaning:
    • update inodes
    • update fixed structures such as the checkpoint region
    • remember these are in cache, and as we will see later they are dumped to disk at predetermined intervals.
  • There is some other machinery as well - each segment has a summary header, for example. Read the paper for details.
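
A sketch of the three steps above (helper names and the liveness test are my simplification; the paper determines liveness using per-segment summary blocks):

```python
def is_live(disk, imap, block):
    # A block is live if the owning file's current inode still points at
    # this particular on-disk copy of it.
    inode = disk.read(imap.lookup(block.inode_no))
    return inode.block_addrs.get(block.block_no) == block.addr


def clean_segments(disk, imap, victims):
    live = []
    for seg in victims:                                   # step 1: read dirty segments
        live += [b for b in disk.read_segment(seg)
                 if is_live(disk, imap, b)]               # step 2: keep only live data
    for b in live:                                        # step 3: rewrite live data
        new_addr = disk.append(b.data)                    # lands in a clean segment
        inode = disk.read(imap.lookup(b.inode_no))
        inode.block_addrs[b.block_no] = new_addr          # inode itself gets re-logged too
    disk.mark_clean(victims)                              # victims can now be reused
```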

  11. 3.4 Segment Cleaning - how to configure
  • When to do it?
    • At low priority, or when disk space is needed
    • The authors choose "when disk space is needed", using watermarks (see the sketch after this slide).
  • How many segments to clean at one time?
    • The more segments cleaned at one time, the more intelligent the cleaning can be and the better organized the disk.
    • Determined by the watermarks chosen above.
  • Which segments to clean? ...coming
  • Since you can write the data back to disk any way you want, you should write it back in the manner most efficient for its predicted use. ...coming
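
A sketch of the watermark trigger (the thresholds and batch size are made-up illustrations, and it reuses the cleaning sketch above): cleaning starts when clean segments run low and continues until a comfortable margin is restored.

```python
LOW_WATER = 32            # assumed: start cleaning below this many clean segments
HIGH_WATER = 96           # assumed: stop once this many clean segments exist again
SEGMENTS_PER_PASS = 16    # cleaning several at once lets live data be regrouped better


def maybe_clean(disk, imap, pick_victims):
    if len(disk.clean) >= LOW_WATER:
        return                                        # plenty of free space, do nothing
    while len(disk.clean) < HIGH_WATER:
        victims = pick_victims(disk, SEGMENTS_PER_PASS)
        clean_segments(disk, imap, victims)
```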

  12. 3.5-6 Determining the Segment-Cleaning Configuration
  • Here the authors turned to empirical studies.
  • They wrote a simulator and played with the segment-cleaning configuration to determine a good policy.
  • Results/Conclusions:
    • Differentiate between hot and cold segments based on past history.
    • A hot segment is one that is likely to be written again soon.
    • A cold segment is one that is unlikely to be written again soon.
    • They came up with a policy called cost-benefit (see the formula after this slide):
      • cold segments are cleaned when their utilization has dropped to roughly 75%
      • hot segments are left alone until their utilization drops to roughly 15%
    • The utilization and the "temperature" of each segment are maintained in a segment usage table (kept in memory and written to the log).
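
The ratio behind the cost-benefit policy, as the paper defines it (u is the fraction of the segment still live, age is the most recent modified time of any block in the segment; reading the whole segment costs 1 and writing its live data back costs u):

```latex
\frac{\text{benefit}}{\text{cost}}
  = \frac{\text{free space generated} \times \text{age of data}}{\text{cost}}
  = \frac{(1 - u)\,\text{age}}{1 + u}
% Old (cold) data makes a segment worth cleaning even when it is still fairly full,
% which is why cold segments are cleaned near 75% utilization and hot ones near 15%.
```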

  13. 4 Crash Recovery
  • FFS
    • The major problem is that all of the metadata on the disk must be scanned after a crash,
    • because the most-recently-written data could be anywhere on the disk.
  • LFS
    • The most-recently-written data is in one place: the end of the log.
    • Uses checkpoints and roll-forward to maintain consistency.
    • Ideas borrowed from database technology.

  14. 4.1 Crash Recovery
  • Checkpoint Region
    • 2 copies are maintained at fixed locations on disk,
    • written to alternately, in case of a crash while updating checkpoint data (see the sketch after this slide).
  • At points in time:
    • I/O is blocked
    • All cached data is written to the end of the log
    • Checkpoint data from the cache is written to disk
    • I/O is then re-enabled
    • This could instead be triggered by the amount of data written.
    • Note the similarity to GC techniques.
  • Roll-forward is skipped here: it is complex, depends on segment header info, and only enhances checkpointing. Read the paper for more detail.
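
A sketch of the alternating-checkpoint scheme above (field names and the disk interface are my assumptions; it reuses the write-buffer sketch from slide 7): each checkpoint is timestamped, and recovery picks whichever of the two fixed regions finished writing more recently.

```python
import time

CHECKPOINT_ADDRS = [0, 1]        # two fixed disk locations, purely illustrative

class Checkpointer:
    def __init__(self, disk):
        self.disk = disk
        self.turn = 0            # which of the two regions to overwrite next

    def checkpoint(self, write_buffer, imap_block_addrs):
        write_buffer.flush()                           # 1. push all buffered log data out
        region = {
            "time": time.time(),                       # 2. stamp the checkpoint
            "imap_blocks": imap_block_addrs,           #    where the inode map blocks live
            "log_tail": self.disk.tail(),              #    current end of the log
        }
        self.disk.write_fixed(CHECKPOINT_ADDRS[self.turn], region)
        self.turn ^= 1                                 # alternate regions every checkpoint


def recover(disk):
    # After a crash, use whichever checkpoint region carries the newer timestamp;
    # roll-forward from its log tail is omitted here, as in the slide above.
    regions = [disk.read_fixed(a) for a in CHECKPOINT_ADDRS]
    return max(regions, key=lambda r: r["time"])
```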

  15. 5 Empirical Test Results
  • Comparison to FFS/SunOS
  • Basically, LFS is significantly better for small files and is better or as good for large files in all cases except:
    • large files that were originally written randomly and are later read sequentially.
  • Crash recovery was not rigorously tested empirically against FFS/SunOS.
