
Accurate and Efficient Replaying of File System Traces

Accurate and Efficient Replaying of File System Traces. Nikolai Joukov, Timothy Wong, and Erez Zadok, Stony Brook University. USENIX Conference on File and Storage Technologies (FAST 2005). Presented by Hsu Hao Chen.


Presentation Transcript


  1. Accurate and Efficient Replaying of File System Traces • Nikolai Joukov, Timothy Wong, and Erez Zadok, Stony Brook University • USENIX Conference on File and Storage Technologies (FAST 2005) • Presented by Hsu Hao Chen

  2. Outline • Introduction • Design • Architecture • Reproduce original timing problem • Replayfs trace • Threads and their scheduling • Zero copying of data • File system caches • Implementation • Evaluation • Conclusions

  3. Introduction • Trace replaying is useful for file system benchmarking, stress-testing, debugging, and forensics. • File system traces can be captured and replayed at different logical levels: • System calls • Virtual File System (VFS) • Network level for network file systems • Device driver

  4. Design • Architecture(1/2) • Tracefs: a stackable file system that captures the traces to be replayed

  5. Design • Architecture(2/2) • Replayfs: VFS-level replayer

  6. Design • Reproduce original timing problem • If t_replayer > t_user, the timing and I/O rate cannot be reproduced correctly
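A minimal sketch of the timing condition above, assuming a simplified model in which the replayer needs a fixed preparation time per request (`replayer_overhead`, standing in for the replayer time) and the original application left a gap before each request (`user_gaps`, standing in for the user time). All names are hypothetical and the model is an illustration, not the paper's analysis.

```python
def replay_issue_times(user_gaps, replayer_overhead):
    """Return (ideal, actual) request issue times under a simple model.

    user_gaps: time the original application spent before each request.
    replayer_overhead: time the replayer needs to prepare each request.
    Both are hypothetical parameters, in the same arbitrary time unit.
    """
    ideal, actual = [], []
    t_ideal = 0
    t_actual = 0
    for gap in user_gaps:
        # The original application issued the request after its own gap.
        t_ideal += gap
        # The replayer can issue no earlier than the original time, and no
        # earlier than the moment it finishes preparing the request.
        t_actual = max(t_actual + replayer_overhead, t_ideal)
        ideal.append(t_ideal)
        actual.append(t_actual)
    return ideal, actual


def timing_reproducible(user_gaps, replayer_overhead):
    """True if every request can be issued at its original time."""
    ideal, actual = replay_issue_times(user_gaps, replayer_overhead)
    return ideal == actual
```

With gaps of 10 units, an overhead of 5 lets the replayer hit every original issue time, while an overhead of 20 makes it fall further behind with each request.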

  7. Design • Problems with system-call replayers: • They run in user mode • Redundant data copying between user and kernel buffers • Page eviction is not completely controlled from the user level • Replaying processes can be preempted by other tasks • Some kernels are not preemptive and have long execution paths

  8. Design • Replayfs trace(1/4)

  9. Design • Replayfs trace(2/4) • Tracefs • A trace captured by a tracer is often portable, descriptive, and verbose, to offer as much information as possible • Trace compiler • User-mode program for conversion and optimization of the raw Tracefs traces • Splits the raw Tracefs trace into three components: • Command • Resource Allocation Table (RAT) • Buffer
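The three-way split described above can be sketched as follows. This is only an illustration under assumed formats: the raw event dictionaries (`op`, `path`, `data`), the command tuple layout, and the use of path names as resources are all hypothetical and do not reflect the actual Tracefs or Replayfs on-disk formats.

```python
def compile_trace(raw_events):
    """Split verbose raw events into commands, a resource table, and buffers.

    raw_events: list of dicts with keys "op", "path", and optional "data"
    (a hypothetical raw-trace format used only for this sketch).
    Returns (commands, rat, buffers) where each command is
    (op, resource_index, buffer_offset, buffer_length).
    """
    commands = []
    rat_index = {}          # resource name -> slot in the resource table
    buffers = bytearray()   # all data payloads, stored back to back

    for ev in raw_events:
        # Each distinct resource gets one table slot; commands refer to it
        # by index instead of repeating the full name.
        rid = rat_index.setdefault(ev["path"], len(rat_index))
        data = ev.get("data", b"")
        offset = len(buffers)
        buffers += data
        commands.append((ev["op"], rid, offset, len(data)))

    # Dicts preserve insertion order, so position i holds resource i.
    rat = list(rat_index)
    return commands, rat, bytes(buffers)
```

Keeping the commands compact and fixed in shape is the point of the split: the verbose, portable raw trace is converted once, ahead of time, so the replayer has little work left to do per operation.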

  10. Design • Replayfs trace(3/4)

  11. Design • Replayfs trace(4/4) • Memory buffers are accessed for reading only, because the information read from the disk is discarded.

  12. Design • Threads and their scheduling • Replayfs issues requests to the lower file system on behalf of different threads • Resource contention (disk head repositioning, locks, etc.) • Replayfs reuses threads if possible • Pre-spin • Increases event precision (standard event timers have 1 ms resolution) • Clock thread • CPU cycle counters
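The pre-spin idea above can be illustrated in user space: sleep with the coarse timer until shortly before the deadline, then busy-wait on a high-resolution counter. This is a sketch under assumptions, not Replayfs's kernel code; `wait_until` and `spin_ns` are hypothetical names, and `time.perf_counter_ns` stands in for the CPU cycle counter.

```python
import time


def wait_until(deadline_ns, spin_ns=1_000_000):
    """Wait until time.perf_counter_ns() reaches deadline_ns.

    Sleeps while more than spin_ns remains, then spins for the final
    stretch so the wake-up is not limited by the sleep timer's resolution.
    Returns the counter value observed after the wait.
    """
    remaining = deadline_ns - time.perf_counter_ns()
    if remaining > spin_ns:
        # Coarse wait: give the CPU to other work until just before the deadline.
        time.sleep((remaining - spin_ns) / 1e9)
    # Fine wait: poll the high-resolution counter until the deadline passes.
    while time.perf_counter_ns() < deadline_ns:
        pass
    return time.perf_counter_ns()
```

The trade-off is CPU time for precision: a larger `spin_ns` tolerates a sloppier sleep but burns more cycles spinning.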

  13. Design • Zero copying of data • There is no easy way for a user-mode program to read data while avoiding a copy to user space. • Running in kernel mode helps • A data page that belongs to the trace file can simply be moved to the target file by changing just a few pointers

  14. Design • File system caches • Replayfs supports three replaying modes for dealing with read operations • Current cache state • Replayfs calls all the captured buffer read operations • Original cache state • Reads are invoked on the page level only for the pages that were not found in the cache during tracing. • Reads are not replayed at all
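A small sketch of how the three read-handling modes above differ in which operations get replayed. The event format, including the `cache_hit` flag recorded at trace time, and all names are hypothetical, chosen only to make the selection logic concrete.

```python
from enum import Enum


class ReadMode(Enum):
    CURRENT_CACHE = 1    # replay every captured read
    ORIGINAL_CACHE = 2   # replay only reads that missed the cache when traced
    NO_READS = 3         # do not replay reads at all


def select_ops(events, mode):
    """Return the events to replay under the given read mode.

    events: list of dicts with "op" and, for reads, a boolean "cache_hit"
    (a hypothetical field meaning the page was found in the cache during
    tracing). Non-read operations are always kept.
    """
    selected = []
    for ev in events:
        if ev["op"] != "read":
            selected.append(ev)
        elif mode is ReadMode.CURRENT_CACHE:
            selected.append(ev)
        elif mode is ReadMode.ORIGINAL_CACHE and not ev["cache_hit"]:
            selected.append(ev)
        # ReadMode.NO_READS: every read is dropped.
    return selected
```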

  15. Implementation • Implemented for the Linux kernel; both Tracefs and Replayfs can now be used on either 2.4 or 2.6 Linux kernels. • [Figure labels: kernel module, application program, kernel module]

  16. Evaluation • Test environment • 1.7GHz Pentium 4 machine with 1GB of RAM • The system disk was a 30GB 7200 RPM IDE disk formatted with Ext3 • The machine also had two Maxtor Atlas 15,000 RPM 18.4GB Ultra320 SCSI disks formatted with Ext2 • Used for storing the traces and the Replayfs traces

  17. Evaluation • Evaluation Tools and Workloads • Am-utils build • Building Am-utils is a CPU-intensive benchmark • Postmark • Simulates the operation of electronic mail servers • Pread • Used to evaluate Replayfs's CPU time consumption. • It spawns two threads that concurrently read 1KB buffers of cached data using the pread system call. • Pread performed 100 million read operations.
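For illustration, a scaled-down workload in the spirit of the Pread description above: two threads concurrently reading 1KB buffers from a small, recently written (and therefore most likely cached) file through the `pread` system call. This is a hypothetical sketch, not the benchmark used in the evaluation; the function name, the temporary file, and the default operation count are assumptions, and `os.pread` is available on Unix-like systems only.

```python
import os
import tempfile
import threading


def run_pread(ops_per_thread=1000, n_threads=2, buf_size=1024):
    """Issue concurrent pread calls and return the total bytes read."""
    totals = [0] * n_threads
    with tempfile.NamedTemporaryFile() as f:
        # A small file written just now should be served from the page cache.
        f.write(b"x" * (4 * buf_size))
        f.flush()
        fd = os.open(f.name, os.O_RDONLY)
        try:
            def worker(idx):
                for i in range(ops_per_thread):
                    # pread takes an explicit offset, so the threads can
                    # share one descriptor without seeking.
                    data = os.pread(fd, buf_size, (i % 4) * buf_size)
                    totals[idx] += len(data)

            threads = [threading.Thread(target=worker, args=(n,))
                       for n in range(n_threads)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
        finally:
            os.close(fd)
    return sum(totals)
```

The defaults issue 2,000 reads in total; the workload described in the slide is several orders of magnitude larger.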

  18. Evaluation • Memory Overheads • [Figure; chart annotations: 56%, 70%, 45%]

  19. Evaluation • Timing Precision of Replaying(1/2) • [Figure; axes: number of operations, time (seconds)]

  20. Evaluation • Timing Precision of Replaying(2/2) • [Figure; axes: number of operations, time (seconds)]

  21. Evaluation • CPU Time Consumption • [Figure; chart annotations: 32%, 61%] • User-level replayers cannot replay traces like Pread at the same rate as the original

  22. Conclusions • Trace replaying offers a number of advantages for file system benchmarking, debugging, and forensics • Replaying has three distinct benefits: • Capture and replay all file system operations • Including important memory-mapped operations • Kernel module • Avoid unnecessary data copying • Reduce the number of context switches • Optimize trace data prefetch • Precise control over thread scheduling • Pre-spin
