This paper discusses the need for reliable file system checkers and presents SQCK, a declarative query language-based approach to build robust checkers. SQCK simplifies the complexity of traditional checkers by writing fewer lines of code while ensuring simple, reliable, and flexible checks and repairs. The evaluation shows that SQCK outperforms traditional checkers in terms of simplicity and reliability.
SQCK: A Declarative File System Checker Haryadi S. Gunawi, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau University of Wisconsin – Madison OSDI ’08 – December 9th, 2008
Corrupt file systems • File systems • Store massive amounts of data • Must be reliable • Corrupted file system images • Due to hardware errors, file system bugs, etc. • Need to be repaired a.s.a.p.
Who should repair? • Does journaling (write-ahead log) help? • No, only for crashes • Does the file system repair itself online? • No, not enough machinery • Fsck: the last line of defense • It’s a “must have” utility • XFS: “no need fsck ever”, but deployed fsck in the end • Must be fully reliable
But … fsck is complex • Fsck has a big task • Turn any corrupt image into a consistent image • E.g. check if a data block is shared by two inodes • How are they implemented? • Written in C → hard to reason about • Large and complex • Ext2 fsck: 150 checks in 16 KLOC • XFS fsck: 340 checks in 22 KLOC • Hundreds of cluttered if-check statements • Bottom line: fsck code is “untouchable”
Two Questions • Are current checkers really reliable? • If not, how should we build robust checkers?
e2fsck is unreliable • Analyze e2fsck (ext2 file system checker) • Findings: • Inconsistent repair • The file system becomes unreadable • Consistent but not “correct” • Fsck deletes valid directory entries • Fsck loses a huge number of files
SQCK • Lesson: Complexity is the enemy of reliability • Big task + bad design → complexity → unreliability • Need a higher-level approach for simplicity • SQCK (SQL-based Fsck) • Use a declarative query language to write checks • Put simply: write fewer lines of code • Evaluation • Simple and reliable: e2fsck in 150 queries (vs. 16 KLOC of C) • More: Great flexibility and reasonable performance
Outline • Introduction • Analysis of e2fsck • SQCK Design • SQCK Evaluation • Conclusion
Methodology • E2fsck task: cross-check all ext2 metadata • An indirect pointer should not point to the superblock • A subdir should only be accessible from one directory • Inject single corruption • Observe how e2fsck repairs a single corruption • Only corrupt on-disk pointers • Corrupt an indirect pointer to point to the superblock • Corrupt a directory entry to point to another directory • Usually, a corrupt pointer is simply cleared to zero
Inconsistent (Out-of-order) Repair • Ideal fsck: check the bad indirect pointer first, then check the indirect content • e2fsck: checks the indirect content before the bad indirect pointer • [Figure: an inode whose indirect pointer (*ind) points to the superblock instead of an indirect block, shown under the ideal fsck and under e2fsck]
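The ordering problem on this slide can be reproduced with a toy model. The sketch below is not e2fsck code: the image layout, the block numbers, and both check functions are hypothetical stand-ins, chosen only to show why running the content check before the pointer check damages the superblock.

```python
SUPERBLOCK = 1      # assumed block number of the superblock in this toy image
NUM_BLOCKS = 1000   # assumed size of the toy file system, in blocks


def fresh_image():
    """Toy image whose inode has an indirect pointer corrupted to the superblock."""
    return {
        "inode": {"ind": SUPERBLOCK},
        # Stand-in superblock fields; 998999 is a valid field value here,
        # but it looks like an out-of-range block pointer.
        "blocks": {SUPERBLOCK: [850, 851, 998999, 853]},
    }


def check_indirect_pointer(img):
    """Clear an indirect pointer that refers to the superblock."""
    if img["inode"]["ind"] == SUPERBLOCK:
        img["inode"]["ind"] = 0


def check_indirect_content(img):
    """Zero out-of-range entries in whatever block the pointer names."""
    ind = img["inode"]["ind"]
    if ind == 0:
        return
    block = img["blocks"][ind]
    for i, ptr in enumerate(block):
        if ptr >= NUM_BLOCKS:
            block[i] = 0


def run(checks):
    img = fresh_image()
    for check in checks:
        check(img)
    return img


in_order = run([check_indirect_pointer, check_indirect_content])
out_of_order = run([check_indirect_content, check_indirect_pointer])
```

In the first ordering the pointer is cleared before anything interprets the superblock as an indirect block. In the second, the content check has already overwritten a superblock field by the time the pointer is found to be bad.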
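Consistent but Incorrect Repair (1) • Kidnapping problem! • E2fsck does not use all available information • [Figure: directory tree with root /, directories a1 and b1, subdirectories a2 and b2, and lost+found (LF); a corrupted directory entry (X) is repaired differently by the ideal fsck and by e2fsck]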
Result Summary • Four problems • Inconsistent • Information-incomplete • Policy-inconsistent • Insecure • E2fsck does not handle all corruptions • “Warning: Programming bug in e2fsck! Or some bonehead (you) is checking a mounted (live) filesystem.” • Not simple implementation bugs • Difficult to combine available information • Difficult to ensure correct ordering
Outline • Introduction • Analysis • SQCK Design • SQCK Evaluation • Conclusion
Fsck Properties • Hundreds of checks • Complex cross-checks • Must be ordered correctly • [Figure: taxonomy of checks in e2fsck, illustrated with instances of two structures, A { x, y } and B { m, n }]
A Declarative Approach • Lesson: Complexity is the enemy of reliability • SQCK • Use a declarative query language (e.g. SQL), why? • It is declarative: high-level intent is clear • Fit for cross-checking massive information • Goals achieved • Simple: e2fsck in 150 queries (vs. 16 KLOC of C) • Reliable: Each check/query is easy to understand • Flexible: Plug in/out different queries
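As an illustration of why a query language fits cross-checking, the shared-data-block check mentioned earlier ("check if a data block is shared by two inodes") can be written as a single grouping query. This is a sketch, not a query from SQCK: the table BlockRefTable and its columns are hypothetical, and SQLite stands in for the database.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical table: one row per (inode, data block) reference.
db.execute("CREATE TABLE BlockRefTable (ino INTEGER, blk INTEGER)")
db.executemany(
    "INSERT INTO BlockRefTable VALUES (?, ?)",
    [(12, 500), (12, 501), (13, 502), (14, 501), (14, 503)],
)

# A block is shared if more than one distinct inode references it.
shared = db.execute(
    "SELECT blk, COUNT(DISTINCT ino) AS owners "
    "FROM BlockRefTable "
    "GROUP BY blk "
    "HAVING COUNT(DISTINCT ino) > 1 "
    "ORDER BY blk"
).fetchall()
```

Here block 501 is referenced by inodes 12 and 14, so it is the only row returned.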
Using SQCK • Take a file system image • Load metadata to db tables • Temporary tables • Ex: InodeTable, GroupDescTable, DirEntryTable • Run checks and repairs (in the form of queries) • Flush any modification, and delete tables • [Figure: pipeline with Scanner, Loader, database tables, Checks + Repairs, and Flush back to the file system image]
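The load, check/repair, and flush steps above can be sketched end to end. This is a minimal illustration under stated assumptions: SQLite replaces the database, the scanner output is a hard-coded list, the InodeTable columns (ino, linksCount, dtime, dirty) are hypothetical, and the single repair (an in-use inode should not carry a deletion time) is only an example.

```python
import sqlite3


def load(db, inodes):
    """Loader: put scanned metadata into a temporary table."""
    db.execute(
        "CREATE TEMP TABLE InodeTable "
        "(ino INTEGER PRIMARY KEY, linksCount INTEGER, dtime INTEGER, "
        "dirty INTEGER DEFAULT 0)"
    )
    db.executemany(
        "INSERT INTO InodeTable (ino, linksCount, dtime) VALUES (?, ?, ?)",
        inodes,
    )


def run_repairs(db, repairs):
    """Run each repair query in the given order."""
    for sql in repairs:
        db.execute(sql)


def flush(db):
    """Flush: collect modified rows to write back, then delete the table."""
    dirty = db.execute(
        "SELECT ino, linksCount, dtime FROM InodeTable "
        "WHERE dirty = 1 ORDER BY ino"
    ).fetchall()
    db.execute("DROP TABLE InodeTable")
    return dirty


# Illustrative repair: an in-use inode should not carry a deletion time.
REPAIRS = [
    "UPDATE InodeTable SET dtime = 0, dirty = 1 "
    "WHERE linksCount > 0 AND dtime <> 0",
]


def sqck_pass(inodes):
    db = sqlite3.connect(":memory:")
    try:
        load(db, inodes)
        run_repairs(db, REPAIRS)
        return flush(db)
    finally:
        db.close()


# Stand-in for scanner output: (ino, linksCount, dtime)
scanned = [(11, 2, 0), (12, 1, 1228780800), (13, 0, 1228780800)]
to_write_back = sqck_pass(scanned)
```

Only inode 12 is both in use and marked deleted, so it is the only row flagged dirty and returned for write-back.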
Declarative check (example 1) • Cross-checking a single instance of a structure • “Find block bitmap that is not located within its block group”
e2fsck (C):
  first_block = sb->s_first_data_block;
  last_block = first_block + blocks_per_group;
  for (i = 0, gd = fs->group_desc; i < fs->group_desc_count; i++, gd++) {
    /* the last group may be shorter than the others */
    if (i == fs->group_desc_count - 1)
      last_block = sb->s_blocks_count;
    if ((gd->bg_block_bitmap < first_block) ||
        (gd->bg_block_bitmap >= last_block)) {
      px.blk = gd->bg_block_bitmap;
      if (fix_problem(BB_NOT_GROUP, ...))
        gd->bg_block_bitmap = 0;
    }
    ...
  }
SQCK (SQL):
  SELECT *
  FROM GroupDescTable G
  WHERE G.blockBitmap NOT BETWEEN G.start AND G.end
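The query from this slide can be tried on a small table. The sketch below assumes SQLite as the engine and a hypothetical three-group layout. The range columns are named startBlock and endBlock rather than start and end, so the example does not depend on how an engine treats END as a keyword, and endBlock is taken to be inclusive to match BETWEEN.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE GroupDescTable "
    "(gdNum INTEGER PRIMARY KEY, startBlock INTEGER, endBlock INTEGER, "
    "blockBitmap INTEGER)"
)

# Hypothetical layout: groups of 8192 blocks, with a shorter last group.
db.executemany(
    "INSERT INTO GroupDescTable VALUES (?, ?, ?, ?)",
    [
        (0, 1, 8192, 3),            # bitmap inside its own group
        (1, 8193, 16384, 4),        # corrupt: bitmap points into group 0
        (2, 16385, 20000, 16387),   # bitmap inside the short last group
    ],
)

# The check from the slide: bitmaps located outside their block group.
bad_groups = [
    row[0]
    for row in db.execute(
        "SELECT G.gdNum FROM GroupDescTable G "
        "WHERE G.blockBitmap NOT BETWEEN G.startBlock AND G.endBlock "
        "ORDER BY G.gdNum"
    )
]
```

Because each row already carries its own range, the special case for the last group in the C loop becomes a property of the loaded data, not a branch in the check.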
Declarative check (example 2) • Cross-checking multiple instances of the same structure • “Find false parents (i.e. directory entries that point to a subdirectory that already belongs to another directory)” • Must read all directory entries in dir data blocks • Wrong implementation in e2fsck (the kidnapping problem)
Declarative check (example 2) • e2fsck (C):
  if ((dot_state > 1) &&
      (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode))) {
    // e2fsck_get_dir_info is 20 lines long
    subdir = e2fsck_get_dir_info(dirent->inode);
    ...
    if (subdir->parent) {
      if (fix_problem(LINK_DIR,..)) {
        dirent->inode = 0;
        goto next;
      }
    } else {
      subdir->parent = ino;
    }
  }
Declarative check (example 2) • SQCK (SQL):
  SELECT F.*   -- returns the false parent(s)
  FROM DirEntryTable P, DirEntryTable C, DirEntryTable F
  WHERE
    -- P says C is its child
    P.entry_num >= 3 AND P.entry_ino = C.ino AND
    -- and C says P is its parent
    C.entry_num = 2 AND C.entry_ino = P.ino AND
    -- F also says C is its child
    F.entry_num >= 3 AND F.entry_ino = C.ino AND F.ino <> P.ino
[Figure: directories F, P, and C]
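The false-parent query can be run against a toy directory tree. The sketch assumes SQLite and a hypothetical image in which / (inode 2) holds a1 (11) and b1 (12), a1 holds a2 (13), and b1 should hold b2 (14), but b1's entry has been corrupted to point to a2. The column meanings are inferred from the query: ino is the directory containing the entry, entry_num 1 and 2 are "." and "..", and entry_ino is the inode the entry points to.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE DirEntryTable "
    "(ino INTEGER, entry_num INTEGER, entry_ino INTEGER)"
)
db.executemany(
    "INSERT INTO DirEntryTable VALUES (?, ?, ?)",
    [
        (2, 1, 2), (2, 2, 2), (2, 3, 11), (2, 4, 12),   # "/": a1, b1
        (11, 1, 11), (11, 2, 2), (11, 3, 13),           # a1: a2
        (12, 1, 12), (12, 2, 2), (12, 3, 13),           # b1: corrupt, was 14
        (13, 1, 13), (13, 2, 11),                       # a2: ".." is a1
        (14, 1, 14), (14, 2, 12),                       # b2: ".." is b1
    ],
)

false_parents = db.execute(
    "SELECT F.ino, F.entry_num, F.entry_ino "
    "FROM DirEntryTable P, DirEntryTable C, DirEntryTable F "
    "WHERE P.entry_num >= 3 AND P.entry_ino = C.ino "      # P says C is its child
    "  AND C.entry_num = 2 AND C.entry_ino = P.ino "       # C says P is its parent
    "  AND F.entry_num >= 3 AND F.entry_ino = C.ino "      # F also claims C
    "  AND F.ino <> P.ino "
    "ORDER BY F.ino, F.entry_num"
).fetchall()
```

The query uses a2's own ".." entry to decide that a1 is the true parent, so only b1's third entry is reported, whichever directory happens to be scanned first.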
Declarative Repairs • Running declarative checks is only part of the problem • Must also perform the declarative repairs • A repair = An update query • Some repairs simply update a few fields • ... SET T.field = newValue, T.dirty = 1 • A repair = A series of queries • Ex: Reconnect an orphan directory to the lost+found directory • Combine a series of queries with C code • All repairs are written in SQL • C code is only used for connecting them
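A repair built from a series of queries might look like the sketch below, which reconnects an orphan directory to lost+found. It is an assumption-laden illustration, not SQCK's repair: Python plays the role of the C glue code, SQLite is the engine, the inode numbers and the DirEntryTable columns are hypothetical, and bookkeeping such as entry names and link counts is omitted. SQLite does not accept a table-qualified column in SET, so the columns are unqualified.

```python
import sqlite3

ROOT, LOST_FOUND = 2, 11  # inode numbers assumed for this toy image

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE DirEntryTable "
    "(ino INTEGER, entry_num INTEGER, entry_ino INTEGER, "
    "dirty INTEGER DEFAULT 0)"
)
db.executemany(
    "INSERT INTO DirEntryTable (ino, entry_num, entry_ino) VALUES (?, ?, ?)",
    [
        (2, 1, 2), (2, 2, 2), (2, 3, 11),   # "/" holds lost+found
        (11, 1, 11), (11, 2, 2),            # lost+found, currently empty
        (12, 1, 12), (12, 2, 2),            # orphan: no entry points to it
    ],
)

# Query 1: directories that no regular entry (entry_num >= 3) points to.
orphans = [
    row[0]
    for row in db.execute(
        "SELECT D.ino FROM DirEntryTable D "
        "WHERE D.entry_num = 1 AND D.ino <> ? AND NOT EXISTS ("
        "  SELECT 1 FROM DirEntryTable E "
        "  WHERE E.entry_num >= 3 AND E.entry_ino = D.ino) "
        "ORDER BY D.ino",
        (ROOT,),
    )
]

for orphan in orphans:  # the glue code only sequences the queries
    # Query 2: append an entry for the orphan to lost+found.
    db.execute(
        "INSERT INTO DirEntryTable (ino, entry_num, entry_ino, dirty) "
        "SELECT ?, MAX(entry_num) + 1, ?, 1 FROM DirEntryTable WHERE ino = ?",
        (LOST_FOUND, orphan, LOST_FOUND),
    )
    # Query 3: point the orphan's ".." entry at lost+found.
    db.execute(
        "UPDATE DirEntryTable SET entry_ino = ?, dirty = 1 "
        "WHERE ino = ? AND entry_num = 2",
        (LOST_FOUND, orphan),
    )

dirty_rows = db.execute(
    "SELECT ino, entry_num, entry_ino FROM DirEntryTable "
    "WHERE dirty = 1 ORDER BY ino, entry_num"
).fetchall()
```

Every changed or added row is marked dirty, so a later flush step can write back exactly those rows.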
Outline • Introduction • Analysis • SQCK Design • SQCK Evaluation • Conclusion
SQCK Evaluation • Complexity • 150 queries in 1100 lines of SQL statements • (compared to 16,000 lines of C in e2fsck) • Reliability • Pass hundreds of corruption scenarios • Flexibility • Add new checks/repairs • Enable different versions of e2fsck • Performance • Introduce some optimizations
SQCK vs. e2fsck • Reasonable performance • First generation of SQCK (with MySQL) • Within 1.5x of e2fsck • Future optimizations • Hierarchical checks • Concurrent queries
Conclusion • Complexity is the enemy of reliability • Recovery code is complex • SQCK: Build recovery tools with a higher-level approach
Thank you! Questions? • ADvanced Systems Laboratory • www.cs.wisc.edu/adsl