Deciding when to forget in the Elephant file system. Douglas S. Santry Michael J. Feeley Norman C. Hutchinson Alistair C. Veitch Ross W. Carton Jacob Ofir. Key Idea. Elephant automatically retains all important versions of user files
Deciding when to forget in the Elephant file system Douglas S. Santry Michael J. Feeley Norman C. Hutchinson Alistair C. Veitch Ross W. Carton Jacob Ofir
Key Idea • Elephant automatically retains all important versions of user files • Elephant uses file-grain user-specified retention policies to reclaim storage • Previous file versions are named by combining a traditional pathname with a time when the desired version of a file or directory existed
INTRODUCTION • Modern file systems associate • Deletion of a file with the immediate release of storage • File writes with the irrevocable change of file contents • Users control what is on disk by explicitly creating, updating and deleting files • Best solution when disk space was at a premium
The problem • Key problem with the current approach is that user actions have an immediate and irrevocable effect on disk storage • Users are not protected against their own mistakes • Goes against the file system objective of protecting data against failure • We can do better today
Current solutions (I) • Cedar protected against accidental overwrites by saving the last few versions of a file • Cedar files were immutable: each write created a new version of the file • Does nothing for deleted files • Windows and Mac OS allow users to undelete recently deleted files • Does nothing for files that were overwritten
Current solutions (II) • Many systems are regularly backed up • Can restore the state of any file at backup time • Many users maintain multiple versions of their critical data
Basic issues • Can maintain multiple versions of user files but not all versions of all files • Need a retention policy • Should we involve the user in the retention/reclamation decisions? Involving the user means • Less protection from user mistakes • A retention policy that might be better suited to the users’ needs
Not all files are created equal • Read-only files (like application executables) have no version history • Derived files (like object files) can be easily reconstituted • Cached files require no version history • Temporary files might benefit from a short-term history but not from a long-term history • User-modified files would benefit most from a long-term and a short-term history
The two objectives • Providing users with the ability to undo recent changes • Keep the complete history of a file over a short period of time (one hour to one week) • Maintaining a long-term history of important versions of each file • Keep landmark versions of each file forever
Finding the landmark versions • Could rely on the user • User ability to recognize landmark versions of a file degrades with age of versions • Elephant detects landmark versions by looking at the time line of updates to the file • Can identify groups of updates separated by long periods of stability • Last versions of each group of updates are assumed to be landmark versions
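The landmark heuristic on this slide can be sketched in a few lines of Python. This is only an illustration of the idea (groups of updates separated by long stable periods, last version of each group kept); the function name `find_landmarks` and the single fixed `stability_gap` threshold are assumptions made here, not details taken from Elephant.

```python
from typing import List


def find_landmarks(version_times: List[float], stability_gap: float) -> List[float]:
    """Return the creation time of the last version in each group of updates.

    version_times: creation times of all versions of one file, in seconds.
    stability_gap: hypothetical threshold; a quiet period longer than this
                   is treated as the end of a group of updates.
    """
    times = sorted(version_times)
    landmarks = []
    # A version is a landmark if the next update comes only after a long gap.
    for current, following in zip(times, times[1:]):
        if following - current > stability_gap:
            landmarks.append(current)
    # The newest version closes the final group.
    if times:
        landmarks.append(times[-1])
    return landmarks


# Three bursts of edits separated by quiet periods longer than one hour.
history = [0, 10, 20, 5000, 5010, 90000]
print(find_landmarks(history, stability_gap=3600))
```

A real policy could use a threshold that depends on the age of the versions; the fixed gap here just keeps the sketch short.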
User interface • File versions are • Indexed by their creation time • Named by combining the file pathname with a date and time • Versioning is extended to directories • Allow for recovery of deletes • Previous versions of a file or a directory are read-only
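Naming a version by pathname plus time amounts to a lookup: find the version that existed at the requested moment, that is, the newest version created at or before that time. The sketch below assumes a hypothetical in-memory list of `(creation_time, contents)` pairs for one pathname; it does not reproduce Elephant's actual naming syntax or on-disk structures.

```python
import bisect
from typing import List, Optional, Tuple


def version_at(versions: List[Tuple[float, str]], when: float) -> Optional[str]:
    """Return the contents of the version that existed at time `when`.

    versions: (creation_time, contents) pairs sorted by creation_time.
    Returns None if the file did not exist yet at `when`.
    """
    times = [created for created, _ in versions]
    # Index of the first version created strictly after `when`.
    idx = bisect.bisect_right(times, when)
    if idx == 0:
        return None
    return versions[idx - 1][1]


history = [(100.0, "draft"), (200.0, "revised"), (300.0, "final")]
print(version_at(history, 250.0))  # the version created at time 200
print(version_at(history, 50.0))   # before the first version existed
```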
Retention policies (I) • Keep One: only keeps the latest version of the file • Keep All: keeps all versions of the file • Keep Safe: keeps all versions of the file during a specific second-chance interval • Keep Landmarks: keeps all versions of the file during a specific second-chance interval and only landmark versions after that
Retention policies (II) • Keep-Landmarks policy also allows user to group files for consideration • Important for inter-dependent files as their consistency requires viewing all files as of the same point of time • Grouping policy is quite flexible: user can specify • Individual files • Entire directories or subtrees
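The four policies can be compared side by side with a small Python sketch that returns which versions survive reclamation. The policy strings, the function name `versions_to_keep`, and the rule that the current version is never reclaimed are assumptions of this sketch. As a simplification, a version counts as inside the second-chance interval if it was created within that interval.

```python
from typing import Iterable, List


def versions_to_keep(policy: str,
                     version_times: Iterable[float],
                     now: float,
                     second_chance: float,
                     landmarks: Iterable[float] = ()) -> List[float]:
    """Return the creation times of the versions a policy would retain."""
    times = sorted(version_times)
    if not times:
        return []
    current = times[-1]
    if policy == "keep_one":
        return [current]
    if policy == "keep_all":
        return times
    # Versions created inside the second-chance interval.
    recent = {t for t in times if now - t <= second_chance}
    if policy == "keep_safe":
        kept = recent
    elif policy == "keep_landmarks":
        kept = recent | (set(landmarks) & set(times))
    else:
        raise ValueError(f"unknown policy: {policy}")
    kept.add(current)  # assumption: the current version is always retained
    return sorted(kept)


times = [0, 100, 200, 900, 1000]
for policy in ("keep_one", "keep_all", "keep_safe", "keep_landmarks"):
    print(policy, versions_to_keep(policy, times, now=1000,
                                   second_chance=150, landmarks=[200]))
```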
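One plausible reading of grouping is sketched below in Python, purely as an assumption: landmark times are found on the merged update time line of all files in the group, and each file retains the version that was current at each of those times, so the retained versions are mutually consistent. The slides do not say how Elephant actually implements grouping; `group_landmarks` and `stability_gap` are hypothetical names.

```python
from typing import Dict, List


def group_landmarks(file_versions: Dict[str, List[float]],
                    stability_gap: float) -> Dict[str, List[float]]:
    """For each file, return the versions current at the group's landmark times."""
    merged = sorted(t for times in file_versions.values() for t in times)
    # Group landmark times: last update before each long quiet period,
    # plus the newest update of the whole group.
    marks = [a for a, b in zip(merged, merged[1:]) if b - a > stability_gap]
    if merged:
        marks.append(merged[-1])
    kept: Dict[str, List[float]] = {}
    for name, times in file_versions.items():
        chosen = set()
        for mark in marks:
            earlier = [t for t in times if t <= mark]
            if earlier:
                chosen.add(max(earlier))  # version of this file current at `mark`
        kept[name] = sorted(chosen)
    return kept


group = {"a.c": [0, 10, 5000], "a.h": [5, 5005]}
print(group_landmarks(group, stability_gap=3600))
```

Note that `a.h` keeps its version from time 5 even though that update is not the last of its burst: it is the version of `a.h` that matches the landmark version of `a.c`.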
Implementation (I) • I-nodes of non-versioned files are stored in a special i-node file • I-nodes of versioned files are stored in an i-node log • Versions are stored as an ordered sequence of i-nodes • Changes are detected at the block level • Versions of the same file share identical blocks
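Block-level change detection with sharing can be illustrated with a toy model in Python, where a version is a list of references to block objects and a new version reuses every block that did not change. The function `write_version` and the tiny 4-byte block size are assumptions for the demonstration; this is not Elephant's on-disk format.

```python
from typing import List


def write_version(previous: List[bytes], new_data: bytes,
                  block_size: int = 4) -> List[bytes]:
    """Build the block list of a new version, sharing unchanged blocks.

    previous: block list of the preceding version (empty for a new file).
    """
    new_blocks = [new_data[i:i + block_size]
                  for i in range(0, len(new_data), block_size)]
    result = []
    for index, block in enumerate(new_blocks):
        if index < len(previous) and previous[index] == block:
            result.append(previous[index])  # unchanged: share the old block
        else:
            result.append(block)            # changed or appended: new block
    return result


v1 = write_version([], b"aaaabbbbcccc")
v2 = write_version(v1, b"aaaaXXXXcccc")
shared = sum(1 for old, new in zip(v1, v2) if old is new)
print(f"{shared} of {len(v2)} blocks shared between versions")
```

Only the middle block is stored twice; the first and last blocks of both versions refer to the same object.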
Implementation (II) • Elephant uses a different mechanism for versioned directories • We did not discuss it in class
Performance • Somewhat slower than conventional file systems • Using HP-UX traces collected at HP Labs, one can estimate that Keep-Landmarks files would account for 62.4% of files but only 15.2% of the disk space