Mobility Victoria Krafft CS 614 10/25/05
General Idea • People and their machines move around • Machines want to share data • Networks and machines fail • Network connections not available everywhere
Solution: Modify network file systems so data can still be accessed when a system is not connected to the network.
Previous Work • Grapevine developed in 1982 • Network File System (NFS) developed in 1984 • Andrew File System (AFS) developed in 1985 • Version control systems
Coda • James Kistler and M. Satyanarayanan in 1992 • Lots of workstations with local disks, laptops, which use distributed file systems • Expand existing cache structures to store all the desired data • Want to continue work through outages
Environment • Lots of untrusted Unix clients • A few trusted servers • High-bandwidth LAN when connected
Basic Design • Shared Unix file system, with volumes mapped to individual file servers • Client cache manager (Venus) • Volume on all servers in Volume Storage Group (VSG) • Client notified when cache no longer valid
Design Decisions • Scalability • Shift work to clients • First Class vs Second Class Replication • Servers know more than clients • Optimistic vs Pessimistic replica control • availability, or consistency?
Hoarding • Gather all useful data in cache • User-specified critical data • Data currently in use • Cache equilibrium maintained by hoard walking • Update file priority & critical directories • Re-evaluate priority of cached files • Re-fetch outdated files
Emulation • Venus acts as a pseudo-server • Create replay log containing all updates • Store critical data in recoverable virtual memory in case of crash
Reintegration • Run replay algorithm on each volume • Replay Algorithm: • Parse log and lock contents • Log operations validated and executed • Perform data transfers • Commit and release locks
Bayou’s Anti-Entropy Algorithm • Karin Petersen, Mike J. Spreitzer, Douglas B. Terry, Marvin M. Theimer and Alan J. Demers in 1997 • Weakly consistent replication • Update anywhere model
Environment • Trusted servers • WAN as well as LAN
Algorithm Provides • Support for arbitrary communication topologies • Operation on low-bandwidth networks • Incremental progress • Eventual consistency • Efficient storage management • Propagation through off-line mechanisms • Arbitrary policy choices
How it works • Storage system on each replica (server) contains: • Ordered log of writes • Database which results from those writes • Pairs of servers reconcile by bringing each other up to date • Epidemic behavior ensures spread • Eventually, commit and truncate write log
Server Reconciliation • When server gets write from client, write is logged with accept-stamp • For server S to update server R: • S gets version vector from R • S sends all entries in its write log not in version vector of R. • Enforces property that each server which contains write n from server X has all writes < n from server X.
Write Log Management Or: How to avoid using infinite amounts of disk space • Designated primary replica creates commit sequence number (CSN) when it writes to its database • Each server manages its own log, but discards only stable (committed) writes
Revised update process • If S has committed writes R does not know, send to R • Continue with previous algorithm End result: Write A precedes write B if A has smaller CSN, or if both uncommitted, accepted by the same server, and A accepted before B.
Write Log Truncation • Server can remove committed writes from log file. • If a server has deleted writes needed to reconcile with another server, the database must be copied.
Protocol Extensions • Transportable media • Stable write order provides eventual consistency • Light-weight server creation/retirement
Server Creation/Deletion • Server creates itself by sending a creation write to an existing server • Server retires by: • Sending retirement write • Stops accepting new writes • Reconciles database with at least one other server
Final Questions • Peer-to-peer or server-based architecture? • Conflict reconciliation? • Vulnerability to failures?