Efficiently manage tasks with fault tolerance. Track progress, handle failures, ensure machine availability, and safeguard data with periodic checkpoints for uninterrupted workflow.
Data Structure • The master keeps track of each task and its state • Idle • In progress • Completed • The master also keeps track of the identity of the worker machine running each task
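The task-tracking structure described above can be sketched as a small table keyed by task ID. This is only an illustrative sketch: the names `TaskState`, `TaskInfo`, `TaskTable`, and the task and worker IDs are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class TaskState(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in progress"
    COMPLETED = "completed"


@dataclass
class TaskInfo:
    state: TaskState = TaskState.IDLE
    worker_id: Optional[str] = None  # identity of the machine holding the task


class TaskTable:
    """Hypothetical per-task bookkeeping: state plus machine identity."""

    def __init__(self, task_ids: List[str]) -> None:
        # Every task starts out idle and unassigned.
        self.tasks: Dict[str, TaskInfo] = {tid: TaskInfo() for tid in task_ids}

    def assign(self, task_id: str, worker_id: str) -> None:
        info = self.tasks[task_id]
        if info.state is not TaskState.IDLE:
            raise ValueError(f"{task_id} is not idle")
        info.state = TaskState.IN_PROGRESS
        info.worker_id = worker_id

    def complete(self, task_id: str) -> None:
        info = self.tasks[task_id]
        if info.state is not TaskState.IN_PROGRESS:
            raise ValueError(f"{task_id} is not in progress")
        info.state = TaskState.COMPLETED

    def idle_tasks(self) -> List[str]:
        # Tasks that are still waiting to be scheduled.
        return [t for t, i in self.tasks.items() if i.state is TaskState.IDLE]
```

A scheduler built on this sketch would repeatedly take an entry from `idle_tasks()`, call `assign`, and call `complete` when the worker reports back.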
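Fault Tolerance • The master pings each worker periodically to make sure it is still alive • If a worker fails to respond, the master marks it as failed and reschedules its tasks for re-execution on another worker • The master writes periodic checkpoints of its state so that, if the master itself fails, a new master can resume from the last checkpoint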
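The three mechanisms above (periodic pings, re-execution after a worker failure, and master checkpoints) might look roughly like the following. This is a simplified sketch under stated assumptions: the `Master` class, its method names, the 10-second timeout, and the JSON checkpoint format are all hypothetical, and only in-progress tasks are reset here.

```python
import json
from typing import Dict, List

PING_TIMEOUT_S = 10.0  # assumed value; the slides do not give a timeout


class Master:
    """Hypothetical master: tracks pings, reschedules work, checkpoints."""

    def __init__(self, task_ids: List[str], timeout_s: float = PING_TIMEOUT_S):
        self.tasks: Dict[str, dict] = {
            tid: {"state": "idle", "worker": None} for tid in task_ids
        }
        self.last_seen: Dict[str, float] = {}  # worker -> last ping reply time
        self.timeout_s = timeout_s

    def record_pong(self, worker_id: str, now: float) -> None:
        # Called whenever a worker answers a ping.
        self.last_seen[worker_id] = now

    def assign(self, task_id: str, worker_id: str) -> None:
        self.tasks[task_id] = {"state": "in_progress", "worker": worker_id}

    def reap_failed_workers(self, now: float) -> List[str]:
        # A worker that has not answered within the timeout is treated as
        # failed; its in-progress tasks go back to idle for re-execution.
        failed = [w for w, t in self.last_seen.items()
                  if now - t > self.timeout_s]
        for worker in failed:
            del self.last_seen[worker]
            for task in self.tasks.values():
                if task["worker"] == worker and task["state"] == "in_progress":
                    task["state"] = "idle"
                    task["worker"] = None
        return failed

    def checkpoint(self) -> str:
        # Serialize the task table; a real master would write this to disk.
        return json.dumps(self.tasks, sort_keys=True)

    @classmethod
    def restore(cls, snapshot: str,
                timeout_s: float = PING_TIMEOUT_S) -> "Master":
        # A replacement master resumes from the last checkpoint.
        master = cls([], timeout_s)
        master.tasks = json.loads(snapshot)
        return master
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to test.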
Questions • ??????