
Research Issues in Cooperative Computing


Presentation Transcript


  1. Research Issues in Cooperative Computing Douglas Thain http://www.cse.nd.edu/~ccl

  2. Sharing is Hard! • Despite decades of research in distributed systems and operating systems, sharing computing resources is still technically and socially difficult! • Most existing systems for sharing require: • Kernel level software. • A privileged login. • Centralized trust. • Loss of control over resources that you own.

  3. Cooperative Computing Credo • Let’s create tools and systems that make it easy for users to cooperate (or be selfish) as they see fit. • Modus operandi: • Make tools that are foolproof enough for casual use by one or two people in the office. • If they really are foolproof, then they will also be suitable for deployment in large scale systems such as computational grids.

  4. [Scenario diagram: speech bubbles over CPUs, disks, and an auth server with secure I/O] • "I need ten more CPUs in order to finish my paper by Friday!" • "CSE grads can compute here, but only when I'm not." • "May I use your CPUs?" • "Is this person a CSE grad?" • "My friends in Italy need to access this data. I'm not root!" • "PBs of workstation storage! Can I use this as a cache?" • "If I can backup to you, you can backup to me."

  5. Storage is a Funny Resource • Rebuttal: “Storage is large and practically free!” • TB -> PB is *not* free to install or manage. • But, it comes almost accidentally with CPUs. • Aggressive replication (caching) can fill it quickly. • Storage has unusual properties: • Locality: Space needs to be near computation. • Non-locality: Redundant copies must be separated. • Transfer is very expensive compared to reservation. • i.e. Don’t waste an hour transferring unless it will succeed! • Managing storage is different than managing data. • All of this gets worse on the grid.
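
A back-of-the-envelope illustration of the transfer-versus-reservation point; the data size and link speed below are assumed for illustration, not taken from the talk.

```python
# Illustrative arithmetic only; link speed and data size are assumed values.
data_bytes = 100 * 10**9          # 100 GB to be staged
link_bits_per_sec = 100 * 10**6   # a 100 Mbit/s shared campus link

transfer_hours = (data_bytes * 8) / link_bits_per_sec / 3600
print(f"Transfer time: {transfer_hours:.1f} hours")   # ~2.2 hours

# A space reservation, by contrast, is a single metadata round trip to the
# storage server, so it pays to reserve first and transfer only if the
# space is actually granted.
```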

  6. On the Grid • Quick intro to grid computing: • The vision: Let us make large-scale computing resources as reliable and as accessible as the electric power grid or the public water utility. • The audience: Scientists with grand challenge problems that require unlimited amounts of computing power. More computation == Better results. • The reality today: Tie together computing clusters and archival storage around the country into systems that are (almost) usable by experts.

  7. On the Grid [Diagram: a work queue of jobs feeding, through gatekeepers, into a Condor batch system, a Maui scheduler, a PBS batch system, and an SMP machine, each fronting its own pool of CPUs.]

  8. Grid Computing Experience • Ian Foster et al. (102 authors), "The Grid2003 Production Grid: Principles and Practice," IEEE HPDC 2004. • "The Grid2003 Project has deployed a multi-virtual organization, application-driven grid laboratory that has sustained for several months the production-level services required by… ATLAS, CMS, SDSS, LIGO…"

  9. Grid Computing Experience The good news: • 27 sites with 2800 CPUs • 40985 CPU-days provided over 6 months • 10 applications with 1300 simultaneous jobs The bad news: • 40-70 percent utilization • 30 percent of jobs would fail • 90 percent of failures were site problems • Most site failures were due to disk space.

  10. Storage Matters • All of these environments (office, server room, grid computing) require storage to be an allocable, shareable, accountable resource. • We need new tools to accomplish this.

  11. What are the Common Problems? • Local Autonomy • Resource Heterogeneity • Complex Access Control • Multiple Users • Competition for Resources • Low Reliability • Complex Debugging

  12. Vision of Cooperative Storage • Make it easy to deploy systems that: • Allow sharing of storage space. • Respect existing human structures. • Provide reasonable space/perf promises. • Work easily and transparently without root. • Make the non-ideal properties manageable: • Limited allocation. (select, renew, migrate) • Unreliable networks. (useful fallback modes) • Changing configuration. (auto. discovery/config)

  13. [Diagram: a user asks "Where can I find 100 GB for 24 hours?" Storage servers send status updates to a storage catalog; an access control server answers "Is this a member of the CSE dept?"; the resource policy states "Members of the CSE dept can borrow 200 GB for one week." The user makes a reservation and accesses data on a basic filesystem, and the owner can evict the user.]
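
A sketch of the reservation flow in the diagram above, using hypothetical class and method names (the actual catalog and reservation protocols are not spelled out here): servers advertise free space and policy to a catalog, a client picks a server, the server checks group membership against its policy, and the resulting reservation carries an expiration after which the user can be evicted.

```python
# Hypothetical sketch of the catalog/reservation flow; all names are illustrative.
from dataclasses import dataclass
import time

@dataclass
class ServerAd:                 # a status update sent to the catalog
    host: str
    free_bytes: int
    policy_group: str           # e.g. "cse-dept"
    policy_limit: int           # e.g. 200 GB per member
    policy_seconds: int         # e.g. one week

class Catalog:
    def __init__(self):
        self.ads = []
    def update(self, ad: ServerAd):
        self.ads.append(ad)
    def find(self, bytes_needed: int):
        return [a for a in self.ads if a.free_bytes >= bytes_needed]

def is_member(user: str, group: str) -> bool:
    # Stand-in for the external access control server ("Is this a CSE grad?").
    return group == "cse-dept" and user.endswith("@nd.edu")

def reserve(ad: ServerAd, user: str, bytes_needed: int, seconds: int):
    if not is_member(user, ad.policy_group):
        raise PermissionError("not a member of " + ad.policy_group)
    if bytes_needed > ad.policy_limit or seconds > ad.policy_seconds:
        raise ValueError("request exceeds the server's policy")
    return {"host": ad.host, "bytes": bytes_needed,
            "expires": time.time() + seconds}   # eviction deadline

catalog = Catalog()
catalog.update(ServerAd("disk01.cse.nd.edu", 500 * 10**9,
                        "cse-dept", 200 * 10**9, 7 * 86400))
ad = catalog.find(100 * 10**9)[0]
lease = reserve(ad, "dthain@nd.edu", 100 * 10**9, 24 * 3600)
print(lease["host"], "until", lease["expires"])
```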

  14. The Current Situation [Diagram: chirp servers, each a simple ACL layer over a basic filesystem, send status updates to a catalog server; clients reach them through the chirp tool, Parrot, and libchirp using GET/PUT and open/close/read/write; authentication by hostname, Kerberos, or GSI; ordinary tools such as cp, emacs, and vi work unmodified.]

  15. Demo Time!

  16. Research Issues [Overview diagram, three layers: Operating Systems Design (visiting principals, allocation in the FS, on a storage device and operating system); Single Resource Management (space allocation, distributed access control, on one storage server); Collective Resource Management (coordinated CPU-I/O, distributed debugging, across many storage servers).]

  17. Space Allocation • Simple implementation: • Like quotas, keep a flat lookaside database. • Update db on each write, or just periodically. • To recover, re-scan entire filesystem. • Not scalable to large FS or many allocations. • Better implementation: • Keep alloc info hierarchically in the FS. • To recover, re-scan only the dirty subtrees. • A combination of a FS and hierarchical DB. • User representation?
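
One way to read the "better implementation" bullets, sketched below with hypothetical names (this is not the actual design): each directory carries its own allocation record, writes charge every allocation on the path and mark those records dirty while updating, and crash recovery rescans only the dirty subtrees.

```python
# Sketch of hierarchical allocation records kept inside the FS tree;
# the structure and names are assumptions, not the real implementation.
class DirNode:
    def __init__(self, name, limit=None):
        self.name = name
        self.limit = limit        # allocation granted to this subtree (bytes)
        self.used = 0             # bytes charged to this subtree
        self.dirty = False        # set while an on-disk update is in flight
        self.children = {}

    def subdir(self, name, limit=None):
        return self.children.setdefault(name, DirNode(name, limit))

def charge(path_nodes, nbytes):
    """Charge a write against every allocation on the path, root to leaf."""
    for node in path_nodes:
        if node.limit is not None and node.used + nbytes > node.limit:
            raise OSError("allocation exceeded at " + node.name)
    for node in path_nodes:
        node.dirty = True         # mark before updating the on-disk record
        node.used += nbytes
        node.dirty = False        # clear once the record is persisted

def recover(node):
    """After a crash, rescan only subtrees whose records were left dirty."""
    if node.dirty:
        # (a real FS would also rescan files held directly in this directory)
        node.used = sum(recover(c) for c in node.children.values())
        node.dirty = False
    return node.used

root = DirNode("/", limit=10**12)
home = root.subdir("home", limit=200 * 10**9)
proj = home.subdir("dthain")
charge([root, home, proj], 5 * 10**9)
print(root.used, home.used, proj.used)
```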

  18. Distributed Access Control • Things I can’t do today: • Give access rights to any CSE grad student on my local (non-AFS) filesystems. • (Where Dr. Madey makes the list each semester.) • Allow members of my conference committee to share my storage space in AFS. • (Where I maintain the membership list.) • Give read access to a valuable data repository to all faculty at Notre Dame and all members of a DHS Biometrics analysis program. • (Where each list is kept elsewhere in the country.)

  19. Distributed Access Control • What will this require? • Separation of ACL services from filesystems. • Simple administrative tools. • Semantics for dealing with failure. • Issues of security and privacy of access lists. • Isn’t this a solved problem? • Not for multiple large-scale organizations. • Not for varying degrees of trust and timeliness. • (ACLs were still a research issue in SOSP 2003.) • The end result: • A highly-specialized distributed database. (DNS)
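
A sketch of what separating ACL services from the filesystem might look like; the entry format, group-service URL, and cache policy below are assumptions, not the Chirp design. An ACL entry can name a remotely maintained group, and the check falls back to a cached answer when that service is unreachable, which is one way to pin down the failure and timeliness semantics the slide raises.

```python
# Illustrative only: the group-service URL, cache policy, and ACL entry
# format are assumptions.
import time

class RemoteGroup:
    """A group whose membership list is maintained by another organization."""
    def __init__(self, url, ttl=3600):
        self.url = url            # e.g. "https://www.cse.nd.edu/groups/grads"
        self.ttl = ttl
        self.cache = {}           # user -> (answer, fetched_at)

    def lookup(self, user):
        # Stand-in for a network query to the remote membership list.
        raise ConnectionError("group service unreachable")

    def contains(self, user):
        try:
            answer = self.lookup(user)
            self.cache[user] = (answer, time.time())
            return answer
        except ConnectionError:
            # Failure semantics: fall back to a cached answer if fresh enough.
            answer, fetched = self.cache.get(user, (False, 0))
            return answer if time.time() - fetched < self.ttl else False

def check_acl(acl, user, right):
    for principal, rights in acl:
        if right in rights and (
            principal == user or
            (isinstance(principal, RemoteGroup) and principal.contains(user))
        ):
            return True
    return False

grads = RemoteGroup("https://www.cse.nd.edu/groups/grads")
grads.cache["astudent@nd.edu"] = (True, time.time())   # previously verified
acl = [("dthain@nd.edu", "rwla"), (grads, "rl")]
print(check_acl(acl, "astudent@nd.edu", "r"))   # True, served from cache
```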

  20. Nested Principals • How do we represent visiting users? • Let visitors use my uid. • Let visitors use “nobody” (root) • Create a new temporary uid. (root) • Sandbox user and audit every action. (complex) • Simple Idea: Let users create sub-principals. • root -> root:dthain • root:dthain -> root:dthain:afriend • The devil is in the details: • Semantic issues: superiority, equivalence… • Implementation issues: AAA, filesystem, persistence • Philosophical issues: capabilities vs ACLs
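
A minimal sketch of the sub-principal idea under assumed semantics (the slide deliberately leaves superiority and equivalence open): a principal is a colon-separated chain, a principal may spawn children beneath its own name, and a parent is superior to everything in its subtree.

```python
# Assumed semantics for nested principals; the slide leaves these open.
def spawn(parent: str, child: str) -> str:
    """A principal may only create sub-principals directly beneath itself."""
    return f"{parent}:{child}"

def is_superior(a: str, b: str) -> bool:
    """a is superior to b if b lies strictly within a's subtree."""
    return b.startswith(a + ":")

def equivalent(a: str, b: str) -> bool:
    return a == b

owner   = spawn("root", "dthain")            # root:dthain
visitor = spawn(owner, "afriend")            # root:dthain:afriend

print(is_superior("root", visitor))          # True:  root controls everyone
print(is_superior(owner, visitor))           # True:  dthain controls the guest
print(is_superior(visitor, owner))           # False: guests cannot escalate
```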

  21. Coordinated CPU and I/O • We usually think of a cluster as: • N CPUs + disks to install the OS on. • Use local disks as cache for the primary server. • Not smart for data-bound applications. • (As CPUs get faster, everyone is data bound!) • Alternate conception: • Cluster = storage device with inline CPUs. • Molasses System: • Minimize movement of jobs and/or the data they consume. • Large-scale PIM! • Perfect for data exploration. [Diagram: jobs on CPUs alongside data on storage servers.]

  22. Coordinated CPU and I/O [Animated build of the previous slide's diagram; same bullets as slide 21.]
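
A sketch of the "Molasses" placement idea from slide 21, with an assumed cost model and made-up node names: prefer running a job on a node that already holds its data, and stage a copy elsewhere only when every replica holder is saturated.

```python
# Illustrative scheduler sketch; the cost model and names are assumptions.
replicas = {                       # dataset -> nodes already holding a copy
    "sdss-run7": {"node1", "node3"},
    "ligo-seg9": {"node2"},
}
running = {"node1": 4, "node2": 1, "node3": 0}   # jobs currently per node
MAX_SLOTS = 4

def place(job_dataset):
    """Prefer a node that already holds the data; otherwise move the data."""
    holders = replicas.get(job_dataset, set())
    idle_holders = [n for n in holders if running[n] < MAX_SLOTS]
    if idle_holders:
        node = min(idle_holders, key=lambda n: running[n])
        move_data = False
    else:
        node = min(running, key=lambda n: running[n])
        replicas.setdefault(job_dataset, set()).add(node)   # stage a copy
        move_data = True
    running[node] += 1
    return node, move_data

print(place("sdss-run7"))   # ('node3', False): run where the data already is
print(place("ligo-seg9"))   # ('node2', False)
```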

  23. Distributed Debugging [Diagram: a debugger collecting log files scattered across a workload manager, a Kerberos auth gateway, a batch system of CPUs running jobs, a license manager, an archival host, and several storage servers.]

  24. Distributed Debugging • Big challenges! • Language issues: storing and combining logs. • Ordering: How to reassemble events? • Completeness: Gaps, losses, detail. • Systems: Distributed data collection. • But, could be a big win: • “A crashes whenever X gets its creds from Y.” • “Please try again: I have turned up the detail on host B.”
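
A sketch of the ordering problem with an assumed per-host log format (real logs differ in format and clock quality, which is exactly the "language" issue above): merge the per-host streams by local timestamp, and flag adjacent events from different hosts that fall within the clock-skew bound, where the reconstructed order cannot be trusted.

```python
# Assumed log format (host -> [(seconds, message)]); illustrative data only.
import heapq

logs = {
    "gateway":  [(10.0, "issued credential for job 42"),
                 (25.0, "credential for job 42 expired")],
    "batch":    [(12.5, "job 42 dispatched to cpu07")],
    "storage1": [(24.0, "job 42 open /data/run7 DENIED")],
}
CLOCK_SKEW = 2.0    # assumed bound on clock disagreement between hosts

def merged(logs):
    streams = [[(t, host, msg) for t, msg in entries]
               for host, entries in logs.items()]
    return list(heapq.merge(*streams))       # global order by local timestamp

prev = None
for t, host, msg in merged(logs):
    suspect = prev is not None and t - prev < CLOCK_SKEW and host != prev_host
    flag = "  <-- ordering uncertain (within skew bound)" if suspect else ""
    print(f"{t:6.1f} {host:9s} {msg}{flag}")
    prev, prev_host = t, host
```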

  25. Research Issues [Overview diagram repeated from slide 16: Operating Systems Design, Single Resource Management, and Collective Resource Management.]

  26. Motto: You must build it and use it in order to understand it!

  27. For more information… • Software, systems, papers, etc… • The Cooperative Computing Lab • http://www.cse.nd.edu/~ccl • Or stop by to chat… • Douglas Thain • 356-D Fitzpatrick • dthain@cse.nd.edu
