Theory of System Administration
DANSS Seminar, Feb 23rd, 2003
Elliot Jaffe
Outline
• What is System Administration
• Problems in System Administration
• Theory overview
• Results
• Research directions
Danss - Theory of SysAdmin
What is System Administration?
What do you think?
What is System Administration
In computer technology, a set of functions that provides support services, ensures reliable operations, promotes efficient use of the system, and ensures that prescribed service-quality objectives are met. Synonym: system management.
(US Federal Standard 1037C)
System Administration is
The function that provides:
• Reliability – stable, consistent service
• Efficiency – performance
• Predictability – Service Level Agreement
CS HUJI System Administration
• Infrastructure
• Operating Systems
• Networking
• Account Administration
• Software Licensing, Installation and Support
• Education
What you don’t see
• Budgets
• Cost Benefit Analysis
• Vendor Selection
• Service Contracts
• Long term planning
• Policy creation
Problems in Sys Admin
• Strategic
• Tactical
Strategic Problems
• Economic cost/benefit analysis
  • How much disk space should be purchased in the next year?
  • Should we buy one new router, or do we need a fail-over pair?
  • If we get 25% additional students, what resources will we need?
Strategic Problems #2
• What is the right level of disk space quotas?
• Should we use a VLAN to localize network traffic?
Tactical Problems
• What is the best way to maintain multiple systems?
• How do we apply patches?
• How should we roll out an OS change?
• How do we support multiple configurations?
• How many configurations should we support?
• How do we use version control as part of system administration?
A complete theory should enable
• Policy determination and evaluation
• Strategic decisions about resource usage and allocation
• Interactions between users and system for resources
• Productivity considerations (economics of the system)
• Empirical verification of strategies and policies
• Efficiency of policy and its implementation
• Efficiency of the system in doing its job
Theory of System Administration
A group of computers is an evolving, stochastic system viewable at multiple levels of detail.
Configuration Space
• The memory state of the computer
• The set of bits that define the computer state
• Example:
  • The state of the bits in primary memory and on secondary media (disks)
Time
• Time is a discrete value.
• For averaging purposes, we allow it to take on real values.
• Example:
  • The system clock is discrete: its values are multiples of the clock period Tc.
  • t = 0, Tc, 2Tc, …, nTc
Configuration
• A pattern of values associated with each point on the configuration space.
• Example:
  • The state of all bits in main memory at time t.
  • This pattern changes over time.
Averaging
• Over time scales much larger than Tc, the average properties of the system can be treated as a continuum approximation, i.e. as real functions of time.
• Example:
  • The number of non-zero bits at any real value of time.
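A minimal Python sketch of this averaging step. As a hypothetical stand-in for "the number of non-zero bits", it uses a 64-bit register whose contents are redrawn at random on every clock tick; `block_average` and `popcount_trace` are illustrative names, not part of any tool. The tick-level count jumps around, but averages over windows of 1,000 ticks stay close to 32, half the register width, and can be treated as a smooth function of time.

```python
import random
from typing import List, Sequence

def block_average(samples: Sequence[float], window: int) -> List[float]:
    """Mean of each complete, non-overlapping window of `window` ticks.

    A trailing partial window is dropped. For the continuum approximation
    to make sense, `window` should be much larger than one tick (Tc).
    """
    if window <= 0:
        raise ValueError("window must be a positive number of ticks")
    n_windows = len(samples) // window
    return [sum(samples[i * window:(i + 1) * window]) / window for i in range(n_windows)]

def popcount_trace(n_ticks: int, width: int = 64, seed: int = 0) -> List[int]:
    """Hypothetical signal: number of set bits in a `width`-bit register
    whose contents are redrawn at random on every tick."""
    rng = random.Random(seed)
    return [bin(rng.getrandbits(width)).count("1") for _ in range(n_ticks)]

if __name__ == "__main__":
    trace = popcount_trace(10_000)
    print(trace[:10])                   # tick-level values jump around
    print(block_average(trace, 1_000))  # window averages stay close to 32
```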
Scales
• Transition from low-level to high-level
• Group objects together to form new objects
• Refer to state of object over time
Closed Dynamical Systems
• A closed dynamical system consists of a configuration space, an initial configuration and a rule for subsequent time development
• Closed dynamical systems are deterministic
• Example:
  • A standalone computer without any external input is a closed dynamical system
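A closed system can be sketched as an initial configuration plus an update rule applied once per tick. The rule below, `xor_with_left_neighbour`, is a hypothetical stand-in for a real machine's transition function; the point is only that, with no external input, running the same rule from the same start always produces the same trajectory.

```python
from typing import Callable, List, Tuple

Config = Tuple[int, ...]  # one entry per bit of the configuration space

def evolve(initial: Config, rule: Callable[[Config], Config], steps: int) -> List[Config]:
    """Trajectory [c0, c1, ..., c_steps] of a closed system: there is no
    external input, so each configuration depends only on the previous one."""
    trajectory = [initial]
    for _ in range(steps):
        trajectory.append(rule(trajectory[-1]))
    return trajectory

def xor_with_left_neighbour(c: Config) -> Config:
    """Hypothetical update rule: each bit becomes itself XOR its left
    neighbour, wrapping around at the ends (c[-1] is the last bit)."""
    return tuple(c[i] ^ c[i - 1] for i in range(len(c)))

if __name__ == "__main__":
    start = (1, 0, 0, 1, 0, 1, 1, 0)
    run1 = evolve(start, xor_with_left_neighbour, 20)
    run2 = evolve(start, xor_with_left_neighbour, 20)
    print(run1 == run2)  # same start and same rule give the same trajectory
```

For example, `evolve((1, 0, 0, 0), xor_with_left_neighbour, 2)` returns `[(1, 0, 0, 0), (1, 1, 0, 0), (1, 0, 1, 0)]`.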
Interactions
• An interaction between two systems is an endomorphism on the combined systems such that both systems determine the time developments of one another.
• Example:
  • Two standalone computers connected via a network and synchronizing system times.
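The clock-synchronization example can be sketched as two coupled update rules. In this hypothetical model each clock advances by `1 + drift` per tick and then moves part of the way toward the other's reading, controlled by `coupling`; the drift and coupling values are assumptions for illustration, not measurements. With no coupling the two systems are independent and drift apart linearly; with coupling, each clock's future depends on the other's state and the gap settles at a small constant.

```python
from typing import Tuple

def simulate_sync(drift_a: float, drift_b: float, steps: int,
                  coupling: float = 0.5) -> Tuple[float, float]:
    """Two clocks tick once per step, each gaining (1 + drift) time units.

    After every tick they exchange readings and each moves coupling / 2 of
    the gap toward the other, so the gap shrinks by a factor (1 - coupling).
    coupling = 0 means no interaction: two independent closed systems.
    """
    a = b = 0.0
    for _ in range(steps):
        a += 1.0 + drift_a
        b += 1.0 + drift_b
        gap = b - a
        # Interaction: each clock's next value depends on the other's state.
        a += coupling * gap / 2
        b -= coupling * gap / 2
    return a, b

if __name__ == "__main__":
    for c in (0.0, 0.5):
        a, b = simulate_sync(drift_a=0.01, drift_b=0.0, steps=100, coupling=c)
        print(f"coupling={c}: gap after 100 ticks = {a - b:.4f}")
```

With `drift_a=0.01`, `drift_b=0.0` and 100 ticks, the gap is about 1.0 when `coupling=0.0` and settles at about 0.01 when `coupling=0.5`.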
Environment
• An ensemble of mutually interacting systems.
• Example:
  • A user interacting with a computer.
  • People are not standalone!
Open Dynamical System
• Projection of an ensemble of interacting systems onto the state of a given system.
• The configuration state of an open system is unpredictable over any interval dt ~ Tc.
• Does this mean that all is lost?
Stability
• Assume that there exists some time scale on which it is possible to predict the average state of the systems in question.
• We are not interested in managing systems which cannot achieve a minimal level of stability, since these systems cannot perform any reliable function.
Multiple Time Scales
• Short term:
  • Tc, the computer clock
• Medium term:
  • human time > 10^7 Tc
• Long term:
  • months and years > 10^7 human time
Components of System State
• The state of a system at any given time is composed of a slowly varying local average and a rapidly fluctuating stochastic remainder.
• Are these systems stable?
[Figure: State vs. Time]
Tasks
• A task is a representation of an autonomous process executed on related sets of state.
• A task is closed if after execution, it returns the system to the original state.
• A task is open if after execution, it has changed the overall system state.
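A minimal sketch of the closed/open distinction, assuming system state is modeled as a dictionary of resource counters. Both example tasks are hypothetical. Note that `classify_task` checks one run from one starting state, which is weaker than the definition above.

```python
import copy
from typing import Callable, Dict

State = Dict[str, int]  # hypothetical model: named resource counters

def classify_task(task: Callable[[State], None], state: State) -> str:
    """Run `task` on a copy of `state`; report 'closed' if the state is
    unchanged afterwards and 'open' otherwise. Only this one starting
    state is checked, not every possible state."""
    after = copy.deepcopy(state)
    task(after)
    return "closed" if after == state else "open"

def build_in_scratch(state: State) -> None:
    """Hypothetical task: uses 500 MB of scratch space, then frees it."""
    state["scratch_mb"] += 500
    # ... the actual work would happen here ...
    state["scratch_mb"] -= 500

def install_package(state: State) -> None:
    """Hypothetical task: leaves 120 MB of new files behind."""
    state["disk_used_mb"] += 120

if __name__ == "__main__":
    s = {"disk_used_mb": 1000, "scratch_mb": 0}
    print(classify_task(build_in_scratch, s))  # closed
    print(classify_task(install_package, s))   # open
```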
Maintenance Tasks
• A maintenance task is a task which reduces the total rate of change of the average configuration state.
• Example:
  • Deletion of accumulated garbage
Policy
• A policy is an average specification of equivalent system behaviors.
• A set of system states that are equivalent over the given time period.
• A policy is neither good nor bad. It does not necessarily lead to stability or chaos.
Policy - Examples
• Users are restricted to a known quota of file system space.
• All computers must run Microsoft Office.
• Only port 80 will be open on network servers.
• SSH will be used for all remote computer access.
Convergence
• A convergent average policy is one whose tasks result in an equivalent configuration for all sufficiently large time scales.
• A convergent average policy is one whose average behavior in time ends in a fixed average state between two sufficiently different time values.
Convergence - Example
• Deleting temporary files on a regular basis is a convergent policy since it returns the system to a known state (i.e. a given amount of free file system space).
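The temporary-file example can be simulated under simple, hypothetical assumptions: users create a fixed 10 GB of temporary data per day, and a cleanup task run once a day deletes anything seven or more days old. The usage trace climbs for a week and then holds at 70 GB, and applying `cleanup` twice gives the same result as applying it once, which is the fixed-point behavior a convergent policy relies on.

```python
from typing import List, Tuple

TempFile = Tuple[int, int]  # (day created, size in GB)

def cleanup(files: List[TempFile], today: int, max_age_days: int) -> List[TempFile]:
    """Hypothetical policy task: delete temporary files that are
    max_age_days old or older and keep the rest."""
    return [(day, size) for day, size in files if today - day < max_age_days]

def simulate(days: int, daily_gb: int = 10, max_age_days: int = 7) -> List[int]:
    """Temp-space usage (GB) at the end of each day, assuming users create
    a fixed daily_gb of temp data and cleanup runs once per day."""
    files: List[TempFile] = []
    usage = []
    for today in range(days):
        files.append((today, daily_gb))              # users create temp data
        files = cleanup(files, today, max_age_days)  # policy task runs
        usage.append(sum(size for _, size in files))
    return usage

if __name__ == "__main__":
    print(simulate(14))  # climbs for a week, then holds at 70 GB
```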
Persistent State
• A persistent state is a configuration for which the probability of returning to an equivalent configuration at a later time is 1.
• Persistence is reflected in the property that the rate of change of the average state is much slower than the rate of change of fast moving variations.
Persistent States
• The fast variations extend over several complete cycles before any appreciable change in the average is seen.
[Figure: State vs. Time]
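One way to check this numerically, under assumptions of our own rather than a method from the slides: split a measured series into windows, treat each window's mean as the local average and the deviations from it as the fast remainder, then compare how far the average moves between windows with the typical size of the remainder. `persistence_ratio` and `synthetic_series` are hypothetical helpers, and the synthetic data (level 50, noise with standard deviation 5) is invented for illustration. A ratio well below 1 suggests a persistent state; in the example the stable series scores far below 1 and the drifting series above 1.

```python
import math
import random
from typing import List, Sequence

def persistence_ratio(series: Sequence[float], window: int) -> float:
    """Compare the slow part of a signal with its fast part.

    Local average: the mean of each non-overlapping window.
    Fast remainder: each sample minus its window's mean.
    Returns (mean |change of local average| between consecutive windows)
    divided by (RMS of the remainder).
    """
    n = len(series) // window
    if n < 2:
        raise ValueError("need at least two complete windows")
    means = [sum(series[i * window:(i + 1) * window]) / window for i in range(n)]
    residuals = [series[i * window + j] - means[i] for i in range(n) for j in range(window)]
    rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    drift = sum(abs(means[i + 1] - means[i]) for i in range(n - 1)) / (n - 1)
    return drift / rms

def synthetic_series(n: int, slope: float, seed: int = 0) -> List[float]:
    """Hypothetical measurements: level 50 + slope * t + Gaussian noise (sd 5)."""
    rng = random.Random(seed)
    return [50 + slope * t + rng.gauss(0, 5) for t in range(n)]

if __name__ == "__main__":
    print(persistence_ratio(synthetic_series(10_000, 0.0), 500))   # stable
    print(persistence_ratio(synthetic_series(10_000, 0.05), 500))  # drifting
```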
Theorem
• In an open system, a policy specifies a class of equivalent persistent states if and only if the policy exhibits average convergence.
• You can maintain the state of the system if and only if your policy consistently returns the system to a similar state, i.e. the average resource usage is constant over the policy's time scale.
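A rough numerical illustration of the theorem's practical reading (a simulation, not a proof). It assumes hypothetical random writes of 0 to 20 GB per day and a hypothetical maintenance task, `trim_to_target`, that deletes expendable data until usage is back at 100 GB. With the task in place, the average over the first half of the run and the average over the second half are nearly equal; without it, the average keeps rising, so no persistent average state exists.

```python
import random
from typing import Callable, List, Tuple

def simulate_usage(days: int, policy: Callable[[float], float], seed: int = 1) -> List[float]:
    """Disk usage (GB) at the end of each day. Hypothetical load: users
    write a random 0-20 GB per day; `policy` is the daily maintenance task."""
    rng = random.Random(seed)
    usage, trace = 0.0, []
    for _ in range(days):
        usage += rng.uniform(0, 20)
        usage = policy(usage)
        trace.append(usage)
    return trace

def no_policy(usage: float) -> float:
    return usage

def trim_to_target(usage: float, target: float = 100.0) -> float:
    """Hypothetical convergent task: delete expendable data until usage is
    back at `target`; does nothing if usage is already below it."""
    return min(usage, target)

def half_averages(trace: List[float]) -> Tuple[float, float]:
    """Average over the first and second halves of the run (a long time scale)."""
    h = len(trace) // 2
    return sum(trace[:h]) / h, sum(trace[h:]) / (len(trace) - h)

if __name__ == "__main__":
    print("with policy:   ", half_averages(simulate_usage(1000, trim_to_target)))
    print("without policy:", half_averages(simulate_usage(1000, no_policy)))
```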
Implications
• System Administration is the development, specification and implementation of environments and maintenance tasks with the goal of creating a persistent average state.
Strategy
• Type I
  • Stochastic models
• Type II
  • Semantic models
Type I - Stochastic models
• Analyze what is happening on multiple time scales
• Describe locally averaged states
• Model known boundary conditions
• Empirical measurements of existing systems
• Predictive modeling of systems based on measurements
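Predictive modeling from measurements can start as simply as fitting a trend and extrapolating it. This sketch fits an ordinary least-squares line to monthly disk-usage readings and estimates when capacity would be reached. The function names and the readings in the example are made up for illustration, and a straight-line trend is itself an assumption that changing usage patterns can easily break.

```python
from typing import Optional, Sequence, Tuple

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def month_when_full(months: Sequence[float], used_tb: Sequence[float],
                    capacity_tb: float) -> Optional[float]:
    """Month (on the same axis as `months`) at which the fitted trend
    reaches `capacity_tb`, or None if usage is flat or shrinking."""
    intercept, slope = fit_line(months, used_tb)
    if slope <= 0:
        return None
    return (capacity_tb - intercept) / slope

if __name__ == "__main__":
    # Made-up readings for illustration: 10 TB used, growing 1 TB per month.
    months = [0, 1, 2, 3, 4, 5]
    used = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
    print(month_when_full(months, used, capacity_tb=20.0))  # 10.0
```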
Problems with Stochastic Models
• Statistical measurements are rare
• No experimental repeatability
• Conditions of measurements are constantly changing
• Absolute definitions are impossible
• People cannot be described by a small number of characteristics
Stochastic Modeling - Uses
• Strategic planning
  • Do we need to buy more file servers?
• Problem identification
  • Why is user X using 300% of the normal disk quota?
  • Why is computer Y rebooting twice a week when all other systems are stable for months?
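For the problem-identification questions, a simple screen is to compare each user (or machine) with the typical value for the group. This sketch flags anything at or above three times the median, echoing the 300% figure; the use of the median, the threshold and the sample data are our assumptions, not a prescribed method.

```python
from statistics import median
from typing import Dict, List

def flag_outliers(values: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Names whose value is at least `factor` times the median of all values.
    The median, unlike the mean, is not pulled upward by the very outliers
    we are looking for."""
    typical = median(values.values())
    return sorted(name for name, v in values.items() if v >= factor * typical)

if __name__ == "__main__":
    # Hypothetical disk usage in GB per user.
    usage = {"alice": 1.0, "bob": 1.2, "carol": 0.9, "dave": 3.6, "erin": 1.1}
    print(flag_outliers(usage))  # ['dave']
```

The same function applies to per-machine reboot counts. If the median is zero, every entry passes the test, so in practice a minimum absolute threshold would also be needed.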
Strategic models
• Analyze what might be changed in a system.
• Formulate as a game of strategy
• Achieve larger goals than just maintaining a persistent state.
Strategic Goals
• Sys Admin: Keep the system alive and running so that users can perform a maximum amount of work
• Benign User: produce useful work using the system (consumes resources)
• Malicious User: Maximize control of system resources
Strategic tools
• Game Theory
  • Contests between System Administrator and malicious users.
• System Downtime: Mean Time To Repair / Mean Time Between Failures
  • Minimize MTTR or maximize MTBF?
• Levels of monitoring: At what point does the cost of monitoring overwhelm the benefit?
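A worked example for the MTTR/MTBF question, using the standard steady-state availability of a repairable system, A = MTBF / (MTBF + MTTR). The hour figures are hypothetical. Because A depends only on the ratio MTTR/MTBF, halving MTTR and doubling MTBF buy exactly the same availability, so the choice between them comes down to which one is cheaper to achieve.

```python
HOURS_PER_YEAR = 24 * 365

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state fraction of time a repairable system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR

if __name__ == "__main__":
    # Hypothetical figures: a failure every 1000 h on average, 10 h to repair.
    scenarios = [("baseline", 1000, 10), ("halve MTTR", 1000, 5), ("double MTBF", 2000, 10)]
    for label, mtbf, mttr in scenarios:
        print(f"{label:12} availability={availability(mtbf, mttr):.5f} "
              f"downtime={downtime_hours_per_year(mtbf, mttr):.1f} h/yr")
```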
Current research
• Recovering File space
• System upgrades
• Quota systems
Recovering File Space
• How do you clean unused files?
• Competition between users and admins
• Trade-off between
  • having enough space to operate
  • users recreating temp files that were deleted
  • users “grabbing” space for later use
Patch Application
• How do you apply changes to a distributed system?
• Divergence
• Convergence
• Congruence
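A toy model of the three approaches, loosely following the terms as used by Traugott and Brown (LISA 2002): hosts are dictionaries of settings, divergence is unmanaged hand edits, convergence enforces only the managed settings, and congruence rebuilds every host from a known empty state by replaying one ordered change log. The setting names and versions are hypothetical. Divergent and convergent updates share the same mechanics here; what differs is whether the edits come from a shared specification. Replaying the log in a different order produces a different host, which is the sense in which order matters.

```python
from typing import Dict, List, Tuple

Host = Dict[str, str]     # setting name -> value (hypothetical model)
Change = Tuple[str, str]  # one ordered change: (setting, new value)

def divergent(host: Host, hand_edits: Host) -> Host:
    """Unmanaged, ad hoc edits made directly on one host."""
    updated = dict(host)
    updated.update(hand_edits)
    return updated

def convergent(host: Host, desired: Host) -> Host:
    """Enforce only the managed settings in `desired`; anything else already
    on the host is left alone, so hosts can still differ."""
    updated = dict(host)
    updated.update(desired)
    return updated

def congruent(change_log: List[Change]) -> Host:
    """Rebuild from an empty, known state by replaying one ordered log."""
    host: Host = {}
    for setting, value in change_log:
        host[setting] = value
    return host

if __name__ == "__main__":
    h1 = {"openssl": "1.0", "debug": "on"}
    h2 = {"openssl": "0.9"}
    desired = {"openssl": "1.1"}
    print(convergent(h1, desired), convergent(h2, desired))  # still differ in "debug"
    log = [("openssl", "1.0"), ("openssl", "1.1"), ("debug", "off")]
    print(congruent(log) == congruent(list(log)))            # same log, same host
    print(congruent(log) == congruent(list(reversed(log))))  # order matters
```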
Quota application
• What is the correct way to set file system quotas?
• By category
  • Dynamically assign users to groups
  • Set group to lowest maximal value
Bibliography
• Burgess, M. (2003). On the Theory of System Administration. Journal of the ACM.
• Traugott, S. and Brown, L. (2002). Why Order Matters: Turing Equivalence in Automated Systems Administration. LISA 2002.
• Gilfix, M. (2002). Holistic Quota Management: The Natural Path to a Better, More Efficient Quota System. LISA 2002.