

Theory of System Administration. DANSS Seminar, Feb 23rd, 2003, Elliot Jaffe. Outline: what is system administration, problems in system administration, theory overview, results, research directions.

## Theory of System Administration



**Theory of System Administration**

DANSS Seminar, Feb 23rd, 2003
Elliot Jaffe

**Outline**

• What is System Administration
• Problems in System Administration
• Theory overview
• Results
• Research directions

Danss - Theory of SysAdmin

**What is System Administration?**

What do you think?

**What is System Administration**

In computer technology, a set of functions that provides support services, ensures reliable operations, promotes efficient use of the system, and ensures that prescribed service-quality objectives are met. Synonym: system management.

US Federal Standard 1037C

**System Administration is**

The function that provides:

• Reliability: stable, consistent service
• Efficiency: performance
• Predictability: Service Level Agreement

**CS HUJI System Administration**

• Infrastructure
• Operating Systems
• Networking
• Account Administration
• Software Licensing, Installation and Support
• Education

**What you don’t see**

• Budgets
• Cost Benefit Analysis
• Vendor Selection
• Service Contracts
• Long term planning
• Policy creation

**Problems in Sys Admin**

• Strategic
• Tactical

**Strategic Problems**

• Economic cost/benefit analysis
• How much disk space should be purchased in the next year?
• Should we buy one new router, or do we need a fail-over pair?
• If we get 25% additional students, what resources will we need?

**Strategic Problems #2**

• What is the right level of disk space quotas?
• Should we use a VLAN to localize network traffic?

**Tactical Problems**

• What is the best way to maintain multiple systems?
• How do we apply patches?
• How should we roll out an OS change?
• How do we support multiple configurations?
• How many configurations should we support?
• How do we use version control as part of system administration?

**A complete theory should enable**

• Policy determination and evaluation
• Strategic decisions about resource usage and allocation
• Interactions between users and system for resources
• Productivity considerations (economics of the system)
• Empirical verification of strategies and policies
• Efficiency of policy and its implementation
• Efficiency of the system in doing its job

**Theory of System Administration**

A group of computers is an evolving, stochastic system viewable at multiple levels of detail.

**Configuration Space**

• The memory state of the computer
• The set of bits that define the computer state.
• Example: the state of the bits in primary memory and on secondary media (disks)

**Time**

• Time is a discrete value.
• For averaging purposes, we allow it to take on real values.
• Example: the system clock is discrete, taking values that are multiples of the clock period Tc: t = 0, Tc, 2Tc, …, nTc

**Configuration**

• A pattern of values associated with each point on the configuration space.
• Example: the state of all bits in main memory at time t.
• This pattern changes over time.

**Averaging**

• Over time scales much larger than Tc, the average properties of the system can be treated as a continuum approximation, i.e. as real functions of time.
• Example: the number of non-zero bits at any real value of time.
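
The averaging idea above can be illustrated with a minimal Python sketch. It is not taken from the talk: the byte-string "configuration", the one-bit-flip-per-tick dynamics, and the names `nonzero_bits`, `local_average`, and `simulate` are all assumptions made for illustration. The sketch samples the number of non-zero bits at each discrete clock tick and then smooths the samples over a window much longer than one tick.

```python
import random


def nonzero_bits(state: bytes) -> int:
    """Count the bits set to 1 in a configuration given as a byte string."""
    return sum(bin(byte).count("1") for byte in state)


def local_average(samples, window):
    """Trailing mean of `samples` over at most `window` ticks."""
    averaged = []
    running = 0
    for i, value in enumerate(samples):
        running += value
        if i >= window:
            running -= samples[i - window]  # drop the sample leaving the window
        averaged.append(running / min(i + 1, window))
    return averaged


def simulate(ticks=1000, size=64, seed=0):
    """Hypothetical dynamics: flip one randomly chosen bit per clock tick."""
    rng = random.Random(seed)
    state = bytearray(size)
    counts = []
    for _ in range(ticks):
        state[rng.randrange(size)] ^= 1 << rng.randrange(8)
        counts.append(nonzero_bits(bytes(state)))
    return counts


counts = simulate()                        # integer-valued, one sample per tick
smooth = local_average(counts, window=100)  # real-valued local average
```

The raw counts jump by one bit per tick, while the windowed values change slowly and take non-integer values, which is the sense in which the average can be treated as a real function of time.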

**Scales**

• Transition from low-level to high-level
• Group objects together to form new objects
• Refer to state of object over time

**Closed Dynamical Systems**

• A closed dynamical system consists of a configuration space, an initial configuration and a rule for subsequent time development
• Closed dynamical systems are deterministic
• Example: a standalone computer without any external input is a closed dynamical system

**Interactions**

• An interaction between two systems is an endomorphism on the combined systems such that both systems determine the time developments of one another.
• Example: two standalone computers connected via a network and synchronizing system times.

**Environment**

• An ensemble of mutually interacting systems.
• Example: a user interacting with a computer.
• People are not standalone!

**Open Dynamical System**

• Projection of an ensemble of interacting systems onto the state of a given system.
• The configuration state of an open system is unpredictable over any interval dt ~ Tc.
• Does this mean that all is lost?

**Stability**

• Assume that there exists some time scale on which it is possible to predict the average state of the systems in question.
• We are not interested in managing systems which cannot achieve a minimal level of stability, since these systems cannot perform any reliable function.

**Multiple Time Scales**

• Short term: Tc, the computer clock
• Medium term: human time > 10^7 Tc
• Long term: months and years > 10^7 human time

**Components of System State**

• The state of a system at any given time is composed of a slowly varying local average and a rapidly fluctuating stochastic remainder.
• Are these systems stable?
[Figure: system state plotted against time]

**Tasks**

• A task is a representation of an autonomous process executed on related sets of state.
• A task is closed if after execution, it returns the system to the original state.
• A task is open if after execution, it has changed the overall system state.

**Maintenance Tasks**

• A maintenance task is a task which reduces the total rate of change of the average configuration state.
• Example: deletion of accumulated garbage

**Policy**

• A policy is an average specification of equivalent system behaviors.
• A set of system states that are equivalent over the given time period.
• A policy is neither good nor bad. It does not necessarily lead to stability or chaos.

**Policy - Examples**

• Users are restricted to a known quota of file system space.
• All computers must run Microsoft Office.
• Only port 80 will be open on network servers.
• SSH will be used for all remote computer access.

**Convergence**

• A convergent average policy is one whose tasks result in an equivalent configuration for all sufficiently large time scales.
• A convergent average policy is one whose average behavior in time ends in a fixed average state between two sufficiently different time values.

**Convergence - Example**

• Deleting temporary files on a regular basis is a convergent policy since it returns the system to a known state (i.e. a given amount of free file system space).

**Persistent State**

• A persistent state is a configuration for which the probability of returning to an equivalent configuration at a later time is 1.
• Persistence is reflected in the property that the rate of change of the average state is much slower than the rate of change of fast moving variations.
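
The temporary-file example of a convergent policy can be sketched as a small simulation. Everything concrete here is an assumption for illustration, not data from the talk: the daily growth range, the seven-day cleanup period, and the helper names `simulate_usage`, `window_mean`, and `average_drift`. The sketch compares the average usage in an early window and a late window, with and without the maintenance task.

```python
import random


def simulate_usage(days, cleanup_period=None, seed=1):
    """Daily temp-file usage in arbitrary units; cleanup resets it to zero."""
    rng = random.Random(seed)
    usage = 0.0
    history = []
    for day in range(1, days + 1):
        usage += rng.uniform(0.5, 1.5)  # hypothetical daily temp-file growth
        if cleanup_period and day % cleanup_period == 0:
            usage = 0.0                  # maintenance task: delete temp files
        history.append(usage)
    return history


def window_mean(history, start, length):
    """Mean usage over `length` days beginning at index `start`."""
    chunk = history[start:start + length]
    return sum(chunk) / len(chunk)


def average_drift(history, length):
    """Mean usage in the last window minus mean usage in the first window."""
    last = window_mean(history, len(history) - length, length)
    first = window_mean(history, 0, length)
    return last - first


with_cleanup = simulate_usage(700, cleanup_period=7)
without_cleanup = simulate_usage(700)

drift_with = average_drift(with_cleanup, 70)        # stays near zero
drift_without = average_drift(without_cleanup, 70)  # keeps growing
```

With periodic cleanup the windowed average returns to roughly the same value at widely separated times, while the fast day-to-day variation continues inside each cycle. Without cleanup the average itself drifts, so no equivalent configuration is revisited.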

**Persistent States**

• The fast variations extend over several complete cycles before any appreciable change in the average is seen.

[Figure: state plotted against time]

**Theorem**

• In an open system, a policy specifies a class of equivalent persistent states if and only if the policy exhibits average convergence.
• You can maintain the state of the system if and only if your policy consistently returns the system to a similar state, i.e. the average resource usage is constant over the policy's time scale.

**Implications**

• System Administration is the development, specification and implementation of environments and maintenance tasks with the goal of creating a persistent average state.

**Strategy**

• Type I: Stochastic models
• Type II: Semantic models

**Type I - Stochastic models**

• Analyze what is happening on multiple time scales
• Describe locally averaged states
• Model known boundary conditions
• Empirical measurements of existing systems.
• Predictive modeling of systems based on measurements.

**Problems with Stochastic Models**

• Statistical measurements are rare
• No experimental repeatability
• Conditions of measurements are constantly changing
• Absolute definitions are impossible
• People cannot be described by a small number of characteristics

**Stochastic modeling -- Uses**

• Strategic planning
  • Do we need to buy more file servers?
• Problem identification
  • Why is user X using 300% of the normal disk quota?
  • Why is computer Y rebooting twice a week when all other systems are stable for months?

**Strategic models**

• Analyze what might be changed in a system.
• Formulate as a game of strategy
• Achieve larger goals than just maintaining a persistent state.
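
One way to read "formulate as a game of strategy" is as a payoff table between an administrator and a user. The sketch below is only an illustration under stated assumptions: the strategy names and payoff numbers in `ADMIN_PAYOFFS` are invented, and `maximin` is a generic pure-strategy rule chosen for simplicity, not a method attributed to the talk.

```python
def maximin(payoffs):
    """Return (strategy, guaranteed_payoff) for the row player.

    payoffs[row][col] is the row player's payoff when the row player
    chooses `row` and the opponent chooses `col`.
    """
    best_row, best_value = None, float("-inf")
    for row, outcomes in payoffs.items():
        worst = min(outcomes.values())  # assume the opponent's most damaging reply
        if worst > best_value:
            best_row, best_value = row, worst
    return best_row, best_value


# Hypothetical payoffs to the administrator (higher is better).
ADMIN_PAYOFFS = {
    "no cleanup": {"benign user": 3, "space hoarder": -5},
    "periodic cleanup": {"benign user": 2, "space hoarder": 1},
    "strict quotas": {"benign user": 0, "space hoarder": 2},
}

choice, value = maximin(ADMIN_PAYOFFS)
```

Maximin picks the administrator strategy whose worst case is least bad. With these made-up numbers that is "periodic cleanup", which gives up some benefit with benign users in exchange for protection against hoarding.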

**Strategic Goals**

• Sys Admin: keep the system alive and running so that users can perform a maximum amount of work
• Benign User: produce useful work using the system (consumes resources)
• Malicious User: maximize control of system resources

**Strategic tools**

• Game Theory
• Contests between System Administrator and malicious users.
• System Downtime: Mean time to repair / Mean time before failure
• Minimize MTTR or maximize MTBF?
• Levels of monitoring: at what point does the cost of monitoring overwhelm the benefit?

**Current research**

• Recovering File space
• System upgrades
• Quota systems

**Recovering File Space**

• How do you clean unused files?
• Competition between users and admins
• Trade-off between
  • Having enough space to operate
  • Users recreating temp files that were deleted
  • Users “grabbing” space for later use

**Patch Application**

• How do you apply changes to a distributed system?
• Divergence
• Convergence
• Congruence

**Quota application**

• What is the correct way to set file system quotas?
• By category
• Dynamically assign users to groups
• Set group to lowest maximal value

**Bibliography**

• Burgess, M. 2003. On the Theory of System Administration. Journal of the ACM.
• Traugott, S. and Brown, L. 2002. Why Order Matters: Turing Equivalence in Automated Systems Administration. LISA 2002.
• Gilfix, M. 2002. Holistic Quota Management: The Natural Path to a Better, More Efficient Quota System. LISA 2002.
