Theory of System Administration


  1. Theory of System Administration • DANSS Seminar, Feb 23rd, 2003 • Elliot Jaffe

  2. Outline • What is System Administration • Problems in System Administration • Theory overview • Results • Research directions Danss - Theory of SysAdmin

  3. What is System Administration? What do you think?

  4. What is System Administration In computer technology, a set of functions that provides support services, ensures reliable operations, promotes efficient use of the system, and ensures that prescribed service-quality objectives are met. Synonym: system management. US Federal Standard 1037C

  5. System Administration is the function that provides: • Reliability – stable, consistent service • Efficiency – performance • Predictability – service level agreement

  6. CS HUJI System Administration • Infrastructure • Operating Systems • Networking • Account Administration • Software Licensing, Installation and Support • Education

  7. What you don’t see • Budgets • Cost Benefit Analysis • Vendor Selection • Service Contracts • Long term planning • Policy creation

  8. Problems in Sys Admin • Strategic • Tactical

  9. Strategic Problems • Economic cost/benefit analysis • How much disk space should be purchased in the next year? • Should we buy one new router, or do we need a fail-over pair? • If we get 25% additional students, what resources will we need?

  10. Strategic Problems #2 • What is the right level of disk space quotas? • Should we use a VLAN to localize network traffic?

  11. Tactical Problems • What is the best way to maintain multiple systems? • How do we apply patches? • How should we roll out an OS change? • How do we support multiple configurations? • How many configurations should we support? • How do we use version control as part of system administration?

  12. A complete theory should enable • Policy determination and evaluation • Strategic decisions about resource usage and allocation • Interactions between users and system for resources • Productivity considerations (economics of the system) • Empirical verification of strategies and policies • Efficiency of policy and its implementation • Efficiency of the system in doing its job

  13. Theory of System Administration A group of computers is an evolving, stochastic system viewable at multiple levels of detail.

  14. Configuration Space • The memory state of the computer • The set of bits that define the computer state. • Example: • The state of the bits in primary memory and on secondary media (disks)

  15. Time • Time is a discrete value. • For averaging purposes, we allow it to take on real values. • Example: • The system clock is discrete, having values as a multiple of the clock speed Tc. • t=0, Tc, 2Tc,…,nTc

  16. Configuration • A pattern of values associated with each point on the configuration space. • Example: • The state of all bits in main memory at time t. • This pattern changes over time.

  17. Averaging • Over time scales much larger than Tc, the average properties of the system can be treated as a continuum approximation, i.e. as real functions of time. • Example: • The number of non-zero bits at any real value of time.

  18. Scales • Transition from low-level to high-level • Group objects together to form new objects • Refer to state of object over time

  19. Closed Dynamical Systems • A closed dynamical system consists of a configuration space, an initial configuration and a rule for subsequent time development • Closed dynamical systems are deterministic • Example: • A standalone computer without any external input is a closed dynamical system
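
The three ingredients on slide 19 (a configuration space, an initial configuration, and a time-development rule) can be illustrated with a small Python sketch. The 8-bit integer state and the `rotate_left_8` rule are hypothetical stand-ins chosen for illustration; they do not come from the talk.

```python
from typing import Callable, List

State = int  # a configuration, here packed into an 8-bit integer (assumption)


def evolve(initial: State, rule: Callable[[State], State], ticks: int) -> List[State]:
    """Return the trajectory [s(0), s(Tc), ..., s(ticks * Tc)] under a fixed rule."""
    trajectory = [initial]
    for _ in range(ticks):
        trajectory.append(rule(trajectory[-1]))
    return trajectory


def rotate_left_8(state: State) -> State:
    """Hypothetical update rule: rotate an 8-bit configuration left by one bit."""
    return ((state << 1) | (state >> 7)) & 0xFF


run_a = evolve(0b00010011, rotate_left_8, ticks=8)
run_b = evolve(0b00010011, rotate_left_8, ticks=8)

# With no external input, the same start and rule always give the same trajectory.
assert run_a == run_b
```

Because nothing outside `rule` can touch the state, the whole future is fixed by the initial configuration, which is what "deterministic" means here.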

  20. Interactions • An interaction between two systems is an endomorphism on the combined systems such that both systems determine the time developments of one another. • Example: • Two standalone computers connected via a network and synchronizing system times.

  21. Environment • An ensemble of mutually interacting systems. • Example: • A user interacting with a computer. • People are not standalone!

  22. Open Dynamical System • Projection of an ensemble of interacting systems onto the state of a given system. • The configuration state of an open system is unpredictable over any interval dt ~ Tc. • Does this mean that all is lost?

  23. Stability • Assume that there exists some time scale on which it is possible to predict the average state of the systems in question. • We are not interested in managing systems which cannot achieve a minimal level of stability, since these systems cannot perform any reliable function.

  24. Multiple Time Scales • Short term: • Tc the computer clock • Medium term: • human time > 10^7 Tc • Long term: • months and years > 10^7 human time

  25. Components of System State • The state of a system at any given time is composed of a slowly varying local average and a rapidly fluctuating stochastic remainder. • Are these systems stable? [Figure: state plotted against time]
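
The split on slide 25 into a slowly varying local average and a fluctuating remainder can be sketched as follows. The function `decompose` and its trailing-window average are assumptions made for illustration; the talk does not say how the local average is computed.

```python
from typing import List, Tuple


def decompose(samples: List[float], window: int) -> Tuple[List[float], List[float]]:
    """Split samples into a trailing-window local average and the remainder.

    For every index i, averages[i] + remainders[i] equals samples[i].
    """
    averages: List[float] = []
    remainders: List[float] = []
    for i, value in enumerate(samples):
        start = max(0, i - window + 1)
        recent = samples[start:i + 1]          # the last `window` samples, or fewer at the start
        local_avg = sum(recent) / len(recent)
        averages.append(local_avg)
        remainders.append(value - local_avg)
    return averages, remainders


# Hypothetical measurements, e.g. disk usage sampled once per tick.
usage = [10.0, 12.0, 10.0, 12.0, 11.0, 13.0, 11.0, 13.0]
slow, fast = decompose(usage, window=2)
```

A wider window gives a smoother average and pushes more of the signal into the remainder, so the window should be chosen to match the time scale of interest.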

  26. Tasks • A task is a representation of an autonomous process executed on related sets of state. • A task is closed if after execution, it returns the system to the original state. • A task is open if after execution, it has changed the overall system state.
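
The closed/open distinction on slide 26 can be made concrete with a short sketch. The dictionary-based `State`, the helper `is_closed`, and the two example tasks are all hypothetical; the check also only covers the one starting state it is given, which is weaker than the slide's definition.

```python
import copy
from typing import Callable, Dict

State = Dict[str, int]            # coarse system state as named counters (assumption)
Task = Callable[[State], State]   # a task maps a state to the state after execution


def is_closed(task: Task, state: State) -> bool:
    """Return True if running `task` from `state` ends in the same state."""
    before = copy.deepcopy(state)
    after = task(copy.deepcopy(state))
    return after == before


def compile_job(state: State) -> State:
    """Writes three scratch files, then removes them again."""
    state = dict(state)
    state["tmp_files"] += 3
    state["tmp_files"] -= 3
    return state


def install_package(state: State) -> State:
    """Leaves one more installed package behind."""
    state = dict(state)
    state["packages"] += 1
    return state


start = {"tmp_files": 0, "packages": 120}
closed_result = is_closed(compile_job, start)     # the job cleans up after itself
open_result = is_closed(install_package, start)   # the install changes the state
```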

  27. Maintenance Tasks • A maintenance task is a task which reduces the total rate of change of the average configuration state. • Example: • Deletion of accumulated garbage

  28. Policy • A policy is an average specification of equivalent system behaviors. • A set of system states that are equivalent over the given time period. • A policy is neither good nor bad. It does not necessarily lead to stability or chaos.

  29. Policy - Examples • Users are restricted to a known quota of file system space. • All computers must run Microsoft Office. • Only port 80 will be open on network servers. • SSH will be used for all remote computer access.

  30. Convergence • A convergent average policy is one whose tasks result in an equivalent configuration for all sufficiently large time scales. • A convergent average policy is one whose average behavior in time ends in a fixed average state between two sufficiently different time values.

  31. Convergence - Example • Deleting temporary files on a regular basis is a convergent policy since it returns the system to a known state (i.e. a given amount of free file system space).
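
The temporary-file example on slide 31 can be simulated in a few lines. Everything here is an assumption made for illustration: the function `simulate_usage`, the random growth of up to 50 MB per tick, and the convention that `cleanup_every=0` means no cleanup at all.

```python
import random
from typing import List


def simulate_usage(ticks: int, cleanup_every: int, base_mb: int, seed: int = 0) -> List[int]:
    """Return disk usage in MB after each tick.

    Users add a random amount of temporary data every tick. If `cleanup_every`
    is non-zero, a maintenance task deletes all temporary data on every
    `cleanup_every`-th tick.
    """
    rng = random.Random(seed)
    temp_mb = 0
    usage: List[int] = []
    for tick in range(1, ticks + 1):
        temp_mb += rng.randint(0, 50)                  # users create temp files
        if cleanup_every and tick % cleanup_every == 0:
            temp_mb = 0                                # maintenance: delete temp files
        usage.append(base_mb + temp_mb)
    return usage


with_cleanup = simulate_usage(ticks=20, cleanup_every=5, base_mb=1000)
without_cleanup = simulate_usage(ticks=20, cleanup_every=0, base_mb=1000)

# With cleanup, usage returns to the known baseline after every cleanup tick.
after_each_cleanup = with_cleanup[4::5]
```

With the cleanup in place, usage stays within a fixed band above the baseline; without it, usage never decreases in this model.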

  32. Persistent State • A persistent state is a configuration for which the probability of returning to an equivalent configuration at a later time is 1. • Persistence is reflected in the property that the rate of change of the average state is much slower than the rate of change of fast moving variations.

  33. Persistent States • The fast variations extend over several complete cycles before any appreciable change in the average is seen. [Figure: state plotted against time]

  34. Theorem • In an open system, a policy specifies a class of equivalent persistent states if and only if the policy exhibits average convergence. • You can maintain the state of the system if and only if your policy consistently returns the system to a similar state. That is, the average resource usage is constant over the policy's time scale.

  35. Implications • System Administration is the development, specification and implementation of environments and maintenance tasks with the goal of creating a persistent average state.

  36. Strategy • Type I • Stochastic models • Type II • Semantic models

  37. Type I - Stochastic models • Analyze what is happening on multiple time scales • Describe locally averaged states • Model known boundary conditions • Empirical measurements of existing systems. • Predictive modeling of systems based on measurements.

  38. Problems with Stochastic Models • Statistical measurements are rare • No experimental repeatability • Conditions of measurements are constantly changing • Absolute definitions are impossible • People cannot be described by a small number of characteristics

  39. Stochastic modeling -- Uses • Strategic planning • Do we need to buy more file servers? • Problem identification • Why is user X using 300% of the normal disk quota? • Why is computer Y rebooting twice a week when all other systems are stable for months?
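
The problem-identification question on slide 39 ("300% of the normal disk quota") suggests a simple check against a measured norm. The sketch below assumes that "normal" means the median usage across users and that 3.0 is the flagging factor; both choices, and the name `flag_heavy_users`, are illustrative rather than taken from the talk.

```python
from statistics import median
from typing import Dict, List


def flag_heavy_users(usage_mb: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Return users whose usage is at least `factor` times the median usage."""
    typical = median(usage_mb.values())
    return sorted(user for user, mb in usage_mb.items() if mb >= factor * typical)


# Hypothetical per-user disk usage in MB.
measured = {"alice": 100.0, "bob": 120.0, "carol": 90.0, "dave": 400.0}
heavy = flag_heavy_users(measured)
```

The median is used rather than the mean so that one very heavy user does not raise the norm they are being compared against.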

  40. Strategic models • Analyze what might be changed in a system. • Formulate as a game of strategy • Achieve larger goals than just maintaining a persistent state.

  41. Strategic Goals • Sys Admin: Keep the system alive and running so that users can perform a maximum amount of work • Benign User: Produce useful work using the system (consumes resources) • Malicious User: Maximize control of system resources

  42. Strategic tools • Game Theory • Contests between System Administrator and malicious users. • System Downtime: Mean time to repair / Mean time between failures • Minimize MTTR or maximize MTBF? • Levels of monitoring: At what point does the cost of monitoring overwhelm the benefit?
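
The downtime ratio on slide 42 is easy to work through numerically. The sketch below takes the slide's ratio MTTR / MTBF at face value; the function name `downtime_ratio` and the example figures are assumptions made for illustration.

```python
def downtime_ratio(mttr_hours: float, mtbf_hours: float) -> float:
    """Downtime measure from the slide: mean time to repair over MTBF."""
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return mttr_hours / mtbf_hours


# Hypothetical baseline: a failure every 1000 hours, 10 hours to repair.
baseline = downtime_ratio(mttr_hours=10, mtbf_hours=1000)

# Two ways to improve it: repair twice as fast, or fail half as often.
faster_repair = downtime_ratio(mttr_hours=5, mtbf_hours=1000)
fewer_failures = downtime_ratio(mttr_hours=10, mtbf_hours=2000)
```

Halving MTTR and doubling MTBF give the same ratio, so by this measure alone the two options are equivalent. The choice between them then depends on which change is cheaper to achieve, which is the kind of cost question the slide raises.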

  43. Current research • Recovering File space • System upgrades • Quota systems

  44. Recovering File Space • How do you clean unused files? • Competition between users and admins • Trade-off between • having enough space to operate • Users recreating temp files that were deleted • Users “grabbing” space for later use

  45. Patch Application • How do you apply changes to a distributed system? • Divergence • Convergence • Congruence

  46. Quota application • What is the correct way to set file system quotas? • By category • Dynamically assign users to groups • Set group to lowest maximal value

  47. Bibliography • Burgess, M. 2003. On the Theory of System Administration. Journal of the ACM. • Traugott, S. and Brown, L. 2002. Why Order Matters: Turing Equivalence in Automated Systems Administration. LISA 2002. • Gilfix, M. 2002. Holistic Quota Management: The Natural Path to a Better, More Efficient Quota System. LISA 2002.