Network Administration: An Analytical Approach
Theory of Network Admin: Burgess – Ch.11 • Science vs. Technology • Studying Complex Systems • Purpose of Observation • Evaluation Methods and Problems • Evaluating a Hierarchical System • Deterministic and Stochastic Behaviour • Observational Errors • Strategic Analyses FIT2018 (c) Monash University
A Scientific Basis for System Administration • System admin has always involved experimentation • The development of Networks has led to an exponential increase in system complexity and a corresponding increase in the difficulty of Management • A purely mechanical approach may no longer be adequate: it is time for a theoretical basis… • World-wide interest, encouraged by professional organisations (SAGE, USENIX, ACM, IEEE, ACS)
Science vs Technology • System Admin studies are mostly “Applied Research”, resulting in specialised toolsets that solve local or specific problems • Some workers have attempted to collate results to form a more general technology of more permanent or global value • But this is not Science! Science uses a well-defined method of theory development and experiment…
Scientific Method • Knowledge is advanced by a series of studies that either verify or falsify a hypothesis • A study may be theoretical or practical, but all studies contribute to a larger ongoing discussion that leads to progress • A single study is rarely the end of the discussion • Each study is usually repeated and verified or challenged by other researchers • Reproducibility is very important
Scientific Method • Motivation – statement of context and objectives • Appraisal of problems • Theoretical Model - used to understand or solve problems and provide a framework for comparison and measurement • Design an experiment – the Approach • Perform an Experiment – obtain Results • Evaluation or Verification of Approach and Results
Scientific Method • Science is a dialog of Theories • Science proceeds by Experiment • Need Theory to interpret observations • Need observations to disprove Theory
Studying Complex Systems • Areas of study in System Admin have been Technical and/or Behavioural, and include: • Reliability studies • Finding and evaluating methods for system integrity • Observations that apply to non-linear behaviour • Issues related to strategy and planning • Mostly Empirical or Qualitative case studies
Purpose of Observation • To gather info about a Problem to enable the development of a Technology that solves it • To evaluate the Technology for effectiveness (i.e. whether it fulfils its design goals) • But evaluation of SysAdmin experiments is difficult due to Vested Interests and a lack of clearly defined metrics
Evaluation Methods and some Problems • Ideally there should be a repeatable test yielding measurements • The trouble is that, while a good system administrator could make this judgement heuristically: • Such judgements are very difficult to quantify… subjective… • Different SysAdmins work in different ways • There is extreme variability in systems and users
A common Research topic, and the problems with a scientific study of it… E.g. ways to relieve Administrators of tedious work, so they can use their talents better in other ways. What sort of experiment is needed? • Measure the time spent working on a system; but the time required usually expands to fill the time available! • Record the actions of an automatic system and compare them with those of a human administrator; but the result depends on the person, since different people do things in different ways
Other problems: Vested Interests… • SysAdmins require tools… • Such tools often acquire a dedicated following of users who grow to like them regardless of what the tools allow them to achieve • One developer's or vendor's marketing may be better than another's, creating a bias in the user base that affects the perceived usefulness of the tool • So one cannot estimate the effectiveness of a tool from the number of its users alone
Other problems: Evaluating a Hierarchical System • To what level of detail should the hierarchy be decomposed? • Building a model of the hierarchy is often the best way to address complexity – focus on what’s important or practical • Experiments based on this model might then involve: measurements, simulations, case studies, user surveys
A sample study: IEEE Fault Management • IEEE classify software anomalies as: • O/S crash • Program hang • Program crash • Input problem • Output problem • Failed required performance • Perceived total failure • System error message • Service degraded • Wrong output • No output
Most common faults for SysAdmins are: • Input problems: missing or inappropriate configuration • Failed performance: usually through loss of resources • Software problems can be eliminated by re-evaluation of individual software components
Another sample Study: Reliability • Average (Mean) time before failure • With parallel or redundant components • With serial or dependent components • Probability of Failure
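The reliability quantities listed above can be sketched numerically. This is a minimal sketch, assuming components fail independently with a constant failure rate (exponential lifetimes); under that assumption a component with mean time before failure M fails within time t with probability 1 - exp(-t/M). A serial (dependent) chain fails if any one component fails, while a parallel (redundant) set fails only if all of them fail. The function names and numbers are hypothetical.

```python
import math
from functools import reduce


def failure_probability(mtbf: float, t: float) -> float:
    """Probability that one component fails within time t, assuming an
    exponential lifetime (constant failure rate 1/mtbf)."""
    return 1.0 - math.exp(-t / mtbf)


def serial_failure(probs: list[float]) -> float:
    """Serial (dependent) chain: it fails if ANY component fails,
    i.e. it survives only if every component survives."""
    survival = reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)
    return 1.0 - survival


def parallel_failure(probs: list[float]) -> float:
    """Parallel (redundant) set: it fails only if ALL components fail."""
    return reduce(lambda acc, p: acc * p, probs, 1.0)


def serial_mtbf(mtbfs: list[float]) -> float:
    """With constant failure rates, rates add along a serial chain,
    so 1/MTBF_total = sum(1/MTBF_i)."""
    return 1.0 / sum(1.0 / m for m in mtbfs)


if __name__ == "__main__":
    # Two hypothetical components, each with MTBF = 1000 h, over a 100 h window.
    p = failure_probability(1000.0, 100.0)
    print(f"single:      {p:.4f}")
    print(f"serial:      {serial_failure([p, p]):.4f}")
    print(f"parallel:    {parallel_failure([p, p]):.4f}")
    print(f"serial MTBF: {serial_mtbf([1000.0, 1000.0]):.1f} h")
```

The parallel formula assumes genuinely independent redundant units that are all in service at once, which is exactly the assumption the slides question for real computer systems.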
MTBF and Computers • Computer system MTBF doesn’t account for: • Dependency – Not all systems have the same attachments • Fail-over and Latency of service: systems may fail, then recover after a single delay, and this may occur repeatedly! • Patterns of usage: user behaviour may bias the outcome • True parallelism is not usually implemented: although hardware may be parallel, it is usually accessed one unit at a time (i.e. fail-over)
System Usage Studies: Some Metrics • Net • Total number of packets • Amount of IP fragmentation • Density of Broadcast messages • Number of Collisions • Number of TCP sockets, in and out • Number of malformed packets
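As a sketch of how the network metrics above might be tallied over one sampling interval, the code below assumes packets have already been captured and decoded into simple records. The `Packet` type and its fields are hypothetical, not the format of any particular capture tool. Broadcast density is taken here as the fraction of packets sent to the limited broadcast address, which is one reasonable reading of the slide. Collisions are a link-layer interface counter rather than something visible in decoded packets, so they are left out.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical decoded packet record (fields are illustrative)."""
    src: str
    dst: str
    proto: str                  # e.g. "tcp", "udp", "icmp"
    inbound: bool               # True if arriving at the monitored host
    more_fragments: bool = False
    frag_offset: int = 0
    sport: int = 0
    dport: int = 0
    malformed: bool = False


BROADCAST = "255.255.255.255"   # limited broadcast only; subnet broadcasts ignored


def net_metrics(packets: list[Packet]) -> dict:
    total = len(packets)
    # A packet is an IP fragment if more fragments follow or it is not the first piece.
    fragments = sum(1 for p in packets if p.more_fragments or p.frag_offset > 0)
    broadcasts = sum(1 for p in packets if p.dst == BROADCAST)
    # Distinct TCP endpoint pairs seen in each direction, as a rough socket count.
    tcp_in = {(p.src, p.sport, p.dst, p.dport)
              for p in packets if p.proto == "tcp" and p.inbound}
    tcp_out = {(p.src, p.sport, p.dst, p.dport)
               for p in packets if p.proto == "tcp" and not p.inbound}
    return {
        "total_packets": total,
        "fragmented": fragments,
        "broadcast_density": broadcasts / total if total else 0.0,
        "tcp_sockets_in": len(tcp_in),
        "tcp_sockets_out": len(tcp_out),
        "malformed": sum(1 for p in packets if p.malformed),
    }
```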
Some Metrics • Storage • Disk Usage in Bytes • Disk Operations per Second • Paging rate (free memory and thrashing)
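Disk usage is a level, but disk operations and paging are usually exposed by the operating system as cumulative counters, so a rate has to be derived from two samples taken some interval apart. A minimal sketch follows. `shutil.disk_usage` is from the Python standard library; the counter values passed to the hypothetical helper `rate_per_second` are made up (on Linux they could be read from files such as `/proc/diskstats` or `/proc/vmstat`, whose parsing is not shown).

```python
import shutil


def disk_usage_bytes(path: str = "/") -> int:
    """Bytes currently used on the filesystem containing `path`."""
    return shutil.disk_usage(path).used


def rate_per_second(count_before: int, count_after: int, interval_s: float) -> float:
    """Rate from two samples of a cumulative counter (e.g. disk operations
    completed, or pages swapped in/out) taken interval_s seconds apart."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if count_after < count_before:
        # Counter wrapped or was reset (e.g. by a reboot); this pair is unusable.
        raise ValueError("counter decreased between samples")
    return (count_after - count_before) / interval_s


if __name__ == "__main__":
    print("disk used:", disk_usage_bytes("/"), "bytes")
    # Hypothetical disk-operation counter readings taken 10 s apart.
    print("disk ops/s:", rate_per_second(120_000, 120_450, 10.0))
```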
Fig 11.2: Daily paging data. Error bars exceed the variation of the data!
Fig 11.3: Weekly paging data, also showing extreme variation
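Plots like Figs 11.2 and 11.3 are typically built by folding a long time series onto a daily (or weekly) period: samples taken in the same time slot on different days are averaged, and the error bar is their standard deviation. The sketch below assumes samples at fixed slots (e.g. hourly), arranged one list per day; the function names and the data are hypothetical. When the typical error bar is larger than the spread of the slot means, as the captions note, the average profile says little about any single day.

```python
from statistics import mean, pstdev


def periodic_profile(days: list[list[float]]) -> list[tuple[float, float]]:
    """Fold samples onto a period: days[d][h] is the value in slot h on day d.
    Returns (mean, population standard deviation) for each slot across days."""
    slots = len(days[0])
    if any(len(d) != slots for d in days):
        raise ValueError("every day must have the same number of slots")
    profile = []
    for h in range(slots):
        values = [d[h] for d in days]
        profile.append((mean(values), pstdev(values)))
    return profile


def error_bars_exceed_variation(profile: list[tuple[float, float]]) -> bool:
    """True if the average error bar is larger than the range of the slot means."""
    means = [m for m, _ in profile]
    typical_err = mean(s for _, s in profile)
    return typical_err > max(means) - min(means)


if __name__ == "__main__":
    # Made-up paging counts for three days, four 6-hour slots each.
    days = [[10, 50, 20, 5], [90, 10, 80, 40], [20, 60, 30, 15]]
    prof = periodic_profile(days)
    for slot, (m, s) in enumerate(prof):
        print(f"slot {slot}: {m:.1f} ± {s:.1f}")
    print("error bars exceed variation:", error_bars_exceed_variation(prof))
```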
Some Metrics • Processes • Number of privileged processes • Number of non-privileged processes • Maximum percentage of CPU used by processes
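The process metrics can be computed from a snapshot of the process table. The sketch assumes a hypothetical list of (pid, uid, cpu_percent) records, such as might be parsed from `ps` output, and treats uid 0 (root on Unix-like systems) as privileged; other privilege models would need a different test.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    """Hypothetical process-table entry."""
    pid: int
    uid: int
    cpu_percent: float


def process_metrics(procs: list[Proc]) -> dict:
    # uid 0 is root on Unix-like systems; everything else counts as non-privileged.
    privileged = sum(1 for p in procs if p.uid == 0)
    return {
        "privileged": privileged,
        "non_privileged": len(procs) - privileged,
        # Highest CPU share of any single process (0.0 for an empty snapshot).
        "max_cpu_percent": max((p.cpu_percent for p in procs), default=0.0),
    }


if __name__ == "__main__":
    snapshot = [Proc(1, 0, 0.1), Proc(200, 1000, 12.5), Proc(300, 0, 40.0)]
    print(process_metrics(snapshot))
```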
Some Metrics • Users • Number logged on • Total Number • Average time spent logged on per user • Load Average • Disk Usage rise per session per user per hour • Latency of Services
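Several of the user metrics are derived from login-session records. In the sketch below the `Session` record (user, login time, logout time, disk bytes at login and at logout) and the helper names are hypothetical: login and logout times could come from the login records that `last` reports, while the disk figures would need separate sampling. `os.getloadavg()` is a standard-library call, available on Unix-like systems only.

```python
import os
from collections import defaultdict
from typing import NamedTuple, Optional


class Session(NamedTuple):
    """Hypothetical login-session record: times in seconds, disk in bytes."""
    user: str
    login: float
    logout: float
    disk_at_login: int
    disk_at_logout: int


def logged_on_at(sessions: list[Session], t: float) -> int:
    """Number of sessions open at time t."""
    return sum(1 for s in sessions if s.login <= t < s.logout)


def mean_time_per_user(sessions: list[Session]) -> float:
    """Total logged-on time divided by the number of distinct users."""
    totals: dict[str, float] = defaultdict(float)
    for s in sessions:
        totals[s.user] += s.logout - s.login
    return sum(totals.values()) / len(totals) if totals else 0.0


def disk_rise_per_hour(sessions: list[Session]) -> dict[str, float]:
    """Per user: bytes of disk growth per hour spent logged on."""
    rise: dict[str, float] = defaultdict(float)
    hours: dict[str, float] = defaultdict(float)
    for s in sessions:
        rise[s.user] += s.disk_at_logout - s.disk_at_login
        hours[s.user] += (s.logout - s.login) / 3600.0
    return {u: rise[u] / hours[u] for u in rise if hours[u] > 0}


def load_average() -> Optional[tuple[float, float, float]]:
    """1-, 5- and 15-minute load averages; None where os.getloadavg is unavailable."""
    return os.getloadavg() if hasattr(os, "getloadavg") else None
```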
Distributions • Delta – constant X • Uniform – constant Y • Gaussian or Random • Normal – “bell curve” • Black-Body or Planck – approx exponential • Poisson – random arrival with mean rate • Pareto – Power Law
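To get a feel for these shapes, the sketch below draws samples from most of them with Python's standard `random` module. The standard library has no Poisson sampler, so one is written out using Knuth's multiplication method (reasonable only for small means). The Planck (black-body) case is left out rather than guessed at, and all parameter values and helper names are arbitrary.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's method: multiply uniform draws until the product drops to
    e^-lam or below; the number of extra draws is Poisson(lam).
    Fine for small lam; slow and underflow-prone for large lam."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def samples(n: int = 10_000, seed: int = 1) -> dict[str, list[float]]:
    """Draw n values from each distribution; parameters are arbitrary."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return {
        "delta": [5.0] * n,                                     # every value is the constant X = 5
        "uniform": [rng.uniform(0.0, 10.0) for _ in range(n)],  # constant density on [0, 10]
        "normal": [rng.gauss(5.0, 1.0) for _ in range(n)],      # bell curve, mean 5, sd 1
        "poisson": [poisson(rng, 3.0) for _ in range(n)],       # arrivals per interval, mean rate 3
        "pareto": [rng.paretovariate(2.0) for _ in range(n)],   # power law, shape 2, values >= 1
    }


if __name__ == "__main__":
    for name, xs in samples().items():
        print(f"{name:8s} mean={sum(xs) / len(xs):6.2f}  max={max(xs):8.2f}")
```

Comparing each maximum with its mean is instructive: the Pareto column shows occasional values far above its mean, the heavy tail characteristic of a power law.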