
COMP2221 Networks in Organisations

COMP2221 Networks in Organisations. Richard Henson May 2013. Week 11 – Troubleshooting & Optimisation. Learning Objectives: Explain the principles of troubleshooting as a means of mitigating against failure





Presentation Transcript


  1. COMP2221 Networks in Organisations Richard Henson May 2013

  2. Week 11 – Troubleshooting & Optimisation • Learning Objectives: • Explain the principles of troubleshooting as a means of mitigating the risk of failure • Use the various tools available on a named operating system to identify potential faults and problems • Take appropriate action to stop a fault becoming a failure

  3. “A stitch in time saves nine”

  4. Business - Worst Possible Scenario (1) • There is an interruption in the power supply • UPS is invoked • the interruption continues… • servers all have to be shut down • Power supply restored… • but main domain controller doesn’t reboot • so no other domain controller can connect to it • the domain tree fails

  5. Business - Worst Possible Scenario (2) • Organisation cannot do business with the network down… • server can’t be persuaded to boot • new main domain controller has to be commissioned • whole directory tree has to be rebuilt!!! • word spreads very rapidly… • Business loses so much custom, trust, and credibility that even when it starts doing business again customers choose to go elsewhere • without a flourishing customer base… the business folds

  6. Analysis: This scenario shouldn’t have occurred… • Unlikely that the server would fail to boot without prior warning… • warnings would have been presented… • but were clearly not acted upon! • Disaster recovery plan!?! • not formulated? • not tested? • not effective (in the event of a domain tree controller failure…)

  7. But it does… • Actual example (15th Feb 2010): • root domain controller [on the network] had not been backed up for 10 months, when it crashed (well… at least it had been backed up at some time…) • http://searchwindowsserver.techtarget.com/generic/0,295582,sid68_gci1381567,00.html • The consultant called in to fix it reported that: • “I had never seen a case where the forest root domain had to be recovered -- and I couldn't find anyone who had.”

  8. Analysis: Who is to blame? (1) • In this example, the organisation said they were following Microsoft guidelines • they set up an empty root domain • the root domain controller had a RAID-5 (then considered best) disk configuration • This was true, to some extent… • Microsoft did espouse this as best practice… (in the year 2000!) • guidelines had changed since then…

  9. Analysis: Who is to blame? (2) • The disaster that struck was: • two RAID drives failed on the same day! (RAID-5 can survive the loss of only one drive) • unlucky? possible to prepare for this? • The recovery process took about three weeks • most of the time was spent studying logs, doing the restore, etc. • In this case, the tree was still able to function without a root domain • business was able to continue • customer base wasn’t compromised…
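Why two failed drives were fatal can be seen from how RAID-5 stores parity. The sketch below is a simplified model, not a real RAID driver: the parity block is the byte-wise XOR of the data blocks, and for simplicity it sits in a fixed last position (real RAID-5 rotates parity across the drives). A failed drive is represented as None. Any one missing block can be regenerated by XOR-ing the survivors; a second loss leaves too little information.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one XOR parity block (simplified RAID-5 stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Rebuild a stripe in which failed drives are marked as None.

    XOR parity can regenerate exactly one missing block; with two or more
    blocks missing the data is unrecoverable, so None is returned.
    """
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if not missing:
        return list(stripe)
    if len(missing) > 1:
        return None  # second drive lost: the array cannot be rebuilt
    survivors = [blk for blk in stripe if blk is not None]
    repaired = list(stripe)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


if __name__ == "__main__":
    stripe = make_stripe([b"ABCD", b"EFGH", b"IJKL"])
    one_lost = list(stripe)
    one_lost[1] = None
    print(rebuild(one_lost) == stripe)   # one drive lost: recoverable
    two_lost = list(stripe)
    two_lost[0] = two_lost[3] = None
    print(rebuild(two_lost))             # two drives lost: None
```

This is why a RAID-5 array is only a protection against a single failure, and why it does not replace a backup.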

  10. Fault Tolerance and Risk Assessment • General “common sense” principle: • always have a backup • ESPECIALLY for the most important computer on the network… • Q: • How can you tell what needs backing up? • A: • Risk Assessment and Risk Management

  11. Why not Risk Management? • Time consuming! • However, without proper risk management… • how does the organisation know what processes are most important to its functioning? • how can an organisation provide resources to protect aspects of its network?

  12. Risk Management and Risk Assessment • Risk Assessment is an essential first step • requires putting a “value” on assets • more valuable… greater protection • Do information assets have value? • many organisations still fail to acknowledge that they do… • so categorising information assets can be problematic • need to look at the consequence to the organisation of losing that asset…
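One common way to put a rough value on assets is to score each one for impact (the consequence of losing it) and likelihood, then protect the highest-scoring first. The 1 to 5 scales, the multiplication, and the example assets and scores below are illustrative assumptions, not a method prescribed by these slides.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    impact: int      # consequence of losing the asset: 1 (minor) to 5 (business fails)
    likelihood: int  # chance of loss in the review period: 1 (rare) to 5 (expected)

    @property
    def risk(self) -> int:
        # Simple qualitative score: higher means more protection is warranted.
        return self.impact * self.likelihood


def prioritise(assets):
    """Return assets ordered from highest to lowest risk score."""
    return sorted(assets, key=lambda a: a.risk, reverse=True)


if __name__ == "__main__":
    # Hypothetical register entries and scores.
    register = [
        Asset("forest root domain controller", impact=5, likelihood=2),
        Asset("customer database", impact=5, likelihood=3),
        Asset("print server", impact=1, likelihood=3),
    ]
    for asset in prioritise(register):
        print(f"{asset.risk:>2}  {asset.name}")
```

The scores are only as good as the judgement behind them; the point is to make the ranking explicit so resources can be allocated to the most valuable assets.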

  13. How do you back up a Domain Controller? • The Windows “Backup” program works, and can easily be scheduled • but heavily criticised… • even the Windows Server 2008 version… • Third Party products give more flexibility and protection e.g.: • Recovery Manager • http://www.quest.com/recovery-manager-for-active-directory • Backup Exec • http://www.symantec.com/business/products/family.jsp?familyid=backupexec
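Whichever product is used, the case study (a root domain controller not backed up for 10 months) shows the value of checking that backups are actually happening. This minimal sketch flags servers whose last successful backup is older than a policy limit; the seven-day limit, the server names and the way the dates are collected are hypothetical.

```python
from datetime import datetime, timedelta

MAX_BACKUP_AGE = timedelta(days=7)  # assumed policy value, not a Microsoft figure


def stale_backups(last_backup, now, max_age=MAX_BACKUP_AGE):
    """Return (server, age) pairs whose most recent backup is older than max_age.

    last_backup maps server name -> datetime of last successful backup,
    or None if the server has never been backed up (age reported as None).
    """
    overdue = []
    for server, when in sorted(last_backup.items()):
        if when is None:
            overdue.append((server, None))
        elif now - when > max_age:
            overdue.append((server, now - when))
    return overdue


if __name__ == "__main__":
    # Hypothetical inventory.
    inventory = {
        "ROOT-DC1": datetime(2009, 4, 15),
        "DC2": datetime(2010, 2, 14),
        "FS1": None,
    }
    for server, age in stale_backups(inventory, now=datetime(2010, 2, 15)):
        print(server, "never backed up" if age is None else f"{age.days} days old")
```

Run on a schedule, a check like this turns "we think it is backed up" into a warning that can be acted on before a crash.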

  14. Prevention is Better than Cure • A server shouldn’t crash unexpectedly! • should be kept cool (environmental unit mustn’t break down!) • monitoring should show that unexpected things are happening • action can then (usually) be taken to deal with the unexpected • Many tools available to: • Check/monitor the system on a regular basis • Provide statistics to administrators • could also be used for security purposes • Generate alerts if something is starting to go wrong…
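A simple way to generate alerts when something is starting to go wrong, without raising one for every brief spike, is to require a run of consecutive readings above a limit. The readings, the limit and the run length here are assumptions; the readings could be CPU % or server-room temperature sampled by any monitoring tool.

```python
def sustained_breaches(samples, limit, run_length):
    """Return the indexes at which a run of `run_length` consecutive samples
    above `limit` is first completed (one alert per run, not per sample)."""
    alerts = []
    run = 0
    for i, value in enumerate(samples):
        if value > limit:
            run += 1
            if run == run_length:  # alert once when the run reaches the threshold
                alerts.append(i)
        else:
            run = 0  # reading back to normal: reset the run
    return alerts


if __name__ == "__main__":
    cpu_percent = [40, 95, 97, 60, 91, 92, 93, 94, 50]  # hypothetical readings
    print(sustained_breaches(cpu_percent, limit=90, run_length=3))
```

Here the short spike at the start is ignored, while the later sustained period of high load raises a single alert.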

  15. Troubleshooting Tools for a Windows Server: Task Manager • Applications tab: • shows which applications are running • enables changing of process priority • refresh rate set via View > Update Speed • used to • open new applications • shut rogue applications down

  16. Task Manager (continued) • Processes tab: • all system processes • Memory usage of each • % CPU time for each • total CPU time since boot up • also used to close a process down • careful! (but you get a warning…)

  17. Task Manager (continued) • Performance tab: • total number of threads, processes, handles running • Graph: % CPU usage • User mode • Kernel mode (optional: View menu) • graph per CPU (optional: View menu) • physical memory available/usage • page file usage • virtual memory available/usage
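% CPU figures like those on the Performance tab are typically derived from cumulative time counters sampled twice: busy time over the interval divided by elapsed time, split into user and kernel mode. This sketch assumes the snapshots have already been read; on Windows the `GetSystemTimes` API supplies idle, kernel and user totals, and its kernel figure includes idle time, so idle must be subtracted before filling in the hypothetical `CpuTimes` record used here.

```python
from typing import NamedTuple


class CpuTimes(NamedTuple):
    user: float    # cumulative time in user mode
    kernel: float  # cumulative time in kernel mode, idle excluded (assumption)
    idle: float    # cumulative idle time


def cpu_percentages(before: CpuTimes, after: CpuTimes):
    """Return (total %, user %, kernel %) busy over the interval between snapshots."""
    d_user = after.user - before.user
    d_kernel = after.kernel - before.kernel
    d_idle = after.idle - before.idle
    elapsed = d_user + d_kernel + d_idle
    if elapsed <= 0:
        return (0.0, 0.0, 0.0)  # no time passed between snapshots
    user_pct = 100.0 * d_user / elapsed
    kernel_pct = 100.0 * d_kernel / elapsed
    return (user_pct + kernel_pct, user_pct, kernel_pct)


if __name__ == "__main__":
    # Hypothetical snapshots taken one refresh interval apart.
    print(cpu_percentages(CpuTimes(100, 50, 850), CpuTimes(130, 60, 910)))
```

A consistently high kernel share, rather than high user time, can point towards drivers or hardware rather than applications.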

  18. Event Viewer • Events recorded into “event log” files • System log • Auditing log (customisable) • Application log • customisable - additional files • New files recorded daily; old ones archived • time before archiving also customisable

  19. Event Viewer • Three types of events recorded in log: • Information • Warning • Error • More information on each event obtained by double-clicking • make a note of the event ID • heed it and take action if necessary

  20. Using Event Viewer • Wise to check all event logs regularly • take the time and trouble to find out what those messages really mean… • Take whatever action is needed to • sort out potential problems now • make sure they don’t become real ones later…
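Checking every log by eye is tedious, so a short script can at least pull out the warnings and errors and show which ones recur. This sketch assumes the events have already been exported into simple records; the `Event` fields and the sample values are hypothetical, not the Event Viewer's own format.

```python
from collections import Counter
from typing import NamedTuple


class Event(NamedTuple):
    log: str       # e.g. "System", "Application"
    level: str     # "Information", "Warning" or "Error"
    event_id: int
    source: str


def needs_attention(events, levels=("Warning", "Error")):
    """Count warning/error events by (log, source, event ID), most frequent first."""
    counts = Counter(
        (e.log, e.source, e.event_id) for e in events if e.level in levels
    )
    return counts.most_common()


if __name__ == "__main__":
    sample = [  # hypothetical exported records
        Event("System", "Error", 7, "Disk"),
        Event("System", "Information", 6005, "EventLog"),
        Event("System", "Error", 7, "Disk"),
        Event("Application", "Warning", 1000, "App"),
    ]
    for (log, source, event_id), n in needs_attention(sample):
        print(f"{n} x {log}/{source} event {event_id}")
```

Repeated warnings from the same source are exactly the kind of early sign the slide says to investigate before they turn into a failure.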

  21. Auditing Further Events • Any “object” can be audited • The objects to audit, and the processes audited, can be set through audit (group) policy • Using MMC & the relevant snap-in • Types of process audited: • access • attempt to access

  22. Security auditing • Same principles as general auditing • Refers to “restricted” objects • Events appear in separate security log

  23. Event Management software (SIEM) • Who’s going to look at all these log files? • in practice, often no one… • Solution – SIEM software to analyse and present information from: • network and security devices • identity & access management applications • vulnerability management/policy compliance tools • OS, database & application logs • external threat data http://www.focus.com/briefs/how-select-security-information-and-event-management-siem
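Much of a SIEM's value comes from correlating events across sources. As an illustration of one such rule (a hypothetical one, not taken from any particular product), this sketch flags an account with five or more failed logons within five minutes, regardless of which device reported them. The input tuples and thresholds are assumptions.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def brute_force_suspects(failures, window=timedelta(minutes=5), threshold=5):
    """failures: iterable of (timestamp, account, source) for failed logons,
    gathered from several devices. Returns {account: time threshold first reached}."""
    recent = defaultdict(deque)  # account -> timestamps inside the sliding window
    flagged = {}
    for when, account, _source in sorted(failures):
        q = recent[account]
        q.append(when)
        while when - q[0] > window:  # drop failures older than the window
            q.popleft()
        if len(q) >= threshold and account not in flagged:
            flagged[account] = when
    return flagged


if __name__ == "__main__":
    start = datetime(2013, 5, 1, 9, 0)
    events = [(start + timedelta(minutes=i), "admin", "vpn") for i in range(5)]
    events += [(start, "alice", "dc1"), (start + timedelta(hours=1), "alice", "dc1")]
    print(brute_force_suspects(events))
```

A real SIEM runs many such rules at once and enriches them with threat data; the sketch only shows the correlation idea.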

  24. Performance Monitor • The Performance Monitor Wizard (PerfWiz) is not available on the installation disk • To obtain and download PerfWiz, visit the following Web site: • http://www.microsoft.com/downloads/details.aspx?FamilyID=31fccd98-c3a1-4644-9622-faa046d69214&displaylang=en

  25. What if the machine doesn’t boot… • Tools available: • The boot error itself • blue screen? suspect driver software • constant reboots? suspect the motherboard • Last Known Good Configuration… • Gives machine a chance to go back to the previous (usually last but one) configuration
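The idea behind Last Known Good can be modelled as two copies of the configuration: changes go into the current copy, and only after a successful start-up (in Windows, once a user has logged on) is that copy promoted to “known good”. This toy class is an illustration under those assumptions, not how Windows actually stores the configuration in the registry; the setting names are hypothetical.

```python
import copy


class ConfigStore:
    """Toy model of a 'last known good' mechanism (hypothetical)."""

    def __init__(self, initial):
        self.current = copy.deepcopy(initial)
        self.last_known_good = copy.deepcopy(initial)

    def change(self, key, value):
        # Changes (e.g. a new driver) go into the current set only.
        self.current[key] = value

    def mark_boot_successful(self):
        # Called once the system has started and a user has logged on.
        self.last_known_good = copy.deepcopy(self.current)

    def boot_last_known_good(self):
        # Discard every change made since the last successful boot.
        self.current = copy.deepcopy(self.last_known_good)
        return self.current


if __name__ == "__main__":
    store = ConfigStore({"disk_driver": "v1"})
    store.change("disk_driver", "v2-buggy")  # machine now fails to boot
    print(store.boot_last_known_good())      # back to the v1 driver
```

This also shows the limitation: once a bad configuration has been marked as good, Last Known Good can no longer undo it.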

  26. What if the machine doesn’t boot… (continued) • Safe Mode • options include VGA mode and boot logging • Debugging mode also available • output difficult to decipher for non-experts • Recovery Console • “DOS-type prompt” for performing minor repairs

  27. What if the machine doesn’t boot… (continued) • System Configuration Utility (Msconfig.exe) • automates the routine troubleshooting steps relating to Windows configuration issues • can be used to modify the system configuration and troubleshoot the problem using a process-of-elimination method
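The process-of-elimination method can be made quicker by halving: enable half of the suspect startup items, reboot, and keep whichever half still contains the fault. The sketch assumes a single faulty item and a hypothetical `boots_cleanly` check standing in for a manual reboot; msconfig itself only provides the switches for enabling and disabling items, so the halving strategy is a suggested refinement, not part of the tool.

```python
def find_culprit(items, boots_cleanly):
    """Find the single startup item that stops the system booting cleanly.

    Assumes exactly one item is at fault and that boots_cleanly(subset)
    reports whether the machine starts properly with only `subset` enabled.
    Each round halves the list of suspects.
    """
    suspects = list(items)
    while len(suspects) > 1:
        mid = len(suspects) // 2
        first_half = suspects[:mid]
        if boots_cleanly(first_half):
            suspects = suspects[mid:]  # fault must be in the other half
        else:
            suspects = first_half
    return suspects[0] if suspects else None


if __name__ == "__main__":
    startup_items = [f"svc{i}" for i in range(16)]  # hypothetical items
    reboots = []

    def probe(subset):
        reboots.append(len(subset))
        return "svc11" not in subset  # pretend svc11 is the faulty item

    print(find_culprit(startup_items, probe), "found in", len(reboots), "reboots")
```

With 16 startup items, four test reboots are enough, compared with up to 16 when items are disabled one at a time.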

  28. What if the machine doesn’t boot… (continued) • Emergency Repair Disk (ERD) • reboot machine using different media • e.g. floppy disk (yes… still possible) • media should be generated BEFORE it needs to be used! • option to create the ERD during the setup process…

  29. What if the machine doesn’t boot… (continued) • Full restore • assumes a full backup has already been made • still have to: • reformat hard disk from scratch… • and then restore the backup files using the backup/restore option… • but better than losing all your data!

  30. Optimisation… • All about improving the performance of system resources… • A network manager should never have “nothing to do…”
