BEST PRACTICES FOR RELIABLE CARRIER GRADE TELEPHONY Alistair Cunningham, Integrics Ltd.


Presentation Transcript


  1. BEST PRACTICES FOR RELIABLE CARRIER GRADE TELEPHONY Alistair Cunningham, Integrics Ltd.

  2. Reliability: Think people and culture, not technology. Complexity is the enemy. Discipline is the answer. Management must be willing to sacrifice features. Reliability for all customers is more important than winning one new customer.

  3. Staff Responsibilities: Assign a senior engineer as system manager. The system manager has ultimate responsibility for the whole system, but can delegate tasks to others.

  4. Cluster Architecture: Duplicate all important functions. Use heartbeat, DRBD/GFS, and application-level load balancing. Remember utilities. Consistency between machines is vital. Virtual machines have more outages. Monitor all machines, services, and resources. Take daily and monthly backups.

  5. Upgrades and Changes: Risk is unpredictable and cumulative. Many small changes are riskier than a few large changes. Test all changes on a staging machine first. Keep records of changes. Consider a change management system. Keep customizations to a minimum.

  6. Dealing with Vendors: Vendors can never substitute for the system manager. Give vendors access to staging machines but not to production. Your staff must have debugging skills. Subscribe to security mailing lists.

  7. Causes of Outages: Most outages have one of three causes: untested changes (use staging); hard disks filling up (use monitoring); power and network outages (use redundancy or a split cluster). Avoiding these three is usually sufficient to achieve good reliability.

  8. Questions?
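
Slide 7 names hard disks filling up as a common cause of outages and recommends monitoring. The following is a minimal sketch of such a check, not something taken from the presentation: the 80% and 90% thresholds, the function names `classify` and `check_disk`, and the mount point list are all illustrative assumptions. In practice a check like this would probably feed whatever monitoring system already watches the cluster.

```python
import shutil
from typing import NamedTuple


class DiskStatus(NamedTuple):
    path: str
    percent_used: float
    level: str  # "ok", "warning", or "critical"


def classify(percent_used: float, warn_at: float = 80.0, crit_at: float = 90.0) -> str:
    """Map a usage percentage to an alert level (thresholds are assumptions)."""
    if percent_used >= crit_at:
        return "critical"
    if percent_used >= warn_at:
        return "warning"
    return "ok"


def check_disk(path: str, warn_at: float = 80.0, crit_at: float = 90.0) -> DiskStatus:
    """Measure how full the filesystem containing `path` is."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    percent_used = 100.0 * usage.used / usage.total
    return DiskStatus(path, percent_used, classify(percent_used, warn_at, crit_at))


if __name__ == "__main__":
    # Hypothetical mount point list; replace with the ones that matter per machine.
    for mount in ["/"]:
        status = check_disk(mount)
        print(f"{status.path}: {status.percent_used:.1f}% used [{status.level}]")
```

Keeping the threshold logic in `classify`, separate from the measurement, means the alert levels can be tested on a staging machine without having to fill a disk.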
