
Fabric Management at CERN BT July 16th 2002 Tony.Cass@CERN.ch






Presentation Transcript


  1. Fabric Management at CERN BT July 16th 2002 Tony.Cass@CERN.ch

  2. The Problem
     • ~6,000 PCs
     • Only 1/3rd of the total capacity is at CERN… Grid Computing.
     • Another ~1,000 boxes
     • c.f. ~1,500 PCs and ~150 disk servers at CERN today.

  3. The Past
     • Automated management tools developed to handle multi-architecture clusters with a few tens of nodes.
     • Good points
        • Much automation
        • Solid set of tools
        • Much accumulated experience
     • Bad points
        • Can't cope with the number of systems we have today
        • Configuration information stored in multiple locations
        • Monitoring at system level, but users see service failures.

  4. Where we are going
     • Use Linux standards
        • RPM, LSB, …
     • Single location (/interface) for configuration information
        • Which nodes in which clusters
        • Node roles, states, required software
        • Personnel roles (who is allowed to perform what)
     • Better installation tools
        • Guaranteed reproducibility across nodes and over time
        • Making use of configuration information
        • Multiple distinct system "images"
     • Service level monitoring
        • Making use of configuration information
     • State management for
        • System reconfiguration requests
           • Both system upgrades and reconfigurations to reflect workload changes
        • Automatic recovery procedures (and non-automatic if necessary…)
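
As an illustration of the "single location for configuration information" idea on slide 4, here is a minimal Python sketch. It is not the tooling CERN built; the class names (`Node`, `FabricConfig`), the method names, the `"change_state"` action, and the rule that a service counts as up when at least one production node is healthy are all assumptions made for this example. It only shows how cluster membership, node roles and states, required software, and personnel roles could sit behind one interface that state management and service-level monitoring both consult.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One machine as described in the central configuration store."""
    name: str
    cluster: str
    role: str                      # e.g. "batch worker", "disk server"
    state: str = "production"      # e.g. "production", "maintenance"
    packages: tuple = ()           # required software (hypothetical RPM names)


class FabricConfig:
    """Hypothetical single interface to all configuration information."""

    def __init__(self):
        self._nodes = {}           # node name -> Node
        self._allowed = {}         # person -> set of permitted actions

    def add_node(self, node):
        self._nodes[node.name] = node

    def grant(self, person, action):
        # Personnel roles: record who is allowed to perform what.
        self._allowed.setdefault(person, set()).add(action)

    def is_allowed(self, person, action):
        return action in self._allowed.get(person, set())

    def nodes_in_cluster(self, cluster):
        return sorted(n.name for n in self._nodes.values() if n.cluster == cluster)

    def set_state(self, person, node_name, new_state):
        """State management: apply a reconfiguration request if permitted."""
        if not self.is_allowed(person, "change_state"):
            raise PermissionError(f"{person} may not change node state")
        self._nodes[node_name].state = new_state

    def service_up(self, cluster, healthy_nodes):
        """Service-level view (assumed rule): the service is up if any
        production node in the cluster is currently healthy."""
        return any(
            n.state == "production" and n.name in healthy_nodes
            for n in self._nodes.values()
            if n.cluster == cluster
        )


# Example usage with made-up node names.
config = FabricConfig()
config.add_node(Node("node001", "batch", "batch worker", packages=("example-rpm",)))
config.add_node(Node("node002", "batch", "batch worker"))
config.grant("operator", "change_state")
config.set_state("operator", "node002", "maintenance")
print(config.nodes_in_cluster("batch"))
print(config.service_up("batch", healthy_nodes={"node002"}))
```

Because monitoring reads node states from the same store, a node that responds but is marked as in maintenance does not count towards the service being up, which is one way to address the slide 3 complaint that monitoring happened at system level while users saw service failures.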
