What is Fabric Management?

Fabric Management, CCDB2 RTAG, April 23rd 2002, Tony.Cass@CERN.ch, with much help from German Cancio Melia

What is Fabric Management? • Maintaining • Large clusters of servers • In specific desired state(s)

What does this mean/involve? • Maintain • Large clusters • In desired state

What does this mean/involve? • Maintain • Install • Upgrade • Verify • Large clusters • In desired state

What does this mean/involve? • Maintain • Install: Two options • Image • Pro: All systems identical by construction • Con: Building & storing images • Con: Inflexible; reboot almost always required on change; this is disruptive: imagine impact of urgent security patch to application code or updating routing tables for tierX<->tierY transfers. • “Known Process” • Pro: Flexible; reboots only when essential • Con: guaranteeing reproducibility, especially over time. • Upgrade • Verify • Large clusters • In desired state

What does this mean/involve? • Maintain • Install: Two options • Image • Early approach: no standard installation procedures: easy to build image then replicate, very hard to define “known process” except on paper. • “Known Process” • Standardised s/w installation systems, e.g. RPM, bring known process fabric management to the fore: define which packages to install, then the installation tool handles the rest, including dependency issues. • Upgrade • Verify • Large clusters • In desired state

What does this mean/involve? • Maintain • Install • Upgrade • Clearly follows from choice of installation mechanism. • For image systems, upgrade is essentially installation of the new image • For known process systems, software package management and/or configuration systems adjust node to match change in desired state. • Verify • Large clusters • In desired state
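For a known process system, the upgrade step amounts to comparing what a node has with what it should have and deriving the actions that close the gap. The following is a minimal sketch of that comparison, assuming package state can be represented as a name-to-version mapping; the function name `plan_changes` and the example package data are hypothetical and do not come from any particular tool.

```python
from typing import Dict, List, Tuple


def plan_changes(installed: Dict[str, str],
                 desired: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (action, package, version) steps that move a node to the desired state.

    Hypothetical helper: any version mismatch is reported as "upgrade";
    a real package manager would compare versions and resolve dependencies.
    """
    steps: List[Tuple[str, str, str]] = []
    for name, version in sorted(desired.items()):
        if name not in installed:
            steps.append(("install", name, version))
        elif installed[name] != version:
            steps.append(("upgrade", name, version))
    # Anything installed but no longer part of the desired state is removed.
    for name in sorted(set(installed) - set(desired)):
        steps.append(("remove", name, installed[name]))
    return steps


# Illustrative data only.
installed = {"openssh": "3.1", "ntp": "4.1", "telnet": "0.17"}
desired = {"openssh": "3.4", "ntp": "4.1", "afs-client": "1.2"}
plan = plan_changes(installed, desired)
```

An image based system would skip this comparison entirely and replace the whole node, which is why the choice of installation mechanism determines the upgrade mechanism.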

What does this mean/involve? • Maintain • Install • Upgrade • Verify • As we’ve seen, verification that software is as desired is essential in known process systems: Did we get what we wanted? • But also, “do we still have what we want”? And this is equally needed for image installs: has anything changed, especially wrt security. • Software monitoring systems should be well integrated with the overall system monitoring • Raise alarms for exceptions and ensure they are followed up just as for file system full errors. • Large clusters • In desired state
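One way to read the integration point above is that software drift and a full file system should produce the same kind of alarm and enter the same follow-up queue. The sketch below illustrates that idea under stated assumptions: the `Alarm` record, the two check functions and the 95% disk threshold are all hypothetical and are not taken from Lemon or any other monitoring system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm record shared by every sensor."""
    host: str
    source: str   # which kind of check raised the alarm
    message: str


def check_software(host: str, desired: Dict[str, str],
                   observed: Dict[str, str]) -> List[Alarm]:
    """Report every package whose observed version differs from the desired one."""
    alarms = []
    for name in sorted(set(desired) | set(observed)):
        want, have = desired.get(name), observed.get(name)
        if want != have:
            alarms.append(Alarm(host, "software",
                                f"{name}: expected {want}, found {have}"))
    return alarms


def check_disk(host: str, used_fraction: float, limit: float = 0.95) -> List[Alarm]:
    """Report a nearly full file system; the limit is an assumed threshold."""
    if used_fraction >= limit:
        return [Alarm(host, "filesystem",
                      f"usage {used_fraction:.0%} at or above {limit:.0%}")]
    return []


# Both checks feed one list, so both kinds of exception get the same follow-up.
alarms = (check_software("node001", {"openssh": "3.4"},
                         {"openssh": "3.1", "telnet": "0.17"})
          + check_disk("node001", 0.97))
```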

What does this mean/involve? • Maintain • Large clusters • Many boxes, so need to worry about • System errors & failures (what if system out for repair during upgrade?) • Mundane box related issues: arrivals, departures, repairs • Workflow for system upgrades (drain, upgrade, restart, …) • … • Most site dependent part of fabric management • In desired state
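The upgrade workflow (drain, upgrade, restart, …) can be pictured as a small state machine per node. This is a sketch only: the state names and the table of allowed transitions are assumptions made for illustration, and since this is the most site dependent part of fabric management a real site would define its own. In particular, sending a node that returns from repair through the upgrade step before it rejoins production is one possible answer to the "out for repair during upgrade" question, not a rule from the talk.

```python
from enum import Enum
from typing import Dict, Set


class NodeState(Enum):
    IN_PRODUCTION = "in production"
    DRAINING = "draining"
    UPGRADING = "upgrading"
    RESTARTING = "restarting"
    IN_REPAIR = "in repair"


# Assumed transition table: a node can fail into repair from any state, and a
# repaired node must pass through UPGRADING so it cannot rejoin with stale software.
ALLOWED: Dict[NodeState, Set[NodeState]] = {
    NodeState.IN_PRODUCTION: {NodeState.DRAINING, NodeState.IN_REPAIR},
    NodeState.DRAINING: {NodeState.UPGRADING, NodeState.IN_REPAIR},
    NodeState.UPGRADING: {NodeState.RESTARTING, NodeState.IN_REPAIR},
    NodeState.RESTARTING: {NodeState.IN_PRODUCTION, NodeState.IN_REPAIR},
    NodeState.IN_REPAIR: {NodeState.UPGRADING},
}


def advance(current: NodeState, target: NodeState) -> NodeState:
    """Move to the target state, rejecting transitions the table does not allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def run_upgrade(start: NodeState) -> NodeState:
    """Walk a node through the full drain, upgrade, restart cycle."""
    state = start
    for step in (NodeState.DRAINING, NodeState.UPGRADING,
                 NodeState.RESTARTING, NodeState.IN_PRODUCTION):
        state = advance(state, step)
    return state
```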

What does this mean/involve? • Maintain • Large clusters • In desired state • Need a way to • specify • update • recover • the desired state for each system. • This is fairly easy (well, apart from recover…); you just need a database associating some key (host name, MAC address) with the software packages & required configuration.
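The database described above can be sketched as a keyed store with one operation per verb: specify, update, recover. This is an in-memory illustration under stated assumptions, not the schema of any real configuration database; the class names are hypothetical, and "recover" is read here as rolling back to the previously recorded state, which is only one possible interpretation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DesiredState:
    packages: Dict[str, str]                              # package name -> version
    config: Dict[str, str] = field(default_factory=dict)  # required configuration


class StateStore:
    """Hypothetical in-memory stand-in for the configuration database."""

    def __init__(self) -> None:
        self._current: Dict[str, DesiredState] = {}
        self._history: Dict[str, List[DesiredState]] = {}
        self._host_by_mac: Dict[str, str] = {}

    def specify(self, host: str, mac: str, state: DesiredState) -> None:
        """Register a host under both of its keys with an initial desired state."""
        self._host_by_mac[mac.lower()] = host
        self._current[host] = state
        self._history[host] = []

    def update(self, host: str, state: DesiredState) -> None:
        """Replace the desired state, keeping the old one for recovery."""
        self._history[host].append(self._current[host])
        self._current[host] = state

    def recover(self, host: str) -> DesiredState:
        """Roll back to the previous desired state, if one was recorded."""
        if self._history[host]:
            self._current[host] = self._history[host].pop()
        return self._current[host]

    def lookup(self, key: str) -> DesiredState:
        """Accept either a host name or a MAC address as the key."""
        host = self._host_by_mac.get(key.lower(), key)
        return self._current[host]


# Illustrative data only.
store = StateStore()
store.specify("node001", "00:16:3E:00:00:01", DesiredState({"openssh": "3.1"}))
store.update("node001", DesiredState({"openssh": "3.4"},
                                     {"ntp_server": "ntp.example.org"}))
```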

What does this mean/involve? • Maintain • Large clusters • In desired state(s) • Easy specification of multiple states is the harder and more important part • define characteristics for clusters, not systems • host configuration defined by cluster membership, but should be able to override any aspect • inheritance especially useful • many system configuration details (ntp, name servers, …) are independent of system function; define these once and propagate to all clusters • allow similar clusters to share a single definition of their common configuration; this avoids the potential for drift if only one cluster definition is updated.
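The inheritance and override rules above can be illustrated with a small resolution function: site-wide settings are defined once, clusters inherit and refine them, and an individual host may override any aspect. This is a minimal sketch assuming flat key/value settings; the `Cluster` class, `host_config` function and all setting names are hypothetical and much simpler than a real configuration language such as those used by LCFG or quattor.

```python
from typing import Dict, Optional


class Cluster:
    """Hypothetical cluster definition with single inheritance."""

    def __init__(self, name: str, parent: Optional["Cluster"] = None,
                 settings: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.parent = parent
        self.settings = dict(settings or {})

    def resolve(self) -> Dict[str, str]:
        """Merge inherited settings with this cluster's own; the child wins."""
        merged = self.parent.resolve() if self.parent else {}
        merged.update(self.settings)
        return merged


def host_config(cluster: Cluster,
                overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Configuration follows cluster membership, but a host can override any key."""
    config = cluster.resolve()
    config.update(overrides or {})
    return config


# Function-independent details are defined once, at the top of the hierarchy.
site = Cluster("site", settings={"ntp_server": "ntp.example.org",
                                 "name_server": "192.0.2.53"})
batch = Cluster("batch", parent=site,
                settings={"scheduler": "enabled", "interactive_login": "no"})
interactive = Cluster("interactive", parent=site,
                      settings={"interactive_login": "yes"})

node = host_config(batch)
test_node = host_config(batch, {"ntp_server": "ntp-test.example.org"})
```

Because both clusters resolve the shared settings from the same parent at lookup time, changing the NTP server on `site` reaches every cluster at once, which is the drift the last bullet warns about.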

Standards Interlude • There are none. • Software installation tools exist for many platforms and distributions but all differ • Still, a good Fabric Management system should have a high level interface allowing free choice at this level • e.g. quattor: interfaced to both RH & Solaris installation tools • No widely acknowledged standards for defining system configuration. • Choices in this area generally define the different fabric management suites • “rules based” systems (cfengine) • “configuration language” systems (LCFG(ng), quattor) • There is work in this area, but obvious common standards are still far away. • CIM, HP/IBM work to define web services based standards, DCML

Some Systems • ELFms • Rocks • Cfengine • LCFG(ng) • OSCAR/SIS • Ganglia • MonALISA

Some Systems • ELFms • A complete package with • quattor (aii/spma/ncm) for known process installation • Integrated Lemon monitoring • Leaf for workflow management of software and hardware processes • Rocks • Cfengine • LCFG(ng) • OSCAR/SIS • Ganglia • MonALISA

Some Systems • ELFms • Rocks • RH specific system, kickstart based but reinstalls nodes for configuration changes. • Limited config capabilities • No support for multiple package versions (either in repository or on a node) • Cfengine • LCFG(ng) • OSCAR/SIS • Ganglia • MonALISA

Some Systems • ELFms • Rocks • Cfengine • A set of tools to administer and configure systems • Rules based approach • state maintained in set of rule files; cfengine tools read these, check the status and update systems accordingly • LCFG(ng) • OSCAR/SIS • Ganglia • MonALISA

Some Systems • ELFms • Rocks • Cfengine • LCFG(ng) • Known process installation and configuration • Key feature is introduction of “language” for description of required system configuration. • this approach adopted and enhanced by EDG/WP4 for quattor • OSCAR/SIS • Ganglia • MonALISA

Some Systems • ELFms • Rocks • Cfengine • LCFG(ng) • OSCAR/SIS • Image based installation (SIS) • Ganglia • MonALISA

Some Systems • ELFms • Rocks • Cfengine • LCFG(ng) • OSCAR/SIS • Ganglia • “a scalable distributed monitoring system for high-performance computing systems” • can monitor many standard parameters for systems • but not integrated with s/w installation systems for verification • MonALISA

Some Systems • ELFms • Rocks • Cfengine • LCFG(ng) • OSCAR/SIS • Ganglia • MonALISA • Distributed monitoring system • Aimed at performance issues, not integration with installation frameworks • Can collect input from other monitoring systems (e.g. Lemon) as well as directly from nodes.

Summary • Fabric Management is concerned with maintaining large clusters in defined states, handling evolution over time. • Installation/Upgrade can be via disk image or a more flexible “known process” • No standards (yet) for definition of system configuration • Installation toolkits mostly differ in approach in this area. • Many monitoring systems, but these are independent developments, mostly concentrating on performance related metrics. • ELFms integrates quattor installation and configuration toolkit with the Lemon monitoring system to provide tight control over node status • and adds a (CERN specific) package to manage software and hardware workflows.
