
Causeway—The Drew Cluster Project


Presentation Transcript


  1. Causeway—The Drew Cluster Project
  Mike Richichi (mrichich@drew.edu)
  Paul Coen (pcoen@drew.edu)
  Drew University
  TTP 2002

  2. Drew at a glance
  • ~2200 students
  • All students receive a laptop as part of tuition
  • Technology relatively well integrated in the curriculum
  • eDirectory is the standardized authentication mechanism
  • NetWare, NetMail, ZENworks, BorderManager . . .

  3. The Problem
  • All faculty and staff dependent on a single file server (NW 5.1) for network storage and client applications
  • Little downtime, but when it happened, people were, strangely, irritated
  • At least 20 minutes of downtime for a failure (mounting volumes); more if things are really messed up
  • Any downtime is a hit to our credibility, no matter how good we are at our jobs

  4. The Solution
  • December 2001: convince managers to spend $$ on a Compaq SAN
  • Try to get delivery before the end of the calendar year, to install in January
  • Have the equipment delayed until early January—window of opportunity for the install closes
  • Have all spring to play around with configuration and testing
  • Actually a much better solution overall

  5. The Configuration
  • 3 NW6 SP1 file servers:
    • Compaq DL360 G2, 2x 1.4 GHz PIII, 2.25 GB RAM
    • Compaq ML370 G2, 2x 1.266 GHz PIII, 2.125 GB RAM
    • Compaq ML370, 1 GHz PIII, 1.125 GB RAM
  • All with Gigabit Ethernet (the DL360 G2 has 2 onboard) and 2 Fibre Channel cards

  6. SAN Hardware
  • Compaq MA8000 controller (dual controllers, each dual-ported, dual power supplies)
  • 6 disk shelves (dual power supplies)
  • 26 x 36 GB disks:
    • 5 arrays (3x 6 disks, 2x 3 disks)
    • 1 disk for a spool volume
    • 1 hot spare
  • 2 StorageWorks SAN Switch 8-ELs
    • In 2 separate buildings
    • Each controller attached to each switch
    • Multimode fiber connections
  • Compaq Modular Data Router (MDR)
    • Supports a SCSI-attached MSL5026SL dual-SDLT tape library

  7. Configuration
  • All servers dual-connected to each fiber switch
  • SAN array and controllers have dual power supplies—one side connected to a local UPS, the other to the building UPS
  • 2 servers in one building, 1 in the other
  • Redundant network core
  • Basically, everything keeps going if we lose a building (except for what’s connected directly to that building, or unless the disk array itself dies)
  • MDR and tape library in the second building, away from the SAN array

  8. SAN implementation
  • SAN brought online in February
  • First server added at that point
  • Second server and clustering added over Spring Break (the second server in the cluster was an existing server)
  • Third server added in early June
  • Moved backups to the new tape library unit with Backup Exec 9 in May, backing up the old servers plus the new cluster node

  9. Migration Issues
  • Look at the network in terms of services:
    • Course files
    • Home directories
    • Departmental/group directories
    • Academic and general applications
    • Network printing
  • Provide cluster services (volumes) for each

  10. Performing the migration
  • Create new volumes
  • Update login scripts
  • Provide drive letters for each cluster volume (see the sketch after this slide)
    • Abstraction
    • Ease of use
  • Added mappings to the old login scripts to ease migration
  • Educate users to use drive letters or the new Causeway volume names
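
  A minimal sketch of what those drive-letter mappings might look like in a container login script. The virtual-server and volume names below are hypothetical placeholders, not Drew’s actual Causeway names; only %HOME_DIRECTORY and the MAP syntax are standard NetWare login script usage.

     REM Home directory, pulled from the user object’s Home Directory attribute
     REM (slide 12 notes some old accounts never had this attribute set)
     MAP ROOT H:=%HOME_DIRECTORY

     REM Departmental/group directories on a cluster-enabled volume
     MAP ROOT K:=CAUSEWAY_DEPT_SERVER/DEPT:

     REM Academic and general applications
     MAP ROOT P:=CAUSEWAY_APPS_SERVER/APPS: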

  11. The big night
  • Tell people to stop messing with stuff
  • Use JRB Netcopy to get all the trustees, volume and directory quotas, etc. right
  • Wait
  • Decide to go home at 3am so you can get at least a few hours of sleep before you have to pack for vacation the next day
  • Users log in the following morning and have everything mapped to the new services, with no loss of service

  12. Gotchas
  • New login scripts use the home directory attribute of the user object—some weren’t set (old accounts; a query sketch follows this slide)
  • Migration to NDPS
    • Had legacy queues serviced by NDPS, but had to move the queue volumes to a new virtual server volume
    • This is not really supported, but it seems to work
  • Some files didn’t copy
  • Residual volume weirdness
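
  One way to find the accounts hit by the first gotcha is an LDAP query against eDirectory for users with no home directory value. A rough sketch using OpenLDAP’s ldapsearch; the base DN is invented, and ndsHomeDirectory and inetOrgPerson are assumed to be how eDirectory exposes the Home Directory attribute and the User class over LDAP:

     ldapsearch -x -H ldap://ldap.drew.edu -b "o=drew" \
       "(&(objectClass=inetOrgPerson)(!(ndsHomeDirectory=*)))" cn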

  13. More gotchas
  • Nomenclature
    • \\servername\volumename? \\tree\.volume.organization? Directory maps? Win2K/XP or 9x? Argh!
    • Drive letters, while so ’80s, were our only practical solution to the consistent-nomenclature problem across 9x and XP
    • Other clients? Have fun!
  • CIFS and failover

  14. Current status
  • All users now using the cluster
  • Old servers still up, but data volumes renamed to OLD_volumename
    • Can still get files if necessary
  • Some users still running apps off the old application volumes (UNC path issues)
    • Search and destroy

  15. Backup configuration
  • Using Veritas Backup Exec 9 with the Additional Drive option and the SAN Shared Storage option
  • Using 4 once-a-week tapes for each primary volume, half a dozen daily differential tapes, plus smaller numbers of more frequently rotating tapes for SYS volumes, NDS, server-specific information, and the Linux and NT/2000 application servers

  16. Backup Limitations
  • Only one node in the cluster is acting as a media server
    • Cost was a factor: we would have had to buy another server license, plus options, per media server
    • Having had the multi-server edition of BE 8.5 with two years of upgrade protection, we received three remote NetWare and three remote NT/2000 agent licenses, enough for our current needs
    • The largest data volumes are usually attached to the media server
  • Backups and cluster virtual servers
    • Few (if any) backup products support virtual servers and can find a volume that has failed over
    • Edit sys:nsn/user/smsrun.bas and change the nlmArray line, replacing “TSA600” with “TSA600 /cluster=off”, to access volumes as standard server-attached volumes, per TID 10065605
    • Setting cluster volume pools to migrate back when the original server becomes available again helps prevent backup problems, when it works

  17. Cluster-enabled LDAP
  • 2 virtual IP address resources
    • LDAP-1 and LDAP-2
  • DNS round-robins “ldap.drew.edu” and also has “ldap-1.drew.edu” and “ldap-2.drew.edu” (records sketched after this slide)
  • Clients configured to use ldap.drew.edu
  • LDAP will bind to all addresses on the server, so the NLM doesn’t need to be reloaded
  • Client timeouts hide most cluster failovers
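
  A minimal sketch of the round-robin records in BIND zone-file form; the addresses are invented, and only the three host names come from the slide:

     ; ldap.drew.edu resolves to both virtual IP resources, round-robin
     ldap     IN  A   10.10.1.51
     ldap     IN  A   10.10.1.52
     ; per-resource names, for when a client needs one specific instance
     ldap-1   IN  A   10.10.1.51
     ldap-2   IN  A   10.10.1.52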

  18. What’s in a name?
  • Drew is acronym crazy
  • Wanted an easy-to-remember name to brand the project that didn’t stand for anything or really mean anything
  • “Causeway” implies things in a sort of abstract way without actually meaning anything
  • People can refer to “Causeway” and it means something, but nothing too specific, which is actually good in this case

  19. Cluster-enabled?
  • What does it mean?
  • Can I cluster-enable:
    • iFolder
    • NetStorage
    • Any product Novell sells
  • Service availability criteria

  20. Discussion
  • Problems, issues, concerns?
  • Other cluster sites? Issues?
  • NW 5.1 versus 6?
