OptiPortal Configuration Considerations

Presentation Transcript


  1. OptiPortal Configuration Considerations
  Ashley Wright, High Performance Computing and Research Support (QUT)

  2. Our OptiPortal

  3. Our OptiPortal
  • 6x Dell Precision T3500
  • Intel Xeon E5520 (2.27GHz)
  • 4GB RAM
  • nVidia FX 1800
  • Onboard 1Gb/s network
  • PCIe 1Gb/s network card (supports Jumbo Frames)
  • 300GB HDD
  • 22x Dell 24” Monitors (4x5 configuration)

  4. Considerations
  • Keep the cluster in a known state.
  • Recover quickly when something goes wrong.
  • Install applications quickly.
  • Compile code on the OptiPortal itself.
  • Fast.
  • Easy to use.

  5. ROCKS with Viz Roll
  • Fairly easy to install.
  • Used initially to test the OptiPortal and software that can run on a vis wall.
  • Software was out of date (CentOS 5 vs Fedora 12).
  • Difficult to customise.
  • Difficult to install our own software.

  6. Similarities to HPC clusters
  • Lots of applications.
  • Each node of the cluster is identical.
  • Need performance.
  • Need to minimise downtime.

  7. HPC Cluster
  • Network boot and install.
  • Shared file system across nodes.
  • Nodes are generally identical.
  • Multiple networks for different uses (e.g. management vs MPI).

  8. Installing nodes
  • Network boot and auto-install scripts make reinstalling easy.
  • Fedora 11 & 12 used.
  • Cobbler (https://fedorahosted.org/cobbler/)
  • HTTP/PXE/TFTP
  • DHCP/DNS
  • Yum mirror
  • Also customisation of the install process.

  9. Installing nodes - Cobbler
    # install nvidia driver ($http_server is filled in by Cobbler when the
    # kickstart template is rendered)
    pushd /root/
    wget http://$http_server/files/NVIDIA-Linux-x86_64-190.53-pkg2.run -O /root/NVIDIA-Linux-x86_64-190.53-pkg2.run
    chmod +x /root/NVIDIA-Linux-x86_64-190.53-pkg2.run
    wget http://$http_server/files/nvidia-install.sh -O /etc/init.d/nvidia-install.sh
    chmod +x /etc/init.d/nvidia-install.sh
    # register the helper as an init script so the driver installs itself on boot
    chkconfig --add nvidia-install.sh
    chkconfig nvidia-install.sh on
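
The snippet above is a kickstart fragment that Cobbler renders as a template. For context, registering a distro, profile and node with Cobbler's CLI might look like the sketch below; the names, paths and MAC address are hypothetical:

    cobbler distro add --name=f12-x86_64 \
        --kernel=/mnt/f12/images/pxeboot/vmlinuz \
        --initrd=/mnt/f12/images/pxeboot/initrd.img
    cobbler profile add --name=optiportal-node --distro=f12-x86_64 \
        --kickstart=/var/lib/cobbler/kickstarts/optiportal.ks
    cobbler system add --name=node01 --profile=optiportal-node --mac=00:11:22:33:44:55
    cobbler sync    # regenerate PXE/TFTP and DHCP configuration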

  10. File Server
  • Hosts non-volatile, shared home directories (/home), software directories (/pkg), and a Fedora mirror.
  • Built with an old Dell 2900 server:
  • 6x 1.5TB HDD (RAID 1+0; see slide 16).
  • 4x 1Gb/s aggregate network.
  • 250MB/s throughput.
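
The slides don't say how the four 1Gb/s links are aggregated; on Linux this is typically done with interface bonding. A minimal sketch, assuming 802.3ad (LACP) and hypothetical interface names and addresses:

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=10.0.0.10
    NETMASK=255.255.255.0
    ONBOOT=yes
    BONDING_OPTS="mode=802.3ad miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeated for eth1-eth3)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes

LACP balances traffic per flow, so a single node still sees at most one link's worth (~120MB/s); the aggregate pays off when several nodes read at once.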

  11. Keeping nodes in 'sync'
  • When you change something on one node, you want the same change on the other nodes.
  • Having shared home and application directories makes this easy.
  • Puppet to manage files in /etc (http://www.puppetlabs.com/).
  • Automated configuration management.
  • Makes sure files and services are in a known state; if they are not, Puppet fixes them.
  • Updates every 30 mins (default).

  12. Nodes in 'sync' - Puppet
    class sshd {
      # sshd_config is served from the puppet master; local drift is overwritten
      file { "/etc/ssh/sshd_config":
        owner  => root,
        group  => root,
        mode   => 600,
        ensure => present,
        source => "puppet:///files/ssh/sshd_config",
      }
      # refreshonly: runs only when sshd_config changes, reloading the daemon
      exec { "/etc/init.d/sshd reload":
        subscribe   => File["/etc/ssh/sshd_config"],
        refreshonly => true,
      }
      # restart sshd if it is found stopped
      service { "sshd":
        status => "/etc/init.d/sshd status",
        ensure => running,
      }
    }

  13. Network
  • One network for management (DNS/DHCP).
  • Onboard NIC; can network boot.
  • One network for Internet traffic.
  • PCIe network card; supports jumbo frames.
  • Internet network sits outside the QUT firewall.
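
As a sketch of how a node's two interfaces might be configured on Fedora (device names, addresses and the jumbo-frame MTU are assumptions, not from the slides):

    # ifcfg-eth0: onboard NIC, management network (DHCP, PXE-bootable)
    DEVICE=eth0
    BOOTPROTO=dhcp
    ONBOOT=yes

    # ifcfg-eth1: PCIe NIC, Internet/data network with jumbo frames
    DEVICE=eth1
    BOOTPROTO=static
    IPADDR=192.168.1.101
    NETMASK=255.255.255.0
    MTU=9000
    ONBOOT=yes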

  14. Performance
  • Aim to render 10-25 frames per second.
  • 9600x4800 pixels = 175MB/frame.
  • Bottlenecks everywhere, mostly I/O (bus, disk and network).
  • 1x PCIe (Gen 2) lane = 500MB/s
  • 1Gb/s network = 120MB/s
  • 1.5TB hard disk = 150MB/s (maximum)
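
The 175MB figure follows from 32-bit pixels, and multiplying it out shows why I/O dominates (a back-of-envelope sketch):

    # 9600 x 4800 pixels x 4 bytes/pixel
    echo $((9600 * 4800 * 4))    # 184320000 bytes, the slide's ~175MB per frame
    # at 10-25 fps the whole wall consumes roughly 1.8-4.6GB/s, i.e. about
    # 300-770MB/s per node across 6 nodes -- far more than one 1Gb/s link
    # (~120MB/s) and more than a single PCIe Gen 2 lane (500MB/s)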

  15. Performance - Disk
  • First file server.
  • OpenSolaris + ZFS.
  • RAID-Z (across 6 disks).
  • ZFS made all reads random seeks.
  • <100MB/s read performance.
  • Single 1Gb/s network.

  16. Performance - Disk
  • Second server.
  • Fedora 12.
  • Software RAID 0 stripe across three hardware RAID 1 pairs (2 disks each), i.e. RAID 1+0.
  • Reads mostly sequential.
  • 250MB/s read performance.
  • 4x 1Gb/s network.
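
A sketch of that layout with mdadm, assuming the three hardware RAID 1 pairs appear to the OS as /dev/sdb, /dev/sdc and /dev/sdd (device names and mount point hypothetical):

    # software RAID 0 stripe across the three hardware mirror pairs
    mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
    mkfs.ext4 /dev/md0
    mount /dev/md0 /export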

  17. Performance - Compression
  • Compressing data files reduces disk I/O.
  • CPU time to decompress is negligible.
  • Better use of the I/O cache.
  • Decompress straight to memory.
  • Can get you over the line (2x-5x improvement).
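
One way to "decompress straight to memory" (a sketch: the data file and viewer command are placeholders; /dev/shm is the tmpfs Fedora mounts by default):

    # inflate the compressed copy into RAM-backed tmpfs rather than back to disk
    zcat /pkg/data/scene0001.gz > /dev/shm/scene0001
    # or stream it into the application and skip the intermediate file entirely
    zcat /pkg/data/scene0001.gz | viewer --stdin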

  18. Issues
  • SSH and Puppet security keys change on every rebuild.
  • Upgrading between major OS versions is still a lot of work.
  • The file server needs more RAM (I/O cache).
  • 1Gb/s is not enough (at times).
  • Need to remember to fold changes back into the build scripts.

  19. Issues - Multiple Networks
  • Some software does not like multiple networks.
  • It looks up the hostname and will only use that one IP address.
  • You should be able to override this in a config file.
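
When no such config option exists, a common workaround (an assumption, not from the slides) is to pin the hostname to the preferred interface in /etc/hosts, which the resolver consults before DNS under the default lookup order:

    # /etc/hosts on node01 (name and address hypothetical)
    # resolve the node's own name to the fast data network, not management
    192.168.1.101   node01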

  20. Questions?
