
Atlas Status Update

Presentation Transcript


  1. Atlas Status Update Chris Fuson

  2. Atlas Update - Timeline
     • March 06, 2014: Installed patches targeting memory contention on the metadata server to address server-side performance problems.
     • February 26, 2014: Installed a patch to reduce the impact of close operations, addressing server-side metadata performance problems.
     • February 10, 2014: Titan's Lustre client was rolled back to 1.8.6 to address client-side performance problems.
     • January 28, 2014: Titan's Lustre client was upgraded to 2.4. Widow was unmounted.
     • January 10, 2014: As the user load from this transition increased, we began to see problems with both Lustre server and client (compute node) performance.
     • January 07, 2014: Widow[1-3] became read-only.
     • December 05, 2013: Atlas was mounted on all OLCF systems, announced, and opened for use.

  3. Atlas Update - Current
     • Following the March 06, 2014 change to reduce memory contention on the metadata server, we continue to see qualified improvements in interaction with Atlas.
     • Improvements have been substantial for several applications that were previously negatively affected.
     • We encourage users to continue testing their application performance in light of these changes and to report their results.
     • We will continue to pursue the remaining issues, and will intentionally address them outside of the production environment so as to minimize further interruption to the Atlas file systems.
     • Your feedback is incredibly valuable. Please continue to report file system problems, including any specific timings for I/O operations, to help@olcf.ornl.gov.

  4. Atlas Update – Stripe Count Warning
     • Warning: Stripe counts greater than 160 are not currently supported.
     • Warning: "-1" should NOT be used when setting up striping patterns.
     • The 1.8 Lustre clients running on Titan do not support stripe counts greater than 160. Interacting from Titan (including 'lfs getstripe') with files that have a stripe count greater than 160 is problematic.
     • If 'lfs setstripe' was used to set the stripe count of a directory or file to a value greater than 160 or to '-1', you should reduce the stripe count. In the session below, '-c -1' produced a stripe count of 1008, and 'lfs getstripe' then crashed on Titan.

         titan-ext3 1004> lfs setstripe -c -1 test.file
         titan-ext3 1005> lfs getstripe test.file | grep stripe_count
         lmm_stripe_count: 1008
         *** glibc detected *** lfs: munmap_chunk(): invalid pointer: 0x000000000067fed0 ***

     • Please note that the stripe count is only an issue on Titan. It is not an issue on Eos, Rhea, or the Data Transfer Nodes, because those systems use a more recent Lustre client version.
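The warning above can be checked with a short script. The following is a minimal sketch, not an OLCF-provided tool: the function names `parse_stripe_count`, `exceeds_limit`, and `check_file` are hypothetical, and it assumes `lfs getstripe` prints a line of the form `lmm_stripe_count: N`, as in the session on the slide. Since `lfs getstripe` itself can crash on Titan for affected files, it is probably safer to run this from Eos, Rhea, or a Data Transfer Node.

```shell
#!/bin/sh
# Limit stated on the slide for Titan's 1.8 Lustre client.
MAX_STRIPE_COUNT=160

# Read `lfs getstripe` output on stdin and print the first
# lmm_stripe_count value found (assumes "lmm_stripe_count: N" lines).
parse_stripe_count() {
    awk '/lmm_stripe_count:/ { print $2; exit }'
}

# Succeed (exit status 0) when the given count exceeds the limit.
exceeds_limit() {
    [ "$1" -gt "$MAX_STRIPE_COUNT" ]
}

# Hypothetical helper: print a message if a file's stripe count is too high.
check_file() {
    count=$(lfs getstripe "$1" | parse_stripe_count)
    if [ -n "$count" ] && exceeds_limit "$count"; then
        echo "$1: stripe count $count exceeds $MAX_STRIPE_COUNT"
    fi
}
```

For example, `check_file test.file` would report the file from the slide, whose stripe count was 1008.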

  5. Atlas Update – Reduce Stripe Count
     • Create a new directory with reduced striping.
     • Copy data into the new directory:
       • cp for small amounts of data
       • dcp from the Data Transfer Nodes for larger amounts of data

         dtn04 115> mkdir NewDir
         dtn04 116> lfs setstripe -c 128 NewDir
         dtn04 117> cp test.file NewDir/.
         dtn04 118> lfs getstripe NewDir/test.file | grep stripe_count
         lmm_stripe_count: 128
         dtn04 119>
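The steps on this slide can be wrapped in a small shell function. This is only a sketch under stated assumptions: `restripe_copy` is a hypothetical name, the default of 128 is simply the value used in the slide's example, and it uses `cp`, so it suits small amounts of data (the slide recommends dcp from the Data Transfer Nodes for larger amounts).

```shell
#!/bin/sh
# Copy a file into a directory created with a reduced stripe count.
# Usage: restripe_copy <file> <new_dir> [stripe_count]
restripe_copy() {
    src=$1
    dest_dir=$2
    count=${3:-128}   # 128 is the value used in the slide's example

    # Reject -1 and anything above the 160 limit stated for Titan.
    if [ "$count" -lt 1 ] || [ "$count" -gt 160 ]; then
        echo "stripe count must be between 1 and 160" >&2
        return 1
    fi

    mkdir -p "$dest_dir" || return 1
    # New files in the directory pick up the directory's stripe count.
    lfs setstripe -c "$count" "$dest_dir" || return 1
    cp "$src" "$dest_dir"/ || return 1

    # Show the stripe count of the copy so it can be verified.
    lfs getstripe "$dest_dir/$(basename "$src")" | grep stripe_count
}
```

Calling `restripe_copy test.file NewDir 128` mirrors the session above. The original file is left in place, so it would still need to be removed once the copy has been verified.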

  6. Questions?
     • More information:
       • www.olcf.ornl.gov/kb_articles/atlas-update/
       • www.olcf.ornl.gov/kb_articles/lustre-basics/
     • Email: help@olcf.ornl.gov
