
The DPM Testing Framework


Presentation Transcript


  1. The DPM Testing Framework. Gilbert Grosdidier, LAL-Orsay/IN2P3/CNRS & LCG. DPM Testing Suites @ CERN - AllHands

  2. Required Resources • Build machine (managed by LP) • Used for both the MySQL and Oracle flavors • Requires special shared libraries, often temporary ones • lcg-build-sl3 • Two main test clusters (mostly managed by GG) • Installed through Yaim, plus manual upgrades • MySQL flavor: 1 pool, 4 filesystems in total on 3 nodes • Master node: lxb1727 • Disk servers: lxb1903 and lxb1904 • Oracle flavor: 1 pool, 2 filesystems in total on 2 nodes • Master node: lxb1902 • Disk server: lxb1901 • Additional production installation • MySQL flavor: 1 pool, 3 filesystems, all servers and pools on the same node • Managed by FIO through Quattor, with GG handling the DPM-specific parts through Yaim • lxdpm01

  3. Miscellaneous Resources (2) • Rather specific setup: • All servers are allowed to core dump (this requires special tricks for GridFTP) • The tests also use the MySQL LFC: lxb1941 (CTB) • And possibly the Oracle LFC: lxb1782 (CTB?) • The central CTB BDII is not used any more • But each master node is required to run an up-to-date Information System • Of course, a fully installed UI is also required to run the tests • And so is the GSIPWD of the operator!

  4. Building Process • For each major update provided by JPB • RPMs are rebuilt for both DB flavors • But there is an additional fast-track build tree available • To allow rebuilding each MySQL-flavor server separately • And also rebuilding each of the test areas • Then the RPMs are reinstalled • For MySQL only most of the time (lxb1727) • The pool nodes (disk servers) are reinstalled much less often • Only RFIOD and GRIDFTP are involved in this case • When fixes are provided for a single server, only that one is rebuilt • This gives a very short life cycle (see the sketch below), except for the DPM-enabled GRIDFTP server, which is a pain… • Servers can be reinstalled/restarted several times an hour • The current build system is the good old one (no gLite, no Etics) • This is not a provocation: I actually tried to move to each of the above, but either it was a dead end or it was far too slow • I had no time to investigate why ;-( the last try was in August
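
  To make the short life cycle above concrete, here is a purely illustrative Perl sketch of the reinstall step: it pushes a freshly rebuilt RPM for a single server to a test node and restarts the corresponding daemon. The RPM path, node name and daemon name are hypothetical placeholders passed on the command line, not the real package or init-script names used by the framework.

    #!/usr/bin/perl
    # Illustrative only: push one rebuilt RPM and restart the matching daemon.
    # All names come from the command line; nothing here is DPM-specific.
    use strict;
    use warnings;
    use File::Basename qw(basename);

    my ($rpm, $node, $daemon) = @ARGV;
    die "Usage: $0 rpm-file node-name daemon-name\n"
        unless $rpm && $node && $daemon;

    # Copy the freshly rebuilt RPM, upgrade it in place, then bounce the daemon.
    system("scp", $rpm, "root\@$node:/tmp/") == 0
        or die "scp failed\n";
    system("ssh", "root\@$node", "rpm -Uvh --force /tmp/" . basename($rpm)) == 0
        or die "rpm upgrade failed\n";
    system("ssh", "root\@$node", "service $daemon restart") == 0
        or die "restart of $daemon failed\n";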

  5. Brief overview of the current DPM engine • The test installation currently consists of 7 servers • The Name Server (DPNS) • The DPM itself (the core component) • The SRMv1 server • (old, frozen, simplistic, but used by the current lcg-XXX tools and GFAL) • The SRMv2.1 server • more sophisticated, but already deprecated • will soon be replaced by the next one • The SRMv2.2 server • the state of the art, but not yet in production • (the pseudo-standard is not even frozen yet) • The RFIOD server (with GSI authentication) • The GRIDFTP server (likewise with GSI authentication) • The latter 2 are replicated on each disk server • An additional SLAPD server is used for the Information System • There is roughly one test suite for each of these servers • Excluding the DPNS (and SLAPD)
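
  Since the suites talk to all of these daemons over the network, a quick way to picture the layout is a connectivity probe. The sketch below is not part of the framework; it simply attempts a TCP connection to each daemon on its port. The port numbers are assumptions based on common DPM defaults, not figures taken from the slides, and should be adjusted for a real setup.

    #!/usr/bin/perl
    # Probe the 7 DPM-related daemons on one node (assumed default ports).
    use strict;
    use warnings;
    use IO::Socket::INET;

    my $host = shift @ARGV or die "Usage: $0 node-name\n";

    # Assumed default ports; verify against the actual installation.
    my %services = (
        'DPNS'    => 5010,
        'DPM'     => 5015,
        'SRMv1'   => 8443,
        'SRMv2.1' => 8444,
        'SRMv2.2' => 8446,
        'RFIOD'   => 5001,
        'GRIDFTP' => 2811,
    );

    for my $name (sort keys %services) {
        my $sock = IO::Socket::INET->new(
            PeerAddr => $host,
            PeerPort => $services{$name},
            Proto    => 'tcp',
            Timeout  => 5,
        );
        printf "%-8s port %-5d : %s\n",
               $name, $services{$name}, $sock ? 'answering' : 'no answer';
        $sock->close if $sock;
    }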

  6. Testing Suites Contents • The structure of each test suite is more or less identical • Each method to be tested gets a companion C module • A Perl driver then merges these C modules into various combinations • This makes it easier to add new use cases and to build a job-like sequence • Within a given suite, a command failure must break the suite • Because the next commands in the flow need to reuse the results/objects created upstream • Most of the test suites are plugged into yet another Perl module • globalSuite • They are almost independent from each other • They are individually allowed to fail, in which case control goes to the next one (see the sketch below) • The main suite is callable with only 2 arguments from the command line • globalSuite node-name proxy-type • The result is a simple score displayed by the suite • The score must be 31 for a standalone DPM, 34 if there are additional pool servers • An issue with the ADMIN commands (DPM socket) • They are now required to run on the server node itself (not from a UI) • Meaning they are not systematically tested inside these suites • Ex: dpm-modifyfs, dpm-drain
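
  The Perl sketch below is a heavily simplified, hypothetical rendering of the driver logic just described, not an excerpt of globalSuite itself: each sub-suite is run in turn, a failing sub-suite does not abort the run, and the final score is the sum of the points earned by the sub-suites that passed. The suite names are taken from the timing split on the next slide, but the per-suite weights are invented for illustration (the real scoring yields 31 or 34).

    #!/usr/bin/perl
    # Simplified globalSuite-like driver: run sub-suites, keep a score.
    use strict;
    use warnings;

    my ($node, $proxy) = @ARGV;
    die "Usage: $0 node-name proxy-type\n" unless $node && $proxy;

    # Hypothetical sub-suites and weights; not the real scoring rule.
    my @suites = (
        [ 'rfioSuite',   3  ],
        [ 'gsiftpSuite', 3  ],
        [ 'srmv1Suite',  5  ],
        [ 'srmv2Suite',  10 ],
        [ 'socketSuite', 10 ],
    );

    my $score = 0;
    for my $s (@suites) {
        my ($name, $points) = @$s;
        my $start = time;
        my $rc    = system("./$name $node $proxy >> globalSuite.log 2>&1");
        my $dur   = time - $start;
        my $ok    = ($rc == 0);
        printf "Operation: %-12s = [%s] Duration: %d sec.\n",
               $name, $ok ? 'OK' : 'FAILED', $dur;
        $score += $points if $ok;   # a failing sub-suite does not abort the run
    }
    print "Final score: $score\n";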

  7. Testing Suites timing • The globalSuite timing split: • Operation: rfioSuite with DPM = [OK] Duration: 28 sec. • Operation: rfioSuite with NODPM = [OK] Duration: 18 sec. • Operation: gsiftpSuite with DPM = [OK] Duration: 81 sec. • Operation: gsiftpSuite with NODPM = [OK] Duration: 64 sec. • Operation: gfal_test = [OK] Duration: 77 sec. • Operation: srmv1Suite with RFIO = [OK] Duration: 56 sec. • Operation: srmv1Suite with GSIFTP = [OK] Duration: 61 sec. • Operation: srmv2Suite = [OK] Duration: 265 sec. • Operation: socketSuite = [OK] Duration: 115 sec. • The overall suite lasts about 14 min. • The SRMv2.2 suite now requires about 400 sec. • It is NOT included in the timing above • It will supersede the current srmv2Suite in the above globalSuite rather soon
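
  The nine durations listed above add up to 765 sec, i.e. roughly 12.8 minutes, so the quoted 14 minutes presumably also covers the setup and teardown overhead between sub-suites. The small helper below is not part of the framework; it simply re-does that arithmetic from a saved globalSuite log, assuming the log lines keep the "Duration: N sec." format shown above.

    #!/usr/bin/perl
    # Sum the per-suite durations printed by a globalSuite run.
    use strict;
    use warnings;

    my $total = 0;
    while (<>) {
        # Matches lines like: "Operation: srmv2Suite = [OK] Duration: 265 sec."
        $total += $1 if /Duration:\s*(\d+)\s*sec/;
    }
    printf "Total: %d sec (%.1f min)\n", $total, $total / 60;

  It would be run as, for instance, "perl sumDurations.pl globalSuite.log" (both file names being hypothetical).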

  8. The gory details • The SRMv2.2 suite • Performs 240 operations (more to come) in about 400 sec. • 36 different C modules implement 39 methods (no more coming) • + 3 miscellaneous methods (rfcp, GUC, diff) • All available methods are implemented and tested • The SRMv2 suite (the elder sibling of the above one) • Performs 160 operations in about 250 sec. • 26 different C modules implement 28 methods • + 6 miscellaneous methods (rfcp, GUC, diff, dpm-xx, dpns-xx) • The SRMv1 suite (Jiri Kosina) • Is implemented in one single C module merging 9 methods • The socket suite • Performs 60 operations in about 120 sec. • 9 different C modules are used in the suite and implement 10 methods • + 2 miscellaneous methods (rfcp, GUC) • 13 additional modules implement 13 more methods, but are not tested regularly • The relevant functionality is however tested through the SRM front-ends above

  9. More details about globalSuite • It also includes • RFIO suites for standard and DPM-like transfers • A standard transfer is: • rfcp stand.flxb1706S1 lxb1727.cern.ch:/tmp/grodid/fil135732S1 • A DPM-like transfer is: • rfcp some.lxb1706 /dpm/cern.ch/home/dteam/ggtglobrfil135732 • GridFTP suites for standard and DPM-like transfers • A GFAL suite merging 6 different methods in one C module • Two lcg-util commands (lcg-cr and lcg-gt) • The log file is rather extensive • For each command (C module call), it displays • A short help text about the module call • The full command line actually used, with every argument • The output of the command and a timestamp • The status and duration of the command • This makes it possible to dig into the server log files to spot the origin of a failure when required :-) (a sketch of such a logging wrapper follows)
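
  The sketch below shows, in hypothetical Perl (not the actual driver code), what such a logging wrapper can look like: for each call it prints a short help text, the full command line, a timestamp, the command output, and the status and duration. The rfcp destinations reuse the style of the examples above but with made-up file names, and the first failure breaks the flow, as in the real suites.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use POSIX qw(strftime);

    # Run one step and log: help text, full command line, timestamp,
    # command output, exit status and duration. Returns the exit status.
    sub run_step {
        my ($help, @cmd) = @_;
        print  "== ", $help, "\n";
        print  "== CMD : @cmd\n";
        print  "== TIME: ", strftime("%Y-%m-%d %H:%M:%S", localtime), "\n";
        my $start  = time;
        my $output = `@cmd 2>&1`;
        my $status = $? >> 8;
        print  $output;
        printf "== STATUS: %d  DURATION: %d sec.\n\n", $status, time - $start;
        return $status;
    }

    # Two rfcp transfers in the style of the examples above (made-up names).
    run_step("standard RFIO transfer to a remote /tmp area",
             "rfcp", "local.file", "lxb1727.cern.ch:/tmp/somefile")
        and die "standard rfcp failed, aborting\n";
    run_step("DPM-style RFIO transfer into the DPM namespace",
             "rfcp", "local.file", "/dpm/cern.ch/home/dteam/somefile")
        and die "DPM rfcp failed, aborting\n";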

  10. What is not covered by these tests? • The DPNS is not tested per se, only indirectly • The LFC is not covered either, only briefly through indirect commands • The LCG-UTIL package is not tested per se • Only a few commands which involve the DPM back-end are exercised • Mostly to check that the DPM is smoothly integrated into the Information System • However, the GFAL package is tested extensively • In its current version, connected to the SRMv1 back-end • The relevant test module is recompiled in place during the tests • Recompilation is part of the test

  11. Where to find the source code? • Everything is available in the official CVS • http://glite.cvs.cern.ch:8180/cgi-bin/glite.cgi/LCG-DM/ • Merged within the LCG-DM package • With heavy dependencies on other DPM code • Useful directories are: • LCG-DM/socket/dpmcli • LCG-DM/test/dpm, LCG-DM/test/srmv1, LCG-DM/test/srmv2 • The tests are not packaged in any RPM • The last commit includes all the material required for testing up to DPM-1.5.10 • The latest released version • Nothing about SRMv2.2 has been committed yet • It should come along with DPM-1.6.x

  12. How to build the test stuff? • Here are the commands required to set up the DPM testing area: • Log on to lcg-build-sl3 (ask LP about it if you are not allowed yet) • cd to an AFS public area of yours • Point your CVS environment to the new lcgware area (glite.cvs.cern.ch:/cvs/glite) • - cvs checkout -r LCG-DM_R_1_5_10 LCG-DM • - cd LCG-DM • - setenv LIBRARY_PATH /opt/lcg/lib • - setenv LD_LIBRARY_PATH /opt/globus/lib • - make -f Makefile.ini Makefiles • - make clobber • - make • - cd socket • - make • - cd ../test • - make • The main suite Perl script is: LCG-DM/test/dpm/globalSuite • It should run out of the box :-) • Ex: globalSuite node-name [globus|voms|vomsR]

  13. Stress testing • For the socket, srmv1 & srmv2 suites, an upper layer was built to allow stress testing one specific server type at a time • It launches several tens (from 10 up to 40-50) of generic suites of the selected type in one shot and runs them in parallel (see the sketch below) • By the way, it also stresses the UI, not only the target servers :-) • This type of test was very useful for spotting weaknesses in inter-server communication • It is not advisable to submit more than 50 suites at the same time • The TCP stack on the target node would be "overflowed" • One can submit these stress tests from several UIs at a time • But the 50-suite limit still has to be respected • In addition, a single UI seems sufficient to saturate the target node • The DPM servers often need to be restarted after such a bombardment • Debugging server logs after such a storm is rather painful… ;-) • This is not required for functional testing • The relevant Perl drivers are: socketStress, srmv1Stress & srmv2Stress
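
  A minimal, hypothetical Perl sketch of such a stress launcher is shown below; it is not the actual socketStress/srmv1Stress/srmv2Stress code. It simply forks N copies of a generic suite against the same target node, warns above the 50-copy limit mentioned above, and waits for all of them to finish; the suite name, node name and log file names are placeholders.

    #!/usr/bin/perl
    # Launch N copies of a generic suite in parallel against one target node.
    use strict;
    use warnings;

    my ($suite, $node, $n) = @ARGV;
    die "Usage: $0 suite-name node-name nb-copies\n"
        unless $suite && $node && $n;
    warn "more than 50 parallel suites is not advisable\n" if $n > 50;

    my @pids;
    for my $i (1 .. $n) {
        my $pid = fork;
        die "fork failed: $!\n" unless defined $pid;
        if ($pid == 0) {                 # child: run one copy of the suite
            exec "./$suite $node > stress.$i.log 2>&1";
            die "exec failed: $!\n";
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;            # wait for all copies to finish
    print "All $n copies of $suite completed\n";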

  14. Are these tests tested? • They have been run against • My own installation nodes, plus lxdpm01 • Most of the CTB DPM nodes, at various times (lxb1921 currently) • Very useful for spotting misconfiguration issues • The LAL-Orsay DPM site (GRIF), which is in addition a multi-domain installation • It is usually not a problem to target a remote DPM, once the firewall issues have been sorted out • QUESTIONS?
