
Distributed Computing for Crystallography


Presentation Transcript


  1. STFC Distributed Computing for Crystallography • experiences and opportunities • Dr. Kenneth Shankland & Tom Griffin • ISIS Facility • CCLRC Rutherford Appleton Laboratory

  2. Background – Parallel Computing Options • Supercomputer: expensive; state of the art; good results; dedicated • Cluster: cheaper; can easily expand; dedicated • Distributed: cheaper; capacity increases with time; can expand; not dedicated; many separate machines • Grid: easy to use; can easily expand; may be dedicated

  3. Spare cycles concept • Typical PC usage is about 10% • Usage minimal after 5pm • Most desktop PCs are really fast • Can we use (“steal?”) unused CPU cycles to solve computational problems?

  4. Suitable apps • CPU Intensive • Low to moderate memory use • Not too much file output • Coarse grained • Command line / batch driven • Licensing issues

  5. The United Devices GridMP System • Server hardware • Two dual-Xeon 2.8 GHz servers with RAID 10 • Software • Servers run Red Hat Linux Advanced Server / DB2 • Unlimited Windows (and other) clients • Programming • Web Services interface – XML, SOAP • Accessed with C++ and Java • Management Console • Web-browser based • Can manage services, jobs, devices, etc. • Large industrial user base • GSK, J&J, Novartis, etc.

  6. GridMP Platform Object Model (diagram) • Example application: Docking • Programs: GOLD 2.0 and Ligandfit • Program modules hold the platform-specific executables: gold20win.exe (Windows), gold20_rh.exe (Linux) • The job MyDockTest pairs the ligand and protein datasets: molecule 1…m × protein 1…n
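
A minimal Python sketch of how the hierarchy in this diagram might be represented; the class and field names below are illustrative stand-ins for reading the figure, not the GridMP API.

# Illustrative sketch only: these dataclasses mirror the hierarchy in the
# diagram above (Application -> Program -> platform-specific Program Module;
# Job -> Datasets). They are NOT the GridMP API, just a way to read the figure.
from dataclasses import dataclass, field
from typing import List, Dict

@dataclass
class ProgramModule:
    platform: str        # e.g. "Windows" or "Linux"
    executable: str      # e.g. "gold20win.exe" or "gold20_rh.exe"

@dataclass
class Program:
    name: str                                   # e.g. "GOLD 2.0" or "Ligandfit"
    modules: List[ProgramModule] = field(default_factory=list)

@dataclass
class Job:
    name: str                                   # e.g. "MyDockTest"
    program: Program
    datasets: Dict[str, List[str]]              # e.g. {"ligands": [...], "proteins": [...]}

gold = Program("GOLD 2.0", [ProgramModule("Windows", "gold20win.exe"),
                            ProgramModule("Linux", "gold20_rh.exe")])
docking_job = Job("MyDockTest", gold,
                  {"ligands": ["molec 1", "molec m"],
                   "proteins": ["protein 1", "protein n"]})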

  7. Adapting a program for GridMP • Fairly easy to write • Interface to grid via Web Services • So far used: C++, Java, Perl, C# (any .Net language) • Think about how to split your data • Wrap your executable • Write the application service • Pre and Post processing • Use the Grid
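
The bullets above amount to a split / wrap / submit / collate pattern. A rough Python sketch of that pattern follows; the grid object and its method names are hypothetical stand-ins for the GridMP Web Services calls, which in practice were driven over XML/SOAP from C++, Java, Perl or C#.

# Illustrative pattern only: `grid` stands for a client of the GridMP Web
# Services interface and its method names here are hypothetical.

def split_input(path, n):
    """Pre-processing sketch: split an input file into n roughly equal line chunks."""
    lines = open(path).read().splitlines()
    step = max(1, len(lines) // n)
    return [lines[i:i + step] for i in range(0, len(lines), step)]

def merge_results(per_workunit_results):
    """Post-processing sketch: collate the per-workunit results."""
    return [r for chunk in per_workunit_results for r in chunk]

def run_on_grid(grid, executable, input_file, n_pieces):
    pieces = split_input(input_file, n_pieces)           # think about how to split your data
    module_id = grid.upload_program_module(executable)   # wrap your executable (hypothetical call)
    job_id = grid.create_job(module_id)                  # application service creates the job
    for piece in pieces:
        grid.add_workunit(job_id, piece)                 # one workunit per data piece
    grid.start_job(job_id)
    return merge_results(grid.get_result(job_id, w)      # collect and collate when finished
                         for w in grid.workunits(job_id))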

  8. Package your executable • The executable, its DLLs, standard data files and environment variables are bundled into a program module executable • Optionally compressed and/or encrypted • Uploaded to, and resident on, the server
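
As a rough illustration of what such a bundle contains, the sketch below zips an executable together with its DLLs, standard data files and an environment-variable manifest; the file names and manifest layout are assumptions for this sketch, not the actual GridMP package format.

# Rough illustration of bundling a program module before upload.
# The manifest layout and file names are assumptions, not GridMP's format.
import json
import zipfile

def package_module(exe, dlls, data_files, env_vars, out="module.zip"):
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as pkg:  # "Compress?"
        pkg.write(exe)                        # the executable itself
        for f in dlls + data_files:           # DLLs and standard data files
            pkg.write(f)
        pkg.writestr("environment.json",      # environment variables
                     json.dumps(env_vars, indent=2))
    return out

# e.g. package_module("gold20win.exe", ["some.dll"], ["params.dat"],
#                     {"GOLD_DIR": r"C:\gold"})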

  9. Create / run a job (diagram) • Client side: the datasets (e.g. proteins and molecules) are uploaded as packages (Pkg1–Pkg4) over https • Server side: create the job and generate the cross product of the datasets to give the workunits • Start the job
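
The central step is the cross product: one workunit per combination of items from the two datasets. A minimal sketch, assuming a docking-style job with protein and molecule files (names are placeholders):

# Minimal sketch of the server-side "generate cross product" step:
# one workunit per (protein, molecule) pair. File names are placeholders.
from itertools import product

proteins = ["protein_1.pdb", "protein_2.pdb"]
molecules = ["molec_1.mol2", "molec_2.mol2", "molec_3.mol2"]

workunits = [{"protein": p, "molecule": m} for p, m in product(proteins, molecules)]
print(len(workunits))   # 2 proteins x 3 molecules = 6 workunits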

  10. Job execution

  11. Current status at ISIS • 218 registered devices • 321 total CPUs • Potential power ~300Gflops (cf HPCx @ 500Gflops)

  12. Application: Structures from powders • CT-DMF2: solvated form of a polymorphic pharmaceutical from a crystallisation screen • a = 12.2870(7) Å, b = 8.3990(4) Å, c = 37.021(2) Å, β = 92.7830(10)° • V = 3816.0(4) Å³ • P21/c, Z′ = 2 (Nfragments = 6) • (Figure: asymmetric unit)

  13. DASH • Optimise molecular models against diffraction data • Multi-solution simulated annealing • Execute a number of SA runs (say 25), pick the best one
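
The multi-solution strategy is straightforward to express: launch N independent SA runs from different random seeds and keep the solution with the best figure of merit (in DASH, a profile chi-squared). The sketch below shows the pattern only; run_sa is a hypothetical stand-in for a single DASH SA run.

# Pattern sketch of multi-solution simulated annealing as described above.
# `run_sa` is a hypothetical stand-in for one DASH SA run; it is assumed to
# return (profile_chi2, solved_structure) for a given random seed.
import random

def best_of_n_sa(run_sa, n_runs=25):
    best_chi2, best_structure = float("inf"), None
    for seed in random.sample(range(10**6), n_runs):   # independent seeds
        chi2, structure = run_sa(seed)
        if chi2 < best_chi2:                           # keep the best run
            best_chi2, best_structure = chi2, structure
    return best_chi2, best_structure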

  14. Grid adapt – straightforward • Run GUI DASH as normal up to the SA run point, creating the .duff file • Submit SA runs to the grid from your own PC: c:\> dash-submit famot.grd (uploading data to server… your job_id is 4300) • Retrieve and collate SA results from the grid to your own PC: c:\> dash-retrieve 4300 (retrieving job data from server… results stored in famot.dash) • View results as normal with DASH

  15. Example of speedup • Execute 80 SA runs on famotidine with the number of SA moves set to 4 million • Elapsed time 6 hrs 40 mins (400 mins) on a 2.8 GHz P4 • Elapsed time on the grid: 27 mins • Speedup factor = 400/27 ≈ 15 with only 24 PCs

  16. Think different… or think big… • Z′ = 4 • 72 non-H atoms per asymmetric unit • 13 torsions + disordered benzoate

  17. HMC structure solution < single molecular dynamics trajectory >

  18. Algorithm ‘sweet spot’ Calculations embody ca. 6 months of CPU time. On our current grid, runs would be completed in ca. 20 hours.
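
As a back-of-envelope consistency check (assuming "6 months" means roughly 180 days of single-CPU time, an assumption rather than a quoted figure), the implied number of concurrently busy CPUs is in line with the grid size reported on slide 11:

# Back-of-envelope check of the 'sweet spot' numbers above,
# assuming "6 months" ~ 180 days of single-CPU time (an assumption).
cpu_hours = 180 * 24                      # ~4320 CPU-hours of calculation
elapsed_hours = 20                        # quoted elapsed time on the current grid
print(round(cpu_hours / elapsed_hours))   # ~216 CPUs busy at once, consistent with slide 11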

  19. MD Manager • Slightly different to previous ‘batch’ style jobs • More ‘interactive’

  20. Instrument simulation • Large runs with McStas • The submit program breaks up the neutron count (-n #####) • Uploads a new command line + data + executable for each piece • Parameter scan at fixed neutron count • Each run is sent to a separate machine
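
A minimal sketch of how a submit program might break up such a run, assuming the compiled McStas instrument accepts the neutron count via -n as described above; the instrument name and parameter string are placeholders.

# Sketch of splitting one large McStas run across machines, assuming the
# compiled instrument executable takes the neutron count via -n.
def split_mcstas_run(instrument, total_ncount, n_machines, params=""):
    per_machine = total_ncount // n_machines
    cmds = []
    for i in range(n_machines):
        # give the remainder to the last machine so counts sum correctly
        ncount = per_machine + (total_ncount % n_machines if i == n_machines - 1 else 0)
        cmds.append(f"{instrument} -n {ncount} {params}".strip())
    return cmds

# e.g. split_mcstas_run("hrpd_sim.exe", 10_000_000, 4, "lambda=1.5")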

  21. Full diffraction simulation for HRPD • (Figure: observed and calculated (full MC simulation) profiles with difference curve) • Total compute time 5537 hours = 230 days • Elapsed time on the grid = 2.5 days

  22. Problems / issues • Hardware – very few • Software – a few, but excellent support • Security concerns – encryption and tampering • System administrators are suspicious of us! • End-user obtrusiveness: perceived, real (memory grab with POV-Ray) and unanticipated

  23. Programs in the wild • Analogy with drug trials: a program that runs OK on the test computer pool can still show side effects and program interactions once released to all connected PCs • just as a drug that is safe in clinical trial patients can show side effects and drug interactions in the general population

  24. Top tips • Get a ‘head honcho’ involved early on • Test, test and test again • Employ small test groups of friendly users • Know your application • Don’t automatically dismiss complaints about obtrusiveness
