
Scientific Data Management at Diamond Light Source



Presentation Transcript


  1. Scientific Data Management at Diamond Light Source Nick Rees with slides from Alun Ashton (Group Leader – Data Analysis Software) and Dave Corney (Head of Systems Division, STFC Science & Computing Division)

  2. Beamline Development • 7 Phase 1 beamlines • 15 Phase 2 beamlines • 10 Phase 3 beamlines

  3. Beamline standard DLS computing
  • Standard acquisition structure (EPICS -> GDA)
  • Linux environment – RH6
  • Central support groups with specialist skills
  • Central computer resources
  • Standard data and user structure
    • User -> Proposal -> Visit
    • All data is collected into a ‘visit’
    • All data collected is read-only
    • Each user uniquely identifies themselves
  • Standard archiving

  4. Diamond Raw Data Rates

  5. A facility site scientific data management process.

  6. Unique set of user credentials

  7. Authentication

  8. Today at Diamond
  • Each user has a unique login ID (FedID), issued when beamtime is allocated and valid for at least 2 years to cover future beamtime.
  • The FedID is shared between all STFC and Diamond facilities
    • Including Windows, Linux and many web page logins
  • Passwords are difficult to remember, and some say the FedID is even harder to remember.
  • Wouldn't it be nice if I could use my university ID, or not have to log in at all?

  9. Many emerging education and facility Single Sign On initiatives
  • eduroam
    • World-wide RADIUS infrastructure for education and research organisations
    • Provides physical network access for participants within the network
  • Umbrella
    • Web-based authentication system based on Shibboleth single-sign-on technology
    • Rolling out to several European synchrotrons (PSI, ILL, ESRF, ISIS, DLS)
  • Moonshot
    • JANET (UK NREN) initiative for federating network logins
    • Intends to combine the simplicity of RADIUS with the flexibility of Shibboleth
    • Expected to become an Internet standard within the next 2 years; pilot phase now running

  10. Authorisation

  11. Visits and proposals
  • Proposal: <experiment type><number>, e.g. si325
  • Visit number is <proposal>-<visit>, e.g. si325-1
  • Some proposals have more than 100 visits
  • The experiment type reflects the ‘usual’ kind of experiment for the beamline/village grouping
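The naming scheme above can be sketched as a small parser. This is only an illustration of the convention described on the slide; the regex and field names are assumptions, not Diamond's actual tooling:

```python
import re

# Visit IDs combine a proposal code and a visit number, e.g. "si325-1".
# Pattern assumed from the naming scheme:
#   <experiment type letters><proposal number>-<visit number>
VISIT_RE = re.compile(r"^(?P<exp_type>[a-z]+)(?P<number>\d+)-(?P<visit>\d+)$")

def parse_visit(visit_id: str) -> dict:
    """Split a visit ID like 'si325-1' into its parts."""
    m = VISIT_RE.match(visit_id)
    if m is None:
        raise ValueError(f"not a valid visit ID: {visit_id!r}")
    return {
        "experiment_type": m.group("exp_type"),                # e.g. "si"
        "proposal": m.group("exp_type") + m.group("number"),   # e.g. "si325"
        "visit_number": int(m.group("visit")),                 # e.g. 1
    }
```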

  12. Permissions and location of the data
  • Data collected on a beamline is always stored in:
    • /dls/<beamline>/data/<year>/<visit-number>
  • The directory is automatically generated before beamtime
  • Only the PI(s) of the PROPOSAL and the users on the VISIT have permission to SEE the data stored in the visit area
    • This ensures data integrity and privacy
  • Data remains accessible on Diamond disks for ~180 days
  • Access can be restored on request to PBS
  • One sub-directory remains writable but is not archived
  • Archived data can also be retrieved from a web portal
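The directory convention above can be expressed as a one-line helper. The path layout comes from the slide; the function name is hypothetical:

```python
def visit_path(beamline: str, year: int, visit_id: str) -> str:
    """Return the standard Diamond data directory for a visit,
    following /dls/<beamline>/data/<year>/<visit-number>."""
    return f"/dls/{beamline}/data/{year}/{visit_id}"
```

For example, visit si325-1 on beamline i02 in 2012 would live under /dls/i02/data/2012/si325-1.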

  13. Local computing and files

  14. Computing resources
  • Network path: Detector Computer -> Beamline Switch (10 Gbit/s and 1 Gbit/s links) -> Central Switch (2x10 Gbit/s)
  • Central computing: 40 x 8-core nodes, 40 x 12-core nodes, 56 GPU cards, 16 new GPU cards
  • Central file systems: 940 TB and 427 TB usable; ~15 GB/s and ~5 GB/s read and write
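As a back-of-envelope check on these numbers, a 10 Gbit/s detector link corresponds to 1.25 GB/s, comfortably below the ~5 GB/s file-system write bandwidth quoted above (a sketch assuming decimal units and ignoring protocol overhead):

```python
def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert a link rate in Gbit/s to GB/s (8 bits per byte, decimal units)."""
    return gbit_per_s / 8.0

detector_link = gbit_to_gbyte_per_s(10)   # 1.25 GB/s per detector link
uplink = gbit_to_gbyte_per_s(2 * 10)      # 2.5 GB/s over the 2x10 Gbit/s uplink
fs_write = 5.0                            # GB/s write bandwidth, from the slide
# Even a fully saturated beamline uplink stays below the write bandwidth.
```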

  15. Transferring Data
  • Users are encouraged to copy data onto USB drives or equivalent during the experiment
  • Data Dispenser
    • Dedicated machine on the beamline with sockets
    • Web interface to facilitate the process
  • Users can ‘push’ data off Diamond disks to their own hosts
  • rsync
  • Early attempts with iRODS and pushing to other user-hosted frameworks
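An rsync pull like the one mentioned above might be driven as follows. The host name and paths here are illustrative placeholders, not Diamond's actual endpoints:

```python
import subprocess

def build_rsync_cmd(visit_dir: str, dest: str,
                    host: str = "example-login.facility.ac.uk") -> list:
    """Build an rsync command to pull a visit directory to a local destination.
    -a preserves permissions and timestamps, -v is verbose, -z compresses
    data in transit. The trailing slash copies the directory's contents."""
    return ["rsync", "-avz", f"{host}:{visit_dir}/", dest]

cmd = build_rsync_cmd("/dls/i02/data/2012/si325-1", "./si325-1")
# subprocess.run(cmd, check=True)  # uncomment to actually run the transfer
```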

  16. File Formats
  • Diamond has a policy of standardising, where feasible, on file formats; the chosen format is NeXus/HDF5
  • Beamlines surveyed: I02, I03, I04, I04-1, I05, I06, I07, I09, I10, I11, I12, I13, I15, I16, I18, I19, I20, I22, I24, B16, B18, B21, B22, B23
  • Green: predominantly using NeXus. Orange: mixed NeXus and other formats, or considering NeXus in the next 12 months
  • Files can be generated by the Detector, EPICS or Data Acquisition

  17. NeXus – http://www.nexusformat.org
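NeXus organises data as a hierarchy of typed groups inside an HDF5 file. The following sketch models that layout with nested dictionaries standing in for HDF5 groups (in a real file, written with a library such as h5py, each group carries an NX_class attribute). The group names follow the NeXus base classes; this is a structural illustration only, not Diamond's actual file layout:

```python
# Minimal sketch of a NeXus-style hierarchy using nested dicts in place of
# HDF5 groups. Comments show the NX_class each group would carry.
def minimal_nexus_entry(title: str, counts: list) -> dict:
    return {
        "entry": {                      # NX_class = "NXentry"
            "title": title,
            "instrument": {             # NX_class = "NXinstrument"
                "detector": {           # NX_class = "NXdetector"
                    "data": counts,
                },
            },
            "data": {                   # NX_class = "NXdata"; in a real file
                "data": counts,         # this would be a link to detector data
            },
        }
    }
```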

  18. Diamond Sample Management

  19. Diamond Sample Management
  • Sample type, not sample instance
  • Sample instance, locations, shipping, compositions, structures, plans...

  20. A facility site scientific data management process.

  21. Diamond Data Archive
  • Provided by STFC Scientific Computing Division
  • A copy of >99% of all Diamond data, and of data processed at Diamond, is held on tape; 80% of it is available from https://icat.diamond.ac.uk
  • Data copying is triggered by data acquisition
  • Data is copied from visit directories, with a few exclusion rules
  • Potential to publish data with a DOI soon (initially for Diamond staff data)
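The "copied with a few exclusion rules" step could be sketched as a pattern filter. The exclusion patterns below are invented for illustration; the slide does not say what Diamond actually excludes:

```python
import fnmatch

# Hypothetical exclusion patterns -- the real rules are not given in the slide.
EXCLUDE = ["*.tmp", "processing/*", ".snapshot/*"]

def should_archive(relative_path: str) -> bool:
    """Return True if a file inside a visit directory should be copied to tape."""
    return not any(fnmatch.fnmatch(relative_path, pat) for pat in EXCLUDE)
```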

  22. STFC Data Storage and Management Technologies
  • Data management and archive systems:
    • CASTOR (HSM, Oracle, scalable, Tier-1)
    • DMF (HSM, NFS access via LAN)
    • Atlas Data Store (dark archive)
  • Tape libraries:
    • 2 x SL8500 10,000-slot robots
    • 64 tape drives: 9940B (0.2 TB), T10KA (0.5 TB, ~£200K/PB), T10KB (1 TB, ~£100K/PB), T10KC (5 TB, ~£30K/PB)
    • Total of ~7 PB tape storage; potential capacity 100 PB
  • ‘User’ technologies: Storage-D, Storage Resource Broker, iRODS, Storage Resource Manager, ICAT, TopCAT

  23. A Common set of Components
  • Independent components that can interface with each other
  • Can be adopted and adapted into different situations and different services
  • ICAT, for example, is used by many facilities across Europe

  24. Secure access to users’ data
  • Flexible data searching
  • Scalable and extensible architecture
  • Integration with analysis tools
  • Access to high-performance resources
  • Linking to other scientific outputs
  • Data policy aware
  Central Facility Example: ISIS
  • Example proposals: H2-(zeolite) vibrational frequencies vs polarising potential of cations; B-lactoglobulin protein interfacial structure
  • GEM – high-intensity, high-resolution neutron diffractometer
  • Proposal: once awarded beamtime at ISIS, an entry is created in ICAT that describes your proposed experiment
  • Experiment: data collected from your experiment is indexed by ICAT (with additional experimental conditions) and made available to your experimental team
  • Publication: using ICAT you can also associate publications with your experiment, and even reference data from your publications
  • Analysed data: you can upload any desired analysed data and associate it with your experiments

  25. Accessing the archive
  • Mounted on /dls/archive/$Beamline/$year/$visit at Diamond, and now (in testing) at other central computing facilities

  26. Offline Analysis at Diamond
  • Remote desktop service for reprocessing data at Diamond
  • Uses the off-the-shelf NX solution for remote desktop (www.nomachine.com)
  • 4 x 8-CPU clustered servers

  27. Summary of scientific data management at DLS
  • A common platform (using industry-standard software) for data acquisition and scientific software development
  • This facilitates the exchange of ideas and implementations across beamlines and sciences, and enables collaborations with other facilities
  • Exploiting new ways of handling and recording data to maximise software and hardware capabilities
  • Dependent on reliable and productive beamlines and a solid computing and network infrastructure
