Presentation Transcript


  1. Paul Scherrer Institut
     Timo Korhonen
     Improvements to Indexing Tool (Channel Archiver)
     EPICS Meeting, BNL 2010

  2. Channel Archiver at PSI
     • Currently four different archive servers are in use.
     • SLS accelerator data: slsmcarch (machine archive server; HP, Xeon quad-core 2.66 GHz, 32 GB RAM)
       • Long Term: since January 2001; 10314 channels; 70 GB
       • Medium Term: 6 months; 66883 channels; 120 GB
       • Short Term Archiver: 14 days; 70381 channels; 114 GB
       • Post Mortem Archiver: stores the "famous last words"
       • Total available disk space for data: 500 GB
     • SLS beamline data: slsblarch (beamline archive server; HP, AMD Opteron dual-core 1.8 GHz; 6 GB RAM)
       • Long and short term archivers for every beamline (29 engines in total)
       • Short term archivers store data for up to 12 months
       • Total amount of data: 163 GB / 384 GB

  3. Channel Archiver at PSI
     • Archive servers (continued)
     • PSI (office) data: gfaofarch
       • Long Term Archiver: stores data since January 2006
       • Medium and Short Term Archivers
     • ZHE Cyclotron High Energy
       • Long term (since April 2008)
       • Medium and short term
     • SwissFEL: felarch1 (HP, quad-core 2.66 GHz, 10 GB RAM)
       • Small test stand OBLA
         • 638 channels, 2.1 terabytes!
         • Waveforms, images
       • FIN250 test injector
         • LT, MT and ST (0.6, 7.9 and 464 GB)

  4. Channel Archiver at PSI
     • The archive engines are running stably.
     • The problems we have had are on the retrieval side.
     • Indexing is used to speed up retrieval (see the sketch after this slide):
       • Indexes on daily files
       • Master index over the whole archived data
     • We need the performance:
       • The SwissFEL test machine is going to produce a lot of data (waveforms, images).
       • We need to archive more than in a production machine.
     • For us, there is no need for (immediate) change:
       • We would like to keep the Channel Archiver going (updates, bugfixes).
     • Retrieval tools:
       • A waveform viewer etc. have been developed.
       • Matlab export would be welcome.
     • Indexing tools need work.
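
The two-level layout above can be pictured as a master index that maps each channel to the daily sub-indexes covering parts of its time range, so a retrieval only opens the files that can contain the requested interval. The following C++ sketch is illustrative only; all names (MasterIndex, SubIndexRef, TimeRange) are hypothetical and not the actual Channel Archiver API.

```cpp
// Illustrative sketch: a master index mapping each channel to the
// daily sub-index files that cover parts of its time range.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct TimeRange {
    std::int64_t begin;  // epoch seconds, inclusive
    std::int64_t end;    // epoch seconds, exclusive
};

struct SubIndexRef {
    std::string file;    // e.g. a per-day index file
    TimeRange   covers;  // interval covered by that daily index
};

class MasterIndex {
public:
    void add(const std::string& channel, const SubIndexRef& ref) {
        entries_[channel].push_back(ref);
    }
    // Return only the daily indexes that overlap the query interval,
    // so retrieval never touches files outside the requested range.
    std::vector<SubIndexRef> lookup(const std::string& channel,
                                    const TimeRange& q) const {
        std::vector<SubIndexRef> hits;
        auto it = entries_.find(channel);
        if (it == entries_.end()) return hits;
        for (const auto& ref : it->second)
            if (ref.covers.begin < q.end && q.begin < ref.covers.end)
                hits.push_back(ref);
        return hits;
    }
private:
    std::map<std::string, std::vector<SubIndexRef>> entries_;
};

int main() {
    MasterIndex master;
    master.add("SLS:CURRENT", {"day1/index", {1000, 2000}});
    master.add("SLS:CURRENT", {"day2/index", {2000, 3000}});
    for (const auto& ref : master.lookup("SLS:CURRENT", {1500, 2500}))
        std::cout << ref.file << '\n';  // both daily files overlap
}
```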

  5. Index Tool improvements
     • Background:
       • The ArchiveIndexTool is run at PSI every week, in the night from Saturday to Sunday, to create master indexes for the medium-term archive.
       • Indexing is essential for good retrieval performance.
       • The tool produces many errors when run on the EPICS archive indices to create or update the master index.
     • Disclaimer: I know very little about this; I am only relaying what the people who work on it have reported.
     • People involved:
       • Gaudenz Jud (archiver maintenance, operation and development)
       • Hans-Christian Stadler (PSI IT, Scientific Computing) is investigating the issue together with Gaudenz.

  6. Index Tool improvements
     • Findings so far, after investigating an error log:
       • From the code it is clear that the ArchiveEngine and the ArchiveIndexTool are not supposed to be used concurrently on the same indices.
       • Running them concurrently does produce errors, but not the ones we see in production.
       • The errors seem to occur only on the production machine, when the load and disk activity are high.
     • Quick fix attempt: a retry mechanism at the highest level, in which all index files are closed and reopened after a delay (sketched below). So far, this quick fix seems to work.
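
A minimal sketch of the quick fix described above, assuming it behaves as reported: when an index operation fails, all index files are closed, the tool waits, reopens them, and retries a bounded number of times. The Index type and its methods are placeholders, not the real ArchiveIndexTool classes.

```cpp
// Sketch of a highest-level retry mechanism: close all index files,
// wait, reopen, and try the update again a bounded number of times.
#include <chrono>
#include <iostream>
#include <stdexcept>
#include <thread>

struct Index {
    void closeAll()  { /* close every open index file */ }
    void reopenAll() { /* reopen them from disk */ }
    void update()    { /* merge sub-indices into the master index */ }
};

bool updateWithRetry(Index& index, int maxRetries,
                     std::chrono::seconds delay) {
    for (int attempt = 0; attempt <= maxRetries; ++attempt) {
        try {
            index.update();
            return true;                    // success
        } catch (const std::exception& e) {
            std::cerr << "index update failed: " << e.what()
                      << ", retrying after " << delay.count() << "s\n";
            index.closeAll();               // drop possibly stale state
            std::this_thread::sleep_for(delay);
            index.reopenAll();              // start from a fresh view
        }
    }
    return false;                           // give up after maxRetries
}

int main() {
    Index index;
    if (!updateWithRetry(index, 3, std::chrono::seconds(5)))
        std::cerr << "master index update failed permanently\n";
}
```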

  7. Observations:
     • The RTree implementation does not allow concurrent read/write access. It might be possible to arrange the file operations in a way that allows concurrent access when the index is stored on a strictly POSIX-compliant file system.
     • The RTree implementation has an RTree node "cache" that only grows; nodes are never evicted from the cache. I am implementing a new LRU node cache with a fixed number of entries to see whether this reduces the system load (sketched below).
     • The RTree implementation uses many small disk operations. A reimplementation should use large disk transfers.
     • The RTree implementation is like a B-tree, but it does not adjust the node size to the disk sector size for improved I/O performance.
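
A minimal sketch of a fixed-capacity LRU node cache of the kind mentioned above, keyed by the file offsets of RTree nodes: the least recently used node is evicted when the cache is full, so the cache no longer grows without bound. The types and the interface are assumptions for illustration, not the actual RTree code.

```cpp
// Fixed-capacity LRU cache for RTree nodes, keyed by file offset.
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

using Offset   = std::uint64_t;
using NodeData = std::vector<unsigned char>;  // raw node bytes

class LRUNodeCache {
public:
    explicit LRUNodeCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns the cached node, or nullptr on a miss.
    const NodeData* get(Offset off) {
        auto it = map_.find(off);
        if (it == map_.end()) return nullptr;
        // Move the entry to the front: it is now most recently used.
        order_.splice(order_.begin(), order_, it->second);
        return &it->second->second;
    }

    void put(Offset off, NodeData node) {
        if (auto it = map_.find(off); it != map_.end()) {
            it->second->second = std::move(node);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (map_.size() >= capacity_) {       // evict least recently used
            map_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(off, std::move(node));
        map_[off] = order_.begin();
    }

private:
    std::size_t capacity_;
    std::list<std::pair<Offset, NodeData>> order_;  // MRU at the front
    std::unordered_map<Offset, decltype(order_)::iterator> map_;
};

int main() {
    LRUNodeCache cache(2);            // room for two nodes
    cache.put(0,  NodeData{1, 2, 3});
    cache.put(64, NodeData{4, 5, 6});
    cache.get(0);                     // touch node 0: now most recent
    cache.put(128, NodeData{7});      // evicts node 64, not node 0
}
```

A real implementation would additionally have to pin dirty nodes until they are written back to disk before eviction.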

  8. Observations (continued):
     • The RTree implementation is not optimal for the use case seen at SLS, where data is inserted only at the end. This leads to a reduced fill level of the nodes: the RTree maintains the invariant that only the root node may be filled to less than half. In addition, data is moved between nodes too often, leading to many random accesses on disk. A reimplementation should feature a data structure that is optimal for appends at the end (see the sketch below).
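
One way to picture the suggested append-friendly layout, under the assumption that entries arrive in time order: new entries always go into the rightmost leaf, which is packed completely full before a fresh leaf is started, so nodes stay near 100% full and writes remain sequential. Entry, the node size, and the class are illustrative assumptions, not a design from the talk.

```cpp
// Sketch of an append-optimized index: time-ordered entries are
// packed into the rightmost leaf until it is full; nothing is ever
// moved between nodes, so there are no half-full splits and no
// random disk accesses on insert.
#include <cstdint>
#include <iostream>
#include <vector>

struct Entry {
    std::int64_t  time;    // monotonically increasing timestamp
    std::uint64_t offset;  // position of the sample block in the data file
};

class AppendIndex {
public:
    explicit AppendIndex(std::size_t entriesPerNode)
        : entriesPerNode_(entriesPerNode) { leaves_.emplace_back(); }

    void append(const Entry& e) {
        if (leaves_.back().size() == entriesPerNode_)
            leaves_.emplace_back();  // rightmost leaf full: open a new one
        leaves_.back().push_back(e); // strictly sequential write pattern
    }

    double fillLevel() const {       // all leaves except the last are full
        std::size_t used = 0;
        for (const auto& leaf : leaves_) used += leaf.size();
        return double(used) / double(leaves_.size() * entriesPerNode_);
    }

private:
    std::size_t entriesPerNode_;
    std::vector<std::vector<Entry>> leaves_;
};

int main() {
    AppendIndex idx(64);             // e.g. 64 entries per node
    for (std::int64_t t = 0; t < 1000; ++t)
        idx.append({t, std::uint64_t(t) * 40});
    std::cout << "fill level: " << idx.fillLevel() << '\n';  // near 1.0
}
```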

  9. Conclusions so far:
     • Finding the real reason for the errors is a time-consuming process; it has not yet been identified.
       • The offsets to data structures in the index get corrupted, but it is not clear where.
       • Because the corruption only happens when the load on the production system is high, logical errors in the normal execution path can almost certainly be excluded.
     • The experience so far suggests that a new implementation of the RTree code could solve a number of problems.

  10. Thank you for your attention!
