Channel Archiver Stats & Problems Kay Kasemir, Greg Lawson, Jeff Patton Presented by Xiaosong Geng (ORNL/SNS) March 2008
Channel Archiver • 1997 design goal: Write 10,000 samples/sec • Raw 'write' test on today's computers: >50,000 samples/sec • Used by several EPICS sites worldwide
SNS Stats • Need only ~2,000 samples/sec on average • Configuration for ~80,000 channels split into ~70 sub-archives • … so that maintenance or problems on one sub-archive won't affect others • Means users have to retrieve data from the correct sub-archive • CSS DataBrowser makes that a bit easier (next talk) • Disk space: ~170 GB/month • Keep running into disk space limitations
Data Management Limitations • Difficult and time-consuming • Moving data around requires manual index updates (takes hours, could be days) • Few informational tools • Nothing prevents duplication • Which channels contribute the most to data growth? • Storage only supports "Append new samples" • Removal of selected channels impossible • Removal of older data limited to complete 'sub-archives' • No practical way to use Java or Matlab code to replace original samples with a reduced sample count • … or to insert computed data like daily statistics into the "archive"
Question • Continue to invest in Channel Archiver's data management tools? • End up implementing a relational database? • Why not use existing RDB?
JLab: MySQL Transition (Chris Slominski) • Very promising performance tests! • Limitations by design • Stores every update from the IOC. No 'sampling'. • 'Double' stored as 'float' to save space. • Only small arrays. • Metadata: units only. No limits or precision. No status/severity. • MySQL Issues • Table size is limited • Need one table per channel • Table count is limited • Custom code implements 'clustering' • SQL "DELETE" doesn't free disk space, or is very slow (deleting ONE week of data can take TWO weeks)
SNS Oracle Tests • Basic JDBC test code: up to 8,000 inserts/second via network • Tricks • "Batching" ~500 inserts • "Partitioning" spreads one big "sample" table over disk partitions • Currently one partition per day, automatically added. Details are still being developed. • Expensive, but looks like the way to go • Avoid MySQL workarounds • SNS committed to Oracle anyway, but will need the partitioning license
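To make the "batching" trick concrete, here is a minimal JDBC sketch of accumulating roughly 500 inserts before sending them as one batch in a single transaction. The table and column names (sample, channel_id, smpl_time, float_val) and the connection handling are illustrative assumptions, not the actual SNS schema or test code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

/** Minimal JDBC batching sketch: collect ~500 inserts, then send one batch. */
public class BatchedSampleWriter
{
    private static final int BATCH_SIZE = 500;
    private final Connection connection;
    private final PreparedStatement insert;
    private int batched = 0;

    public BatchedSampleWriter(final String url, final String user, final String password)
        throws Exception
    {
        connection = DriverManager.getConnection(url, user, password);
        // Turn auto-commit off so the whole batch goes out in one transaction
        connection.setAutoCommit(false);
        // Hypothetical table/columns; the real SNS schema may differ
        insert = connection.prepareStatement(
            "INSERT INTO sample (channel_id, smpl_time, float_val) VALUES (?, ?, ?)");
    }

    public void addSample(final int channelId, final Timestamp time, final double value)
        throws Exception
    {
        insert.setInt(1, channelId);
        insert.setTimestamp(2, time);
        insert.setDouble(3, value);
        insert.addBatch();
        if (++batched >= BATCH_SIZE)
            flush();
    }

    public void flush() throws Exception
    {
        if (batched > 0)
        {
            insert.executeBatch();
            connection.commit();
            batched = 0;
        }
    }

    public void close() throws Exception
    {
        flush();
        insert.close();
        connection.close();
    }
}
```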
Great! But what about SLAC? • Reported promising performance results for Oracle-based data storage • Lee Ann Yasukawa, Robert Hall: "Archiving Into Oracle", ICALEPCS 2001 • End of 2004: No more. • What do we need to learn from that?
Basic Sample Table Design • What data types to support? • Time stamp detail, enumerated values, arrays, meta data? • One table per channel (JLab)? • One table per data type (SLAC)? • SNS: One table for all scalar samples • Possibly wasting space, but best to use SQL across various channels of different types • Array elements in additional table
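As a sketch of what "one table for all scalar samples" plus an array-element table could look like, the following Oracle-style DDL is issued through JDBC to match the prototype's Java setting. All table names, column names, and types are illustrative guesses, not the actual SNS design.

```java
import java.sql.Connection;
import java.sql.Statement;

/** Hypothetical DDL: one table for all scalar samples, plus an array-element table.
 *  Names and column types are illustrative guesses, not the actual SNS schema.
 */
public class CreateSampleTables
{
    public static void create(final Connection connection) throws Exception
    {
        final Statement statement = connection.createStatement();
        try
        {
            // One table holds the scalar samples of every channel, whatever its type;
            // nanoseconds kept in a separate column for full time stamp detail
            statement.execute(
                "CREATE TABLE sample (" +
                "  channel_id  NUMBER    NOT NULL," +
                "  smpl_time   TIMESTAMP NOT NULL," +
                "  nanosecs    NUMBER," +
                "  severity_id NUMBER," +
                "  status_id   NUMBER," +
                "  num_val     NUMBER," +          // integer / enumerated values
                "  float_val   BINARY_DOUBLE," +   // floating-point values
                "  str_val     VARCHAR2(120)" +    // string values
                ")");
            // Array elements go into an additional table, one row per element
            statement.execute(
                "CREATE TABLE array_val (" +
                "  channel_id NUMBER    NOT NULL," +
                "  smpl_time  TIMESTAMP NOT NULL," +
                "  seq_nbr    NUMBER    NOT NULL," +
                "  float_val  BINARY_DOUBLE" +
                ")");
        }
        finally
        {
            statement.close();
        }
    }
}
```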
Archive Engine Prototype • Developed in Java • Eclipse/CSS command-line app • Reads existing engine config files • "Scanned" and "monitored" sampling as before • "Disabling"/"enabling" of groups of channels • Scalars, arrays, enumerated samples • Web-server interface for status, similar to the original engine's • Writes into Oracle • Write performance OK for scalar tests • Reaches the ~8,000 samples/sec from original tests
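A rough sketch of how "monitored" sampling can subscribe to a channel via JCA with the JNI context; the channel name and the decision to simply print each update are assumptions for illustration (a real engine would queue the samples for the batched RDB writer instead).

```java
import gov.aps.jca.Channel;
import gov.aps.jca.Context;
import gov.aps.jca.JCALibrary;
import gov.aps.jca.Monitor;
import gov.aps.jca.dbr.DBR_TIME_Double;
import gov.aps.jca.event.MonitorEvent;
import gov.aps.jca.event.MonitorListener;

/** Sketch of a "monitored" channel: subscribe to value updates and log each one. */
public class MonitoredChannelDemo
{
    public static void main(final String[] args) throws Exception
    {
        // JNI context; the prototype found only JNI reliable for >10,000 connections
        final Context context =
            JCALibrary.getInstance().createContext(JCALibrary.JNI_THREAD_SAFE);
        final Channel channel = context.createChannel("Demo:Signal"); // hypothetical PV
        context.pendIO(5.0);

        // Every value change arrives as a MonitorEvent; here we only print it
        channel.addMonitor(DBR_TIME_Double.TYPE, 1, Monitor.VALUE | Monitor.ALARM,
            new MonitorListener()
            {
                public void monitorChanged(final MonitorEvent event)
                {
                    final DBR_TIME_Double dbr = (DBR_TIME_Double) event.getDBR();
                    System.out.println(dbr.getTimeStamp() + " " + dbr.getDoubleValue()[0]);
                }
            });
        context.flushIO();

        Thread.sleep(60 * 1000);  // let monitors arrive for a minute
        context.destroy();
    }
}
```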
Archive Engine Prototype… • JNI vs. CAJ • Only JNI dependably handles many Channel Access connections (>10,000) • JCA/JNI performance improved on Mar. 11: using a "yield" call instead of a 10 msec "sleep" gave roughly a factor of 30 improvement (from 45 seconds down to 1.5 seconds for 1460 non-blocking connection requests on an SNS physics server) • Can run multiple engines, but the RDB enforces that they handle different channels • Arrays are slow • One N-element array comparable to N scalars • Only "Double" type array elements • Need to implement meta data handling (units, limits, …) • Plan is to support MySQL, but "you get what you pay for" • No custom code to handle table size/count limitations • MySQL "Timestamp" only handles seconds, no nanoseconds
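The "yield" vs. "sleep" point can be illustrated with a generic waiting loop; this only shows the general idea behind the factor-of-30 improvement (1460 requests × a fixed 10 ms wait adds up quickly), not the actual JCA/JNI code that was changed.

```java
/** Generic illustration of "yield instead of sleep" while waiting for a condition.
 *  A fixed 10 ms sleep per wait accumulates across many pending requests, while
 *  yielding lets other threads run and retries immediately.
 */
public class WaitLoopDemo
{
    /** Condition we wait for, e.g. "connection request has been queued" */
    private volatile boolean done = false;

    public void waitWithSleep() throws InterruptedException
    {
        while (!done)
            Thread.sleep(10);   // each wait costs at least 10 ms
    }

    public void waitWithYield()
    {
        while (!done)
            Thread.yield();     // give other threads a chance, then retry right away
    }

    public void markDone()
    {
        done = true;
    }
}
```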
Summary • Testing Oracle setups and prototype sampling engine • Oracle space currently only ~30 GB • Data retrieval via CSS DataBrowser • RDB is just another data source • Compared to current SNS ChannelArchiver setup, performance expected to be • A little less • … but sustainable in the long run.