PVSS Oracle archiving

PVSS Oracle archiving • Involved people in the latest developments: • Ronald Putz (ETM) • Chris Lambert (IT-DES) • Eric Grancher (IT-DES) • Lothar Flatz (Oracle consultant) • Luca Canali (IT-PSS) • Manuel Gonzalez (IT-CO) • Nilo Segura (IT-DES) • Piotr Golonka (IT-CO) • Laura Fernandez Nocelo (IT-CO) • Svetozár Kapusta (PH-AIT) BOLOGNA ALICE Week Svetozár Kapusta

History • 2004 • ETM (company developing PVSS) promises to its customers to be able to archive to different flavors of database systems • 2005 • First prototype comes to CERN with many functionality issues, mainly loosing data and low client performance (100 changes/s) • Q4 2005 • Oracle consultant (paid by CERN) comes to ETM to boost client performance • Q1 2006 • IT-CO puts enormous effort to test the new RDB Mgr (Relational DataBase Manager) • insertion rate of 1500 changes/s for a few minutes • Stability issues, memory leak, slow Event Mgr – Data Mgr communication, server shows low performance • Huge client memory consumption (up to 1.5GB RAM) BOLOGNA ALICE Week Svetozár Kapusta

History 2 • April 2006 • Tuning session at CERN with Oracle consultant, 23 clients, 2 node RAC (Real Application Cluster) Oracle server • Many enhancements, new issues discovered • Archive groups not working – partitioning on server instead • ~24 000 changes/s inserted continuously • Bottleneck: Network • May 2006 • Tuning session at CERN with fixed RDB Mgr (also Linux version available), 56 clients, 2 (later 4) node RAC server • lower memory consumption (300MB RAM) • ~56 000 (later 100k) changes/s inserted continuously • Bottleneck: Server BOLOGNA ALICE Week Svetozár Kapusta

History 3 • End of May 2006 • Tuning session at CERN with ETM employee, 150 clients, 6 node RAC server • Schema enhancements, new issues discovered and fixed • ~150 000 changes/s inserted continuously • Last Week & weekend • Tuning session at CERN, 170 clients, 6 node RAC server • ~70 000 changes/s inserted continuously over the weekend • Right Now • The tests with 170 clients still continue until end of the month to tune queries BOLOGNA ALICE Week Svetozár Kapusta

Test Configuration • Client side • mixture of Windows • ALICE DCS lab (~20 PC) bldg 12 • IT-CO lab (~4 PC) bldg 13 • and Linux clients • LHCb (44 PC) Point 8 • LxBatch (120 PC) Computing Center • PVSS 3.1, cumulative patch • patches 167, 203 (available from IT-CO web page) • framework 2.3.6 • data generated with simulator drivers configured to change the values of DPEs (DataPoint Elements) at the required rate. • Server side • Oracle RAC server with 1, 2, 4 and 6 nodes (PCs) respectively • 3 GHZ Dual Processor Xeon, 2GB RAM, Oracle Real Application Cluster 10.2.0.2 on Linux Red Hat Enterprise • Fiber channel RAID10, 32 SATA disks (20 write, 32 read) BOLOGNA ALICE Week Svetozár Kapusta

Performance Improvements BOLOGNA ALICE Week Svetozár Kapusta

Disable job to analyze tables and indexes • PVSS server job running every 3 hours • gathers statistics on objects in the DB to improve performance • degrades performance considerably (one order of magnitude) if ran at inappropriate times • this is a typical task for the DB administrator BOLOGNA ALICE Week Svetozár Kapusta

Disable indexes on TS and element_ID • Indexes are used to boost queries which filter on the timestamp and/or element_ID (unique number for each DP element) • Slowed the server considerably • Consumed a lot of disk space (2/3 of data volume was in the indexes (+extra data in the redo logs)) • Useless for ALICE • Got disabled, let only the primary key (composite index on columns TS and element_ID) • Once the data are inserted, if required the disabled indexes could be enabled • More efficient in CPU and storage usage BOLOGNA ALICE Week Svetozár Kapusta

Use direct path for data insertion instead of merge • Used to move data from the client temporary tables to the history table • Has a big impact on performance • Changes completely the current approach • Previously: check is done before inserting a new value to see if it is a correction value or a new value • Currently: insert everything in the history table and treat the errors afterwards • More efficient since new values will occur much more frequently in comparison to correction values • This way of treating errors can only be used since Oracle 10.2. BOLOGNA ALICE Week Svetozár Kapusta

Index Organized Tables • Data and index data are kept together • Fits the design (many inserts, few deletes) • Transparent for client • Not as performant as INSERT /*APPEND/ BOLOGNA ALICE Week Svetozár Kapusta

INSERT /*APPEND/ • Oracle hint • Don’t search for an empty place, just append it at the end of the table • Boosts performance in high insert rate environments (more than IOT) • But only 1 client can write into 1 table in time BOLOGNA ALICE Week Svetozár Kapusta

Partition of the history table per client • Comes as an implication of the previous point • The history table has to be partitioned • Each client accesses only one partition and doesn't interfere with other clients • Due to the way insert append works • To allow for scalability in a RAC server • Special feature that doesn't come in the standard Oracle version BOLOGNA ALICE Week Svetozár Kapusta

Initial size and extend of tablespaces • It is important to preallocate enough space • Space allocations and extensions were happening very often and penalizing performance • We used the PVSS data driven mechanism to tune the parameters for allocations and extensions BOLOGNA ALICE Week Svetozár Kapusta

Requests & Open Issues • Password has to be introduced in config file in plain text • It is just a workaround to be able to easily start the RDB for the first time • one should rely on the internal DP • Negative system ids • system numbers bigger than 127 appear negative in the database • Problem fixed (new version of patch 203). • Archive groups • We spent a lot of time trying to understand what the clients do when groups are used • We have not been able to make the Archive groups work • Remains an open issue BOLOGNA ALICE Week Svetozár Kapusta

Requests & Open Issues 2 • Queries separated per DPE • One SQL statement, let the DB do the processing • Don’t issue several separated SQL statements with processing in client • ETM needs to modify the dpGetPeriod function (also used by the IT-CO trending tool) • use dpQuery in your code, so PVSS will not split the SQL queries • Memory allocation at OCCI level • OCCI 10.1 has a bug that makes the RDB Mgr crash if the first block sent over OCCI is big • Current workaround: The RDB Mgr starts with a small block size, increased afterwards • Solutions • Put a logic in the RDB limiting the size of the first block sent through OCCI • Use OCCI 10.2 • ETM claims it has a bug when running queries • requires newer version of compilers that the ones ETM is using) BOLOGNA ALICE Week Svetozár Kapusta

Requests & Open Issues 3 • Number of blocks retained in overload/DB not reachable condition • A parameter limits the number of blocks in memory of the RDB Mgr when it is not able to send all data • However the RDB Mgr buffers blocks without any limit • Dangerous since it could make the PVSS project crash • Happens because the thread that writes to the DB is the same that manages the blocks in memory • Solution would be to do the management of the blocks in the same thread that receives data from the DM and writes data to the blocks BOLOGNA ALICE Week Svetozár Kapusta

Requests & Open Issues 4 • Very slow start up of clients • RDB Mgr does several queries at start up: • Query to get maximum Time Stamp in the history table • Query to get all elements for events • Query to get all elements for alerts • The bigger the history table, the longer it takes run an initial query • The more clients were connected the slower the start of each client was • Last 2 queries a client is iterating over all the elements but keeps only his ones (this filtering is much better done by the server and was fixed) • First query still an issue • Used to decide which values stored in the local lastval archive have to be sent to the DB • ETM claims it is faster than to send all the values and see if there are duplicates • Query the whole history table =>more than 1 client, it will not be coherent (client should ask for the latest timestamp of a value inserted by him and not any client) • History tables will be big => more efficient to insert all the local lastval values and the correct duplicates afterwards • Not clear how this will work in 3.5 (local lastval archive has been disabled) BOLOGNA ALICE Week Svetozár Kapusta

Requests & Open Issues 5 • Slow/Deadlock when starting clients after reconfiguration • concurrent start of the clients was very slow and in some cases lead to a deadlock • solution is that each client takes a lock on all the tables it requires (this serializes the operations and will not produce any deadlock situation) • Site configuration in RAC servers • There seems to be a problem when having a RAC as DB server. It is not foreseen to easily reflect the RAC configuration in the internal table arc_site • This is an issue because there is a check (SQL join) between the information in arc_site and a system view (v$instance) in the code • Algorithm to calculate element_id limited to 255 systems • The current algorithm to calculates the element_id in the DB from the dpe_id, dp_id and system_id, is limited to 255 systems • PVSS 3.5 will not be limited to 255 systems in a distributed system • Querying data from a remote system requires local connection to DB • When querying data from a remote system, the local system needs to be connected to the DB even though the query is executed by the remote system BOLOGNA ALICE Week Svetozár Kapusta

Requests & Open Issues 6 • Error monitoring & handling • monitor the number of blocks in the RDB with an internal DPE • know which periods of time were stored to file instead of the DB • timeout in the threads that interact with the DB. If for any reason they get blocked, their operation should finish gracefully BOLOGNA ALICE Week Svetozár Kapusta

Future ETM improvements • PVSS Set up script for Oracle server • Will be run from a panel in PVSS • First version available • Future release will contain advanced configuration parameters for CERN that set up the database to • Disable the job gathering statistics • Disable useless indexes • Use direct path for data insertion instead of merge • Use INSERT /*APPEND/ • Partitioning (it is not a standard feature and most of the ETM customers will not have it) • Default parameters for allocation and extension of tablespaces BOLOGNA ALICE Week Svetozár Kapusta

Future ETM improvements 2 • Queries in distributed systems • Query on data from a remote system goes via the remote system • Degrades performance of the remote system • Some PVSS systems will be mostly writing to the DB, others will be mostly querying (load will be on the systems that write data to the DB) • If remote systems are not available it is not possible to query their data, even if all the information is in the DB • We would like to have the following behavior • Queries go always through the local system, even if they require data from a remote system • Remote system is running, use its DP identification to query the DB • Remote system is not running, use the information stored in the DB to query the data. (Given a DP element name the query to get the element id for the DB is very straight forward and more efficient that manipulating the dpe id, dp id and system id to calculate the element id BOLOGNA ALICE Week Svetozár Kapusta

Future ETM improvements 3 • Buffer to local disk when not possible to send data to DB • If the buffers in the RDB exceed a given size either because of an overload or because the DB is not reachable, it writes to files in the local disk • separate file for each block that could not be handled • format of the files is ASCII • File contains SQL statements that are required to insert data into Oracle • Scheduled to be released in August. • CERN Comments on the concept: • The insertion from the files is done value by value. This is very bad for performance, as we already saw in the early versions of the RDB manager. • Bind variables are not used. This is dangerous because it could delete the parsed queries in the Oracle cache. • It was suggested to explore other possibilities: • Use Oracle SQL loader to import files with the values. This is the fastest way to insert the data. Error handling is a bit more complex because one has to wait for some report files to be generated. • Use the same method as for the normal insertion: write to the client temporary table and let the PL/SQL code in the DB do the job. This may require some changes in the PL/SQL code because right now it can only insert in the current history table, not in previous ones BOLOGNA ALICE Week Svetozár Kapusta

CERN tests results • Inserts of 170 clients each at 1000 changes/s • DB was not able to handle fully the load • Clients were inserting the data at the same time because of their simultaneous start • Starting clients randomly has increased performance • Long term stability: 170 clients each inserting at 500 changes/s during 14 hours • Currently the impact of queries is tested BOLOGNA ALICE Week Svetozár Kapusta

Other News And Reminders • SPD are configuring their FERO using the Oracle DB sitting in the DCS lab • At CERN • Use CERN DB service for PVSS configuration DB • Use CERN DB service for FERO configuration DB or ask ACC team • Ask ACC team or IT for an account • At home institute • install your own (Oracle versions to use) BOLOGNA ALICE Week Svetozár Kapusta

Oracle Versions to use • The latest! • Server 10.2.0.2: • Standard and Enterprise editions (advanced users) • http://www.oracle.com/technology/software/products/database/oracle10g/index.html • Express edition 10.2.0.1 • http://www.oracle.com/technology/software/products/database/xe/index.html • Any other development PCs: • Oracle instant client 10.2.0.2 • http://www.oracle.com/technology/software/tech/oci/instantclient/index.html BOLOGNA ALICE Week Svetozár Kapusta

SPDVersion MCMVersion SPDVersion Number(10)PK DACVersion MCMVersion Number(9) PK Side Char(1) PK DACVersion Number(9) PK DACVect Raw(440) Sector Number(1) PK ACO Blob Halfstave Number(1) PK APIVect Raw(6) MCMVer FK DPIVect Raw(8) DACVer FK GOLVect Raw(4) MBRVer FK Current organization of SPD FERO configuration data MBRVersion MBRVersion Number(9) PK MBRVect Varchar2(2460) BOLOGNA ALICE Week Svetozár Kapusta

Conclusions • Huge improvements in performance of the PVSS Oracle archiving accomplished • RDB Manager is getting mature • Application can and will scale • No “showstopper” identified • Full functionality and tests expected this summer BOLOGNA ALICE Week Svetozár Kapusta

Thank you for your attention BOLOGNA ALICE Week Svetozár Kapusta

Appendix BOLOGNA ALICE Week Svetozár Kapusta

Organization of the Configuration Database • The ALICE Running mode (e.g. lead-lead, cosmics, p-p, etc.) is defined by the ECS • The mode is sent to the DCS at every change • A valid configuration must exist for every running mode • The same configuration might be re-used for several modes • FERO Configuration: lookup tables define the relationship between the ALICE Running mode and the corresponding configuration version for each sub-detector BOLOGNA ALICE Week Svetozár Kapusta

Hierarchical organization of DCS FERO configuration data A view created from the individual detector versions Actual configuration modes (Fixed number of rows, only updates) ALIVersion TypeOfRun PK Table Name SPDMode SPDVersion PK SPDVersion PK Column Name TypeOfRun PK SPDversion FK SDDVersion PK SDDVersion PK SPD Data SSDVersion PK SSDVersion PK SDD Data Oracle Datatype number(4,0) number(7,0) … SSD Data ppRunID xxxxx … … CosmicsRunID xxxxx … PbPbRunID xxxxx TRDVersion PK TRDVersion PK … xxxxx TRD Data … xxxxx … xxxxx … xxxxx SPDRemark SPDversion FK Remark number(7,0) varchar2(1000) xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx BOLOGNA ALICE Week Svetozár Kapusta

Organization of the Configuration Database • Separate schemas (users and their tables) are created for each detector • Data organization is hierarchical and follows the intrinsic detector segmentation • The “atomic” data records represent a group of parameters, which should be loaded together (e.g. chip settings, sector settings) • A new configuration data for any of the detector components results in a new configuration set for the whole detector • Hierarchical organization allows for re-using the information already stored in the database BOLOGNA ALICE Week Svetozár Kapusta

Configuration DB hierarchical organization Configuration Version Group 1 Version Group 1. Version … Sub-group 1 Version Sub-group 1 Version Group n Version … DATA Sub-group n Version Sub-group n Version DATA Group n Version Sub-group 1 Version Sub-group 1 Version … DATA Sub-group n Version Sub-group n Version DATA BOLOGNA ALICE Week Svetozár Kapusta

New configuration data Configuration DB example: SPD Configuration Version Side A Version Side A Version Side C Version Sector 0 Version Sector 0 Version … Half-Stave 0 Version … Sector 9 Version Half-Stave 0 Version … DATA Half-Stave 5 Version Side B Version Sector 0 Version … Sector 9 Version Half-Stave 5 Version DATA BOLOGNA ALICE Week Svetozár Kapusta

There are two options for writing the data record type • Table containing individual parameters • Configuration data stored in a BLOB • It is the responsibility of detector groups to define the database schema and implement the client code in the FED server(s) • The DCS team will • Provide assistance (we can help you to create the schema and implement the client code) • Verify the schema (in collaboration with IT) • Install, operate and maintain the database BOLOGNA ALICE Week Svetozár Kapusta

General recommendations • Development of Oracle applications -General advices from IT • https://twiki.cern.ch/twiki/bin/view/PSSGroup/GeneralAdvices • Reference slideshow • https://twiki.cern.ch/twiki/bin/view/PSSGroup/ReferenceSlideshow • Database design • http://agenda.cern.ch/askArchive.php?base=agenda&categ=a044825&id=a044825s0t2/moreinfo BOLOGNA ALICE Week Svetozár Kapusta

Define keys, constraints and indices • Primary key: • Column or set of columns that uniquely identifies table rows • Every table should have a primary key defined • Primary key protects from entering duplicated rows to a table • Oracle implicitly creates a unique index on primary key columns • Foreign key: • Relation (e.g. master-detail) between 2 tables should be formalized by creating a foreign key constraint • Foreign key helps to keep data integrity and facilities cascade operations • Oracle DOES NOT create implicitly indices on foreign key columns so if related tables are supposed to be joined one needs to create such index by hand. BOLOGNA ALICE Week Svetozár Kapusta

Define keys, constraints and indices (2) • Primary and foreign keys syntax examples: • ALTER TABLE employees ADD CONSTRAINT employees_pk PRIMARY KEY (id); • ALTER TABLE departaments ADD CONSTRAINT employees_pk PRIMARY KEY (dept_id); • ALTER TABLE employees ADD CONSTRAINT departaments_fk FOREIGN KEY (dept_id) REFERENCES departaments (dept_id); BOLOGNA ALICE Week Svetozár Kapusta

Define keys, constraints and indices (3) • Unique keys • Column or set of columns other then primary key that uniquely identifies rows in the table • Oracle implicitly creates index on unique keys • In contrary to primary keys, unique keys allow NULL values • ALTER TABLE employees ADD CONSTRAINT employees_unq UNIQUE (lname); BOLOGNA ALICE Week Svetozár Kapusta

Define keys, constraints and indices (4) • Other constraints • Usually it is better to enforce data integrity constraints on database server side than on client/middle tier side: • Central management • Easier maintenance • Better performance • Allowed constraints: • NOT NULL • CHECK • ALTER TABLE employees ADD CONSTRAINT salary_chk CHECK (salary BETWEEN 1000 AND 50000); BOLOGNA ALICE Week Svetozár Kapusta

Define keys, constraints and indices (5) • Indices: • They are like indices in books -> they speed up lookups • BUT: insert into a table with 3 indices is ~10 times slower than into table without indices • It is important to find for each table a minimal set of indices • It is usualy better to not create indexed on small tables (< 50 rows) • CREATE INDEX emp_dept_id_idx ON employees (dept_id); BOLOGNA ALICE Week Svetozár Kapusta

Use Bind Variables • Place-holder variables in SQL statements that are substituted with concrete values before statement execution e.g: • SELECT "schedulerJobId" FROM CMS_DASHBOARD.JOB WHERE "jobId"= 2001; • SELECT "schedulerJobId" FROM CMS_DASHBOARD.JOB WHERE "jobId” = :job_id; • Bind variables help: • To reduce CPU consumption (less hard parses) • 5000 executions of query above take either 2.4 or 18.1 s depending on whether bind variable has been used or not. • To improve application scalability (less Oracle internal locking) • To decrease DBAs’ anger • To improve code security • Bind variables protect from SQL injections. • To make code more readable and easier to maintain • Concatenated strings with hard-coded values are usually less readable then strings with bind variables BOLOGNA ALICE Week Svetozár Kapusta

Connection management • Opening a connection to a database is very expensive from Oracle point of view: • New OS process has to be forked • Session context has to be created inside Oracle instance • User has to be authenticated and authorized • Client-server applications: • Database session should be created at the beginning of the client program execution and reused during its lifetime • EXCEPTION: if client is idle for a long period of time it is better to close the session and free server-side resources BOLOGNA ALICE Week Svetozár Kapusta

Recapitulation of General Recommendations • Use bind variables • Use primary keys • Define foreign keys (and add indices) • Add indices to columns which are referred to often in the WHERE clause of queries • Use constrains to enforce the data integrity • Use a normalized schema • Focus on representing the modularity of the detector when creating the tables BOLOGNA ALICE Week Svetozár Kapusta

Application of General Recommendations for FERO • Implement constrains on table columns • Use Primary keys PK on columns which contain the version number • Use Foreign keys FK on the columns which reference the primary keys. • Use “NOT NULL” keywords for columns in order to have for every version the corresponding data • Use constrains with “CHECK” in order to have the configuration data within reasonable values • Since no deletes and no updates will be run- • Consider using IOTs (Index Organized Tables) (CREATE TABLE… …ORGANIZATION INDEX…) to assure for future performance BOLOGNA ALICE Week Svetozár Kapusta

Once the schema is ready… • When populating the tables (e.g. from configuration files) you might use the /*+APPEND/ hint (example: insert /*+append / into mytable values (1, 255);) • You can let the version column increment itself by creating a sequence and a trigger • create or replace sequence myseq increment by 1 start with 1 nomaxvalue nocycle; • create or replace trigger mytrigger before insert on mytable for each row begin select myseq.nextval into :new.version from dual;end;/) BOLOGNA ALICE Week Svetozár Kapusta

Side C Side A y x z SPD BOLOGNA ALICE Week Svetozár Kapusta

Half-stave Sector Half-stave SPD BOLOGNA ALICE Week Svetozár Kapusta

SPD BOLOGNA ALICE Week Svetozár Kapusta

Hierarchical organization of DCS SPD FERO configuration data version 2 1 2 20 120 SIDAVersion SIDVersion PK SECA0Version SECA1Version SEC0Ver FK Halfstave 0 SECA2Version Halfstave 0 SEC1Ver FK Halfstave 1 … HASA90Version Halfstave 0 Halfstave 1 SEC2Ver FK Halfstave 2 SPDVersion … Halfstave 0 Halfstave 1 HASVersion PK Halfstave 2 … … SPDVersion PK SECA9Version Halfstave 0 Halfstave 1 Halfstave 2 … … … MCMVer FK SIDAVer FK SECVersion PK Halfstave 1 Halfstave 2 … … Halfstave 5 SEC9Ver FK DACVer FK SIDCVer FK HAS0Ver FK Halfstave 2 … … Halfstave 5 MASVer FK SIDCVersion HAS1Ver FK … … Halfstave 5 SIDVersion PK … … Halfstave 5 SEC0Ver FK … Halfstave 5 SEC1Ver FK HAS5Ver FK SEC2Ver FK … … SEC9Ver FK BOLOGNA ALICE Week Svetozár Kapusta

PVSS Oracle archiving

PVSS Oracle archiving

Presentation Transcript

Archiving 101

Oracle Data Archiving Taming the Beast

Archiving

Data Archiving

Archiving

Archiving

PVSS Oracle scalability

PVSS Alert Screen

Archiving

Scaling Up PVSS

CMS online database model and PVSS to ORACLE conditions writing

Web Archiving

ROOT-PVSS Connection

PVSS SOAP Interface

Scaling Up PVSS

PVSS State Machine

PVSS Navigator

Archiving

Oracle Solutions for Data Archiving