1 / 25

LCG – Databases - Meeting

LCG – Databases - Meeting. 25 March 2008. Presences: Miguel Anjo, John Shade, Paolo Tedesco, Phool Chand, David Collados, Judit Novak, James Casey, Steve Traylen. Outline. Main issue during the power cut .

maia
Télécharger la présentation

LCG – Databases - Meeting

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. LCG – Databases - Meeting 25March 2008

  2. Presences: Miguel Anjo, John Shade, Paolo Tedesco, Phool Chand, David Collados, Judit Novak, James Casey, Steve Traylen Presentation title - 2

  3. Outline Presentation title - 3

  4. Main issue during the power cut • Ethernet network switches in RAC6 were not connected to the critical power (wrong connection of the power bar) • The public and cluster interconnect networks went down Presentation title - 4

  5. Impact of the power cut on the DB Services

  6. Service Changes • Announce and schedule interventions • Have a main contact that keeps plan and progress, contact all parts and announces restart of all services • Move from dbms_job to dbms_Scheduler EGEE_PPS_SAM rmTDLOneWeekOld; LCG_FTS_PROD begin fts_history.movedata; end; LCG_FTS_PROD begin fts_servicestate.runjob; end; LCG_FTS_PROD_T2 begin fts_history.movedata; end; LCG_SAM_PPS p_testdef_autodel; LCG_SAM_PPS rmTDLOneWeekOld; http://oracle-documentation.web.cern.ch/oracle-documentation/10gr2doc/server.102/b14231/jobtosched.htm • End synonyms • Use Schema.TABLE_NAME • Select from LCG_GRIDVIEW.SITES (from lcg_gridview_r or lcg_same_wor …) • Only need to grant the privileges • Check usage outside CERN (Miguel Anjo) LCG meeting - 6

  7. Developer tasks Presentation title - 7

  8. Points to improve • ServiceMapaccount • Need reader/writer • LCG_Gridmap service • User using LCG_SAM service • Cleanup/partitioning of SAM (sam meeting?) • GridviewMerge/partitioning (gridview meeting?) • Weekly report checkup LCG meeting - 8

  9. AOB • Lemon alarm for DB availability • Create lemon metric for DB services • Next meeting • 29th July? LCG meeting - 9

  10. Why and what was done • Space pressure on LCGR storage arrays • Of ~2750GB, only 175GB are available • Not possible to shrink datafiles • 650GB space not used in datafiles • Solution: move segments Presentation title - 10

  11. Why and what was done • Overview • Backup system will appreciate <200GB datafiles • Datafile is smallest unit to backup, not possible to parallelize neither resume Presentation title - 11

  12. Why and what was done • Partitioned tables • LCG_SAME.TESTDATA (April 2007) • Monthly partitions up to “2008”, indexed, clob • Data since July 2007 • Work to do to (data move during CPUJan08 not finished - see later) • LCG_SAME.TESTDATA_HISTORY (March 2008) • half-yearly partitions/tablespaces, no indexes • Data between July 2006 and July 2007 • Created during CPUJan2008 • LCG_GRIDVIEW.JOBSTATUSRAW (March 2008) • Monthly partition up to Dec/2010, indexed • Created during CPUJan2008 Presentation title - 12

  13. Why and what was done • LCG_GRIDVIEW_DATA01 space waste • Not possible to shrink datafiles. • Solution: move data to different datafile • ALTER TABLE MOVE + ALTER INDEX REBUILD • Copy table, constraints online • Copy indexes online • Made some cursors invalid (need to restart app) • Done for tables <1GB (Thursday 6.March) • DBMS_REDEFINITION does online • Copy table, indexes, constraints, keep synchronized • Rename tables, copy privileges • Done successfully for 7 tables (Monday 10.March) • Failed for table VO (but reported successful) Presentation title - 13

  14. Why and what was done • Why it failed (table VO)? • Service request open to Oracle • Table VO is heavily used (several users, synonyms, views, procedures) • Oracle failed to get a lock but did not report error • “ORA-4020 Deadlock when trying to lock xxx” reported for other tables when moving • Similar problem for table SITES and NODES • Currently difficult to create/drop tables referencing those tables • (tables in bad state?  Service Request) Presentation title - 14

  15. Missing operations • LCG_GRIDVIEW • Recreate tables SITES, NODES, VO • Move 10 tables off DATA01 (120GB) • Possible “exp/imp” or “table move + index rebuild” • 8 hours?? • LCG_SAME • Move partitions 2H2007 to correct tablespace • Split 2008 partitions • Create partitions up to Dec2010 • >1 day, “transparent” Presentation title - 15

  16. Outline Presentation title - 16

  17. Current situation Presentation title - 17

  18. What can be done • Partitioning • Some maintenance work • No space gain • Aggregates • After aggregation, delete row data • Space gain and performance boost • History table (no indexes, compressed) • Little space gain • Heavy maintenance work Presentation title - 18

  19. Expected growth • Start monitoring of space growth per table • What are expectations? • How much aggregate data will be kept? • What about aggregation of aggregates? • LCG_SAM? • LCG_GRIDVIEW? • LCG_FTS? Presentation title - 19

  20. Outline Presentation title - 20

  21. Transparent interventions • Huge database (for SAM, GridView) • Impossible to perform full scale tests • Some operations ‘with risk’ for long periods • How to schedule? • Possible to do with downtime? (less risk) • Notification flow? Presentation title - 21

  22. Applications resilience to interventions • Resilient: adj. • Marked by the ability to recover readily, as from misfortune; • Capable of returning to an original shape or position, as after having been compressed. Presentation title - 22

  23. Outline Presentation title - 23

  24. Next meeting • Main developers of LCG • Weekly report, interventions planned, SQL optimization, share solutions • Schedule: Monday after the 15th at 14:00 • Next meeting 21st April – 14:00 Presentation title - 24

  25. Presentation title - 25

More Related