D0 Taking Stock
By Anil Kumar, CD/CSS/DSG
July 10, 2006
D0 Offline Production/Integration Infrastructure
• 8 × 900 MHz CPUs, 16 GB RAM
• The machine has a Clariion 4500 hardware RAID array.
• Oracle Server 10gR2 (64-bit) on Solaris 2.9 (64-bit)
• Load average 4-6, CPU usage ~77%, memory free: 50%
• Uptime excluding scheduled downtimes: 99.97716% (based on 120 min of total DB unavailability) since June 2005, vs. 99.85269
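The uptime figure above can be sanity-checked with a short calculation. This is a minimal sketch: the 120 minutes of unavailability comes from the slide, while the 365-day window is an assumption standing in for "since June 2005", and the function name is hypothetical.

```python
def uptime_percent(downtime_minutes: float, period_days: float) -> float:
    """Return availability over the period as a percentage."""
    total_minutes = period_days * 24 * 60
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


# 120 min is the slide's figure; the 365-day period is an assumed window.
pct = uptime_percent(downtime_minutes=120, period_days=365)
print(f"{pct:.5f}%")
```

Under that assumed window the result lands very close to the quoted 99.97716%; the exact reporting period used on the slide is not stated.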
D0 Offline Production Luminosity
• Sun v40z with 2 AMD 844MHz CPUs
• RHEL3 x86_64
• 2 Ultra 160 RAID controllers
• 16 GB RAM
• Oracle Server 10gR1
D0 Offline Development Infrastructure
D0ora1
• Sun E4500
• 8 × 400 MHz CPUs, 4 GB RAM
• Oracle 10gR2 (64-bit) on Solaris 2.9
• Load average 1-2, CPU usage ~33%, memory free: 19%
D0lum1
• Sun v40z with 2 844MHz AMD CPUs
• RHEL3 x86_64
• 16 GB RAM
• Oracle Server 10gR2
Space Usage
Space Usage Summary
• d0ofprd1: 1285 GB used.
• d0ofint1: 103 GB used. 2.25 TB is available for integration and production.
• d0ofdev1: 120 GB used. 11 GB is available.
• d0oflumd: 285 GB used. d0oflumi: 482 MB used. 150 GB is available for d0oflumd and d0oflumi.
• d0oflump: 363 GB used. 411 GB is available.
Capacity Planning
• Expected growth over the next three years: 1.1 TB. SAM grows by 375 GB/year and other applications by 15 GB/year. This excludes the Luminosity DB. We have around 2.2 TB available.
• Luminosity growth is 125 GB/year.
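The growth estimate above is simple arithmetic and can be reproduced as follows. This is a rough sketch using only the slide's numbers (375 GB/year for SAM, 15 GB/year for other applications, about 2.2 TB available); the variable and function names are hypothetical, and linear growth is an assumption.

```python
SAM_GROWTH_GB_PER_YEAR = 375    # from the slide
OTHER_GROWTH_GB_PER_YEAR = 15   # from the slide
AVAILABLE_GB = 2200             # "around 2.2 TB", taken as 2200 GB


def projected_growth_gb(years: float) -> float:
    """Growth over the given number of years, assuming a constant rate."""
    return years * (SAM_GROWTH_GB_PER_YEAR + OTHER_GROWTH_GB_PER_YEAR)


def years_until_full(available_gb: float) -> float:
    """Years before the available space is consumed at the constant rate."""
    return available_gb / (SAM_GROWTH_GB_PER_YEAR + OTHER_GROWTH_GB_PER_YEAR)


three_year_growth = projected_growth_gb(3)
headroom_years = years_until_full(AVAILABLE_GB)
print(f"3-year growth: {three_year_growth:.0f} GB")
print(f"Years of headroom: {headroom_years:.1f}")
```

Three years at 390 GB/year gives 1170 GB, in line with the slide's figure of roughly 1.1 TB, and the available space would last somewhat over five years at that rate. Luminosity growth is tracked separately and is not included here.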
Accomplishments
• Upgraded D0 offline databases to 10gR2, along with an OS upgrade for the d0dbsrv nodes. Replaced the d0dbsrv5 node with new hardware and upgraded memory to 4 GB (from 2 GB).
• Export of Trigger Database. Retention policy: 30 days on disk, with a daily copy taken to dCache.
• Mini-trigger simulator set-up.
• Deployment of Lum DB in production on 10gR2.
• Quarterly database security/OS patches up to date.
• Upgraded OEM to 10g.
• Rewrite of dbatools/toolman for enhanced monitoring features and 10g support.
• Disk capacity upgrade on the D0 offline production database.
• DB security enhancements: restricted access to the dictionary, restricted usage of database links, password complexity rules, and locking of obsolete accounts.
• Deployment of SAM Request System schema v6_0. Also deployed version v6_1. v6_3 is in development.
• Moving D0 offline to a standardized backup/recovery method using a SAN and Enstore. Parallel testing of the SAN as backup media for development and production instances is going well.
Back-up/Recovery
• D0ofprd1
- Daily backups, 7 days of archives, one backup always on disk
- Bi-monthly (twice a month) backup of READ ONLY tablespaces
- Allocated 2 TB, used 1.2 TB; to tape daily
- RMAN backup time: 6 hrs (3 hrs 45 min excluding READ ONLY + 2 hrs 20 min for READ ONLY)
- No export
- Tape rotation: 1 week for daily backups and 2 months for READ ONLY backups
- Backups taken to dCache 2x/week, READ ONLY taken 2x/month. Archives taken every 30 min.
• D0lump
- Daily backups to SAN. To dCache daily. Archives taken every 30 min.
• D0ofint1
- Once a week on local disk
• D0ofdev1
- Saturday backup on local disk, otherwise on SAN
- Allocated 100 GB, used 58 GB; daily tape backup
- RMAN backup time: 2 hrs
- Tape rotation: 2 months
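As a quick check of the D0ofprd1 backup window, the two RMAN durations quoted above can be added up. This is only an illustrative sketch; the variable names are hypothetical and the durations are the slide's.

```python
from datetime import timedelta

# Durations quoted on the slide for the D0ofprd1 RMAN backup.
read_write_part = timedelta(hours=3, minutes=45)  # excluding READ ONLY
read_only_part = timedelta(hours=2, minutes=20)   # READ ONLY tablespaces

total_window = read_write_part + read_only_part
print(total_window)  # combined backup window
```

The two parts sum to 6 hours 5 minutes, which matches the rounded 6-hour total on the slide.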
Production backups to SAN
• Two 1 TB SAN mount points in use on d0ora2; one in use on d0lum2
• Daily backup to SAN
• Always 1 backup on disk, plus X200 tape library backup of RMAN from local disk, and a dCache copy
• Read-only portion of the database backed up twice/month to SAN
SAN issues
• The current SAN does not have 24 x 7 support.
• IDE disks are not as reliable as other, more expensive disks. However, these seem to be reliable. We run RMAN backup validation on the backup files on the SAN. Also, a recovery was recently done after a restore from the SAN.
• The current SAN has been trouble-free except when the path failed a couple of months ago. Because the SAN is not dual-path, the failure prevented backups over the weekend, and because support is not 24/7, we had to wait until Monday to get help.
• Purchasing a 24 x 7 SAN requires licensing and changes to the O/S to be able to use it.
• Details on the future of SAN for the Run II databases will be covered in Ray P./Steve K.'s presentation.
SAM Schema
• Production deployments: Storage Location v6_1, SAM Request Subsystem v6_0.
• Work in progress: v6_3, Retiring Files.
• Upgrade to Mini-SAM as the SAM schema evolved. This lets individual developers have a copy of SAM metadata and seed data available for a server software rewrite if needed.
• Mini-SAM in Postgres: an initiative to move towards freeware databases for SAM. Proof of product is not complete; it requires testing with a dbserver from the SAM development team.
• 2.38B events in 47 partitions. Now averaging 1 partition per 3 running weeks. Partition rollover dates URL: http://www-css.fnal.gov/dsg/internal/databs_appl/sam_event_partitions.html
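The partition figures above imply an average partition size and a rough event-loading rate. The sketch below uses only the slide's numbers (2.38B events, 47 partitions, 1 partition per 3 running weeks); the names are hypothetical, and treating all partitions as equally sized is an assumption.

```python
TOTAL_EVENTS = 2.38e9        # from the slide
PARTITIONS = 47              # from the slide
WEEKS_PER_PARTITION = 3      # current average, running weeks

# Average events per partition, assuming evenly filled partitions.
events_per_partition = TOTAL_EVENTS / PARTITIONS

# Implied events per running week at the current rollover rate.
events_per_week = events_per_partition / WEEKS_PER_PARTITION

print(f"~{events_per_partition / 1e6:.1f}M events per partition")
print(f"~{events_per_week / 1e6:.1f}M events per running week")
```

This works out to roughly 50 million events per partition on average. Actual partition sizes may differ, since the rollover rate has changed over time.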
What’s Next?
• Deploy the SAN/Enstore backup recovery plan.
• Replacement of the aging Clariion array.
• Possibly new d0dbsrv nodes, at least the primary nodes. The Luminosity DB server performs twice as well with the C++ caching server, but it causes intermittent crashes of other Calib servers. Possibly dedicated nodes for Luminosity servers.
• Cut new event partitions for SAM.
• ASO (Advanced Security Option) deployment.
• Upgrade Designer and its repository to 10g.
• Bundling of Red Hat renewal licenses into one P.O.
• Testing of Postgres Mini-SAM for proof of product.
Concerns
• Python DCOracle needs to be built with the Oracle 10g client. Oracle recommends that the client version be the same as the database version. Any Oracle patch may break Python DCOracle built with the 8i client.
• Backups will get bigger, which raises the question of how to back up a VLDB.
• SAM servers on Linux? Security audits may mandate dedicated nodes for SAM servers and web servers.
• Not enough space for the integration DB to do a full refresh of SAM.
• Single points of failure with the D0 offline database.
• The future of the aging Clariion array must be addressed in the next budget.
• Hardware for the D0 DB server machines is very old. Should consider upgrading the hardware for the D0 DB servers.
• Post the performance graphs that are gone in the 10gR2 monitoring tool.