
Massively Parallel Database Dump and Reload



Presentation Transcript


  1. Massively Parallel Database Dump and Reload By Frank Spaniak US Tsubaki 301 East Marquardt Drive Wheeling IL 60609 FrankSpaniak@ustsubaki.com

  2. Introduction
UST - Senior Systems Administrator; InterParking (GE); WWI - President, Software and Management Consulting.
Progress since 1984 (version 3.2j). Basic before it was called Visual Basic (1980, yikes!).
Some current / past clients and project involvement:
• General Parking (now InterParking)
• Remanufacturing Division of Ford
• National Safety Council
• Americania De Finzas (chain of banks in Mexico)
• Board of Directors, Artisan Corp

  3. Tsubakimoto Corporation (or my beloved workplace)
• Founded in 1917
• The most comprehensive product line in the industry
• Power transmission products
• Network of sales & service outlets in 70 countries spanning the globe
• Consolidated net sales of over $900 million
• Major affiliates in the US, Canada, Europe, Singapore, Taiwan and Australia
• Vital production plants in Japan, United States, Canada, Taiwan, Singapore and Korea
• Employs more than 4,500 people globally
• Using QAD since 1991
• Originally a VMS shop until about 6 years ago

  4. The Setting (or My Inheritance, before taxes)
HP 9000 N-class (Earth): 4 x 350 MHz processors, 6 GB memory, HP-UX 11.0
350-400 users, of which 250 are on a wide area network
29 different databases, including QAD and several custom databases
Running QAD 9.0 SP3 (V8) and eB2 SP4 (V9)
The largest QAD database was 24 GB (48 extents)
24x6½ operation over 13 locations and plants in North America

  5. What the lack of a dump/load caused (or why we kept working late)
Unable to perform a lot of archiving and deleting. Couldn't archive/delete even a single day's invoices without the process crashing; 72 hours of processing to delete one day's work (that's a whole other story). We had 9 years of payables records to wade through.
Nightly Corvu processing was starting at 3 am and finishing about 3 pm the next day.
Full MRP was 1 hour 20 minutes.
Month-end processing congestion was unbearable; it was taking 3 days to do.
Delays on sub-systems (AIM, shipping manifesting) were 50-70 seconds, seriously hurting production.
Freedom data collection download took 8 hours over the WAN to complete.
It was clear to all the users that processing times were becoming unworkable.

  6. And now for some history (or Gameboy anyone?)
Several QAD databases were consolidated in late 2003, but no dump and reload was done.
A dump/load was attempted about Dec. 2003 (it had been 1.5 years since the prior dump and load), but it was unsuccessful.
As of July 2006, it had been 3½ years since the last dump and load.
Changes in production planning closed the window of work from long weekends to just 12 hours at the outside; 8 hours was the target.

  7. What's a DBA to do? (or is it Friday yet?)
Several options were considered to improve conditions:
• Shipping a backup of the database to another site to perform the dump/load on, because it was a faster box: 24 hours and a lot of $$.
• Building a box at the HP service center in Chicago and bringing a tape there to do the dump and reload: 12+ hours and security problems.
These options all fell outside the 8-12 hour window.
The constraints that the IT department operated under:
• Small time window for the actual dump and load process: < 8 hours; the shorter the better.
• The process had to be provable before running on production.
• Expenses were tight, so we could not purchase software to do this.

  8. Initial Assessment (or did someone paint a target on me?)
Several tests had been performed over the course of several weeks to gain metrics on the time needed for a dump and load. Since exclusive access was in short supply, we had to perform testing in pieces on a test database, then add the times together.
The time to dump/load the main production database was approximately 4 days with exclusive access to the machine (binary dump and load), consistent with previous attempts.
No other system in the house at that time had the space / horsepower to do the dump and load.

  9. Initial Assessment (or did someone paint a target on me?)
Found that we had an overall database scatter factor of 5.5; many tables were much worse.
Approximately 97 million records totaling 24 GB.
We would try to do as much AR archiving and deleting as possible before the dump and reload.
Bringing the databases down for any extended period of time, for all the plants and processes, was a significant coordinated effort; like herding cats.

  10. Initial Assessment (continued) (or Yes they did)
Over this process, we purchased a replacement for Earth: an HP RP 4440 (Samson), 8 x 1 GHz processors, 16 GB memory, HP-UX 11i, 2 x 170 GB mirrored internal drives, and 0.5 TB of external storage over gigabit fiber to our SAN (EVA).
This went a long way toward being able to test and prevent many sins from happening.

  11. Summary of Systems
Earth: HP 9000 N-class, 4 x 350 MHz, 6 GB memory
Samson: RP 4440, 6 x 1 GHz, 16 GB memory
SAN: 2.1 TB

  12. The Action Plan (or Who could we find to blame?)
Using Samson as the testing computer:
• Performed a dump and reload with the new box; it was still taking over 8 hours for the main database.
• Performed the dump/load for the testing system and put the users on it for several weeks to verify the data, then started working on speeding up the process.
• While the dump and reload was going on, prepped the systems on Earth by building the empty extents to restore the backups into.
• Streamlined the dump and reload to about 7 hours through automation and some parallel operation, automating several of the commands to dump and reload the db. Very little wiggle room for problems.

  13. Discoveries made (or You gotta love those computer toys)
The SAN drives were significantly faster than the internal drives: roughly 480 MB/sec transfer speed internally vs. 1000 MB/sec over to the SAN. Since the db was so fragmented, the machine was sitting in I/O wait during the dump while the fiber line to the SAN was relatively quiet. The question became: how can we make this run at top speed consistently?
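A quick way to confirm this sort of imbalance (not shown in the presentation; the mount points and sizes below are examples only) is to time a raw sequential write against each device and watch the I/O-wait column while a dump is running:

# time a 1 GB sequential write to the internal drives, then to the SAN
timex dd if=/dev/zero of=/internal/ddtest bs=1024k count=1024
timex dd if=/dev/zero of=/san/ddtest bs=1024k count=1024
# watch CPU vs. %wio while a dump thread runs (HP-UX sar, 5-second samples)
sar -u 5 60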

  14. Required UNIX commands and setup (or Say what?)
A reasonably current ksh or sh, and the commands pwd, wc and head.
The environment is set up in bin/set_progress.sh, which sets the DLC and PROPATH variables, sets the RUN_DIR and DONE_DIR variables, and updates the PATH for Progress and the local bin directory.
Startup is via dumpctl.sh, which verifies that the environment settings are what they are supposed to be and then runs the application (a sketch of such a wrapper follows below).
A server must be running on the database to be dumped.
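Only the role of dumpctl.sh is described on the slide, so the following is a minimal ksh sketch of such a startup wrapper, not the original script; the control database name ("dumpctl") and the _progres session invocation are assumptions:

#!/usr/bin/ksh
#
# dumpctl.sh - hypothetical startup wrapper (reconstruction)
#
# record where we started; set_progress.sh expects BASEDIR from its caller
export BASEDIR=$(pwd)
# establish the Progress environment and marker directories
. ${BASEDIR}/bin/set_progress.sh
# set_progress.sh reports any problem through XERROR; stop here if it is set
if [ -n "${XERROR}" ]
then
    echo "dumpctl.sh: ${XERROR}"
    exit 1
fi
# run the character-mode control application against the dump control database
${DLC}/bin/_progres dumpctl -p /prod/dump/dump_control.p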

  15. Lay of the Land (or Playtime in UNIX)
Dump Control DB Application: /prod/dump/dump_control.p
Script Directory: /prod/dump/bin
  set_progress.sh - sets environment
  dmp_get_done.sh - looks for processes that are done (see the sketches below)
  dmp_get_rec.sh - gets 1 (one) done file marker (see the sketches below)
  dmp_start_thread.sh - starts 1 (one) dump thread
Run Directory: /tmp/run - holds run markers
Done Directory: /tmp/done - holds done markers
Target DB
Target Directory: /backups/dump
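set_progress.sh and dmp_start_thread.sh appear on the next two slides; dmp_get_done.sh and dmp_get_rec.sh are only named here, so these are minimal ksh sketches of what they plausibly do under the marker-file scheme above, not the original scripts:

bin/dmp_get_done.sh (sketch):
#!/usr/bin/ksh
#
# report how many dump threads have finished, by counting the done markers
#
. ./bin/set_progress.sh
ls ${DONE_DIR} 2>/dev/null | wc -l

bin/dmp_get_rec.sh (sketch):
#!/usr/bin/ksh
#
# hand back one finished table to the control program and consume its marker
#
. ./bin/set_progress.sh
DONE=$(ls ${DONE_DIR} 2>/dev/null | head -1)
if [ -n "${DONE}" ]
then
    rm ${DONE_DIR}/${DONE}
    echo ${DONE%.done}
fi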

  16. bin/set_progress.sh :
#!/usr/bin/ksh
#
# Establish Progress location and environment used by app (fjs)
#
# temp dirs to store the run and done markers while running
export RUN_DIR=/tmp/run
export DONE_DIR=/tmp/done
# Progress
export DLC=/dlc/83b22
export PROPATH=$BASEDIR:$DLC
# check dirs are valid
[ ! -d ${RUN_DIR} ] && export XERROR="Run Marker directory $RUN_DIR is not valid"
[ ! -d ${DONE_DIR} ] && export XERROR="Done Marker directory $DONE_DIR is not valid"
[ ! -d ${DLC} ] && export XERROR="$DLC directory is not valid"
# add Progress and the local bin dir to path
# BASEDIR is passed from startup script
export PATH=$DLC:$DLC/bin:$BASEDIR/bin:$PATH

  17. bin/dmp_start_thread.sh :
#!/usr/bin/ksh
#
# start dump thread (fjs)
#
# Parameters are
#  1 db name to dump
#  2 filename to dump
#  3 dump directory
. ./bin/set_progress.sh
$DLC/bin/proutil ${1} -C dump ${2} ${3} > /dev/null 2>&1
# indicate file done
mv ${RUN_DIR}/${2}.run ${DONE_DIR}/${2}.done
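The fan-out itself is driven by the Progress control program (dump_control.p), which is not reproduced in the transcript. Purely to illustrate the run/done marker mechanics, a ksh-only dispatcher that keeps a fixed number of dump threads going might look like this; the database path, the table.list input file, and the thread count are assumptions, and this is not the author's dump_control.p:

#!/usr/bin/ksh
#
# hypothetical dispatcher sketch - NOT the author's dump_control.p
#
export BASEDIR=$(pwd)
. ${BASEDIR}/bin/set_progress.sh
DB=/db/prod/proddb          # assumed path of the database to dump
DUMPDIR=/backups/dump       # target directory from the Lay of the Land slide
MAXTHREADS=40               # slide 24 shows 40 dumps running at once

for TBL in $(cat table.list)                 # assumed file: one table name per line
do
    # throttle: wait until a slot frees up before starting another thread
    while [ $(ls ${RUN_DIR} 2>/dev/null | wc -l) -ge ${MAXTHREADS} ]
    do
        sleep 5
    done
    touch ${RUN_DIR}/${TBL}.run              # run marker the thread will move when done
    ${BASEDIR}/bin/dmp_start_thread.sh ${DB} ${TBL} ${DUMPDIR} &
done
wait                                         # block until every background dump returns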

  18. Main Screen

  19. #1 Database Group And Tables

  20. #1 Adding a new group

  21. #1 Adding a new group (continued) Based on the database information entered, this will read the schema into the control group. If you select Yes, it will read in the table names and set the default dump location for each table.

  22. #1 Adding a new group If Yes, a list of all the database tables is displayed and you can select a table to override its dump location.

  23. #2 Doing the Database Dump When you select #2 Run Dump Procedure, you are prompted for the database group to run. Once the group is selected, a recap screen is shown and the data dump proceeds.

  24. #2 The Database Dump Running 40 data dumps running at once! Be careful of bus bandwidth saturation! YMMV

  25. #3 & 4 Generate Load and Index rebuild script • These have no display output and simply generate scripts • Load-script.sh for the database load • Index-rebuild.sh for the index rebuild phase • This is done for V8 Compatibility they do not have to be run is sequence; the load and index scripts can be generated before hand to put in automated procedures.

  26. The Result (or How I Got a Day off work)
Reduced the database dump time to 45 minutes, with load and index rebuild at 1:35; a little over 4.5 hours end to end, down from 7. An overall gain of about 40%.
Nightly Corvu processing finishing about 5 am (2 hours).
Full MRP was reduced to about 5 minutes per site.
Month-end processing was finishing by mid-afternoon of the first day.
Support sub-system response is 1-3 seconds.
Freedom download taking 3 hours over the WAN.
Eliminated 12 extents and still left 5 empty ones.

  27. The Result - continued (or How I Got a Day off work … with pay)
• With the reduced processing time we were further able to reduce the database size with archiving and deleting, leaving more time for:

  28. The Result (or What I did on my day off ) Golf And Beer

  29. The Result (or This space for rent ) Questions ?
