Performance Analysis with Real Case

IBM Korea Global Technology Services. Performance Analysis with Real Case. IBM GTS Infrastructure Support Services Kang, SeungRok. Index. * Case 1 : Script vs C binary * Case 2 : Real Memory and Paging Space * Case 3 : CPU usage & Java Performance * Case 4 : Disk I/O & Disk Wait Ratio.

Presentation Transcript


  1. IBM Korea Global Technology Services Performance Analysis with Real Case IBM GTS Infrastructure Support Services Kang, SeungRok

  2. Index * Case 1 : Script vs C binary * Case 2 : Real Memory and Paging Space * Case 3 : CPU usage & Java Performance * Case 4 : Disk I/O & Disk Wait Ratio

  3. Performance Problem?
     From my personal experience (your mileage may vary), what are the most common causes of performance problems?
     * 50% poor disk layout and management
       - some disks are 90%+ busy while 50% are not used at all
       - do you have a clear document that maps the files to actual disks?
     * 10% poor setup of RDBMS tuning parameters relating to memory use
     * 10% single-threaded batch applications (and we have been using SMP for 7 years!!)
     * 10% poorly written customer extensions to standard applications
     * 5% system running with errors in the errpt log file (including CPU failures!!)
     * 5% paging on large-RAM (>2 GB) systems where vmtune is not used to set minperm/maxperm
     * 5% AIX problems already discovered and fixed, but AIX was not up to date
     * 4% badly ported applications (not compiled with optimisation, or built on old AIX versions)
     * 1% genuine bugs in AIX
     Source: Nigel Griffiths

  4. Performance management
     Plan, Do, Check, Act (the PDCA cycle, as used in ITIL and ISO-9000)
     [Diagram: the P-D-C-A cycle repeating; axes: service performance vs. time]
     Conclusion: Performance analysis is continuous. This class shows that performance issues are continuous.

  5. Roles: system administrator, application developer, DBA, network administrator
     Conclusion: Performance analysis is teamwork. This class shows that solving performance issues takes teamwork. Without teamwork, there is only bad performance!

  6. Case 1 : Shell vs C binary
     Which one is better from a performance point of view? Software is the most important factor in performance.

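  7. Case 1 : Shell vs C binary
     top_c.c:
         /* top_c.c: print the TOP lines of an nmon file, each tagged with the
            time taken from the most recent ZZZZ line */
         #include <stdio.h>
         #include <string.h>

         /* strtok() returns NULL when a line has fewer fields; use "" instead */
         static const char *field(char *s, const char *delim)
         {
             const char *t = strtok(s, delim);
             return t != NULL ? t : "";
         }

         int main(int argc, char *argv[])
         {
             FILE *fp;
             char Buffer[1024];
             char Head[1024];   /* was 10 bytes: any first field over 9 characters overflowed it */
             char Data1[1024];
             char Data2[1024];
             char Data3[1024];
             char Date[1024] = "";

             if (argc < 2 || (fp = fopen(argv[1], "r")) == NULL) {  /* "rw" is not a valid mode */
                 fprintf(stderr, "usage: %s file.nmon\n", argv[0]);
                 return 1;
             }
             while (fgets(Buffer, sizeof Buffer, fp) != NULL) {
                 strcpy(Head, field(Buffer, ","));
                 strcpy(Data1, field(NULL, ","));
                 strcpy(Data2, field(NULL, ","));
                 strcpy(Data3, field(NULL, ""));   /* rest of the line, newline included */
                 if (strcmp(Head, "ZZZZ") == 0) {
                     strcpy(Date, Data2);
                 }
                 if (strcmp(Head, "TOP") == 0) {
                     printf("%s,%s,%s,%s,%s", Head, Date, Data1, Data2, Data3);
                 }
             }
             fclose(fp);
             return 0;
         }
     top_sh.sh:
         #!/usr/bin/ksh
         # top_sh.sh: the same job as top_c.c, in ksh
         export IFS=,
         cat "$1" | while read HEAD DATA1 DATA2 DATA3
         do
             # echo $HEAD
             if [[ $HEAD == "ZZZZ" ]]
             then
                 ZZZDATE=$DATA2
             fi
             if [[ $HEAD == "TOP" ]]
             then
                 # quoted, so the commas inside $DATA3 are not split by IFS=,
                 echo "$HEAD,$ZZZDATE,$DATA1,$DATA2,$DATA3"
             fi
         done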

  8. Case 1 : Shell vs C binary
     [Charts for top_sh.sh and top_c.c: CPU performance, and file cache I/O performance]

  9. Case 1 : Shell vs C binary
     top_c.c:
         root@sj_open2:/srkang/case1> timex ./top_c dbserver.nmon > top_c_dbserver.out
         real 0.20
         user 0.16
         sys  0.02
     top_sh.sh:
         root@sj_open2:/srkang/case1> timex ./top_sh.sh dbserver.nmon > top_sh_dbserver.out
         real 46.92
         user 18.51
         sys  44.01
     ps while top_c runs:
         PID    TTY   TIME CMD
         520416 pts/0 0:00 ksh
         544862 pts/0 0:00 \--timex
         553036 pts/0 0:00 \--top_c_sleep
     ps while top_sh.sh runs:
         PID    TTY   TIME CMD
         520416 pts/0 0:00 ksh
         544814 pts/0 0:00 \--timex
         442610 pts/0 0:06 \--sh
         479414 pts/0 0:01 \--cat
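The slide compares the real, user and sys figures that timex reports. As a rough illustration of where such numbers come from, here is a minimal C sketch, assuming a POSIX system, that runs a command and measures wall-clock time with clock_gettime and CPU time with getrusage(RUSAGE_CHILDREN). The names time_command, struct cmd_times and tv_seconds are hypothetical, and this is not how timex itself is implemented. RUSAGE_CHILDREN also counts grandchildren that were waited for, so the sh and cat processes of a shell pipeline show up in user and sys. That is also one way user + sys (18.51 + 44.01) can exceed real (46.92) on a multi-CPU machine.

```c
/* Hypothetical sketch: time a child command, reporting real/user/sys. */
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

struct cmd_times {
    double real; /* wall-clock seconds */
    double user; /* CPU seconds in user mode */
    double sys;  /* CPU seconds in the kernel */
};

/* Convert a struct timeval to seconds. */
static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Run argv[0] with arguments argv, wait for it, and fill *out.
 * Returns the child's exit status, or -1 on error. */
int time_command(char *const argv[], struct cmd_times *out)
{
    struct timespec t0, t1;
    struct rusage before, after;

    getrusage(RUSAGE_CHILDREN, &before); /* usage of children reaped earlier */
    clock_gettime(CLOCK_MONOTONIC, &t0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    getrusage(RUSAGE_CHILDREN, &after);

    out->real = (double)(t1.tv_sec - t0.tv_sec)
              + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    out->user = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    out->sys  = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, time_command((char *[]){"./top_c", "dbserver.nmon", NULL}, &t) fills t.real, t.user and t.sys. Output redirection is left out for brevity.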

  10. Case 1 : Shell vs Shell
      * Some regular-expression patterns performed badly in ksh; a fix exists.
          #!/usr/bin/ksh
          ON_LIST="gpfs164vg gpfs163vg gpfs162vg gpfs161vg gpfs160vg \
          gpfs158vg gpfs157vg gpfs156vg gpfs155vg gpfs154vg gpfs153vg gpfs152vg \
          gpfs151vg gpfs150vg gpfs149vg gpfs148vg gpfs147vg gpfs146vg gpfs145vg \
          gpfs144vg gpfs143vg gpfs142vg gpfs141vg gpfs140vg gpfs139vg gpfs138vg \
          gpfs137vg gpfs136vg gpfs135vg gpfs134vg gpfs133vg gpfs132vg gpfs131vg \
          gpfs365vg gpfs364vg gpfs363vg gpfs362vg gpfs361vg gpfs360vg gpfs359vg"
          LIST_OF_HDISKS_FOR_RG="vpath0,vpath8,vpath16,vpath24,vpath32,\
          vpath40,vpath48,vpath56,vpath1,vpath9,vpath17,vpath25,vpath33,\
          vpath41,vpath49,vpath57,vpath336,vpath337,vpath338,vpath339,vpath340,\
          vpath341,vpath342,vpath343,vpath2,vpath10,vpath18,vpath26,vpath34,\
          vpath45,vpath53,vpath61,vpath6,vpath14,vpath22,vpath30,vpath38,\
          vpath46,vpath54,vpath62"
          LIST_OF_VOLUME_GROUPS_FOR_RG="dbgmelmvg,dbgmelmvg,dbgmelmvg,\
          dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,\
          dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,\
          dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,\
          dbgmelmvg,dbgmelmvg,dbgmelmvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,\
          dbpegasusvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,\
          dbpegasusvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,\
          dbpegasusvg,dbpegasusvg"
          for disk in $(IFS=', ' set -- $LIST_OF_HDISKS_FOR_RG ; print $*)
          do
              print $LIST_OF_VOLUME_GROUPS_FOR_RG |\
                  IFS=', ' read vg LIST_OF_VOLUME_GROUPS_FOR_RG
              if [[ -n $vg && $ON_LIST = @(?(* )$vg?( *)) ]]
              #if [[ -n $vg && -n "$(print "$ON_LIST" | grep " $vg ")" ]]
              # If I use the above statement, the script works much faster.
              then
                  continue
              else
                  echo "would run make_disk_available $disk"
              fi
          done
      Timing with the pattern match:
          if [[ -n $vg && $ON_LIST = @(?(* )$vg?( *)) ]]
          real 0m29.68s
          user 0m29.43s
          sys  0m0.25s
      Timing with grep:
          if [[ -n $vg && -n "$(print "$ON_LIST" | grep " $vg ")" ]]
          real 0m0.68s
          user 0m0.14s
          sys  0m0.51s
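The slide notes that a fix exists for the slow pattern match. The same membership test can also be written as a single linear scan over the list. The C sketch below does that: it splits the list on spaces and compares each item with the word. The grep variant's " $vg " pattern needs a space on both sides, so it never matches the first or last entry of ON_LIST. The sketch treats the list boundaries as separators instead. in_word_list is a hypothetical name.

```c
/* Hypothetical sketch: is `word` a whole, space-separated item of `list`?
 * One pass over the list, no backtracking. */
#include <stdbool.h>
#include <string.h>

bool in_word_list(const char *list, const char *word)
{
    size_t wlen = strlen(word);
    if (wlen == 0)
        return false;

    const char *p = list;
    while (*p != '\0') {
        while (*p == ' ')                 /* skip separators */
            p++;
        const char *start = p;
        while (*p != '\0' && *p != ' ')   /* find the end of this item */
            p++;
        if ((size_t)(p - start) == wlen && memcmp(start, word, wlen) == 0)
            return true;                  /* exact, whole-item match */
    }
    return false;
}
```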

  11. Case 1 : Shell vs C binary
      * A customer used a shell application to process a big SAM file exported from a database. It split the SAM file into several files and delivered the data to another system by FTP. The job ran for several hours (5 to 6 hours). After they rewrote the batch job as a C application, it ran in just one hour. Shell scripts are easy to write, but they can make system performance much worse!
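The customer's program is not shown on the slide. The core of such a rewrite is usually one buffered pass over the file in a single process, instead of a chain of shell commands. Below is a minimal C sketch of the splitting step only, assuming newline-terminated text records. split_lines and the <prefix>.NNN naming are hypothetical, and the FTP delivery is left out.

```c
/* Hypothetical sketch: split a text file into pieces of at most
 * lines_per_file lines, named <prefix>.000, <prefix>.001, ...,
 * in one buffered pass. Returns the number of pieces, or -1 on error. */
#include <stdio.h>
#include <string.h>

long split_lines(FILE *in, const char *prefix, long lines_per_file)
{
    char line[8192];
    char name[4096];
    FILE *out = NULL;
    long pieces = 0;
    long lines = 0;

    if (lines_per_file <= 0)
        return -1;

    while (fgets(line, sizeof line, in) != NULL) {
        if (out == NULL) {                    /* start the next piece */
            snprintf(name, sizeof name, "%s.%03ld", prefix, pieces);
            out = fopen(name, "w");
            if (out == NULL)
                return -1;
            pieces++;
            lines = 0;
        }
        fputs(line, out);
        /* count a record only once its newline has been copied, so
           records longer than the buffer are not split across pieces */
        if (strchr(line, '\n') != NULL && ++lines == lines_per_file) {
            fclose(out);
            out = NULL;
        }
    }
    if (out != NULL)
        fclose(out);
    return pieces;
}
```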

  12. Case 2 : Real Memory & Paging Space
      Is a system better just because it has a lot of memory?

  13. Case 2 : Real Memory & Paging Space
      Performance with paging space in/out:
          root@sj_open2:/srkang/case2> timex ./top_c dbserver150.nmon > top_c_dbserver150.out
          real 11.11
          user 3.30
          sys  0.54
      Performance without paging space in/out:
          root@sj_open2:/srkang/case2> timex ./top_c dbserver150.nmon > top_c_dbserver150.out
          real 3.85
          user 3.29
          sys  0.45
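In both runs user time is almost identical (3.30 s vs 3.29 s). The extra elapsed time is therefore time the process spent off the CPU, which here is consistent with waiting for page-ins from paging space. A tiny helper makes the arithmetic explicit. off_cpu_seconds is a hypothetical name.

```c
/* Hypothetical helper: elapsed time not spent running on a CPU
 * (for example, waiting for page-ins or disk I/O). */
double off_cpu_seconds(double real, double user, double sys)
{
    double off = real - (user + sys);
    return off > 0.0 ? off : 0.0; /* multi-threaded jobs can exceed real */
}
```

With the slide's numbers, this gives 11.11 - 3.30 - 0.54 = 7.27 s with paging and 3.85 - 3.29 - 0.45 = 0.11 s without.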

  14. Case 2 : Real Memory & Paging Space
      * A customer's database was corrupted, so they had to restore their data from backup tapes and complete the recovery in a short time. They wanted the system to focus entirely on the restore. With the default settings, they estimated that the restore would take 5 to 6 hours in total. After they changed maxpgahead and strict_maxperm, the restore took just 2 hours.
        - maxpgahead: number of pages read ahead
        - minfree: page stealing starts when free memory falls to minfree
        - maxfree: page stealing stops when free memory reaches maxfree
        - maxfree = minfree + maxpgahead
        - JFS2: j2_maxPageReadAhead
      * The file cache in real memory is as important as computational memory.
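The maxfree rule on the slide is simple arithmetic. The sketch below encodes it, assuming all values are counted in 4 KB pages. When both JFS and JFS2 apply, it takes the larger of maxpgahead and j2_maxPageReadAhead. That choice and the name suggested_maxfree are assumptions for illustration, not an AIX tool. Check the current values on your system first (vmo and ioo on AIX 5.2 and later, vmtune on older levels).

```c
/* Hypothetical helper for the slide's rule maxfree = minfree + read-ahead.
 * All values in pages; uses the larger of the JFS (maxpgahead) and
 * JFS2 (j2_maxPageReadAhead) read-ahead settings. */
long suggested_maxfree(long minfree, long maxpgahead, long j2_maxPageReadAhead)
{
    long readahead = maxpgahead > j2_maxPageReadAhead ? maxpgahead
                                                      : j2_maxPageReadAhead;
    return minfree + readahead;
}
```

For example, suggested_maxfree(960, 8, 128) returns 1088. The input values here are illustrative only.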

  15. Case 2 : Real Memory & Paging Space
      * One day, a customer's batch application working with an Oracle database ran more and more slowly as time went by. It looked like a memory performance issue, but the root cause of the symptom was a kernel memory leak. Kernel memory stays in real memory and is not moved to paging space, so the application's memory was pushed to paging space instead, which made the application slow. A problem elsewhere can cause a specific application's performance issue.

  16. Case 2 : Real Memory & Paging Space
      * A customer had a 2-tier system: a database server and a web server. One day they had a performance issue, which was resolved after they restarted the web server. The root cause looked like a paging space problem on the database server, but when they restarted the web server, the process using the most paging space disappeared. Don't look at only one system; look at all of the related systems.

  17. Case 2 : Real Memory & Paging Space
      [Graphs: maxperm 80% (3/31) and maxperm 30% (7/26)]
      * The graphs are similar, but the performance is not.

  18. Quiz 1
      * Where is the bottleneck?
          root@sj_open2:/srkang> vmstat 1
          System configuration: lcpu=4 mem=1840MB

          kthr    memory               page                 faults           cpu
          ----- ----------- -------------------------- ----------------- -----------
           r  b    avm  fre re pi   po    fr     sr cy  in     sy     cs us sy id wa
           0  0 229611 1024  0  0    7 12330 112280  0  18 109356 213304 13 33 55  0
           0 11 236371    0  0  1 1025  6880  45005  0 205  36424  63339  3 10 48 39
           0 22 238178    5  0  7  483  1792   3709  0 179   7873  16152  1  3 46 50
           0  7 251143  807  0 34 4681 12883  52835  0 434  44194  95733  6 19 40 35
           2  0 268925  939  0 49 2323 17801  29475  0 337  75478 143061  8 23 57 12
           0  2 290688  985  0 15 2484 21802  41972  0 270  79692 173846 10 29 47 14
           0  0 316751 1041  0  6    4 26125 101487  0  10 104434 209025 12 36 51  1
           2  0 341875  558  0 12 3750 25153  60894  0 341 104390 201942 12 34 53  1
           1  0 358023    0  0 12 5413 16235  22743  0 449  73764 140318  8 24 58 10
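When reading output like this, it helps to look at the run-queue, paging and CPU columns together rather than one at a time. The sketch below parses one data row into named fields, assuming exactly the 17-column layout in the header above. The faults sy column is renamed sy_calls so it does not clash with the CPU sy column. struct vmstat_row and vmstat_parse_row are hypothetical names.

```c
/* Hypothetical sketch: parse one data row of `vmstat 1` output laid out as
 * r b avm fre re pi po fr sr cy in sy cs us sy id wa. */
#include <stdio.h>

struct vmstat_row {
    long r, b;                    /* runnable threads, blocked (waiting) threads */
    long avm, fre;                /* active virtual pages, free frames */
    long re, pi, po, fr, sr, cy;  /* paging activity */
    long in, sy_calls, cs;        /* interrupts, system calls, context switches */
    long us, sy, id, wa;          /* CPU percentages */
};

/* Returns 1 if all 17 fields were read, 0 otherwise (e.g. header lines). */
int vmstat_parse_row(const char *line, struct vmstat_row *v)
{
    int n = sscanf(line,
                   "%ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld",
                   &v->r, &v->b, &v->avm, &v->fre, &v->re, &v->pi, &v->po,
                   &v->fr, &v->sr, &v->cy, &v->in, &v->sy_calls, &v->cs,
                   &v->us, &v->sy, &v->id, &v->wa);
    return n == 17;
}
```

Parsing the second data row above gives b = 11, fre = 0, po = 1025, sr = 45005 and wa = 39.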

  19. Case 3 : CPU usage & Java Performance
      top_java.java:
          import java.io.*;
          import java.util.*;

          class top_java {
              public static void main(String[] args) {
                  String Head = new String();
                  String Data1 = new String();
                  String Data2 = new String();
                  String Data3 = new String();
                  String Date = new String();
                  try {
                      BufferedReader infile = new BufferedReader(new FileReader(args[0]));
                      String str;
                      while ((str = infile.readLine()) != null) {
                          Data3 = new String();
                          StringTokenizer st = new StringTokenizer(str, ",");
                          if (!st.hasMoreTokens()) {
                              continue;  // blank line: nextToken() would throw NoSuchElementException
                          }
                          Head = new String(st.nextToken());
                          if (st.hasMoreTokens()) {
                              Data1 = new String(st.nextToken());
                              if (st.hasMoreTokens()) {
                                  Data2 = new String(st.nextToken());
                                  while (st.hasMoreTokens()) {
                                      Data3 = new String(Data3 + "," + st.nextToken());
                                  }
                              }
                          }
                          if (Head.equals("ZZZZ")) {
                              Date = Data2;
                          }
                          if (Head.equals("TOP")) {
                              System.out.println(Head + "," + Date + "," + Data1 + "," + Data2 + Data3);
                          }
                      }
                      infile.close();
                  } catch (IOException e) {
                      System.out.println("File Open Exception");
                  }
              }
          }
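One cost visible in this code is the line Data3 = new String(Data3 + "," + st.nextToken()). Each iteration builds a new string that copies everything accumulated so far, so a line with many fields does quadratic copying. In Java, a StringBuilder avoids this. The slide does not measure whether this matters for the timings. The C sketch below shows the usual alternative: append at a tracked end offset, so each field is copied once. It produces the same leading-comma layout as Data3. join_fields is a hypothetical name.

```c
/* Hypothetical sketch: build ",f0,f1,..." in buf by appending at a
 * tracked end offset, so each field is copied exactly once.
 * Returns the resulting length, or (size_t)-1 if buf is too small. */
#include <string.h>

size_t join_fields(char *buf, size_t cap, const char *const fields[], size_t n)
{
    size_t len = 0;
    if (cap == 0)
        return (size_t)-1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t flen = strlen(fields[i]);
        if (len + 1 + flen + 1 > cap)       /* comma + field + terminating NUL */
            return (size_t)-1;
        buf[len++] = ',';
        memcpy(buf + len, fields[i], flen); /* copy only the new field */
        len += flen;
        buf[len] = '\0';
    }
    return len;
}
```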

  20. Case 3 : CPU usage & Java Performance
          root@sj_open2:/srkang> timex top_c dbserver.nmon > dbserver.nmon_c.out
          real 0.30
          user 0.16
          sys  0.02
          root@sj_open2:/srkang> timex /usr/java14/jre/bin/java top_java dbserver.nmon > dbserver.nmon_java.out
          real 2.15
          user 1.53
          sys  0.56
          -rw-r--r-- 1 root system 5125362 Apr 01 23:21 dbserver.nmon_c.out
          -rw-r--r-- 1 root system 5125362 Apr 01 23:26 dbserver.nmon_java.out

  21. Case 3 : CPU usage & Java Performance
      Java performance metrics:
      * Application: service request response times (cross-JVM), service request call counts, class-level and method-level response times, class and method call counts, object allocations and deallocations, and so on
      * Application server: thread pool metrics, database connection pool metrics, JCA connection pool metrics, entity bean and stateful session bean cache metrics, stateless session bean and message-driven bean pool metrics, JMS server metrics, and transaction metrics
      * JVM: memory usage and garbage collection metrics
      * Operating system/platform: CPU usage, physical memory usage, disk input/output metrics, and network connectivity metrics

  22. Case 3 : CPU usage & Java Performance
      * Unnecessary explicit (System.gc()) garbage collection made system performance bad. The -Xdisableexplicitgc option is recommended. Please use the free IBM JVM tools: http://www-128.ibm.com/developerworks/java/jdk/diagnosis/141.html

  23. Case 3 : CPU usage & Java Performance
      System test environment
      * A customer tested a WAS server with a third-party test program; the results shown are for a 600-user test.

  24. Case 3 : CPU usage & Java Performance (SPECjAppServer2004 Standard)
      * After upgrading the WAS application server, the system administrator expected the system's CPU usage to drop to half of what it was before.
      * But the new system's SPECjbb2000 value is only 20% higher than the old one's. See SPECjbb2000: http://www-03.ibm.com/systems/p/benchmarks/jba.html

  25. Case 3 : CPU usage & Java Performance
      * After upgrading the Java application server, the system administrator expected CPU usage to be lower than before. Instead, CPU usage went higher. What is going on?
      * The main application was developed in Java as a polling system. The new system polls faster than the old one, so CPU usage went higher.
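The polling effect can be shown with a short sketch. If a loop re-checks its condition without pausing, it runs as fast as the CPU allows, so a faster CPU simply polls more often and CPU usage rises. Pacing each poll with a fixed sleep bounds the poll rate regardless of CPU speed. This is a minimal C sketch assuming POSIX nanosleep and C11 atomics. poll_flag is a hypothetical name, and the customer's Java application is not shown on the slide. Where possible, a blocking wait (for example, a condition variable) avoids polling altogether.

```c
/* Hypothetical sketch: poll a flag at a fixed interval, so the poll rate
 * does not grow with CPU speed. */
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <time.h>

/* Returns the number of polls until *flag became non-zero,
 * or -1 if it was still zero after max_polls polls. */
long poll_flag(atomic_int *flag, long interval_ms, long max_polls)
{
    struct timespec ts = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };
    for (long polls = 1; polls <= max_polls; polls++) {
        if (atomic_load(flag) != 0)
            return polls;
        if (polls < max_polls)
            nanosleep(&ts, NULL); /* give the CPU back between polls */
    }
    return -1;
}
```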

  26. Case 3 : CPU usage & Java Performance
      * This case is related to tpmC. The system hit a maximum CPU usage of 60%. After the upgrade, the maximum CPU usage is under 10% (excluding wait).
      * The wait value is not included in CPU usage.

  27. Case 3 : CPU usage & Java Performance
      * High availability can lead to better performance.

  28. Quiz 2
      * Which one is the most important? [Figure: four options, numbered ② ① ④ ③]

  29. Case 4 : Disk I/O & Disk Wait Rate
      * A system's wait% value was high. The customer upgraded to a CPU with a higher clock rate (Hz). After that, the system's CPU wait% looked very high, but there was no performance issue. Still, the administrator was concerned about it.
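A simplified model shows why a faster CPU can raise wait% without any new problem. Assume one job on a single CPU alternates between computing and waiting for disk, and nothing else runs. The disk time does not change with the CPU, so when the compute part shrinks, the disk part becomes a larger share of the interval. The helper wait_percent is hypothetical and ignores other workloads and overlapping I/O.

```c
/* Hypothetical helper for a simplified model: one job on one CPU spends
 * cpu_s seconds computing and io_s seconds with the CPU idle waiting for
 * disk. Returns the share of the interval that shows up as wait%. */
double wait_percent(double cpu_s, double io_s)
{
    double total = cpu_s + io_s;
    return total > 0.0 ? 100.0 * io_s / total : 0.0;
}
```

With 6 s of computing and 4 s of disk wait, wait_percent(6, 4) is 40%. A CPU twice as fast gives wait_percent(3, 4), about 57%, even though the job now finishes in 7 s instead of 10 s.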

  30. Case 4 : Disk I/O & Disk Wait Rate
      * FTP is sometimes used as a network performance test.
          ftp> ha
          Hash mark printing on (1024 bytes/hash mark).
          ftp> put 100M
          200 PORT command successful.
          150 Opening data connection for 100M.
          ####################################################################################################
          #
          226 Transfer complete.
          104857600 bytes sent in 10.99 seconds (9321 Kbytes/s)
          local: 100M remote: 100M
          ftp> put 100M
          200 PORT command successful.
          150 Opening data connection for 100M.
          ####################################################################################################
          #
          226 Transfer complete.
          104857600 bytes sent in 8.846 seconds (1.158e+04 Kbytes/s)
          local: 100M remote: 100M
          ftp> put 100M
          200 PORT command successful.
          150 Opening data connection for 100M.
      Settings shown on the slide:
          HIGH water mark for pending write I/Os per file [0]
          LOW water mark for pending write I/Os per file [0]
          HIGH water mark for pending write I/Os per file [30]
          LOW water mark for pending write I/Os per file [20]
      FTP test without disk I/O:
          ftp> put "| dd if=/dev/zero bs=32k count=10000" /dev/null
          327680000 bytes sent in 27.65 seconds (1.157e+04 Kbytes/s)
          local: | dd if=/dev/zero bs=32k count=10000 remote: /dev/null
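The HIGH/LOW water mark prompts above are the AIX I/O pacing settings (the sys0 attributes maxpout and minpout). A value of 0/0 turns pacing off, and 30/20 turns it on. The hysteresis idea can be sketched in C as follows: a writer may queue writes until the pending count reaches the high mark, then waits until completions bring it down to the low mark. The struct and functions are hypothetical and are not the AIX kernel implementation.

```c
/* Hypothetical sketch of high/low water-mark write pacing. */
#include <stdbool.h>

struct pacer {
    int high, low;   /* water marks; high == 0 disables pacing */
    int pending;     /* writes issued but not yet completed */
    bool blocked;    /* writer is waiting for pending to fall to low */
};

/* Try to issue one write; returns true if it may be queued now. */
bool pacer_try_write(struct pacer *p)
{
    if (p->high == 0) {          /* pacing disabled */
        p->pending++;
        return true;
    }
    if (p->blocked)
        return false;
    p->pending++;
    if (p->pending >= p->high)
        p->blocked = true;       /* reached the high mark: stop the writer */
    return true;
}

/* Record one completed write; releases the writer at the low mark. */
void pacer_complete(struct pacer *p)
{
    if (p->pending > 0)
        p->pending--;
    if (p->blocked && p->pending <= p->low)
        p->blocked = false;
}
```

With high = 30 and low = 20, the writer stops after its 30th queued write and resumes only once pending writes fall to 20.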

  31. Case 4 : Disk I/O & Disk Wait Rate
      * NFS is commonly used, but many users don't verify the NFS status. Use nfsstat:
          root@sj_open2:/> nfsstat -m
          /ss from /ss:S80
           Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,rsize=32768,wsize=32768,retrans=5
           All: srtt=0 (0ms), dev=0 (0ms), cur=0 (0ms)
          root@sj_open2:/ss> timex dd if=/dev/zero of=./100M bs=1024k count=100
          100+0 records in.
          100+0 records out.
          real 9.84
          user 0.00
          sys  0.36
          root@sj_open2:/> mount -o vers=2 S80:/ss /ss
          root@sj_open2:/> nfsstat -m
          /ss from /ss:S80
           Flags: vers=2,proto=tcp,auth=unix,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5
           All: srtt=0 (0ms), dev=0 (0ms), cur=0 (0ms)
          root@sj_open2:/> cd /ss
          root@sj_open2:/ss> timex dd if=/dev/zero of=./100M bs=1024k count=100
          100+0 records in.
          100+0 records out.
          real 97.64
          user 0.00
          sys  0.40
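Part of the tenfold difference (about 10.2 MB/s vs 1.0 MB/s) follows from simple arithmetic on the mount flags. Each WRITE request carries at most wsize bytes, so the 100 MB file (100 x 1024k = 104857600 bytes) needs 3200 requests with wsize=32768 but 12800 with wsize=8192. NFS version 2 also requires the server to commit each write to stable storage before replying, while version 3 can send unstable writes and commit them later. That difference likely accounts for much of the rest. write_rpcs is a hypothetical helper that ignores metadata requests and retransmissions.

```c
/* Hypothetical helper: number of WRITE requests needed to send `bytes`
 * when each request carries at most `wsize` bytes. */
#include <stdint.h>

uint64_t write_rpcs(uint64_t bytes, uint64_t wsize)
{
    return wsize == 0 ? 0 : (bytes + wsize - 1) / wsize; /* ceiling division */
}
```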

  32. Case 4 : Disk I/O & Disk Wait Rate
      [Diagram: Server #1 and Server #2 sharing one disk subsystem]
      * On Server #1, disk busy was 100%, but its I/O rate was very low. At that time, a full backup was running on Server #2. The two systems accessed the same disks through the cache in the disk subsystem, which caused disk locking on Server #1. Look at the whole environment: a symptom may show up on only one side.
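A disk can be 100% busy at a low I/O rate when each I/O takes longer to complete. By the utilization law, the busy fraction equals the I/O rate times the average service time. If contention from the other server's backup stretches service times, busy% climbs even though this server's own I/O rate does not increase. The numbers below are illustrative, not measurements from the slide. disk_busy_percent is a hypothetical name.

```c
/* Hypothetical helper for the utilization law:
 * busy% = I/Os per second x average service time (ms) / 1000 x 100. */
double disk_busy_percent(double iops, double service_ms)
{
    return iops * service_ms / 1000.0 * 100.0;
}
```

For example, 50 I/Os per second at 5 ms each is 25% busy, while the same 50 I/Os per second at 20 ms each is 100% busy.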

  33. Case 4 : Disk I/O & Disk Wait Rate
      * One day, a system administrator replaced a system's HBA card that had been reported as faulty. The day after the replacement, the system had a performance problem.
      [Charts: before and after the HBA replacement]
      * It looked like an application query problem, but the application team didn't agree. They believed the problem was caused by the change.

  34. Q & A
      Thank you very much!
