1 / 38

Lecture 4 Timing Linux Processes & Threads

Lecture 4 Timing Linux Processes & Threads. CSCE 713 Advanced Computer Architecture. Topics … Readings. January 19, 2012. Overview. Last Time Posix Pthreads : create, join, exit, mutexes / class/csce713-006 Code and Data Readings for today

juanjharris
Télécharger la présentation

Lecture 4 Timing Linux Processes & Threads

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 4Timing Linux Processes & Threads CSCE 713 Advanced Computer Architecture • Topics • … • Readings January 19, 2012

  2. Overview • Last Time • PosixPthreads: create, join, exit, mutexes • /class/csce713-006 Code and Data • Readings for today • http://csapp.cs.cmu.edu/public/1e/public/ch9-preview.pdf • New • Website alive and kicking; dropbox too! • From Last time: Gauss-Seidel Method, Barriers, Threads Assignment • Next time performance evaluation, barriers and MPI intro

  3. Threads programming Assignment • Matrix addition (embarassingly parallel) • Versions • Sequential • Sequential with blocking factor • Sequential Read without conversions • Multi threaded passing number of threads as command line argument (args.c code should be distributed as an example) • Plot of several runs • Next time

  4. Time in the Computer World • Clock cycle time = ? • 1GHz processor has 10-9 = 1 ns clock cycle time Computer Systems: A Programmers Perspective, Bryant and O’Hallaron

  5. Times in the Unix World Command • Real Time • User Time • System Time • Waiting Time – time waiting on I/O operations and while other processes execute • Clock Time • Which time is most important? • Latency vs throughput

  6. The Time Command • TIME(1) TIME(1) • NAME • time - run programs and summarize system resource usage • Example of a sleepy program • real 3m10.006s • user 0m0.004s • sys 0m0.000s

  7. MIPS, MFLOPS, • Benchmarks • SPEC2006 – relevance to this course

  8. Multitasking and Interval Interrupts • Recall Multitasking from OS class: • multiple processes ready to go • scheduler decides who goes next • each process given small time slice (1-10ms) • The process end either because of an I/O operation, page-fault, even cache-miss, or it uses up time slice • System has an “interval timer” to interupt at the end of the time slice

  9. CSAPP Fig 9.2 • . Computer Systems: A Programmers Perspective, Bryant and O’Hallaron

  10. Measuring Time by Interval Counting • . Computer Systems: A Programmers Perspective, Bryant and O’Hallaron

  11. Practice Problem 9.4 • On a system with a timer interval of 10 ms, some segment of process A is recorded as requiring 70 ms, combining both system and user time. What are the minimum and maximum actual times used by thissegment?

  12. Programmer Access to Interval Timers • #include <sys/times.h> • structtms • clock_ttms_utime; /* user time * / • clock_ttms_s time; /* system time * / • clock_ttms_cutime; /* user time of reaped children */ • clock_ttms_cstime; /* system time of reaped children */ • } ; • clock_t times(structtms *buf); • Returns: number of clock ticks elapsed since system started Computer Systems: A Programmers Perspective, Bryant and O’Hallaron

  13. Measuring interval counting accuracy. Fig 9.8

  14. Cycle Counters on the IA32 • Cycle counters increment every clock cycle • 32 bit counter with 1GHz clock wraps after 232/109 ~ 4.3 seconds • 64 bit counter takes a lot longer 264/109 ~ 570 years • IA32 counter accessed with the “rdtsc” • (read time stamp counter) instruction • It places high order 32 bits in %edx and low 32 in %eax • It would be nice if we had a c interface • void access counter(unsigned *hi, unsigned *lo);

  15. Including Assembly Code in C • void access_counter(unsigned *hi, unsigned *lo) • { • asm("rdtsc; movl %%edx,%0; movl %%eax,%1" /* Read cycle counter */ • : "=r" (*hi), "=r" (*lo) /* and move results to */ • : /* No input */ /* the two outputs */ • : "%edx", "%eax"); • }

  16. Closer Look at Extended ASM void access_counter (unsigned *hi, unsigned *lo) { /* Get cycle counter */ asm("rdtsc; movl %%edx,%0; movl %%eax,%1" : "=r" (*hi), "=r" (*lo) : /* No input */ : "%edx", "%eax"); } asm(“Instruction String" : Output List : Input List : Clobbers List); } • Instruction String • Series of assembly commands • Separated by “;” or “\n” • Use “%%” where normally would use “%”

  17. Closer Look at Extended ASM void access_counter (unsigned *hi, unsigned *lo) { /* Get cycle counter */ asm("rdtsc; movl %%edx,%0; movl %%eax,%1" : "=r" (*hi), "=r" (*lo) : /* No input */ : "%edx", "%eax"); } asm(“Instruction String" : Output List : Input List : Clobbers List); } • Output List • Expressions indicating destinations for values %0, %1, …, %j • Enclosed in parentheses • Must be lvalue • Value that can appear on LHS of assignment • Tag "=r" indicates that symbolic value (%0, etc.), should be replaced by register

  18. Closer Look at Extended ASM void access_counter (unsigned *hi, unsigned *lo) { /* Get cycle counter */ asm("rdtsc; movl %%edx,%0; movl %%eax,%1" : "=r" (*hi), "=r" (*lo) : /* No input */ : "%edx", "%eax"); } asm(“Instruction String" : Output List : Input List : Clobbers List); } • Input List • Series of expressions indicating sources for values %j+1, %j+2, … • Enclosed in parentheses • Any expression returning value • Tag "r" indicates that symbolic value (%0, etc.) will come from register

  19. Closer Look at Extended ASM void access_counter (unsigned *hi, unsigned *lo) { /* Get cycle counter */ asm("rdtsc; movl %%edx,%0; movl %%eax,%1" : "=r" (*hi), "=r" (*lo) : /* No input */ : "%edx", "%eax"); } asm(“Instruction String" : Output List : Input List : Clobbers List); } • Clobbers List • List of register names that get altered by assembly instruction • Compiler will make sure doesn’t store something in one of these registers that must be preserved across asm • Value set before & used after

  20. Using access_counter • #include "clock.h" • void start counter(); • double get counter();

  21. The K-Best Measurement Scheme • Measurements of process time in a loaded system are always overestimates • Context switching, cache operations, branch predictions • Interference from other processes • Even unloaded I/O operations will vary because of locations of disk heads etc. • Heisenberg uncertainty principle • Heisenbugs – bugs influenced/created by monitoring code

  22. The K-Best Measurement Scheme • Warm up the instruction cache • Record the K fastest times • If these agree with some tolerance say .1% then the fastest of these represents the true execution time.

  23. Measuring Using Time of DayCode/CSAPP/perf/tod. c • #include <stdlib.h> • #include <stdio.h> • #include <time.h> • #include <sys/time.h> • #include <unistd.h> • static structtimevaltstart; • /* Record current time */ • void start_timer() • { • gettimeofday(&tstart, NULL); • }

  24. /* Get number of seconds since last call to start_timer */ • double get_timer() • { • structtimevaltfinish; • long sec, usec; • gettimeofday(&tfinish, NULL); • sec = tfinish.tv_sec - tstart.tv_sec; • usec = tfinish.tv_usec - tstart.tv_usec; • return sec + 1e-6*usec; • }

  25. /* Determine how many "seconds" are in tod counter */ • static void callibrate_tod() • { • double quick, slow; • start_timer(); • quick = get_timer(); • start_timer(); • sleep(1); • slow = get_timer(); • printf("%.2f - %f = %.2f seconds/sleep second\n", • slow, quick, slow-quick); • }

  26. static void run_timer() • { • inti = 0; • double t1, t2, t3, t4; • start_timer(); • t1 = get_timer(); • do { • t2 = get_timer(); • } while (t2 == t1); • do { • t3 = get_timer(); • i++; • } while (t3 == t2); • printf("Time = %f usecs, in %d iterations, %f usec/iteration\n“, 1e6 *(t3-t2), i, 1e6 *(t3-t2)/i);

  27. start_timer(); • i = 0; • do { • t4 = get_timer(); • i++; • } while (t4 < 1.0); • printf("%d iterations in %f secs = %f usec/iteration\n", • i, t4, 1e6*t4/i); • }

  28. static void check_time() { • long e; • int s, m, h, d, y; • start_timer(); • e = tstart.tv_sec; • s = e % 60; • e = e / 60; • m = e % 60; • e = e / 60; • h = e % 24; • e = e / 24; • d = e % 365; • e = e / 365; • y = e; • printf("This clock started %d years, %d days, %d hours, %d minutes, %d seconds ago\n“, y, d, h, m, s); • }

  29. int main(intarg, char *argv[]) • { • callibrate_tod(); • run_timer(); • check_time(); • return 0; • }

  30. Gprof – profiling Linux programs • Links • https://computing.llnl.gov/tutorials/performance_tools/ • http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html • http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html • http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29 • http://en.wikipedia.org/wiki/List_of_performance_analysis_tools • http://docs.freebsd.org/44doc/psd/18.gprof/paper.pdf • http://linuxgazette.net/100/vinayak.html • http://ececmpsysweb.groups.et.byu.net/cmpsys.2004.winter/citizenship/Bryan_Wheeler/Profiling_Tutorial.html

  31. Man gprof • GPROF(1) GNU GPROF(1) • NAME gprof- display call graph profile data • SYNOPSIS • gprof [ -[abcDhilLrsTvwxyz] ] [ -[ACeEfFJnNOpPqQZ][name] ] • [ -I dirs ] [ -d[num] ] [ -k from/to ] • …] • [ --[no-]exec-counts[=name] ] • … • [ --debug[=level] ] [ --function-ordering ] • … • [ --file-info ] [ --help ] [ --line ] [ --min-count=n ] • … • [ --static-call-graph ] [ --sum ] [ --table-length=len ] • … • [ image-file ] [ profile-file ... ]

  32. Gprof – GNU profiler • http://www.unix-tutorials.com/go.php?id=321 -ArnoutEngelen • $ time event2dot //this took more than 3 minutes on this input: • real 3m36.316s user 0m55.590s sys 0m1.070s • $ g++ -pg dotgen.cpp readfile.cpp main.cpp graph.cpp config.cpp -o event2dot • $ gprof event2dot | less'. • gprof now shows us the following functions are important: • % cumulative self self total • time seconds seconds calls s/call s/call name • 43.32 46.03 46.03 339952989 0.00 0.00 CompareNodes(…) • 25.06 72.66 26.63 55000 0.00 0.00 getNode(…) • 16.80 90.51 17.85 339433374 0.00 0.00 CompareEdges(…) • 12.70 104.01 13.50 51987 0.00 0.00 addAnnotatedEdge(… ) • 1.98 106.11 2.10 51987 0.00 0.00 addEdge(…)

  33. Valgrind • Valgrind is a memory mismanagement detector. It shows you memory leaks, deallocation errors, cache profiling • Memcheck can detect: • Use of uninitialised memory • Reading/writing memory after it has been free'd • Reading/writing off the end of malloc'd blocks • Reading/writing inappropriate areas on the stack • Memory leaks -- where pointers to malloc'd blocks are lost forever • Mismatched use of malloc/new/new; vs free/delete/delete; • Overlapping src and dst pointers in memcpy() and related functions • Some misuses of the POSIX pthreads API http://cs.ecs.baylor.edu/~donahoo/tools/valgrind/

  34. General Parallel Computing Wiki-Links • http://en.wikipedia.org/wiki/Category:Parallel_computing • http://en.wikipedia.org/wiki/Intel_Parallel_Studio • http://www.itservices.hku.hk/sp2/workshop/html/ibmhwsw/ibmhwsw.html • http://www.itservices.hku.hk/sp2/workshop/html/ibmhwsw/ibmhwsw.html#power

  35. Libraries • http://static.msi.umn.edu/tutorial/scicomp/sp/math-numerical-lib-SP/num_lib.html

More Related