CS 575 Supercomputing - Lecture Outline
Chapter 6: Timing and Profiling
And Links to Sun Documents on Tools
October 13, 2003

Dr. Kris Stewart (stewart@rohan.sdsu.edu)
San Diego State University

This URL is http://www.stewart.cs.sdsu.edu/cs575/lecs/ch06.html


Why do we use software tools to examine performance?

Recall from Ch. 2: RISC architectures are pipelined to ensure efficient use of the separate hardware devices.

Earlier in the semester, we examined tools to allow the programmer to easily time an entire program, or any UNIX command, when in the csh.

Using time csample

Fig. 6-1 The built-in csh time function

© O'Reilly Publishers (Used with permission)

Timing a Portion of the Program

Overview of the System Timer [Sept. 11, 2002 Lab Introduction to Timing]
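
As a rough illustration of timing only a portion of a program, the sketch below brackets one region with timer calls. It is written in Python rather than the course's Fortran or C, and the function work() is a made-up stand-in for the code of interest, not something from the lab. It reads both a wall-clock timer and a CPU timer, since the two can differ on a shared machine.

```python
import time

def work(n):
    """Hypothetical stand-in for the region of interest: a sum of squares."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Wall-clock timer: includes any time spent waiting on other processes.
wall_start = time.perf_counter()
# CPU timer: counts only CPU time charged to this process.
cpu_start = time.process_time()

result = work(200_000)

cpu_elapsed = time.process_time() - cpu_start
wall_elapsed = time.perf_counter() - wall_start

print(f"result = {result}")
print(f"wall   = {wall_elapsed:.6f} s")
print(f"cpu    = {cpu_elapsed:.6f} s")
```

Only the statements between the start and stop readings are measured, so the timer calls should sit as close to the region of interest as possible.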

Subroutine Profiling

Standard tools on UNIX platforms let one identify, by subprogram, where CPU time is spent. This leads to characterizing profiles as sharp or flat.

Where would you invest your time to optimize?

Fig. 6-2 Sharp Profile - dominated by routine 1

© O'Reilly Publishers (Used with permission)

Fig. 6-3 Flat profile - no routine dominates

© O'Reilly Publishers (Used with permission)
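
One way to put numbers on the question "where would you invest your time?" is to compute the share of run time taken by the top routine and then apply Amdahl's law to see what speeding it up would buy. The sketch below does this in Python; the routine names and timings are invented for illustration and are not taken from Figs. 6-2 or 6-3.

```python
def top_share(profile):
    """Return (name, fraction of total time) for the most expensive routine."""
    total = sum(profile.values())
    name = max(profile, key=profile.get)
    return name, profile[name] / total

def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Made-up timings in seconds, one dictionary per hypothetical program.
sharp = {"routine1": 80.0, "routine2": 10.0, "routine3": 6.0, "routine4": 4.0}
flat = {"routine1": 28.0, "routine2": 26.0, "routine3": 24.0, "routine4": 22.0}

for label, profile in (("sharp", sharp), ("flat", flat)):
    name, share = top_share(profile)
    gain = amdahl_speedup(share, 2.0)
    print(f"{label}: {name} uses {share:.0%} of the time; "
          f"doubling its speed gives {gain:.2f}x overall")
```

With a sharp profile, effort spent on the single dominant routine pays off; with a flat profile, the same effort on any one routine yields only a small overall gain.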

How do you discover these routines and their run-time performance?

Fig. 6-4 Simple Call Graph

© O'Reilly Publishers (Used with permission)
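
A call graph profile distinguishes the time spent in a routine itself from the time spent in the routines it calls. The Python sketch below shows that bookkeeping on a small tree-shaped call graph; the routine names and self times are hypothetical and do not correspond to Fig. 6-4.

```python
# Hypothetical call graph: each routine maps to the routines it calls.
calls = {
    "main": ["a", "b"],
    "a": ["c"],
    "b": ["d"],
    "c": [],
    "d": [],
}

# Made-up self times in milliseconds (time spent in the routine's own code).
self_ms = {"main": 5, "a": 20, "b": 10, "c": 60, "d": 5}

def total_ms(routine):
    """Self time plus the time of everything the routine calls."""
    return self_ms[routine] + sum(total_ms(child) for child in calls[routine])

for routine in calls:
    print(f"{routine:5s} self={self_ms[routine]:3d} ms  total={total_ms(routine):3d} ms")
```

Here routine a has a modest self time but a large total, because most of its time is spent in its callee c. This sketch assumes each routine has a single caller; a real profiler must also apportion a shared routine's time among its callers.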

Gprof

gprof is a useful UNIX tool that requires recompilation of all codes with the -pg option. As you'll see below, extensive data is collected, including the call graph as well as the ordered list of modules and their CPU percent usage. No alteration of the user's source code is needed, such as inserting system timer calls.

Sun Doc Analyzing Program Performance - Appendix A gives examples using Gprof

Modified from our text, p. 118,

cat input100
10
10
cat input200
10
20
cat input400
10
40
# script to run the test data and accumulate gprof summaries
# don't forget to    nice +19 runit
limit cputime 3600
nice +19 diffusion < input100 
mv gmon.out gmon.1
nice +19 diffusion < input200 
mv gmon.out gmon.2
nice +19 diffusion < input400 
mv gmon.out gmon.3
# typo in our text p. 118 (no "-s")
gprof diffusion  gmon.1 gmon.2 gmon.3 > gprof.summary.out

run-gprof-sge

What is produced by the profiler?

Entire data file, in three parts:
Flat Profile: the top part of the gprof.summary.out file. It defines the terms and gives the list of subprograms ordered by execution time.
Call Graph: the next portion of the "Entire data file". It defines the call graph in a text-based manner.
Index (Function by name): every module (name and index number), including the system calls.

Fig. 6-5 FORTRAN example

© O'Reilly Publishers (Used with permission)

CAUTION: A Few Words About Accuracy

These profiling tools require recompiling the source code so that the compiler generates additional code that can be sampled. For example, every 1/100th of a second the profiler asks "Which routine is running?" (by examining the program stack) and tallies the answer as performance data.

Fig. 6-6 Quantization errors in profiling

© O'Reilly Publishers (Used with permission)
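
The effect of sampling at a fixed interval can be illustrated with a small simulation. The Python sketch below builds a synthetic timeline of two routines, "long" and "short", and compares the exact time in each with the estimate obtained by charging one full sampling interval (10,000 microseconds, i.e. 1/100th second) to whichever routine is running at each tick. The timelines and routine names are invented for illustration; this is not how gprof itself is implemented.

```python
def make_timeline(iterations, long_us, short_us):
    """Build contiguous (name, start, end) segments, times in microseconds."""
    timeline = []
    t = 0
    for _ in range(iterations):
        timeline.append(("long", t, t + long_us))
        t += long_us
        timeline.append(("short", t, t + short_us))
        t += short_us
    return timeline

def true_times(timeline):
    """Exact microseconds spent in each routine."""
    totals = {}
    for name, start, end in timeline:
        totals[name] = totals.get(name, 0) + (end - start)
    return totals

def sampled_times(timeline, interval_us):
    """Charge one full interval to the routine running at each sampling tick."""
    totals = {}
    segments = iter(timeline)
    name, start, end = next(segments)
    finish = timeline[-1][2]
    for tick in range(0, finish, interval_us):
        while tick >= end:  # advance to the segment containing this tick
            name, start, end = next(segments)
        totals[name] = totals.get(name, 0) + interval_us
    return totals

# Loop period equals the sampling interval: every tick lands in "long".
aligned = make_timeline(100, 9000, 1000)
# Loop period (7000 us) drifts against the sampling interval.
drifting = make_timeline(100, 6000, 1000)

for label, timeline in (("aligned", aligned), ("drifting", drifting)):
    print(label, "true:", true_times(timeline),
          "sampled:", sampled_times(timeline, 10_000))
```

In the aligned case the short routine really uses 10% of the run time but is never sampled, so the profile reports nothing for it. In the drifting case the samples spread across the loop and the estimate matches the true times. Short routines, and routines whose timing is correlated with the sampling clock, are the ones most at risk of being misreported.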


Consider the following schematic for the Sun Sparc Enterprise 4000 hardware
Sun Sparc SunFire 4800 Hardware Overview

p. 26 of document above (http://www.sun.com/products-n-solutions/hardware/docs/pdf/805-7362-12.pdf)