Computable Performance Metrics

Module 2: Computing Error and Work Estimators

Module Author

Kris Stewart

Computer Science Department

http://www.stewart.cs.sdsu.edu/

W.M. Keck Foundation

This file:
ComputablePerformanceMetrics2.htm

__Overview:__

Many of the calculations performed today benefit from
the methods of numerical analysis, with their proven error analysis. An excellent text is *Numerical Methods and Software* [1], which provides a careful balance
between algorithms and their usefulness on modern computers, the development of
the error properties of these algorithms, and the implementation
of the algorithms in a computer language, FORTRAN, in a portable manner, so that
the text and its codes can be applied to the wide variety of computing
systems available now. The current
module augments this treatment by pursuing **computationally measurable** performance metrics, such as accuracy
and cost, that can be computed during the solution process.

Our metric for cost will be the CPU time for the task. Our metric for accuracy will be the norm of the error, the difference between the true solution and the computed solution; we therefore limit ourselves in this exposition to problems with known solutions. The problem class explored is the numerical solution of differential equations, because it nicely links the mathematical theory predicted for the algorithm with an actual computable measure of performance, and because it is a problem class of interest to computational science.
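The module's driver codes are FORTRAN and obtain CPU time from the UNIX dtime utility [9]. As a language-neutral illustration only (a Python sketch, not the module's code), the same cost metric can be captured like this:

```python
import time

def cpu_seconds(task, *args):
    """Run task(*args) once and return (result, CPU seconds consumed)."""
    start = time.process_time()   # CPU time, not wall-clock time
    result = task(*args)
    return result, time.process_time() - start

# Example: time a simple O(N) reduction.
total, seconds = cpu_seconds(sum, range(1_000_000))
```

Using process CPU time rather than wall-clock time keeps the work metric insensitive to other load on the machine, which matters when comparing runs across grid levels.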

We begin with the 1-D solution of the boundary value problem

u''(x) = f(x), u(0) = 0, u(1) = 0,

with f(x) = -(2π)^{2} sin(2π x), so that the true solution
is given by u(x) = sin(2π x). We are following an excellent lesson prepared
by Professor Matt W. Choptuik, Professor of Physics & Astronomy, University of British Columbia [2].
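As a quick sanity check on the problem setup (a Python sketch for illustration; the module's codes are FORTRAN), one can confirm that u(x) = sin(2πx) satisfies u'' = f for the stated f and takes the value 0 at both endpoints:

```python
import math

def u(x):
    """True solution u(x) = sin(2*pi*x)."""
    return math.sin(2 * math.pi * x)

def f(x):
    """Right-hand side f(x) = -(2*pi)^2 * sin(2*pi*x), so that f = u''."""
    return -(2 * math.pi) ** 2 * math.sin(2 * math.pi * x)

# Approximate u'' at an interior point by a second central difference
# and compare it with f there; also record the boundary values.
h = 1.0e-4
x = 0.3
u_xx = (u(x + h) - 2.0 * u(x) + u(x - h)) / h**2
residual = abs(u_xx - f(x))       # should be tiny, O(h^2)
boundaries = (u(0.0), u(1.0))     # both should vanish
```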

__Introduction to problem:__

In this module, we demonstrate a way to compute performance
statistics that are related to the mathematical predictions of the
approximation formulas from numerical analysis.
We examine the error due to the finite difference approximation of the
second derivative, which is expected to be *O*(∆x^{2}) for the uniform
discretization ∆x = 1/N, where N = 2^{level}
for the sequence of grids, level = 4, 5, …, 20, therefore values of N = 16, 32,
…, 1048576. We also examine the work
performance to solve the tridiagonal linear system of equations for this sequence
of grid spacings. The known number of
arithmetic operations for applying the Gaussian Elimination algorithm to the
matrix with this known sparsity structure is *O*(N) for the adds and multiplies.
We recognize that we must consider the cost for fetching the values from
memory or from cache, and storing the result value back to memory. These costs will depend on the compiler and the
computer architecture where the program is run.
These costs are difficult to predict in advance. Therefore, we focus on the known Op Count for
arithmetic and seek validation from the run-time CPU time. We hope to discover: **For
what value of N will the arithmetic costs come to dominate the other issues of
memory access?**
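The *O*(N) operation count is easiest to see in the specialized elimination itself, often called the Thomas algorithm: each row costs a constant number of adds, multiplies, and divides. A minimal Python sketch of that elimination (the module's codes instead call LAPACK's SGTSV/DGTSV [4], which also pivot for stability):

```python
def thomas(sub, diag, sup, rhs):
    """Solve a tridiagonal system in O(N) operations.
    sub/sup are the off-diagonals (length N-1), diag the main diagonal
    (length N), rhs the right-hand side (length N)."""
    n = len(diag)
    d = diag[:]
    b = rhs[:]
    # Forward elimination: a fixed handful of operations per row -> O(N).
    for i in range(1, n):
        m = sub[i - 1] / d[i - 1]
        d[i] -= m * sup[i - 1]
        b[i] -= m * b[i - 1]
    # Back substitution, also O(N).
    x = [0.0] * n
    x[-1] = b[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (b[i] - sup[i] * x[i + 1]) / d[i]
    return x

# Example: tridiag(1, -2, 1) applied to x = (1, 2, 3) gives b = (0, 0, -4).
x = thomas([1.0, 1.0], [-2.0, -2.0, -2.0], [1.0, 1.0], [0.0, 0.0, -4.0])
```

Counting the loop bodies above makes the Work(N) = cN model concrete: doubling N doubles the number of loop iterations and hence the arithmetic.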

__Statement of the problem:__

We explore how the performance of Gaussian Elimination for tridiagonal
linear systems, predicted by known algorithm operation counts for work, relates to
the measurable run-time data of an implemented code. We expect the __linear__ work for a
problem of size N to be represented as

Work (N) = c N,

for some constant c. When
we double the number of points, we expect the work to double, i.e. take twice
as long to compute.

We expect __quadratic__ accuracy proportional to the
square of our grid spacing, ∆x, or inversely proportional to the square
of the problem size, given by

Accuracy(N) = d ∆x^{2} = d / N^{2}

for some constant d. When we double the number of points, we expect
the error to decrease by a factor of 4, i.e. multiply the error by a factor of 1/4.

We use the standard finite difference approximation

u''(x_{i}) ~ (u_{i+1} – 2 u_{i} + u_{i-1}) / ∆x^{2},

x_{i} = i ∆x,

u_{i+1} ~ u(x_{i+1}),

and examine a sequence of grid-size levels,

N = 2^{level}, level = 4, 5, …, 20.
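The predicted *O*(∆x^{2}) behavior of this difference quotient can be checked directly, without solving any linear system: halving ∆x should cut the worst-case approximation error by roughly a factor of 4. A Python sketch for illustration:

```python
import math

def d2_error(n):
    """Max error of the second central difference for u'' = -(2*pi)^2 sin(2*pi*x)
    over the interior points of an n-interval uniform grid on [0, 1]."""
    h = 1.0 / n
    u = lambda x: math.sin(2 * math.pi * x)
    upp = lambda x: -(2 * math.pi) ** 2 * math.sin(2 * math.pi * x)
    worst = 0.0
    for i in range(1, n):
        x = i * h
        approx = (u(x + h) - 2.0 * u(x) + u(x - h)) / h**2
        worst = max(worst, abs(approx - upp(x)))
    return worst

ratio = d2_error(16) / d2_error(32)   # expect roughly 4, i.e. exponent ~ 2
```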

Our overall expectations of performance may be described using the ratios of successively finer grids, for example in the case of N = 16 and N = 32:

Accuracy(16) ~ d (1/16)^{2}

and

Accuracy(32) ~ d (1/32)^{2} ~ 2^{-2} Accuracy(16)

Computing their ratio and taking the Log_{2} yields
the exponent of change: minus one (-1) for the linear measure of work and two (2) for
the quadratic measure of accuracy.

Log_{2} ( [Accuracy(coarse grid) / Accuracy(fine grid)] ) ~ 2

Log_{2} ( [Work(coarse grid) / Work(fine grid)] ) ~ -1

Our computational task is to attempt to verify the quadratic relationship for accuracy using the known true solution to compute the error and discover the value of N where this occurs. In the same program, we expect to verify the linear work to solve a tridiagonal system of linear equations.
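The Error Ratio and Work Ratio columns reported later in Tables 1 and 2, together with their log_{2} exponents, amount to the following computation over successive levels (a Python sketch; the synthetic inputs below are constructed to obey the predictions exactly, not measured data):

```python
import math

def ratios(values):
    """Coarse/fine ratios of successive measurements and their log2 exponents."""
    rats = [values[i - 1] / values[i] for i in range(1, len(values))]
    return rats, [math.log2(r) for r in rats]

# Synthetic data obeying the predictions: error ~ d/N^2, work ~ c*N.
levels = range(4, 9)
errors = [1.0 / (2**lev) ** 2 for lev in levels]
times = [1.0e-6 * 2**lev for lev in levels]

erat, eexp = ratios(errors)   # ratios ~ 4, exponents ~ 2
trat, texp = ratios(times)    # ratios ~ 1/2, exponents ~ -1
```

Applied to measured errors and CPU times, departures of these exponents from 2 and -1 are exactly the diagnostic the tables use to flag trouble.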

__Solution methodology:__

We use equally spaced grid points on [0,1]. [Figure: plot of the computed solution, reproduced from Dr. Choptuik's class page.]

This solution was computed by Dr. Matt Choptuik and is
available from his home page http://laplace.physics.ubc.ca/People/matt/Teaching/04Fall/PHYS410/Doc/linsys/ex2/soln8.ps. Dr. Choptuik's
class lesson was the starting point for the code used to demonstrate this
module's **computable metrics** of
accuracy and work. We used GhostScript
[5] to view the postscript files from Dr. Choptuik's class page and SnagIt [6]
to capture the image as PNG (Portable Network Graphics) above to include in this document.

Dr. Choptuik’s class page also
presents the plot of the error for varying grids of level = 5, 6 and 7 reproduced in this module.

http://laplace.physics.ubc.ca/People/matt/Teaching/04Fall/PHYS410/Doc/linsys/ex2/err567.ps

__Assessment of how well the model solves the problem:__

We are indebted to Professor Matthew Choptuik, Dept of
Physics and Astronomy, University of British Columbia, whose class materials [2] form the basis of this module.

__Empirical data:__

Table 1 presents the results from running kbvp1d-single.f
with the single precision LAPACK subroutine SGTSV. Also in Table 1 are the known eigenvalues for
the matrix A, found online through a Google search described in the references
[7]. Notice the shaded rows of the table
(in the Word document) when level ≥ 8.
These are *untrustworthy* due to
the ill-conditioning of the matrix A and the natural propagation of round-off
error in the computation. Round-off
error was covered in Modules 0 and 1 of this set of notes.

__Conceptual questions to examine student's understanding:__

**Did you notice a
consistency in the behavior of the work performance metric in Table 1?**

Students should notice that the Work Ratio column shows a
smooth value of 1/2 = 2^{-1}, which would be expected for a linear
algorithm doing Gaussian Elimination on a tridiagonal system of equations. The computed exponent is a smooth value of -1.0,
validating that Work is proportional to problem size: doubling N doubles the time, so the coarse-to-fine ratio is 1/2.

Note: students should
begin to feel comfortable examining these computed values and understanding
that log_{2}(1/2) = -1, which is our expected work performance metric for
a linear algorithm.

**Did you notice a
consistency in the behavior of the accuracy performance metric, the Error
Ratio, in Table 1?**

Students should notice that the first 4 rows do provide a smooth
result. The number of grid points is
doubled, N → 2N or ∆x → ∆x/2, and we expect the error to be multiplied
by (1/2)^{2} = 1/4, i.e. made smaller by a factor of 4, for
the quadratic grid approximation used for the finite difference. Taking the ratio of the coarse-grid approximation
to the fine-grid approximation, we compute an Error Ratio of roughly 4 with an
exponent of 2 for the first four rows of Table 1 (level = 4, 5, 6, 7).

After the fourth row of Table 1, we have a grid spacing of h = ∆x = 0.0078125. We are reaching a limitation that comes from the ill-conditioning of the linear system of equations. For the current problem with N = 4, our matrix is

A = [ -2   1   0 ]
    [  1  -2   1 ]
    [  0   1  -2 ]

The matrix A has a large magnitude eigenvalue

λ_{1} = -2 { 1 + cos(π/N) } ~ -4

and small magnitude eigenvalue

λ_{N} = -2 { 1 + cos((N-1)π/N) } ~ -h^{2} (up to the constant factor π^{2})

The explicit formula for these eigenvalues was found online [7]. The eigenvalues determine the condition number of a symmetric matrix such as this [8]. Please see the discussion under Tables 1 and 2.
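The closed-form eigenvalues and the resulting condition number can be tabulated directly. A Python sketch (note that the module's shorthand λ_N ~ -h^{2} and, later, κ ~ 4/h^{2} absorbs the factor π^{2}; the sketch keeps it):

```python
import math

def extreme_eigenvalues(n):
    """Closed-form extreme eigenvalues of the (n-1)x(n-1) matrix
    tridiag(1, -2, 1): lambda_k = -2*(1 - cos(k*pi/n)), k = 1..n-1."""
    lam_small = -2.0 * (1.0 - math.cos(math.pi / n))              # ~ -(pi*h)^2
    lam_large = -2.0 * (1.0 - math.cos((n - 1) * math.pi / n))    # ~ -4
    return lam_small, lam_large

n = 2**7                    # level 7, h = 1/128
h = 1.0 / n
small, large = extreme_eigenvalues(n)
kappa = abs(large) / abs(small)   # condition number of this symmetric matrix
```

For this symmetric matrix the 2-norm condition number is just the ratio of extreme eigenvalue magnitudes, which is why it grows like 1/h^{2} as the grid is refined.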

__Problems and projects:__

Students can be asked to run a double precision version of the same experiment. The results from running kbvp1d-double.f with the double precision LAPACK code, DGTSV, are presented in Table 2. Rows 11 to 17 (level ≥ 14) are shaded to indicate their untrustworthiness, as we saw in the single precision results for smaller values of N.

__Solutions:__

We take the problem cited above and use the FORTRAN code
kbvp1d-double.f, included in the zipfile with this module, to approximate the
solution [2] for this module. This code
sets up the boundary value problem above in a way so that the true
solution is known and computed. We
modified the original code to examine a sequence of grids, for levels 4 up to
20, giving N = 2^{level} + 1, i.e. N = 17, 33, 65, 129, 257, 513, 1025,
2049, 4097, 8193, 16385, 32769, 65537, 131073, 262145, 524289, and
1048577. The sample code uses the LAPACK
tridiagonal system solver, DGTSV [4]. The
code uses the UNIX utility, dtime [9].
The zipfile also contains files to compile and link with the high
performance Sun subroutine library.

__Suggestions to instructors for using the module:__

It is tempting to try to cover too much in a module such as
this. It is a challenge to distinguish
for the students how simplifications have been made without compromising the
actual solution process. In the past,
the author used this module to motivate students to discover the relationship
between the mathematical theory that states Gaussian Elimination can be performed
in *O*(N^{3}) operations for a
dense matrix and the observed performance in computing the numerical
solution. Students are often familiar
with Gaussian Elimination for dense systems and can believe the Op Count
through an easy lecture, but it is a mistake to treat the example of the
boundary value problem as dense, because it
has a known sparsity structure. We have
observed that students will continue to use computational techniques learned in
class for problem solving later in their career. They do not remember your caveat "This is
only for a class example; you would never use dense Gaussian Elimination for
this problem in practice". We now feel
that it is better to use the special case of tridiagonal systems with their
linear operation count for cost.

We also recommend referring to u(x) so that later one can ease into the 2d problem as u(x,y). Our students just are not comfortable with y(x) and then later u(x,y(x)) and its dependencies, so a careful choice of notation at the start will pay off later.

Searching for a suitable problem to generate Au = b, it was easy to select the diffusion equation with its computational molecule to approximate

∂^{2}u(x_{i}) / ∂x^{2} ~ (u_{i+1} – 2 u_{i} + u_{i-1}) / ∆x^{2}

at x_{i} = x_{0} + i ∆x.

An interesting source of a wide variety of computational molecules is Abramowitz & Stegun, now available on-line [10]. We present formula 25.3.23 here.

It was an interesting find to see how many problems from physics are approximated by this same linear system. A nice lecture is presented in [11] for the Two-Mass Vibrating System by Professor Erik Cheever.

Consideration should be put into the prerequisites that you
might want to require for a course, or a module such as this. In the university setting, the content of
courses is always dynamic and there is a time-dependency on the correspondence
between a syllabus, a catalog description and what is actually covered in a
course. For this module, we use

We note an interesting curricular investigation for any instructor to pursue to better understand the situation at your local institution. If you require a course as a prerequisite for your own course, be clear in your own mind about precisely which topic from that prerequisite course you really expect students to understand. Then, take the time to speak with the faculty member at your school who currently teaches the course. Ask them how they cover this topic. Do they motivate the topic? With what example? Think to yourself how well this treatment prepares students for your course. You can then try to convince the other instructor teaching the prerequisite of your course to see the value of orienting their course to be aligned more smoothly with your needs. If this is not successful, you will want to spend time in your course to develop the concept more before pursuing this example.

Instructors will want to have a good relationship with the System Administrator of the computer used for instruction. This module will benefit from using the largest possible vector sizes to have the full range of computed metric values to make a convincing argument. At SDSU, we needed an increase in our Storage Limit for student accounts.

__References:__

[1] Numerical Methods and Software, D. Kahaner, C. Moler and S. Nash, Prentice-Hall, 1989. Although this text is somewhat dated, using the FORTRAN routines from LinPACK, the treatment of linear systems, the condition number of the matrix and the behavior of its error is very well written and appropriate for this module.

[2] Professor M.W. Choptuik, Physics 410: Computational Physics Course Notes, Fall 2004, Linear Systems. http://laplace.physics.ubc.ca/People/matt/Teaching/04Fall/PHYS410/Notes.html

[3] LAPACK Users’ Guide, 3^{rd} Edition,

http://www.netlib.org/lapack/lug/

[4] DGTSV tridiagonal system solver from LAPACK. Available online from

http://www.netlib.org/lapack/double/dgtsv.f

[5] Ghostgum Software Pty Ltd, provides a shareware code to view postscript files that we have used for a while. Many other providers are possible. http://www.ghostgum.com.au/

[6] SnagIt Screen Capture software from TechSmith. http://www.techsmith.com/snagit.asp

[7] Eigenvalues of discretization matrix available from Math Forum discussion on line at

http://mathforum.org/kb/thread.jspa?threadID=1415533&messageID=4926102

This was an interesting Google search. I remembered from my first numerical analysis course, using the excellent text, Numerical Analysis: A Second Course [8] by Ortega, that the closed form formula for these eigenvalues was presented. My copy of the text was 40 miles away in my SDSU office and to “save the environment”, I did not want to drive out there. A Google search for "closed form" eigenvalue "-1 2 -1" yielded the above URL on the second page of results.

“Paul Abbott Re: why are the polynomials in this series all solvable by radicals?

Posted: Jul 18, 2006 12:11 AM

In article <1153154093.967916.305090@i42g2000cwa.googlegroups.com>,

"Lee Davidson" <lmd@meta5.com> wrote:

> Consider n+1 identical masses m in linear series with n
identical

> springs of spring constant k connecting adjacent masses. Taking the

> lengths of the springs as n variables, you can set up n independent

> second order linear differential equations and analyze the solutions.

> What you want are eigenvalues, to determine characteristic vibrational

> frequencies. After some simplification and change of variable, you get

> polynomials according to the following recurrence relation:

> p(1) = x

> p(2) = x^2-1

> p(n) = x * p(n-1) - p(n-2) for n >= 3

Determining the closed-form eigenvalues is straightforward; employing
the recurrence relations for the Chebyshev polynomials, one obtains

p(n) = ChebyshevU[n, x/2] == Sin[(n + 1) ArcCos[x/2]]/Sqrt[1 - x^2/4]

Alternatively, the eigenvalues of the matrix above are

-2 (1 +

Paul Abbott Phone: 61 8 6488 2734

The University of Western Australia (CRICOS Provider No 00126G)

“

[8] J.M. Ortega, Numerical Analysis: A Second Course.

[9] Sun Performance Library User’s Guide

http://192.18.109.11/819-3692/819-3692.pdf

[10] Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables, M. Abramowitz and I.E. Stegun, Editors, National Bureau of Standards, 1955. Now available on line from http://www.convertit.com/Go/ConvertIt/Reference/AMS55.ASP?Res=150&Page=-1

The actual formula cited above can be found directly at

http://www.convertit.com/Go/ConvertIt/Reference/AMS55.ASP?Res=150&Page=884&Submit=Go

taking you directly to page 884 of this fascinating compendium.

[11] Erik Cheever, Professor and Chair, Department of
Engineering,

http://www.swarthmore.edu/NatSci/echeeve1/Ref/MtrxVibe/EigApp/EigVib.html

We backed up within the URL to find the author was Professor Cheever

http://www.swarthmore.edu/NatSci/echeeve1/

**Table 1: bvp1d using SGTSV from LAPACK, Single Precision, Tridiagonal System Solver**

**(shaded values [level ≥ 8] look to be contaminated by ill-conditioning and round-off)**

(Error Ratio = Err(coarse)/Err(fine); Work Ratio = Time(coarse)/Time(fine); the exponents are the log_{2} of each ratio.)

level        N  Error Ratio  Error Exp.   Work Ratio   Work Exp.    Grid Space h  Time (s)    True Error
    4       17            -           -            -            -     6.2500E-02  7.3000E-05  8.8837E-03
    5       33   3.9681E+00  1.9933E+00   6.6364E-01  -5.9297E-01     3.1250E-02  1.1000E-04  2.2387E-03
    6       65   3.9930E+00  2.0023E+00   4.9327E-01  -1.0220E+00     1.5625E-02  2.2300E-04  5.6067E-04
    7      129   4.0013E+00  2.0053E+00   4.8690E-01  -1.0408E+00     7.8125E-03  4.5800E-04  1.4012E-04
    8      257   1.9158E+00  9.4022E-01   4.9945E-01  -1.0040E+00     3.9062E-03  9.1700E-04  7.3140E-05
    9      513   4.1756E+00  2.0670E+00   4.9541E-01  -1.0158E+00     1.9531E-03  1.8510E-03  1.7516E-05
   10     1025   7.2343E-02 -3.7982E+00   4.9879E-01  -1.0059E+00     9.7656E-04  3.7110E-03  2.4212E-04
   11     2049   2.5629E-01 -1.9689E+00   5.0244E-01  -9.9539E-01     4.8828E-04  7.3860E-03  9.4472E-04
   12     4097   1.7760E-01 -2.4993E+00   5.0214E-01  -9.9624E-01     2.4414E-04  1.4709E-02  5.3192E-03
   13     8193   4.2768E-02 -4.5584E+00   4.9699E-01  -1.0111E+00     1.2207E-04  2.9596E-02  1.2437E-01
   14    16385   3.9379E+00  1.9822E+00   1.8995E+00   9.2786E-01     6.1035E-05  1.5581E-02  3.1584E-02
   15    32769   7.9818E-03 -6.9860E+00   4.9861E-01  -1.0065E+00     3.0518E-05  3.1249E-02  3.9569E+00
   16    65537   9.9904E-01 -1.3962E-03   4.9888E-01  -1.0057E+00     1.5259E-05  6.2638E-02  3.9608E+00
   17   131073   1.0020E+00  2.8946E-03   4.9883E-01  -1.0058E+00     7.6294E-06  1.2557E-01  3.9529E+00
   18   262145   1.0027E+00  3.9606E-03   5.0204E-01  -9.9654E-01     3.8147E-06  2.5012E-01  3.9420E+00
   19   524289   1.0006E+00  9.2084E-04   5.0447E-01  -9.8954E-01     1.9073E-06  4.9581E-01  3.9395E+00
   20  1048577   9.9286E-01 -1.0367E-02   4.8770E-01  -1.0385E+00     9.5367E-07  1.0166E+00  3.9679E+00

λ_{1} = -2 ( 1 + cos(π/N) ) ≈ -4

(for N sufficiently large so that π/N is roughly 0 and cos(π/N) ≈ 1)

λ_{N} = -2 ( 1 + cos((N-1)π/N) ) = -2 { 1 + (-1 + (πh)^{2}/2 - …) } ≈ -(πh)^{2} ~ -h^{2}

yielding a condition number of roughly |λ_{1}| / |λ_{N}| ≈ 4/h^{2}, which, for N = 129 (h = 0.0078125) and above, gives an estimate of
the propagated round-off error beyond the single-precision machine round-off.
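The breakdown seen in Table 1 can be anticipated with a crude error model: total error ≈ discretization error d·h^{2} plus propagated round-off ε·κ ≈ ε·4/h^{2}, where ε is the unit round-off. With the illustrative choice d = 1 (an assumption; the true constant depends on the problem), the model puts the crossover near level 6, the right order of magnitude for the breakdown observed at level 8. A Python sketch:

```python
EPS_SINGLE = 2.0 ** -24   # IEEE single-precision unit round-off

def error_terms(level, eps, d=1.0):
    """Crude model: (discretization error d*h^2, propagated round-off eps*4/h^2)
    on the grid with h = 1/2**level. The constant d = 1 is an assumption."""
    h = 1.0 / 2**level
    return d * h**2, eps * 4.0 / h**2

# First level at which modeled round-off overtakes discretization error.
crossover = next(lev for lev in range(4, 21)
                 if error_terms(lev, EPS_SINGLE)[1] > error_terms(lev, EPS_SINGLE)[0])
```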

**Table 2: bvp1d using DGTSV from LAPACK, Double Precision, Tridiagonal System Solver**

**(shaded values [level ≥ 14] look to be contaminated by ill-conditioning and round-off)**

(Error Ratio = Err(coarse)/Err(fine); Work Ratio = Time(coarse)/Time(fine); the exponents are the log_{2} of each ratio.)

level        N  Error Ratio  Error Exp.   Work Ratio   Work Exp.    Grid Space h  Time (s)    True Error
    4       17            -           -            -            -     6.2500E-02  8.3000E-04  8.8841E-03
    5       33   3.9637E+00  1.9916E+00   1.1857E+02   6.9063E+00     3.1250E-02  7.0000E-06  2.2414E-03
    6       65   3.9753E+00  1.9959E+00   6.3636E-01  -6.5366E-01     1.5625E-02  1.1000E-05  5.6383E-04
    7      129   3.9860E+00  1.9998E+00   5.5000E-01  -8.6459E-01     7.8125E-03  2.0000E-05  1.4145E-04
    8      257   3.9926E+00  2.0022E+00   5.4054E-01  -8.8968E-01     3.9062E-03  3.7000E-05  3.5428E-05
    9      513   3.9962E+00  2.0035E+00   5.2113E-01  -9.4257E-01     1.9531E-03  7.1000E-05  8.8655E-06
   10     1025   3.9981E+00  2.0041E+00   5.0714E-01  -9.8191E-01     9.7656E-04  1.4000E-04  2.2174E-06
   11     2049   3.9990E+00  2.0045E+00   5.0360E-01  -9.9206E-01     4.8828E-04  2.7800E-04  5.5449E-07
   12     4097   3.9995E+00  2.0047E+00   5.0090E-01  -9.9982E-01     2.4414E-04  5.5500E-04  1.3864E-07
   13     8193   4.0004E+00  2.0050E+00   5.0226E-01  -9.9589E-01     1.2207E-04  1.1050E-03  3.4657E-08
   14    16385   3.9926E+00  2.0022E+00   5.0023E-01  -1.0018E+00     6.1035E-05  2.2090E-03  8.6803E-09
   15    32769   4.1033E+00  2.0417E+00   4.9763E-01  -1.0093E+00     3.0518E-05  4.4390E-03  2.1154E-09
   16    65537   3.3954E+00  1.7678E+00   4.5538E-01  -1.1376E+00     1.5259E-05  9.7480E-03  6.2304E-10
   17   131073   6.0395E-01 -7.2927E-01   4.5194E-01  -1.1486E+00     7.6294E-06  2.1569E-02  1.0316E-09
   18   262145   7.1720E-01 -4.8071E-01   4.7850E-01  -1.0660E+00     3.8147E-06  4.5076E-02  1.4384E-09
   19   524289   7.9064E-01 -3.3973E-01   4.9784E-01  -1.0087E+00     1.9073E-06  9.0543E-02  1.8192E-09
   20  1048577   4.7141E-01 -1.0876E+00   4.9370E-01  -1.0207E+00     9.5367E-07  1.8339E-01  3.8591E-09

Working in double precision, we see continued, validated error decline through N = 8193. The eigenvalues (and condition number of the matrix) have not changed from the single precision case above. So for N = 8193 (h = 1.22E-4), the propagated-error factor from the marginally conditioned matrix problem, 4/h^2, is roughly 2.7E+8 (270,000,000); this factor multiplies the double precision round-off error and combines with the discretization error of the underlying grid approximation.
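Repeating the crude crossover model from the single precision discussion with the double precision unit round-off ε = 2^{-53} (again with the illustrative assumption d = 1) predicts breakdown near level 13, in good agreement with the validated decline through N = 8193 (level 13) and the shading from level 14 on:

```python
EPS_DOUBLE = 2.0 ** -53   # IEEE double-precision unit round-off

def error_terms(level, eps, d=1.0):
    """Crude model: (discretization error d*h^2, propagated round-off eps*4/h^2)
    with h = 1/2**level; d = 1 is an illustrative assumption."""
    h = 1.0 / 2**level
    return d * h**2, eps * 4.0 / h**2

# First level at which modeled round-off overtakes discretization error.
crossover = next(lev for lev in range(4, 31)
                 if error_terms(lev, EPS_DOUBLE)[1] > error_terms(lev, EPS_DOUBLE)[0])
```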