Computable Performance Metrics

Module 2: Computing Error and Work Estimators

Module Author

Kris Stewart

Computer Science Department

San Diego State University

http://www.stewart.cs.sdsu.edu/

W.M. Keck Foundation

This file: ComputablePerformanceMetrics2.htm

Overview:

Many of the calculations performed these days benefit from the methods of numerical analysis with their proven error analysis.  An excellent text is Numerical Methods and Software [1] providing a careful balance between algorithms and their usefulness on modern computers, the development of the error properties of these algorithms and the impact of the implementation of the algorithm in a computer language, FORTRAN, in a portable manner so that the text, and its codes, can be applied to the wide variety of computing systems available now.  The current module will augment this treatment by pursuing computationally measurable performance metrics, such as accuracy and cost that can be computed during the solution process.

Our metric for cost will be the CPU-time for the task.  Our metric for accuracy will be the norm of the error, the difference between the true solution and the computed solution; therefore we limit ourselves in this exposition to problems with known solutions.  The problem class to be explored is from the numerical solution of differential equations because we feel this links the mathematical theory predicted for the algorithm with an actual computable measure of performance nicely and is a problem class of interest to computational science.

We begin with the 1d solution of the boundary value problem

u’’(x) = f(x), u(0)=1, u(1)=1

with f(x) = -(2 π)2 sin(2π x), so that true solution is given by u(x) = sin (2π x).  We are following an excellent lesson prepared by Professor Matt W. Choptuik, Professor of Physics & Astronomy, University of British Columbia, Vancouver, BC, CANADA [2].  His bvp1d.f code was the basis for this module.  Slight alterations were made to suit the goals here.  With a uniform discretization, the standard finite difference approximation yields a tridiagonal linear system of equations, A u = b, which is solved using SGTSV from LAPACK [3] and later by DGTSV [4] for the double-precision solution.

Introduction to problem:

In this module, we demonstrate a way to compute performance statistics that are related to the mathematical predictions of the approximation formulas from numerical analysis.  We examine the error due to finite difference approximation of the derivative which is expected to be O (∆x2 ) for the uniform discretization ∆x = 1/N, where N = 2level for the sequence of grids, level = 4, 5, …, 20, therefore values of N = 16, 32, …, 1048578.  We also examine the work performance to solve the tridiagonal linear system of equations for this sequence of grid spacings.  The known number of arithmetic operations for applying the Gaussian Elimination algorithm to the matrix with this known sparsity structure is O(N) for the adds and multiplies.  We recognize that we must consider the cost for fetching the values from memory or from cache, and storing the result value back to memory.  These costs will depend on the compiler and the computer architecture where the program is run.  These costs are difficult to predict in advance.  Therefore, we focus on the known Op Count for arithmetic and seek validation from the run-time CPU time.  We hope to discover:  For what value of N will the arithmetic costs come to dominate the other issues of memory access?

Statement of the problem:

Relating performance of Gaussian Elimination for tridiagonal linear systems using known algorithm operation counts for work with the measurable run-time data of an implemented code is explored.  We expect the linear work for a problem of size N to be represented as

Work (N) = c N,

for some constant c.  When we double the number of points, we expect the work to double, i.e. take twice as long to compute.

We expect quadratic accuracy proportional to the square of our grid spacing, ∆x, or inversely proportional to the square of the problem size, given by

Accuracy (N) = d ∆x2 = d 1/N2

for some constant d.   When we double the number of points, we expect the error to decrease by a factor of 4, i.e. multiply the error by a factor of

We use the standard finite difference approximation of

u’’(xi) ~ (ui+1 – 2*ui + ui-1)/∆x2,

xi = i*∆x,

ui+1 ~ u(xi+1)

and examine a sequence of grid-sized levels,

N = 2level, level = 4, 5, … 20.

Our overall expectations of performance may be described using the ratios of the successively finer grids.  For example, in the case for N=16 and N=32

Accuracy(16) ~ d (1/16)2

and

Accuracy(32) ~ d (1/32)2  ~  2 -2 Accuracy(1/16)

Computing their ratio and taking the Log2 yields the exponent of change, one (1) for the linear measure of work and two (2) for the quadratic measure of accuracy.

Log2 ([Accuracy(coarse grid) / Accuracy(fine grid)]  ) ~ 2

Log2 ([Work(coarse grid) / Work(fine grid)]  ) ~ -1

Our computational task is to attempt to verify the quadratic relationship for accuracy using the known true solution to compute the error and discover the value of N where this occurs.  In the same program, we expect to verify the linear work to solve a tridiagonal system of linear equations.

Solution methodology:

We use equally spaced grid points on [0,1]

This solution was computed by Dr. Matt Choptuik and is available from his home page http://laplace.physics.ubc.ca/People/matt/Teaching/04Fall/PHYS410/Doc/linsys/ex2/soln8.ps.  Dr. Choptuik’s class lesson was the starting point for the code used to demonstrate this module’s computable metrics of accuracy and work.  We used GhostScript [5] to view the postscript files from Dr. Choptuik’s class page and SnagIt [6] to capture the image as PGN (portable network graphic) above to include in this document.

Dr. Choptuik’s class page also presents the plot of the error for varying grids of  level = 5, 6 and 7 reproduced in this module.

http://laplace.physics.ubc.ca/People/matt/Teaching/04Fall/PHYS410/Doc/linsys/ex2/err567.ps

Assessment of how well the model solves the problem:

We are indebted to Professor Matthew Choptuik, Dept of Physics and Astronomy, Univ. of British Columbia, 6224 Agricultural Road, Vancouver BC, V6T 1Z1 CANADA, for the excellent lecture notes and projects he shared on his class web page [2]

Empirical data:

Table 1 presents the results from running kbvp1d-single.f with the single precision LAPACK subroutine SGTSV.  Also on Table 1 are the known eigenvalues for the matrix A, found online through a Google search described in the references [7].  Notice the shaded rows of the table (in the Word document) when level ≥ 8.  These are untrustworthy due to the ill-conditioning of the matrix A and the natural propagation of round-off error in the computation.  Round-off error was covered in Modules 0 and 1 of this set of notes.

Conceptual questions to examine student’s understanding:

Did you notice a consistency in the behavior of the work performance metric in Table 1?

Student should notice that the Work Ratio column shows a smooth value of ½ = 2-1, which would be expected for a linear algorithm doing Gaussian Elimination on a tridiagonal system of equations.  The computed exponent is a smooth value of -1.0 validating that Work is inversely proposal to problem size.

Note:  student should begin to feel comfortable examining these computed values and understanding that log2(1/2) = -1, which is our expect work performance metric for a linear algorithm.

Did you notice a consistency in the behavior of the accuracy performance metric, the Error Ratio, in Table 1?

Students should notice that the first 4 rows, do provide a smooth result.  The number of grid points is doubled, N  → N*2 or ∆x → , and we expect the accuracy to be make smaller by a factor of  or a factor of 4 for the quadratic grid approximation used for the finite difference.  Taking the ratio of the coarse-grid approximate to the fine-grid approximate, we compute the Error Ratio of roughly 4 with the exponent of 2, for the first four rows of Table 1 (level = 4, 5, 6, 7).

After the fourth row of Table 1, we have a grid-spacing of h = ∆x = 0.0078125.  We are reaching a limitation that comes from the ill-conditioning of the linear system of equations.  For the current problem with N = 4, our matrix is

A =

The matrix A has a large magnitude eigenvalue

λ1 = -2 { 1 + cos (π/N) }  ~ -4

and small magnitude eigenvalue

λN = -2 { 1 + cos ({N-1)π/N }  ~ -h2

The explicit formula for these eigenvalues was found online [7].  The eigenvalues determine the condition number of a symmetric matrix such as this [8].  Please see the discussion under Tables 1 and 2.

Problems and projects:

Students can be asked to use a double precision version of the same experiment.  The results are presented in Table 2 running kbvp1d-double.f with the double precision LAPACK code, DGTSV.  These results are shaded in rows 11 to 17 indicating their untrustworthiness, as we saw in the single precision results for smaller values of N.

Solutions:

We take the problem cited above and use the FORTRAN code kbvp1d-double.f, included in the zipfile with this module, to approximate the solution [2] for this module.  This code sets up the boundary value problem above, set in a way so that the true solution is known and computed.  We modified the original code to examine a sequence of grids, for levels 4 up to 12, giving N = 2level + 1, i.e. N = 17, 33, 65, 129, 257, 513, 1025, 2049, 4097, 8193, 16385, 32769, 65537, 131073, 262145, 524289, and 1048577.  The sample code uses the LAPACK tridiagonal system solver, DGTSV.  The code uses the UNIX utility, dtime [9].  The zipfile also contains files to compile and link with the high performance Sun subroutine library.

Suggestions to instructors for using the module:

It is tempting to try to cover too much in a module such as this.  It is a challenge to distinguish for the students how simplifications have been made without compromising the actual solution process.  In the past, the author used this module to motivate students to discover the relationship between the mathematical theory that states Gaussian Elimination can be performed in O(N3) operations for a dense matrix and the observed performance in computing the numerical solution.  Students are often familiar with Gaussian Elimination for dense systems and can believe the Op Count through an easy lecture, but it is a mistake to try to treat the example of the  boundary value problem as dense due it has known sparsity structure.  We have observed that students will continue to use computational techniques learned in class for problem solving later in their career.  They do not remember your caveat “This is only for a class example, you would never use dense Gaussian Elimination for this problem in practice”.  We now feel that it is better to use the special case of tridiagonal systems with their linear operation count for cost.

We also recommend referring to u(x) so that later one can ease into the 2d problem as u(x,y).  Our students just are not comfortable with y(x) and then later u(x,y(x)) and its dependencies, so a careful choice of notation at the start will pay off later.

Searching for a suitable problem to generate Au=b, it was easy to select the diffusion equation with its computational molecule to approximate

2u(xi) / ∂x2 ~

at xi = x0 + i*∆x  for the uniform spacing ∆x dividing the domain into N equal intervals.

An interesting source of a wide variety of computational molecules is available from of Abramowitz & Stegun, now available on-line[10].  We present formula 25.3.23 here.

It was an interesting find to see how many problems from physics are approximated by this same linear system.  A nice lecture is presented in [11] for the Two-Mass Vibrating System by Professor Erik Cheever.

Consideration should be put into the prerequisites that you might want to require for a course, or a module such as this.  In the university setting, the content of courses is always dynamic and there is a time-dependency on the correspondence between a syllabus, a catalog description and what is actually covered in a course.   For this module, we use Taylor series in one dimension to understand the discretization error for the approximation of u’’(x), which is typically covered in the first semester of calculus.  We faculty are comfortable with two dimensional Taylor expansions, which is implied by the computational module above, but our students may not have that comfort level yet.  This can be addressed in the lecture when the module is presented.

We note an interesting curricular investigation for any instructor to pursue to better understand the situation at your local institution.  If you require a course as a prerequisite for your own course, be clear in your own mind about precisely which topic from that prerequisite course you really expect students to understand.  Then, take the time to speak with the faculty member at your school who currently teaches the course.  Ask them how they cover this topic.  Do they motivate the topic?  With what example?  Think to yourself how well this treatment prepares students for your course.  You can then try to convince the other instructor teaching the prerequisite of your course to see the value of orienting their course to be aligned more smoothly with your needs.  If this is not successful, you will want to spend time in your course to develop the concept more before pursuing this example.

Instructors will want to have a good relationship with the System Adminstrator of the computer used for instruction.  This module will benefit from using the largest possible vector sizes to have the full range of computed metric values to make a convincing argument.  At SDSU, we needed an increment in our Storage Limit for student accounts.

References:

[1] Numerical Methods and Software, D. Kahaner, C. Moler and S. Nash, Prentice-Hall, 1989.  Although this text is somewhat dated, using the FORTRAN routines from LinPACK, the treatment of linear systems, the condition number of the matrix and the behavior of its error is very well written and appropriate for this module.

[2] Professor M.W. Choptuik, Physics 410: Computational Physics Course Notes, Fall 2004, Linear Systems.  http://laplace.physics.ubc.ca/People/matt/Teaching/04Fall/PHYS410/Notes.html

[3] LAPACK Users’ Guide, 3rd Edition, SIAM Press, 1999. LAPACK = Linear Algebra PACKage Users’ Guide and links to the F77 codes is available on line from NETLIB

http://www.netlib.org/lapack/lug/

[4] DGTSV tridiagonal system solver from LAPACK.  Available online from

http://www.netlib.org/lapack/double/dgtsv.f

[5] Ghostgum Software Pty Ltd, provides a shareware code to view postscript files that we have used for a while.  Many other providers are possible.  http://www.ghostgum.com.au/

[6] SnagIt Screen Capture software from TechSmith. http://www.techsmith.com/snagit.asp

[7] Eigenvalues of discretization matrix available from Math Forum discussion on line at

This was an interesting Google search.  I remembered from my first numerical analysis course, using the excellent text, Numerical Analysis: A Second Course [8] by Ortega, that the closed form formula for these eigenvalues was presented.  My copy of the text was 40 miles away in my SDSU office and to “save the environment”, I did not want to drive out there.   A Google search for "closed form" eigenvalue "-1 2 -1" yielded the above URL on the second page of results.

“Paul Abbott   Re: why are the polynomials in this series all solvable by radicals?

Posted: Jul 18, 2006 12:11 AM

"Lee Davidson" <lmd@meta5.com> wrote:

> Consider n+1 identical masses m in linear series with n identical
> springs of spring constant k connecting adjacent masses. Taking the
> lengths of the springs as n variables, you can set up n independent
> second order linear differential equations and analyze the solutions.
> What you want are eigenvalues, to determine characteristic vibrational
> frequencies. After some simplification and change of variable, you get
> polynomials according to the following recurrence relation:
> p(1) = x
> p(2) = x^2-1
> p(n) = x * p(n-1) - p(n-2) for n >= 3

Determining the closed-form eigenvalues is straightforward; employing
the recurrence relations for the Chebyshev polynomials, one obtains
p(n) = ChebyshevU[n, x/2] == Sin[(n + 1) ArcCos[x/2]]/Sqrt[1 - x^2/4]
Alternatively, the eigenvalues of the matrix above are
-2 (1 + Cos[πk / (n+1)]), k = 1, 2, ..., n

Paul Abbott Phone: 61 8 6488 2734
School of Physics, M013 Fax: +61 8 6488 1014
The University of Western Australia (CRICOS Provider No 00126G)
AUSTRALIA http://physics.uwa.edu.au/~paul

[8] J.M. Ortega, Numerical Analysis: A Second Course, SIAM Press, 1990. This is an unabridged republication of the work first published by Academic Press, 1972.  http://202.38.126.65/mirror/www.siam.org/catalog/mcc06/ortega3.htm

[9] Sun Performance Library User’s Guide

[10] Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables, M. Abramowitz and I.E. Stegun, Editors, National Bureau of Standards, 1955.  Now available on line from http://www.convertit.com/Go/ConvertIt/Reference/AMS55.ASP?Res=150&Page=-1

The actual formula cited above can be found directly at

http://www.convertit.com/Go/ConvertIt/Reference/AMS55.ASP?Res=150&Page=884&Submit=Go

Taking you directly to page 844 of this fascinating compendium.

[11] Erik Cheever, Professor and Chair, Department of Engineering, Swarthmore College

http://www.swarthmore.edu/NatSci/echeeve1/Ref/MtrxVibe/EigApp/EigVib.html
We backed up within the URL to find the author was Professor Cheever

http://www.swarthmore.edu/NatSci/echeeve1/

Table 1: bvp1d using SGTSV from LAPACK, Single Precision, Tridiagonal System Solver

(shaded values  [level≥8] look to be contaminated by ill-conditioning and round-off)

Error Ratio      Error Exponent        Work Ratio     Work Exponent        Grid Space H     Time        True Error

level:  4 N:     17                                                                                                                                  h: 6.2500E-02 T: 7.3000E-05 Err: 8.8837E-03

level:  5 N:     33          Erat: 3.9681E+00 log2: 1.9933E+00 Trat: 6.6364E-01 log2:-5.9297E-01    h: 3.1250E-02 T: 1.1000E-04 Err: 2.2387E-03

level:  6 N:     65          Erat: 3.9930E+00 log2: 2.0023E+00 Trat: 4.9327E-01 log2:-1.0220E+00   h: 1.5625E-02 T: 2.2300E-04 Err: 5.6067E-04

level:  7 N:    129         Erat: 4.0013E+00 log2: 2.0053E+00 Trat: 4.8690E-01 log2:-1.0408E+00   h: 7.8125E-03 T: 4.5800E-04 Err: 1.4012E-04

level:  8 N:    257         Erat: 1.9158E+00 log2: 9.4022E-01   Trat: 4.9945E-01 log2:-1.0040E+00   h: 3.9062E-03 T: 9.1700E-04 Err: 7.3140E-05

level:  9 N:    513         Erat: 4.1756E+00 log2: 2.0670E+00  Trat: 4.9541E-01 log2:-1.0158E+00   h: 1.9531E-03 T: 1.8510E-03 Err: 1.7516E-05

level: 10 N:   1025       Erat: 7.2343E-02 log2:-3.7982E+00   Trat: 4.9879E-01 log2:-1.0059E+00   h: 9.7656E-04 T: 3.7110E-03 Err: 2.4212E-04

level: 11 N:   2049       Erat: 2.5629E-01 log2:-1.9689E+00   Trat: 5.0244E-01 log2:-9.9539E-01    h: 4.8828E-04 T: 7.3860E-03 Err: 9.4472E-04

level: 12 N:   4097       Erat: 1.7760E-01 log2:-2.4993E+00   Trat: 5.0214E-01 log2:-9.9624E-01    h: 2.4414E-04 T: 1.4709E-02 Err: 5.3192E-03

level: 13 N:   8193       Erat: 4.2768E-02 log2:-4.5584E+00   Trat: 4.9699E-01 log2:-1.0111E+00   h: 1.2207E-04 T: 2.9596E-02 Err: 1.2437E-01

level: 14 N:  16385      Erat: 3.9379E+00 log2: 1.9822E+00 Trat: 1.8995E+00 log2: 9.2786E-01   h: 6.1035E-05 T: 1.5581E-02 Err: 3.1584E-02

level: 15 N:  32769      Erat: 7.9818E-03 log2:-6.9860E+00   Trat: 4.9861E-01 log2:-1.0065E+00   h: 3.0518E-05 T: 3.1249E-02 Err: 3.9569E+00

level: 16 N:  65537      Erat: 9.9904E-01 log2:-1.3962E-03    Trat: 4.9888E-01 log2:-1.0057E+00   h: 1.5259E-05 T: 6.2638E-02 Err: 3.9608E+00

level: 17 N: 131073     Erat: 1.0020E+00 log2: 2.8946E-03   Trat: 4.9883E-01 log2:-1.0058E+00   h: 7.6294E-06 T: 1.2557E-01 Err: 3.9529E+00

level: 18 N: 262145     Erat: 1.0027E+00 log2: 3.9606E-03   Trat: 5.0204E-01 log2:-9.9654E-01    h: 3.8147E-06 T: 2.5012E-01 Err: 3.9420E+00

level: 19 N: 524289     Erat: 1.0006E+00 log2: 9.2084E-04   Trat: 5.0447E-01 log2:-9.8954E-01    h: 1.9073E-06 T: 4.9581E-01 Err: 3.9395E+00

level: 20 N:1048577    Erat: 9.9286E-01 log2:-1.0367E-02    Trat: 4.8770E-01 log2:-1.0385E+00   h: 9.5367E-07 T: 1.0166E+00 Err: 3.9679E+00

λ1 = -2 ( 1 + cos )  ≈ -4 (for N sufficiently large so that π/N is roughly 0 and cos   ≈ 1)

λN = -2 ( 1 + cos )  ≈ -2 ( 1 + cos π ) = -2 { 1 + (-1 + + …) ) ≈  - h2

Yielding a condition number of roughly, which for above N=129 (h = .0078125), gives an estimate of the propagated round-off error beyond the single-precision machine round-off.

Table 2: bvp1d using DGTSV from LAPACK, Double Precision, Tridiagonal System Solver

(shaded values  [level≥14] look to be contaminated by ill-conditioning and round-off)

Error Ratio          Error Exponent   Work Ratio        Work Exponent       Grid Space H     Time        True Error

level:  4 N:     17                                                                                                                                  h: 6.2500E-02 T: 8.3000E-04 Err: 8.8841E-03

level:  5 N:     33          Erat: 3.9637E+00 log2: 1.9916E+00 Trat: 1.1857E+02 log2: 6.9063E+00 h: 3.1250E-02 T: 7.0000E-06 Err: 2.2414E-03

level:  6 N:     65          Erat: 3.9753E+00 log2: 1.9959E+00 Trat: 6.3636E-01 log2:-6.5366E-01    h: 1.5625E-02 T: 1.1000E-05 Err: 5.6383E-04

level:  7 N:    129         Erat: 3.9860E+00 log2: 1.9998E+00 Trat: 5.5000E-01 log2:-8.6459E-01    h: 7.8125E-03 T: 2.0000E-05 Err: 1.4145E-04

level:  8 N:    257         Erat: 3.9926E+00 log2: 2.0022E+00 Trat: 5.4054E-01 log2:-8.8968E-01    h: 3.9062E-03 T: 3.7000E-05 Err: 3.5428E-05

level:  9 N:    513         Erat: 3.9962E+00 log2: 2.0035E+00 Trat: 5.2113E-01 log2:-9.4257E-01    h: 1.9531E-03 T: 7.1000E-05 Err: 8.8655E-06

level: 10 N:   1025       Erat: 3.9981E+00 log2: 2.0041E+00 Trat: 5.0714E-01 log2:-9.8191E-01    h: 9.7656E-04 T: 1.4000E-04 Err: 2.2174E-06

level: 11 N:   2049       Erat: 3.9990E+00 log2: 2.0045E+00 Trat: 5.0360E-01 log2:-9.9206E-01    h: 4.8828E-04 T: 2.7800E-04 Err: 5.5449E-07

level: 12 N:   4097       Erat: 3.9995E+00 log2: 2.0047E+00 Trat: 5.0090E-01 log2:-9.9982E-01    h: 2.4414E-04 T: 5.5500E-04 Err: 1.3864E-07

level: 13 N:   8193       Erat: 4.0004E+00 log2: 2.0050E+00 Trat: 5.0226E-01 log2:-9.9589E-01    h: 1.2207E-04 T: 1.1050E-03 Err: 3.4657E-08

level: 14 N:  16385      Erat: 3.9926E+00 log2: 2.0022E+00 Trat: 5.0023E-01 log2:-1.0018E+00   h: 6.1035E-05 T: 2.2090E-03 Err: 8.6803E-09

level: 15 N:  32769      Erat: 4.1033E+00 log2: 2.0417E+00 Trat: 4.9763E-01 log2:-1.0093E+00   h: 3.0518E-05 T: 4.4390E-03 Err: 2.1154E-09

level: 16 N:  65537      Erat: 3.3954E+00 log2: 1.7678E+00 Trat: 4.5538E-01 log2:-1.1376E+00   h: 1.5259E-05 T: 9.7480E-03 Err: 6.2304E-10

level: 17 N: 131073     Erat: 6.0395E-01 log2:-7.2927E-01    Trat: 4.5194E-01 log2:-1.1486E+00   h: 7.6294E-06 T: 2.1569E-02 Err: 1.0316E-09

level: 18 N: 262145     Erat: 7.1720E-01 log2:-4.8071E-01    Trat: 4.7850E-01 log2:-1.0660E+00   h: 3.8147E-06 T: 4.5076E-02 Err: 1.4384E-09

level: 19 N: 524289     Erat: 7.9064E-01 log2:-3.3973E-01    Trat: 4.9784E-01 log2:-1.0087E+00   h: 1.9073E-06 T: 9.0543E-02 Err: 1.8192E-09

level: 20 N:1048577    Erat: 4.7141E-01 log2:-1.0876E+00   Trat: 4.9370E-01 log2:-1.0207E+00   h: 9.5367E-07 T: 1.8339E-01 Err: 3.8591E-09

Working in double precision, we see continued, validated error decline through  N = 8193.  The eigenvalues (and condition number of the matrix) have not changed from the single precision case above.  So for N=8193, (h=1.22e-4), we have the propagated error from the marginally conditioned matrix problem, 4/h^2, is roughly a factor of  4.e+8, (400000000), multiplied by double precision round-off error and by the discretization error of the underlying grid approximation.