CS 575 Supercomputing - Lecture Outline
Chapter 7: Eliminating Clutter

15October03

Dr. Kris Stewart (stewart@rohan.sdsu.edu)
San Diego State University

This URL is http://www.stewart.cs.sdsu.edu/cs575/lecs/ch7.html

Chapter 7: Eliminating Clutter

Things that contribute to overhead

Things that restrict compiler flexibility

Quoting our text (p. 128) "It's not a mistake that some of the same items appear in both lists."

Chapter 6 covered tools for runtime profiling of the behavior of the user source code. Chapter 7 points out user coding to avoid since it can defeat the possible optimizations a compiler would do for you.

Tools to help look at structure

f90 -Xlist f_diffusion.f sgeco.f sgesl.f blas.f
produces a listing

How do we obtain a similar view of your code when using C?
cflow, cscope, dbx, gdb have been suggested.

Your suggestions?


How do we tie together the compiler (user source code) and the assembly language (the computer code)?
Use the -S option for cc or f90 to obtain the Sun Sparc Assembler Language, RISC language, corresponding to the translation of the user source code.

Some examples of user source code and the possible impact on producing optimal run-time performance by complicating the compiler job to optimize the source code.

Subroutine Calls

Consider
Do I=1,N
	A(I) = A(I) + B(I) * C
ENDDO
or the nested call
DO I=1,N
	CALL MADD (A(I), B(I), C)
ENDDO
SUBROUTINE MADD (A, B, C)
A = A + B*C
RETURN
END
This is an example of a lot of subroutine overhead to accomplish a simple task.

Another example is the use of COMMON blocks in Fortran, typically used for global values.

COMMON /USELESS/ K
DO K=1,1000
	IF (K .EQ. 1) CALL AUX
ENDDO

p. 131 text "Remember, if the function or subroutine does a reasonable amount of work, procedure call overhead isn't going to matter very much."

Macros

Macros are used in C and Fortran and involve pattern matching.
#define average (x,y) ( (x+y)/2 )
main()
{
	float q=100, p=50;
	float a;
	a = average (p,q);
	printf ("%f\n",a);
}
The C preprocessor cpp expands all #define statement inline (as patterns).
a = average (p,q);    is replaced by    a = ((p+q)/2);
A more problematic example:
#define multiply (a,b) (a*b)     invoked as   

c = multiply (x+t,y+v)
                                 results in
x+t*y+v                          
                                 not the intention
Oh for a missing set of parenthesis.

Procedure inlining

Some compilers are capable of inlining subroutine and function declarations - providing the modular code (programmer's prefer) with efficiency that a compiler may be able to provide. The Cray T90 provides extensive capabilities for this. Rohan's F90 man page refers to inlining (and in the acknowledgements at the very end states that Sun's F90 compiler is "derived from Cray CF90(tm), a product of Cray Resarch, Inc."

Branches

Keep to an absolute minimum.

Branches within Loops

A dusty deck example:
PARAMETER (SMALL = 1.0E-20)
DO I=1,N
	IF (ABS(A(I)) .GE. SMALL) THEN
		B(I) = B(I) + A(I) * C
	ENDIF
ENDDO
In the olden days, the cost of a floating point multiply was so extensive that in a lengthy loop (large N), you would pay the cost of the comparison and absolute value to avoid it.

Loop Invariant Conditionals

DO I=1,K
	IF (N .EQ. 0) THEN
		A(I) = A(I) + B(I) * C
	ELSE
		A(I) = 0.
	ENDIF
ENDDO 
invariant N will not change in the loop, therefore the result of the test on N (setting the value of A) won't change. Recast the loop by making the text outside the loop and replacing the loop body twice for each case:
IF (N .EQ. 0) THEN
	DO I=1,K
		A(I) = A(I) + B(I)*C
	ENDDO
ELSE
	DO I=1,K
		A(I) = 0
	ENDDO
ENDIF 
Much easier for the compiler to pipeline since the IF-statement is outside the loop.

Loop Index Dependent Conditionals

DO I=1,N
   DO J=1,N
      IF (J .LT. I)
         A(J,I) = A(J,I) + B(J,I)*C
      ELSE
         A(J,I) = 0.0
      ENDIF
   ENDDO
ENDDO 

These nested loops easily predict when J < I, so having a logical test is unneeded. Better to write code as:

DO I=1,N
   DO J=1,I-1
         A(J,I) = A(J,I) + B(J,I)*C
   ENDDO
   DO J=I,N
         A(J,I) = 0.0
   ENDDO
ENDDO

Independent Loop Conditionals

Dependent Loop Conditionals

Reductions

Conditionals that Transfer Control

Other Clutter

Doing your own Common Subexpression Elimination

Handling Array Elements in Loops

Return to CS 575 Homepage