This URL is stewart.sdsu.edu/super92.html
04June2022

Supercomputing & Undergraduate Education at the San Diego Supercomputer Center
Poster Presentation at Supercomputing '92
Minneapolis, Minnesota; November 16-20, 1992

Dr. Kris Stewart
Associate Professor, Dept. of Mathematical Sciences
Senior Fellow, San Diego Supercomputer Center
San Diego State University, San Diego, CA 92182-0314
stewart@sdsu.edu
Dec. 14, 1992 (update)

SUE, Supercomputing and Undergraduate Education, was funded primarily by the Division of Advanced Scientific Computing at the National Science Foundation. Additional funding was provided by The Cray Research Foundation.

A) Background on the grant
The San Diego Supercomputer Center (SDSC) has taken a proactive role in disseminating information to make supercomputing accessible to a much wider audience. In particular, SDSC has targeted instructors at undergraduate institutions to introduce their students to supercomputers and their use. As part of this effort, SDSC established Supercomputing and Undergraduate Education (SUE). This program enhances the supercomputing expertise of faculty and helps them incorporate supercomputing topics into their curricula and departmental majors.
A major component of this program is a one-week residential summer workshop at SDSC for faculty from primarily undergraduate institutions throughout the U.S. Lecture materials from both the 1991 and 1992 workshops can be obtained via anonymous FTP over the Internet (see handout for details).
Another component of this program is an annual course taught at San Diego State University (SDSU), in which undergraduate students learn about supercomputers and their use. This course is described in the SDSU catalog as follows:

 CS 575 - Supercomputing for the Sciences
 Interdisciplinary course intended for all science and engineering
 majors.
 o Advanced computing techniques developed for supercomputers
 o Overview of architecture, software tools, scientific
   computing and communications
 o Hands-on experience using supercomputers
 
 Prereq.: Extensive programming background in Fortran or C.
 
This course has been taught twice.

B) Major Themes of the Program
SUE's 1991 and 1992 faculty workshops at SDSC and the undergraduate curricula at SDSU focused on the following topics in supercomputing.
1 Interdependence of Computer Science & Scientific Experts
The workshop faculty represented two groups: those interested in learning discipline-specific applications packages and those interested in using software tools to facilitate programming. To accommodate both groups, we presented an overview of available resources (applications packages and program optimization techniques) and encouraged faculty to seek further information independently. We identified the technical people they could contact for further information on their particular interests.
Similarly, the required programming background for the course reinforced the traditional separation between computer science and science/engineering students by encouraging the former to take the course. In both cases, the participants interacted well as they realized the benefits of working with others with different strengths.
2 CRAY architecture
The course used the text Computer Architecture: A Quantitative Approach, by Hennessy and Patterson (published by Morgan Kaufmann). The sections we used from this text proved invaluable.
We tried to gain an appreciation for the sources of the Cray Y-MP's power and understand the subtleties of its design as they impact the way a programmer should approach a problem. Therefore, we carefully avoided many of the sections of the text that did not directly relate to the Cray Y-MP. (The course notes available via anonymous FTP can provide a guide for other instructors interested in using this text for a similar purpose.) This text was supplemented with documents by Cray Research, Inc.; particularly the following titles:
TR-OPT (Rev. D) cf77 & scc Features and Optimization. An excellent, Cray-specific training report covering the Fortran (cf77) and C (scc) compilers. Provides code examples, diagrams, and explanations of crucial vectorization topics (and the conditions that inhibit the compiler), memory organization, performance tools, common optimization techniques, and much more.
TR-YSAAP Cray Y-MP System Architecture for Applications Programmers. Covers material at the assembler level in more detail. Works well with the models developed in Hennessy and Patterson (unfortunately, this document is no longer available).
3 Architecture of parallel supercomputers (Intel iPSC/860 and nCUBE 2)
This was not covered in the undergraduate course at SDSU, but was covered at the faculty workshop by consultants from Intel and nCUBE, who gave introductory lectures on the parallel architectures.
4 Communications
Both the workshop and the course covered the resources available through the Internet, including:

     a.   Accessing sources of information
          (news groups, anonymous FTP):
               nnsc.nsf.net (NSF information site)
               oak.oakland.edu (simtel20 mirror)
               sumex-aim.stanford.edu (info-mac)
               ftp.cs.titech.ac.jp
                    (Dr. David Kahaner's Japan Bulletins)

     b.   Communicating with peers (via e-mail)

     c.   Using FTP to transfer programs between machines (a
               crude look at heterogeneous computing)

5 Accounting
At many universities, accounts issued to students in computer courses are limited only by the amount of available file space. Therefore, the concept of monitoring CPU usage can be new. The students in the SDSU class were given programming projects on a mainframe at SDSU and were introduced to the concept of accounting using crude UNIX timing tools (e.g., dtime) before they were moved to the Cray.
When they ran programs on the Cray, students and faculty had a fixed amount of CPU time to work with. Students were reminded that they must complete their course projects without exceeding their CPU allocations. The more sophisticated timing and resource monitoring tools were used to ensure that the student projects did not use up more time than they were allocated for the semester.
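The bookkeeping idea can be sketched in a few lines. This is only an illustration in Python, which was not a language used in the course; the names `cpu_seconds`, `remaining`, and `ALLOCATION_SECONDS`, and the budget value itself, are hypothetical. On the Cray the actual accounting was done by the system's own tools.

```python
import time

ALLOCATION_SECONDS = 3600.0  # hypothetical semester CPU budget, for illustration only


def cpu_seconds(func, *args):
    """Run func(*args) and return (result, CPU seconds used by this process)."""
    start = time.process_time()
    result = func(*args)
    return result, time.process_time() - start


def remaining(allocation, used_so_far):
    """CPU seconds left in the allocation (never negative)."""
    return max(0.0, allocation - used_so_far)


if __name__ == "__main__":
    _, used = cpu_seconds(sum, range(1_000_000))
    left = remaining(ALLOCATION_SECONDS, used)
    print(f"used {used:.4f} s of CPU, {left:.1f} s left")
```

Timing a small test case first, and only then committing a large run, is the habit the mainframe exercises were meant to build.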
6 Computer ethics/responsibility
Students in the course chose between two computer-related scenarios and wrote a one-page essay on a scenario, discussing the behaviors of the individuals involved. The essays were graded on a pass/fail basis (only essays that showed a lack of thought or effort "failed"). The goal, for the instructor, was to gain a better understanding of the students' attitudes concerning computer-related issues.
Dr. Dan Sulzbach, Executive Director of SDSC, also gave a week of lectures on computer ethics, which stimulated some very interesting class discussions.
Dr. Sulzbach began his talk by stating:
 
 "I am not a philosopher, not an ethicist, not an expert. I am a
 computer professional like you. I'm not here to preach because I
 have no license or authority to do that. I hope only to raise some
 issues related to computer ethics. I will undoubtedly ask more
 questions than I answer."
Computer accounts on the Cray Y-MP were distributed only after the week-long discussions on ethics.

7 Writing
Most of the students had no problem with the computer ethics essay, but they typically had little experience in writing up a science-oriented programming report. Therefore, we assigned a programming project that consisted of the following components:
a. Solve a problem on a campus mainframe and document the mainframe's performance.
b. Solve the same problem on the Cray.
c. Compute an enhancement of the problem on the Cray and document the Cray's performance.
Instructor feedback after reading the document from assignment (a) greatly enhanced the organization and content of the final document in (c).
See Section G, "SDSU course programming assignments," in this handout for more details on the exact assignments. Section G also gives the instructor's clarifications and directions to students to help them specify and document their projects.

C) SUE Workshop Overview
This one-week workshop overviewed the many resources available at the San Diego Supercomputer Center. In general, the mornings involved presentations by consultants from the following institutions:

     San Diego Supercomputer Center (SDSC)
     Cray Research, Incorporated (CRI)
     Intel Supercomputer Systems
     nCUBE Corporation
In the afternoons, software demonstrations or open labs were organized. Due to the diversity of backgrounds of the workshop participants, we covered a very wide variety of information at a basic level. Participants were encouraged to discuss the information in more detail with the consultants in the afternoon laboratory sessions. We also provided information on resources available over the Internet to enhance curriculum development in supercomputing.
Technical support was provided by

          Cray Research, Incorporated
          Intel Corporation
          nCUBE Corporation

D) SUE Workshop Schedule

                         MONDAY
 
 am  Welcome
     What is a Supercomputer? (Dan Sulzbach)
     Business (Kris Stewart)
          How to login, how to print
          DataTree file storage
          Effective use of resources/accounting
          (batch queues) at SDSC
     SDSC Cray User Guide
     CS 575: Supercomputing for the Sciences (Kris Stewart)
          Hennessy & Patterson text: Computer Architecture:
               A Quantitative Approach (Morgan Kaufmann)
          Responsibility and ethics
          Robbins & Robbins, Cray X-MP/Model 24 (Springer-Verlag)
     Cray TR-OPT examples
     Fortran/C exercises
     Dr. Lloyd Fosdick's HPSC Overview
 
 pm  Afternoon lab session
     Run Fosdick-HPSC ch. 7 examples.
     Run TR-OPT examples and use performance tools
     Run your own "student project" codes

                         TUESDAY
 
 Etan Scherzer (CRI instructor) covers TR-OPT
 
 TR-OPT is an extensive Cray software training workbook which is
 typically covered at a more leisurely pace over a one-week period.
 Etan will try to highlight crucial portions and will then be
 available for individual discussions this afternoon and tomorrow.
 
 Etan will be available all afternoon to answer any Cray-specific
 questions.

                         WEDNESDAY
 
 am  Cray Application Packages (SDSC Consultants)
          Biology (Jack Rogers)
          Chemistry (Jerry Greenberg)
          CFD (Rich Charles)
          Math (Bob Leary)
 
 pm  "Teaching Chemistry" (Rozeanne Steckler)
     "Teaching Advanced Graphics" (Michael Bailey)
     VisLab reserved: (AVS, Insight)
          Intro to VisLab and Workstations

                         THURSDAY
 
 am  Access to Parallel Machines at SDSC
     nCUBE Introduction and Examples (Chuck Niggley,
          Consultant, nCUBE)
     Intel iPSC/860 (Dancil Strickland, Regional Parallel
          Systems Engineer, Intel Corp.)
 
 pm  Scalable Version of Wave Equation (Carl Scarbnick, SDSC)
     VisLab reserved
          Work through examples on the parallel machines.
          nCUBE and Intel consultants will be available for
               assistance.

                         FRIDAY
 
 am  Discussions
     What do you see as the future of high-performance
          computing?
     Parallel vs. vector
     HPSC Curriculum, Dr. Lloyd Fosdick, U. Colorado,
          Boulder
     Computational Science, Dr. Geoffrey Fox, Syracuse
          University
     Parallel Computing, Dr. Chris Nevison, Colgate
          University
     Different Curricula Orientations
          Discipline-specific
          B.S. in Computational Chemistry or Computational Science?

E) SUE workshop materials via anonymous FTP
Most of the files from the SUE workshop can be accessed via anonymous FTP. To access them, FTP to the host rohan.sdsu.edu, then retrieve from the directories /pub/sdscinfo/SUE-notes, /pub/sdscinfo/SDSC-info-files or /pub/sdscinfo/Supercomputing-Course-Notes.
A short description of the individual files is contained at the end of this handout.

F) Lecture materials and readings
The sections from the following chapters of Computer Architecture: A Quantitative Approach (Hennessy & Patterson) were covered in the first nine weeks of lectures in the course. See the lecture notes available via anonymous FTP for more details on the sections covered.

Chap. 1  Fundamentals of Computer Design
Chap. 2  Performance and Cost
Chap. 3  Instruction Set Design: Alternatives and Principles
Chap. 4  Instruction Set Examples and Measurements of Use
Chap. 5  Basic Processor Implementation Techniques
Chap. 6  Pipelining
Chap. 7  Vector Processors
Chap. 8  Memory-Hierarchy
The goal was to understand the following advanced topics:

     Pipelined (segmented) functional units
     Chaining of functional units
     Memory bank conflicts
     Compiler capabilities

G) SDSU course programming assignments
As the lecture material was covered, students worked on their first two programming assignments. The goal was to develop a feeling for timings on the SDSU mainframe compared to accuracy of approximation schemes. Students were asked to assess how much it costs (measured in CPU time for now) to get a good answer (measured by true error). Since most students had little numerical analysis background, Dr. Stewart provided the original code for the "First Program" problem, and the students were instructed to insert the appropriate timing calls.

First Program 1991 Course

 Run and time a Fortran code that solves a two-point boundary value
 problem
               y''(t) + pi^2 y(t) = 0
          y(0) = 0                 y(1/2) = 1
 True solution: y(t) = sin (pi t)
 via finite differences.
 

The dimension of the approximation should be varied and you should time the separate pieces of the solution process. You should document the performance of the SDSU mainframe on this problem and discuss the sensitivity in accuracy and timing.
NOTE: This proved to be a somewhat confusing problem for students new to numerical analysis. There are too many sources of error. As the finite difference grid is refined, the approximation is more accurate. But the linear system that is solved becomes more poorly conditioned, thereby introducing errors.
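A minimal sketch of this experiment follows, in Python rather than the Fortran used in the course. It is not Dr. Stewart's original code: the function name `solve_bvp`, the choice of centered second differences, and the tridiagonal (Thomas) elimination are assumptions made for illustration. It times the setup and solve phases separately and reports the true error against sin(pi t).

```python
import math
import time


def solve_bvp(n):
    """Finite-difference solution of y'' + pi^2 y = 0, y(0) = 0, y(1/2) = 1.

    n is the number of interior grid points. Returns (max true error,
    CPU seconds for setup, CPU seconds for the solve).
    """
    t0 = time.process_time()
    h = 0.5 / (n + 1)
    # Row i of the tridiagonal system:
    #   y[i-1] + (pi^2 h^2 - 2) y[i] + y[i+1] = 0
    diag = [math.pi ** 2 * h * h - 2.0] * n
    rhs = [0.0] * n
    rhs[-1] = -1.0  # boundary value y(1/2) = 1 moved to the right-hand side
    t1 = time.process_time()

    # Thomas algorithm; both off-diagonals are all 1.
    for i in range(1, n):
        m = 1.0 / diag[i - 1]
        diag[i] -= m
        rhs[i] -= m * rhs[i - 1]
    y = [0.0] * n
    y[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        y[i] = (rhs[i] - y[i + 1]) / diag[i]
    t2 = time.process_time()

    err = max(abs(y[i] - math.sin(math.pi * (i + 1) * h)) for i in range(n))
    return err, t1 - t0, t2 - t1


if __name__ == "__main__":
    for n in (9, 99, 999, 9999):
        err, setup, solve = solve_bvp(n)
        print(f"n={n:5d}  error={err:.3e}  setup={setup:.4f}s  solve={solve:.4f}s")
```

Varying n shows the trade-off the assignment asks about: the cost grows with the grid size while the error shrinks, until rounding effects from the increasingly ill-conditioned system begin to matter.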
 First Program   1992 Course
 
 Consider the linear system A x = b, where the N by N matrix A is
 given by
 
 aij = 1/(i+j-1)    (the notorious Hilbert matrix)
 
 The right-hand side, b, will be chosen so that the true solution,
 x, will be all 1s. Therefore,
 
 bj = sum over i = 1,...,N of aij
 
 You should solve this linear system for various values of N and
 observe the error incurred (by computing true error, since we know
 the true solution should have x = 1s) and the performance (measured
 by the elapsed CPU time).
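The experiment can be sketched as follows. The handout does not say which solver the students used, so the Gaussian elimination with partial pivoting below is an assumption, as are the function name `hilbert_experiment` and the use of Python instead of Fortran.

```python
import time


def hilbert_experiment(n):
    """Solve H x = b for the n-by-n Hilbert matrix with true solution x = 1s.

    Returns (max true error, CPU seconds for the solve).
    """
    # a[i][j] = 1/(i+j-1) with 1-based indices, i.e. 1/(i+j+1) with 0-based.
    a = [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]
    b = [sum(row) for row in a]  # row sums, so the exact solution is all 1s

    start = time.process_time()
    # Gaussian elimination with partial pivoting.
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            m = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= m * a[k][c]
            b[r] -= m * b[k]
    # Back substitution.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(a[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (b[k] - s) / a[k][k]
    elapsed = time.process_time() - start

    return max(abs(xi - 1.0) for xi in x), elapsed


if __name__ == "__main__":
    for n in (3, 6, 9, 12):
        err, secs = hilbert_experiment(n)
        print(f"N={n:2d}  true error={err:.3e}  CPU={secs:.5f}s")
```

Because the only source of error here is the conditioning of the matrix, the growth of the true error with N is much easier to interpret than in the boundary value problem.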

Main Project
Science-Oriented Program to be run on SDSU Mainframe and subsequently on the CRAY
The main project was crucial to this course. Students were going to use the Cray to run this project, and the instructor did not want them to squander Cray resources before they became familiar with their problems. Students were allowed to pick from a selection of problems provided by the instructor, or they could solve a scientific problem from their particular backgrounds or work environments.
 Good sources for problem statements were:
 
 "Computing Applications to Differential Equations: Modelling in the
 Physical and Social Sciences," by J.M.A. Danby; Reston Publishing
 
 "Numerical Methods and Software" by Kahaner, Moler and Nash;
 Prentice-Hall Publishers.
 
 "Computational Physics" by Koonin and Meredith; Addison Wesley
 Publishing Company.

The MAIN PROJECT assignment:
a) Get your "science" program running on the SDSU mainframe.
b) Write a report describing your problem and your program's
   performance on the SDSU mainframe.
c) Get your program running on the Cray.
d) Extend your problem. For example, use a finer grid spacing
   or use more species in an interaction. (This will depend
   on your particular problem.)
e) Submit a final report on your Cray project.

Topics to be covered in your report:
a) Your write-up should have a self-contained statement of the problem. The reader should not have to read your code to find out what equations you are working with, or what the specific problem is that you are solving.
b) Give a complete reference to where the problem came from.
c) Define your measure of work so that comparisons can be made when you run on the Cray. You can't talk about "faster" or "better" without a specific measure of performance.
d) Discuss what conclusions are drawn from the problem itself. What is the "science" story revealed by the original problem? Why was this problem solved?
e) You should carefully organize the results. A summary of pertinent results for both the "science" of the problem and the "performance" of the program should be presented. Optionally, include an appendix for more detailed results.

H) SDSU course take-home exam

 Midterm Exam for the 1992 course
 
 This midterm focused on compiler terminology and related concepts
 from the Hennessy/Patterson text with the Cray document TR-OPT,
 including the following:
 
     Jamming Vectorizing Loops
     Separating Loops into Vectorizable and Nonvectorizable
          Iterations
     Linearizing Nested Loops
     Unrolling Loops (vertically and horizontally)
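
Two of these transformations can be illustrated at source level. The sketch below uses Python for brevity, although the course material worked with Fortran and C, and all function names are hypothetical. Jamming (also known as loop fusion) merges two loops over the same index range into one; unrolling by four handles four elements per trip, with a cleanup loop for the remainder.

```python
def separate_loops(a, b):
    """Two separate loops over the same index range."""
    n = len(a)
    s = [0.0] * n
    p = [0.0] * n
    for i in range(n):
        s[i] = a[i] + b[i]
    for i in range(n):
        p[i] = a[i] * b[i]
    return s, p


def jammed_loop(a, b):
    """The same work after jamming (fusing) the two loops into one."""
    n = len(a)
    s = [0.0] * n
    p = [0.0] * n
    for i in range(n):
        s[i] = a[i] + b[i]
        p[i] = a[i] * b[i]
    return s, p


def dot_unrolled(a, b):
    """Dot product with the loop unrolled by four, plus a cleanup loop."""
    n = len(a)
    total = 0.0
    i = 0
    while i + 4 <= n:
        total += (a[i] * b[i] + a[i + 1] * b[i + 1]
                  + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3])
        i += 4
    while i < n:  # leftover iterations when n is not a multiple of 4
        total += a[i] * b[i]
        i += 1
    return total
```

In both cases the results are unchanged; only the loop structure the compiler sees is different.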

 Midterm Exam from the 1991 Course
 
 The midterm from the 1991 course asked students to write and time
 DLXV assembly code (developed in great detail in the Hennessy and
 Patterson text with numerous examples) for the translation of
 Fortran to perform the matrix/vector multiply, x = Ab, in two
 different manners.
 
 Row-oriented manner we learn first:
 
      A_j1 b_1 + ... + A_jN b_N = x_j ,  j=1,...,N
 
 Column-oriented manner more suitable for vector processors:
 
      b_1 A_*1 + ... + b_N A_*N = x
 
 (A_*k denotes column k of A.)
 
 This is a demanding problem, but most students gain a deep
 understanding of the Cray's vector processor structure.
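
The two orderings can be compared in a short sketch. It is written in Python, not the DLXV assembly the exam required, and the function names are hypothetical. Both routines compute x = A b as in the column-oriented formula above; they differ only in which loop is innermost.

```python
def matvec_row(A, b):
    """Row-oriented: x[j] is the dot product of row j of A with b."""
    n = len(A)
    x = [0.0] * n
    for j in range(n):
        for k in range(len(b)):
            x[j] += A[j][k] * b[k]
    return x


def matvec_column(A, b):
    """Column-oriented: add b[k] times column k of A into x, one column at a time."""
    n = len(A)
    x = [0.0] * n
    for k in range(len(b)):
        for j in range(n):  # inner loop updates the whole vector x
            x[j] += b[k] * A[j][k]
    return x


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    b = [1.0, 1.0]
    print(matvec_row(A, b), matvec_column(A, b))
```

In the column-oriented version the inner loop is a scalar-times-vector-plus-vector update, which is the kind of operation a vector processor handles well; the row-oriented inner loop instead reduces to a single scalar.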

I) Other Educational Programs at SDSC
HPCC and K-8 Education - Jayne Keller, SDSC Education Coordinator (jaynek@sdsc.edu)
Integrating high-performance computing and communications (HPCC) into the curriculum of primary and secondary schools is critical to the development of the technicians, scientists, and engineers of the future. SDSC offers the following activities to address this need: Supercomputer Center field trip, HPCC half-day in-service workshop, the SDSC road show, and a technology checklist.
Computational Chemistry - Rozeanne Steckler, SDSC Manager, Applications R & D (steckler@sdsc.edu) SDSU Adjunct Professor, Chemistry
The SDSU course Chemistry 596: Chemistry on Supercomputers is designed as an overview of modern computational chemistry with an emphasis on learning to use the major chemistry software packages. This course is not designed as an introduction to theoretical chemistry, but rather a course to introduce experimental chemists to the computational tools available and how to use them in an informed manner. Many aspects of computational chemistry will be introduced with each topic presented in coordinated lectures and labs.
Computer Graphics - Michael J. Bailey, SDSC Manager of Scientific Visualization (mjb@sdsc.edu)
The UCSD course AMES 293 Advanced Computer Graphics for Engineers and Scientists is targeted towards students in engineering or science majors who are interested in applying advanced visualization techniques to solving scientific problems. It is not oriented towards any one major in particular, but is instead directed towards science in general. Students in this course will learn techniques that will allow them to develop and use scientific graphics programs effectively.
Research Experience for Undergraduates at the San Diego Supercomputer Center
Hassan Aref, SDSC Chief Scientist, and Rozeanne Steckler, SDSC Manager, Applications R & D (see above) SDSU Adjunct Professor, Chemistry
Students work on research projects in fields of interest within the disciplines that make up computational science. Supervisors are faculty at the student's home institution and SDSC staff. Included are workshops on high-performance computing and special lectures on such topics as parallel computing, graphics and scientific visualization, and numerical analysis. Some students, already engaged in computational research with a faculty member, select a topic within that project. Others, with an interest in a certain area, use the REU program to take the first steps. The objective is to give each student a taste of research in computational science, albeit within a condensed time frame. All students are given access to appropriate computer resources at SDSC.
CERFnet - Susan Estrada, Executive Director (estradas@sdsc.edu)
The California Education and Research Federation Network (CERFnet) provides a connection to the world via Internet giving access to hundreds of databases and over a million users worldwide. Its goal is to promote collaboration among scientists, engineers, and educators in commercial, government, and academic sectors.
CERFnet provides a 24-hour hotline, continuous network monitoring and management, an expert staff, and maintenance support. Begun with the support from the National Science Foundation, CERFnet is a project of General Atomics, a San Diego-based research and development company.
Reuben H. Fleet Space Theater & Science Center: Project Oasis - Joseph Deken, Senior Fellow, SDSC (dekenj@sdsc.edu) Senior Scientist, RHF
The Reuben H. Fleet Space Theater and Science Center is one of the most highly respected informal science education centers in the nation. Located in Balboa Park's cultural complex, it houses the world's first OMNIMAX theater, more than 60 "hands-on" science exhibits which encourage visitor participation, as well as a multimedia planetarium show.
In July of 1992, SDSC and the Reuben Fleet Center launched a formal collaboration called Project Oasis. The focus of this collaboration is twofold:
1 To develop interactive exhibits and educational programs about high-performance computing and communications for the general public.
2 To develop new technology for interactive exhibits and educational programs using leading edge computing and communications systems, especially computer networking and visualization.
As part of Project Oasis activities, two SDSC consultants gave lectures at the Reuben H. Fleet Space Theater & Science Center recently:
"Computer Visualization I: The Solar System" - by Dave Nadeau, Visualization Specialist at SDSC
"Computer Visualization II: The Antarctic Seafloor" - by Jim McLeod, Visualization Specialist at SDSC

                           Overview of SUE
                                  
       Kris Stewart, Assoc. Prof., SDSU (stewart@cs.sdsu.edu)
                 Senior Fellow, SDSC (619) 942-1012
                                  
This file (readme.SUE) presents a description of the files
distributed at the 1991 Supercomputing and Undergraduate Education
(SUE) Workshop.  These files are in /pub/sdscinfo/SUE-notes, available
via anonymous FTP from rohan.sdsu.edu.

     Access and use of documents statement        (accesuse.asc)
     Schedule of events for 1992 SUE workshop     (agenda.92)
     Brief overview of SDSC accounting and SUE resources
                                                  (accounti.sds)
     Obtaining Cray documents (cost)              (cray-man.cst)
     Overview of Dr. Lloyd Fosdick's HPSC program from U. Colorado
Boulder.  This is a terrific program in undergraduate High
Performance Scientific Computing.                 (hpsc-ks.rme)
     Intuitive introduction to ODEs, Euler's method, SAXPY and their
connection with vector operations. This fits well with Chapter 7 from
Hennessy and Patterson.                           (my-chap7.asc)
     Kay A. Robbins and Steven Robbins wrote a book that serves as a
good teaching tool for understanding the details of the Cray at the
assembler level. The Cray X-MP/Model 24: A Case Study in Pipelined
Architecture and Vector Processing was published by Springer-Verlag
in 1987.                                          (xmpsim1.asc)

1. Organization of CS 575 Supercomputing for the Sciences, taught at
San Diego State University, Spring 1991 and 1992, by Kris Stewart

     a. Outline of course                    (575ovrvw.s92)
     b. Initial handout to students          (575init.asc)
     c. List of student programming projects - most final reports and
source code are available as tar files       (575proj.asc)
     d. A road-map through the lecture notes and how they were used
in the CS 575 course                         (readme.575)

2. Actual course notes for CS 575 based on the text, Computer
Architecture: A Quantitative Approach by John Hennessy and David
Patterson, Morgan Kaufmann Publisher, 1990, coupled with handouts
from Cray Research Incorporated from two documents, TR-OPT and
TR-YSAAP.  The files associated with the Hennessy and Patterson text
are named ph-something.asc, those associated with Cray documents are
named cray-something.asc.  There are many files - see readme.575 in
the anonymous FTP directory /pub/sdscinfo/Supercomputing-Course-Notes
on rohan.sdsu.edu.

     Of particular interest are the files tr-opt3.asc and tr-opt7.asc.
Chapter 3 of the CRI document TR-OPT presents an overview of
vectorization terminology and examples.  This includes a Fortran code
which is useful for showing different types of loops and how the
compiler identifies them in the output listing.  This code should only
be compiled, not executed, since the arrays involved are never
initialized.  Chapter 7 of TR-OPT presents the fundamental
optimization techniques.
The file tr-opt7.asc contains Fortran code that should be executed
since timing statistics are collected to demonstrate the effects of a
programmer's source code on the Cray's performance and the ability of
the Fortran compiler to automatically optimize source code.

3. Computer Ethics and Responsibility Section. It was felt that
before students were given access to the Cray Y-MP it was essential
to have an explicit discussion of "ethics" and "responsibility"
coupled with a written essay assignment.

     a. Computer ethics assignment (four scenarios)
                                             (ethicasg.asc)

     b. Handout of the lecture given by Dan Sulzbach  (ethic-l2.asc)

4. Using the Cray - note students will have spent 6 weeks programming
on the SDSU mainframe in Unix prior to moving to the Cray.            
                                             (readme.3rd)
     a. Initial handout to students on Cray use (crayinit.s92)
        This handout also discusses the files:
               crayfopt.asc (examples of Fortran optimization)
               crayc-ex.asc (samples of C codes and techniques)
               my-optim.tar (sample tar file for students to
                              use to become familiar with tar)

     b. Man pages for Cray Fortran (cf77, fpp, fmp, cft77) compiling
          environment                        (crayacce.asc)

     c. Location of sample codes and how to create sample run of      
          Fortran to get listing, marked loops and diagnostics
               (cf77 -ZV -Wf"-emx")          (crayacce.asc)

     d. Man pages for cc and cl for C compiler with listing
                                             (crayacce.asc)

     e. As in c) above for C to get listing (cl) and diagnostics
               (cc -h report=vsi)            (crayacce.asc)

I recommend that instructors take extra time to explain reslist (a
relatively expensive command which gives students information on
their remaining resources), ja (an inexpensive Unix system call with
various parameters) and the NQS batch system.  You are charged double
for all interactive jobs on the Cray at SDSC.  Students usually are
not familiar with using Batch Queues, which can reduce the charges on
a job to 0.5 times actual use (therefore a four-fold decrease over
interactive use).  Students need to become experienced with these
queues to effectively use their finite amount of Cray time.
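
The arithmetic behind the four-fold figure can be written out as a small sketch. The factors 2.0 and 0.5 are the ones quoted above; the names `CHARGE_FACTOR` and `charged_seconds` are hypothetical, and actual charging at SDSC may have depended on the queue used.

```python
# Charge multipliers quoted in the text: interactive jobs cost double,
# batch jobs can cost half of the CPU time actually used.
CHARGE_FACTOR = {"interactive": 2.0, "batch": 0.5}


def charged_seconds(cpu_seconds, mode):
    """CPU seconds debited from an allocation for a job run in the given mode."""
    return cpu_seconds * CHARGE_FACTOR[mode]


if __name__ == "__main__":
    used = 100.0  # CPU seconds actually consumed by the job
    interactive = charged_seconds(used, "interactive")
    batch = charged_seconds(used, "batch")
    print(f"interactive: {interactive} s, batch: {batch} s, "
          f"ratio: {interactive / batch}")
```

A job that uses 100 CPU seconds is charged 200 seconds interactively but only 50 seconds in batch, a ratio of four.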

SDSC Documents (note new users of the SDSC Cray will receive the SDSC
User Guide. You can obtain additional copies of the User Guide
through the doc processor on the Y-MP. Type doc. The file you want is
usrguide.  This is a very long file, so I would not recommend getting
the whole thing.  The individual chapters of the User Guide are
available as separate files, e.g. ugoptim, ugtools, ugunicos, etc).
Other files available from the doc processor (and in the directory
/pub/sdscinfo/SDSC-info-files anonymous FTP from rohan.sdsu.edu):

     f. Introduction to UNICOS                    (unicos)

     g. EZFortran, EZC, EZDebug              (ezfortrn, ezc, ezdebug)

     h. EZStorage (DTI, data tree documentation)  (ezstorag)
        EZBatch (NQS, Network Queueing System)    (ezbatch)

     i. EZMath (math libraries available)         (ezmath)
        EZGraphics (graphics capabilities)        (ezgrphcs)

     j. EZShell (for those brave Unix hackers who want to customize
          their environment)                      (ezshell)
        EZTools (make, fmgen, fsplit, tar and cpio) (eztools)

     k. Optimization - an excellent document, oriented towards
          Fortran, on how to produce optimized Fortran source code on
          the Cray Y-MP                           (optimiz)

                  Overview of CS 575 Lecture Notes
                                  
       Kris Stewart, Assoc. Prof., SDSU (stewart@cs.sdsu.edu)
                 Senior Fellow, SDSC (619) 942-1012
                                  
     This file (readme.2nd) presents a detailed overview of the
actual lectures of the CS 575 course.  These files are in
/pub/sdscinfo/Supercomputing-Course-Notes anonymous FTP from
rohan.sdsu.edu

     The files fall into three classifications:

a) course information and additional examples from the instructor

b) those related directly to the Hennessy & Patterson text describing
which sections/topics/concepts were used from each chapter

c) xeroxed copies of pages from the Cray documents TR-OPT
                                  
                                  
                  Info and Examples from instructor

575init.asc    Initial handout given to students the first day of
               classes.

charac.asc     Handout given to students the second week as we
               discussed "What is a Supercomputer?".  These were
               notes taken from Dr. Dan Sulzbach's talk at the SDSC
               Summer Institute in 1990.

assignmt.asc   Describes the programming assignments students were
               asked to complete during the semester.

my-chap7.asc   I wrote this section to try to motivate the idea of
               vector registers and operations from the point of view
               of science. A major computation performed repeatedly
               in scientific computation involves solving ordinary
               differential equations (ODE). Presents the idea of an
               ODE as a vector system of equations and shows how
               Euler's method can be visualized as doing simple
               vector operations, a saxpy with scalar h and vectors
               y/current, f/current and y/next. Although students do
               not have a deep background in numerical analysis,
               this has been successful in relating the saxpy to
               scientific computation at an intuitive level.
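
The connection between Euler's method and the saxpy can be sketched as follows. This is an illustration in Python and is not the content of my-chap7.asc; the function names and the two example systems are assumptions chosen for the sketch.

```python
import math


def saxpy(a, x, y):
    """Return a*x + y elementwise for scalar a and equal-length vectors x, y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def euler(f, y0, t0, h, steps):
    """Advance the system y' = f(t, y) from (t0, y0) by `steps` steps of size h."""
    t, y = t0, list(y0)
    for _ in range(steps):
        y = saxpy(h, f(t, y), y)  # y_next = h * f_current + y_current
        t += h
    return y


def oscillator(t, y):
    """Example system y0' = y1, y1' = -y0; from [1, 0] the solution is (cos t, -sin t)."""
    return [y[1], -y[0]]


if __name__ == "__main__":
    y = euler(oscillator, [1.0, 0.0], 0.0, 0.001, 1000)
    print(y, [math.cos(1.0), -math.sin(1.0)])
```

Each Euler step is a single saxpy on the state vector, so the larger the ODE system, the longer the vectors the hardware gets to work on.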

optimiz.doc    This is an SDSC document available via the doc
               processor on the Cray Y-MP. This was FTPed from
               the Cray to the SDSU mainframe and students
               were encouraged to obtain their own copy.

crayfopt.asc   I coded up the examples from the SDSC Optimization
               document.  These codes are presented in the appendix
               of that document and available on the Cray.  This
               handout has details on accessing the codes, untarring
               the codes, and running the make utility to compile with
               various Fortran optimization flags on or off.

crayc-ex.asc   Handout on C codes and how to rewrite them to improve
               performance.  These are concepts discussed in TR-OPT
               which I coded up to give students examples of C codes
               and the Cray tools to analyze their performance. These
               were provided by Etan Scherzer, CRI.

xmpsim.asc     The text The Cray X-MP/Model 24, A Case Study in
               Pipelined Architecture and Vector Processing by
               Kay A. Robbins and Steven Robbins (Springer-Verlag) is
               an excellent source for explanations and examples of
               the performance of the Cray at the assembler level.
                                  
                                  
         Details on sections used from Hennessy & Patterson

ph-intro.asc   Introductory discussion of the aims and orientation of
the course and the use of the text Computer Architecture: A
Quantitative Approach by Hennessy and Patterson (Morgan-Kaufmann,
Pub., San Mateo, CA).

ph-chap1.asc   This chapter establishes a definition of performance,
presents Amdahl's Law, and defines and uses terms such as latency and
throughput.  I added Gantt charts to illustrate the assembly line
example to agree with later discussions of pipelined CPUs.
(the handout cray-arc.asc from TR-OPT fits in well here also)
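
A small worked example of Amdahl's Law, offered as a sketch rather
than as material from the handout. The function name and the sample
fractions are assumptions; the formula is the usual one, overall
speedup = 1 / ((1 - f) + f / s), where f is the fraction of the
original run time that is enhanced and s is the speedup of that part.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the run time is made faster."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


# Hypothetical case: the vectorizable part of a code runs 10 times faster.
for fraction in (0.5, 0.9, 0.99):
    print(fraction, round(amdahl_speedup(fraction, 10.0), 2))
```

Even with 90% of the run time vectorized, the overall speedup stays
well below 10, which is the point students need to absorb.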

ph-chap2.asc   We are only interested in performance in this chapter.
The treatment of cost is oriented toward someone designing a computer
architecture. In this course we are CONSUMERS of an architecture, not
its designers (though it is a very complex architecture we are
studying). This chapter discusses MIPS, MFLOPS and their limitations
as measures of performance.
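
One limitation of MFLOPS can be shown with a few lines of arithmetic.
This is only a sketch: the two programs and their operation counts
and run times are invented for illustration and are not measurements.

```python
def mflops(floating_point_ops, seconds):
    """Millions of floating-point operations completed per second."""
    return floating_point_ops / (seconds * 1.0e6)


# Hypothetical programs solving the same problem with different algorithms.
programs = {
    "A (more arithmetic)": {"flops": 2.0e8, "seconds": 2.0},
    "B (less arithmetic)": {"flops": 5.0e7, "seconds": 1.0},
}
for name, run in programs.items():
    rate = mflops(run["flops"], run["seconds"])
    print(name, rate, "MFLOPS,", run["seconds"], "s")
```

Program A posts the higher MFLOPS rate, yet program B finishes first
because it needs less arithmetic. Execution time is the measure that
matters to the user.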

ph-chap3.asc   We are really only interested in Section 3.7 - The Role
of High-Level Languages and Compilers. The Cray Y-MP is a
register-register machine. The section has a nice example using a
graph coloring algorithm for register allocation. Students should try
to develop an
intuitive idea of what the compiler really does for them. The
compiler is the major software tool that aids in effective use of the
supercomputer.  (the handout cray-com.asc fits well here)
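
To give a feel for register allocation by graph coloring, here is a
greedy sketch. It is a simplification and not the algorithm from the
text: the function name, the ordering heuristic and the small
interference graph are all assumptions made for illustration.

```python
def color_interference_graph(interference, num_registers):
    """Greedily assign a register number to each variable.

    interference maps a variable to the set of variables that are live
    at the same time. Returns a dict variable -> register number, with
    None for a variable that would have to be spilled to memory.
    """
    assignment = {}
    # Visit the most constrained variables first (a common heuristic).
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for var in order:
        taken = {assignment[n] for n in interference[var]
                 if assignment.get(n) is not None}
        free = [r for r in range(num_registers) if r not in taken]
        assignment[var] = free[0] if free else None
    return assignment


# Hypothetical graph: a, b, c are all live together; d overlaps only c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
registers = color_interference_graph(graph, 3)
print(registers)
```

Two variables joined by an edge never share a register. With fewer
registers than the graph needs, some variable gets None and must live
in memory, which is one reason register-register machines reward a
good compiler.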

ph-chap4.asc   This chapter presents and discusses instruction sets
for the VAX, IBM 360/370, Intel 8086 and DLX.  DLX is the
architecture the book develops for all its examples and is our only
interest in this chapter. Section 4.5 presents the DLX instruction
set and gives examples of its use. In Chapter 6, pipelining will be
presented using this instruction set. This is an important concept
for understanding the performance of the Cray. In Chapter 7, this
instruction set is extended to DLXV to implement vector processing
(and descriptions of chaining, vector stride, strip mining vector
loops, and more). So it is essential to work through lots of examples
using DLX so that students are comfortable with the material in
Chapter 6 and the extensions in Chapter 7.  (add cray-scl.asc here)

ph-chap5.asc   This chapter establishes the basic steps of execution:
instruction fetch, instruction decode and register fetch, execution
and effective address calculation, memory access and branch
completion, and write-back. Students probably don't realize that at
the machine (or assembler) level there are many tasks to be performed
to accomplish something as simple as

                    ADD R1,R2,R3

This will be important when Chapter 6 discusses pipelining. (add the
handout cray-con.asc here)
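
The link between the five steps and pipelining can be sketched with a
small cycle count. This assumes an ideal pipeline in which every step
takes one clock cycle and there are no hazards; the stage
abbreviations and function names are shorthand chosen for this
example.

```python
# Shorthand for the five steps of execution listed above.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def unpipelined_cycles(num_instructions):
    """Each instruction finishes all five steps before the next starts."""
    return num_instructions * len(STAGES)


def pipelined_cycles(num_instructions):
    """A new instruction enters the pipe every cycle once it is full."""
    if num_instructions == 0:
        return 0
    return len(STAGES) + num_instructions - 1


def gantt_rows(num_instructions):
    """Return one text row per instruction; each column is a clock cycle."""
    total = pipelined_cycles(num_instructions)
    rows = []
    for i in range(num_instructions):
        cells = ["."] * total
        for step, name in enumerate(STAGES):
            cells[i + step] = name  # instruction i does this step in cycle i+step
        rows.append(" ".join(cell.ljust(3) for cell in cells).rstrip())
    return rows


for row in gantt_rows(3):
    print(row)
print(unpipelined_cycles(3), "cycles unpipelined,",
      pipelined_cycles(3), "cycles pipelined")
```

The printed rows form a simple Gantt chart, the same picture used for
the assembly line example in Chapter 1.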

ph-over6.asc   Chapters 6 and 7 are the main goal of the course
lecture material. The concepts of:
     pipeline operation of segmented scalar computational
          units
     vector registers
     segmented vector functional units and chained pipelines
     vector stride, memory bank conflicts and stalls in the
          pipe
     strip mining vector loops
are covered in these two chapters. Our recurring example will be the
saxpy. We first examine Chapter 6 and its sections:

     6.1 What is pipelining?
     6.2 The basic pipeline for DLX
     6.3 Making the pipeline work
     6.4 The major hurdle of pipelining - hazards
          Introduces forwarding, which is crucial for
          understanding chaining of functional units
     6.6 Extending the DLX pipeline to handle multicycle
          operations
     6.8 Advanced pipelining - taking advantage of more
          instruction-level parallelism
     6.12 Historical perspective and references

Important homework problems: 6-11 to 6-19 on saxpy
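
Strip mining, one of the concepts listed above, can be sketched on
the saxpy itself. This is an illustration under assumptions, not code
from the text: the function name is made up, the short strip is
handled last for simplicity, and the inner generator stands in for
one sequence of vector instructions. The strip length of 64 matches
the 64-element vector registers of the Cray.

```python
MVL = 64  # maximum vector length: elements held by one vector register


def strip_mined_saxpy(a, x, y, mvl=MVL):
    """Compute a*x + y in strips of at most mvl elements.

    Returns the result vector and the length of each strip processed.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    result = []
    strip_lengths = []
    for start in range(0, len(x), mvl):
        stop = min(start + mvl, len(x))
        # Each strip stands in for one pass of vector instructions.
        result.extend(a * x[i] + y[i] for i in range(start, stop))
        strip_lengths.append(stop - start)
    return result, strip_lengths


x = [float(i) for i in range(200)]
y = [1.0] * 200
result, strips = strip_mined_saxpy(2.0, x, y)
print(strips)
```

A vector of 200 elements needs three full strips and one short strip
of 8 elements, so the loop overhead is paid four times, not 200.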

ph-over7.asc   This chapter presents the DLXV instruction set that is
used in the examples and homework problems (add handouts cray-ch3.asc
and tropt-3.asc here).

phover7b.asc   This covers section 7.6 of the text - Enhancing Vector
Performance.  This discusses the VERY important concept of chaining.
Important homework problems: 7-1 to 7-6, 7-8 to 7-10. Most of these
were worked in lecture so students would get lots of practice with
timings.  (add handout cray-ch7.asc and tropt-7.asc here)
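
The benefit of chaining can be seen in a simplified timing model.
This is a sketch under assumptions: memory traffic is ignored, each
functional unit delivers one result per clock after its start-up
time, and the start-up times of 7 and 6 clocks for a multiply
followed by a dependent add are assumed values for illustration.

```python
def unchained_time(vector_length, startups):
    """Clocks when each vector operation waits for the previous one."""
    return sum(startup + vector_length for startup in startups)


def chained_time(vector_length, startups):
    """Clocks when results flow directly into the next functional unit."""
    return sum(startups) + vector_length


# Assumed start-up times: 7 clocks for the multiply, 6 for the add.
startups = [7, 6]
print(unchained_time(64, startups), "clocks unchained")
print(chained_time(64, startups), "clocks chained")
```

With a vector length of 64 the chained sequence pays the start-up
times once and then streams the 64 elements, instead of streaming
them twice.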

phover7c.asc   The real guts of the classes - end of Chapter 7 of
Hennessy and Patterson

7.7 Putting it all together: Evaluating the performance of vector
processors
7.8 Fallacies and pitfalls - interesting reading
7.9 Concluding remarks - interesting reading
7.10 Historical perspective and references - VERY interesting reading

            Pages from Cray documents TR-OPT and TR-YSAAP

cray-arc.asc   These pages included a diagram of the hardware
components of the Cray Y-MP, a diagram of the 8 CPUs and their memory
and communications sections, a table of the 14 separate functional
units with the registers they use and their times (in clock periods),
and a block diagram of a single CPU.

cray-com.asc   These pages included a quick look at the Fortran
Compiling System (cf77) and the Standard C system. They presented
some standard ideas for scalar optimization. These are operations
performed by the compilers students are accustomed to using on the
scalar machines at SDSU (e.g. expression reordering, constant
folding, common subexpression elimination). This handout also
discusses the phases of the compilation process: source statement
processing, scalar optimization, vectorization, code generation.
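
Two of the scalar optimizations named above can be shown by hand.
This is only a sketch of what a compiler does automatically; the
function names and the arithmetic are invented for illustration and
do not come from the Cray documents.

```python
def original(a, b, c):
    """Code as a programmer might first write it."""
    seconds = 60 * 60 * 24      # constant expression written out in full
    x = (a + b) * c
    y = (a + b) / seconds       # (a + b) appears a second time
    return x, y


def hand_optimized(a, b, c):
    """The same computation after two scalar optimizations."""
    seconds = 86400             # constant folding: 60 * 60 * 24 precomputed
    t = a + b                   # common subexpression computed once
    x = t * c
    y = t / seconds
    return x, y


print(original(1.0, 2.0, 3.0))
print(hand_optimized(1.0, 2.0, 3.0))
```

Both versions return the same values; the second simply does less
work to get there.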

cray-scl.asc   These pages covered another look at the registers,
functional units and memory access paths for the Cray Y-MP.

cray-con.asc   These pages cover the control section of the CPU.

cray-ch3.asc   Students were given copies of Chapter 3 of TR-OPT.
This chapter presents a view of Vectorization on the Cray Y-MP.  It
has many examples and I found it very readable.

tropt-3.asc    I coded up a Fortran code from the examples in Chapter
3 and we went over this in class.  I also showed students the
tropt3.m file which was produced by cf77's Fortran vectorization
preprocessor (fpp). This gets students accustomed to the powerful
tools provided on the Cray to aid in optimizing codes.

cray-ch7.asc   Students were given copies of Chapter 7 of TR-OPT.
This chapter presents a view of common CPU optimization techniques.

tropt-7.asc    I coded up Fortran code from the examples in Chapter 7
and ran before and after timings for the standard (original) coding
and the optimized (modified) coding. Also, two settings were used
in compiling the code - one with all optimization turned off and one
with standard optimization.  So there are four sets of execution
times to be examined.  The base case is original/no-optimization, but
students can gain a feeling for how smart the cf77 compiling
environment is by comparing it with the execution times for
original/optimization-on.  The source code (tropt7.f) and tropt7.m,
which is the translated code from fpp, are both included.

                    Actually running on the Cray
                                  
       Kris Stewart, Assoc. Prof., SDSU (stewart@cs.sdsu.edu)
                 Senior Fellow, SDSC (619) 942-1012
                                  
This file (readme.3rd) provides detail pertinent to actually running
codes on the Cray Y-MP at SDSC.  These files are available via
anonymous FTP from rohan.sdsu.edu in
/pub/sdscinfo/Supercomputing-Course-Notes

Students received their own copy of the SDSC User's Guide. This is an
excellent document written by the SDSC consultants giving a thorough
introduction to the Cray, its programming tools, applications
packages and more.  Given human nature, the User's Guide's
thoroughness tends to dissuade users from just sitting down and
reading it cover to cover. I therefore e-mailed students a copy of

sdsc-gid.asc   A Road Map for the SDSC User's Guide highlighting
portions students should pay particular attention to.

crayacce.asc   This gives information which instructors could obtain
and then make available to their students on their home machines.
Includes especially useful man pages, the location on the Cray Y-MP
of some sample Fortran and C codes used in TR-OPT.  Also gives my
recommended calling options for cf77 and cc/cl for generating
informative examples and listings.

Students were given the location of crayfopt.asc and crayc-ex.asc on
our home machine since they are very long.  They contain detailed
information that students could examine to familiarize themselves
with the SDSC Cray tools without using up their own CPU allocation.

Since students will have a finite amount of CPU time on the Cray, I
tried to provide a lot of information that most users would obtain in
initial explorations on a new machine.  I wanted to avoid having 30
separate students repeat the same explorations. Therefore, students
were e-mailed a copy of

initcray.asc   a quick update on accessing the Cray from SDSU.
Highlights the Unix processors on the Cray and other SDSC specific
things - like DTI - that Cray users need to know about.

I think most people will find the initcray.asc file informative,
though parts of it are specific to accessing the SDSC Cray from
facilities at SDSU. It has hints on Cray usage. It presents a log of
an actual session, so that you can see how the logon process
proceeds.  The Cray news file is read. The doc processor is used to
find the file optimiz (which is a very informative SDSC Fortran
document).  This file is copied to a local file (since we see that it
is approximately 100 pages long) so that you can edit it, or download
it to another machine, or whatever.

initcray.asc has names for the off-site printers (at SDSU) that you
can have your output sent to. If you generate output on the Cray and
do not specify that it be routed to a machine at your particular
site, then the output will be sent to you via U.S. mail. This will
take a couple of days.