Dr. Kris Stewart
Associate Professor, Dept. of Mathematical Sciences
Senior Fellow, San Diego Supercomputer Center
San Diego State University
San Diego, CA 92182-0314
stewart@sdsu.edu
Dec. 14, 1992 (update)
SUE, Supercomputing and Undergraduate Education, was funded primarily by the Division of Advanced Scientific Computing at the National Science Foundation. Additional funding was provided by The Cray Research Foundation.
A) Background on the grant
The San Diego Supercomputer Center (SDSC) has taken a proactive role
in disseminating information to make supercomputing accessible to a
much wider audience. In particular, SDSC has targeted instructors at
undergraduate institutions to introduce their students to supercom-
puters and their use. As part of this effort, SDSC established Super-
computing and Undergraduate Education (SUE). This program enhances
the supercomputing expertise of faculty and helps them incorporate
supercomputing topics into their curricula and departmental majors.
A major component of this program is a one-week residential summer
workshop at SDSC for faculty from primarily undergraduate
institutions throughout the U.S. Lecture materials from both the 1991
and 1992 workshops can be obtained via anonymous FTP over the Internet
(see Section E of this handout for details).
Another component of this program is an annual course taught at San
Diego State University (SDSU), in which undergraduate students
learn about supercomputers and their use. This course is described
in the SDSU catalog as follows:
CS 575 - Supercomputing for the Sciences
Interdisciplinary course intended for all science and engineering
majors.
o Advanced computing techniques developed for supercomputers
o Overview of architecture, software tools, scientific computing and
communications
o Hands-on experience using supercomputers
Prereq.: Extensive programming background in Fortran or C.
This course has been taught twice.
B) Major Themes of the Program
SUE's 1991 and 1992 faculty workshops at SDSC and the undergraduate
curricula at SDSU focused on the following topics in supercomputing.
1 Interdependence of Computer Science & Scientific Experts
The workshop faculty represented two groups: those interested in
learning discipline-specific applications packages and those inter-
ested in using software tools to facilitate programming. To accommo-
date both groups, we presented an overview of available resources
(applications packages and program optimization techniques) and
encouraged faculty to seek further information independently. We
identified the technical people they could contact for further infor-
mation on their particular interests.
Similarly, the required programming background for the course rein-
forced the traditional separation between computer science and sci-
ence/engineering students by encouraging the former to take the
course. In both cases, the participants interacted well as they real-
ized the benefits of working with others with different strengths.
2 CRAY architecture
The course used the text Computer Architecture: A Quantitative
Approach, by Hennessy and Patterson (published by Morgan Kaufmann).
The sections we used from this text proved invaluable.
We tried to gain an appreciation for the sources of the Cray Y-MP's
power and understand the subtleties of its design as they impact the
way a programmer should approach a problem. Therefore, we carefully
avoided many of the sections of the text that did not directly relate
to the Cray Y-MP. (The course notes available via anonymous FTP can
provide a guide for other instructors interested in using this text
for a similar purpose.) This text was supplemented with documents by
Cray Research, Inc., particularly the following titles:
TR-OPT (Rev. D) cf77 & scc Features and Optimization. An excellent,
Cray-specific training report covering the Fortran (cf77) and C (scc)
compilers. Provides code examples, diagrams, and explanations of
crucial vectorization topics (and the conditions that inhibit the
compiler), memory organization, performance tools, common optimiza-
tion techniques, and much more.
TR-YSAAP Cray Y-MP System Architecture for Applications Programmers.
Covers material at the assembler level in more detail. Works well
with the models developed in Hennessy and Patterson. (Unfortunately,
this document is no longer available.)
3 Architecture of parallel supercomputers
(Intel iPSC/860 and nCUBE 2)
This was not covered in the undergraduate course at SDSU, but was
covered at the faculty workshop by consultants from Intel and nCUBE,
who gave introductory lectures on the parallel architectures.
4 Communications
Both the workshop and the course covered the resources
available through the Internet, including:
a. Accessing sources of information
(news groups, anonymous FTP):
nnsc.nsf.net (NSF information site)
oak.oakland.edu (simtel20 mirror)
sumex-aim.stanford.edu (info-mac)
ftp.cs.titech.ac.jp
(Dr. David Kahaner's Japan Bulletins)
b. Communicating with peers (via e-mail)
c. Using FTP to transfer programs between machines (a
crude look at heterogeneous computing)
Dr. Sulzbach began his talk by stating: "I am not a philosopher, not
an ethicist, not an expert. I am a computer professional like you.
I'm not here to preach because I have no license or authority to do
that. I hope only to raise some issues related to computer ethics. I
will undoubtedly ask more questions than I answer."
Computer accounts on the Cray Y-MP were distributed only after the
week-long discussions on ethics.
7 Writing
Most of the students had no problem with the computer ethics essay,
but they typically had little experience in writing up a science-
oriented programming report. Therefore, we assigned a programming
project that consisted of the following components:
a. Solve a problem on a campus mainframe and document the mainframe's
performance.
b. Solve the same problem on the Cray.
c. Compute an enhancement of the problem on the Cray and document the
Cray's performance.
Instructor feedback after reading the document from assignment (a)
greatly enhanced the organization and content of the final document
in (c).
See Section G, "SDSU course programming assignments," in this handout for
more details on the exact assignments. Section G also gives the
instructor's clarifications and directions to students to help them
specify and document their projects.
C) SUE Workshop Overview
This one-week workshop gave an overview of the many resources
available at the San Diego Supercomputer Center. In general, the
mornings involved presentations by consultants from the following
institutions:
San Diego Supercomputer Center (SDSC)
Cray Research, Incorporated (CRI)
Intel Supercomputer Systems
nCUBE Corporation
In the afternoons, software demonstrations or open labs were orga-
nized. Due to the diversity of backgrounds of the workshop partici-
pants, we covered a very wide variety of information at a basic
level. Participants were encouraged to discuss the information in
more detail with the consultants in the afternoon laboratory ses-
sions. We also provided information on resources available over the
Internet to enhance curriculum development in supercomputing.
Technical support was provided by
Cray Research, Incorporated
Intel Corporation
nCUBE Corporation
D) SUE Workshop Schedule
MONDAY
am Welcome
What is a Supercomputer? (Dan Sulzbach)
Business (Kris Stewart)
How to login, how to print
DataTree file storage
Effective use of resources/accounting
(batch queues) at SDSC
SDSC Cray User Guide
CS 575: Supercomputing for the Sciences (Kris Stewart)
Hennessy & Patterson text: Computer Architecture:
A Quantitative Approach (Morgan Kaufmann)
Responsibility and ethics
Robbins & Robbins, Cray X-MP/Model 24 (Springer-Verlag)
Cray TR-OPT examples
Fortran/C exercises
Dr. Lloyd Fosdick's HPSC Overview
pm Afternoon lab session
Run Fosdick-HPSC ch. 7 examples.
Run TR-OPT examples and use performance tools
Run your own "student project" codes
TUESDAY
Etan Scherzer (CRI instructor) covers TR-OPT
TR-OPT is an extensive Cray software training workbook which is
typically covered at a more leisurely pace over a one-week period.
Etan will try to highlight crucial portions and will then be avail-
able for individual discussions this afternoon and tomorrow.
Etan will be available all afternoon to answer any Cray-specific
questions.
WEDNESDAY
am Cray Application Packages (SDSC Consultants)
Biology (Jack Rogers)
Chemistry (Jerry Greenberg)
CFD (Rich Charles)
Math (Bob Leary)
pm "Teaching Chemistry" (Rozeanne Steckler)
"Teaching Advanced Graphics" (Michael Bailey)
VisLab reserved: (AVS, Insight)
Intro to VisLab and Workstations
THURSDAY
am Access to Parallel Machines at SDSC
nCUBE Introduction and Examples (Chuck Niggley,
Consultant, nCUBE)
Intel iPSC/860 (Dancil Strickland, Regional Parallel
Systems Engineer, Intel Corp.)
pm Scalable Version of Wave Equation (Carl Scarbnick, SDSC)
VisLab reserved
Work through examples on the parallel machines.
nCUBE and Intel consultants will be available for
assistance.
FRIDAY
am Discussions
What do you see as the future of high-performance
computing?
Parallel vs. vector
HPSC Curriculum, Dr. Lloyd Fosdick, U. Colorado,
Boulder
Computational Science, Dr. Geoffrey Fox, Syracuse
University
Parallel Computing, Dr. Chris Nevison, Colgate
University
Different Curricula Orientations
Discipline-specific
B.S. in Computational Chemistry or Computational Science?
E) SUE workshop materials via anonymous FTP
Most of the files from the SUE workshop can be accessed via anonymous
FTP. To access them, FTP to the host rohan.sdsu.edu, then retrieve
from the directories
/pub/sdscinfo/SUE-notes
/pub/sdscinfo/SDSC-info-files
/pub/sdscinfo/Supercomputing-Course-Notes
A short description of the individual files is contained at the end
of this handout.
F) Lecture materials and readings
The sections from the following chapters of Computer Architecture: A
Quantitative Approach (Hennessy & Patterson) were covered in the
first nine weeks of lectures in the course. See the lecture notes
available via anonymous FTP for more details on the sections covered.
Chap. 1 Fundamentals of Computer Design
Chap. 2 Performance and Cost
Chap. 3 Instruction Set Design: Alternatives and Principles
Chap. 4 Instruction Set Examples and Measurements of Use
Chap. 5 Basic Processor Implementation Techniques
Chap. 6 Pipelining
Chap. 7 Vector Processors
Chap. 8 Memory-Hierarchy Design
The goal was to understand the following advanced topics:
Pipelined (segmented) functional units
Chaining of functional units
Memory bank conflicts
Compiler capabilities
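As a rough illustration of memory bank conflicts, the sketch below
counts how many distinct banks a strided vector access touches when
memory is interleaved so that word address a lives in bank
(a mod n_banks). The bank count of 64 and the function names are
assumptions chosen for illustration; they are not taken from the
course notes or from Cray documentation.

```python
from math import gcd


def banks_touched(stride, n_banks=64):
    """Distinct banks visited by word addresses 0, stride, 2*stride, ...

    Assumes simple interleaving: address a is stored in bank a % n_banks.
    """
    return n_banks // gcd(stride, n_banks)


def banks_touched_by_enumeration(stride, n_banks=64, n_accesses=1024):
    """Same count, obtained by walking the address stream directly."""
    return len({(k * stride) % n_banks for k in range(n_accesses)})


if __name__ == "__main__":
    for stride in (1, 2, 8, 64, 65):
        print("stride", stride, "-> banks used:", banks_touched(stride))
```

Under this model a stride of 1 spreads accesses over every bank, while
a stride equal to the number of banks sends every access to the same
bank, which is the worst case for conflicts. In Fortran, which stores
arrays by column, walking along a row of an array whose leading
dimension is 64 produces exactly such a stride.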
G) SDSU course programming assignments
As the lecture material was covered, students worked on their first
two programming assignments. The goal was to develop a feeling for
the trade-off between run time on the SDSU mainframe and the accuracy
of approximation schemes. Students were asked to assess how much it
costs (measured in CPU time for now) to get a good answer (measured by
true error).
Since most students had little numerical analysis background, Dr.
Stewart provided the original code for the "First Program" problem,
and the students were instructed to insert the appropriate timing
calls.
First Program 1991 Course
Run and time a Fortran code that solves a two-point boundary value
problem
y''(t) + pi^2 y(t) = 0
y(0) = 0 y(1/2) = 1
True solution: y(t) = sin (pi t)
via finite differences.
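The original Fortran code for this problem is not reproduced in this
handout. The following is a minimal Python sketch of the same
experiment, assuming the standard second-order central difference for
y'' and a tridiagonal (Thomas) solve; the function names and the grid
sizes are hypothetical choices. It reports the true error against
sin(pi t) together with the CPU time.

```python
import math
import time


def solve_bvp(n):
    """Finite-difference solution of y'' + pi^2 y = 0, y(0)=0, y(1/2)=1.

    Uses n interior grid points; returns (t, y) at those points.
    """
    h = 0.5 / (n + 1)
    diag = 2.0 - (math.pi * h) ** 2
    # Equations: -y[i-1] + diag*y[i] - y[i+1] = 0, with the known
    # boundary values moved to the right-hand side.
    rhs = [0.0] * n
    rhs[-1] = 1.0  # contribution of y(1/2) = 1
    # Forward elimination (Thomas algorithm for a tridiagonal system).
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -1.0 / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    # Back substitution.
    y = [0.0] * n
    y[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        y[i] = d[i] - c[i] * y[i + 1]
    t = [(i + 1) * h for i in range(n)]
    return t, y


def max_error(n):
    """Largest true error on the grid, using y(t) = sin(pi t)."""
    t, y = solve_bvp(n)
    return max(abs(yi - math.sin(math.pi * ti)) for ti, yi in zip(t, y))


if __name__ == "__main__":
    for n in (9, 99, 999):
        start = time.process_time()
        err = max_error(n)
        cpu = time.process_time() - start
        print(f"n={n:5d}  max error={err:.3e}  CPU seconds={cpu:.4f}")
```

Halving the grid spacing should cut the error by roughly a factor of
four while the work grows in proportion to n, which is the
cost-versus-accuracy comparison the assignment asks students to make.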
First Program 1992 Course
Consider the linear system A x = b, where the N by N matrix A is
given by
a_ij = 1/(i+j-1) (the notorious Hilbert matrix)
The right-hand side, b, will be chosen so that the true solution, x,
will be all 1s. Therefore,
b_j = sum over i=1,...,N of a_ij
You should solve this linear system for various values of N and
observe the error incurred (by computing true error, since we know the
true solution should have x = 1s) and the performance (measured by the
elapsed CPU time).
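A minimal Python sketch of the Hilbert matrix experiment follows. It
assumes Gaussian elimination with partial pivoting as the solver,
which the assignment does not specify; the function names and the
sizes tried are likewise hypothetical.

```python
import time


def hilbert(n):
    """N-by-N Hilbert matrix, a_ij = 1/(i+j-1) with 1-based i and j."""
    return [[1.0 / (i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def solve(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(m[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (m[k][n] - s) / m[k][k]
    return x


def hilbert_experiment(n):
    """Return (max true error, CPU seconds) for the size-n problem."""
    a = hilbert(n)
    b = [sum(row) for row in a]  # row sums make the true solution all 1s
    start = time.process_time()
    x = solve(a, b)
    cpu = time.process_time() - start
    return max(abs(xi - 1.0) for xi in x), cpu


if __name__ == "__main__":
    for n in (2, 4, 6, 8, 10, 12):
        err, cpu = hilbert_experiment(n)
        print(f"N={n:3d}  max error={err:.3e}  CPU seconds={cpu:.5f}")
```

Because the Hilbert matrix becomes badly conditioned very quickly, the
true error should grow rapidly with N even though the CPU time stays
small, so here a longer run does not buy a better answer.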
Good sources for problem statements were:
"Computing Applications to Differential Equations: Modelling in the
Physical and Social Sciences," by J.M.A. Danby; Reston Publishing
"Numerical Methods and Software" by Kahaner, Moler and Nash;
Prentice-Hall Publishers.
"Computational Physics" by Koonin and Meredith; Addison Wesley
Publishing Company.
a) Get your "science" program running on the SDSU mainframe.
b) Write a report describing your problem and your program's
performance on the SDSU mainframe.
c) Get your program running on the Cray.
d) Extend your problem. For example, use a finer grid spacing or use
more species in an interaction. (This will depend on your particular
problem.)
e) Submit a final report on your Cray project.
H) SDSU course take-home exam
Midterm Exam for the 1992 course
This midterm focused on compiler terminology and related concepts
from the Hennessy/Patterson text and the Cray document TR-OPT,
including the following:
Jamming Vectorizing Loops
Separating Loops into Vectorizable and Nonvectorizable
Iterations
Linearizing Nested Loops
Unrolling Loops (vertically and horizontally)
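To make two of these terms concrete, the sketch below shows loop
jamming (fusing two loops over the same index range) and a plain
unroll-by-4 with a cleanup loop. It is written in Python as a stand-in
for the Fortran loops discussed in TR-OPT, the function names are
hypothetical, and it illustrates only the source-level transformation,
not TR-OPT's own examples or its vertical/horizontal distinction.

```python
def separate_loops(a, b):
    """Two passes over the same index range."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
    for i in range(n):
        d[i] = a[i] * b[i]
    return c, d


def jammed_loop(a, b):
    """Loop jamming: one pass does the work of both loops above."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
        d[i] = a[i] * b[i]
    return c, d


def unrolled_sum(a):
    """Sum with the loop body unrolled by 4, plus a cleanup loop."""
    n = len(a)
    total = 0.0
    main = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        total += a[i] + a[i + 1] + a[i + 2] + a[i + 3]
    for i in range(main, n):  # leftover iterations
        total += a[i]
    return total
```

Both transformations leave the results unchanged; what changes is the
amount of loop overhead per useful operation and how many times each
array is traversed.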
Midterm Exam from the 1991 Course
The midterm from the 1991 course asked students to write and time
DLXV assembly code (developed in great detail in the Hennessy and
Patterson text with numerous examples) for the translation of Fortran
to perform the matrix/vector multiply, x = Ab, in two different
manners.
Row-oriented manner we learn first:
a_j1 b_1 + a_j2 b_2 + ... + a_jN b_N = x_j , j=1,...,N
Column-oriented manner more suitable for vector processors:
b_1 A_*1 + ... + b_N A_*N = x
(A_*k denotes column k of A.)
This is a demanding problem, but most students gain a deep under-
standing of the Cray's vector processor structure.
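The two orderings can be sketched in Python as follows. This is only
an illustration of the loop structure, not the DLXV code from the exam;
it uses x for the result and b for the input vector, and the function
names are hypothetical.

```python
def matvec_row(a, b):
    """Row-oriented: x_j is the dot product of row j of A with b."""
    n = len(a)
    x = [0.0] * n
    for j in range(n):
        s = 0.0
        for i in range(n):
            s += a[j][i] * b[i]
        x[j] = s
    return x


def matvec_col(a, b):
    """Column-oriented: x accumulates b_i times column i of A."""
    n = len(a)
    x = [0.0] * n
    for i in range(n):
        bi = b[i]
        for j in range(n):  # vector update: x = x + bi * (column i of A)
            x[j] += bi * a[j][i]
    return x
```

The inner loop of the row version reduces a vector to a single number,
while the inner loop of the column version is a scalar-times-vector
plus vector update with no dependence between iterations. In Fortran,
where arrays are stored by column, that update also walks memory with
unit stride.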
I) Other Educational Programs at SDSC
HPCC and K-8 Education -
Jayne Keller, SDSC Education Coordinator (jaynek@sdsc.edu)
Integrating high-performance computing and communications (HPCC)
into the curriculum of primary and secondary schools is critical to
the development of the technicians, scientists, and engineers of
the future. SDSC offers the following activities to address this
need: Supercomputer Center field trip, HPCC half-day in-service
workshop, the SDSC road show, and a technology checklist.
Computational Chemistry -
Rozeanne Steckler, SDSC Manager, Applications R & D (steckler@sdsc.edu)
SDSU Adjunct Professor, Chemistry
The SDSU course Chemistry 596: Chemistry on Supercomputers is
designed as an overview of modern computational chemistry with an
emphasis on learning to use the major chemistry software packages.
This course is not designed as an introduction to theoretical chem-
istry, but rather a course to introduce experimental chemists to
the computational tools available and how to use them in an
informed manner. Many aspects of computational chemistry will be
introduced with each topic presented in coordinated lectures and
labs.
Computer Graphics -
Michael J. Bailey, SDSC Manager of Scientific Visualization (mjb@sdsc.edu)
The UCSD course AMES 293 Advanced Computer Graphics for Engineers
and Scientists is targeted towards students in engineering or
science majors who are interested in applying advanced visualiza-
tion techniques to solving scientific problems. It is not oriented
towards any one major in particular, but is instead directed
towards science in general. Students in this course will learn
techniques that will allow them to develop and use scientific
graphics programs effectively.
Research Experience for Undergraduates at the San Diego
Supercomputer Center
Hassan Aref, SDSC Chief Scientist, and
Rozeanne Steckler, SDSC Manager, Applications R & D (see above)
SDSU Adjunct Professor, Chemistry
Students work on research projects in fields of interest within the
disciplines that make up computational science. Supervisors are
faculty at the student's home institution and SDSC staff. Included
are workshops on high-performance computing and special lectures on
such topics as parallel computing, graphics and scientific visual-
ization, and numerical analysis. Some students, already engaged in
computational research with a faculty member, select a topic within
that project. Others, with an interest in a certain area, use the
REU program to take the first steps. The objective is to give each
student a taste of research in computational science, albeit within
a condensed time frame. All students are given access to
appropriate computer resources at SDSC.
CERFnet -
Susan Estrada, Executive Director (estradas@sdsc.edu)
The California Education and Research Federation Network (CERFnet)
provides a connection to the world via the Internet, giving access to
hundreds of databases and over a million users worldwide. Its goal
is to promote collaboration among scientists, engineers, and
educators in commercial, government, and academic sectors.
CERFnet provides a 24-hour hotline, continuous network monitoring
and management, an expert staff, and maintenance support. Begun
with support from the National Science Foundation, CERFnet is a
project of General Atomics, a San Diego-based research and
development company.
Reuben H. Fleet Space Theater & Science Center: Project Oasis -
Joseph Deken, Senior Fellow, SDSC (dekenj@sdsc.edu)
Senior Scientist, RHF
The Reuben H. Fleet Space Theater and Science Center is one of the
most highly respected informal science education centers in the
nation. Located in Balboa Park's cultural complex, it houses the
world's first OMNIMAX theater, more than 60 "hands-on" science
exhibits which encourage visitor participation, as well as a multi-
media planetarium show.
In July of 1992, SDSC and the Reuben Fleet Center launched a formal
collaboration called Project Oasis. The focus of this collaboration
is twofold:
1 To develop interactive exhibits and educational programs
about high-performance computing and communications for the general
public.
2 To develop new technology for interactive exhibits and educa-
tional programs using leading edge computing and communications
systems, especially computer networking and visualization.
As part of Project Oasis activities, two SDSC consultants gave lec-
tures at the Reuben H. Fleet Space Theater & Science Center
recently:
"Computer Visualization I: The Solar System" -
by Dave Nadeau, Visualization Specialist at SDSC
"Computer Visualization II: The Antarctic Seafloor" -
by Jim McLeod, Visualization Specialist at SDSC
Overview of SUE
Kris Stewart, Assoc. Prof., SDSU (stewart@cs.sdsu.edu)
Senior Fellow, SDSC (619) 942-1012
This file (readme.SUE) presents a description of the files
distributed at the 1991 Supercomputing and Undergraduate Education
(SUE) Workshop. These files are available via anonymous FTP from
rohan.sdsu.edu in /pub/sdscinfo/SUE-notes.
Access and use of documents statement (accesuse.asc)
Schedule of events for 1992 SUE workshop (agenda.92)
Brief overview of SDSC accounting and SUE resources
(accounti.sds)
Obtaining Cray documents (cost) (cray-man.cst)
Overview of Dr. Lloyd Fosdick's HPSC program from U. Colorado
Boulder. This is a terrific program in undergraduate High
Performance Scientific Computing. (hpsc-ks.rme)
Intuitive introduction to ODEs, Euler's method, SAXPY and their
connection with vector operations. This fits well with Chapter 7 from
Hennessy and Patterson. (my-chap7.asc)
Kay A. Robbins and Steven Robbins wrote a book that serves as a
good teaching tool for understanding the details of the Cray at the
assembler level. The Cray X-MP/Model 24: A Case Study in Pipelined
Architecture and Vector Processing was published by Springer-Verlag
in 1987. (xmpsim1.asc)
1. Organization of CS 575 Supercomputing for the Sciences, taught at
San Diego State University, Spring 1991 and 1992, by Kris Stewart
a. Outline of course (575ovrvw.s92)
b. Initial handout to students (575init.asc)
c. List of student programming projects - most final reports and
source code are available as tar files (575proj.asc)
d. A road-map through the lecture notes and how they were used
in the CS 575 course (readme.575)
2. Actual course notes for CS 575 based on the text, Computer
Architecture: A Quantitative Approach by John Hennessy and David
Patterson, Morgan Kaufmann Publishers, 1990, coupled with handouts
from Cray Research Incorporated from two documents, TR-OPT and
TR-YSAAP. The files associated with the Hennessy and Patterson text
are named ph-something.asc; those associated with Cray documents are
named cray-something.asc. There are many files - see readme.575 in the
anonymous FTP directory /pub/sdscinfo/Supercomputing-Course-Notes
on rohan.sdsu.edu.
Of particular interest are the files tr-opt3.asc and tr-opt7.asc.
Chapter 3 of the CRI document TR-OPT presents an overview of
vectorization terminology and examples. This includes a Fortran code
which is useful for showing different types of loops and how the
compiler identifies them in the output listing. This code should only
be compiled, not executed, since the arrays involved are never
initialized.
Chapter 7 of TR-OPT presents the fundamental optimization techniques.
The file tr-opt7.asc contains Fortran code that should be executed
since timing statistics are collected to demonstrate the effects of a
programmer's source code on the Cray's performance and the ability of
the Fortran compiler to automatically optimize source code.
3. Computer Ethics and Responsibility Section. It was felt that
before students were given access to the Cray Y-MP it was essential
to have an explicit discussion of "ethics" and "responsibility"
coupled with a written essay assignment.
a. Computer ethics assignment (four scenarios)
(ethicasg.asc)
b. Handout of the lecture given by Dan Sulzbach (ethic-l2.asc)
4. Using the Cray - note students will have spent 6 weeks programming
on the SDSU mainframe in Unix prior to moving to the Cray.
(readme.3rd)
a. Initial handout to students on Cray use (crayinit.s92)
This handout also discusses the files:
crayfopt.asc (examples of Fortran optimization)
crayc-ex.asc (samples of c codes and techniques)
my-optim.tar (sample tar file for students
to use to become familiar with tar)
b. Man pages for Cray Fortran (cf77, fpp, fmp, cft77) compiling
environment (crayacce.asc)
c. Location of sample codes and how to create sample run of
Fortran to get listing, marked loops and diagnostics
(cf77 -ZV -Wf"-emx") (crayacce.asc)
d. Man pages for cc and cl for C compiler with listing
(crayacce.asc)
e. As in c) above for C to get listing (cl) and diagnostics
(cc -h report=vsi) (crayacce.asc)
I recommend that instructors take extra time to explain reslist (a
relatively expensive command which gives students information on
their remaining resources), ja (an inexpensive Unix system call with
various parameters) and the NQS batch system. You are charged double
for all interactive jobs on the Cray at SDSC. Students usually are
not familiar with using batch queues, which can reduce the charges on
a job to 0.5 times actual use (a four-fold decrease relative to
interactive use). Students need to become experienced with these
queues to use their finite amount of Cray time effectively.
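The arithmetic behind the four-fold figure can be checked with a small
sketch. The two charge factors come from the paragraph above (with 0.5
being the best case it mentions for batch queues); the function name
and the 10-hour allocation are hypothetical.

```python
CHARGE_FACTOR = {
    "interactive": 2.0,  # interactive jobs are charged double
    "batch": 0.5,        # best case quoted for the batch queues
}


def charged_hours(cpu_hours, mode):
    """Hours charged against an allocation for a job of cpu_hours."""
    return cpu_hours * CHARGE_FACTOR[mode]


def usable_hours(allocation_hours, mode):
    """Actual CPU hours a given allocation buys in the given mode."""
    return allocation_hours / CHARGE_FACTOR[mode]


if __name__ == "__main__":
    allocation = 10.0  # hypothetical student allocation, in charged hours
    for mode in ("interactive", "batch"):
        print(mode, "->", usable_hours(allocation, mode), "CPU hours")
```

With these factors the same allocation covers 5 CPU hours of
interactive work but 20 CPU hours of batch work.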
SDSC Documents (note: new users of the SDSC Cray will receive the SDSC
User Guide. You can obtain additional copies of the User Guide
through the doc processor on the Y-MP. Type doc. The file you want is
usrguide. This is a very long file, so I would not recommend getting
the whole thing. The individual chapters of the User Guide are
available as separate files, e.g. ugoptim, ugtools, ugunicos, etc.).
Other files available from the doc processor (and via anonymous FTP
from rohan.sdsu.edu in the directory /pub/sdscinfo/SDSC-info-files):
f. Introduction to UNICOS (unicos)
g. EZFortran, EZC, EZDebug (ezfortrn, ezc, ezdebug)
h. EZStorage (DTI, data tree documentation) (ezstorag)
EZBatch (NQS, Network Queueing System) (ezbatch)
i. EZMath (math libraries available) (ezmath)
EZGraphics (graphics capabilities) (ezgrphcs)
j. EZShell (for those brave Unix hackers who want to customize
their environment) (ezshell)
EZTools (make, fmgen, fsplit, tar and cpio) (eztools)
k. Optimization - an excellent document, oriented towards
Fortran, on how to produce optimized Fortran source code on
the Cray Y-MP (optimiz)
Overview of CS 575 Lecture Notes
This file (readme.2nd) presents a detailed overview of the
actual lectures of the CS 575 course. These files are available via
anonymous FTP from rohan.sdsu.edu in
/pub/sdscinfo/Supercomputing-Course-Notes.
The files fall into three classifications:
a) course information and additional examples from the instructor
b) those related directly to the Hennessy & Patterson text describing
which sections/topics/concepts were used from each chapter
c) xeroxed copies of pages from the Cray documents TR-OPT
Info and Examples from instructor
575init.asc Initial handout given to students the first day of
classes.
charac.asc Handout given to students the second week as we
discussed "What is a Supercomputer?". These were
notes taken from Dr. Dan Sulzbach's talk at the SDSC
Summer Institute in 1990.
assignmt.asc Describes the programming assignments students were
asked to complete during the semester.
my-chap7.asc I wrote this section to try to motivate the idea of
vector registers and operations from the point of view
of science. A major computation performed repeatedly
in scientific computation involves solving ordinary
differential equations (ODE). Presents the idea of an
ODE as a vector system of equations and shows how
Euler's method can be visualized as doing simple
vector operations, a saxpy with scalar h and vectors
y/current, f/current and y/next. Although students do
not have a deep background in numerical analysis,
this has been successful in relating the saxpy to
scientific computation at an intuitive level.
optimiz.doc This is an SDSC document available via the doc
processor on the Cray Y-MP. This was FTPed from
the Cray to the SDSU mainframe and students
were encouraged to obtain their own copy.
crayfopt.asc I coded up the examples from the SDSC Optimization
document. These codes are presented in the appendix
of that document and are available on the Cray. This
handout has details on accessing the codes, untarring
the codes, and running the make utility to compile with
various Fortran optimization flags on or off.
crayc-ex.asc Handout on C codes and how to rewrite them to improve
performance. These are concepts discussed in TR-OPT
which I coded up to give students examples of C codes
and the Cray tools to analyze their performance. These
were provided by Etan Scherzer, CRI.
xmpsim.asc The text The Cray X-MP/Model 24, A Case Study in
Pipelined Architecture and Vector Processing by
Kay A. Robbins and Steven Robbins (Springer-Verlag) is
an excellent source for explanations and examples of
the performance of the Cray at the assembler level.
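The connection between Euler's method and the saxpy described under
my-chap7.asc above can be sketched in a few lines of Python. This is
not the content of my-chap7.asc itself; the function names and the
oscillator test system are hypothetical choices made for illustration.

```python
import math


def saxpy(a, x, y):
    """Return a*x + y elementwise for scalar a and vectors x, y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def euler(f, y0, t0, h, steps):
    """Advance the system y' = f(t, y) from (t0, y0) with fixed step h."""
    t, y = t0, list(y0)
    for _ in range(steps):
        # One Euler step is a single saxpy:
        # y_next = h * f_current + y_current
        y = saxpy(h, f(t, y), y)
        t += h
    return y


def oscillator(t, y):
    """Test system y0' = y1, y1' = -y0, with solution (cos t, -sin t)."""
    return [y[1], -y[0]]


if __name__ == "__main__":
    y = euler(oscillator, [1.0, 0.0], 0.0, 0.001, 1000)
    print("Euler:", y, " exact:", [math.cos(1.0), -math.sin(1.0)])
```

Each step touches every component of the state vector in the same way,
which is why the update maps naturally onto vector registers and a
chained multiply and add.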
Details on sections used from Hennessy & Patterson
ph-intro.asc Introductory discussion of the aims and orientation of
the course and the use of the text Computer Architecture: A
Quantitative Approach by Hennessy and Patterson (Morgan-Kaufmann,
Pub., San Mateo, CA)
ph-chap1.asc This chapter establishes a definition of performance,
presents Amdahl's Law, and defines and uses terms such as latency and
throughput. I added Gantt charts to illustrate the assembly line
example to agree with later discussions of pipelined CPUs.
(the handout cray-arc.asc from TR-OPT fits in well here also)
ph-chap2.asc We are only interested in performance in this chapter.
The treatment of cost is oriented toward someone designing a computer
architecture. In this course we are CONSUMERS of an architecture, not
its designers (of course, it's a very complex architecture we are
studying). This chapter discusses MIPS, MFLOPS and their limitations
as measures of performance.
ph-chap3.asc Really only interested in Section 3.7 - The Role of
High-Level Languages and Compilers. The Cray Y-MP is a register-
register machine. There is a nice example using a graph coloring
algorithm for register allocation. Students should try to develop an
intuitive idea of what the compiler really does for them. The
compiler is the major software tool that aids in effective use of the
supercomputer. (the handout cray-com.asc fits well here)
ph-chap4.asc This chapter presents and discusses instruction sets
for the VAX, IBM 360/370, Intel 8086 and DLX. DLX is the
architecture the book develops for all its examples and is our only
interest in this chapter. Section 4.5 presents the DLX instruction
set and gives examples of its use. In Chapter 6, pipelining will be
presented using this instruction set. This is an important concept
for understanding the performance of the Cray. In Chapter 7, this
instruction set is extended to DLXV to implement vector processing
(and descriptions of chaining, vector stride, strip mining vector
loops, and more). So it is essential to work through lots of examples
using DLX so that students are comfortable with the material in
Chapter 6 and the extensions in Chapter 7. (add cray-scl.asc here)
ph-chap5.asc This chapter establishes the basic steps of execution:
instruction fetch; instruction decode and register fetch; execution
and effective address calculation; memory access and branch
completion; and write-back. Students probably don't realize that at
the machine (or assembler) level there are many tasks to be performed
to accomplish something as simple as
ADD R1,R2,R3
This will be important when Chapter 6 discusses pipelining. (add the
handout cray-con.asc here)
ph-over6.asc Chapters 6 and 7 are the main goal of the course
lecture material. The concepts of:
pipeline operation of segmented scalar computational
units
vector registers
segmented vector functional units and chained pipelines
vector stride, memory bank conflicts and stalls in the
pipe
strip mining vector loops
are covered in these two chapters. Our recurring example will be the
saxpy. We first examine Chapter 6 and its sections:
6.1 What is pipelining?
6.2 The basic pipeline for DLX
6.3 Making the pipeline work
6.4 The major hurdle of pipelining - hazards
Introduces forwarding, which is crucial for
understanding chaining of functional units
6.6 Extending the DLX pipeline to handle multicycle
operations
6.8 Advanced pipelining - taking advantage of more
instruction-level parallelism
6.12 Historical perspective and references
Important homework problems: 6-11 to 6-19 on saxpy
ph-over7.asc This chapter presents the DLXV instruction set that is
used in the examples and homework problems (add handouts cray-ch3.asc
and tropt-3.asc here).
phover7b.asc This covers section 7.6 of the text - Enhancing Vector
Performance. This discusses the VERY important concept of chaining.
Important homework problems: 7-1 to 7-6, 7-8 to 7-10. Most of these
were worked in lecture so students would get lots of practice with
timings. (add handout cray-ch7.asc and tropt-7.asc here)
phover7c.asc The real guts of the classes - end of Chapter 7 of
Hennessy and Patterson
7.7 Putting it all together: Evaluating the performance of vector
processors
7.8 Fallacies and pitfalls - interesting reading
7.9 Concluding remarks - interesting reading
7.10 Historical perspective and references. VERY interesting reading
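Strip mining, listed above among the main concepts of Chapters 6 and
7, can be sketched with the recurring saxpy example. The maximum
vector length of 64 is an assumption standing in for the length of a
vector register, the function name is hypothetical, and the short
final strip is handled last here purely for simplicity.

```python
MVL = 64  # assumed maximum vector length (elements per vector register)


def saxpy_strip_mined(a, x, y, mvl=MVL):
    """Compute y = a*x + y in place, one strip of at most mvl elements
    at a time, and return the list of strip lengths used."""
    n = len(x)
    strip_lengths = []
    for low in range(0, n, mvl):
        high = min(low + mvl, n)
        # On a vector machine this inner loop would be issued as one
        # vector operation of length high - low.
        for i in range(low, high):
            y[i] = a * x[i] + y[i]
        strip_lengths.append(high - low)
    return strip_lengths


if __name__ == "__main__":
    x = [1.0] * 150
    y = [2.0] * 150
    print(saxpy_strip_mined(3.0, x, y))
```

A loop of 150 iterations is carried out as two full strips of 64
elements and one strip of 22, so the loop length never has to match
the register length.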
Pages from Cray documents TR-OPT and TR-YSAAP
cray-arc.asc These pages included a diagram of the hardware
components of the Cray Y-MP, a diagram of the 8 CPUs and their memory
and communications sections, a table of the 14 separate functional
units with the registers they use and their times (in clock periods),
and a block diagram of a single CPU.
cray-com.asc These pages included a quick look at the Fortran
Compiling System (cf77) and the Standard C system. This presented
some standard ideas for scalar optimization. These are operations
performed by the compilers students are accustomed to using on the
scalar machines at SDSU (e.g. expression reordering, constant
folding, common subexpression elimination). This handout also
discusses the phases of the compilation process: source statement
processing, scalar optimization, vectorization, code generation.
cray-scl.asc These pages covered another look at the registers,
functional units and memory access paths for the Cray Y-MP.
cray-con.asc These pages cover the control section of the CPU.
cray-ch3.asc Students were given copies of Chapter 3 of TR-OPT.
This chapter presents a view of Vectorization on the Cray Y-MP. It
has many examples and I found it very readable.
tropt-3.asc I coded up a Fortran code from the examples in Chapter
3 and we went over this in class. I also showed students the
tropt3.m file which was produced by cf77's Fortran vectorization
preprocessor (fpp). This gets students accustomed to the powerful
tools provided on the Cray to aid in optimizing codes.
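To give a flavor of what a vectorizing compiler looks for, here is a
generic sketch in C. It does not reproduce the examples in Chapter 3
of TR-OPT, the function names are hypothetical, and it assumes the
arrays do not overlap in memory.

```c
#include <stddef.h>

/* Vectorizable: each a[i] depends only on b[i] and c[i], so the
   iterations can be carried out in any order or all at once. */
void add_arrays(size_t n, double *a, const double *b, const double *c)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] = b[i] + c[i];
    }
}

/* Not directly vectorizable: a[i] needs a[i-1], which was computed
   by the previous iteration (a recurrence), so the iterations must
   run in order. */
void running_sum(size_t n, double *a, const double *b)
{
    for (size_t i = 1; i < n; ++i) {
        a[i] = a[i - 1] + b[i];
    }
}
```

A vectorizing preprocessor would typically translate the first loop
into vector operations and flag the dependence in the second.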
cray-ch7.asc Students were given copies of Chapter 7 of TR-OPT.
This chapter presents a view of common CPU optimization techniques.
tropt-7.asc I coded up Fortran code from the examples in Chapter 7
and ran before and after timings for the standard (original) coding
and the optimized (modified) coding. Also there were two settings used
in compiling the code - one with all optimization turned off and one
with standard optimization. So there are four sets of execution
times to be examined. The base case is original/no-optimization, but
students can gain a feeling for how smart the cf77 compiling
environment is by comparing it with the execution times for the
original coding with optimization on. The source code (tropt7.f) and
tropt7.m, which is the translated code from fpp, are both included.
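The before-and-after timing idea can be sketched as follows. This is
a generic C illustration, not the tropt7.f code: the choice of loop
interchange as the modification, and all function names, are
assumptions made for the example.

```c
#include <stddef.h>
#include <time.h>

/* Original coding: the inner loop walks down a column of a
   row-major n-by-n array, so successive accesses are n apart. */
double sum_original(size_t n, const double *m)
{
    double total = 0.0;
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < n; ++i) {
            total += m[i * n + j];
        }
    }
    return total;
}

/* Modified coding: the loops are interchanged so the inner loop
   touches consecutive memory locations (stride 1). */
double sum_modified(size_t n, const double *m)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            total += m[i * n + j];
        }
    }
    return total;
}

/* Returns the CPU seconds used by one call of f and stores the
   value f computed in *result. */
double time_call(double (*f)(size_t, const double *),
                 size_t n, const double *m, double *result)
{
    clock_t start = clock();
    *result = f(n, m);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Timing both functions in a build with optimization disabled and
again in a build with the compiler's standard optimization gives the
four sets of times described above. The two versions add the
elements in a different order, so their floating-point results can
differ slightly in the last digits.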
Actually running on the Cray
Kris Stewart, Assoc. Prof., SDSU (stewart@cs.sdsu.edu)
Senior Fellow, SDSC (619) 942-1012
This file (readme.3rd) provides detail pertinent to actually running
codes on the Cray Y-MP at SDSC. These files are available in
/pub/sdscinfo/Supercomputing-Course-Notes by anonymous FTP from
rohan.sdsu.edu
Students received their own copy of the SDSC User's Guide. This is an
excellent document written by the SDSC consultants giving a thorough
introduction to the Cray, its programming tools, applications
packages and more. Given human nature, the User's Guide's
thoroughness tends to dissuade users from just sitting down and
reading it cover to cover. I therefore e-mailed students a copy of
sdsc-gid.asc A Road Map for the SDSC User's Guide highlighting
portions students should pay particular attention to.
crayacce.asc This gives information which instructors could obtain
and then make available to their students on their home machines.
It includes especially useful man pages and the location on the Cray
Y-MP of some sample Fortran and C codes used in TR-OPT. It also gives
my recommended calling options for cf77 and cc/cl for generating
informative examples and listings.
Students were given the location of crayfopt.asc and cray-exc.asc on
our home machine since they are very long. They contain detailed
information that students could examine to familiarize themselves
with the SDSC Cray tools without using up their own CPU allocation.
Since students will have a finite amount of CPU time on the Cray, I
tried to provide a lot of information that most users would obtain in
initial explorations on a new machine. I wanted to avoid having 30
separate students repeat the same explorations. Therefore, students
were e-mailed a copy of
initcray.asc a quick update on accessing the Cray from SDSU.
Highlights the Unix processors on the Cray and other SDSC specific
things - like DTI - that Cray users need to know about.
I think most people will find the initcray.asc file informative,
though parts of it are specific to accessing the SDSC Cray from
facilities at SDSU. It has hints on Cray usage. It presents a log of
an actual session, so that you can see how the logon process
proceeds. The Cray news file is read. The doc processor is used to
find the file optimiz (which is a very informative SDSC Fortran
document). This file is copied to a local file (since we see that it
is approximately 100 pages long) so that you can edit it, download
it to another machine, or whatever.
initcray.asc has names for the off-site printers (at SDSU) that you
can have your output sent to. If you generate output on the Cray and
do not specify that it be routed to a machine at your particular
site, then the output will be sent to you via U.S. mail. This will
take a couple of days.