Dr. Kris Stewart
Associate Professor, Dept. of Mathematical Sciences
Senior Fellow, San Diego Supercomputer Center
San Diego State University
San Diego, CA 92182-0314
stewart@sdsu.edu
Dec. 14, 1992 (update)
SUE, Supercomputing and Undergraduate Education, was funded primarily by the Division of Advanced Scientific Computing at the National Science Foundation. Additional funding was provided by The Cray Research Foundation.
A) Background on the grant
The San Diego Supercomputer Center (SDSC) has taken a proactive role
in disseminating information to make supercomputing accessible to a
much wider audience. In particular, SDSC has targeted instructors at
undergraduate institutions to introduce their students to supercom-
puters and their use. As part of this effort, SDSC established Super-
computing and Undergraduate Education (SUE). This program enhances
the supercomputing expertise of faculty and helps them incorporate
supercomputing topics into their curricula and departmental majors.
A major component of this program is a one-week residential summer
workshop at SDSC for faculty from primarily undergraduate institu-
tions throughout the U.S. Lecture materials from both the 1991 and
1992 workshops can be obtained via anonymous FTP over the Internet
(See handout for details).
Another component of this program is an annual course taught at San
Diego State University (SDSU), in which undergraduate students
learn about supercomputers and their use. This course is described
in the SDSU catalog as follows:
CS 575 - Supercomputing for the Sciences
Interdisciplinary course intended for all science and engineering
majors.
o Advanced computing techniques developed for supercomputers
o Overview of architecture, software tools, scientific computing
  and communications
o Hands-on experience using supercomputers
Prereq.: Extensive programming background in Fortran or C.
This course has been taught twice.
B) Major Themes of the Program
SUE's 1991 and 1992 faculty workshops at SDSC and the undergraduate
curricula at SDSU focused on the following topics in supercomputing.
1 Interdependence of Computer Science & Scientific Experts
The workshop faculty represented two groups: those interested in
learning discipline-specific applications packages and those inter-
ested in using software tools to facilitate programming. To accommo-
date both groups, we presented an overview of available resources
(applications packages and program optimization techniques) and
encouraged faculty to seek further information independently. We
identified the technical people they could contact for further infor-
mation on their particular interests.
Similarly, the required programming background for the course rein-
forced the traditional separation between computer science and sci-
ence/engineering students by encouraging the former to take the
course. In both cases, the participants interacted well as they real-
ized the benefits of working with others with different strengths.
2 CRAY architecture
The course used the text Computer Architecture: A Quantitative
Approach, by Hennessy and Patterson (published by Morgan Kaufmann).
The sections we used from this text proved invaluable.
We tried to gain an appreciation for the sources of the Cray Y-MP's
power and understand the subtleties of its design as they impact the
way a programmer should approach a problem. Therefore, we carefully
avoided many of the sections of the text that did not directly relate
to the Cray Y-MP. (The course notes available via anonymous FTP can
provide a guide for other instructors interested in using this text
for a similar purpose.) This text was supplemented with documents by
Cray Research, Inc.; particularly the following titles:
TR-OPT (Rev. D) cf77 & scc Features and Optimization. An excellent,
Cray-specific training report covering the Fortran (cf77) and C (scc)
compilers. Provides code examples, diagrams, and explanations of
crucial vectorization topics (and the conditions that inhibit the
compiler), memory organization, performance tools, common optimiza-
tion techniques, and much more.
TR-YSAAP Cray Y-MP System Architecture for Applications Programmers.
Covers material at the assembler level in more detail. Works well
with the models developed in Hennessy and Patterson (unfortunately
this document is no longer available)
3 Architecture of parallel supercomputers
(Intel iPSC/860 and nCUBE 2)
This was not covered in the undergraduate course at SDSU, but was
covered at the faculty workshop by consultants from Intel and nCUBE,
who gave introductory lectures on the parallel architectures.
4 Communications
Both the workshop and the course covered the resources
available through the Internet, including:
a. Accessing sources of information (news groups, anonymous FTP):
   nnsc.nsf.net (NSF information site)
   oak.oakland.edu (simtel20 mirror)
   sumex-aim.stanford.edu (info-mac)
   ftp.cs.titech.ac.jp (Dr. David Kahaner's Japan Bulletins)
b. Communicating with peers (via e-mail)
c. Using FTP to transfer programs between machines (a crude look at
   heterogeneous computing)
Dr. Sulzbach began his talk by stating: "I am not a philosopher, not
an ethicist, not an expert. I am a computer professional like you.
I'm not here to preach because I have no license or authority to do
that. I hope only to raise some issues related to computer ethics. I
will undoubtedly ask more questions than I answer."
Computer accounts on the Cray Y-MP were distributed only after the
week-long discussions on ethics.
7 Writing
Most of the students had no problem with the computer ethics essay,
but they typically had little experience in writing up a science-
oriented programming report. Therefore, we assigned a programming
project that consisted of the following components:
a. Solve a problem on a campus mainframe and document the mainframe's
performance.
b. Solve the same problem on the Cray.
c. Compute an enhancement of the problem on the Cray and document the
Cray's performance.
Instructor feedback after reading the document from assignment (a)
greatly enhanced the organization and content of the final document
in (c).
See Section G SDSU course programming assignments in this handout for
more details on the exact assignments. Section G also gives the
instructor's clarifications and directions to students to help them
specify and document their projects.
C) SUE Workshop Overview
This one-week workshop overviewed the many resources available at the
San Diego Supercomputer Center. In general, the mornings involved
presentations by consultants from the following institutions:
San Diego Supercomputer Center (SDSC)
Cray Research, Incorporated (CRI)
Intel Supercomputer Systems
nCUBE Corporation
In the afternoons, software demonstrations or open labs were
organized. Due to the diversity of backgrounds of the workshop
participants, we covered a very wide variety of information at a
basic level. Participants were encouraged to discuss the information
in more detail with the consultants in the afternoon laboratory
sessions. We also provided information on resources available over
the Internet to enhance curriculum development in supercomputing.
Technical support was provided by:
Cray Research, Incorporated
Intel Corporation
nCUBE Corporation
D) SUE Workshop Schedule
MONDAY
am  Welcome
    What is a Supercomputer? (Dan Sulzbach)
    Business (Kris Stewart)
       How to login, how to print
       DataTree file storage
       Effective use of resources/accounting (batch queues) at SDSC
       SDSC Cray User Guide
    CS 575: Supercomputing for the Sciences (Kris Stewart)
       Hennessy & Patterson text: Computer Architecture: A
          Quantitative Approach (Morgan Kaufmann)
       Responsibility and ethics
       Robbins & Robbins, Cray X-MP/Model 24 (Springer-Verlag)
       Cray TR-OPT examples
       Fortran/C exercises
       Dr. Lloyd Fosdick's HPSC Overview
pm  Afternoon lab session
       Run Fosdick-HPSC ch. 7 examples.
       Run TR-OPT examples and use performance tools
       Run your own "student project" codes
TUESDAY
    Etan Scherzer (CRI instructor) covers TR-OPT
       TR-OPT is an extensive Cray software training workbook which
       is typically covered at a more leisurely pace over a one-week
       period. Etan will try to highlight crucial portions and will
       then be available for individual discussions this afternoon
       and tomorrow.
       Etan will be available all afternoon to answer any
       Cray-specific questions.
WEDNESDAY
am  Cray Application Packages (SDSC Consultants)
       Biology (Jack Rogers)
       Chemistry (Jerry Greenberg)
       CFD (Rich Charles)
       Math (Bob Leary)
pm  "Teaching Chemistry" (Rozeanne Steckler)
    "Teaching Advanced Graphics" (Michael Bailey)
    VisLab reserved: (AVS, Insight)
    Intro to VisLab and Workstations
THURSDAY
am  Access to Parallel Machines at SDSC
       nCUBE Introduction and Examples (Chuck Niggley, Consultant,
          nCUBE)
       Intel iPSC/860 (Dancil Strickland, Regional Parallel Systems
          Engineer, Intel Corp.)
pm  Scalable Version of Wave Equation (Carl Scarbnick, SDSC)
    VisLab reserved
    Work through examples on the parallel machines. nCUBE and Intel
    consultants will be available for assistance.
FRIDAY
am  Discussions
       What do you see as the future of high-performance computing?
       Parallel vs. vector
       HPSC Curriculum, Dr. Lloyd Fosdick, U. Colorado, Boulder
       Computational Science, Dr. Geoffrey Fox, Syracuse University
       Parallel Computing, Dr. Chris Nevison, Colgate University
       Different Curricula Orientations
       Discipline-specific B.S. in Computational Chemistry or
          Computational Science?
E) SUE workshop materials via anonymous FTP
Most of the files from the SUE workshop can be accessed via anonymous
ftp. To access them, FTP to the host rohan.sdsu.edu then retrieve
from the directories /pub/sdscinfo/SUE-notes, /pub/sdscinfo/SDSC-
info-files or /pub/sdscinfo/Supercomputing-Course-Notes.
A short description of the individual files is contained at the end
of this handout.
F) Lecture materials and readings
The sections from the following chapters of Computer Architecture: A
Quantitative Approach (Hennessy & Patterson) were covered in the
first nine weeks of lectures in the course. See the lecture notes
available via anonymous FTP for more details on the sections covered.
Chap. 1 Fundamentals of Computer Design
Chap. 2 Performance and Cost
Chap. 3 Instruction Set Design: Alternatives and Principles
Chap. 4 Instruction Set Examples and Measurements of Use
Chap. 5 Basic Processor Implementation Techniques
Chap. 6 Pipelining
Chap. 7 Vector Processors
Chap. 8 Memory-Hierarchy
The goal was to understand the following advanced topics:
   Pipelined (segmented) functional units
   Chaining of functional units
   Memory bank conflicts
   Compiler capabilities
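Memory bank conflicts, one of the topics listed above, can be
illustrated with a small model. The sketch below is not taken from
the course notes. It assumes a simple interleaved memory in which
word address a lives in bank (a mod B), and it uses B = 8 banks
purely for illustration (not the actual bank count of the Cray
Y-MP). Under that assumption a stride-s access pattern visits only
B / gcd(s, B) distinct banks, so strides that share a large factor
with B keep returning to the same few banks.

```python
from math import gcd


def bank_sequence(stride, n_banks, count):
    """Banks visited by `count` consecutive accesses with the given stride.

    Assumes word address a is stored in bank (a mod n_banks).
    """
    return [(i * stride) % n_banks for i in range(count)]


def banks_touched(stride, n_banks):
    """Number of distinct banks a long stride-`stride` sweep will visit."""
    return n_banks // gcd(stride, n_banks)


if __name__ == "__main__":
    N_BANKS = 8  # hypothetical bank count, chosen only for illustration
    for stride in (1, 2, 3, 4, 8):
        print(stride, banks_touched(stride, N_BANKS),
              bank_sequence(stride, N_BANKS, 8))
```

In this model stride 1 and stride 3 spread accesses over all eight
banks, while stride 8 hits a single bank on every access, which is
the situation a programmer would want to avoid.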
G) SDSU course programming assignments
As the lecture material was covered, students worked on their first
two programming assignments. The goal was to develop a feeling for
timings on the SDSU mainframe compared to accuracy of approximation
schemes. Students were asked to assess how much it costs (measured in
CPU time for now) to get a good answer (measured by true error).
Since most students had little numerical analysis background, Dr.
Stewart provided the original code for the "First Program" problem,
and the students were instructed to insert the appropriate timing
calls.
First Program 1991 Course
Run and time a Fortran code that solves, via finite differences, the
two-point boundary value problem
   y''(t) + pi^2 y(t) = 0
   y(0) = 0
   y(1/2) = 1
True solution: y(t) = sin(pi t)
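The original Fortran code is not reproduced in this handout. As a
rough illustration of the assignment, the following sketch solves the
same boundary value problem with centered finite differences and
reports CPU time and true error. The function names, the choice of a
tridiagonal (Thomas) elimination, and the use of Python instead of
Fortran are all assumptions made here for illustration.

```python
import math
import time


def solve_bvp(n):
    """Solve y'' + pi^2 y = 0, y(0) = 0, y(1/2) = 1 on n >= 2 subintervals.

    Centered differences give, at each interior point,
        y[i-1] + (-2 + (pi*h)^2) * y[i] + y[i+1] = 0,
    a tridiagonal system whose off-diagonal entries are all 1.
    Returns the grid t and the approximate solution y (both length n+1).
    """
    h = 0.5 / n
    m = n - 1                               # number of interior unknowns
    diag = [-2.0 + (math.pi * h) ** 2] * m
    rhs = [0.0] * m
    rhs[-1] = -1.0                          # boundary value y(1/2) = 1
    # Forward elimination (Thomas algorithm).
    for i in range(1, m):
        w = 1.0 / diag[i - 1]
        diag[i] -= w
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    y = [0.0] * m
    y[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        y[i] = (rhs[i] - y[i + 1]) / diag[i]
    t = [i * h for i in range(n + 1)]
    return t, [0.0] + y + [1.0]


def max_error(n):
    """True error: largest |y_i - sin(pi t_i)| over the grid."""
    t, y = solve_bvp(n)
    return max(abs(yi - math.sin(math.pi * ti)) for ti, yi in zip(t, y))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        start = time.process_time()         # the "timing call"
        err = max_error(n)
        cpu = time.process_time() - start
        print(n, err, cpu)
```

Running the loop for several grid sizes shows the trade-off the
assignment asks about: a finer grid lowers the true error and raises
the CPU time.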
First Program 1992 Course
Consider the linear system A x = b, where the N by N matrix A is
given by
   a_ij = 1/(i+j-1)   (the notorious Hilbert matrix)
The right-hand side, b, will be chosen so that the true solution, x,
will be all 1s. Therefore,
   b_i = sum over j = 1,...,N of a_ij
You should solve this linear system for various values of N and
observe the error incurred (by computing true error, since we know
the true solution should be all 1s) and the performance (measured by
the elapsed CPU time).
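A minimal sketch of this experiment follows. It is not the course
code: the solver (Gaussian elimination with partial pivoting), the
function names, and the use of Python are assumptions chosen for
illustration. The right-hand side is built from the row sums of A so
that the true solution is all 1s.

```python
import time


def hilbert(n):
    """N by N Hilbert matrix, a_ij = 1/(i+j-1) with i, j starting at 1."""
    return [[1.0 / (i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def solve(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]               # work on copies
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
            b[r] -= f * b[k]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(a[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (b[k] - s) / a[k][k]
    return x


def hilbert_experiment(n):
    """Return (true error, CPU seconds) for the N = n Hilbert system."""
    a = hilbert(n)
    b = [sum(row) for row in a]             # makes the true solution all 1s
    start = time.process_time()
    x = solve(a, b)
    cpu = time.process_time() - start
    err = max(abs(xi - 1.0) for xi in x)
    return err, cpu


if __name__ == "__main__":
    for n in (2, 4, 6, 8, 10):
        print(n, *hilbert_experiment(n))
```

Because the Hilbert matrix becomes badly conditioned as N grows, the
true error rises quickly with N even though the work per solve stays
modest.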
Good sources for problem statements were:
"Computing Applications to Differential Equations: Modelling in the
Physical and Social Sciences," by J.M.A. Danby; Reston Publishing
"Numerical Methods and Software" by Kahaner, Moler and Nash;
Prentice-Hall Publishers.
"Computational Physics" by Koonin and Meredith; Addison Wesley
Publishing Company.
a) Get your "science" program running on the SDSU mainframe.
b) Write a report describing your problem and your program's
   performance on the SDSU mainframe.
c) Get your program running on the Cray.
d) Extend your problem. For example, use a finer grid spacing or use
   more species in an interaction. (This will depend on your
   particular problem.)
e) Submit a final report on your Cray project.
H) SDSU course take-home exam
Midterm Exam for the 1992 course
This midterm focused on compiler terminology and related concepts
from the Hennessy/Patterson text together with the Cray document
TR-OPT, including the following:
   Jamming Vectorizing Loops
   Separating Loops into Vectorizable and Nonvectorizable Iterations
   Linearizing Nested Loops
   Unrolling Loops (vertically and horizontally)
Midterm Exam from the 1991 Course
The midterm from the 1991 course asked students to write and time
DLXV assembly code (developed in great detail in the Hennessy and
Patterson text with numerous examples) for the translation of Fortran
that performs the matrix/vector multiply, A b = x, in two different
manners.
Row-oriented manner we learn first:
   (row j of A) . b = x_j,   j = 1,...,N
Column-oriented manner more suitable for vector processors:
   b_1 A_*1 + ... + b_N A_*N = x
   (A_*k denotes column k of A)
This is a demanding problem, but most students gain a deep
understanding of the Cray's vector processor structure.
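The two orderings in the 1991 midterm can be sketched in a high-level
language before writing any assembly. The code below is an
illustration only (the exam itself asked for DLXV assembly); the
function names are hypothetical, and b is taken as the input vector
and x as the result, as in the column form. The inner loop of the
column version is a saxpy: a scalar times a column of A, added to the
running result vector.

```python
def matvec_rows(a, b):
    """Row-oriented: x_j is the dot product of row j of A with b."""
    n = len(a)
    x = [0.0] * n
    for j in range(n):
        s = 0.0
        for k in range(n):
            s += a[j][k] * b[k]
        x[j] = s
    return x


def matvec_columns(a, b):
    """Column-oriented: x accumulates b_k times column k of A."""
    n = len(a)
    x = [0.0] * n
    for k in range(n):
        bk = b[k]
        for j in range(n):                  # saxpy: x = x + bk * A[:, k]
            x[j] += bk * a[j][k]
    return x


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [5.0, 6.0]
    print(matvec_rows(a, b), matvec_columns(a, b))
```

Both functions compute the same product. The difference is which loop
is innermost: the row form ends each inner loop with a single summed
scalar, while the column form updates a whole vector at each step.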
I) Other Educational Programs at SDSC
HPCC and K-8 Education -
Jayne Keller, SDSC Education Coordinator (jaynek@sdsc.edu)
Integrating high-performance computing and communications (HPCC)
into the curriculum of primary and secondary schools is critical to
the development of the technicians, scientists, and engineers of
the future. SDSC offers the following activities to address this
need: Supercomputer Center field trip, HPCC half-day in-service
workshop, the SDSC road show, and a technology checklist.
Computational Chemistry -
Rozeanne Steckler, SDSC Manager, Applications R & D (steckler@sdsc.edu)
SDSU Adjunct Professor, Chemistry
The SDSU course Chemistry 596: Chemistry on Supercomputers is
designed as an overview of modern computational chemistry with an
emphasis on learning to use the major chemistry software packages.
This course is not designed as an introduction to theoretical chem-
istry, but rather a course to introduce experimental chemists to
the computational tools available and how to use them in an
informed manner. Many aspects of computational chemistry will be
introduced with each topic presented in coordinated lectures and
labs.
Computer Graphics -
Michael J. Bailey, SDSC Manager of Scientific Visualization (mjb@sdsc.edu)
The UCSD course AMES 293 Advanced Computer Graphics for Engineers
and Scientists is targeted towards students in engineering or
science majors who are interested in applying advanced visualiza-
tion techniques to solving scientific problems. It is not oriented
towards any one major in particular, but is instead directed
towards science in general. Students in this course will learn
techniques that will allow them to develop and use scientific
graphics programs effectively.
Research Experience for Undergraduates at the San Diego
Supercomputer Center
Hassan Aref, SDSC Chief Scientist, and
Rozeanne Steckler, SDSC Manager, Applications R & D (see above)
SDSU Adjunct Professor, Chemistry
Students work on research projects in fields of interest within the
disciplines that make up computational science. Supervisors are
faculty at the student's home institution and SDSC staff. Included
are workshops on high-performance computing and special lectures on
such topics as parallel computing, graphics and scientific visual-
ization, and numerical analysis. Some students, already engaged in
computational research with a faculty member, select a topic within
that project. Others, with an interest in a certain area, use the
REU program to take the first steps. The objective is to give each
student a taste of research in computational science, albeit within
a condensed time frame. All students are given access to
appropriate computer resources at SDSC.
CERFnet -
Susan Estrada, Executive Director (estradas@sdsc.edu)
The California Education and Research Federation Network (CERFnet)
provides a connection to the world via Internet giving access to
hundreds of databases and over a million users worldwide. Its goal
is to promote collaboration among scientists, engineers, and educa-
tors in commercial, government, and academic sectors.
CERFnet provides a 24-hour hotline, continuous network monitoring
and management, an expert staff, and maintenance support. Begun
with support from the National Science Foundation, CERFnet is a
project of General Atomics, a San Diego-based research and
development company.
Reuben H. Fleet Space Theater & Science Center: Project Oasis -
Joseph Deken, Senior Fellow, SDSC (dekenj@sdsc.edu)
Senior Scientist, RHF
The Reuben H. Fleet Space Theater and Science Center is one of the
most highly respected informal science education centers in the
nation. Located in Balboa Park's cultural complex, it houses the
world's first OMNIMAX theater, more than 60 "hands-on" science
exhibits which encourage visitor participation, as well as a multi-
media planetarium show.
In July of 1992, SDSC and the Reuben Fleet Center launched a formal
collaboration called Project Oasis. The focus of this collaboration
is twofold:
1 To develop interactive exhibits and educational programs
about high-performance computing and communications for the general
public.
2 To develop new technology for interactive exhibits and educa-
tional programs using leading edge computing and communications
systems, especially computer networking and visualization.
As part of Project Oasis activities, two SDSC consultants gave lec-
tures at the Reuben H. Fleet Space Theater & Science Center
recently:
"Computer Visualization I: The Solar System" -
by Dave Nadeau, Visualization Specialist at SDSC
"Computer Visualization II: The Antarctic Seafloor" -
by Jim McLeod, Visualization Specialist at SDSC
Overview of SUE
Kris Stewart, Assoc. Prof., SDSU (stewart@cs.sdsu.edu)
Senior Fellow, SDSC (619) 942-1012

This file (readme.SUE) presents a description of the files
distributed at the 1991 Supercomputing and Undergraduate Education
(SUE) Workshop. These files are in /pub/sdscinfo/SUE-notes, available
via anonymous FTP from rohan.sdsu.edu.

Access and use of documents statement (accesuse.asc)
Schedule of events for 1992 SUE workshop (agenda.92)
Brief overview of SDSC accounting and SUE resources (accounti.sds)
Obtaining Cray documents (cost) (cray-man.cst)
Overview of Dr. Lloyd Fosdick's HPSC program from U. Colorado
   Boulder. This is a terrific program in undergraduate High
   Performance Scientific Computing. (hpsc-ks.rme)
Intuitive introduction to ODEs, Euler's method, SAXPY and their
   connection with vector operations. This fits well with Chapter 7
   from Hennessy and Patterson. (my-chap7.asc)
Kay A. Robbins and Steven Robbins wrote a book that serves as a good
   teaching tool for understanding the details of the Cray at the
   assembler level. The Cray X-MP/Model 24: A Case Study in Pipelined
   Architecture and Vector Processing was published by
   Springer-Verlag in 1987. (xmpsim1.asc)

1. Organization of CS 575 Supercomputing for the Sciences, taught at
   San Diego State University, Spring 1991 and 1992, by Kris Stewart
   a. Outline of course (575ovrvw.s92)
   b. Initial handout to students (575init.asc)
   c. List of student programming projects - most final reports and
      source code are available as tar files (575proj.asc)
   d. A road-map through the lecture notes and how they were used in
      the CS 575 course (readme.575)

2. Actual course notes for CS 575 based on the text, Computer
   Architecture: A Quantitative Approach by John Hennessy and David
   Patterson, Morgan Kaufmann Publisher, 1990, coupled with handouts
   from Cray Research Incorporated from two documents, TR-OPT and
   TR-YSAAP.
   The files associated with the Patterson and Hennessy text are
   named ph-something.asc; those associated with Cray documents are
   named cray-something.asc. There are many files - see readme.575 in
   the anonymous FTP directory
   /pub/sdscinfo/Supercomputing-Lecture-Notes from rohan.sdsu.edu.
   Of particular interest are the files tr-opt3.asc and tr-opt7.asc.
   Chapter 3 of the CRI document TR-OPT presents an overview of
   vectorization terminology and examples. This includes a Fortran
   code which is useful for showing different types of loops and how
   the compiler identifies them in the output listing. This code
   should only be compiled, not executed, since the arrays involved
   are never initialized.
   Chapter 7 of TR-OPT presents the fundamental optimization
   techniques. The file tr-opt7.asc contains Fortran code that should
   be executed, since timing statistics are collected to demonstrate
   the effects of a programmer's source code on the Cray's
   performance and the ability of the Fortran compiler to
   automatically optimize source code.

3. Computer Ethics and Responsibility Section. It was felt that
   before students were given access to the Cray Y-MP it was
   essential to have an explicit discussion of "ethics" and
   "responsibility" coupled with a written essay assignment.
   a. Computer ethics assignment (four scenarios) (ethicasg.asc)
   b. Handout of the lecture given by Dan Sulzbach (ethic-l2.asc)

4. Using the Cray - note students will have spent 6 weeks programming
   on the SDSU mainframe in Unix prior to moving to the Cray.
   (readme.3rd)
   a. Initial handout to students on Cray use (crayinit.s92)
      This handout also discusses the files:
      crayfopt.asc (examples of Fortran optimization)
      crayc-ex.asc (samples of C codes and techniques)
      my-optim.tar (sample tar file for students to use to become
         familiar with tar)
   b. Man pages for Cray Fortran (cf77, fpp, fmp, cft77) compiling
      environment (crayacce.asc)
   c. Location of sample codes and how to create sample run of
      Fortran to get listing, marked loops and diagnostics
      (cf77 -ZV -Wf"-emx") (crayacce.asc)
   d. Man pages for cc and cl for C compiler with listing
      (crayacce.asc)
   e. As in c) above for C to get listing (cl) and diagnostics
      (cc -h report=vsi) (crayacce.asc)

   I recommend that instructors take extra time to explain reslist (a
   relatively expensive command which gives students information on
   their remaining resources), ja (an inexpensive Unix system call
   with various parameters) and the NQS batch system. You are charged
   double for all interactive jobs on the Cray at SDSC. Students
   usually are not familiar with using Batch Queues, which can reduce
   the charges on a job to 0.5 times actual use (therefore a
   four-fold decrease over interactive use). Students need to become
   experienced with these queues to effectively use their finite
   amount of Cray time.

   SDSC Documents (note new users of the SDSC Cray will receive the
   SDSC User Guide. You can obtain additional copies of the User
   Guide through the doc processor on the Y-MP. Type doc. The file
   you want is usrguide. This is a very long file, so I would not
   recommend getting the whole thing. The individual chapters of the
   User Guide are available as separate files, e.g. ugoptim, ugtools,
   ugunicos, etc.)

   Other files available from the doc processor (and in the directory
   /pub/sdscinfo/SDSC-info-files, anonymous FTP from rohan.sdsu.edu):
   f. Introduction to UNICOS (unicos)
   g. EZFortran, EZC, EZDebug (ezfortrn, ezc, ezdebug)
   h. EZStorage (DTI, data tree documentation) (ezstorag)
      EZBatch (NQS, Network Queueing System) (ezbatch)
   i. EZMath (math libraries available) (ezmath)
      EZGraphics (graphics capabilities) (ezgrphcs)
   j. EZShell (for those brave Unix hackers who want to customize
      their environment) (ezshell)
      EZTools (make, fmgen, fsplit, tar and cpio) (eztools)
   k. Optimization - an excellent document, oriented towards Fortran,
      on how to produce optimized Fortran source code on the Cray
      Y-MP (optimiz)
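The charging rules quoted above (interactive jobs charged at double
the actual use, batch jobs at 0.5 times actual use) can be turned
into a short calculation that students can use to budget their
allocation. This is only a sketch of the arithmetic: the two factors
come from the text, while the function name and the sample job length
are made up for illustration.

```python
INTERACTIVE_FACTOR = 2.0   # "charged double for all interactive jobs"
BATCH_FACTOR = 0.5         # batch charge quoted in the text


def charged_time(cpu_seconds, batch):
    """CPU seconds charged against an allocation for one job."""
    factor = BATCH_FACTOR if batch else INTERACTIVE_FACTOR
    return cpu_seconds * factor


if __name__ == "__main__":
    job = 100.0            # hypothetical job using 100 CPU seconds
    interactive = charged_time(job, batch=False)
    batch = charged_time(job, batch=True)
    print(interactive, batch, interactive / batch)
```

For any job length the ratio of the two charges is 2.0 / 0.5 = 4,
which is the four-fold difference mentioned above.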
Overview of CS 575 Lecture Notes
Kris Stewart, Assoc. Prof., SDSU (stewart@cs.sdsu.edu)
Senior Fellow, SDSC (619) 942-1012

This file (readme.2nd) presents a detailed overview of the actual
lectures of the CS 575 course. These files are in
/pub/sdscinfo/Supercomputing-Course-Notes, available via anonymous
FTP from rohan.sdsu.edu.
The files fall into three classifications:
a) course information and additional examples from the instructor
b) those related directly to the Hennessy & Patterson text,
   describing which sections/topics/concepts were used from each
   chapter
c) xeroxed copies of pages from the Cray documents TR-OPT

Info and Examples from instructor

575init.asc
   Initial handout given to students the first day of classes.
charac.asc
   Handout given to students the second week as we discussed "What
   is a Supercomputer?". These were notes taken from Dr. Dan
   Sulzbach's talk at the SDSC Summer Institute in 1990.
assignmt.asc
   Describes the programming assignments students were asked to
   complete during the semester.
my-chap7.asc
   I wrote this section to try to motivate the idea of vector
   registers and operations from the point of view of science. A
   major computation performed repeatedly in scientific computation
   involves solving ordinary differential equations (ODE). Presents
   the idea of an ODE as a vector system of equations and shows how
   Euler's method can be visualized as doing simple vector
   operations, a saxpy with scalar h and vectors y/current, f/current
   and y/next. Although students do not have a deep background in
   numerical analysis, this has been successful in relating the saxpy
   to scientific computation at an intuitive level.
optimiz.doc
   This is an SDSC document available via the doc processor on the
   Cray Y-MP. This was FTPed from the Cray to the SDSU mainframe and
   students were encouraged to obtain their own copy.
crayfopt.asc
   I coded up the examples from the SDSC Optimization document. These
   codes are presented in the appendix of that document and available
   on the Cray. This handout has details on accessing the codes,
   untarring the codes, and running the make utility to compile with
   various Fortran optimization flags on or off.
crayc-ex.asc
   Handout on C codes and how to rewrite them to improve performance.
   These are concepts discussed in TR-OPT which I coded up to give
   students examples of C codes and the Cray tools to analyze their
   performance. These were provided by Etan Scherzer, CRI.
xmpsim.asc
   The text The Cray X-MP/Model 24, A Case Study in Pipelined
   Architecture and Vector Processing by Kay A. Robbins and Steven
   Robbins (Springer-Verlag) is an excellent source for explanations
   and examples of the performance of the Cray at the assembler
   level.

Details on sections used from Hennessy & Patterson

ph-intro.asc
   Introductory discussion of the aims and orientation of the course
   and the use of the text Computer Architecture: A Quantitative
   Approach by Hennessy and Patterson (Morgan-Kaufmann, Pub., San
   Mateo, CA)
ph-chap1.asc
   This chapter establishes a definition of performance, presents
   Amdahl's Law, and defines and uses terms such as latency and
   throughput. I added Gantt charts to illustrate the assembly line
   example to agree with later discussions of pipelined CPUs. (The
   handout cray-arc.asc from TR-OPT fits in well here also.)
ph-chap2.asc
   We are only interested in performance in this chapter. The
   treatment of cost is oriented toward someone designing a computer
   architecture. In this course we are CONSUMERS of an architecture,
   not its designers (of course it's a very complex architecture we
   are studying). This chapter discusses MIPS, MFLOPS and their
   limitations as measures of performance.
ph-chap3.asc
   Really only interested in Section 3.7 - The Role of High-Level
   Languages and Compilers. The Cray Y-MP is a register-register
   machine. There is a nice example using a graph coloring algorithm
   for register allocation. Students should try to develop an
   intuitive idea of what the compiler really does for them. The
   compiler is the major software tool that aids in effective use of
   the supercomputer. (The handout cray-com.asc fits well here.)
ph-chap4.asc
   This chapter presents and discusses instruction sets for the VAX,
   IBM 360/370, Intel 8086 and DLX. DLX is the architecture the book
   develops for all its examples and is our only interest in this
   chapter. Section 4.5 presents the DLX instruction set and gives
   examples of its use. In Chapter 6, pipelining will be presented
   using this instruction set. This is an important concept for
   understanding the performance of the Cray. In Chapter 7, this
   instruction set is extended to DLXV to implement vector processing
   (and descriptions of chaining, vector stride, strip mining vector
   loops, and more). So it is essential to work through lots of
   examples using DLX so that students are comfortable with the
   material in Chapter 6 and the extensions in Chapter 7. (Add
   cray-scl.asc here.)
ph-chap5.asc
   This chapter establishes the basic steps of execution: instruction
   fetch, instruction decode and register fetch, execution and
   effective address calculation, memory access and branch
   completion, and the write-back step. Students probably don't
   realize that at the machine (or assembler) level there are many
   tasks to be performed to accomplish something as simple as
      ADD R1,R2,R3
   This will be important when Chapter 6 discusses pipelining. (Add
   the handout cray-con.asc here.)
ph-over6.asc
   Chapters 6 and 7 are the main goal of the course lecture material.
   The concepts of:
      pipeline operation of segmented scalar computational units
      vector registers
      segmented vector functional units and chained pipelines
      vector stride, memory bank conflicts and stalls in the pipe
      strip mining vector loops
   are covered in these two chapters. Our recurring example will be
   the saxpy. We first examine Chapter 6 and its sections:
      6.1  What is pipelining?
      6.2  The basic pipeline for DLX
      6.3  Making the pipeline work
      6.4  The major hurdle of pipelining - hazards
           Introduces forwarding, which is crucial for understanding
           chaining of functional units
      6.6  Extending the DLX pipeline to handle multicycle operations
      6.8  Advanced pipelining - taking advantage of more
           instruction-level parallelism
      6.12 Historical perspective and references
   Important homework problems: 6-11 to 6-19 on saxpy
ph-over7.asc
   This chapter presents the DLXV instruction set that is used in the
   examples and homework problems. (Add handouts cray-ch3.asc and
   tropt-3.asc here.)
phover7b.asc
   This covers section 7.6 of the text - Enhancing Vector
   Performance. This discusses the VERY important concept of
   chaining. Important homework problems: 7-1 to 7-6, 7-8 to 7-10.
   Most of these were worked in lecture so students would get lots of
   practice with timings. (Add handouts cray-ch7.asc and tropt-7.asc
   here.)
phover7c.asc
   The real guts of the classes - end of Chapter 7 of Hennessy and
   Patterson
      7.7  Putting it all together: Evaluating the performance of
           vector processors
      7.8  Fallacies and pitfalls - interesting reading
      7.9  Concluding remarks - interesting reading
      7.10 Historical perspective and references - VERY interesting
           reading
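The saxpy, the recurring example in these notes, and its connection
to Euler's method (the subject of my-chap7.asc) can be sketched in a
few lines. This is an illustration written for this overview, not the
contents of my-chap7.asc; the function names and the test problem are
assumptions. One Euler step is y/next = h * f/current + y/current,
which is exactly a saxpy with scalar h.

```python
def saxpy(a, x, y):
    """Return the vector a*x + y (scalar a, equal-length vectors x and y)."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def euler(f, t0, y0, h, steps):
    """Advance the vector ODE y' = f(t, y) by `steps` Euler steps of size h."""
    t, y = t0, list(y0)
    for _ in range(steps):
        y = saxpy(h, f(t, y), y)            # y_next = h * f_current + y_current
        t += h
    return y


if __name__ == "__main__":
    # Hypothetical test system: y1' = y2, y2' = -y1 (simple oscillator).
    def oscillator(t, y):
        return [y[1], -y[0]]

    print(euler(oscillator, 0.0, [1.0, 0.0], 0.01, 100))
```

Each component of the update is independent of the others, which is
why the operation maps naturally onto vector registers.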
Pages from Cray documents TR-OPT and TR-YSAAP

cray-arc.asc
     These pages included a diagram of the hardware components of the
     Cray Y-MP, a diagram of the 8 CPUs and their memory and
     communications sections, a table of the 14 separate functional
     units with the registers they use and time (in clock periods),
     and a block diagram of a single CPU.

cray-com.asc
     These pages included a quick look at the Fortran Compiling
     System (cf77) and the Standard C system. They presented some
     standard ideas for scalar optimization. These are operations
     performed by the compilers students are accustomed to using on
     the scalar machines at SDSU (e.g. expression reordering,
     constant folding, common subexpression elimination). This
     handout also discusses the phases of the compilation process:
     source statement processing, scalar optimization, vectorization,
     code generation.

cray-scl.asc
     These pages covered another look at the registers, functional
     units and memory access paths for the Cray Y-MP.

cray-con.asc
     These pages cover the control section of the CPU.

cray-ch3.asc
     Students were given copies of Chapter 3 of TR-OPT. This chapter
     presents a view of Vectorization on the Cray Y-MP. It has many
     examples and I found it very readable.

tropt-3.asc
     I coded up a Fortran code from the examples in Chapter 3 and we
     went over this in class. I also showed students the tropt3.m
     file which was produced by cf77's Fortran vectorization
     preprocessor (fpp). This gets students accustomed to the
     powerful tools provided on the Cray to aid in optimizing codes.

cray-ch7.asc
     Students were given copies of Chapter 7 of TR-OPT. This chapter
     presents a view of Common CPU optimization techniques.

tropt-7.asc
     I coded up Fortran code from the examples in Chapter 7 and ran
     before and after timings for the standard (original) coding and
     the optimized (modified) coding. Two settings were used in
     compiling the code - one with all optimization turned off and
     one with standard optimization. So there are four sets of
     execution times to be examined. The base case is
     original/no-optimization, but students can gain a feeling for
     how smart the cf77 compiling environment is by comparing it
     with the execution times for original with optimization on.
     The source code (tropt7.f) and tropt7.m, the translated code
     from fpp, are both included.
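
The original-versus-modified comparison can be tried in miniature on
any machine with a C compiler. The sketch below is not one of the
TR-OPT examples: the kernel and all function names are hypothetical,
chosen only to show a common subexpression being removed by hand, and
the timer is the standard C clock() routine rather than any Cray
timing facility.

```c
#include <stddef.h>
#include <time.h>

/* "Original" coding: the product (a * b) is written out twice per pass. */
void kernel_original(size_t n, double a, double b,
                     const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = (a * b) * x[i] + (a * b);
    }
}

/* "Modified" coding: the common subexpression is computed once, */
/* outside the loop, and reused.                                 */
void kernel_modified(size_t n, double a, double b,
                     const double *x, double *y)
{
    const double ab = a * b;
    for (size_t i = 0; i < n; i++) {
        y[i] = ab * x[i] + ab;
    }
}

/* Pointer type shared by both kernels so one timer serves either. */
typedef void (*kernel_fn)(size_t, double, double,
                          const double *, double *);

/* Return the CPU time, in seconds, used by one call of the kernel. */
double time_kernel(kernel_fn kernel, size_t n, double a, double b,
                   const double *x, double *y)
{
    clock_t start = clock();
    kernel(n, a, b, x, y);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Timing both kernels in a program built once with optimization off and
once with it on yields the same four-way table described above. If
the optimizing build shows little difference between the two codings,
the compiler has probably done the subexpression elimination itself.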

Actually running on the Cray

Kris Stewart, Assoc. Prof., SDSU (stewart@cs.sdsu.edu)
Senior Fellow, SDSC
(619) 942-1012

This file (readme.3rd) provides detail pertinent to actually running
codes on the Cray Y-MP at SDSC. These files are available by
anonymous FTP from rohan.sdsu.edu in
/pub/sdscinfo/Supercomputing-Course-Notes

Students received their own copy of the SDSC User's Guide. This is an
excellent document written by the SDSC consultants giving a thorough
introduction to the Cray, its programming tools, applications
packages and more. Given human nature, the User's Guide's
thoroughness tends to dissuade users from just sitting down and
reading it cover to cover. I therefore e-mailed students a copy of

sdsc-gid.asc
     A Road Map for the SDSC User's Guide, highlighting portions
     students should pay particular attention to.

crayacce.asc
     This gives information which instructors could obtain and then
     make available to their students on their home machines. It
     includes especially useful man pages and the location on the
     Cray Y-MP of some sample Fortran and C codes used in TR-OPT. It
     also gives my recommended calling options for cf77 and cc/cl
     for generating informative examples and listings.

Students were given the location of crayfopt.asc and cray-exc.asc on
our home machine since they are very long. They contain detailed
information that students could examine to familiarize themselves
with the SDSC Cray tools without using up their own CPU allocation.

Since students will have a finite amount of CPU time on the Cray, I
tried to provide a lot of the information that most users would
obtain in initial explorations on a new machine. I wanted to avoid
having 30 separate students repeat the same explorations. Therefore,
students were e-mailed a copy of

initcray.asc
     A quick update on accessing the Cray from SDSU. Highlights the
     Unix processors on the Cray and other SDSC specific things -
     like DTI - that Cray users need to know about.

I think most people will find the initcray.asc file informative,
though parts of it are specific to accessing the SDSC Cray from
facilities at SDSU. It has hints on Cray usage. It presents a log of
an actual session, so that you can see how the logon process
proceeds. The Cray news file is read. The doc processor is used to
find the file optimiz (which is a very informative SDSC Fortran
document). This file is copied to a local file (since we see that it
is approximately 100 pages long) so that you can edit it, download it
to another machine, or whatever.

initcray.asc has names for the off-site printers (at SDSU) that you
can have your output sent to. If you generate output on the Cray and
do not specify that it be routed to a machine at your particular
site, then the output will be sent to you via U.S. mail. This will
take a couple of days.