CS 575 Supercomputing - Lecture Outline
Chapter 2 - High Performance Microprocessors
BAM 251, MW 4-5:15pm, 09Sept2002

Dr. Kris Stewart (stewart@sdsu.edu)
San Diego State University
Office Hours: MW 2-3:30pm GMCS 535

This URL is stewart.sdsu.edu/cs575/lecs/ch02.html

Conceptual Computer

Central Processing Unit (CPU)
Main Memory
File Space/Hard Drive Space (for users; for system software)
I/O Connections (to monitor, to keyboard, to Internet and other computers)

Why CISC? (Complex Instruction Set Computer)

In 1996, the IEEE celebrated the 50th anniversary of the ENIAC computer; see the IEEE CS Timeline (76 pages, local copy).
In the days of the ENIAC (1943, John Mauchly and J. Presper Eckert), the U.S. Army wished to have a mechanical computing engine to replace the human computing engines then used to produce ballistics charts for cannons. Computer time was scarce and people time was available. Assembly language programming was the only higher level language available, and powerful instructions would benefit the human programmers.

Space and Time
The CPU clock synchronizes the execution of instructions; think of it as a drummer keeping time for a marching band. At each "tick of the clock" an action is completed. Suppose you have a 12.5 ns clock (ns = nanosecond = 10^-9 second). How many instructions will complete in one second, assuming one instruction completes per clock tick?
1 / (12.5 * 10^-9 s) = 0.08 * 10^9 Hz = 80 * 10^6 Hz = 80 MHz, that is, 80 million clock ticks per second
A Hertz is the unit of frequency equal to one cycle per second.
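
The arithmetic above can be checked with a short Python sketch. The function name clock_rate_hz is made up for illustration, and the sketch assumes one instruction completes per clock tick, which real processors only approximate.

```python
def clock_rate_hz(period_ns: float) -> float:
    """Return the clock frequency in Hz for a clock period given in nanoseconds."""
    period_s = period_ns * 1e-9  # 1 ns = 10^-9 second
    return 1.0 / period_s


rate = clock_rate_hz(12.5)
# Assumption: one instruction completes per tick, so the rate in Hz is also
# the number of instructions completed per second.
print(f"{rate / 1e6:.0f} MHz")
```

Trying other periods (for example 2.0 ns) shows how quickly the rate grows as the clock period shrinks.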
Beliefs about Complex Instruction Sets
They were right for the early days (1950's, 60's, 70's?).
Note: Knuth's fundamental paper on LR parsing for compilers appeared in 1965.

Fundamentals of RISC

Characterizing RISC

Instruction Pipelining - Subdivide an operation into equal amounts of work (analogous to the assembly line for auto manufacturing).

Fig2.1 a Pipeline
© O'Reilly Publishers (Used with permission)
Variable length instructions pose questions, such as
- how many bytes the instruction uses?;
- what type of instruction is it?;
- where do the operands come from?
- where does the result go?
Fig2.4 Variable length instructions make pipelining difficult
© O'Reilly Publishers (Used with permission)
Stages of 3 pipelined instruction sequences, in time
Fig. 2.2 Three instructions in flight through the pipeline
© O'Reilly Publishers (Used with permission)
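
The idea of several instructions in flight at once can be sketched as a small timing table. This is a minimal illustration of an ideal pipeline with no stalls; the five stage names (IF, ID, OF, EX, WB) and the function name pipeline_schedule are assumptions for this sketch, not taken from the figure.

```python
STAGES = ["IF", "ID", "OF", "EX", "WB"]  # assumed stage names, for illustration only


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return rows[cycle][stage] = instruction index (or None) for an ideal pipeline."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for cycle in range(total_cycles):
        row = []
        for s in range(len(stages)):
            instr = cycle - s  # instruction i occupies stage s during cycle i + s
            row.append(instr if 0 <= instr < num_instructions else None)
        rows.append(row)
    return rows


def label(i):
    """Format an instruction index as I1, I2, ... or '-' for an empty stage."""
    return "-" if i is None else f"I{i + 1}"


schedule = pipeline_schedule(3)
print("cycle " + " ".join(f"{name:>3}" for name in STAGES))
for cycle, row in enumerate(schedule):
    print(f"{cycle + 1:>5} " + " ".join(f"{label(i):>3}" for i in row))
```

With three instructions and five stages the table has 5 + 3 - 1 = 7 rows, compared with 15 cycles if each instruction had to finish before the next one started.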

Pipelining in floating-point execution. As the text indicates, if you throw enough hardware at the floating-point operations, they can produce a result every clock. This characterized the unaffordable Cray floating-point functional units on the Cray T90, which we will not be using since it has been phased out at the San Diego Supercomputer Center.

Uniform instruction length

Variable length versus Fixed length

Fig 2.5 Variable-length CISC versus fixed-length RISC instructions. Note typo in the figure: the second R3 should be R4.
© O'Reilly Publishers (Used with permission)
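
Why variable-length instructions complicate pipelining can be sketched with a toy decoder. The encoding here is purely hypothetical (the first byte of each variable-length instruction holds its total length in bytes) and does not correspond to any real instruction set; both function names are made up for illustration.

```python
def fixed_length_starts(code: bytes, width: int = 4):
    """Start offsets when every instruction is `width` bytes: known without decoding."""
    return list(range(0, len(code), width))


def variable_length_starts(code: bytes):
    """Start offsets for the toy variable-length encoding (first byte = length).

    Each instruction must be examined before the next one can even be located.
    """
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        length = code[offset]
        if length == 0:
            raise ValueError("zero-length instruction in toy encoding")
        offset += length
    return starts


fixed_code = bytes(12)  # three 4-byte instructions
variable_code = bytes([2, 0xAA, 5, 1, 2, 3, 4, 1, 3, 9, 9])  # lengths 2, 5, 1, 3

print(fixed_length_starts(fixed_code))
print(variable_length_starts(variable_code))
```

In the fixed-length case the fetch unit can compute every start address up front; in the variable-length case finding instruction N requires stepping through instructions 1 to N-1 first.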

Detailed example from instructor: Expanded example Fig. 2-5
Note: Our goal is to interact effectively with the compiler, which is charged with generating efficient assembly language when translating our high level language programs in C or Fortran 90. You are invited to examine the wealth of options available to you for interacting with the compiler, using the Unix command

man cc | more

or wait for our lab and explorations.

Delayed Branches

Fig 2.3 Detecting a branch © O'Reilly Publishers (Used with permission)
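A branch instruction would interrupt the "pipeline" and might delay the processing of all the following instructions. With a delayed branch, the instruction immediately after the branch is executed whether or not the branch is taken, so the pipeline stays busy while the branch target is determined.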
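
A rough cost model can make the effect of branches on a pipeline concrete. This is a simplified sketch under stated assumptions: every taken branch wastes a fixed number of cycles (bubble_cycles), and a delayed branch recovers one cycle for each delay slot the compiler fills with a useful instruction. The function name and parameters are hypothetical, and pipeline fill time is ignored.

```python
def estimated_cycles(num_instructions, num_taken_branches,
                     bubble_cycles=1, slots_filled=0):
    """Estimate cycles for a run of instructions in a simplified pipeline model.

    Assumption: one instruction completes per cycle except after taken branches,
    where (bubble_cycles - slots_filled) cycles are lost per branch.
    """
    wasted_per_branch = max(bubble_cycles - slots_filled, 0)
    return num_instructions + num_taken_branches * wasted_per_branch


plain = estimated_cycles(100, 20)                    # no delay slot filled
delayed = estimated_cycles(100, 20, slots_filled=1)  # delay slot does useful work
print(plain, delayed)
```

Under these assumptions, 100 instructions with 20 taken branches cost 120 cycles without a filled delay slot and 100 cycles with one.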

Load/Store architecture: Memory references limited to loading to a register, for subsequent operations, or storing the results from a register back to memory.

Uniform length instructions impose this budget of bits
Reduces the decoding needed to determine what an instruction will accomplish
Loads and stores from memory typically take much longer than arithmetic
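
The load/store discipline described above can be sketched with a toy register machine in which only LOAD and STORE touch memory and arithmetic works on registers alone. The instruction names, register names, and the run function are illustrative assumptions, not a real instruction set.

```python
def run(program, memory):
    """Execute a toy load/store program; memory is a dict of named locations."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":      # LOAD reg, addr: memory -> register
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "STORE":   # STORE reg, addr: register -> memory
            reg, addr = args
            memory[addr] = regs[reg]
        elif op == "ADD":     # ADD dest, src1, src2: registers only
            dest, a, b = args
            regs[dest] = regs[a] + regs[b]
        else:
            raise ValueError(f"unknown op {op}")
    return memory


memory = {"A": 2, "B": 3, "C": 0}
program = [
    ("LOAD", "R1", "A"),
    ("LOAD", "R2", "B"),
    ("ADD", "R3", "R1", "R2"),
    ("STORE", "R3", "C"),
]
run(program, memory)
print(memory["C"])
```

Computing C = A + B takes four simple instructions here, where a CISC machine might use a single memory-to-memory add; the RISC bet is that the four simple instructions pipeline better.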

Simple address modes: Completes the RISC goal of FAST execution of a very large number of simple instructions.

Second-Generation RISC Processors

Three basic methods:

Make clock rate faster - design technique for DEC Alpha processors
Duplicate compute elements with freed-up chip space (Superscalar)

Fig 2.6 Decomposing a serial stream
© O'Reilly Publishers (Used with permission)
Increase number of stages in the pipeline

Fig 2.7 MIPS R4000 instruction pipeline
© O'Reilly Publishers (Used with permission)
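
The second method in the list above, duplicating compute elements, can be sketched as a greedy grouping of a serial instruction stream into sets that could issue in the same cycle. This is a simplified illustration under stated assumptions: in-order issue, a hypothetical (dest, src1, src2) instruction format, and only conflicts with registers written earlier in the same group are checked. Real superscalar hardware considers more hazards and resource limits.

```python
def issue_groups(instructions, width=2):
    """Split a serial stream of (dest, src1, src2) tuples into in-order issue groups.

    A group is closed when it is full, or when the next instruction reads or
    writes a register already written by an instruction in the group.
    """
    groups = []
    current = []
    written = set()
    for dest, *srcs in instructions:
        conflict = dest in written or any(s in written for s in srcs)
        if current and (conflict or len(current) == width):
            groups.append(current)
            current, written = [], set()
        current.append((dest, *srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


stream = [
    ("R1", "R2", "R3"),  # R1 = R2 + R3
    ("R4", "R5", "R6"),  # independent of the first instruction
    ("R7", "R1", "R4"),  # needs R1 and R4
    ("R8", "R7", "R2"),  # needs R7
    ("R9", "R5", "R6"),  # independent of the previous instruction
]
groups = issue_groups(stream)
print(len(stream), "instructions in", len(groups), "issue cycles")
```

For this stream the five instructions fall into three groups, so a two-wide machine would need three issue cycles instead of five under this model.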
