CS 575 Supercomputing - Lecture Outline
Chapter 2 - High Performance Microprocessors
BAM 251, MW 4-5:15pm, 09Sept2002

Dr. Kris Stewart (stewart@sdsu.edu)
San Diego State University
Office Hours: MW 2-3:30pm GMCS 535

This URL is stewart.sdsu.edu/cs575/lecs/ch02.html

Conceptual Computer

Why CISC?

(Complex Instruction Set Computer)
In 1996, the IEEE celebrated the 50th anniversary of the ENIAC computer; see the IEEE CS Timeline (76 pages, local copy).
In the days of the ENIAC (1943, John Mauchly and J. Presper Eckert), the U.S. Army wished to have an electronic computing engine to replace the human computing engines then used to produce ballistics charts for cannons. Computer time was scarce and people time was available. Assembly language was the highest-level programming language available for early machines, so powerful instructions benefited the human programmers.

Fundamentals of RISC

Characterizing RISC

  • Pipelining in floating-point execution. As the text indicates, if you throw enough hardware at the floating-point operations, they can produce a result every clock cycle. This characterized the unaffordable floating-point functional units on the Cray T90, which we will not be using since it has been phased out at the San Diego Supercomputer Center.

  • Uniform instruction length

    Variable length versus Fixed length


    Fig 2.5 Variable-length CISC versus fixed-length RISC instructions. Note typo: the second R3 should be R4.
    © O'Reilly Publishers (Used with permission)

    Detailed example from instructor: expanded example of Fig. 2-5
    Note: Our goal is to interact effectively with the compiler, which is charged with generating efficient assembly language when translating our high-level language programs in C or Fortran 90. You are invited to examine the wealth of compiler options available to you, using the Unix command

    man cc | more

    or wait for our lab and explorations.

  • Delayed Branches


    Fig 2.3 Detecting a branch © O'Reilly Publishers (Used with permission)
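    A branch instruction can interrupt the "pipeline", because the processor does not know which instruction to fetch next until the branch is resolved, and this might delay the processing of all the following instructions. With a delayed branch, the instruction immediately after the branch (the branch delay slot) is always executed, so the pipeline keeps doing useful work while the branch is resolved.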

  • Load/Store architecture: Memory references are limited to loading a value into a register, for subsequent operations, or storing a result from a register back to memory.
    • Uniform-length instructions impose this limit: the fixed budget of bits per instruction leaves little room for memory addresses in arithmetic instructions
    • Reduces the decoding needed to determine what an instruction will accomplish
    • Loads and stores typically take much longer than arithmetic

  • Simple address modes: Completes the RISC goal of FAST execution of a very large number of simple instructions.
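The RISC characteristics listed above (uniform instruction length, load/store architecture, simple address modes) can be sketched as a toy interpreter in C. This is only an illustration: the 32-bit instruction format, the opcodes, and the names encode, decode, run, and demo_add are assumptions made up for this sketch and do not correspond to any real processor or to the textbook's figures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-length format (NOT a real ISA): every instruction
   is exactly 32 bits, so decoding is a few shifts and masks.
     bits 31-24: opcode     bits 23-16: rd
     bits 15-8 : rs1        bits  7-0 : rs2 or small offset            */
enum { OP_LOAD = 1, OP_STORE = 2, OP_ADD = 3 };
enum { NREGS = 8, NMEM = 4 };

typedef struct { uint8_t op, rd, rs1, rs2; } Fields;

static uint32_t encode(uint8_t op, uint8_t rd, uint8_t rs1, uint8_t rs2)
{
    return ((uint32_t)op << 24) | ((uint32_t)rd << 16) |
           ((uint32_t)rs1 << 8) | (uint32_t)rs2;
}

static Fields decode(uint32_t word)
{
    Fields f = { (uint8_t)(word >> 24), (uint8_t)(word >> 16),
                 (uint8_t)(word >> 8),  (uint8_t)word };
    return f;
}

/* Only LOAD and STORE touch memory; ADD works on registers only.
   The single address mode is "register + small offset".            */
static void run(const uint32_t *prog, int n, int32_t *reg, int32_t *mem)
{
    for (int pc = 0; pc < n; pc++) {
        Fields f = decode(prog[pc]);
        assert(f.rd < NREGS && f.rs1 < NREGS);
        switch (f.op) {
        case OP_LOAD:                       /* rd = mem[rs1 + offset] */
            assert(reg[f.rs1] + f.rs2 >= 0 && reg[f.rs1] + f.rs2 < NMEM);
            reg[f.rd] = mem[reg[f.rs1] + f.rs2];
            break;
        case OP_STORE:                      /* mem[rs1 + offset] = rd */
            assert(reg[f.rs1] + f.rs2 >= 0 && reg[f.rs1] + f.rs2 < NMEM);
            mem[reg[f.rs1] + f.rs2] = reg[f.rd];
            break;
        case OP_ADD:                        /* rd = rs1 + rs2         */
            assert(f.rs2 < NREGS);
            reg[f.rd] = reg[f.rs1] + reg[f.rs2];
            break;
        }
    }
}

/* The high-level statement c = a + b, with a, b, c all in memory,
   becomes two loads, one register add, and one store.               */
static int32_t demo_add(int32_t a, int32_t b)
{
    int32_t reg[NREGS] = {0};               /* r0 stays 0: base address */
    int32_t mem[NMEM]  = { a, b, 0, 0 };    /* mem[2] plays the role of c */
    const uint32_t prog[] = {
        encode(OP_LOAD,  1, 0, 0),          /* r1 = mem[r0 + 0]   (a) */
        encode(OP_LOAD,  2, 0, 1),          /* r2 = mem[r0 + 1]   (b) */
        encode(OP_ADD,   3, 1, 2),          /* r3 = r1 + r2           */
        encode(OP_STORE, 3, 0, 2),          /* mem[r0 + 2] = r3   (c) */
    };
    run(prog, 4, reg, mem);
    return mem[2];
}
```

Calling demo_add(2, 3) runs the four-instruction program and returns the value stored back to memory. A CISC machine might express the same statement as a single memory-to-memory add; the sketch trades that one complex instruction for four simple ones that are each trivial to decode.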

    Second-Generation RISC Processors

    Three basic methods:

    1. Make clock rate faster - design technique for DEC Alpha processors
    2. Duplicate compute elements with freed-up chip space (Superscalar)


      Fig 2.6 Decomposing a serial stream
      © O'Reilly Publishers (Used with permission)

    3. Increase number of stages in the pipeline


      Fig 2.7 MIPS R4000 instruction pipeline
      © O'Reilly Publishers (Used with permission)
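Methods 2 and 3 above can be illustrated with a small C sketch. The cycle counts use an idealized model that assumes no stalls, and the function names (pipelined_cycles, unpipelined_cycles, sum_serial, sum_two_chains) are hypothetical, chosen only for this example. The two summation routines show, at the source level, the kind of independent operation streams a superscalar processor can exploit; whether a given compiler and processor actually overlap them is not guaranteed.

```c
#include <stddef.h>

/* Idealized model, assuming no stalls: a pipeline with `stages` stages
   needs `stages` cycles for the first instruction to complete, then
   completes one more instruction every cycle.                        */
static unsigned long pipelined_cycles(unsigned long n, unsigned long stages)
{
    return n == 0 ? 0 : stages + (n - 1);
}

/* Without pipelining, each instruction occupies all stages in turn. */
static unsigned long unpipelined_cycles(unsigned long n, unsigned long stages)
{
    return n * stages;
}

/* One accumulator: every add depends on the result of the previous
   add, so the additions form a single serial chain.                  */
static double sum_serial(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Two accumulators: the adds into s0 and s1 are independent of each
   other, so duplicated functional units could work on both at once.
   Note that this reorders the floating-point additions, so in general
   the result can differ from sum_serial in the last bits.            */
static double sum_two_chains(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += x[i];
        s1 += x[i + 1];
    }
    if (i < n)                  /* leftover element when n is odd */
        s0 += x[i];
    return s0 + s1;
}
```

Under this model, 100 instructions on an 8-stage pipeline take 8 + 99 = 107 cycles, compared with 800 cycles if each instruction had to finish all 8 stages before the next one started. Real pipelines fall short of the ideal whenever branches or memory delays introduce stalls.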