CS 575 Supercomputing for the Sciences
Ch. 12 Large Scale Parallel Computing
10 November 2003

Dr. Kris Stewart (stewart@rohan.sdsu.edu)
San Diego State University

This URL is http://www.stewart.cs.sdsu.edu/cs575/lecs/ch12.html

MPI on Rohan

Fig. 12-1: Amdahl's Law for a 95% parallel application
© O'Reilly Publishers (Used with permission)

Gene Amdahl, architect of the IBM 360 computers, characterized how well an application can make use of multiple processors. His analysis dates from the very early days of computing, before parallel computing hardware existed. It focused on the ratio of the "parallel portion" of a code to the "serial portion" and gave pessimistic predictions of achievable performance. Now that the hardware has been developed and tools are available to assist the programmer, the focus is more on rethinking algorithms to be parallel from the start.
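To make the pessimism concrete, here is a minimal Python sketch of the usual statement of Amdahl's Law, speedup = 1 / ((1 - p) + p / n), applied to the 95% parallel case of Fig. 12-1. The function name amdahl_speedup and the processor counts in the loop are illustrative choices, not taken from the text.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's Law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many processors
    # are used; only the parallel part is divided among them.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Illustrative processor counts for a 95% parallel application.
    for n in (1, 2, 4, 8, 16, 64, 1024):
        print(f"{n:5d} processors -> speedup {amdahl_speedup(0.95, n):6.2f}")
```

However many processors are added, the speedup for a 95% parallel code stays below 1 / 0.05 = 20, because the 5% serial portion never shrinks.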

Recall the bus architecture of the PC

Fig. 10-2: A typical bus architecture

© O'Reilly Publishers (Used with permission)

as well as the concept of the crossbar, which is quite expensive. Both are discussed in Chapter 10.

Fig. 10-3: A crossbar

© O'Reilly Publishers (Used with permission)
These are two examples of the sort of interconnect that has been used to link electronic devices.

Fig. 12-2: Connecting processors to memory
© O'Reilly Publishers (Used with permission)

Processors are wired to the interconnect, which they use to communicate with other processors and with memory.

Fig. 12-3: Connecting nodes to one another
© O'Reilly Publishers (Used with permission)

Processors can also have their own local memory, along with the ability to communicate with other processors and with those processors' local memories.


Fig. 12-4: Pipelined multistage interconnection
© O'Reilly Publishers (Used with permission)

To help manage communication latency, a message can carry its route in its leading bits. For example, the first six bits "011011" specify the second (01) device, then the third (10), and then the fourth (11) device on the path from source to destination.
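The decoding of such a routing header can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name split_route is hypothetical, each stage is assumed to consume exactly two bits, and devices are numbered from zero, so that 01 selects the second device.

```python
from typing import List


def split_route(header: str, bits_per_stage: int = 2) -> List[int]:
    """Split a routing header into one zero-based device index per stage.

    Assumption: every stage consumes the same number of header bits.
    """
    if set(header) - {"0", "1"}:
        raise ValueError("header must contain only the characters 0 and 1")
    if len(header) % bits_per_stage != 0:
        raise ValueError("header length must be a multiple of bits_per_stage")
    return [
        int(header[i:i + bits_per_stage], 2)
        for i in range(0, len(header), bits_per_stage)
    ]


if __name__ == "__main__":
    # "011011" -> 01, 10, 11 -> zero-based indices 1, 2, 3,
    # i.e. the second, third, and fourth devices.
    for stage, device in enumerate(split_route("011011"), start=1):
        print(f"stage {stage}: forward to device index {device}")
```

Because each switch only has to look at its own few bits, it can begin forwarding the message before the rest has arrived.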

Fig. 12-5: Multistage interconnection network
© O'Reilly Publishers (Used with permission)

The MIN (Multistage Interconnection Network) scales better than the bus or the crossbar: it avoids the expensive hardware of the crossbar and the contention of a shared bus.
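A rough hardware count shows why the MIN scales better than the crossbar. The Python sketch below assumes an N x N crossbar needs N * N crosspoints and that the MIN is built from 2 x 2 switches arranged in log2(N) stages of N / 2 switches each (as in an omega or butterfly network). That particular layout and the function names are assumptions made for illustration, not details given in the text.

```python
def crossbar_crosspoints(n: int) -> int:
    """Crosspoints in an n x n crossbar: one per input/output pair."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * n


def min_switches(n: int) -> int:
    """2 x 2 switches in a multistage network with n ports.

    Assumption: n is a power of two and the network has log2(n) stages
    of n / 2 switches each.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    stages = n.bit_length() - 1  # log2(n) for a power of two
    return stages * (n // 2)


if __name__ == "__main__":
    print(" ports  crossbar crosspoints  MIN 2x2 switches")
    for n in (8, 16, 64, 256, 1024):
        print(f"{n:6d}  {crossbar_crosspoints(n):20d}  {min_switches(n):16d}")
```

The crossbar count grows as N squared while the MIN count grows as N log N, at the price of a message passing through several stages instead of one.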

Michael Flynn developed a taxonomy to characterize the possible parallel architectures along two axes: whether a system has a single instruction (SI) stream or multiple instruction (MI) streams, and whether it operates on a single data (SD) stream or multiple data (MD) streams. Combining the two axes gives four classes: SISD, SIMD, MISD, and MIMD.

Fig. 12-10: Flynn's taxonomy

© O'Reilly Publishers (Used with permission)
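The two axes of Flynn's taxonomy can be written down as a small Python lookup. This is only a sketch: the function name flynn_class and the idea of classifying by stream counts are illustrative, not part of the original taxonomy's wording.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class (SISD, SIMD, MISD, or MIMD) for a machine.

    Each axis only distinguishes "single" (exactly 1) from "multiple" (> 1).
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    instruction_axis = "S" if instruction_streams == 1 else "M"
    data_axis = "S" if data_streams == 1 else "M"
    return f"{instruction_axis}I{data_axis}D"


if __name__ == "__main__":
    # Hypothetical stream counts chosen only to hit each of the four classes.
    for instr, data in ((1, 1), (1, 64), (4, 1), (16, 16)):
        print(f"{instr:2d} instruction, {data:2d} data streams -> "
              f"{flynn_class(instr, data)}")
```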

Table 12.1 Features of Shared Uniform Memory Systems
---------------------------------------------------------------------
System              | Processor    | Max CPUs | Memory Bandwidth
---------------------------------------------------------------------
SGI Power Challenge | MIPS-R10000  | 36       | 1.2 GB/sec (bus)
DEC 8400            | Alpha-21164  | 14       | 1.8 GB/sec (bus)
Sun E6000           | UltraSparc-2 | 30       | 2.5 GB/sec (bus)
Sun E10000          | UltraSparc-2 | 64       | 13 GB/sec (crossbar)
HP Exemplar         | PA-8000      | 16       | 15 GB/sec (crossbar)
Cray T90            | Cray Vector  | 32       | 800 GB/sec (crossbar)
---------------------------------------------------------------------
*** Note: SDSU has an E10000 - what do you think it is used for? ***

Fig. 12-12: Architecture versus time: top 500 report

© O'Reilly Publishers (Used with permission)
Top 500 online. Where is SDSC?
