1 High Performance Computing, 2nd Edition, 1998, O'Reilly Publishers, Kevin Dowd & Charles Severance
2 Introduction to High Performance Computing
3 Modern Computer Architectures
  3.1 Memory
    3.1.1 Introduction
    3.1.2 Memory Technology
    3.1.3 Registers
    3.1.4 Caches
    3.1.5 Cache Organization
    3.1.6 Virtual Memory
    3.1.7 Improving Memory Performance
    3.1.8 Closing Notes
    3.1.9 Exercises
  3.2 Floating-Point Numbers
    3.2.1 Introduction
    3.2.2 Reality
    3.2.3 Representation
    3.2.4 Effects of Floating-Point Representation
    3.2.5 More Algebra That Doesn't Work
    3.2.6 Improving Accuracy Using Guard Digits
    3.2.7 History of IEEE Floating-Point Format
    3.2.8 IEEE Operations
    3.2.9 Special Values
    3.2.10 Exceptions and Traps
    3.2.11 Compiler Issues
    3.2.12 Closing Notes
    3.2.13 Exercises
4 Programming and Tuning Software
  4.1 What a Compiler Does
    4.1.1 Introduction
    4.1.2 History of Compilers
    4.1.3 Which Language To Optimize
    4.1.4 Optimizing Compiler Tour
    4.1.5 Optimization Levels
    4.1.6 Classical Optimizations
    4.1.7 Closing Notes
    4.1.8 Exercises
  4.2 Timing and Profiling
    4.2.1 Introduction
    4.2.2 Timing
    4.2.3 Subroutine Profiling
    4.2.4 Basic Block Profilers
    4.2.5 Virtual Memory
    4.2.6 Closing Notes
    4.2.7 Exercises
  4.3 Eliminating Clutter
    4.3.1 Introduction
    4.3.2 Subroutine Calls
    4.3.3 Branches
    4.3.4 Branches With Loops
    4.3.5 Other Clutter
    4.3.6 Closing Notes
    4.3.7 Exercises
  4.4 Loop Optimizations
    4.4.1 Introduction
    4.4.2 Operation Counting
    4.4.3 Basic Loop Unrolling
    4.4.4 Qualifying Candidates for Loop Unrolling
    4.4.5 Nested Loops
    4.4.6 Loop Interchange
    4.4.7 Memory Access Patterns
    4.4.8 When Interchange Won't Work
    4.4.9 Blocking to Ease Memory Access Patterns
    4.4.10 Programs That Require More Memory Than You Have
    4.4.11 Closing Notes
    4.4.12 Exercises
5 Shared-Memory Parallel Processors
  5.1 Understanding Parallelism
    5.1.1 Introduction
    5.1.2 Dependencies
    5.1.3 Loops
    5.1.4 Loop-Carried Dependencies
    5.1.5 Ambiguous References
    5.1.6 Closing Notes
    5.1.7 Exercises
  5.2 Shared-Memory Multiprocessors
    5.2.1 Introduction
    5.2.2 Symmetric Multiprocessing Hardware
    5.2.3 Multiprocessor Software Concepts
    5.2.4 Techniques for Multithreaded Programs
    5.2.5 A Real Example
    5.2.6 Closing Notes
    5.2.7 Exercises
  5.3 Programming Shared-Memory Multiprocessors
    5.3.1 Introduction
    5.3.2 Automatic Parallelization
    5.3.3 Assisting the Compiler
    5.3.4 Closing Notes
    5.3.5 Exercises
6 Scalable Parallel Processing
  6.1 Language Support for Performance
    6.1.1 Introduction
    6.1.2 Data-Parallel Problem: Heat Flow
    6.1.3 Explicitly Parallel Languages
    6.1.4 FORTRAN 90
    6.1.5 Problem Decomposition
    6.1.6 High Performance FORTRAN (HPF)
    6.1.7 Closing Notes
  6.2 Message-Passing Environments
    6.2.1 Introduction
    6.2.2 Parallel Virtual Machine
    6.2.3 Message-Passing Interface
    6.2.4 Closing Notes
7 Appendixes
  7.1 Appendix C: High Performance Microprocessors
    7.1.1 Introduction
    7.1.2 Why CISC?
    7.1.3 Fundamentals of RISC
    7.1.4 Second-Generation RISC Processors
    7.1.5 RISC Means Fast
    7.1.6 Out-of-Order Execution: The Post-RISC Architecture
    7.1.7 Closing Notes
    7.1.8 Exercises
  7.2 Appendix B: Looking at Assembly Language
    7.2.1 Assembly Language