Parallel Pearls from Practical Parallel Programming by Barr E. Bauer, 1992

- Parallel-safe loops can be run serially in reverse index order and give correct results. (page 49)
- Do I/O outside parallel regions. (page 87)
- Avoid indirect indexing from within parallel regions. (page 95)
- Do precision-sensitive reductions serially. (page 98)
- Avoid convoluting code just to introduce parallelization. (page 99)
- Proper cache management significantly contributes to performance. (page 104)
- Avoid requiring specific numbers of threads in the code. (page 113)
- Run-time bugs in programs containing local common blocks should be checked carefully for coding bugs that modify block members. (page 136)
- Local code executes on ALL threads. (page 155)
- Check pointers carefully for dependences. (page 195)
- Consolidate adjacent loops, where appropriate. (page 199)
- Isolate single-variable dependences in independent blocks. (page 206)
- Keep critical blocks out of pfor blocks. (page 208)
- Liberally comment directives used globally. (page 258)
- Incorrect assertions result in incorrect programs. (page 259)
- Use roundoff=2 for all code if roundoff error is not important. (page 271)
- Let the clock determine the optimum mode of execution for a loop nest. (page 298)
- Convert pointer arithmetic to array form within for loops before PCA analysis. (page 332)
- Declare parallel-safe functions in a header file using #pragma no side effects. (page 348)
- Let the PCA do the unrolling. (page 370)
- Fuse adjacent loops, when appropriate. (page 373)
- Parallel program debugging is far more difficult than serial program debugging. (page 419)

Other "Parallel Pearls" that I like, from a different source:

Origin 2000 Quick Reference: Multiprocessor Tuning
- Step One: Tune Single-Processor Performance (see other side)
- Step Two: Parallelize Code
- Step Three: Identify Bottlenecks
- Step Four: Fix False Sharing
- Step Five: Tune for Data Placement

Origin 2000 Quick Reference: Single-Processor Tuning (other side)
- Step One: Get the Right Answer
- Step Two: Use Existing Tuned Code
- Step Three: Find Out Where to Tune
- Step Four: Get the Compiler to Do the Work
- Step Five: Tune Cache Performance