Summary
- Auto-parallelizing compilers can do a lot, but they are limited to loop-level parallelism and by how well they can analyze data dependencies
- Manual parallelization with directives is easy to implement; however, care must be taken with the treatment of data
- It is important to understand the exact semantics of the parallelization directives
- To obtain scalable performance it is often necessary to make the program parallel at a coarse-grained level, e.g. across subroutine invocations
- To obtain scalable performance it is often necessary to take care of proper data distribution on the machine
- It is necessary to profile parallel programs to diagnose load imbalance and parallelization overhead
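
The point about the treatment of data can be illustrated with a minimal sketch. It assumes OpenMP directives in C, which the summary itself does not name; the function `scaled_sum_squares` and its parameters are hypothetical and exist only for illustration. The scratch variable `t` must be private to each thread and the accumulator `sum` must be combined with a reduction; leaving either one shared would introduce a data race.

```c
#include <stddef.h>

/* Hypothetical helper: returns the sum over i of (a * x[i])^2.
 * Assumes an OpenMP-capable compiler; without OpenMP the pragma is
 * ignored and the loop runs serially with the same result. */
double scaled_sum_squares(const double *x, size_t n, double a)
{
    double sum = 0.0;  /* accumulator: each thread needs its own partial sum */
    double t;          /* scratch value: must not be shared between threads */
    long i;            /* loop index: made private by the directive */

    /* private(t) gives every thread its own copy of t;
     * reduction(+:sum) adds the per-thread partial sums at the end. */
    #pragma omp parallel for private(t) reduction(+:sum)
    for (i = 0; i < (long)n; i++) {
        t = a * x[i];
        sum += t * t;
    }
    return sum;
}
```

Calling `scaled_sum_squares(x, n, a)` from several places in a program is still loop-level parallelism; the coarse-grained approach mentioned above would instead place the parallel region around the callers.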