A program runs slowly when it does not use all available hardware resources:
- opportunities for superscalar execution (instruction-level parallelism, ILP) are missed
- instructions are scheduled suboptimally (too many wait states, i.e. pipeline stalls)
- memory access is inefficient:
  - not all data in a cache line is used (poor spatial locality)
  - data in the cache is not reused (poor temporal locality)
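The spatial-locality point can be illustrated with a small sketch. It assumes a matrix stored in row-major order in a flat array; the function names `sum_row_major` and `sum_col_major` are hypothetical and chosen only for this example. Both functions compute the same sum, but they touch memory in a different order.

```c
#include <assert.h>
#include <stddef.h>

/* Sums a rows x cols matrix stored row-major in a flat array.
   The inner loop walks consecutive addresses (stride 1), so every
   element of each fetched cache line is used before moving on. */
double sum_row_major(const double *a, size_t rows, size_t cols)
{
    assert(a != NULL || rows * cols == 0);
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += a[i * cols + j];
    return sum;
}

/* Same matrix, but the loops are interchanged: the inner loop now
   jumps by cols elements per step. For large cols, consecutive
   accesses land in different cache lines, and only one element of
   each fetched line is used before that line may be evicted. */
double sum_col_major(const double *a, size_t rows, size_t cols)
{
    assert(a != NULL || rows * cols == 0);
    double sum = 0.0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += a[i * cols + j];
    return sum;
}
```

For matrices much larger than the cache, the first version is usually noticeably faster. The actual gap depends on the machine and the compiler, so it should be measured rather than assumed.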
Performance analysis is used to diagnose the problem.
The compiler will attempt to optimize the program for the given architecture, but:
- data structures can inhibit compiler optimizations
- the way an algorithm is expressed can inhibit compiler optimizations
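As one possible illustration of a data structure getting in the way, the sketch below contrasts an array of structures with a structure of arrays. The types `particle_aos` and `particles_soa` and both functions are hypothetical names invented for this example, not taken from any particular code base.

```c
#include <assert.h>
#include <stddef.h>

/* Array of structures (AoS): x, y and z of one particle are adjacent. */
struct particle_aos { double x, y, z; };

/* Structure of arrays (SoA): all x values are adjacent, and so on. */
struct particles_soa { double *x; double *y; double *z; };

/* Reads only x, but with stride 3: the y and z values are fetched
   into the cache alongside x and never used by this loop. */
double sum_x_aos(const struct particle_aos *p, size_t n)
{
    assert(p != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += p[i].x;
    return sum;
}

/* Reads x with unit stride: every byte of each fetched cache line
   is used, and the access pattern is simple for the compiler. */
double sum_x_soa(const struct particles_soa *p, size_t n)
{
    assert(p != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += p->x[i];
    return sum;
}
```

The SoA layout tends to help when a loop touches only one or two fields. When every field of each element is used together, AoS can be just as good, so the choice depends on the access pattern.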
It is often necessary to rewrite the critical parts of the program (typically loops) so that the compiler can optimize them more effectively.
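A minimal sketch of such a loop rewrite follows, assuming a floating-point reduction. The function names `sum_simple` and `sum_unrolled4` are hypothetical. In the simple version every addition depends on the previous one, which limits instruction-level parallelism. The rewritten version keeps four independent partial sums so that several additions can be in flight at once.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward reduction: each addition must wait for the
   previous one, forming a single long dependency chain. */
double sum_simple(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Reduction with four independent accumulators. The four chains
   do not depend on each other, so a superscalar core can overlap
   them. */
double sum_unrolled4(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

A compiler generally will not make this change for floating-point data by itself, because reordering the additions can change the rounded result. Options that relax floating-point semantics, such as `-ffast-math` in GCC and Clang, allow it. For the same reason the two functions may differ in the last bits for general inputs.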
This requires understanding compiler optimization techniques.