Software Pipelining (SWP)
Software pipelining is a way to overlap iterations of a loop so that all processor execution slots are filled:
- SWP is performed by the Code Generator (CG), which also unrolls inner loops to achieve the best SWP schedule (at the -O3 optimization level). This can be computationally intensive at compile time.
- Vector loops are well suited to SWP; short loops may run slower with SWP
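To make the idea of overlapping iterations concrete, here is a minimal hand-written sketch in C. It is an illustration only, not the schedule the CG would actually produce: the function names scale_plain and scale_pipelined are hypothetical, and the three-stage split (load, compute, store) is an assumption chosen for clarity. The pipelined version has a prologue that fills the pipeline, a kernel in which each pass works on three different iterations at once, and an epilogue that drains it.

```c
#include <stddef.h>

/* Plain loop: each iteration loads, computes, then stores in sequence. */
void scale_plain(const double *restrict x, double *restrict y,
                 size_t n, double a, double b)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + b;
}

/* Hand-pipelined sketch of the same loop (hypothetical, for illustration).
 * In the kernel, one pass stores the result of iteration i-2, computes
 * iteration i-1, and loads the input of iteration i. */
void scale_pipelined(const double *restrict x, double *restrict y,
                     size_t n, double a, double b)
{
    if (n == 0)
        return;
    if (n == 1) {                     /* too short to overlap anything */
        y[0] = a * x[0] + b;
        return;
    }

    /* Prologue: fill the pipeline with iterations 0 and 1. */
    double computed = a * x[0] + b;   /* iteration 0: loaded and computed */
    double loaded   = x[1];           /* iteration 1: loaded only */

    /* Kernel: three iterations are in flight in every pass. */
    for (size_t i = 2; i < n; i++) {
        y[i - 2] = computed;          /* store   iteration i-2 */
        computed = a * loaded + b;    /* compute iteration i-1 */
        loaded   = x[i];              /* load    iteration i   */
    }

    /* Epilogue: drain the two iterations still in flight. */
    y[n - 2] = computed;
    y[n - 1] = a * loaded + b;
}
```

The prologue and epilogue are fixed overhead, which is one plausible reason a loop with very few iterations may run slower once it is software pipelined.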
Inhibitors to SWP:
- loops with subroutine (or intrinsic) calls cannot be software pipelined
- loops with complicated conditionals or branching
- loops that are too long cannot be software pipelined because the compiler runs out of available registers (loop fission, i.e. splitting the loop into smaller loops, can help)
- loops with data dependences between iterations are harder to software pipeline
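Two of the inhibitors above can be sketched in C. The function names fused, fissioned, and prefix_sum are hypothetical, and the loop bodies are deliberately tiny: a body this small would not exhaust registers in practice, so the first pair only shows the shape of the loop fission transformation. Fission as written assumes the arrays do not overlap and that the separated statements do not depend on each other.

```c
#include <stddef.h>

/* One loop producing two results: every value needed by both
 * statements is live within a single iteration. */
void fused(const double *a, const double *b,
           double *s, double *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        s[i] = a[i] + b[i];
        p[i] = a[i] * b[i];
    }
}

/* After loop fission: two shorter loops, each with fewer live values,
 * so each is an easier candidate for software pipelining. */
void fissioned(const double *a, const double *b,
               double *s, double *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s[i] = a[i] + b[i];
    for (size_t i = 0; i < n; i++)
        p[i] = a[i] * b[i];
}

/* Loop-carried dependence: iteration i needs the result of
 * iteration i-1, which limits how far iterations can overlap. */
void prefix_sum(double *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

In fused and fissioned the iterations are independent of one another, while in prefix_sum each iteration must wait for the previous one to finish its addition.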