All,

Design and Analysis of Profile-Based Optimization in Compaq's Compilation Tools for Alpha; Journal of Instruction-Level Parallelism 3 (2000) 1-25
The existing tracer pass in GCC (the pass that performs the tail duplication needed for superblock formation) is based on the above paper. Another optimization of interest in the same paper is the following.

Live on Exit Renamer: This optimization tries to remove a constraint that forces the compiler to create long dependent chains of operations in unrolled loops. Consider the following example.

    while (a[i] != key)
        i = i + 1;
    return i;

Fig(1)

Unrolled loop:

    1. while (a[i] != key) {
    2.     i = i + 1;
    3.     if (a[i] == key) goto E;
    4.     i = i + 1;
    5.     if (a[i] == key) goto E;
    6.     i = i + 1;
    7. }
    8. E: return i;

Fig(2)

Live on Exit Renamer transformation:

    while (a[i] != key) {
        i1 = i + 1;
        if (a[i1] == key) goto E1;
        i2 = i + 2;
        if (a[i2] == key) goto E2;
        i3 = i + 3;
        i = i3;
    }
    E:  return i;
    E1: i = i1; goto E;
    E2: i = i2; goto E;

Fig(3)

In Fig(2), line 4 cannot be moved above line 3 because i is live on exit: if the branch on line 3 is taken, the value returned at E must be the one assigned on line 2. The transformation in Fig(3) removes this liveness on the exits by giving each copy its own name (i1, i2, i3) and restoring i in the exit blocks E1 and E2. The copies in the unrolled loop then form non-overlapping regions, and the register allocator can work with better-optimized register sets.

I am not sure why the above optimization is not implemented in GCC. If there are no specific reasons, I would like to implement it.

Thanks & Regards
Ajit
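
P.S. For reference, here is a compilable C sketch of the three versions from Fig(1), Fig(2) and Fig(3), so the equivalence can be checked by hand or in a small test. The function names (find_simple, find_unrolled, find_renamed) are made up for illustration and do not come from the paper or from GCC. The sketch assumes that key is always present in the array at or after the starting index, since the original loop has no bounds check. It only illustrates the source-level effect; an actual pass would work on the compiler's intermediate representation.

```c
/* Fig(1): original loop. Assumes key occurs at or after index i. */
int find_simple(const int *a, int i, int key)
{
    while (a[i] != key)
        i = i + 1;
    return i;
}

/* Fig(2): loop unrolled three times. Every exit to E reads i, so each
 * increment must complete before the compare-and-branch that follows it. */
int find_unrolled(const int *a, int i, int key)
{
    while (a[i] != key) {
        i = i + 1;
        if (a[i] == key) goto E;
        i = i + 1;
        if (a[i] == key) goto E;
        i = i + 1;
    }
E:
    return i;
}

/* Fig(3): live-on-exit renaming. i1, i2 and i3 are all computed from the
 * value of i at the top of the iteration, so they do not depend on each
 * other. i is only written on the back edge and in the exit blocks. */
int find_renamed(const int *a, int i, int key)
{
    int i1, i2, i3;

    while (a[i] != key) {
        i1 = i + 1;
        if (a[i1] == key) goto E1;
        i2 = i + 2;
        if (a[i2] == key) goto E2;
        i3 = i + 3;
        i = i3;            /* value carried into the next iteration */
    }
E:
    return i;
E1:
    i = i1;                /* compensation code on the first side exit */
    goto E;
E2:
    i = i2;                /* compensation code on the second side exit */
    goto E;
}
```

All three functions should return the same index for the same inputs. The difference is only in the dependence chain: in find_renamed the additions i + 1, i + 2 and i + 3 can be scheduled in any order relative to each other.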