http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51638
--- Comment #7 from Tobias Burnus <burnus at gcc dot gnu.org> 2012-01-05 14:26:25 UTC --- (In reply to comment #6) > > Fortran code is invalid. > This code is like it is since its first commit on 11-Jun-93 This does not mean that was and/or is valid. Newer compilers tend to optimize more aggressively than older. > I still do think there is a problem with gfortran, because 1) it is the only > one which breaks this simple code regarding our 25 years experience, and 2) > because it randomly breaks it depending on the architecture and options. Well, as long as there is no valid program which breaks, I believe that gfortran does not have a bug (with regards to the this problem report). In particular, as Steve quoted in comment 2 the 12.4.1.7 guarantees that nontarget/nonpointer arguments either do not alias or are not modified. With this in mind, let's look at the generated assembler (compile with "-S" to generate the assembler file). If one looks at the assembler for subroutine iei4ei(inpu,oupu) integer(kind=1) :: inpu(4) integer(kind=1) :: oupu(4) integer(kind=1) :: inte(4) inte(:) = inpu(:) oupu(4) = inte(1) oupu(3) = inte(2) oupu(2) = inte(3) oupu(1) = inte(4) end subroutine iei4ei one gets with -O1 the following (i386-Linux): .LFB0: subl $8, %esp ! Make space on the stack for 8 byte .LCFI0: movl %ebx, (%esp) ! Save %ebx register on the stack movl %esi, 4(%esp) ! Save %esi register on the stack .LCFI1: movl 12(%esp), %esi ! %esi = addr of inpu movl 16(%esp), %ebx ! %ebx = addr of outpu movzbl 1(%esi), %ecx ! %ecx = inpu(2) ! (zero-extended load) movzbl 2(%esi), %edx ! %edx = inpu(3) ! (zero-extended load) movzbl 3(%esi), %eax ! %ecx = inpu(4) ! (zero-extended load) movb %al, (%ebx) ! outpu(1) = inpu(4) ! al == lowest byte of eax movzbl (%esi), %eax ! %eax = inpu(1) movb %al, 3(%ebx) ! outpu(4) = inpu(1) movb %cl, 2(%ebx) ! outpu(3) = inpu(2) movb %dl, 1(%ebx) ! outpu(2) = inpu(3) movl (%esp), %ebx ! Restore ebx register movl 4(%esp), %esi ! Restore esi register addl $8, %esp ! Free the allocated stack .LCFI2: ret ! Return With x86-64-Linux, the assembler is as follows (again with -O1): .LFB0: movzbl 1(%rdi), %ecx ! ecx = inpu(2) movzbl 2(%rdi), %edx ! edx = inpu(3) movzbl 3(%rdi), %eax ! eax = inpu(4) movzbl (%rdi), %edi ! edi = inpu(1) movb %dil, 3(%rsi) ! outpu(4) = inpu(3) movb %cl, 2(%rsi) ! outpu(3) = inpu(2) movb %dl, 1(%rsi) ! outpu(2) = inpu(3) movb %al, (%rsi) ! outpu(1) = inpu(4) ret Thus, it is not surprising that it works with x86-64-Linux and that it fails with i386-Linux. However, both assembler outputs look perfectly valid - only that the second version also works if the arguments alias while the first one does not. For completeness, ifort 11.1 (ia32) generates with -O2 the code: pushl %edi ! Save %edi on the stack pushl %esi ! Save %eds on the stack pushl %ebx ! Save %ebx on the stack movl 16(%esp), %eax ! eax = inpu movl 20(%esp), %esi ! esi = outpu movzbl (%eax), %edx ! edx = inpu(1) movzbl 1(%eax), %ecx ! ecx = inpu(2) movzbl 2(%eax), %ebx ! ebx = inpu(3) movzbl 3(%eax), %eax ! eax = inpu(4) movb %dl, iei4ei_$INTE.0.1 ! inte(1) = dl(1) movb %al, (%esi) ! outpu(1) = inpu(4) movb %cl, 1+iei4ei_$INTE.0.1 ! inte(2) = dl(2) movb %dl, 3(%esi) ! outpu(4) = inpu(1) movb %cl, 2(%esi) ! outpu(3) = inpu(2) movb %bl, 2+iei4ei_$INTE.0.1 ! inte(3) = dl(3) movb %al, 3+iei4ei_$INTE.0.1 ! inte(4) = dl(4) movb %bl, 1(%esi) ! outpu(2) = inpu(3) popl %ebx ! Restore %ebx popl %esi ! Restore %esi popl %edi ! Restore %edi ret ! Return Thus, in principle ICC does the same, except that it the generated assembler is slightly differed such that it works without running into the alias issues. (I wonder why "ifort" does not optimize the "inte = inpu" away as it is never used in the assembler.) > We have never observed this behavior anywhere else for the past 25 years. I > won't be so confident that the other compilers perform the same optimization > (at least at level O1), exactly because of all the Fortran 66/77 legacy code > the Fortran community uses... I think any well-optimizing compiler takes into account that any pair of nontarget, nonpointer arguments do not alias (if one of them gets modified). The C equivalent is that the pointer arguments have the "restrict" attribute. Thus, unless the compiler is old or not well optimizing, a similar optimization should be done by other compilers. Whether that causes alias issues or not (for a given program) depends on the generated assembler instructions. As the three versions above show, it happens that only GCC's i386 assembler exposes the alias issues while Intel's version and GCC's x86-64 version do not. With a slightly different register allocation, the situation could be different. > > or make "inte" VOLATILE. > This seems an interesting path. I am just wondering about computing efficiency > then, but since this avoids making copies out or in the subroutine, this might > be the solution. Well, I think it should not be too bad, though more memory reads are required. I think the better solution is to mark the two arguments as possibly aliasing using the TARGET attribute. However, that only works if you do not have the caller and the callee in the same file - otherwise some compilers will warn or print an error that an explicit interface is required. For instance, gfortran or NAG will print an error - while "ifort -warn all" will warn. (One could argue that gfortran and NAG should only warn, however, it is always difficult to decide what to allow as vendor extension. The standard simply states that an explicit interface is required with TARGET. Additionally, if the caller does not know certain attributes, it might generate wrong code. Especially in obvious cases like VALUE, if a compiler only warns, users might miss/ignore the warning and get wrong results [e.g. ifort only warns and not even with default options].) > > (You still have the issue of passing an > > INTEGER(4) to an INTEGER(1) > > This subroutine is part of our low-level set of subroutines (here: IEEE <-> > EEEI byte swapping), which have to be as simple and efficient as possible > (they can be used on large amount of data). I do not think that the different integer kinds cause a problem in practice. Still it's invalid and might cause some trouble in particular cases. For EQUIVALENCE, the situation is similar - albeit the chance of getting into trouble is higher. As the elements of equivalence are of different type, the compiler might do some optimizations which break assumptions that setting one equivalenced variable will modify the bitpattern of the other. That's the same with C's union. With -fstrict-aliasing (enabled with -O2), the compiler does type-based analysis [see -fstrict-aliasing in the GCC manual].