http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55916
Jouko Orava <jouko.orava at iki dot fi> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jouko.orava at iki dot fi

--- Comment #8 from Jouko Orava <jouko.orava at iki dot fi> ---
These issues probably affect all versions of gfortran, but I have verified
that this occurs with 4.6.4, 4.7.3, 4.8.1, and trunk revision 207606.

It does seem unlikely that GNU libc malloc() will ever be modified to return
sufficiently aligned pointers for vector types (or for long double on 32-bit
ppc).

I have a patch under testing against trunk that modifies the libgfortran
internal xmalloc() and xcalloc() calls, as well as the intrinsic malloc()
calls, to use the GNU libc specific memalign() call. I will attach it as soon
as I verify it works correctly. If using memalign() is not acceptable, say so.

The reason for using memalign() instead of posix_memalign() is that my
(limited!) testing indicates that memalign() has the least overhead.
posix_memalign() has comparatively larger overhead, possibly due to its call
mechanism.

aligned_alloc() is not a real possibility for now, because it was only added
in glibc-2.16 (summer 2012). Computing clusters often have older libc
versions, and requiring glibc-2.16 or newer would stop binaries compiled with
newer gfortran versions from working. (And binaries are often compiled with
newer compilers.)

GCC-generated code seems to assume malloc() returns a pointer aligned to at
least __BIGGEST_ALIGNMENT__. Therefore marking the allocation functions with
the malloc attribute (or perhaps with malloc and alloc_size) should be
sufficient for GNU Fortran.

(I shall also do some experiments with __builtin_assume_aligned() to see if
there is any impact on the generated code, but I don't see any reason there
should be. The compilation units are separate; AFAICT GCC must determine the
pointer alignment from the pointer type only. That also matches current
behaviour, and is the reason this bug exists.)

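For illustration only, and not the patch under testing, the kind of wrapper
described above might look like the sketch below. The names xmalloc_aligned(),
xmalloc_posix(), is_aligned() and ALLOC_ALIGNMENT are hypothetical, the
out-of-memory handling is a guess, and the code assumes GCC on a glibc system.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* memalign(); glibc-specific and obsolete */

/* Assumption: align everything to the largest alignment GCC may rely on. */
#define ALLOC_ALIGNMENT ((size_t)__BIGGEST_ALIGNMENT__)

/* Returns nonzero if p is a multiple of alignment. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical replacement for an xmalloc()-style helper, using memalign().
   The attributes tell GCC that the result does not alias other pointers and
   that its size is given by the first argument. */
__attribute__((malloc, alloc_size(1)))
static void *xmalloc_aligned(size_t size)
{
    void *p;

    if (size == 0)
        size = 1;   /* always hand back a usable, freeable pointer */

    p = memalign(ALLOC_ALIGNMENT, size);
    if (p == NULL) {
        fprintf(stderr, "xmalloc_aligned: out of memory\n");
        exit(EXIT_FAILURE);
    }
    assert(is_aligned(p, ALLOC_ALIGNMENT));
    return p;
}

/* The same helper using posix_memalign(), for comparison. The alignment
   must be a power of two and a multiple of sizeof(void *). */
__attribute__((malloc, alloc_size(1)))
static void *xmalloc_posix(size_t size)
{
    void *p = NULL;

    if (size == 0)
        size = 1;

    if (posix_memalign(&p, ALLOC_ALIGNMENT, size) != 0) {
        fprintf(stderr, "xmalloc_posix: out of memory\n");
        exit(EXIT_FAILURE);
    }
    assert(is_aligned(p, ALLOC_ALIGNMENT));
    return p;
}
```

With glibc, pointers from either helper can be released with plain free().
Whether the attributes alone change the generated code is exactly the open
question raised above; the sketch only shows where they would go.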
Obviously, memalign() is obsolete, but I see the performance outweighing that
here. After all, removing memalign() in a future glibc would be seen as petty,
considering how tenderly Emacs users have been treated: bugs were tolerated
for years in order to not break existing Emacs state dumps.

My hardware selection is very limited, but if someone wishes to test the
possibilities on other hardware, I'd be happy to clean up my benchmarking
code.