http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54894
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-10-11 12:22:47 UTC ---
You should be using the __builtin_assume_aligned builtin, i.e.

    double *Ap = __builtin_assume_aligned (&A[ih+il][kh], 16);

instead of the hacks with the overaligned scalar pointer. Those hacks don't improve vectorization; they actually prohibit it.

That said, of course gcc shouldn't ICE on this.
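For illustration, here is a minimal sketch of how the builtin is typically used in a loop that the vectorizer can then treat as aligned. The function name scale_add_aligned and its parameters are hypothetical and not taken from the testcase in this report. The builtin is only a promise to the compiler, so the sketch also checks the precondition with assert in debug builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: y[i] += a * x[i].
 * The caller must guarantee that both x and y are 16-byte aligned. */
static void scale_add_aligned(double *restrict y, const double *restrict x,
                              double a, size_t n)
{
    /* Debug-only check: __builtin_assume_aligned verifies nothing itself,
     * and passing a misaligned pointer is undefined behavior. */
    assert((uintptr_t)y % 16 == 0);
    assert((uintptr_t)x % 16 == 0);

    /* The alignment information is attached to the returned pointer,
     * so the loop must use yp/xp rather than the original y/x. */
    double *yp = __builtin_assume_aligned(y, 16);
    const double *xp = __builtin_assume_aligned(x, 16);

    for (size_t i = 0; i < n; i++)
        yp[i] += a * xp[i];
}
```

Note that the alignment is stated for the pointed-to data through the builtin's return value; the pointer's declared type stays a plain double *.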