http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58280
--- Comment #5 from Freddie Witherden <freddie at witherden dot org> ---
Thank you for this information. As an alternative, would it be worth considering a pragma along the lines of:

    #pragma gcc aligned(32)

which would assert that "in the first iteration of the loop which follows, all relevant variables can be taken as having 32-byte alignment"? This would provide a convenient way of allowing loops like the above to be fully vectorized, and would avoid the need for explicit calls to __builtin_assume_aligned.

ICC has a similar directive, but it only applies to the base pointers, so it would assume that "a" is aligned but not "a + i*ldim".
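
For reference, here is a rough sketch of the explicit form the proposed pragma is meant to replace. The kernel (scale_rows) and its parameters are hypothetical and not taken from the test case in this report; it assumes that "a" is 32-byte aligned and that ldim * sizeof(double) is a multiple of 32, so that each row start "a + i*ldim" is also 32-byte aligned. The proposed pragma does not exist in GCC, so only __builtin_assume_aligned is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical kernel: scale the first ncols entries of each row of a
 * row-major matrix whose rows are ldim doubles apart.
 *
 * Assumptions (checked only when NDEBUG is not defined):
 *   - a is 32-byte aligned
 *   - ldim * sizeof(double) is a multiple of 32
 * Together these make every row start a + i*ldim 32-byte aligned. */
void scale_rows(double *a, size_t nrows, size_t ncols, size_t ldim, double s)
{
    assert((uintptr_t)a % 32 == 0);
    assert((ldim * sizeof(double)) % 32 == 0);

    for (size_t i = 0; i < nrows; i++) {
        /* A promise to the compiler, not a runtime check: behaviour is
         * undefined if the row pointer is not actually 32-byte aligned. */
        double *row = __builtin_assume_aligned(a + i * ldim, 32);

        for (size_t j = 0; j < ncols; j++)
            row[j] *= s;
    }
}
```

A buffer obtained from aligned_alloc(32, nrows * ldim * sizeof(double)) with ldim = 8 would satisfy both assumptions. Whether the inner loop is then vectorized without a peeling or versioning prologue depends on the GCC version and flags, so it is worth checking the vectorizer dump rather than assuming it.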