https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91573
Bug ID: 91573
Summary: Vectorization failure for a loop to do multiply-add
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
Target Milestone: ---

The following code cannot be vectorized (compiled with gcc -O3):

=== begin code ===
char src[512];
char dst[512];
#define WIDTH 8
void foo(int height, int a, int b, int c, int d, int dst_stride)
{
  char * ptr_src = src;
  char * ptr_dst = dst;
  for( int y = 0; y < height; y++ )
  {
    for( int x = 0; x < WIDTH; x++ )
      ptr_dst[x] = ( a*ptr_src[x] + b*ptr_src[x+1] +
                     c*ptr_src[x] + d*ptr_src[x+1]) >> 6;
    ptr_dst += dst_stride;
    ptr_src += 32;
  }
}
=== end code ===

However, the loop can be vectorized with either of the following
modifications:

1) If the calculation is simpler, e.g.
     ptr_dst[x] = ( a*ptr_src[x] + c*ptr_src[x] ) >> 6;

2) If WIDTH is larger, e.g.
     #define WIDTH 16

This case is a hot loop from a real application. The failure can be
reproduced on both AArch64 and x86-64 platforms.
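
For anyone trying to reproduce this, the following is a minimal sketch of a self-checking harness around the reported loop. It is not part of the original report. The names ref_pixel and count_mismatches, the fill pattern for src, and the bounds checks are assumptions made for illustration. The folded expression in ref_pixel is used only as a scalar correctness reference; no claim is made about whether GCC vectorizes it. Building with the existing GCC option -fopt-info-vec-missed (alongside -O3) should make the compiler print the loops it declined to vectorize.

```c
/* Possible build line (file name is hypothetical):
 *   gcc -O3 -fopt-info-vec-missed repro.c -c
 */
#include <assert.h>
#include <string.h>

#define WIDTH 8

char src[512];
char dst[512];

/* The loop from the report, unchanged apart from whitespace. */
void foo(int height, int a, int b, int c, int d, int dst_stride)
{
    char *ptr_src = src;
    char *ptr_dst = dst;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < WIDTH; x++)
            ptr_dst[x] = (a * ptr_src[x] + b * ptr_src[x + 1] +
                          c * ptr_src[x] + d * ptr_src[x + 1]) >> 6;
        ptr_dst += dst_stride;
        ptr_src += 32;
    }
}

/* Hypothetical scalar reference: the same arithmetic with the
 * coefficients folded. Equal to the original as long as the int
 * intermediate results do not overflow. */
static char ref_pixel(const char *s, int x, int a, int b, int c, int d)
{
    return (char)(((a + c) * s[x] + (b + d) * s[x + 1]) >> 6);
}

/* Hypothetical helper: fills src with small non-negative values
 * (so the result does not depend on whether plain char is signed),
 * runs foo, and returns the number of outputs that differ from
 * the scalar reference. */
int count_mismatches(int height, int a, int b, int c, int d, int dst_stride)
{
    /* Source rows are 32 bytes apart, so 16 rows fit in src[512].
     * Keeping dst_stride in [WIDTH, 32] keeps dst rows disjoint
     * and inside dst[512]. */
    assert(height >= 0 && height <= 16);
    assert(dst_stride >= WIDTH && dst_stride <= 32);

    for (int i = 0; i < 512; i++)
        src[i] = (char)(i % 100);
    memset(dst, 0, sizeof dst);

    foo(height, a, b, c, d, dst_stride);

    int bad = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < WIDTH; x++)
            if (dst[y * dst_stride + x] != ref_pixel(src + y * 32, x, a, b, c, d))
                bad++;
    return bad;
}
```

Calling count_mismatches(8, 10, 20, 30, 4, 16) from a small main and checking that it returns 0 gives a quick sanity check that any rewritten form of the loop still computes the same values.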