https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121225
--- Comment #2 from Dusan Stojkovic <dusan.stojko...@rt-rk.com> ---
Created attachment 62005
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62005&action=edit
Defer POINTER_PLUS_EXPR bswap implementations

> The bswap pass in this case should not insert another load but should
> perform an appropriate conversion (and shift). Or refrain from doing
> anything at all.

Something like this might be a start. It follows the refraining principle.
This patch looks for an input GIMPLE statement which is a MEM_REF. If the
SSA name underlying its rhs is defined by a POINTER_PLUS_EXPR, the bswap
transform doesn't get finalized. This effectively preserves the opportunity
to vectorize the original vbswap8, which was submitted as an example to the
bug report.

> The vectorizer doesn't handle the case when the same memory area is
> referenced with different type (sizes). That can be seen with the
> following simplified testcase:
>
> void vbswap8(unsigned int* in, int len)
> {
>   for (int i = 0; i < len; i++)
>     in[i] = (in[i] & 0xffff0000) | *(char *)&in[i];
> }

The input to bswap for this example:

void vbswap8 (unsigned int * in, int len)
{
  int i;
  long unsigned int _1;
  long unsigned int _2;
  unsigned int * _3;
  unsigned int _4;
  unsigned int _5;
  char _6;
  unsigned int _7;
  unsigned int _8;

  <bb 2> [local count: 118111600]:
  # DEBUG BEGIN_STMT
  # DEBUG i => 0
  # DEBUG BEGIN_STMT
  if (len_12(D) > 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 3> [local count: 955630224]:
  # i_18 = PHI <i_15(3), 0(2)>
  # DEBUG i => i_18
  # DEBUG BEGIN_STMT
  _1 = (long unsigned int) i_18;
  _2 = _1 * 4;
  _3 = in_13(D) + _2;
  _4 = *_3;
  _5 = _4 & 4294901760;
  _6 = MEM[(char *)_3];
  _7 = (unsigned int) _6;
  _8 = _5 | _7;
  *_3 = _8;
  # DEBUG BEGIN_STMT
  i_15 = i_18 + 1;
  # DEBUG i => i_15
  # DEBUG BEGIN_STMT
  if (len_12(D) > i_15)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 4> [local count: 118111600]:
  return;

}

This example already has a MEM_REF as the input to bswap (the other example
does not), so although this patch detects the POINTER_PLUS_EXPR, it changes
nothing, because no bswap implementation is found here anyway.

Regression tested on x86.
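For anyone who wants to check locally that a change to the bswap pass does not alter what the simplified testcase computes, a small harness along the following lines might help. It is only a sketch: vbswap8 is copied from the quoted testcase, while ref_one and vbswap8_count_mismatches are hypothetical helpers invented here and are not part of the attached patch. It assumes a 32-bit unsigned int (as the `_1 * 4` in the dump suggests) and reads the first byte through memcpy, so the reference does not depend on endianness or on the signedness of char.

```c
#include <assert.h>
#include <string.h>

/* Simplified testcase from the bug report, unchanged. */
void vbswap8(unsigned int *in, int len)
{
    for (int i = 0; i < len; i++)
        in[i] = (in[i] & 0xffff0000) | *(char *)&in[i];
}

/* Hypothetical reference (not from the patch): the same computation for a
   single element, reading the first byte in memory via memcpy. */
static unsigned int ref_one(unsigned int x)
{
    char first;
    memcpy(&first, &x, 1);
    return (x & 0xffff0000u) | (unsigned int)first;
}

/* Hypothetical helper: runs vbswap8 over a few values and returns how many
   elements differ from the per-element reference. */
int vbswap8_count_mismatches(void)
{
    unsigned int data[8] = {
        0x12345678u, 0x00000000u, 0xffffffffu, 0x80000080u,
        0x7f00007fu, 0xdeadbeefu, 0x00010203u, 0xa5a5a5a5u
    };
    unsigned int expect[8];
    int bad = 0;

    /* Assumption of this sketch: unsigned int is 4 bytes wide. */
    assert(sizeof(unsigned int) == 4);

    /* Compute the expected values before the in-place update. */
    for (int i = 0; i < 8; i++)
        expect[i] = ref_one(data[i]);

    vbswap8(data, 8);

    for (int i = 0; i < 8; i++)
        if (data[i] != expect[i])
            bad++;
    return bad;
}
```

Building this with and without the patch at -O3 and checking that vbswap8_count_mismatches() returns 0 gives a quick sanity check on the generated code. Something like -fopt-info-vec should additionally report whether the loop was vectorized, though that is not something the harness itself verifies.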