https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121225

--- Comment #2 from Dusan Stojkovic <dusan.stojko...@rt-rk.com> ---
Created attachment 62005
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62005&action=edit
Defer POINTER_PLUS_EXPR bswap implementations


> The bswap pass in this case should not insert another load but should
> perform an appropriate conversion (and shift).  Or refrain from doing
> anything at all.

Something like this might be a start. It follows the refraining principle.

This patch checks whether the input GIMPLE statement is a MEM_REF. If the
SSA name underlying its rhs is defined by a POINTER_PLUS_EXPR, the bswap
transform is not finalized. This preserves the opportunity to vectorize the
original vbswap8, which was submitted as an example in the bug report.
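
To make the shape of that check concrete, here is a minimal sketch on a toy
model. It is not the patch and it does not use GCC's API: the toy_code enum,
the toy_stmt struct and toy_defer_bswap() are hypothetical names invented for
illustration, with each statement pointing only at the statement that defines
its first operand.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GIMPLE statements; NOT GCC's real data structures. */
enum toy_code { TOY_PARM, TOY_POINTER_PLUS, TOY_MEM_REF, TOY_BIT_AND };

struct toy_stmt {
    enum toy_code code;          /* operation that defines the result */
    const struct toy_stmt *op0;  /* statement defining the first operand, or NULL */
};

/* Return true when STMT is a memory reference whose address operand is
   itself produced by pointer arithmetic.  In this sketch that is the case
   in which the bswap replacement would be skipped (assumption: the real
   patch may test more conditions than this).  */
static bool toy_defer_bswap(const struct toy_stmt *stmt)
{
    if (stmt == NULL || stmt->code != TOY_MEM_REF)
        return false;

    const struct toy_stmt *base = stmt->op0;
    return base != NULL && base->code == TOY_POINTER_PLUS;
}
```

In terms of the vbswap8 dump, _3 = in_13(D) + _2 would be modelled as a
TOY_POINTER_PLUS statement and _6 = MEM[(char *)_3] as a TOY_MEM_REF whose
op0 points at it, so toy_defer_bswap() returns true for the load.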

> The vectorizer doesn't handle the case when the same memory area is
> referenced with different type (sizes).  That can be seen with the
> following simplified testcase:
>
> void vbswap8(unsigned int* in, int len)                
> {
>     for (int i = 0; i < len; i++)
>         in[i] = (in[i] & 0xffff0000) | *(char *)&in[i];
> }
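
For reference, a small self-check of what the quoted loop computes. This is
only a sketch under assumptions: unsigned int is taken to be 32 bits wide, and
expected_word() and check_vbswap8() are hypothetical helpers that are not part
of the bug report. The reference value is built with memcpy, so the check does
not depend on byte order or on whether char is signed.

```c
#include <string.h>

/* The simplified testcase quoted above, with unchanged behaviour. */
static void vbswap8(unsigned int *in, int len)
{
    for (int i = 0; i < len; i++)
        in[i] = (in[i] & 0xffff0000) | *(char *)&in[i];
}

/* Hypothetical reference: the same computation, with the narrow access
   written as a one-byte memcpy instead of a pointer cast.  */
static unsigned int expected_word(unsigned int w)
{
    char first;                  /* byte at the lowest address of w */
    memcpy(&first, &w, 1);
    return (w & 0xffff0000u) | (unsigned int)first;
}

/* Return the number of elements where vbswap8 and the reference differ. */
static int check_vbswap8(void)
{
    unsigned int data[4] = { 0x12345678u, 0xdeadbeefu, 0x000000ffu, 0xffff0000u };
    unsigned int want[4];
    int bad = 0;

    for (int i = 0; i < 4; i++)
        want[i] = expected_word(data[i]);

    vbswap8(data, 4);

    for (int i = 0; i < 4; i++)
        bad += (data[i] != want[i]);
    return bad;
}
```

Each element is read twice, once as a full unsigned int and once as a single
char at the same address, which is the mixed-size access the quoted comment
refers to. check_vbswap8() is expected to return 0 when called from a test
driver.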

The input to bswap for this example:

void vbswap8 (unsigned int * in, int len)
{
  int i;
  long unsigned int _1;
  long unsigned int _2;
  unsigned int * _3;
  unsigned int _4;
  unsigned int _5;
  char _6;
  unsigned int _7;
  unsigned int _8;

  <bb 2> [local count: 118111600]:
  # DEBUG BEGIN_STMT
  # DEBUG i => 0
  # DEBUG BEGIN_STMT
  if (len_12(D) > 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 3> [local count: 955630224]:
  # i_18 = PHI <i_15(3), 0(2)>
  # DEBUG i => i_18
  # DEBUG BEGIN_STMT
  _1 = (long unsigned int) i_18;
  _2 = _1 * 4;
  _3 = in_13(D) + _2;
  _4 = *_3;
  _5 = _4 & 4294901760;
  _6 = MEM[(char *)_3];
  _7 = (unsigned int) _6;
  _8 = _5 | _7;
  *_3 = _8;
  # DEBUG BEGIN_STMT
  i_15 = i_18 + 1;
  # DEBUG i => i_15
  # DEBUG BEGIN_STMT
  if (len_12(D) > i_15)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 4> [local count: 118111600]:
  return;

}

This example already has a MEM_REF as the input to bswap (while the other
example does not), so although this patch discovers the POINTER_PLUS_EXPR, it
changes nothing, because no bswap implementation is found here anyway.

Regression tested for x86.
