https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121225

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
   Last reconfirmed|                            |2025-07-23
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The bswap pass recognizes the HImode swap and uses a rotate on this:

  load_dst_8 = MEM <short unsigned int> [(unsigned int *)_3];
  bswapdst_12 = load_dst_8 r>> 8;
  _13 = (unsigned int) bswapdst_12;
  _4 = *_3;
  _5 = _4 & 4294901760;
  _11 = _5 | _13;
  *_3 = _11;

The vectorizer doesn't handle the case where the same memory area is
referenced with different types (sizes).  That can be seen with the
following simplified testcase:

void vbswap8(unsigned int* in, int len)                
{
    for (int i = 0; i < len; i++)
        in[i] = (in[i] & 0xffff0000) | *(char *)&in[i];
}

I chose 'char' because that's free from TBAA issues.  The issue is

(compute_affine_dependence
  ref_a: MEM[(char *)_3], stmt_a: _6 = MEM[(char *)_3];
  ref_b: *_3, stmt_b: *_3 = _8; 
) -> dependence analysis failed
t.c:3:23: note:   dependence distance  = 0.
t.c:3:23: note:   dependence distance == 0 between *_3 and *_3
t.c:4:40: missed:   versioning for alias required: can't determine dependence between MEM[(char *)_3] and *_3
consider run-time aliasing test between MEM[(char *)_3] and *_3
...
t.c:3:23: note:   === vect_prune_runtime_alias_test_list ===
t.c:3:23: note:   can tell at compile time that MEM[(char *)_3] and *_3 alias
t.c:4:15: missed:   not vectorized: compilation time alias: _6 = MEM[(char *)_3];
*_3 = _8;

so we think we have to disambiguate (we don't) and fail to, so we queue a
runtime alias test, but pruning then figures out that, well, there's always
an overlap (of course).
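
For contrast with the simplified testcase, here is a sketch of a variant in
which every reference to in[i] uses the same 32-bit access size, so there is
no second, narrower data reference to disambiguate.  The name vmask8 is made
up for illustration.  It replaces the char load by a mask, so it only matches
the testcase on a little-endian target and when the loaded byte is
non-negative as a char (or char is unsigned).  Whether GCC vectorizes this
form has not been checked here.

```c
/* Illustrative variant (not from the bug report): a single 32-bit load and
   a single 32-bit store per iteration, no mixed-size access.  Assumes a
   32-bit unsigned int.  */
void vmask8(unsigned int *in, int len)
{
    for (int i = 0; i < len; i++) {
        unsigned int word = in[i];               /* the only load */
        in[i] = (word & 0xffff0000u) | (word & 0xffu);
    }
}
```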

The vectorizer currently would have difficulties handling this since it's
tied to a single vector size.

The bswap pass in this case should not insert another load but should
perform an appropriate conversion (and shift).  Or refrain from doing
anything at all.
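
As a rough source-level sketch of the "conversion" variant suggested here:
load the 32-bit word once, truncate it to 16 bits, rotate, and widen again,
instead of issuing a separate HImode load from the same address.  The
function names are hypothetical and this is not the actual source from the
report, only an illustration of the shape of the computation.  Truncating
the loaded value yields the low half on any endianness, so no shift is
needed in this form.

```c
#include <stdint.h>

/* Rotate a 16-bit value by 8 bits, i.e. swap its two bytes.  */
static uint16_t rotate16_by_8(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Hypothetical single-load form: the low half comes from a conversion of
   the already loaded 32-bit word rather than from a second, 16-bit load.  */
static uint32_t swap_low_half(uint32_t word)
{
    uint16_t low = (uint16_t)word;
    return (word & 0xffff0000u) | rotate16_by_8(low);
}

/* Loop with one 32-bit load and one 32-bit store per element.  */
void vbswap16(uint32_t *in, int len)
{
    for (int i = 0; i < len; i++)
        in[i] = swap_low_half(in[i]);
}
```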


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
