pr108692.c fails on LoongArch

tnfchris at gcc dot gnu.org via Gcc-bugs Mon, 03 Feb 2025 05:22:47 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118727


--- Comment #17 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Xi Ruoyao from comment #16)
> (In reply to Tamar Christina from comment #15)
> > (In reply to Xi Ruoyao from comment #13)
> > > For example for the original gcc.dg/pr108692.c:
> > > 
> > >   a.0_4 = (unsigned char) a_14;
> > >   _5 = (int) a.0_4;
> > >   b.1_6 = (unsigned char) b_16;
> > >   _7 = (int) b.1_6;
> > >   c_17 = _5 - _7; 
> > >   _8 = ABS_EXPR <c_17>;
> > >   r_18 = _8 + r_23;
> > > 
> > > becomes:
> > > 
> > >   patt_31 = .ABD (a.0_4, b.1_6);
> > > 
> > > so when we get the ABD we already stripped the redundant (int) conversion
> > > here, note that the (unsigned char) conversion is really needed.  And for 
> > > my
> > > twisted test case:
> > > 
> > >   a.0_4 = (unsigned char) a_14;
> > >   _5 = (unsigned int) a.0_4;
> > >   _6 = (unsigned int) b_16;
> > >   _7 = _5 - _6; 
> > >   c_17 = (int) _7; 
> > >   _8 = ABS_EXPR <c_17>;
> > > 
> > > becomes:
> > > 
> > >   patt_38 = (unsigned short) a.0_4;
> > >   patt_35 = (signed short) b_16;
> > >   patt_36 = (signed short) patt_38;
> > >   patt_33 = .ABD (patt_36, patt_35);
> > > 
> > > To me this is already the "optimal" way.  Correct me if I'm wrong, but
> > > anyway even if there's a redundant conversion it should be stripped in
> > > vect_recog_abd_pattern instead of vect_recog_sad_pattern.
> > 
> > No, that's not correct. SAD_EXPR is the most optimal version here as
> > SAD_EXPR performs double widening.  It removes the need for the intermediate
> > vector widening.
> > 
> >   a.0_4 = (unsigned char) a_14;
> >   patt_b_16 = (unsigned char) b_16
> >   patt_41 = SAD_EXPR <a.0_4, patt_b_16, r_23>;
> > 
> > is the most optimal form, as it does the operation on bytes rather than
> > short.
> 
> I don't think it's correct...  In the twisted test case we have something
> like
> 
> r += abs((unsigned int)(unsigned char) a - (signed int)(signed char) b)
> 
> So if a is 255 and b is 128, we should add r with abs(255 - (-128)) = 383,
> but with this form we'll add it with 127.

The operation you posted above is an unsigned operation, in the original
expression the subtract is done on zero extended bytes.

> > > so when we get the ABD we already stripped the redundant (int) conversion
> > > here, note that the (unsigned char) conversion is really needed.  And for 
> > > my
> > > twisted test case:
> > > 
> > >   a.0_4 = (unsigned char) a_14;
> > >   _5 = (unsigned int) a.0_4;
> > >   _6 = (unsigned int) b_16;
> > >   _7 = _5 - _6; 
> > >   c_17 = (int) _7; 



(In reply to Xi Ruoyao from comment #16)
> (In reply to Tamar Christina from comment #15)
> > (In reply to Xi Ruoyao from comment #13)
> > > For example for the original gcc.dg/pr108692.c:
> > > 
> > >   a.0_4 = (unsigned char) a_14;
> > >   _5 = (int) a.0_4;
> > >   b.1_6 = (unsigned char) b_16;
> > >   _7 = (int) b.1_6;
> > >   c_17 = _5 - _7; 
> > >   _8 = ABS_EXPR <c_17>;
> > >   r_18 = _8 + r_23;
> > > 
> > > becomes:
> > > 
> > >   patt_31 = .ABD (a.0_4, b.1_6);
> > > 
> > > so when we get the ABD we already stripped the redundant (int) conversion
> > > here, note that the (unsigned char) conversion is really needed.  And for 
> > > my
> > > twisted test case:
> > > 
> > >   a.0_4 = (unsigned char) a_14;
> > >   _5 = (unsigned int) a.0_4;
> > >   _6 = (unsigned int) b_16;
> > >   _7 = _5 - _6; 
> > >   c_17 = (int) _7; 
> > >   _8 = ABS_EXPR <c_17>;
> > > 
> > > becomes:
> > > 
> > >   patt_38 = (unsigned short) a.0_4;
> > >   patt_35 = (signed short) b_16;
> > >   patt_36 = (signed short) patt_38;
> > >   patt_33 = .ABD (patt_36, patt_35);
> > > 
> > > To me this is already the "optimal" way.  Correct me if I'm wrong, but
> > > anyway even if there's a redundant conversion it should be stripped in
> > > vect_recog_abd_pattern instead of vect_recog_sad_pattern.
> > 
> > No, that's not correct. SAD_EXPR is the most optimal version here as
> > SAD_EXPR performs double widening.  It removes the need for the intermediate
> > vector widening.
> > 
> >   a.0_4 = (unsigned char) a_14;
> >   patt_b_16 = (unsigned char) b_16
> >   patt_41 = SAD_EXPR <a.0_4, patt_b_16, r_23>;
> > 
> > is the most optimal form, as it does the operation on bytes rather than
> > short.
> 
> I don't think it's correct...  In the twisted test case we have something
> like
> 
> r += abs((unsigned int)(unsigned char) a - (signed int)(signed char) b)
> 
> So if a is 255 and b is 128, we should add r with abs(255 - (-128)) = 383,
> but with this form we'll add it with 127.

Ah right, because b didn't have an explicit cast to unsigned char first and
it's unsigned in this case.

I was however arguing about:

int
foo (unsigned char *x, unsigned char *y, int n)
{
  int i, r = 0;
  unsigned char a, b;
  for (i = 0; i < n; i++)
    {
      a = x[i];
      b = y[i];
      int c = (unsigned char) a - (unsigned char) b;
      r = r + (c < 0 ? -c : c);
    }
  return r;
}

where vect_recog_over_widening_pattern will split the zero extension into:

/app/example.c:6:17: note:   Splitting statement: _4 = (int) a_12;
/app/example.c:6:17: note:   into pattern statements: patt_43 = (unsigned
short) a_12;
/app/example.c:6:17: note:   and: patt_42 = (int) patt_43;

and so SAD_EXPR needs to recover from the intermediate split.

However you're right that ABD already does this correctly because
vect_recog_absolute_difference handles it through vect_widened_op_tree.

So indeed I agree that the calls are redundant for vect_recog_sad_pattern

[Bug tree-optimization/118727] [15 Regression] gcc.dg/pr108692.c fails on LoongArch

Reply via email to