[RFC] Implementing detection of saturation and rounding arithmetic

2021-05-10 Thread Tamar Christina via Gcc
Hi All,

We are looking to implement saturation support in the compiler.  The aim is to
recognize both the scalar and vector variants of typical saturating expressions.

As an example:

1. Saturating addition:
   char sat (char a, char b)
   {
     int tmp = a + b;
     return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
   }

2. Saturating abs:
   char sat (char a)
   {
     int tmp = abs (a);
     return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
   }

3. Rounding shifts
   char rndshift (char dc)
   {
     int round_const = 1 << (shift - 1);
     return (dc + round_const) >> shift;
   }

etc.

Of course the first issue is that C does not really have a single idiom for
expressing this.

At the RTL level we have ss_truncate and us_truncate and float_truncate for
truncation.

At the Tree level we have nothing for truncation (I believe) for scalars.  For
vector code there already seems to be VEC_PACK_SAT_EXPR, but it looks like
nothing actually generates it at the moment; it's just an unused tree code.

For rounding there doesn't seem to be any existing infrastructure.
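
To illustrate the idiom variation: here are two formulations of the same
round-to-nearest shift that a canonicalization step would want to bring to a
single form.  A minimal sketch of mine (valid for non-negative values), not
taken from the proposal:

   int rnd_add_shift (int x)   { return (x + 4) >> 3; }
   int rnd_shift_fixup (int x) { return (x >> 3) + ((x >> 2) & 1); }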

The proposal to handle these is as follows; keep in mind that all of these also
exist in scalar form, so detecting them in the vectorizer would be the wrong
place.

1. Rounding:
   a) Use match.pd to rewrite various rounding idioms to shifts.
   b) Use backwards or forward prop to rewrite these to internal functions
  where even if the target does not support these rounding instructions they
  have a chance to provide a more efficient implementation than what would
  be generated normally.

2. Saturation:
   a) Use match.pd to rewrite the various saturation expressions into min/max
  operations, which opens up the expressions to further optimizations
  (see the sketch after this list).
   b) Use backwards or forward prop to convert to internal functions if the
  resulting min/max expression still meets the criteria for being a
  saturating expression.  This follows the algorithm outlined in "The
  Software Vectorization Handbook" by Aart J.C. Bik.

  We could get the right instructions by using combine if we don't rewrite
  the instructions to an internal function, however then during vectorization
  we would overestimate the cost of performing the saturation.  The constants
  would then also be loaded into registers and so become a lot more difficult
  to clean up solely in the backend.
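
  As a concrete illustration of 2a, the saturating addition from example 1
  rewritten into min/max form (a sketch only; match.pd would of course
  produce GIMPLE MIN_EXPR/MAX_EXPR, not C):

   signed char sat_minmax (signed char a, signed char b)
   {
     int tmp = a + b;
     tmp = tmp < -128 ? -128 : tmp;   /* MAX_EXPR <tmp, -128> */
     tmp = tmp >  127 ?  127 : tmp;   /* MIN_EXPR <tmp, 127>  */
     return tmp;
   }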

The one thing I am wondering about is whether we would need an internal
function for every supported operation, or if it should be modelled as an
internal FN which just "marks" the operation as rounding/saturating.  After
all, the only difference between a normal and a saturating expression in RTL
is the xx_truncate RTL surrounding the expression.  Doing so would also mean
that all targets which have saturating instructions would automatically
benefit from this.

But it does mean a small adjustment to the costing, which would need to cost the
internal function call and the argument together as a whole.

Any feedback is appreciated to minimize the number of changes required to the
final patch.  Any objections to the outlined approach?

Thanks,
Tamar


RE: [RFC] Implementing detection of saturation and rounding arithmetic

2021-05-12 Thread Tamar Christina via Gcc
Hi David, 

> -Original Message-
> From: David Brown 
> Sent: Tuesday, May 11, 2021 11:04 AM
> To: Tamar Christina ; gcc@gcc.gnu.org
> Cc: Richard Sandiford ; Richard Biener
> 
> Subject: Re: [RFC] Implementing detection of saturation and rounding
> arithmetic
> 
> On 11/05/2021 07:37, Tamar Christina via Gcc wrote:
> > Hi All,
> >
> > We are looking to implement saturation support in the compiler.  The
> > aim is to recognize both Scalar and Vector variant of typical saturating
> expressions.
> >
> > As an example:
> >
> > 1. Saturating addition:
> >char sat (char a, char b)
> >{
> >   int tmp = a + b;
> >   return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >}
> >
> > 2. Saturating abs:
> >char sat (char a)
> >{
> >   int tmp = abs (a);
> >   return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >}
> >
> > 3. Rounding shifts
> >char rndshift (char dc)
> >{
> >   int round_const = 1 << (shift - 1);
> >   return (dc + round_const) >> shift;
> >}
> >
> > etc.
> >
> 
> I can't comment on the implementation part - I don't know anything about it.
> 
> However, in your examples above I see a few points.
> 
> One is your use of "char".  "char" is a type that varies in signedness from
> target to target (and also depending on compiler flags), and is slightly
> different in C and C++ ('a' has char type in C++, int type in C).  If you 
> must use
> "char" in arithmetic contexts, I recommend using "signed char" or "unsigned
> char" explicitly.
> 
> I would rather recommend you use the size-specific  types - int8_t,
> etc., - as being more appropriate for this kind of thing.
> (AFAIK all gcc targets have 8-bit CHAR.)  This also makes it easier to see the
> sizes you need for the "tmp" value as you make functions for bigger sizes -
> remember that on some gcc targets, "int" is 16-bit.

Indeed, but unfortunately we're targeting existing code that's quite old, so
the C99 fixed-size types may not have been used.
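
For new code, though, David's suggestion applied to the first example would
look something like the following sketch; the widened temporary stays correct
even on targets where int is 16-bit:

   #include <stdint.h>

   int8_t sat (int8_t a, int8_t b)
   {
     int16_t tmp = (int16_t) a + b;   /* range [-256, 254], always fits */
     return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
   }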

> 
> It is also worth noting that gcc already has support for saturating types on
> some targets:
> 
> <https://gcc.gnu.org/onlinedocs/gcc/Fixed-Point.html>
> 
> My testing of these (quite a long time ago) left me with a feeling that it was
> not a feature anyone had worked hard to optimise - certainly it did not make
> use of saturating arithmetic instructions available on some of the targets I
> tested (ARM Cortex M4, for example).  But it is possible that there are things
> here that would be of use to you.  (I am not convinced that it is worth
> spending time optimising the implementation of these - I don't think the
> N1169 types are much used by
> anyone.)
> 

I did notice these and was wondering if it makes sense to use them, but I'd
need to check how well supported they are.  For instance, I'd need to check
that they don't block vectorization and the like.
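
For reference, on the targets that do implement the N1169 extension the
saturating types make the idiom disappear entirely; a minimal sketch,
assuming a target that accepts the fixed-point types at all:

   /* Addition on a _Sat type saturates automatically.  */
   _Sat _Fract add_sat_fract (_Sat _Fract a, _Sat _Fract b)
   {
     return a + b;
   }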

> 
> While it is always good that the compiler can spot patterns in generic C code
> and generate optimal instruction sequences, another possibility here would
> be a set of built-in functions for saturated and rounding arithmetic.  That
> would take the guesswork out of it for users - if their code requires
> efficient saturated addition, they can use __builtin_add_sat and get the
> best their target can offer (just like __builtin_add_overflow, and that
> kind of thing).
> And it might be easier to implement in the compiler.
> 

We already have Neon intrinsics for this, but these operations happen quite
often in media processing and machine learning frameworks which don't use
intrinsics or builtins.  So the big gain for us here really is the idiom
recognition 😊

Thanks for the feedback!

Cheers,
Tamar

> 
> I hope these comments give you a few ideas or useful thoughts.
> 
> David



RE: [RFC] Implementing detection of saturation and rounding arithmetic

2021-05-12 Thread Tamar Christina via Gcc



> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, May 11, 2021 12:45 PM
> To: Tamar Christina 
> Cc: gcc@gcc.gnu.org; Richard Sandiford ;
> Jakub Jelinek 
> Subject: Re: [RFC] Implementing detection of saturation and rounding
> arithmetic
> 
> On Tue, 11 May 2021, Tamar Christina wrote:
> 
> > Hi All,
> >
> > We are looking to implement saturation support in the compiler.  The
> > aim is to recognize both Scalar and Vector variant of typical saturating
> expressions.
> >
> > As an example:
> >
> > 1. Saturating addition:
> >char sat (char a, char b)
> >{
> >   int tmp = a + b;
> >   return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >}
> >
> > 2. Saturating abs:
> >char sat (char a)
> >{
> >   int tmp = abs (a);
> >   return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >}
> >
> > 3. Rounding shifts
> >char rndshift (char dc)
> >{
> >   int round_const = 1 << (shift - 1);
> >   return (dc + round_const) >> shift;
> >}
> >
> > etc.
> >
> > Of course the first issue is that C does not really have a single
> > idiom for expressing this.
> >
> > At the RTL level we have ss_truncate and us_truncate and
> > float_truncate for truncation.
> >
> > At the Tree level we have nothing for truncation (I believe) for
> > scalars. For Vector code there already seems to be VEC_PACK_SAT_EXPR
> > but it looks like nothing actually generates this at the moment. it's
> > just an unused tree code.
> >
> > For rounding there doesn't seem to be any existing infrastructure.
> >
> > The proposal to handle these are as follow, keep in mind that all of
> > these also exist in their scalar form, as such detecting them in the
> > vectorizer would be the wrong place.
> >
> > 1. Rounding:
> >a) Use match.pd to rewrite various rounding idioms to shifts.
> >b) Use backwards or forward prop to rewrite these to internal functions
> >   where even if the target does not support these rounding instructions
> they
> >   have a chance to provide a more efficient implementation than what
> would
> >   be generated normally.
> >
> > 2. Saturation:
> >a) Use match.pd to rewrite the various saturation expressions into
> min/max
> >   operations which opens up the expressions to further optimizations.
> >b) Use backwards or forward prop to convert to internal functions if the
> >   resulting min/max expression still meet the criteria for being a
> >   saturating expression.  This follows the algorithm as outlined in "The
> >   Software Vectorization handbook" by Aart J.C. Bik.
> >
> >   We could get the right instructions by using combine if we don't
> >   rewrite the instructions to an internal function, however then during
> >   vectorization we would overestimate the cost of performing the
> >   saturation.  The constants would then also be loaded into registers
> >   and so become a lot more difficult to clean up solely in the backend.
> >
> > The one thing I am wondering about is whether we would need an
> > internal function for all operations supported, or if it should be
> > modelled as an internal FN which just "marks" the operation as
> > rounding/saturating. After all, the only difference between a normal
> > and saturating expression in RTL is the xx_truncate RTL surrounding
> > the expression.  Doing so would also mean that all targets which have
> > saturating instructions would automatically benefit from this.
> >
> > But it does mean a small adjustment to the costing, which would need
> > to cost the internal function call and the argument together as a whole.
> >
> > Any feedback is appreciated to minimize the number of changes required
> > to the final patch.  Any objections to the outlined approach?
> 
> I think it makes sense to pattern-match the operations on GIMPLE and follow
> the approach take by __builtin_add_overflow & friends.
> Maybe quickly check whether clang provides some builtins already which we
> could implement.
> 

Hmm, so taking a look at __builtin_add_overflow and friends, it looks like
they use internal functions like ADD_OVERFLOW, the biggest difference being
that it sets a condition flag.  This does make me wonder if something like

int f (int a, int b)
{
  int res;
  if (__builtin_add_overflow (a, b, &res))
    {
      /* The result wrapped: a non-negative wrapped value means the true
         sum was below INT_MIN, a negative one means it was above INT_MAX.  */
      if (res >= 0)
        return INT_MIN;
      else
        return INT_MAX;
    }
  return res;
}

should be recognized as saturating as well.  But yeah, following the same
approach we would rewrite the sequence to something like res = .ADD_SAT (a, b);

I know that in the past there were concerns about doing such pattern matching
early, which is why I thought about following the same approach we use for the
table-based ctz recognition and doing it in backprop.

For completeness, the OVERFLOW version of ADD on AArch64 maps to ADDS, but
the saturating one to SQADD.

> There's some appeal to mimicking what RTL does - thus ha

RE: [RFC] Implementing detection of saturation and rounding arithmetic

2021-05-12 Thread Tamar Christina via Gcc
Hi,

> -Original Message-
> From: Segher Boessenkool 
> Sent: Tuesday, May 11, 2021 4:43 PM
> To: Tamar Christina 
> Cc: gcc@gcc.gnu.org; Richard Sandiford ;
> Richard Biener 
> Subject: Re: [RFC] Implementing detection of saturation and rounding
> arithmetic
> 
> Hi!
> 
> On Tue, May 11, 2021 at 05:37:34AM +, Tamar Christina via Gcc wrote:
> > 2. Saturating abs:
> >char sat (char a)
> >{
> >   int tmp = abs (a);
> >   return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >}
> 
> That can be done quite a bit better, branchless at least.  Same for all
> examples here probably.

Do you mean in the example? Sure, I wanted to keep it simple 😊

> 
> > 2. Saturation:
> >a) Use match.pd to rewrite the various saturation expressions into
> min/max
> >   operations which opens up the expressions to further optimizations.
> 
> You'll have to do the operation in a bigger mode for that.  (This is also
> true for rounding, in many cases.)
> 
> This makes new internal functions more attractive / more feasible.

True, but the internal function doesn't need to model the wider mode, right?
So if you're saturating an int, the internal-fn would work on int,int,int.
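
A sketch of why same-width semantics would be enough, written with the
existing __builtin_add_overflow (the .ADD_SAT name used elsewhere in this
thread is hypothetical):

   #include <limits.h>

   /* Saturating int addition without any wider intermediate: on overflow
      the saturation direction follows the sign of either operand.  */
   int add_sat_int (int a, int b)
   {
     int res;
     if (__builtin_add_overflow (a, b, &res))
       return a < 0 ? INT_MIN : INT_MAX;
     return res;
   }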

> 
> >   We could get the right instructions by using combine if we don't
> >   rewrite the instructions to an internal function, however then during
> >   vectorization we would overestimate the cost of performing the
> >   saturation.  The constants would then also be loaded into registers
> >   and so become a lot more difficult to clean up solely in the backend.
> 
> Combine is almost never the right answer if you want to combine more than
> two or three RTL insns.  It can be done, but combine will always write the
> combined instruction in simplest terms, which tends to mean that if you
> combine more insns there can be very many outcomes that you all need to
> recognise as insns in your machine description.

Yeah, ideally I would like to handle it before it gets to expand.

> 
> > The one thing I am wondering about is whether we would need an
> > internal function for all operations supported, or if it should be
> > modelled as an internal FN which just "marks" the operation as
> > rounding/saturating. After all, the only difference between a normal
> > and saturating expression in RTL is the xx_truncate RTL surrounding
> > the expression.  Doing so would also mean that all targets which have
> > saturating instructions would automatically benefit from this.
> 
> I think you will have to experiment with both approaches to get a good
> feeling for the tradeoff.

Fair enough,

Thanks for the feedback!

Cheers,
Tamar
> 
> 
> Segher


RE: [RFC] Implementing detection of saturation and rounding arithmetic

2021-05-12 Thread Tamar Christina via Gcc



> -Original Message-
> From: Joseph Myers 
> Sent: Tuesday, May 11, 2021 6:01 PM
> To: David Brown 
> Cc: Tamar Christina ; gcc@gcc.gnu.org; Richard
> Sandiford ; Richard Biener
> 
> Subject: Re: [RFC] Implementing detection of saturation and rounding
> arithmetic
> 
> On Tue, 11 May 2021, David Brown wrote:
> 
> > It is also worth noting that gcc already has support for saturating
> > types on some targets:
> >
> > <https://gcc.gnu.org/onlinedocs/gcc/Fixed-Point.html>
> >
> > My testing of these (quite a long time ago) left me with a feeling
> > that it was not a feature anyone had worked hard to optimise -
> > certainly it
> 
> The implementation isn't well-integrated with any optimizations for
> arithmetic on ordinary integer types / modes, because it has its own
> completely separate machine modes and operations on those.  I still think it
> would be better to have a GIMPLE pass that lowers from fixed-point types to
> saturating etc. operations on ordinary integer types, as I said in
> .
> 

Oh, thanks for the link! That's a very useful read!

Cheers,
Tamar

> Note however that such lowering should be more or less independent of
> what's being discussed in this thread - this thread is about better
> optimization of such operations on ordinary types (with or without built-in
> functions of some kind in addition to recognition of such operations written
> in generic C), which you can do independently of what's done with fixed-
> point types.
> 
> --
> Joseph S. Myers
> jos...@codesourcery.com


RE: [RFC] Implementing detection of saturation and rounding arithmetic

2021-05-12 Thread Tamar Christina via Gcc



> -Original Message-
> From: Richard Sandiford 
> Sent: Wednesday, May 12, 2021 9:48 AM
> To: Tamar Christina 
> Cc: gcc@gcc.gnu.org; Richard Biener 
> Subject: Re: [RFC] Implementing detection of saturation and rounding
> arithmetic
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > We are looking to implement saturation support in the compiler.  The
> > aim is to recognize both Scalar and Vector variant of typical saturating
> expressions.
> >
> > As an example:
> >
> > 1. Saturating addition:
> >char sat (char a, char b)
> >{
> >   int tmp = a + b;
> >   return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >}
> >
> > 2. Saturating abs:
> >char sat (char a)
> >{
> >   int tmp = abs (a);
> >   return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >}
> >
> > 3. Rounding shifts
> >char rndshift (char dc)
> >{
> >   int round_const = 1 << (shift - 1);
> >   return (dc + round_const) >> shift;
> >}
> >
> > etc.
> >
> > Of course the first issue is that C does not really have a single
> > idiom for expressing this.
> >
> > At the RTL level we have ss_truncate and us_truncate and
> > float_truncate for truncation.
> >
> > At the Tree level we have nothing for truncation (I believe) for
> > scalars. For Vector code there already seems to be VEC_PACK_SAT_EXPR
> > but it looks like nothing actually generates this at the moment. it's
> > just an unused tree code.
> >
> > For rounding there doesn't seem to be any existing infrastructure.
> >
> > The proposal to handle these are as follow, keep in mind that all of
> > these also exist in their scalar form, as such detecting them in the
> > vectorizer would be the wrong place.
> >
> > 1. Rounding:
> >a) Use match.pd to rewrite various rounding idioms to shifts.
> >b) Use backwards or forward prop to rewrite these to internal functions
> >   where even if the target does not support these rounding instructions
> they
> >   have a chance to provide a more efficient implementation than what
> would
> >   be generated normally.
> >
> > 2. Saturation:
> >a) Use match.pd to rewrite the various saturation expressions into
> min/max
> >   operations which opens up the expressions to further optimizations.
> >b) Use backwards or forward prop to convert to internal functions if the
> >   resulting min/max expression still meet the criteria for being a
> >   saturating expression.  This follows the algorithm as outlined in "The
> >   Software Vectorization handbook" by Aart J.C. Bik.
> >
> >   We could get the right instructions by using combine if we don't
> >   rewrite the instructions to an internal function, however then during
> >   vectorization we would overestimate the cost of performing the
> >   saturation.  The constants would then also be loaded into registers
> >   and so become a lot more difficult to clean up solely in the backend.
> >
> > The one thing I am wondering about is whether we would need an
> > internal function for all operations supported, or if it should be
> > modelled as an internal FN which just "marks" the operation as
> > rounding/saturating. After all, the only difference between a normal
> > and saturating expression in RTL is the xx_truncate RTL surrounding
> > the expression.  Doing so would also mean that all targets which have
> > saturating instructions would automatically benefit from this.
> 
> I might have misunderstood what you meant here, but the *_truncate RTL
> codes are true truncations: the operand has to be wider than the result.
> Using this representation for general arithmetic is a problem if you're
> operating at the maximum size that the target supports natively.
> E.g. representing a 64-bit saturating addition as:
> 
>   - extend to 128 bits
>   - do a 128-bit addition
>   - truncate to 64 bits
> 

Ah, that wasn't clear from the documentation.  The entry for the normal
truncate mentions that the modes have to be wider, but ss_truncate and friends
don't mention this constraint.  That would indeed not work.

> is going to be hard to cost and code-generate on targets that don't support
> native 128-bit operations (or at least, don't support them cheaply).
> This might not be a problem when recognising C idioms, since the C source
> code has to be able do the wider operation before truncating the result, but
> it could be a problem if we provide built-in functions or if we want to
> introduce compiler-generated saturating operations.
> 
> RTL already has per-operation saturation such as ss_plus/us_plus,
> ss_minus/us_minus, ss_neg/us_neg, ss_mult/us_mult, ss_div,
> ss_ashift/us_ashift and ss_abs.  I think we should do the same in gimple,
> using internal functions like you say.

Oh, I didn't know about those.  Indeed those look like a better fit here.
Having all the operations separate in RTL already does seem to imply that
separate internal-fns are the way to go.

Thanks!

Regards,

RE: [PATCH] Port GCC documentation to Sphinx

2021-07-13 Thread Tamar Christina via Gcc
Hi Martin,

> -Original Message-
> From: Gcc-patches On Behalf Of Martin Liška
> Sent: Tuesday, June 29, 2021 11:09 AM
> To: Joseph Myers 
> Cc: GCC Development ; gcc-patc...@gcc.gnu.org
> Subject: Re: [PATCH] Port GCC documentation to Sphinx
> 
> On 6/28/21 5:33 PM, Joseph Myers wrote:
> > Are formatted manuals (HTML, PDF, man, info) corresponding to this
> > patch version also available for review?
> 
> I've just uploaded them here:
> https://splichal.eu/gccsphinx-final/
> 

Thanks for doing this 😊

I'm a primary user of the PDFs (easier to work offline).  I do like the look
and syntax highlighting of the new PDFs, but I do prefer the way the current
itemized entries are laid out.

See for instance `vect_interleave`, which before would have the description
indented on the next line, vs the new docs which put it on the same line.

The visual break in my opinion makes it easier to read; it currently looks
like a sea of text.  Also, purely personal I expect, but could the weight of
the bold entries be reduced a bit?  They look very BOLD to me atm, and when
there's a lot of them I find it slightly harder to read.

Cheers,
Tamar

> Martin



RE: Enable the vectorizer at -O2 for GCC 12

2021-09-01 Thread Tamar Christina via Gcc
-- edit, added list back in --

Just to add some AArch64 numbers: for SPEC2017 we see a 2.1% overall geomean
improvement (all from x264, as expected) with no real regressions (everything
within variance) and only a 0.06% binary size increase overall (of which x264
grew 0.15%) using the very cheap cost model.

So we'd be quite keen on this as well.

Cheers,
Tamar

> -Original Message-
> From: Gcc  On Behalf
> Of Florian Weimer via Gcc
> Sent: Monday, August 30, 2021 2:05 PM
> To: gcc@gcc.gnu.org
> Cc: ja...@redhat.com; Richard Earnshaw ;
> Segher Boessenkool ; Richard Sandiford
> ; premachandra.malla...@amd.com;
> Hongtao Liu 
> Subject: Enable the vectorizer at -O2 for GCC 12
> 
> There has been a discussion, both off-list and on the gcc-help mailing list
> (“Why vectorization didn't turn on by -O2”, spread across several months),
> about enabling the auto-vectorizer at -O2, similar to what Clang does.
> 
> I think the review concluded that the very cheap cost model should be used
> for that.
> 
> Are there any remaining blockers?
> 
> Thanks,
> Florian



RE: Complex multiply optimization working?

2022-04-11 Thread Tamar Christina via Gcc
Hi,

> -Original Message-
> From: Andrew Stubbs 
> Sent: Monday, April 11, 2022 12:19 PM
> To: GCC Development 
> Cc: Tamar Christina 
> Subject: Complex multiply optimization working?
> 
> Hi all,
> 
> I've been looking at implementing the complex multiply patterns for the
> amdgcn port, but I'm not getting the code I was hoping for. When I try to use
> the patterns on x86_64 or AArch64 they don't seem to work there either, so
> is there something wrong with the middle-end? I've tried both current HEAD
> and GCC 11.

They work fine in both GCC 11 and HEAD: https://godbolt.org/z/Mxxz6qWbP
Did you actually enable the instructions?

The fully unrolled form doesn't get detected at -Ofast because the SLP
vectorizer doesn't detect TWO_OPERAND nodes as a constructor, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104406

note:   Final SLP tree for instance 0x2debde0:
note:   node 0x2cdf900 (max_nunits=2, refcnt=2) vector(2) double
note:   op template: _463 = _457 * _460;
note:   stmt 0 _463 = _457 * _460;
note:   stmt 1 _464 = _458 * _459;
note:   children 0x2cdf990 0x2cdfa20
note:   node 0x2cdf990 (max_nunits=2, refcnt=2) vector(2) double
note:   op template: _457 = REALPART_EXPR ;
note:   stmt 0 _457 = REALPART_EXPR ;
note:   stmt 1 _458 = IMAGPART_EXPR ;
note:   load permutation { 64 65 }
note:   node 0x2cdfa20 (max_nunits=2, refcnt=2) vector(2) double
note:   op template: _460 = IMAGPART_EXPR ;
note:   stmt 0 _460 = IMAGPART_EXPR ;
note:   stmt 1 _459 = REALPART_EXPR ;
note:   load permutation { 65 64 }

In the general case, were these to be scalars, the benefits are dubious
because of the moving between register files.

At -O3 it works fine (no -Ofast canonicalization rules rewriting the form),
but the cost of the loop is too high to be profitable.  You have to disable
the cost model to get it to vectorize where it would use them:
https://godbolt.org/z/MsGq84WP9
And the vectorizer is right here, the scalar code is cheaper.

The various canonicalization differences at -Ofast make many different forms,
e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104408

But yes, detection is working as intended, but some -Ofast cases are not
detected yet.
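
For reference, the minimal loop form that the detection targets is simply
the following (a sketch; whether it is actually vectorized depends on the
flags and the target cost model, as discussed above):

   #include <complex.h>

   void cmul (double complex *restrict r, const double complex *a,
              const double complex *b, int n)
   {
     for (int i = 0; i < n; i++)
       r[i] = a[i] * b[i];   /* candidate for the cmul pattern */
   }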

> 
> The example shown in the internals manual is a simple loop multiplying two
> arrays of complex numbers, and writing the results to a third. I had expected
> that it would use the largest vectorization factor available, with the
> real/imaginary numbers in even/odd lanes as described, but the
> vectorization factor is only 2 (so, a single complex number), and I have to 
> set
> -fvect-cost-model=unlimited to get even that.
> 
> I tried another example with SLP and that too uses the cmul patterns only for
> a single real/imaginary pair.
> 
> Did proper vectorization of cmul ever really work? There is a case in the
> testsuite for the pattern match, but it isn't in a loop.
> 

There are both SLP and LOOP variants in the testsuite.  All the patterns are
inside of a loop.  The mul tests are generated from
https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.dg/vect/complex/complex-mul-template.c

where the tests that use this template instruct the vectorizer to unroll some
cases while others are kept as a loop.  So both are tested in the testsuite.

> Thanks
> 
> Andrew
> 
> P.S. I attached my testcase, in case I'm doing something stupid.

Both work: https://godbolt.org/z/Mxxz6qWbP and https://godbolt.org/z/MsGq84WP9.

Regards,
Tamar

> 
> P.P.S. The manual says the pattern is "cmulm4", etc., but it's actually
> "cmulm3" in the implementation.


RE: gcc/config.in was not regenerated

2023-06-12 Thread Tamar Christina via Gcc
Hi Coudert,

Sorry, missed that one.

I'll fix that.

Tamar.

> -Original Message-
> From: FX Coudert 
> Sent: Saturday, June 10, 2023 9:21 PM
> To: Tamar Christina 
> Cc: gcc@gcc.gnu.org; Jeff Law ; gcc-
> patc...@gcc.gnu.org
> Subject: gcc/config.in was not regenerated
> 
> Hi,
> 
> Building GCC in maintainer mode leads to changes in gcc/config.in
> :
> 
> > diff --git a/gcc/config.in b/gcc/config.in index
> > 4cad077bfbe..25442c59aec 100644
> > --- a/gcc/config.in
> > +++ b/gcc/config.in
> > @@ -67,6 +67,12 @@
> >  #endif
> > +/* Define to larger than one set the number of match.pd
> > partitions to make. */
> > +#ifndef USED_FOR_TARGET
> > +#undef DEFAULT_MATCHPD_PARTITIONS
> > +#endif
> > +
> > +
> >  /* Define to larger than zero set the default stack clash protector
> > size. */  #ifndef USED_FOR_TARGET  #undef
> DEFAULT_STK_CLASH_GUARD_SIZE
> 
> which I think are because this commit
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0a85544e1aaeca41133ecfc438cda913dbc0f122
> should have regenerated and committed config.in 
> 
> Christina, can you please have a look?
> 
> FX


RE: Complex numbers support: discussions summary

2023-09-26 Thread Tamar Christina via Gcc
Hi,

I tried to find you two on Sunday but couldn't locate you. Thanks for the 
presentation!

> >
> > We had very interesting discussions during our presentation with Paul
> > on the support of complex numbers in gcc at the Cauldron.
> >
> > Thank you all for your participation !
> >
> > Here is a small summary from our viewpoint:
> >
> > - Replace CONCAT with a backend defined internal representation in RTL
> > --> No particular problems
> >
> > - Allow backend to write patterns for operation on complex modes
> > --> No particular problems
> >
> > - Conditional lowering depending on whether a pattern exists or not
> > --> Concerns when the vectorization of split complex operations performs
> > better than not vectorized unified complex operations
> >
> > - Centralize complex lowering in cplxlower
> > --> No particular problems if it doesn't prevent IEEE compliance and
> > optimizations (like const folding)
> >
> > - Vectorization of complex operations
> > --> 2 representations (interleaved and separated real/imag): cannot
> > impose one if some machines prefer the other
> > --> Complex are composite modes, the vectorizer assumes that the inner
> > mode is scalar to do some optimizations (which ones?)
> > --> Mixed split/unified complex operations cannot be vectorized easily
> > --> Assuming that the inner representation of complex vectors is left to
> > target backends, the vectorizer doesn't know it, which prevents some
> > optimizations (which ones?)
> >
> > - Explicit vectors of complex
> > --> Cplxlower cannot lower it, and moving veclower before cplxlower is a
> > bad idea as it prevents some optimizations
> > --> Teaching cplxlower how to deal with vectors of complex seems to be a
> > reasonable alternative
> > --> Concerns about ABI or indexing if the internal representation is left
> > to the backend and differs from the representation in memory
> >
> > - Impact of the current SLP pattern matching of complex operations
> > --> Only with -ffast-math
> > --> It can match user-defined operations (not C99) that can be simplified
> > with a complex instruction
> > --> Dedicated opcode and real vector type chosen VS standard opcode and
> > complex mode in our implementation
> > --> Need to preserve SLP pattern matching as too many applications
> > redefine complex and bypass the C99 standard.
> > --> So need to harmonize with our implementation
> >
> > - Support of the pure imaginary type (_Imaginary)
> > --> Still not supported by gcc (and llvm), nor in our implementation.
> > Issues come from the fact that an imaginary is not a complex with the
> > real part set to 0
> > --> The same issue with complex multiplication by a real (which is split
> > in the frontend, and our implementation hasn't changed it yet)
> > --> Idea: Add an attribute to the Tree complex type which specifies pure
> > real / pure imaginary / full complex?
> >
> > - Fast pattern for IEEE compliant emulated operations
> > --> Not enough time to discuss about it
> >
> > Don't hesitate to add something or bring more precision if you want.
> >
> > As I said at the end of the presentation, we have written a paper
> > which explains our implementation in details. You can find it on the
> > wiki page of the Cauldron
> >
> (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&tar
> get=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
> 
> Thanks for the detailed presentation at the Cauldron.
> 
> My personal summary is that I'm less convinced delaying lowering is the way
> to go.

I personally like the delayed lowering for scalars because it allows us to
properly reassociate as a unit.  That is to say, it's easier to detect
a * b * c when they are still complex ops, and the late lowering will allow
better codegen than today.

However I think we should *unconditionally* not lower them, even in situations
such as a * b * imag(b) (see the sketch below).  This situation can happen
through late optimizations anyway, so it has to be dealt with regardless, and
I don't think it should punt.

I think you can then conditionally lower if the target does *not* implement
the optab, i.e. for AArch64 the complex mode wouldn't be useful.
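
A sketch of the kind of mixed expression meant by a * b * imag(b), which
later optimizations can create and which the lowering therefore has to cope
with:

   #include <complex.h>

   double complex mixed (double complex a, double complex b)
   {
     /* complex * complex * real: partially lowered forms like this can
        appear after other transforms.  */
     return a * b * cimag (b);
   }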

> I do think that if targets implement complex optabs we should use them but
> eventually re-discovering complex operations from lowered form is going to be
> more useful.
> That's because as you said, use of _Complex is limited and
> people inventing their own representation.  SLP vectorization can discover
> some ops already with the limiting factor being that we don't specifically
> search for only complex operations (plus we expose the result as vector
> operations, requiring target support for the vector ops rather than [SD]Cmode
> operations).

I don't think the two are mutually exclusive, I do think we should form complex
instructions from sc

RE: Complex numbers support: discussions summary

2023-09-26 Thread Tamar Christina via Gcc
> -Original Message-
> From: Gcc  On Behalf
> Of Paul Iannetta via Gcc
> Sent: Tuesday, September 26, 2023 9:54 AM
> To: Richard Biener 
> Cc: Sylvain Noiry ; gcc@gcc.gnu.org;
> sylvain.no...@hotmail.fr
> Subject: Re: Complex numbers support: discussions summary
> 
> On Tue, Sep 26, 2023 at 09:30:21AM +0200, Richard Biener via Gcc wrote:
> > On Mon, Sep 25, 2023 at 5:17 PM Sylvain Noiry via Gcc 
> wrote:
> > >
> > > Hi,
> > >
> > > We had very interesting discussions during our presentation with
> > > Paul on the support of complex numbers in gcc at the Cauldron.
> > >
> > > Thank you all for your participation !
> > >
> > > Here is a small summary from our viewpoint:
> > >
> > > - Replace CONCAT with a backend defined internal representation in
> > > RTL
> > > --> No particular problems
> > >
> > > - Allow backend to write patterns for operation on complex modes
> > > --> No particular problems
> > >
> > > - Conditional lowering depending on whether a pattern exists or not
> > > --> Concerns when the vectorization of split complex operations
> > > performs better than not vectorized unified complex operations
> > >
> > > - Centralize complex lowering in cplxlower
> > > --> No particular problems if it doesn't prevent IEEE compliance and
> > > optimizations (like const folding)
> > >
> > > - Vectorization of complex operations
> > > --> 2 representations (interleaved and separated real/imag): cannot
> > > impose one if some machines prefer the other
> > > --> Complex are composite modes, the vectorizer assumes that the inner
> > > mode is scalar to do some optimizations (which ones?)
> > > --> Mixed split/unified complex operations cannot be vectorized easily
> > > --> Assuming that the inner representation of complex vectors is left
> > > to target backends, the vectorizer doesn't know it, which prevents
> > > some optimizations (which ones?)
> > >
> > > - Explicit vectors of complex
> > > --> Cplxlower cannot lower it, and moving veclower before cplxlower is
> > > a bad idea as it prevents some optimizations
> > > --> Teaching cplxlower how to deal with vectors of complex seems to be
> > > a reasonable alternative
> > > --> Concerns about ABI or indexing if the internal representation is
> > > left to the backend and differs from the representation in memory
> > >
> > > - Impact of the current SLP pattern matching of complex operations
> > > --> Only with -ffast-math
> > > --> It can match user-defined operations (not C99) that can be
> > > simplified with a complex instruction
> > > --> Dedicated opcode and real vector type chosen VS standard opcode
> > > and complex mode in our implementation
> > > --> Need to preserve SLP pattern matching as too many applications
> > > redefine complex and bypass the C99 standard.
> > > --> So need to harmonize with our implementation
> > >
> > > - Support of the pure imaginary type (_Imaginary)
> > > --> Still not supported by gcc (and llvm), nor in our implementation.
> > > Issues come from the fact that an imaginary is not a complex with the
> > > real part set to 0
> > > --> The same issue with complex multiplication by a real (which is
> > > split in the frontend, and our implementation hasn't changed it yet)
> > > --> Idea: Add an attribute to the Tree complex type which specifies
> > > pure real / pure imaginary / full complex?
> > >
> > > - Fast pattern for IEEE compliant emulated operations
> > > --> Not enough time to discuss about it
> > >
> > > Don't hesitate to add something or bring more precision if you want.
> > >
> > > As I said at the end of the presentation, we have written a paper
> > > which explains our implementation in details. You can find it on the
> > > wiki page of the Cauldron
> > >
> (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&tar
> get=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
> >
> > Thanks for the detailed presentation at the Cauldron.
> >
> > My personal summary is that I'm less convinced delaying lowering is
> > the way to go.
> 
> This is not only delayed lowering; if the SPN are there, there is no
> lowering at all.
> 
> > I do think that if targets implement complex optabs we should use them
> > but eventually re-discovering complex operations from lowered form is
> > going to be more useful.
> 
> I would not be opposed to rediscovering complex operations, but I think
> that even though rediscovering a + b, a - b is easy and a * b would still
> be doable, even a / b will be hard.  Then again, I doubt we'll see a
> hardware complex division, but who knows.  However, once lowered,
> re-associating a * b * c and more complex expressions is going to be hard.
> 
> > That's because as you said, use of _Complex is limited and people
> > inventi

RE: Loop vectorizer optimization questions

2024-01-08 Thread Tamar Christina via Gcc
> 
> Also, another question is that I am working on min/max reduction with index, I
> believe it should be in GCC-15, but I wonder
> whether I can pre-post for review in stage 4, or I should post patch (min/max
> reduction with index) when GCC-15 is open.
> 

FWIW, we tried to implement this 5 years ago:
https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534518.html
and you'll likely get the same feedback if you aren't already doing so.

I think Richard would prefer to have a general framework for these kinds of
operations.  We never got around to doing so and it's still on my list, but
if you're taking care of it 😊

Just thought I'd point out the previous feedback.

Cheers,
Tamar

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai


Re: RE: Loop vectorizer optimization questions

2024-01-09 Thread Tamar Christina via Gcc
Hi,

The 01/08/2024 22:46, 钟居哲 wrote:
> Oh. It's nice to see you have support min/max index reduction.
> 
> I knew your patch can handle this following:
> 
> 
> int idx = ii;
> int max = mm;
> for (int i = 0; i < n; ++i) {
>   int x = a[i];
>   if (max < x) {
> max = x;
> idx = i;
>   }
> }
> 
> But I wonder whether your patch can handle this:
> 
> int idx = ii;
> int max = mm;
> for (int i = 0; i < n; ++i) {
>   int x = a[i];
>   if (max <= x) {
> max = x;
> idx = i;
>   }
> }
> 

The last version of the patch we sent handled all conditionals:

https://inbox.sourceware.org/gcc-patches/db9pr08mb6603dccb35007d83c6736167f5...@db9pr08mb6603.eurprd08.prod.outlook.com/

There are some additional testcases in the patch for all these as well.

> Will you continue to work on min/max with index ?

I don't know if I'll have the free time to do so, that's the reason I haven't 
resent the new one.
The engineer who started it no longer works for Arm.

> Or you want me to continue this work base on your patch ?
> 
> I have an initial patch which roughly implemented LLVM's approach but turns 
> out Richi doesn't want me to apply LLVM's approach so your patch may be more 
> reasonable than LLVM's approach.
> 

When Richi reviewed it he wasn't against the approach in the patch
https://inbox.sourceware.org/gcc-patches/nycvar.yfh.7.76.2105071320170.9...@zhemvz.fhfr.qr/
but he wanted the concept of a dependent reduction to be handled more
generically, so we could extend it in the future.

I think, from looking at Richi's feedback, that he wants
vect_recog_minmax_index_pattern to be more general.  We've basically
hardcoded the reduction type, but it could just be a property on STMT_VINFO.

Unless I'm mistaken, the patch already relies on first finding both
reductions, but we immediately try to resolve the relationship using
vect_recog_minmax_index_pattern.  Instead, I think what Richi wanted was for
us to keep track of reductions that operate on the same induction variable,
and after we finish analysing all reductions, try to see if any reductions
we kept track of can be combined.

Basically, just separate out the discovery and tying of the reductions.

Am I right here Richi?

I think the codegen part can mostly be used as is, though we might be able
to do better for VLA.

So it should be fairly straightforward to go from that final patch to what
Richi wants, but I just lack time.

If you want to tackle it that would be great :)

Thanks,
Tamar

> Thanks.
> 
> juzhe.zh...@rivai.ai
> 
> From: Tamar Christina
> Date: 2024-01-09 01:50
> To: 钟居哲; gcc
> CC: rdapp.gcc; 
> richard.guenther
> Subject: RE: Loop vectorizer optimization questions
> >
> > Also, another question is that I am working on min/max reduction with 
> > index, I
> > believe it should be in GCC-15, but I wonder
> > whether I can pre-post for review in stage 4, or I should post patch 
> > (min/max
> > reduction with index) when GCC-15 is open.
> >
> 
> FWIW, We tried to implement this 5 years ago 
> https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534518.html
> and you'll likely get the same feedback if you aren't already doing so.
> 
> I think Richard would prefer to have a general framework these kinds of 
> operations.  We never got around to doing so
> and it's still on my list but if you're taking care of it 
> 
> Just though I'd point out the previous feedback.
> 
> Cheers,
> Tamar
> 
> > Thanks.
> >
> >
> > juzhe.zh...@rivai.ai

-- 


RE: [RFC] Merge strathegy for all-SLP vectorizer

2024-05-20 Thread Tamar Christina via Gcc



> -Original Message-
> From: Richard Biener 
> Sent: Friday, May 17, 2024 1:54 PM
> To: Richard Sandiford 
> Cc: Richard Biener via Gcc ; Tamar Christina
> 
> Subject: Re: [RFC] Merge strathegy for all-SLP vectorizer
> 
> On Fri, 17 May 2024, Richard Sandiford wrote:
> 
> > Richard Biener via Gcc  writes:
> > > Hi,
> > >
> > > I'd like to discuss how to go forward with getting the vectorizer to
> > > all-SLP for this stage1.  While there is a personal branch with my
> > > ongoing work (users/rguenth/vect-force-slp) branches haven't proved
> > > themselves working well for collaboration.
> >

Yeah, it's hard to keep rebasing and building on top of.

> > Speaking for myself, the problem hasn't been so much the branch as
> > lack of time.  I've been pretty swamped the last eight months of so
> > (except for the time that I took off, which admittedly was quite a
> > bit!), and so I never even got around to properly reading and replying
> > to your message after the Cauldron.  It's been on the "this is important,
> > I should make time to read and understand it properly" list all this time.
> > Sorry about that. :(
> >
> > I'm hoping to have time to work/help out on SLP stuff soon.
> >
> > > The branch isn't ready to be merged in full but I have been picking
> > > improvements to trunk last stage1 and some remaining bits in the past
> > > weeks.  I have refrained from merging code paths that cannot be
> > > exercised on trunk.
> > >
> > > There are two important set of changes on the branch, both critical
> > > to get more testing on non-x86 targets.
> > >
> > >  1. enable single-lane SLP discovery
> > >  2. avoid splitting store groups (9315bfc661432c3 and 4336060fe2db8ec
> > > if you fetch the branch)
> > >

For #2, is there a param or is it just the default?  I can run these through
regression today.

> > > The first point is also most annoying on the testsuite since doing
> > > SLP instead of interleaving changes what we dump and thus tests
> > > start to fail in random ways when you switch between both modes.
> > > On the branch single-lane SLP discovery is gated with
> > > --param vect-single-lane-slp.
> > >
> > > The branch has numerous changes to enable single-lane SLP for some
> > > code paths that have SLP not implemented and where I did not bother
> > > to try supporting multi-lane SLP at this point.  It also adds more
> > > SLP discovery entry points.
> > >
> > > I'm not sure how to try merging these pieces to allow others to
> > > more easily help out.  One possibility is to merge
> > > --param vect-single-lane-slp defaulted off and pick dependent
> > > changes even when they cause testsuite regressions with
> > > vect-single-lane-slp=1.  Alternatively adjust the testsuite by
> > > adding --param vect-single-lane-slp=0 and default to 1
> > > (or keep the default).

I guess which one is better depends on whether the parameter goes away this
release?  If so, I think we should just leave them broken for now and fix
them up when it's the default?
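
For reference, the mechanical testsuite adjustment being discussed would
presumably amount to one added directive per failing test, along these lines
(placement and exact spelling assumed):

   /* { dg-additional-options "--param vect-single-lane-slp=0" } */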

> >
> > FWIW, this one sounds good to me (the default to 1 version).
> > I.e. mechanically add --param vect-single-lane-slp=0 to any tests
> > that fail with the new default.  That means that the test that need
> > fixing are easily greppable for anyone who wants to help.  Sometimes
> > it'll just be a test update.  Sometimes it will be new vectoriser code.
> 
> OK.  Meanwhile I figured the most important part is 2. from above
> since that enables the single-lane in a grouped access (also covering
> single element interleaving).  This will cover all problematical cases
> with respect to vectorizing loads and stores.  It also has less
> testsuite fallout, mainly because we have a lot less coverage for
> grouped stores without SLP.
> 
> So I'll see to produce a mergeable patch for part 2 and post that
> for review next week.

Sounds good!

Thanks for getting the ball rolling on this.
It would be useful to have it in trunk indeed, off by default for now
sounds good because then I can work on trunk for the SLP support
for early break as well.

Cheers,
Tamar

> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Richard
> >
> > > Or require a clean testsuite with
> > > --param vect-single-lane-slp defaulted to 1 but keep the --param
> > > for debugging (and allow FAILs with 0).
> > >
> > > For fun I merged just single-lane discovery of non-grouped stores
> > > and have that enabled by default.  On x86_64 this results in the
> > > set of FAILs below.
> > >
> > > Any suggestions?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > FAIL: gcc.dg/vect/O3-pr39675-2.c scan-tree-dump-times vect "vectorizing
> > > stmts using SLP" 1
> > > XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER
> > > LOOP VECTORIZED." 1
> > > FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect
> > > "Alignment of access forced using peeling" 2
> > > FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect
> > > "Vectorizing an unaligned acces

RE: Understanding bogus? gcc.dg/signbit-6.c

2024-10-02 Thread Tamar Christina via Gcc
> -Original Message-
> From: Georg-Johann Lay 
> Sent: Wednesday, October 2, 2024 3:06 PM
> To: GCC Development ; Tamar Christina
> 
> Subject: Re: Understanding bogus? gcc.dg/signbit-6.c
> 
> Am 02.10.24 um 15:55 schrieb Georg-Johann Lay:
> > I am having problems understanding test case gcc.dg/signbit-6.c
> > which fails on a 16-bit platform (avr).
> >
> > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/gcc.dg/signbit-
> 6.c;h=da186624cfa057dfc3780c8af4f6b1335ba07e7e;hb=HEAD
> >
> > The relevant part of the code is:
> >
> > int main ()
> > {
> >    TYPE a[N];
> >    TYPE b[N];
> >
> >    a[0] = INT_MIN;
> >    b[0] = INT_MIN;
> >
> >    for (int i = 1; i < N; ++i) ...
> >
> >    fun1 (a, N);
> >    fun2 (b, N);
> >
> >    if (DEBUG)
> >      printf ("%d = 0x%x == 0x%x\n", 0, a[0], b[0]);
> >
> >    if (a[0] != 0x0 || b[0] != -1)
> >      __builtin_abort ();   // <-- triggers
> >
> >
> > where TYPE=int32_t, and the error occurs for a[0] and b[0] so
> > the test can be compiled with -DN=1 and still fails.
> >
> > fun1() and fun2() have the same C code but different optimization level,
> > so how are a[0] and b[0] supposed to be different?
> >
> > __attribute__ ((noinline, noipa))
> > void fun1(TYPE *x, int n)
> > {
> >      for (int i = 0; i < n; i++)
> >    x[i] = (-x[i]) >> 31;
> > }
> >
> > __attribute__ ((noinline, noipa, optimize("O0")))
> > void fun2(TYPE *x, int n)
> > {
> >      for (int i = 0; i < n; i++)
> >    x[i] = (-x[i]) >> 31;
> > }
> >
> > IIUC the compiler may exploit that x and -x will always have different
> > sign bits because "- INT_MIN" is UB.  In fact, on x86_64 the test
> > passes, here with gcc v13.2:
> >
> > $ gcc signbit-6.c -O1 -o x.x -DN=1 && ./x.x
> > 0 = 0x0 == 0xffffffff
> >
> > With -fwrapv so that -INT_MIN is not more UB, it actually fails
> > due to the bad condition  "if (a[0] != 0x0 || b[0] != -1)" :
> >
> > $ gcc signbit-6.c -O1 -o x.x -DN=1 -fwrapv && ./x.x
> > 0 = 0xffffffff == 0xffffffff
> >
> > On avr (int=int16_t), the test aborts after printing "0 = 0x0 == 0x0".
> >
> >
> > So as far as I can see, that test has at least 3 bugs?
> >
> > 1) Missing -fwrapv so that -INT_MIN is no more UB, hence the
> > test would assert that a[0] == b[0].
> >
> > 2) When testing for a specific value of a[0] and b[0], it depends
> > on whether int == int32_t or smaller:  Better use INT32_MIN to
> > initialize a[0] and b[0] instead of just INT_MIN.
> >
> > 3) The test case has bogus printf format strings and should
> > use PRIx32 from inttypes.h instead of %x.
> >
> > With theses fixes, expected result for a[0] and b[0] is
> > (-INT32_MIN) >> 31  =  INT_MIN >> 31  =  INT32_C (-1)
> >
> > Johann
> 
> The test case also passes on avr when a[0] and b[0] are initialized
> to INT32_MIN (instead of INT_MIN), without -fwrapv, and fixing %x:
> 
> 0 = 0x0 == 0xffffffff
> 
> But IMO a test case should not rely on any specific behavior of
> code that is UB.
> 
> FYI, there is also gcc.dg/signbit-4.c that comes with -fwrapv
> and doesn't assume a specific result for a[0], b[0] and just
> that a[0] == b[0] (also uses INT_MIN for init instead of INT32_MIN).
> 

There are multiple tests because -fwrapv is not required to perform the
optimization.  Having a target fail on the small test without -fwrapv can
save someone time compared to tracking down a larger failing program.

The INT_MIN case was added because it was a concern during review.

Thanks,
Tamar

> Johann


RE: Understanding bogus? gcc.dg/signbit-6.c

2024-10-02 Thread Tamar Christina via Gcc
> -Original Message-
> From: Georg-Johann Lay 
> Sent: Wednesday, October 2, 2024 2:55 PM
> To: GCC Development ; Tamar Christina
> 
> Subject: Understanding bogus? gcc.dg/signbit-6.c
> 
> I am having problems understanding test case gcc.dg/signbit-6.c
> which fails on a 16-bit platform (avr).
> 
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/gcc.dg/signbit-
> 6.c;h=da186624cfa057dfc3780c8af4f6b1335ba07e7e;hb=HEAD
> 
> The relevant part of the code is:
> 
> int main ()
> {
>TYPE a[N];
>TYPE b[N];
> 
>a[0] = INT_MIN;
>b[0] = INT_MIN;
> 
>for (int i = 1; i < N; ++i) ...
> 
>fun1 (a, N);
>fun2 (b, N);
> 
>if (DEBUG)
>  printf ("%d = 0x%x == 0x%x\n", 0, a[0], b[0]);
> 
>if (a[0] != 0x0 || b[0] != -1)
>  __builtin_abort ();   // <-- triggers
> 
> 
> where TYPE=int32_t, and the error occurs for a[0] and b[0] so
> the test can be compiled with -DN=1 and still fails.
> 
> fun1() and fun2() have the same C code but different optimization level,
> so how are a[0] and b[0] supposed to be different?
> 
> __attribute__ ((noinline, noipa))
> void fun1(TYPE *x, int n)
> {
>  for (int i = 0; i < n; i++)
>x[i] = (-x[i]) >> 31;
> }
> 
> __attribute__ ((noinline, noipa, optimize("O0")))
> void fun2(TYPE *x, int n)
> {
>  for (int i = 0; i < n; i++)
>x[i] = (-x[i]) >> 31;
> }
> 
> IIUC the compiler may exploit that x and -x will always have different
> sign bits because "- INT_MIN" is UB.  In fact, on x86_64 the test
> passes, here with gcc v13.2:
> 
> $ gcc signbit-6.c -O1 -o x.x -DN=1 && ./x.x
> 0 = 0x0 == 0xffffffff
> 
> With -fwrapv so that -INT_MIN is not more UB, it actually fails
> due to the bad condition  "if (a[0] != 0x0 || b[0] != -1)" :
> 

Yes, because specifying -fwrapv changes the optimization and the
point of the test *is* to exploit UB.
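
For context, the fold these tests exercise is (as far as I understand the
test's intent) the negate-plus-arithmetic-shift transform, which is only
valid because signed overflow is UB; a sketch:

   #include <stdint.h>

   /* With -INT32_MIN undefined, (-x) >> 31 may be folded to -(x > 0):
      -1 when x > 0, otherwise 0.  For x == INT32_MIN the folded fun1
      therefore yields 0, while the -O0 fun2 computes the wrapped -1,
      exactly the pair of values the test checks.  */
   int32_t signbit_folded (int32_t x)
   {
     return -(x > 0);
   }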

> $ gcc signbit-6.c -O1 -o x.x -DN=1 -fwrapv && ./x.x
> 0 = 0xffffffff == 0xffffffff
> 
> On avr (int=int16_t), the test aborts after printing "0 = 0x0 == 0x0".
> 
> 
> So as far as I can see, that test has at least 3 bugs?
> 
> 1) Missing -fwrapv so that -INT_MIN is no more UB, hence the
> test would assert that a[0] == b[0].

That's not a bug.

> 
> 2) When testing for a specific value of a[0] and b[0], it depends
> on whether int == int32_t or smaller:  Better use INT32_MIN to
> initialize a[0] and b[0] instead of just INT_MIN.
> 
> 3) The test case has bogus printf format strings and should
> use PRIx32 from inttypes.h instead of %x.

These are, I suppose, for platforms that indeed have a 16-bit int.

Thanks,
Tamar

> 
> With theses fixes, expected result for a[0] and b[0] is
> (-INT32_MIN) >> 31  =  INT_MIN >> 31  =  INT32_C (-1)
> 
> Johann


RE: We need to remove the Sphinx HTML docs

2024-11-15 Thread Tamar Christina via Gcc
> -Original Message-
> From: Gcc  On Behalf Of
> Jonathan Wakely via Gcc
> Sent: Friday, November 15, 2024 12:25 PM
> To: Gerald Pfeifer 
> Cc: gcc Mailing List 
> Subject: Re: We need to remove the Sphinx HTML docs
> 
> On Fri, 15 Nov 2024 at 12:22, Jonathan Wakely  wrote:
> >
> > On Fri, 15 Nov 2024 at 12:14, Jonathan Wakely  wrote:
> > >
> > > Hi Gerald,
> > >
> > > The HTML pages from Martin Liska's Sphinx doc experiment are still
> > > online, and Google thinks they are the canonical locatiosn for GCC
> > > docs.
> > > e.g. try https://www.google.com/search?client=firefox-b-d&q=%22inline+function+is+as+fast+as+a+macro%22++gcc
> > >
> > > The only hit from gcc.gnu.org is a snapshot of GCC 13.0.0 from
> > > 20221114, two years old.
> > >
> > > I want to remove those docs. As nice as they look, the content is
> > > outdated and getting more so by the day. They're doing more harm than
> > > good now.
> > >
> > > It looks like we removed all the /onlinedocs/gcc/*.html pages from the
> > > Sphinx docs, but we did NOT remove the /oneline/docs/gcc/*/*.html
> > > pages, e.g.
> > > https://gcc.gnu.org/onlinedocs/gcc/language-standards-supported-by-gcc.html
> > > is 404
> > > but all the pages below that section are still there e.g.
> > > https://gcc.gnu.org/onlinedocs/gcc/language-standards-supported-by-gcc/
> > > JOINME c-language.html
> > > works (if you remove " XXX " from the URL, I don't want to link to it
> > > so I don't give web crawlers ideas).
> >
> > Oops, s/XXX/JOINME/ or vice versa. I hope you get the idea.
> >
> > section.html is not present, but section/subsection.html is present.
> >
> >
> > >
> > > On IRC mjw suggested that you (Gerald) might object to breaking links
> > > by just removing them. I think the pages are already broken (the links
> > > in the sidebar are half missing already).
> > >
> > > If we think preserving those links is important (I don't) then we
> > > could add this to htdocs/.htaccess:

FWIW, even though I was one of those who really liked and wanted this
documentation update to Sphinx, I agree with removing them, as they are just
confusing or misleading to users at this point.

I do hope that in the future we could try to modernize the documentations
again, using whatever format is best.

Cheers,
Tamar

> > >
> > > RedirectMatch permanent
> > > /onlinedocs/gcc/extensions-to-the-c-language-family/.*
> > > https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
> > > RedirectMatch permanent
> > > /onlinedocs/gcc/language-standards-supported-by-gcc/.*
> > > https://gcc.gnu.org/onlinedocs/gcc/Standards.html
> > > RedirectMatch permanent /onlinedocs/gcc/gcc-command-options/.*
> > > https://gcc.gnu.org/onlinedocs/gcc/Invoking-GCC.html
> > > RedirectMatch permanent
> > > /onlinedocs/gcc/c-implementation-defined-behavior/.*
> > > https://gcc.gnu.org/onlinedocs/gcc/C-Implementation.html
> > > RedirectMatch permanent
> > > /onlinedocs/gcc/c++-implementation-defined-behavior/.*
> > > https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Implementation.html
> > > RedirectMatch permanent
> > > /onlinedocs/gcc/extensions-to-the-c-language-family/.*
> > > https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
> > > RedirectMatch permanent
> > > /onlinedocs/gcc/extensions-to-the-c++-language/.*
> > > https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Extensions.html
> > > RedirectMatch permanent /onlinedocs/gcc/gnu-objective-c-features/.*
> > > https://gcc.gnu.org/onlinedocs/gcc/Objective-C.html
> > > RedirectMatch permanent /onlinedocs/gcc/gcov/.*
> > > https://gcc.gnu.org/onlinedocs/gcc/Gcov.html
> > > RedirectMatch permanent
> > > /onlinedocs/gcc/known-causes-of-trouble-with-gcc/.*
> > > https://gcc.gnu.org/onlinedocs/gcc/Trouble.html
> > >
> > > Can we just remove those pages instead?
> 
> All these directories should have been removed two years ago:
> 
> $ ls -1 -d htdocs/onlinedocs/gcc/*/
> htdocs/onlinedocs/gcc/c-implementation-defined-behavior/
> htdocs/onlinedocs/gcc/extensions-to-the-c++-language/
> htdocs/onlinedocs/gcc/extensions-to-the-c-language-family/
> htdocs/onlinedocs/gcc/gcc-command-options/
> htdocs/onlinedocs/gcc/gcov/
> htdocs/onlinedocs/gcc/gnu-objective-c-features/
> htdocs/onlinedocs/gcc/known-causes-of-trouble-with-gcc/
> htdocs/onlinedocs/gcc/language-standards-supported-by-gcc/
> htdocs/onlinedocs/gcc/_sources/
> htdocs/onlinedocs/gcc/_static/