Re: [PATCH] Fortran: fix passing of optional dummy as actual to optional argument [PR55978]
On 23/06/2024 at 22:58, Harald Anlauf wrote:

Dear all,

the attached patch fixes issues exhibited by the testcase in comment#19 of PR55978.

First, when passing an allocatable optional dummy array to an optional dummy, we need to prevent accessing the data component of the array when the argument is not present, and pass a null pointer instead. This is straightforward.

Second, the case of a missing pointer optional dummy array should have worked, but the presence check surprisingly did not work as expected at -O0 or -Og, while it did at higher optimization levels. Interestingly, the tree dump looked right, but running under gdb and inspecting the assembler revealed that the order of tests in a logical AND expression was the opposite of what the tree dump showed. Replacing TRUTH_AND_EXPR by TRUTH_ANDIF_EXPR and checking the optimized dump confirmed that this does fix the issue. Note that the tree dump is not changed by this replacement.

Does this mean that AND and ANDIF are currently not differentiated at this level?

tree-pretty-print.cc's op_symbol_code handles them as:

    case TRUTH_AND_EXPR:
    case TRUTH_ANDIF_EXPR:
      return "&&";

so no, I don't think they are differentiated.

Regtested on x86_64-pc-linux-gnu. OK for mainline? Would it be ok to backport this to 14-branch, too?

Sure, OK for both. Thanks.
Re: [PATCH] rs6000, change altivec*-runnable.c test file names
Hi, on 2024/6/22 00:15, Carl Love wrote:
> GCC maintainers:
>
> Per the discussion of the dg header changes for test files
> altivec-1-runnable.c and altivec-2-runnable.c, it was decided it would be best
> to rename the two tests to better align them with the related existing tests.
>
> This patch is dependent on the two patches to update the dg arguments for
> test files altivec-1-runnable.c and altivec-2-runnable.c being accepted and
> committed before this patch.
>
> The patch has been tested on Power 10 with no regression failures.
>
> Please let me know if this patch is acceptable for mainline. Thanks.

OK, thanks!

BR, Kewen

> Carl
>
> --
> rs6000, change altivec*-runnable.c test file names
>
> Changed the names of the test files.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/powerpc/altivec-1-runnable.c: Change the name to
> altivec-38.c.
> * gcc.target/powerpc/altivec-2-runnable.c: Change the name to
> p8vector-builtin-9.c.
> ---
> .../gcc.target/powerpc/{altivec-1-runnable.c => altivec-38.c} | 0
> .../powerpc/{altivec-2-runnable.c => p8vector-builtin-9.c} | 0
> 2 files changed, 0 insertions(+), 0 deletions(-)
> rename gcc/testsuite/gcc.target/powerpc/{altivec-1-runnable.c =>
> altivec-38.c} (100%)
> rename gcc/testsuite/gcc.target/powerpc/{altivec-2-runnable.c =>
> p8vector-builtin-9.c} (100%)
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> b/gcc/testsuite/gcc.target/powerpc/altivec-38.c
> similarity index 100%
> rename from gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> rename to gcc/testsuite/gcc.target/powerpc/altivec-38.c
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> b/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c
> similarity index 100%
> rename from gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> rename to gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c
Re: [PATCH] Fortran: fix passing of optional dummy as actual to optional argument [PR55978]
On Jun 24 2024, Mikael Morin wrote: > tree-pretty-print.cc's op_symbol_code handles them as: > > case TRUTH_AND_EXPR: > case TRUTH_ANDIF_EXPR: > return "&&"; > > so no, I don't think they are differentiated. Only because C does not have a TRUTH_AND_EXPR operator. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
Re: [PATCH 6/6] Add a late-combine pass [PR106594]
Richard Biener writes:
> On Sat, Jun 22, 2024 at 6:50 PM Richard Sandiford wrote:
>> The traditional (and IMO correct) way to handle this is to make the
>> pattern reserve the temporary registers that it needs, using
>> match_scratches. rs6000 has many examples of this. E.g.:
>>
>> (define_insn_and_split "@ieee_128bit_vsx_neg<mode>2"
>>   [(set (match_operand:IEEE128 0 "register_operand" "=wa")
>>         (neg:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa")))
>>    (clobber (match_scratch:V16QI 2 "=v"))]
>>   "TARGET_FLOAT128_TYPE && !TARGET_FLOAT128_HW"
>>   "#"
>>   "&& 1"
>>   [(parallel [(set (match_dup 0)
>>                    (neg:IEEE128 (match_dup 1)))
>>               (use (match_dup 2))])]
>> {
>>   if (GET_CODE (operands[2]) == SCRATCH)
>>     operands[2] = gen_reg_rtx (V16QImode);
>>
>>   emit_insn (gen_ieee_128bit_negative_zero (operands[2]));
>> }
>>   [(set_attr "length" "8")
>>    (set_attr "type" "vecsimple")])
>>
>> Before RA, this is just:
>>
>>   (set ...)
>>   (clobber (scratch:V16QI))
>>
>> and the split creates a new register. After RA, operand 2 provides
>> the required temporary register:
>>
>>   (set ...)
>>   (clobber (reg:V16QI TMP))
>>
>> Another approach is to add can_create_pseudo_p () to the define_insn
>> condition (rather than the split condition). But IMO that's an ICE
>> trap, since insns that have already been matched & accepted shouldn't
>> suddenly become invalid if recog is reattempted later.
>
> What about splitting immediately in late-combine? Wouldn't that possibly
> allow more combinations to immediately happen?

It would be difficult to guarantee termination. Often the split instructions can be immediately recombined back to the original instruction. Even if we guard against that happening directly, it'd be difficult to prove that it can't happen indirectly. We might also run into issues like PR101523.

Combine uses define_splits (without define_insns) for 3->2 combinations, but the current late-combine optimisation is kind-of 1/N+1->1 x N.
Personally, I think we should allow targets to use the .md file to define match.pd-style simplification rules involving unspecs, but there were objections to that when I last suggested it. Thanks, Richard
Ping^3 [PATCHv5] Optab: add isnormal_optab for __builtin_isnormal
Hi,

Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html

Thanks
Gui Haochen

On 2024/6/17 13:30, HAO CHEN GUI wrote:
> Hi,
> Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html
>
> Thanks
> Gui Haochen
>
> On 2024/6/3 10:37, HAO CHEN GUI wrote:
>> Hi,
>> All issues were addressed. Gently ping it.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html
>>
>> Thanks
>> Gui Haochen
>>
>> On 2024/5/29 14:36, HAO CHEN GUI wrote:
>>> Hi,
>>> This patch adds an optab for __builtin_isnormal. The normal check can be
>>> implemented on rs6000 by a single instruction. It needs an optab to be
>>> expanded to a certain sequence of instructions.
>>>
>>> The subsequent patches will implement the expand on rs6000.
>>>
>>> Compared to the previous version, the main change is to specify that the
>>> return value of the optab should be either 0 or 1.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652865.html
>>>
>>> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>>> regressions. Is this OK for trunk?
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> ChangeLog
>>> optab: Add isnormal_optab for isnormal builtin
>>>
>>> gcc/
>>> * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
>>> for isnormal builtin.
>>> * optabs.def (isnormal_optab): New.
>>> * doc/md.texi (isnormal): Document.
>>>
>>> patch.diff
>>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>>> index 53e9d210541..89ba56abf17 100644
>>> --- a/gcc/builtins.cc
>>> +++ b/gcc/builtins.cc
>>> @@ -2463,6 +2463,8 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>>>        builtin_optab = isfinite_optab;
>>>        break;
>>>      case BUILT_IN_ISNORMAL:
>>> +      builtin_optab = isnormal_optab;
>>> +      break;
>>>      CASE_FLT_FN (BUILT_IN_FINITE):
>>>      case BUILT_IN_FINITED32:
>>>      case BUILT_IN_FINITED64:
>>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>>> index 3eb4216141e..4fd7da095fe 100644
>>> --- a/gcc/doc/md.texi
>>> +++ b/gcc/doc/md.texi
>>> @@ -8563,6 +8563,12 @@ Return 1 if operand 1 is a finite floating point number and 0
>>> otherwise. @var{m} is a scalar floating point mode. Operand 0
>>> has mode @code{SImode}, and operand 1 has mode @var{m}.
>>>
>>> +@cindex @code{isnormal@var{m}2} instruction pattern
>>> +@item @samp{isnormal@var{m}2}
>>> +Return 1 if operand 1 is a normal floating point number and 0
>>> +otherwise. @var{m} is a scalar floating point mode. Operand 0
>>> +has mode @code{SImode}, and operand 1 has mode @var{m}.
>>> +
>>> @end table
>>>
>>> @end ifset
>>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>>> index dcd77315c2a..3c401fc0b4c 100644
>>> --- a/gcc/optabs.def
>>> +++ b/gcc/optabs.def
>>> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
>>> OPTAB_D (ilogb_optab, "ilogb$a2")
>>> OPTAB_D (isinf_optab, "isinf$a2")
>>> OPTAB_D (isfinite_optab, "isfinite$a2")
>>> +OPTAB_D (isnormal_optab, "isnormal$a2")
>>> OPTAB_D (issignaling_optab, "issignaling$a2")
>>> OPTAB_D (ldexp_optab, "ldexp$a3")
>>> OPTAB_D (log10_optab, "log10$a2")
[PATCH] rs6000: Fix wrong RTL patterns for vector merge high/low short on LE
Hi,

Commit r12-4496 changed some define_expands and define_insns for vector merge high/low short, namely altivec_vmrg[hl]h. These defines mainly serve the built-in functions vec_merge{h,l} and some internal gen function needs. These functions have to consider endianness. Taking vec_mergeh as an example, PVIPR defines it as "Merges the first halves (in element order) of two vectors", and it notes that this is in element order. So it maps to vmrghh on BE but vmrglh on LE. Although the mapped insns differ, as discussed in PR106069 the RTL pattern should still be the same; it was before commit r12-4496, but starting from that commit it became different patterns on BE and LE.

Similar to the 32-bit element case in the commit log of r15-1504, this 16-bit element pattern on LE doesn't actually match what the underlying insn is intended to represent; once some optimization like combine makes changes based on it, it can cause unexpected results. The newly constructed test case pr106069-2.c is a typical example of this issue for element type short.

So this patch fixes the wrong RTL pattern, ensuring the associated RTL patterns become the same as before and carry the same semantics as their mapped insns. With the proposed patch, expanders like altivec_vmrghh expand into altivec_vmrghh_direct_be or altivec_vmrglh_direct_le depending on endianness; "direct" makes it easy to see which insn is generated, and _be/_le reflect the different RTL patterns per endianness.

Following [1], this one is for the 16-bit vector element size. Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9 and P10. I'm going to push this two days later if no objections, thanks!

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655239.html

Co-authored-by: Xionghu Luo

PR target/106069
PR target/115355

gcc/ChangeLog:

* config/rs6000/altivec.md (altivec_vmrghh_direct): Rename to ...
(altivec_vmrghh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN. (altivec_vmrghh_direct_le): New define_insn. (altivec_vmrglh_direct): Rename to ... (altivec_vmrglh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN. (altivec_vmrglh_direct_le): New define_insn. (altivec_vmrghh): Adjust by calling gen_altivec_vmrghh_direct_be for BE and gen_altivec_vmrglh_direct_le for LE. (altivec_vmrglh): Adjust by calling gen_altivec_vmrglh_direct_be for BE and gen_altivec_vmrghh_direct_le for LE. (vec_widen_umult_hi_v16qi): Adjust the call to gen_altivec_vmrghh_direct by gen_altivec_vmrghh for BE and by gen_altivec_vmrglh for LE. (vec_widen_smult_hi_v16qi): Likewise. (vec_widen_umult_lo_v16qi): Adjust the call to gen_altivec_vmrglh_direct by gen_altivec_vmrglh for BE and by gen_altivec_vmrghh for LE. (vec_widen_smult_lo_v16qi): Likewise. * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghh_direct by CODE_FOR_altivec_vmrghh_direct_be for BE and CODE_FOR_altivec_vmrghh_direct_le for LE. And replace CODE_FOR_altivec_vmrglh_direct by CODE_FOR_altivec_vmrglh_direct_be for BE and CODE_FOR_altivec_vmrglh_direct_le for LE. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr106069-2.c: New test. --- gcc/config/rs6000/altivec.md | 76 +-- gcc/config/rs6000/rs6000.cc | 8 +- gcc/testsuite/gcc.target/powerpc/pr106069-2.c | 37 + 3 files changed, 94 insertions(+), 27 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069-2.c diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index a0e8a35b843..5af9bf920a2 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -1203,17 +1203,18 @@ (define_expand "altivec_vmrghh" (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghh_direct - : gen_altivec_vmrglh_direct; - if (!BYTES_BIG_ENDIAN) -std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) +emit_insn ( + gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2])); + else +emit_insn ( + gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrghh_direct" +(define_insn "altivec_vmrghh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") -(vec_select:V8HI + (vec_select:V8HI (vec_concat:V16HI (match_operand:V8HI 1 "register_operand" "v") (match_operand:V8HI 2 "register_operand" "v")) @@ -1221,7 +1222,21 @@ (define_insn "altivec_vmrghh_direct"
Re: [PATCH 10/52] jit: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
Hi Dave, May I ask if you still have some concerns on this patch with some replies to your previous questions? BR, Kewen on 2024/6/14 10:16, Kewen.Lin wrote: > Hi David, > > on 2024/6/13 21:44, David Malcolm wrote: >> On Sun, 2024-06-02 at 22:01 -0500, Kewen Lin wrote: >>> Joseph pointed out "floating types should have their mode, >>> not a poorly defined precision value" in the discussion[1], >>> as he and Richi suggested, the existing macros >>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a >>> hook mode_for_floating_type. Unlike the other FEs, for the >>> uses in recording::memento_of_get_type::get_size, since >>> {float,{,long_}double}_type_node haven't been initialized >>> yet, this is to replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE >>> with calling hook targetm.c.mode_for_floating_type. >>> >>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html >>> >>> gcc/jit/ChangeLog: >>> >>> * jit-recording.cc >>> (recording::memento_of_get_type::get_size): Update >>> macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling >>> targetm.c.mode_for_floating_type with >>> TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE. >>> --- >>> gcc/jit/jit-recording.cc | 12 >>> 1 file changed, 8 insertions(+), 4 deletions(-) >>> >>> diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc >>> index 68a2e860c1f..7719b898e57 100644 >>> --- a/gcc/jit/jit-recording.cc >>> +++ b/gcc/jit/jit-recording.cc >>> @@ -21,7 +21,7 @@ along with GCC; see the file COPYING3. 
If not see >>> #include "config.h" >>> #include "system.h" >>> #include "coretypes.h" >>> -#include "tm.h" >>> +#include "target.h" >>> #include "pretty-print.h" >>> #include "toplev.h" >>> >>> @@ -2353,6 +2353,7 @@ size_t >>> recording::memento_of_get_type::get_size () >>> { >>> int size; >>> + machine_mode m; >>> switch (m_kind) >>> { >>> case GCC_JIT_TYPE_VOID: >>> @@ -2399,13 +2400,16 @@ recording::memento_of_get_type::get_size () >>> size = 128; >>> break; >>> case GCC_JIT_TYPE_FLOAT: >>> - size = FLOAT_TYPE_SIZE; >>> + m = targetm.c.mode_for_floating_type (TI_FLOAT_TYPE); >>> + size = GET_MODE_PRECISION (m).to_constant (); >>> break; >>> case GCC_JIT_TYPE_DOUBLE: >>> - size = DOUBLE_TYPE_SIZE; >>> + m = targetm.c.mode_for_floating_type (TI_DOUBLE_TYPE); >>> + size = GET_MODE_PRECISION (m).to_constant (); >>> break; >>> case GCC_JIT_TYPE_LONG_DOUBLE: >>> - size = LONG_DOUBLE_TYPE_SIZE; >>> + m = targetm.c.mode_for_floating_type (TI_LONG_DOUBLE_TYPE); >>> + size = GET_MODE_PRECISION (m).to_constant (); >>> break; >>> case GCC_JIT_TYPE_SIZE_T: >>> size = MAX_BITS_PER_WORD; >> >> [CCing jit mailing list] >> >> Thanks for the patch; sorry for the delay in responding. >> >> Did your testing include jit? Note that --enable-languages=all does >> *not* include it (due to it needing --enable-host-shared). > > Thanks for the hints! Yes, as noted in the cover letter, I did test jit. > Initially I used TYPE_PRECISION ({float,{long_,}double_type_node) to > replace these just like what I proposed for the other FE changes, but the > testing showed some failures on test-combination.c etc., by looking into > them, I realized that this call recording::memento_of_get_type::get_size > can happen before when we set up those type nodes. Then I had to use the > current approach with the new hook, it made all failures gone (no > regressions). btw, test result comparison showed some more lines with > "NA->PASS: test-threads.c.exe", since it's positive, I didn't look into > it. 
> >> >> The jit::recording code runs *very* early - before toplev::main. For >> example, a call to gcc_jit_type_get_size can trigger the above code >> path before toplev::main has run. >> >> target.h says each target should have a: >> >> struct gcc_target targetm = TARGET_INITIALIZER; >> >> Has targetm.c.mode_for_floating_type been initialized enough by that >> static initialization? > > It depends on how to define "enough". The hook has been initialized > as you pointed out, I just debugged it and confirmed target specific > hook was called as expected (rs6000_c_mode_for_floating_type on Power) > when this jit::recording function gets called. If "enough" refers to > something like command line options, it's not ready. > >> Could the mode_for_floating_type hook be >> relying on some target-specific dynamic initialization that hasn't run >> yet? (e.g. taking account of command-line options?) >> > > Yes, it could. Like rs6000 port, the hook checks rs6000_long_double_type_size > for long double (it's related to command line option -mlong-double-x) and > some other targets like i386, also would like to check TARGET_LONG_DOUBLE_64 > and TARGET_LONG_DOUBLE_128. But I think it isn't worse than before, without > this change (with the previous macro), we used to define the macro with > the things related to this c
Re: [PATCH] Fix MinGW option -mcrtdll=
On 6/23/24 16:40, Pali Rohár wrote: Add missing msvcr40* and msvcrtd* cases to CPP_SPEC and document missing _UCRT macro and msvcr71* case. Fixes commit 453cb585f0f8673a5d69d1b420ffd4b3f53aca00. Thanks, pushed to master branch.
[PATCH] Implement devirtualize by typeid optimization pass
This patch tries to address the issue in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115413. It does so by adding another pass, devirtualize-typeid, which does the same thing as the value range propagation pass, except that:

* in the true branch of a conditional of the form `typeid (a) == typeid (b)`, it is assumed that the vptrs are equal;
* the constant folding is only performed for member function calls.

This must be done in a separate pass because it is desirable that accesses to the pointer's value (which is possible with the pmf extension of GCC) print out the correct pointer value.

Reviews are appreciated, in particular with respect to the following points:

* Currently only expressions written exactly `typeid (a) == typeid (b)` are detected, not expressions such as `auto &tmp = typeid (a); if (tmp == typeid (b)) ...`
* The second argument of `TYPEID_WITH_VPTR_TAG` is put in, even though evaluating it might trigger a segmentation fault when the pointer is NULL, only to be thrown away in the expansion pass. We must make sure that it never gets evaluated in the final code.

From dcfdcc86cb6ce9e9ec390f031c9ab1563506cdee Mon Sep 17 00:00:00 2001
From: user202729
Date: Mon, 24 Jun 2024 17:15:58 +0800
Subject: [PATCH] Implement devirtualize by typeid optimization pass

PR c++/115413

gcc/cp/ChangeLog:

* class.cc (build_vtbl_ref): Split the function into two sections, one to get the vtbl and one to get the element.
(build_vtbl_element_from_vtbl_ref): Split off from the above.
* cp-tree.h (build_vtbl_ref): Modify declaration accordingly.
(build_vtbl_element_from_vtbl_ref): Likewise.
* rtti.cc (get_tinfo_decl_dynamic): Now also return the vptr of the type.
(tag_typeid_with_vptr): New function.
(build_typeid): Modify to tag result with vptr.
(get_vtable_ptr): New function.
(get_typeid): Modify to tag result with vptr.
(tinfo_base_init): Part moved to get_vtable_ptr.
* typeck.cc (build_x_binary_op): Detect comparison of typeid.
gcc/ChangeLog: * gimple-range-cache.cc (ranger_cache::ranger_cache): Add flag for devirtualize-typeid pass. * gimple-range-cache.h (class ranger_cache): Likewise. * gimple-range-gori.cc (gori_map::gori_map): Likewise. (gori_compute::gori_compute): Likewise. * gimple-range-gori.h (class gori_map): Likewise. (class gori_compute): Likewise. * gimple-range.cc (gimple_ranger::gimple_ranger): Likewise. (enable_ranger): Likewise. * gimple-range.h (enable_ranger): Likewise. * internal-fn.cc (expand_TYPEID_WITH_VPTR_TAG): New function. (expand_VPTR_EQUIV): New function. * internal-fn.def (TYPEID_WITH_VPTR_TAG): New internal function for devirtualize-typeid pass. (VPTR_EQUIV): Likewise. * internal-fn.h (expand_TYPEID_WITH_VPTR_TAG): New function declaration. (expand_VPTR_EQUIV): New function declaration. * passes.def (pass_devirtualize_by_typeid): New pass. * timevar.def (TV_TREE_DEVIRT_TYPEID): Likewise. * tree-pass.h (make_pass_devirtualize_by_typeid): Likewise. * tree-vrp.cc (execute_ranger_vrp): Add flag for devirtualize-typeid pass. (make_pass_devirtualize_by_typeid): New function to create the devirtualize-typeid pass info. * value-pointer-equiv.h (class pointer_equiv_analyzer): Add flag for devirtualize-typeid pass. * value-pointer-equiv.cc (pointer_equiv_analyzer::pointer_equiv_analyzer): Likewise. (pointer_equiv_analyzer::visit_edge): Specially handle vptr equivalence. * value-query.cc (range_query::create_gori): Add flag for devirtualize-typeid pass. * value-query.h (range_query::create_gori): Modify declaration accordingly. gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/devirt-typeid.C: New test. 
--- gcc/cp/class.cc | 30 +-- gcc/cp/cp-tree.h | 2 + gcc/cp/rtti.cc| 87 +++ gcc/cp/typeck.cc | 36 gcc/gimple-range-cache.cc | 6 +- gcc/gimple-range-cache.h | 3 +- gcc/gimple-range-gori.cc | 10 ++- gcc/gimple-range-gori.h | 6 +- gcc/gimple-range.cc | 9 +- gcc/gimple-range.h| 5 +- gcc/internal-fn.cc| 22 + gcc/internal-fn.def | 4 + gcc/internal-fn.h | 2 + gcc/passes.def| 1 + gcc/testsuite/g++.dg/tree-ssa/devirt-typeid.C | 56 gcc/timevar.def | 1 + gcc/tree-pass.h | 1 + gcc/tree-vrp.cc | 37 ++-- gcc/value-pointer-equiv.cc| 26 +- gcc/value-pointer-equiv.h | 3 +- gcc/value-query.cc| 8 +- gcc/value-query.h | 3 +- 22 files changed, 307 insertions(+), 51 deletions(-) create mode 100644 gcc/testsuite/g++.d
[r15-1575 Regression] FAIL: gcc.target/i386/pr101716.c scan-assembler-not movl[\\t ][^\\n]*eax on Linux/x86_64
On Linux/x86_64, ea8061f46a301797e7ba33b52e3b4713fb8e6b48 is the first bad commit.

commit ea8061f46a301797e7ba33b52e3b4713fb8e6b48
Author: Haochen Gui
Date: Mon Jun 24 13:12:51 2024 +0800

    fwprop: invoke change_is_worthwhile to judge if a replacement is worthwhile

caused

FAIL: gcc.target/i386/pr101716.c scan-assembler leal[\\t ][^\\n]*eax
FAIL: gcc.target/i386/pr101716.c scan-assembler-not movl[\\t ][^\\n]*eax

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-1575/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101716.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101716.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me at haochen dot jiang at intel.com.)
(If you meet problems related to cascadelake, disabling AVX512F on the command line might help.)
(However, please make sure that there are no potential problems with AVX512.)
[PATCH] Add -finline-functions-aggressive option [PR114531]
From: Rama Malladi Signed-off-by: Rama Malladi --- gcc/common.opt | 5 + gcc/doc/invoke.texi | 18 +- gcc/opts.cc | 17 - 3 files changed, 30 insertions(+), 10 deletions(-) diff --git a/gcc/common.opt b/gcc/common.opt index f2bc47fdc5e..ce95175c1e4 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1961,6 +1961,11 @@ finline-functions-called-once Common Var(flag_inline_functions_called_once) Optimization Integrate functions only required by their single caller. +finline-functions-aggressive +Common Var(flag_inline_functions_aggressive) Init(0) Optimization +Aggressively integrate functions not declared \"inline\" into their callers when profitable. +This option selects the same inlining heuristics as \"-O3\". + finline-limit- Common RejectNegative Joined Alias(finline-limit=) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index c790e2f3518..7dc5c5ab433 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -570,8 +570,8 @@ Objective-C and Objective-C++ Dialects}. -fgcse-sm -fhoist-adjacent-loads -fif-conversion -fif-conversion2 -findirect-inlining -finline-stringops[=@var{fn}] --finline-functions -finline-functions-called-once -finline-limit=@var{n} --finline-small-functions -fipa-modref -fipa-cp -fipa-cp-clone +-finline-functions -finline-functions-aggressive -finline-functions-called-once +-finline-limit=@var{n} -finline-small-functions -fipa-modref -fipa-cp -fipa-cp-clone -fipa-bit-cp -fipa-vrp -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference -fipa-reference-addressable -fipa-stack-alignment -fipa-icf -fira-algorithm=@var{algorithm} @@ -12625,9 +12625,9 @@ designed to reduce code size. Disregard strict standards compliance. @option{-Ofast} enables all @option{-O3} optimizations. It also enables optimizations that are not valid for all standard-compliant programs. 
-It turns on @option{-ffast-math}, @option{-fallow-store-data-races} -and the Fortran-specific @option{-fstack-arrays}, unless -@option{-fmax-stack-var-size} is specified, and @option{-fno-protect-parens}. +It turns on @option{-ffast-math}, @option{-finline-functions-aggressive}, +@option{-fallow-store-data-races} and the Fortran-specific @option{-fstack-arrays}, +unless @option{-fmax-stack-var-size} is specified, and @option{-fno-protect-parens}. It turns off @option{-fsemantic-interposition}. @opindex Og @@ -12793,6 +12793,14 @@ assembler code in its own right. Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}. Also enabled by @option{-fprofile-use} and @option{-fauto-profile}. +@opindex finline-functions-aggressive +@item -finline-functions-aggressive +Aggressively integrate functions not declared @code{inline} into their callers when +profitable. This option selects the same inlining heuristics as @option{-O3}. + +Enabled at levels @option{-O3}, @option{-Ofast}, but not @option{-Og}, +@option{-O1}, @option{-O2}, @option{-Os}. + @opindex finline-functions-called-once @item -finline-functions-called-once Consider all @code{static} functions called once for inlining into their diff --git a/gcc/opts.cc b/gcc/opts.cc index 1b1b46455af..729f2831e67 100644 --- a/gcc/opts.cc +++ b/gcc/opts.cc @@ -700,11 +700,7 @@ static const struct default_options default_options_table[] = { OPT_LEVELS_3_PLUS, OPT_fversion_loops_for_strides, NULL, 1 }, /* -O3 parameters. */ -{ OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 }, -{ OPT_LEVELS_3_PLUS, OPT__param_early_inlining_insns_, NULL, 14 }, -{ OPT_LEVELS_3_PLUS, OPT__param_inline_heuristics_hint_percent_, NULL, 600 }, -{ OPT_LEVELS_3_PLUS, OPT__param_inline_min_speedup_, NULL, 15 }, -{ OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_single_, NULL, 200 }, +{ OPT_LEVELS_3_PLUS, OPT_finline_functions_aggressive, NULL, 1 }, /* -Ofast adds optimizations to -O3. 
*/ { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 }, @@ -3037,6 +3033,17 @@ common_handle_option (struct gcc_options *opts, value / 2); break; +case OPT_finline_functions_aggressive: + if(opts->x_flag_inline_functions_aggressive) + { + opts->x_param_max_inline_insns_auto = 30; + opts->x_param_early_inlining_insns = 14; + opts->x_param_inline_heuristics_hint_percent = 600; + opts->x_param_inline_min_speedup = 15; + opts->x_param_max_inline_insns_single = 200; + } + break; + case OPT_finstrument_functions_exclude_function_list_: add_comma_separated_to_vector (&opts->x_flag_instrument_functions_exclude_functions, arg); -- 2.45.1
Re: Ping^3 [PATCHv5] Optab: add isfinite_optab for __builtin_isfinite
On Mon, Jun 24, 2024 at 3:38 AM HAO CHEN GUI wrote: > > Hi, > Gently ping it. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652991.html OK > Thanks > Gui Haochen > > 在 2024/6/17 13:29, HAO CHEN GUI 写道: > > Hi, > > Gently ping it. > > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652991.html > > > > Thanks > > Gui Haochen > > > > 在 2024/6/3 10:37, HAO CHEN GUI 写道: > >> Hi, > >> All issues were addressed. Gently ping it. > >> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652991.html > >> > >> Thanks > >> Gui Haochen > >> > >> 在 2024/5/29 14:36, HAO CHEN GUI 写道: > >>> Hi, > >>> This patch adds an optab for __builtin_isfinite. The finite check can be > >>> implemented on rs6000 by a single instruction. It needs an optab to be > >>> expanded to the certain sequence of instructions. > >>> > >>> The subsequent patches will implement the expand on rs6000. > >>> > >>> Compared to previous version, the main change is to specify return > >>> value of the optab should be either 0 or 1. > >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652864.html > >>> > >>> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > >>> regressions. Is this OK for trunk? > >>> > >>> Thanks > >>> Gui Haochen > >>> > >>> ChangeLog > >>> optab: Add isfinite_optab for isfinite builtin > >>> > >>> gcc/ > >>> * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab > >>> for isfinite builtin. > >>> * optabs.def (isfinite_optab): New. > >>> * doc/md.texi (isfinite): Document. 
> >>> > >>> > >>> patch.diff > >>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc > >>> index f8d94c4b435..53e9d210541 100644 > >>> --- a/gcc/builtins.cc > >>> +++ b/gcc/builtins.cc > >>> @@ -2459,8 +2459,10 @@ interclass_mathfn_icode (tree arg, tree fndecl) > >>>errno_set = true; builtin_optab = ilogb_optab; break; > >>> CASE_FLT_FN (BUILT_IN_ISINF): > >>>builtin_optab = isinf_optab; break; > >>> -case BUILT_IN_ISNORMAL: > >>> case BUILT_IN_ISFINITE: > >>> + builtin_optab = isfinite_optab; > >>> + break; > >>> +case BUILT_IN_ISNORMAL: > >>> CASE_FLT_FN (BUILT_IN_FINITE): > >>> case BUILT_IN_FINITED32: > >>> case BUILT_IN_FINITED64: > >>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > >>> index 5730bda80dc..3eb4216141e 100644 > >>> --- a/gcc/doc/md.texi > >>> +++ b/gcc/doc/md.texi > >>> @@ -8557,6 +8557,12 @@ operand 2, greater than operand 2 or is unordered > >>> with operand 2. > >>> > >>> This pattern is not allowed to @code{FAIL}. > >>> > >>> +@cindex @code{isfinite@var{m}2} instruction pattern > >>> +@item @samp{isfinite@var{m}2} > >>> +Return 1 if operand 1 is a finite floating point number and 0 > >>> +otherwise. @var{m} is a scalar floating point mode. Operand 0 > >>> +has mode @code{SImode}, and operand 1 has mode @var{m}. > >>> + > >>> @end table > >>> > >>> @end ifset > >>> diff --git a/gcc/optabs.def b/gcc/optabs.def > >>> index ad14f9328b9..dcd77315c2a 100644 > >>> --- a/gcc/optabs.def > >>> +++ b/gcc/optabs.def > >>> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3") > >>> OPTAB_D (hypot_optab, "hypot$a3") > >>> OPTAB_D (ilogb_optab, "ilogb$a2") > >>> OPTAB_D (isinf_optab, "isinf$a2") > >>> +OPTAB_D (isfinite_optab, "isfinite$a2") > >>> OPTAB_D (issignaling_optab, "issignaling$a2") > >>> OPTAB_D (ldexp_optab, "ldexp$a3") > >>> OPTAB_D (log10_optab, "log10$a2")
Re: Ping^3 [PATCHv5] Optab: add isnormal_optab for __builtin_isnormal
On Mon, Jun 24, 2024 at 3:39 AM HAO CHEN GUI wrote: > > Hi, > Gently ping it. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html OK > Thanks > Gui Haochen > > 在 2024/6/17 13:30, HAO CHEN GUI 写道: > > Hi, > > Gently ping it. > > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html > > > > Thanks > > Gui Haochen > > > > 在 2024/6/3 10:37, HAO CHEN GUI 写道: > >> Hi, > >> All issues were addressed. Gently ping it. > >> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html > >> > >> Thanks > >> Gui Haochen > >> > >> > >> 在 2024/5/29 14:36, HAO CHEN GUI 写道: > >>> Hi, > >>> This patch adds an optab for __builtin_isnormal. The normal check can be > >>> implemented on rs6000 by a single instruction. It needs an optab to be > >>> expanded to the certain sequence of instructions. > >>> > >>> The subsequent patches will implement the expand on rs6000. > >>> > >>> Compared to previous version, the main change is to specify return > >>> value of the optab should be either 0 or 1. > >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652865.html > >>> > >>> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > >>> regressions. Is this OK for trunk? > >>> > >>> Thanks > >>> Gui Haochen > >>> > >>> ChangeLog > >>> optab: Add isnormal_optab for isnormal builtin > >>> > >>> gcc/ > >>> * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab > >>> for isnormal builtin. > >>> * optabs.def (isnormal_optab): New. > >>> * doc/md.texi (isnormal): Document. 
> >>> > >>> > >>> patch.diff > >>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc > >>> index 53e9d210541..89ba56abf17 100644 > >>> --- a/gcc/builtins.cc > >>> +++ b/gcc/builtins.cc > >>> @@ -2463,6 +2463,8 @@ interclass_mathfn_icode (tree arg, tree fndecl) > >>>builtin_optab = isfinite_optab; > >>>break; > >>> case BUILT_IN_ISNORMAL: > >>> + builtin_optab = isnormal_optab; > >>> + break; > >>> CASE_FLT_FN (BUILT_IN_FINITE): > >>> case BUILT_IN_FINITED32: > >>> case BUILT_IN_FINITED64: > >>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > >>> index 3eb4216141e..4fd7da095fe 100644 > >>> --- a/gcc/doc/md.texi > >>> +++ b/gcc/doc/md.texi > >>> @@ -8563,6 +8563,12 @@ Return 1 if operand 1 is a finite floating point > >>> number and 0 > >>> otherwise. @var{m} is a scalar floating point mode. Operand 0 > >>> has mode @code{SImode}, and operand 1 has mode @var{m}. > >>> > >>> +@cindex @code{isnormal@var{m}2} instruction pattern > >>> +@item @samp{isnormal@var{m}2} > >>> +Return 1 if operand 1 is a normal floating point number and 0 > >>> +otherwise. @var{m} is a scalar floating point mode. Operand 0 > >>> +has mode @code{SImode}, and operand 1 has mode @var{m}. > >>> + > >>> @end table > >>> > >>> @end ifset > >>> diff --git a/gcc/optabs.def b/gcc/optabs.def > >>> index dcd77315c2a..3c401fc0b4c 100644 > >>> --- a/gcc/optabs.def > >>> +++ b/gcc/optabs.def > >>> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3") > >>> OPTAB_D (ilogb_optab, "ilogb$a2") > >>> OPTAB_D (isinf_optab, "isinf$a2") > >>> OPTAB_D (isfinite_optab, "isfinite$a2") > >>> +OPTAB_D (isnormal_optab, "isnormal$a2") > >>> OPTAB_D (issignaling_optab, "issignaling$a2") > >>> OPTAB_D (ldexp_optab, "ldexp$a3") > >>> OPTAB_D (log10_optab, "log10$a2")
Re: [PATCH 6/6] Add a late-combine pass [PR106594]
On Mon, Jun 24, 2024 at 10:03 AM Richard Sandiford wrote: > > Richard Biener writes: > > On Sat, Jun 22, 2024 at 6:50 PM Richard Sandiford > >> The traditional (and IMO correct) way to handle this is to make the > >> pattern reserve the temporary registers that it needs, using > >> match_scratches. > >> rs6000 has many examples of this. E.g.: > >> > >> (define_insn_and_split "@ieee_128bit_vsx_neg2" > >> [(set (match_operand:IEEE128 0 "register_operand" "=wa") > >> (neg:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa"))) > >>(clobber (match_scratch:V16QI 2 "=v"))] > >> "TARGET_FLOAT128_TYPE && !TARGET_FLOAT128_HW" > >> "#" > >> "&& 1" > >> [(parallel [(set (match_dup 0) > >>(neg:IEEE128 (match_dup 1))) > >> (use (match_dup 2))])] > >> { > >> if (GET_CODE (operands[2]) == SCRATCH) > >> operands[2] = gen_reg_rtx (V16QImode); > >> > >> emit_insn (gen_ieee_128bit_negative_zero (operands[2])); > >> } > >> [(set_attr "length" "8") > >>(set_attr "type" "vecsimple")]) > >> > >> Before RA, this is just: > >> > >> (set ...) > >> (clobber (scratch:V16QI)) > >> > >> and the split creates a new register. After RA, operand 2 provides > >> the required temporary register: > >> > >> (set ...) > >> (clobber (reg:V16QI TMP)) > >> > >> Another approach is to add can_create_pseudo_p () to the define_insn > >> condition (rather than the split condition). But IMO that's an ICE > >> trap, since insns that have already been matched & accepted shouldn't > >> suddenly become invalid if recog is reattempted later. > > > > What about splitting immediately in late-combine? Wouldn't that possibly > > allow more combinations to immediately happen? > > It would be difficult to guarantee termination. Often the split > instructions can be immediately recombined back to the original > instruction. Even if we guard against that happening directly, > it'd be difficult to prove that it can't happen indirectly. > > We might also run into issues like PR101523. 
> > Combine uses define_splits (without define_insns) for 3->2 combinations, > but the current late-combine optimisation is kind-of 1/N+1->1 x N. > > Personally, I think we should allow targets to use the .md file to > define match.pd-style simplification rules involving unspecs, but there > were objections to that when I last suggested it. Isn't that what basically "combine-helper" patterns do to some extent? Richard. > > Thanks, > Richard
Re: [PATCH] Add -finline-functions-aggressive option [PR114531]
On Mon, Jun 24, 2024 at 1:18 PM Malladi, Rama wrote: > > From: Rama Malladi Hmm, if we offer the ability to set -O3 inline limits why wouldn't we offer a way to set -O2 inline limits for example with -O3? So ... wouldn't a -finline-limit={default,O2,O3} option be a more generic and extensible way to achieve what the patch does? Yeah, it conflicts somewhat with the existing -finline-limit[-=] flags, so possibly another name (-finline-as=O3?) is needed. Richard. > Signed-off-by: Rama Malladi > --- > gcc/common.opt | 5 + > gcc/doc/invoke.texi | 18 +- > gcc/opts.cc | 17 - > 3 files changed, 30 insertions(+), 10 deletions(-) > > diff --git a/gcc/common.opt b/gcc/common.opt > index f2bc47fdc5e..ce95175c1e4 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -1961,6 +1961,11 @@ finline-functions-called-once > Common Var(flag_inline_functions_called_once) Optimization > Integrate functions only required by their single caller. > > +finline-functions-aggressive > +Common Var(flag_inline_functions_aggressive) Init(0) Optimization > +Aggressively integrate functions not declared \"inline\" into their callers > when profitable. > +This option selects the same inlining heuristics as \"-O3\". > + > finline-limit- > Common RejectNegative Joined Alias(finline-limit=) > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index c790e2f3518..7dc5c5ab433 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -570,8 +570,8 @@ Objective-C and Objective-C++ Dialects}. 
> -fgcse-sm -fhoist-adjacent-loads -fif-conversion > -fif-conversion2 -findirect-inlining > -finline-stringops[=@var{fn}] > --finline-functions -finline-functions-called-once -finline-limit=@var{n} > --finline-small-functions -fipa-modref -fipa-cp -fipa-cp-clone > +-finline-functions -finline-functions-aggressive > -finline-functions-called-once > +-finline-limit=@var{n} -finline-small-functions -fipa-modref -fipa-cp > -fipa-cp-clone > -fipa-bit-cp -fipa-vrp -fipa-pta -fipa-profile -fipa-pure-const > -fipa-reference -fipa-reference-addressable > -fipa-stack-alignment -fipa-icf -fira-algorithm=@var{algorithm} > @@ -12625,9 +12625,9 @@ designed to reduce code size. > Disregard strict standards compliance. @option{-Ofast} enables all > @option{-O3} optimizations. It also enables optimizations that are not > valid for all standard-compliant programs. > -It turns on @option{-ffast-math}, @option{-fallow-store-data-races} > -and the Fortran-specific @option{-fstack-arrays}, unless > -@option{-fmax-stack-var-size} is specified, and @option{-fno-protect-parens}. > +It turns on @option{-ffast-math}, @option{-finline-functions-aggressive}, > +@option{-fallow-store-data-races} and the Fortran-specific > @option{-fstack-arrays}, > +unless @option{-fmax-stack-var-size} is specified, and > @option{-fno-protect-parens}. > It turns off @option{-fsemantic-interposition}. > > @opindex Og > @@ -12793,6 +12793,14 @@ assembler code in its own right. > Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}. Also enabled > by @option{-fprofile-use} and @option{-fauto-profile}. > > +@opindex finline-functions-aggressive > +@item -finline-functions-aggressive > +Aggressively integrate functions not declared @code{inline} into their > callers when > +profitable. This option selects the same inlining heuristics as @option{-O3}. > + > +Enabled at levels @option{-O3}, @option{-Ofast}, but not @option{-Og}, > +@option{-O1}, @option{-O2}, @option{-Os}. 
> + > @opindex finline-functions-called-once > @item -finline-functions-called-once > Consider all @code{static} functions called once for inlining into their > diff --git a/gcc/opts.cc b/gcc/opts.cc > index 1b1b46455af..729f2831e67 100644 > --- a/gcc/opts.cc > +++ b/gcc/opts.cc > @@ -700,11 +700,7 @@ static const struct default_options > default_options_table[] = > { OPT_LEVELS_3_PLUS, OPT_fversion_loops_for_strides, NULL, 1 }, > > /* -O3 parameters. */ > -{ OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 }, > -{ OPT_LEVELS_3_PLUS, OPT__param_early_inlining_insns_, NULL, 14 }, > -{ OPT_LEVELS_3_PLUS, OPT__param_inline_heuristics_hint_percent_, NULL, > 600 }, > -{ OPT_LEVELS_3_PLUS, OPT__param_inline_min_speedup_, NULL, 15 }, > -{ OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_single_, NULL, 200 }, > +{ OPT_LEVELS_3_PLUS, OPT_finline_functions_aggressive, NULL, 1 }, > > /* -Ofast adds optimizations to -O3. */ > { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 }, > @@ -3037,6 +3033,17 @@ common_handle_option (struct gcc_options *opts, >value / 2); >break; > > +case OPT_finline_functions_aggressive: > + if(opts->x_flag_inline_functions_aggressive) > + { > + opts->x_param_max_inline_insns_auto = 30; > + opts->x_param_early_inlining_insns = 14; > + opts->x_param_inline_heuristics_hint_percent = 600; > +
Re: [PATCH 6/6] Add a late-combine pass [PR106594]
Richard Biener writes: > On Mon, Jun 24, 2024 at 10:03 AM Richard Sandiford > wrote: >> >> Richard Biener writes: >> > On Sat, Jun 22, 2024 at 6:50 PM Richard Sandiford >> >> The traditional (and IMO correct) way to handle this is to make the >> >> pattern reserve the temporary registers that it needs, using >> >> match_scratches. >> >> rs6000 has many examples of this. E.g.: >> >> >> >> (define_insn_and_split "@ieee_128bit_vsx_neg2" >> >> [(set (match_operand:IEEE128 0 "register_operand" "=wa") >> >> (neg:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa"))) >> >>(clobber (match_scratch:V16QI 2 "=v"))] >> >> "TARGET_FLOAT128_TYPE && !TARGET_FLOAT128_HW" >> >> "#" >> >> "&& 1" >> >> [(parallel [(set (match_dup 0) >> >>(neg:IEEE128 (match_dup 1))) >> >> (use (match_dup 2))])] >> >> { >> >> if (GET_CODE (operands[2]) == SCRATCH) >> >> operands[2] = gen_reg_rtx (V16QImode); >> >> >> >> emit_insn (gen_ieee_128bit_negative_zero (operands[2])); >> >> } >> >> [(set_attr "length" "8") >> >>(set_attr "type" "vecsimple")]) >> >> >> >> Before RA, this is just: >> >> >> >> (set ...) >> >> (clobber (scratch:V16QI)) >> >> >> >> and the split creates a new register. After RA, operand 2 provides >> >> the required temporary register: >> >> >> >> (set ...) >> >> (clobber (reg:V16QI TMP)) >> >> >> >> Another approach is to add can_create_pseudo_p () to the define_insn >> >> condition (rather than the split condition). But IMO that's an ICE >> >> trap, since insns that have already been matched & accepted shouldn't >> >> suddenly become invalid if recog is reattempted later. >> > >> > What about splitting immediately in late-combine? Wouldn't that possibly >> > allow more combinations to immediately happen? >> >> It would be difficult to guarantee termination. Often the split >> instructions can be immediately recombined back to the original >> instruction. Even if we guard against that happening directly, >> it'd be difficult to prove that it can't happen indirectly. 
>> >> We might also run into issues like PR101523. >> >> Combine uses define_splits (without define_insns) for 3->2 combinations, >> but the current late-combine optimisation is kind-of 1/N+1->1 x N. >> >> Personally, I think we should allow targets to use the .md file to >> define match.pd-style simplification rules involving unspecs, but there >> were objections to that when I last suggested it. > > Isn't that what basically "combine-helper" patterns do to some extent? Partly, but: (1) It's a big hammer. It means we add all the overhead of a define_insn for something that is only meant to survive between one pass and the next. (2) Unlike match.pd, it isn't designed to be applied iteratively. There is no attempt even in theory to ensure that match helper -> split -> match helper -> split -> ... would terminate. (3) It operates at the level of complete instructions, including e.g. destinations of sets. The kind of rule I had in mind would be aimed at arithmetic simplification, and would operate at the simplify-rtx.cc level. That is, if simplify_foo failed to apply a target-independent rule, it could fall back on an automatically generated target-specific rule, with the requirement/understanding that these rules really should be target-specific. One easy way of enforcing that is to say that at least one side of a production rule must involve an unspec. Richard
[PATCH] tree-optimization/115602 - SLP CSE results in cycles
The following prevents SLP CSE from creating new cycles, which happened because of a 1:1 permute node being present where its child was then CSEd to the permute node. Fixed by making a node available to CSE only after recursing. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/115602 * tree-vect-slp.cc (vect_cse_slp_nodes): Delay populating the bst-map to avoid cycles. * gcc.dg/vect/pr115602.c: New testcase. --- gcc/testsuite/gcc.dg/vect/pr115602.c | 27 +++ gcc/tree-vect-slp.cc | 33 ++-- 2 files changed, 48 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/pr115602.c diff --git a/gcc/testsuite/gcc.dg/vect/pr115602.c b/gcc/testsuite/gcc.dg/vect/pr115602.c new file mode 100644 index 000..9a208d1d950 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr115602.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ + +typedef struct { + double x, y; +} pointf; +struct { + pointf focus; + double zoom; + pointf devscale; + char button; + pointf oldpointer; +} gvevent_motion_job; +char gvevent_motion_job_4; +double gvevent_motion_pointer_1, gvevent_motion_pointer_0; +void gvevent_motion() { + double dx = (gvevent_motion_pointer_0 - gvevent_motion_job.oldpointer.x) / + gvevent_motion_job.devscale.x, + dy = (gvevent_motion_pointer_1 - gvevent_motion_job.oldpointer.y) / + gvevent_motion_job.devscale.y; + if (dx && dy < .0001) +return; + switch (gvevent_motion_job_4) + case 2: { +gvevent_motion_job.focus.x -= dy / gvevent_motion_job.zoom; +gvevent_motion_job.focus.y += dx / gvevent_motion_job.zoom; + } +} diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 1d4f6089cfe..bb70a3fa5c2 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -6382,35 +6382,44 @@ vect_optimize_slp_pass::run () static void vect_cse_slp_nodes (scalar_stmts_to_slp_tree_map_t *bst_map, slp_tree& node) { + bool put_p = false; if (SLP_TREE_DEF_TYPE (node) == vect_internal_def /* Besides some VEC_PERM_EXPR, two-operator nodes also lack scalar stmts
and thus CSE doesn't work via bst_map. Ideally we'd have sth that works for all internal and external nodes. */ && !SLP_TREE_SCALAR_STMTS (node).is_empty ()) { - if (slp_tree *leader = bst_map->get (SLP_TREE_SCALAR_STMTS (node))) + slp_tree *leader = bst_map->get (SLP_TREE_SCALAR_STMTS (node)); + if (leader) { - if (*leader != node) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, -"re-using SLP tree %p for %p\n", -(void *)*leader, (void *)node); - vect_free_slp_tree (node); - (*leader)->refcnt += 1; - node = *leader; - } + /* We've visited this node already. */ + if (!*leader || *leader == node) + return; + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, +"re-using SLP tree %p for %p\n", +(void *)*leader, (void *)node); + vect_free_slp_tree (node); + (*leader)->refcnt += 1; + node = *leader; return; } - bst_map->put (SLP_TREE_SCALAR_STMTS (node).copy (), node); + /* Avoid creating a cycle by populating the map only after recursion. */ + bst_map->put (SLP_TREE_SCALAR_STMTS (node).copy (), nullptr); node->refcnt += 1; + put_p = true; /* And recurse. */ } for (slp_tree &child : SLP_TREE_CHILDREN (node)) if (child) vect_cse_slp_nodes (bst_map, child); + + /* Now record the node for CSE in other siblings. */ + if (put_p) +bst_map->put (SLP_TREE_SCALAR_STMTS (node).copy (), node); } /* Optimize the SLP graph of VINFO. */ -- 2.35.3
Re: [PATCH 6/6] Add a late-combine pass [PR106594]
On Mon, Jun 24, 2024 at 1:34 PM Richard Sandiford wrote: > > Richard Biener writes: > > On Mon, Jun 24, 2024 at 10:03 AM Richard Sandiford > > wrote: > >> > >> Richard Biener writes: > >> > On Sat, Jun 22, 2024 at 6:50 PM Richard Sandiford > >> >> The traditional (and IMO correct) way to handle this is to make the > >> >> pattern reserve the temporary registers that it needs, using > >> >> match_scratches. > >> >> rs6000 has many examples of this. E.g.: > >> >> > >> >> (define_insn_and_split "@ieee_128bit_vsx_neg2" > >> >> [(set (match_operand:IEEE128 0 "register_operand" "=wa") > >> >> (neg:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa"))) > >> >>(clobber (match_scratch:V16QI 2 "=v"))] > >> >> "TARGET_FLOAT128_TYPE && !TARGET_FLOAT128_HW" > >> >> "#" > >> >> "&& 1" > >> >> [(parallel [(set (match_dup 0) > >> >>(neg:IEEE128 (match_dup 1))) > >> >> (use (match_dup 2))])] > >> >> { > >> >> if (GET_CODE (operands[2]) == SCRATCH) > >> >> operands[2] = gen_reg_rtx (V16QImode); > >> >> > >> >> emit_insn (gen_ieee_128bit_negative_zero (operands[2])); > >> >> } > >> >> [(set_attr "length" "8") > >> >>(set_attr "type" "vecsimple")]) > >> >> > >> >> Before RA, this is just: > >> >> > >> >> (set ...) > >> >> (clobber (scratch:V16QI)) > >> >> > >> >> and the split creates a new register. After RA, operand 2 provides > >> >> the required temporary register: > >> >> > >> >> (set ...) > >> >> (clobber (reg:V16QI TMP)) > >> >> > >> >> Another approach is to add can_create_pseudo_p () to the define_insn > >> >> condition (rather than the split condition). But IMO that's an ICE > >> >> trap, since insns that have already been matched & accepted shouldn't > >> >> suddenly become invalid if recog is reattempted later. > >> > > >> > What about splitting immediately in late-combine? Wouldn't that possibly > >> > allow more combinations to immediately happen? > >> > >> It would be difficult to guarantee termination. 
Often the split > >> instructions can be immediately recombined back to the original > >> instruction. Even if we guard against that happening directly, > >> it'd be difficult to prove that it can't happen indirectly. > >> > >> We might also run into issues like PR101523. > >> > >> Combine uses define_splits (without define_insns) for 3->2 combinations, > >> but the current late-combine optimisation is kind-of 1/N+1->1 x N. > >> > >> Personally, I think we should allow targets to use the .md file to > >> define match.pd-style simplification rules involving unspecs, but there > >> were objections to that when I last suggested it. > > > > Isn't that what basically "combine-helper" patterns do to some extent? > > Partly, but: > > (1) It's a big hammer. It means we add all the overhead of a define_insn > for something that is only meant to survive between one pass and the next. > > (2) Unlike match.pd, it isn't designed to be applied iteratively. > There is no attempt even in theory to ensure that match helper > -> split -> match helper -> split -> ... would terminate. > > (3) It operates at the level of complete instructions, including e.g. > destinations of sets. The kind of rule I had in mind would be aimed > at arithmetic simplification, and would operate at the simplify-rtx.cc > level. > > That is, if simplify_foo failed to apply a target-independent rule, > it could fall back on an automatically generated target-specific rule, > with the requirement/understanding that these rules really should be > target-specific. One easy way of enforcing that is to say that > at least one side of a production rule must involve an unspec. OK, that makes sense. I did think of having something like match.pd generate simplify-rtx.cc. It probably has different constraints so that simply translating tree codes to rtx codes and re-using match.pd patterns isn't going to work well. Richard. > Richard > >
Re: [PATCH v2] PR tree-opt/113673: Avoid load merging when potentially trapping.
On Fri, Jun 21, 2024 at 10:51 PM Roger Sayle wrote: > > > Hi Richard, > Thanks for the review and apologies for taking so long to get back to this. > This revision implements your suggestions from early May, as found at > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650405.html > > This patch fixes PR tree-optimization/113673, a P2 ice-on-valid regression > caused by load merging of (ptr[0]<<8)+ptr[1] when -ftrapv has been > specified. When the operator is | or ^ this is safe, but for addition > of signed integer types, a trap may be generated/required, so merging this > idiom into a single non-trapping instruction is inappropriate, confusing > the compiler by transforming a basic block with an exception edge into one > without. > > This revision implements Richard Biener's feedback to add an early check > for stmt_can_throw_internal (cfun, stmt) to prevent transforming in the > presence of any statement that could trap, not just overflow on addition. > The one other tweak included in this patch is to mark the local function > find_bswap_or_nop_load as static ensuring that it isn't called from outside > this file, and guaranteeing that it is dominated by stmt_can_throw_internal > checking. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? OK. Thanks, Richard. > > 2024-06-21 Roger Sayle > Richard Biener > > gcc/ChangeLog > PR tree-optimization/113673 > * gimple-ssa-store-merging.cc (find_bswap_or_nop_load): Make static. > (find_bswap_or_nop_1): Avoid transformations (load merging) when > stmt_can_throw_internal indicates that a statement can trap. > > gcc/testsuite/ChangeLog > PR tree-optimization/113673 > * g++.dg/pr113673.C: New test case. 
> > Thanks in advance, > Roger > -- > > > -Original Message- > > From: Richard Biener > > Sent: 02 May 2024 10:27 > > Subject: Re: [PATCH] PR tree-opt/113673: Avoid load merging from potentially > > trapping additions. > > > > On Sun, Apr 28, 2024 at 11:11 AM Roger Sayle > > wrote: > > > > > > This patch fixes PR tree-optimization/113673, a P2 ice-on-valid > > > regression caused by load merging of (ptr[0]<<8)+ptr[1] when -ftrapv > > > has been specified. When the operator is | or ^ this is safe, but for > > > addition of signed integer types, a trap may be generated/required, so > > > merging this idiom into a single non-trapping instruction is > > > inappropriate, confusing the compiler by transforming a basic block > > > with an exception edge into one without. One fix is to be more > > > selective for PLUS_EXPR than for BIT_IOR_EXPR or BIT_XOR_EXPR in > > > gimple-ssa-store-merging.cc's > > > find_bswap_or_nop_1 function. > > > > > > An alternate solution might be to notice that in this idiom the > > > addition can't overflow, but that this detail wasn't apparent when > > > exception edges were added to the CFG. In which case, it's safe to > > > remove (or mark for > > > removal) the problematic exceptional edge. Unfortunately updating the > > > CFG is a part of the compiler that I'm less familiar with. > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > > and make -k check, both with and without --target_board=unix{-m32} > > > with no new failures. Ok for mainline? > > > > Instead of > > > > + case PLUS_EXPR: > > + /* Don't perform load merging if this addition can trap. 
*/ > > + if (cfun->can_throw_non_call_exceptions > > + && INTEGRAL_TYPE_P (TREE_TYPE (rhs1)) > > + && TYPE_OVERFLOW_TRAPS (TREE_TYPE (rhs1))) > > + return NULL; > > > > please check stmt_can_throw_internal (cfun, stmt) - the > > find_bswap_or_no_load > > call in the function suffers from the same issue, so this should probably be > > checked before that call even. > > > > Thanks, > > Richard. > > > > > > > > 2024-04-28 Roger Sayle > > > > > > gcc/ChangeLog > > > PR tree-optimization/113673 > > > * gimple-ssa-store-merging.cc (find_bswap_or_nop_1) > > PLUS_EXPR>: > > > Don't perform load merging if a signed addition may trap. > > > > > > gcc/testsuite/ChangeLog > > > PR tree-optimization/113673 > > > * g++.dg/pr113673.C: New test case. > > > >
Re: [PATCH] cfg: propagate source location in gimple_split_edge [PR115564]
On Sat, Jun 22, 2024 at 12:26 AM David Malcolm wrote: > > PR analyzer/115564 reports a missing warning from the analyzer > on this infinite loop at -O2 and above: > > void test (unsigned b) > { >for (unsigned i = b; i >= 0; --i) {} > } > > The issue is that there are no useful location_t values in the CFG > by the time the analyzer sees it: two basic blocks with no > statements, connected by edges with UNKNOWN_LOCATION for their > "goto_locus" values. The analyzer's attempts to get a location for the > loop fail with "UNKNOWN_LOCATION", and so it gives up on the warning. > > Root cause is that the edge in question is created by gimple_split_edge > within the loop optimizer, and gimple_split_edge creates the new edge > with UNKNOWN_LOCATION. > > This patch tweaks gimple_split_edge to copy edge_in->goto_locus's to the > new edge, so that the edge seen by the analyzer has a useful goto_locus > value, fixing the issue. > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. > > Successful run of analyzer integration tests on x86_64-pc-linux-gnu, > which shows 8 new true positives from -Wanalyzer-infinite-loop with > the patch. Is the edge the goto_locus is copied from not surviving? Does this maybe mean we should, when removing a forwarder(?), "merge" the goto_locus of the incoming and outgoing edge from the forwarder? That said, I'm not opposed to this change but I wonder whether the fix is in the wrong place? Richard. > OK for trunk? > > gcc/testsuite/ChangeLog: > PR analyzer/115564 > * c-c++-common/analyzer/infinite-loop-pr115564.c: New test. > > gcc/ChangeLog: > PR analyzer/115564 > * tree-cfg.cc (gimple_split_edge): Propagate any source location > from EDGE_IN to the new edge. 
> > Signed-off-by: David Malcolm > --- > .../c-c++-common/analyzer/infinite-loop-pr115564.c| 8 > gcc/tree-cfg.cc | 3 +++ > 2 files changed, 11 insertions(+) > create mode 100644 > gcc/testsuite/c-c++-common/analyzer/infinite-loop-pr115564.c > > diff --git a/gcc/testsuite/c-c++-common/analyzer/infinite-loop-pr115564.c > b/gcc/testsuite/c-c++-common/analyzer/infinite-loop-pr115564.c > new file mode 100644 > index ..950d92dd1254 > --- /dev/null > +++ b/gcc/testsuite/c-c++-common/analyzer/infinite-loop-pr115564.c > @@ -0,0 +1,8 @@ > +/* Verify that we detect the infinite loop below even at -O2. */ > + > +/* { dg-additional-options "-O2" } */ > + > +void test (unsigned b) > +{ > + for (unsigned i = b; i >= 0; --i) {} /* { dg-warning "infinite loop" } */ > +} > diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc > index 7fb7b92966be..45c0eef6c095 100644 > --- a/gcc/tree-cfg.cc > +++ b/gcc/tree-cfg.cc > @@ -3061,6 +3061,9 @@ gimple_split_edge (edge edge_in) >/* set_phi_nodes sets the BB of the PHI nodes, so do it manually here. */ >dest->il.gimple.phi_nodes = saved_phis; > > + /* Propagate any source location from EDGE_IN to the new edge. */ > + new_edge->goto_locus = edge_in->goto_locus; > + >return new_bb; > } > > -- > 2.26.3 >
RE: [PATCH 1/3 v3] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.
On Thu, 20 Jun 2024, Hu, Lin1 wrote: > > >else if (ret_elt_bits > arg_elt_bits) > > > modifier = WIDEN; > > > > > > + if (supportable_convert_operation (code, ret_type, arg_type, &code1)) > > > +{ > > > + g = gimple_build_assign (lhs, code1, arg); > > > + gsi_replace (gsi, g, false); > > > + return; > > > +} > > > > Given the API change I suggest below it might make sense to have > > supportable_indirect_convert_operation do the above and represent it as > > single- > > step conversion? > > > > OK, if you want to supportable_indirect_convert_operation can do > something like supportable_convert_operation, I'll give it a try. This > functionality is really the part that this function can cover. But this > would require some changes not only the API change, because > supportable_indirect_convert_operation originally only supported Float > -> Int or Int ->Float. I think I'd like to see a single API to handle direct and (multi-)indirect-level converts that operate on vectors with all the same number of lanes. 
> > > > > + code_helper code2 = ERROR_MARK, code3 = ERROR_MARK; > > > + int multi_step_cvt = 0; > > > + vec interm_types = vNULL; > > > + if (supportable_indirect_convert_operation (NULL, > > > + code, > > > + ret_type, arg_type, > > > + &code2, &code3, > > > + &multi_step_cvt, > > > + &interm_types, arg)) > > > +{ > > > + new_rhs = make_ssa_name (interm_types[0]); > > > + g = gimple_build_assign (new_rhs, (tree_code) code3, arg); > > > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > > > + g = gimple_build_assign (lhs, (tree_code) code2, new_rhs); > > > + gsi_replace (gsi, g, false); > > > + return; > > > +} > > > + > > >if (modifier == NONE && (code == FIX_TRUNC_EXPR || code == > > FLOAT_EXPR)) > > > { > > > - if (supportable_convert_operation (code, ret_type, arg_type, > > > &code1)) > > > - { > > > - g = gimple_build_assign (lhs, code1, arg); > > > - gsi_replace (gsi, g, false); > > > - return; > > > - } > > >/* Can't use get_compute_type here, as > > > supportable_convert_operation > > >doesn't necessarily use an optab and needs two arguments. */ > > >tree vec_compute_type > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index > > > 05a169ecb2d..0aa608202ca 100644 > > > --- a/gcc/tree-vect-stmts.cc > > > +++ b/gcc/tree-vect-stmts.cc > > > @@ -5175,7 +5175,7 @@ vectorizable_conversion (vec_info *vinfo, > > >tree scalar_dest; > > >tree op0, op1 = NULL_TREE; > > >loop_vec_info loop_vinfo = dyn_cast (vinfo); > > > - tree_code tc1, tc2; > > > + tree_code tc1; > > >code_helper code, code1, code2; > > >code_helper codecvt1 = ERROR_MARK, codecvt2 = ERROR_MARK; > > >tree new_temp; > > > @@ -5384,92 +5384,17 @@ vectorizable_conversion (vec_info *vinfo, > > > break; > > >} > > > > > > - /* For conversions between float and integer types try whether > > > - we can use intermediate signed integer types to support the > > > - conversion. 
*/ > > > - if (GET_MODE_SIZE (lhs_mode) != GET_MODE_SIZE (rhs_mode) > > > - && (code == FLOAT_EXPR || > > > - (code == FIX_TRUNC_EXPR && !flag_trapping_math))) > > > - { > > > - bool demotion = GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE > > (lhs_mode); > > > - bool float_expr_p = code == FLOAT_EXPR; > > > - unsigned short target_size; > > > - scalar_mode intermediate_mode; > > > - if (demotion) > > > - { > > > - intermediate_mode = lhs_mode; > > > - target_size = GET_MODE_SIZE (rhs_mode); > > > - } > > > - else > > > - { > > > - target_size = GET_MODE_SIZE (lhs_mode); > > > - if (!int_mode_for_size > > > - (GET_MODE_BITSIZE (rhs_mode), 0).exists > > (&intermediate_mode)) > > > - goto unsupported; > > > - } > > > - code1 = float_expr_p ? code : NOP_EXPR; > > > - codecvt1 = float_expr_p ? NOP_EXPR : code; > > > - opt_scalar_mode mode_iter; > > > - FOR_EACH_2XWIDER_MODE (mode_iter, intermediate_mode) > > > - { > > > - intermediate_mode = mode_iter.require (); > > > - > > > - if (GET_MODE_SIZE (intermediate_mode) > target_size) > > > - break; > > > - > > > - scalar_mode cvt_mode; > > > - if (!int_mode_for_size > > > - (GET_MODE_BITSIZE (intermediate_mode), 0).exists > > (&cvt_mode)) > > > - break; > > > - > > > - cvt_type = build_nonstandard_integer_type > > > - (GET_MODE_BITSIZE (cvt_mode), 0); > > > - > > > - /* Check if the intermediate type can hold OP0's range. > > > - When converting from float to integer this is not necessary > > > - because values that do not fit the (smaller) target type are > > > - unspecified anywa
Re: [PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]
On Sun, Jun 23, 2024 at 5:10 PM Feng Xue OS wrote: > > >> - if (slp_node) > >> + if (slp_node && SLP_TREE_LANES (slp_node) > 1) > > > > Hmm, that looks wrong. It looks like SLP_TREE_NUMBER_OF_VEC_STMTS is off > > instead, which is bad. > > > >> nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); > >>else > >> nvectors = vect_get_num_copies (loop_vinfo, vectype_in); > >> @@ -7478,6 +7472,152 @@ vect_reduction_update_partial_vector_usage > >> (loop_vec_info loop_vinfo, > >> } > >> } > >> > >> +/* Check if STMT_INFO is a lane-reducing operation that can be vectorized > >> in > >> + the context of LOOP_VINFO, and vector cost will be recorded in > >> COST_VEC. > >> + Now there are three such kinds of operations: dot-prod/widen-sum/sad > >> + (sum-of-absolute-differences). > >> + > >> + For a lane-reducing operation, the loop reduction path that it lies in, > >> + may contain normal operation, or other lane-reducing operation of > >> different > >> + input type size, an example as: > >> + > >> + int sum = 0; > >> + for (i) > >> + { > >> + ... > >> + sum += d0[i] * d1[i]; // dot-prod > >> + sum += w[i];// widen-sum > >> + sum += abs(s0[i] - s1[i]); // sad > >> + sum += n[i];// normal > >> + ... > >> + } > >> + > >> + Vectorization factor is essentially determined by operation whose input > >> + vectype has the most lanes ("vector(16) char" in the example), while we > >> + need to choose input vectype with the least lanes ("vector(4) int" in > >> the > >> + example) for the reduction PHI statement. 
*/ > >> + > >> +bool > >> +vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info > >> stmt_info, > >> + slp_tree slp_node, stmt_vector_for_cost > >> *cost_vec) > >> +{ > >> + gimple *stmt = stmt_info->stmt; > >> + > >> + if (!lane_reducing_stmt_p (stmt)) > >> +return false; > >> + > >> + tree type = TREE_TYPE (gimple_assign_lhs (stmt)); > >> + > >> + if (!INTEGRAL_TYPE_P (type) && !SCALAR_FLOAT_TYPE_P (type)) > >> +return false; > >> + > >> + /* Do not try to vectorize bit-precision reductions. */ > >> + if (!type_has_mode_precision_p (type)) > >> +return false; > >> + > >> + if (!slp_node) > >> +return false; > >> + > >> + for (int i = 0; i < (int) gimple_num_ops (stmt) - 1; i++) > >> +{ > >> + stmt_vec_info def_stmt_info; > >> + slp_tree slp_op; > >> + tree op; > >> + tree vectype; > >> + enum vect_def_type dt; > >> + > >> + if (!vect_is_simple_use (loop_vinfo, stmt_info, slp_node, i, &op, > >> + &slp_op, &dt, &vectype, &def_stmt_info)) > >> + { > >> + if (dump_enabled_p ()) > >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > >> +"use not simple.\n"); > >> + return false; > >> + } > >> + > >> + if (!vectype) > >> + { > >> + vectype = get_vectype_for_scalar_type (loop_vinfo, TREE_TYPE > >> (op), > >> +slp_op); > >> + if (!vectype) > >> + return false; > >> + } > >> + > >> + if (!vect_maybe_update_slp_op_vectype (slp_op, vectype)) > >> + { > >> + if (dump_enabled_p ()) > >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > >> +"incompatible vector types for invariants\n"); > >> + return false; > >> + } > >> + > >> + if (i == STMT_VINFO_REDUC_IDX (stmt_info)) > >> + continue; > >> + > >> + /* There should be at most one cycle def in the stmt. */ > >> + if (VECTORIZABLE_CYCLE_DEF (dt)) > >> + return false; > >> +} > >> + > >> + stmt_vec_info reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt > >> (stmt_info)); > >> + > >> + /* TODO: Support lane-reducing operation that does not directly > >> participate > >> + in loop reduction. 
*/ > >> + if (!reduc_info || STMT_VINFO_REDUC_IDX (stmt_info) < 0) > >> +return false; > >> + > >> + /* Lane-reducing pattern inside any inner loop of LOOP_VINFO is not > >> + recognized. */ > >> + gcc_assert (STMT_VINFO_DEF_TYPE (reduc_info) == vect_reduction_def); > >> + gcc_assert (STMT_VINFO_REDUC_TYPE (reduc_info) == TREE_CODE_REDUCTION); > >> + > >> + tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (stmt_info); > >> + int ncopies_for_cost; > >> + > >> + if (SLP_TREE_LANES (slp_node) > 1) > >> +{ > >> + /* Now lane-reducing operations in a non-single-lane slp node > >> should only > >> +come from the same loop reduction path. */ > >> + gcc_assert (REDUC_GROUP_FIRST_ELEMENT (stmt_info)); > >> + ncopies_for_cost = 1; > >> +} > >> + else > >> +{ > >> + ncopies_for_cost = vect_get_num_copies (loop_vinfo, vectype_in); > > > > OK, so the fact that the
RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]
> -Original Message- > From: Richard Biener > Sent: Thursday, June 20, 2024 8:49 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com > Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during > candidate > selection [PR114932] > > On Wed, 19 Jun 2024, Tamar Christina wrote: > > > > -Original Message- > > > From: Richard Biener > > > Sent: Wednesday, June 19, 2024 12:55 PM > > > To: Tamar Christina > > > Cc: gcc-patches@gcc.gnu.org; nd ; > bin.ch...@linux.alibaba.com > > > Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during > candidate > > > selection [PR114932] > > > > > > On Fri, 14 Jun 2024, Tamar Christina wrote: > > > > > > > Hi All, > > > > > > > > IVOPTS normally uses affine trees to perform comparisons between > > > > different > IVs, > > > > but these seem to have been missing in two key spots and instead normal > tree > > > > equivalence checks were used. > > > > > > > > In some cases where we have a structural equivalence but not a > > > > signedness > > > > equivalence we end up generating both a signed and unsigned IV for the > same > > > > candidate. > > > > > > > > This happens quite a lot with fortran but can also happen in C because > > > > this > same > > > > code is unable to figure out when one expression is a multiple of > > > > another.
> > > > > > > > As an example in the attached testcase we get: > > > > > > > > Initial set of candidates: > > > > cost: 24 (complexity 3) > > > > reg_cost: 9 > > > > cand_cost: 15 > > > > cand_group_cost: 0 (complexity 3) > > > > candidates: 1, 6, 8 > > > >group:0 --> iv_cand:6, cost=(0,1) > > > >group:1 --> iv_cand:1, cost=(0,0) > > > >group:2 --> iv_cand:8, cost=(0,1) > > > >group:3 --> iv_cand:8, cost=(0,1) > > > > invariant variables: 6 > > > > invariant expressions: 1, 2 > > > > > > > > : > > > > inv_expr 1: stride.3_27 * 4 > > > > inv_expr 2: (unsigned long) stride.3_27 * 4 > > > > > > > > These end up being used in the same group: > > > > > > > > Group 1: > > > > cand costcompl. inv.expr. inv.vars > > > > 1 0 0 NIL;6 > > > > 2 0 0 NIL;6 > > > > 3 0 0 NIL;6 > > > > > > > > which ends up with IV opts picking the signed and unsigned IVs: > > > > > > > > Improved to: > > > > cost: 24 (complexity 3) > > > > reg_cost: 9 > > > > cand_cost: 15 > > > > cand_group_cost: 0 (complexity 3) > > > > candidates: 1, 6, 8 > > > >group:0 --> iv_cand:6, cost=(0,1) > > > >group:1 --> iv_cand:1, cost=(0,0) > > > >group:2 --> iv_cand:8, cost=(0,1) > > > >group:3 --> iv_cand:8, cost=(0,1) > > > > invariant variables: 6 > > > > invariant expressions: 1, 2 > > > > > > > > and so generates the same IV as both signed and unsigned: > > > > > > > > ;; basic block 21, loop depth 3, count 214748368 (estimated locally, > > > > freq > > > 58.2545), maybe hot > > > > ;;prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) > > > > ;;pred: 28 [always] count:23622320 (estimated locally, freq > > > > 6.4080) > > > (FALLTHRU,EXECUTABLE) > > > > ;;25 [always] count:191126046 (estimated locally, freq > > > > 51.8465) > > > (FALLTHRU,DFS_BACK,EXECUTABLE) > > > > # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> > > > > # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> > > > > # ivtmp.26_51 = PHI > > > > # ivtmp.28_90 = PHI > > > > > > > > ... 
> > > > > > > > ;; basic block 24, loop depth 3, count 214748366 (estimated locally, > > > > freq > > > 58.2545), maybe hot > > > > ;;prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' > > > > ;;pred: 22 [always] count:95443719 (estimated locally, freq > > > > 25.8909) > > > (FALLTHRU) > > > ;;21 [33.3% (guessed)] count:71582790 (estimated > > > locally, freq > 19.4182) > > > (TRUE_VALUE,EXECUTABLE) > > > ;;31 [33.3% (guessed)] count:47721860 (estimated > > > locally, freq > 12.9455) > > > (TRUE_VALUE,EXECUTABLE) > > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> > > > > > > > > > ivtmp.22_82 = ivtmp.22_41 + 1; > > > ivtmp.26_72 = ivtmp.26_51 + _80; > > > ivtmp.28_98 = ivtmp.28_90 + _39; > > > > > > > > These two IVs are always used as unsigned, so IV ops generates: > > > > > > > > _73 = stride.3_27 * 4; > > > > _80 = (unsigned long) _73; > > > > _54 = (unsigned long) stride.3_27; > > > > _39 = _54 * 4; > > > > > > > > Which means that in e.g. exchange2 we generate a lot of duplicate code. > > > > > > > > This is because candidate 6 and 8 are structurally equivalent but have > different > > > > signs. > > > > > > > > This patch changes it so that if you have two IVs that are affine > > > > equivalent to > > > > just pick one over the other. IV already has code for this, so the > > > > patch just > > > > use
[PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
From: Pan Li The zip benchmark of coremark-pro has one SAT_SUB-like pattern but truncated as below: void test (uint16_t *x, unsigned b, unsigned n) { unsigned a = 0; register uint16_t *p = x; do { a = *--p; *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB } while (--n); } It will have the gimple below before the vect pass; this cannot hit any pattern of SAT_SUB and then cannot be vectorized to SAT_SUB. _2 = a_11 - b_12(D); iftmp.0_13 = (short unsigned int) _2; _18 = a_11 >= b_12(D); iftmp.0_5 = _18 ? iftmp.0_13 : 0; This patch would like to improve the pattern match to recog above as truncate after .SAT_SUB pattern. Then we will have a pattern similar to below, as well as eliminate the first 3 dead stmt. _2 = a_11 - b_12(D); iftmp.0_13 = (short unsigned int) _2; _18 = a_11 >= b_12(D); iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); The below tests are passed for this patch. 1. The rv64gcv fully regression tests. 2. The rv64gcv build with glibc. 3. The x86 bootstrap tests. 4. The x86 fully regression tests. gcc/ChangeLog: * match.pd: Add convert description for minus and capture. * tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add new logic to handle in_type is incompatible with out_type, as well as rename from. (vect_recog_build_binary_gimple_stmt): Rename to. (vect_recog_sat_add_pattern): Leverage above renamed func. (vect_recog_sat_sub_pattern): Ditto. Signed-off-by: Pan Li --- gcc/match.pd | 4 +-- gcc/tree-vect-patterns.cc | 51 --- 2 files changed, 33 insertions(+), 22 deletions(-) diff --git a/gcc/match.pd b/gcc/match.pd index 3d0689c9312..4a4b0b2e72f 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3164,9 +3164,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* Unsigned saturation sub, case 2 (branch with ge): SAT_U_SUB = X >= Y ? X - Y : 0. */ (match (unsigned_integer_sat_sub @0 @1) - (cond^ (ge @0 @1) (minus @0 @1) integer_zerop) + (cond^ (ge @0 @1) (convert? (minus (convert1? @0) (convert1? 
@1))) integer_zerop) (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) - && types_match (type, @0, @1 + && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1 /* Unsigned saturation sub, case 3 (branchless with gt): SAT_U_SUB = (X - Y) * (X > Y). */ diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index cef901808eb..3d887d36050 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -4490,26 +4490,37 @@ vect_recog_mult_pattern (vec_info *vinfo, extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree)); extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree)); -static gcall * -vect_recog_build_binary_gimple_call (vec_info *vinfo, gimple *stmt, +static gimple * +vect_recog_build_binary_gimple_stmt (vec_info *vinfo, stmt_vec_info stmt_info, internal_fn fn, tree *type_out, -tree op_0, tree op_1) +tree lhs, tree op_0, tree op_1) { tree itype = TREE_TYPE (op_0); - tree vtype = get_vectype_for_scalar_type (vinfo, itype); + tree otype = TREE_TYPE (lhs); + tree v_itype = get_vectype_for_scalar_type (vinfo, itype); + tree v_otype = get_vectype_for_scalar_type (vinfo, otype); - if (vtype != NULL_TREE -&& direct_internal_fn_supported_p (fn, vtype, OPTIMIZE_FOR_BOTH)) + if (v_itype != NULL_TREE && v_otype != NULL_TREE +&& direct_internal_fn_supported_p (fn, v_itype, OPTIMIZE_FOR_BOTH)) { gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1); + tree in_ssa = vect_recog_temp_ssa_var (itype, NULL); - gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL)); + gimple_call_set_lhs (call, in_ssa); gimple_call_set_nothrow (call, /* nothrow_p */ false); - gimple_set_location (call, gimple_location (stmt)); + gimple_set_location (call, gimple_location (STMT_VINFO_STMT (stmt_info))); + + *type_out = v_otype; - *type_out = vtype; + if (types_compatible_p (itype, otype)) + return call; + else + { + append_pattern_def_seq (vinfo, stmt_info, call, v_itype); + tree out_ssa = vect_recog_temp_ssa_var 
(otype, NULL); - return call; + return gimple_build_assign (out_ssa, CONVERT_EXPR, in_ssa); + } } return NULL; @@ -4541,13 +4552,13 @@ vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo, if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)) { - gcall *call = vect_recog_build_binary_gimple_call (vinfo, last_stmt, -IFN_SAT_ADD, type_out, -ops[0], ops[1]); - if (call) + gimple *stmt = vect_recog_build_binary_gimple_stmt (vinfo, stmt_vinfo, +
RE: [PATCH 1/3 v3] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.
> -Original Message- > From: Richard Biener > Sent: Monday, June 24, 2024 1:34 PM > To: Hu, Lin1 > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; > ubiz...@gmail.com > Subject: RE: [PATCH 1/3 v3] vect: generate suitable convert insn for int -> > int, float > -> float and int <-> float. > > On Thu, 20 Jun 2024, Hu, Lin1 wrote: > > > > >else if (ret_elt_bits > arg_elt_bits) > > > > modifier = WIDEN; > > > > > > > > + if (supportable_convert_operation (code, ret_type, arg_type, &code1)) > > > > +{ > > > > + g = gimple_build_assign (lhs, code1, arg); > > > > + gsi_replace (gsi, g, false); > > > > + return; > > > > +} > > > > > > Given the API change I suggest below it might make sense to have > > > supportable_indirect_convert_operation do the above and represent it as a > > > single-step conversion? > > > > > > > OK, if you want supportable_indirect_convert_operation to do > > something like supportable_convert_operation, I'll give it a try. This > > functionality is really the part that this function can cover. But this > > would require some changes beyond the API change, because > > supportable_indirect_convert_operation originally only supported Float > > -> Int or Int -> Float. > > I think I'd like to see a single API to handle direct and > (multi-)indirect-level converts that operate on vectors with all > the same number of lanes. 
> > > > > > > > + code_helper code2 = ERROR_MARK, code3 = ERROR_MARK; > > > > + int multi_step_cvt = 0; > > > > + vec interm_types = vNULL; > > > > + if (supportable_indirect_convert_operation (NULL, > > > > + code, > > > > + ret_type, arg_type, > > > > + &code2, &code3, > > > > + &multi_step_cvt, > > > > + &interm_types, arg)) > > > > +{ > > > > + new_rhs = make_ssa_name (interm_types[0]); > > > > + g = gimple_build_assign (new_rhs, (tree_code) code3, arg); > > > > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > > > > + g = gimple_build_assign (lhs, (tree_code) code2, new_rhs); > > > > + gsi_replace (gsi, g, false); > > > > + return; > > > > +} > > > > + > > > >if (modifier == NONE && (code == FIX_TRUNC_EXPR || code == > > > FLOAT_EXPR)) > > > > { > > > > - if (supportable_convert_operation (code, ret_type, arg_type, > > > > &code1)) > > > > - { > > > > - g = gimple_build_assign (lhs, code1, arg); > > > > - gsi_replace (gsi, g, false); > > > > - return; > > > > - } > > > >/* Can't use get_compute_type here, as > > > > supportable_convert_operation > > > > doesn't necessarily use an optab and needs two arguments. */ > > > >tree vec_compute_type > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index > > > > 05a169ecb2d..0aa608202ca 100644 > > > > --- a/gcc/tree-vect-stmts.cc > > > > +++ b/gcc/tree-vect-stmts.cc > > > > @@ -5175,7 +5175,7 @@ vectorizable_conversion (vec_info *vinfo, > > > >tree scalar_dest; > > > >tree op0, op1 = NULL_TREE; > > > >loop_vec_info loop_vinfo = dyn_cast (vinfo); > > > > - tree_code tc1, tc2; > > > > + tree_code tc1; > > > >code_helper code, code1, code2; > > > >code_helper codecvt1 = ERROR_MARK, codecvt2 = ERROR_MARK; > > > >tree new_temp; > > > > @@ -5384,92 +5384,17 @@ vectorizable_conversion (vec_info *vinfo, > > > > break; > > > >} > > > > > > > > - /* For conversions between float and integer types try whether > > > > -we can use intermediate signed integer types to support the > > > > -conversion. 
*/ > > > > - if (GET_MODE_SIZE (lhs_mode) != GET_MODE_SIZE (rhs_mode) > > > > - && (code == FLOAT_EXPR || > > > > - (code == FIX_TRUNC_EXPR && !flag_trapping_math))) > > > > - { > > > > - bool demotion = GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE > > > (lhs_mode); > > > > - bool float_expr_p = code == FLOAT_EXPR; > > > > - unsigned short target_size; > > > > - scalar_mode intermediate_mode; > > > > - if (demotion) > > > > - { > > > > - intermediate_mode = lhs_mode; > > > > - target_size = GET_MODE_SIZE (rhs_mode); > > > > - } > > > > - else > > > > - { > > > > - target_size = GET_MODE_SIZE (lhs_mode); > > > > - if (!int_mode_for_size > > > > - (GET_MODE_BITSIZE (rhs_mode), 0).exists > > > (&intermediate_mode)) > > > > - goto unsupported; > > > > - } > > > > - code1 = float_expr_p ? code : NOP_EXPR; > > > > - codecvt1 = float_expr_p ? NOP_EXPR : code; > > > > - opt_scalar_mode mode_iter; > > > > - FOR_EACH_2XWIDER_MODE (mode_iter, intermediate_mode) > > > > -
Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
On Jun 21, 2024, Christophe Lyon wrote: >> How about mentioning Christophe's simplification in the commit log? > For the avoidance of doubt: it's OK for me (but you don't need to > mention my name in fact ;-) Needing or not, I added it ;-) >> > be accepted. (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as >> > before the fix). >> This is OK, but you might wish to revisit this statement before >> committing. Oh, right, sorry, I messed it up. uint16_t was what I should have put in there. int16_t would have overflown and invoked undefined behavior to begin with, and I see it misled you far down the wrong path. >> I think the original bug was that we were losing the cast to short The problem was that the shift count saturated at 15. AFAIK sign extension was not relevant. Hopefully the rewritten opening paragraph below makes that clearer. I will put it in later this week barring objections or further suggestions of improvement. Thanks, [testsuite] [arm] [vect] adjust mve-vshr test [PR113281] The test was too optimistic, alas. We used to vectorize shifts by clamping the shift counts below the bit width of the types (e.g. at 15 for 16-bit vector elements), but (uint16_t)32768 >> (uint16_t)16 is well defined (because of promotion to 32-bit int) and must yield 0, not 1 (as before the fix). Unfortunately, in the gimple model of vector units, such large shift counts wouldn't be well-defined, so we won't vectorize such shifts any more, unless we can tell they're in range or undefined. So the test that expected the vectorization we no longer performed needs to be adjusted. Instead of nobbling the test, Richard Earnshaw suggested annotating the test with the expected ranges so as to enable the optimization, and Christophe Lyon suggested a further simplification. Co-Authored-By: Richard Earnshaw for gcc/testsuite/ChangeLog PR tree-optimization/113281 * gcc.target/arm/simd/mve-vshr.c: Add expected ranges. 
--- gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c index 8c7adef9ed8f1..03078de49c65e 100644 --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c @@ -9,6 +9,8 @@ void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \ int i; \ for (i=0; i<NB; i++) { \ + if ((unsigned)b[i] >= (unsigned)(BITS)) \ + __builtin_unreachable();\ dest[i] = a[i] OP b[i]; \ } \ } -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving "normal" is *not* inclusive
RE: [PATCH][ivopts]: perform affine fold on unsigned addressing modes known not to overflow. [PR114932]
> -Original Message- > From: Richard Biener > Sent: Thursday, June 20, 2024 8:55 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com > Subject: RE: [PATCH][ivopts]: perform affine fold on unsigned addressing modes > known not to overflow. [PR114932] > > On Wed, 19 Jun 2024, Tamar Christina wrote: > > > > -Original Message- > > > From: Richard Biener > > > Sent: Wednesday, June 19, 2024 1:14 PM > > > To: Tamar Christina > > > Cc: gcc-patches@gcc.gnu.org; nd ; > bin.ch...@linux.alibaba.com > > > Subject: Re: [PATCH][ivopts]: perform affine fold on unsigned addressing > modes > > > known not to overflow. [PR114932] > > > > > > On Fri, 14 Jun 2024, Tamar Christina wrote: > > > > > > > Hi All, > > > > > > > > When the patch for PR114074 was applied we saw a good boost in > exchange2. > > > > > > > > This boost was partially caused by a simplification of the addressing > > > > modes. > > > > With the patch applied IV opts saw the following form for the base > addressing; > > > > > > > > Base: (integer(kind=4) *) &block + ((sizetype) ((unsigned long) > > > > l0_19(D) * > > > > 324) + 36) > > > > > > > > vs what we normally get: > > > > > > > > Base: (integer(kind=4) *) &block + ((sizetype) ((integer(kind=8)) > > > > l0_19(D) > > > > * 81) + 9) * 4 > > > > > > > > This is because the patch promoted multiplies where one operand is a > constant > > > > from a signed multiply to an unsigned one, to attempt to fold away the > constant. > > > > > > > > This patch attempts the same but due to the various problems with SCEV > > > > and > > > > niters not being able to analyze the resulting forms (i.e. PR114322) we > > > > can't > > > > do it during SCEV or in the general form like in fold-const like > > > > extract_muldiv > > > > attempts. > > > > > > > > Instead this applies the simplification during IVopts initialization > > > > when we > > > > create the IV. 
Essentially when we know the IV won't overflow with > > > > regards > to > > > > niters then we perform an affine fold which gets it to simplify the > > > > internal > > > > computation, even if this is signed because we know that for IVOPTs > > > > uses the > > > > IV won't ever overflow. This allows IV opts to see the simplified form > > > > without influencing the rest of the compiler. > > > > > > > > as mentioned in PR114074 it would be good to fix the missed > > > > optimization in > the > > > > other passes so we can perform this in general. > > > > > > > > The reason this has a big impact on fortran code is that fortran > > > > doesn't seem > to > > > > have unsigned integer types. As such all it's addressing are created > > > > with > > > > signed types and folding does not happen on them due to the possible > overflow. > > > > > > > > concretely on AArch64 this changes the results from generation: > > > > > > > > mov x27, -108 > > > > mov x24, -72 > > > > mov x23, -36 > > > > add x21, x1, x0, lsl 2 > > > > add x19, x20, x22 > > > > .L5: > > > > add x0, x22, x19 > > > > add x19, x19, 324 > > > > ldr d1, [x0, x27] > > > > add v1.2s, v1.2s, v15.2s > > > > str d1, [x20, 216] > > > > ldr d0, [x0, x24] > > > > add v0.2s, v0.2s, v15.2s > > > > str d0, [x20, 252] > > > > ldr d31, [x0, x23] > > > > add v31.2s, v31.2s, v15.2s > > > > str d31, [x20, 288] > > > > bl digits_20_ > > > > cmp x21, x19 > > > > bne .L5 > > > > > > > > into: > > > > > > > > .L5: > > > > ldr d1, [x19, -108] > > > > add v1.2s, v1.2s, v15.2s > > > > str d1, [x20, 216] > > > > ldr d0, [x19, -72] > > > > add v0.2s, v0.2s, v15.2s > > > > str d0, [x20, 252] > > > > ldr d31, [x19, -36] > > > > add x19, x19, 324 > > > > add v31.2s, v31.2s, v15.2s > > > > str d31, [x20, 288] > > > > bl digits_20_ > > > > cmp x21, x19 > > > > bne .L5 > > > > > > > > The two patches together results in a 10% performance increase in > > > > exchange2 > in > > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% 
improvement in > > > compile > > > > time. There's also a 5% performance improvement in fotonik3d and similar > > > > reduction in binary size. > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > > > > > Ok for master? > > > > > > > > Thanks, > > > > Tamar > > > > > > > > gcc/ChangeLog: > > > > > > > > PR tree-optimization/114932 > > > > * tree-scalar-evolution.cc (alloc_iv): Perform affine unsigned > > > > fold. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > PR tree-optimization/114932 > > > > * gfortran.dg/addressing-modes_1.f90: New test. > > > > > > > > --- > > > > di
[PATCH] rs6000: Fix wrong RTL patterns for vector merge high/low char on LE
Hi, Commit r12-4496 changed some define_expands and define_insns for vector merge high/low char, namely altivec_vmrg[hl]b. These defines mainly serve the built-in functions vec_merge{h,l} and some internal gen function needs. These functions should consider endianness. Taking vec_mergeh as an example: as PVIPR defines it, vec_mergeh "Merges the first halves (in element order) of two vectors" — note it is in element order. So it is mapped to vmrghb on BE and vmrglb on LE respectively. Although the mapped insns differ, as discussed in PR106069 the RTL pattern should still be the same; this held before commit r12-4496, but the patterns diverged on BE and LE starting from that commit. Similar to the 32-bit element case in the commit log of r15-1504, this 8-bit element pattern on LE doesn't actually match what the underlying insn is intended to represent; once some optimization like combine makes changes based on it, it can cause unexpected consequences. The newly constructed test case pr106069-1.c is a typical example of this issue. So this patch fixes the wrong RTL pattern, ensuring the associated RTL patterns become the same as before and have the same semantics as their mapped insns. With the proposed patch, expanders like altivec_vmrghb expand into altivec_vmrghb_direct_be or altivec_vmrglb_direct_le depending on endianness; "direct" makes it easy to see which insn would be generated, while _be and _le distinguish the different RTL patterns per endianness. Following [1], this one is for the 8-bit vector element size; bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9 and P10. I'm going to push this in two days if there are no objections, thanks! [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655239.html Co-authored-by: Xionghu Luo PR target/106069 PR target/115355 gcc/ChangeLog: * config/rs6000/altivec.md (altivec_vmrghb_direct): Rename to ... (altivec_vmrghb_direct_be): ... this. 
Add condition BYTES_BIG_ENDIAN. (altivec_vmrghb_direct_le): New define_insn. (altivec_vmrglb_direct): Rename to ... (altivec_vmrglb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN. (altivec_vmrglb_direct_le): New define_insn. (altivec_vmrghb): Adjust by calling gen_altivec_vmrghb_direct_be for BE and gen_altivec_vmrglb_direct_le for LE. (altivec_vmrglb): Adjust by calling gen_altivec_vmrglb_direct_be for BE and gen_altivec_vmrghb_direct_le for LE. * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghb_direct by CODE_FOR_altivec_vmrghb_direct_be for BE and CODE_FOR_altivec_vmrghb_direct_le for LE. And replace CODE_FOR_altivec_vmrglb_direct by CODE_FOR_altivec_vmrglb_direct_be for BE and CODE_FOR_altivec_vmrglb_direct_le for LE. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr106069-1.c: New test. --- gcc/config/rs6000/altivec.md | 66 +++ gcc/config/rs6000/rs6000.cc | 8 +-- gcc/testsuite/gcc.target/powerpc/pr106069-1.c | 39 +++ 3 files changed, 95 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069-1.c diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index dcc71cc0f52..a0e8a35b843 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -1152,15 +1152,16 @@ (define_expand "altivec_vmrghb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghb_direct - : gen_altivec_vmrglb_direct; - if (!BYTES_BIG_ENDIAN) -std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) +emit_insn ( + gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2])); + else +emit_insn ( + gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrghb_direct" +(define_insn "altivec_vmrghb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1174,7 +1175,25 @@ (define_insn "altivec_vmrghb_direct" (const_int 5) (const_int 21) (const_int 6) (const_int 22) (const_int 7) (const_int 23)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrghb %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 2 "register_operand" "v") + (match_operand:V16QI 1 "register_operand" "v")) + (parallel [(const_int 8) (
Re: [PATCH v4 4/6] btf: add -gprune-btf option
Ping. Richard: I changed the option name as you asked but forgot to CC you on the updated patch. Is the new option OK? Indu: You had some minor comments on the prior version which I have addressed, not sure whether you meant the rest of the patch was OK or not, or if you had a chance to review it. Thanks! archive: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654252.html On 6/11/24 12:01, David Faust wrote: > This patch adds a new option, -gprune-btf, to control BTF debug info > generation. > > As the name implies, this option enables a kind of "pruning" of the BTF > information before it is emitted. When enabled, rather than emitting > all type information translated from DWARF, only information for types > directly used in the source program is emitted. > > The primary purpose of this pruning is to reduce the amount of > unnecessary BTF information emitted, especially for BPF programs. It is > very common for BPF programs to include Linux kernel internal headers in > order to have access to kernel data structures. However, doing so often > has the side effect of also adding type definitions for a large number > of types which are not actually used by nor relevant to the program. > In these cases, -gprune-btf commonly reduces the size of the resulting > BTF information by 10x or more, as seen on average when compiling Linux > kernel BPF selftests. This both slims down the size of the resulting > object and reduces the time required by the BPF loader to verify the > program and its BTF information. > > Note that the pruning implemented in this patch follows the same rules > as the BTF pruning performed unconditionally by LLVM's BPF backend when > generating BTF. In particular, the main sources of pruning are: > > 1) Only generate BTF for types used by variables and functions at the > file scope. > > Note that which variables are known to be "used" may differ > slightly between LTO and non-LTO builds due to optimizations. 
For > non-LTO builds (and always for the BPF target), variables which are > optimized away during compilation are considered to be unused, and > they (along with their types) are pruned. For LTO builds, such > variables are not known to be optimized away by the time pruning > occurs, so VAR records for them and information for their types may > be present in the emitted BTF information. This is a missed > optimization that may be fixed in the future. > > 2) Avoid emitting full BTF for struct and union types which are only > pointed-to by members of other struct/union types. In these cases, > the full BTF_KIND_STRUCT or BTF_KIND_UNION which would normally > be emitted is replaced with a BTF_KIND_FWD, as though the > underlying type was a forward-declared struct or union type. > > gcc/ > * btfout.cc (btf_used_types): New hash set. > (struct btf_fixup): New. > (fixups, forwards): New vecs. > (btf_output): Calculate num_types depending on debug_prune_btf. > (btf_early_finish): New initialization for debug_prune_btf. > (btf_add_used_type): New function. > (btf_used_type_list_cb): Likewise. > (btf_collect_pruned_types): Likewise. > (btf_add_vars): Handle special case for variables in ".maps" section > when generating BTF for BPF CO-RE target. > (btf_late_finish): Use btf_collect_pruned_types when debug_prune_btf > is in effect. Move some initialization to btf_early_finish. > (btf_finalize): Additional deallocation for debug_prune_btf. > * common.opt (gprune-btf): New flag. > * ctfc.cc (init_ctf_strtable): Make non-static. > * ctfc.h (init_ctf_strtable, ctfc_delete_strtab): Make extern. > * doc/invoke.texi (Debugging Options): Document -gprune-btf. > > gcc/testsuite/ > * gcc.dg/debug/btf/btf-prune-1.c: New test. > * gcc.dg/debug/btf/btf-prune-2.c: Likewise. > * gcc.dg/debug/btf/btf-prune-3.c: Likewise. > * gcc.dg/debug/btf/btf-prune-maps.c: Likewise.
> --- > gcc/btfout.cc | 358 +- > gcc/common.opt| 4 + > gcc/ctfc.cc | 2 +- > gcc/ctfc.h| 3 + > gcc/doc/invoke.texi | 20 + > gcc/testsuite/gcc.dg/debug/btf/btf-prune-1.c | 25 ++ > gcc/testsuite/gcc.dg/debug/btf/btf-prune-2.c | 33 ++ > gcc/testsuite/gcc.dg/debug/btf/btf-prune-3.c | 35 ++ > .../gcc.dg/debug/btf/btf-prune-maps.c | 20 + > 9 files changed, 493 insertions(+), 7 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-1.c > create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-2.c > create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-3.c > create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-maps.c > > diff --git a/gcc/btfout.cc b/gcc/btfout.cc > index 89f148de9650..34d
Re: [PATCH v4 6/6] opts: allow any combination of DWARF, CTF, BTF
Ping. archive: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654253.html On 6/11/24 12:01, David Faust wrote: > Previously it was not supported to generate both CTF and BTF debug info > in the same compiler run, as both formats made incompatible changes to > the same internal data structures. > > With the structural change in the prior patches, in particular the > guarantee that CTF will always be fully emitted before any BTF > translation occurs, there is no longer anything preventing generation > of both CTF and BTF at the same time. This patch changes option parsing > to allow any combination of -gdwarf, -gctf, and -gbtf at the same time. > > gcc/ > * opts.cc (set_debug_level): Allow any combination of -gdwarf, > -gctf and -gbtf to be enabled at the same time. > > gcc/testsuite/ > * gcc.dg/debug/btf/btf-3.c: New test. > * gcc.dg/debug/btf/btf-4.c: Likewise. > * gcc.dg/debug/btf/btf-5.c: Likewise. > --- > gcc/opts.cc| 20 +--- > gcc/testsuite/gcc.dg/debug/btf/btf-3.c | 8 > gcc/testsuite/gcc.dg/debug/btf/btf-4.c | 8 > gcc/testsuite/gcc.dg/debug/btf/btf-5.c | 9 + > 4 files changed, 30 insertions(+), 15 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-3.c > create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-4.c > create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-5.c > > diff --git a/gcc/opts.cc b/gcc/opts.cc > index 1b1b46455af6..7e9f2d91172b 100644 > --- a/gcc/opts.cc > +++ b/gcc/opts.cc > @@ -3506,21 +3506,11 @@ set_debug_level (uint32_t dinfo, int extended, const > char *arg, > } >else > { > - /* Make and retain the choice if both CTF and DWARF debug info are to > - be generated. 
*/ > - if (((dinfo == DWARF2_DEBUG) || (dinfo == CTF_DEBUG)) > - && ((opts->x_write_symbols == (DWARF2_DEBUG|CTF_DEBUG)) > - || (opts->x_write_symbols == DWARF2_DEBUG) > - || (opts->x_write_symbols == CTF_DEBUG))) > - { > - opts->x_write_symbols |= dinfo; > - opts_set->x_write_symbols |= dinfo; > - } > - /* However, CTF and BTF are not allowed together at this time. */ > - else if (((dinfo == DWARF2_DEBUG) || (dinfo == BTF_DEBUG)) > -&& ((opts->x_write_symbols == (DWARF2_DEBUG|BTF_DEBUG)) > -|| (opts->x_write_symbols == DWARF2_DEBUG) > -|| (opts->x_write_symbols == BTF_DEBUG))) > + /* Any combination of DWARF, CTF and BTF is allowed. */ > + if (((dinfo == DWARF2_DEBUG) || (dinfo == CTF_DEBUG) > +|| (dinfo == BTF_DEBUG)) > + && ((opts->x_write_symbols | (DWARF2_DEBUG | CTF_DEBUG | BTF_DEBUG)) > +== (DWARF2_DEBUG | CTF_DEBUG | BTF_DEBUG))) > { > opts->x_write_symbols |= dinfo; > opts_set->x_write_symbols |= dinfo; > diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-3.c > b/gcc/testsuite/gcc.dg/debug/btf/btf-3.c > new file mode 100644 > index ..93c8164a2a54 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/debug/btf/btf-3.c > @@ -0,0 +1,8 @@ > +/* Verify that BTF debug info can co-exist with DWARF. */ > +/* { dg-do compile } */ > +/* { dg-options "-gdwarf -gbtf -dA" } */ > +/* { dg-final { scan-assembler "0xeb9f.*btf_magic" } } */ > +/* { dg-final { scan-assembler "DWARF version number" } } */ > + > +void func (void) > +{ } > diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-4.c > b/gcc/testsuite/gcc.dg/debug/btf/btf-4.c > new file mode 100644 > index ..b087917188bb > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/debug/btf/btf-4.c > @@ -0,0 +1,8 @@ > +/* Verify that BTF debug info can co-exist with CTF. 
*/ > +/* { dg-do compile } */ > +/* { dg-options "-gctf -gbtf -dA" } */ > +/* { dg-final { scan-assembler "0xeb9f.*btf_magic" } } */ > +/* { dg-final { scan-assembler "0xdff2.*CTF preamble magic number" } } */ > + > +void func (void) > +{ } > diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-5.c > b/gcc/testsuite/gcc.dg/debug/btf/btf-5.c > new file mode 100644 > index ..45267b5fc422 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/debug/btf/btf-5.c > @@ -0,0 +1,9 @@ > +/* Verify that BTF, CTF and DWARF can all co-exist happily. */ > +/* { dg-do compile } */ > +/* { dg-options "-gctf -gbtf -gdwarf -dA" } */ > +/* { dg-final { scan-assembler "0xeb9f.*btf_magic" } } */ > +/* { dg-final { scan-assembler "0xdff2.*CTF preamble magic number" } } */ > +/* { dg-final { scan-assembler "DWARF version number" } } */ > + > +void func (void) > +{ }
Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
On 24/06/2024 12:35, Alexandre Oliva wrote: > On Jun 21, 2024, Christophe Lyon wrote: > >>> How about mentioning Christophe's simplification in the commit log? > >> For the avoidance of doubt: it's OK for me (but you don't need to >> mention my name in fact ;-) > > Needing or not, I added it ;-) > be accepted. (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as before the fix). > >>> This is OK, but you might wish to revisit this statement before >>> committing. > > Oh, right, sorry, I messed it up. uint16_t was what I should have put > in there. int16_t would have overflown and invoked undefined behavior > to begin with, and I see it misled you far down the wrong path. > >>> I think the original bug was that we were losing the cast to short > > The problem was that the shift count saturated at 15. AFAIK sign > extension was not relevant. Hopefully the rewritten opening paragraph > below makes that clearer. I will put it in later this week barring > objections or further suggestions of improvement. Thanks, A signed shift right on a 16-bit vector element by 15 would still yield -1; but ... > > > [testsuite] [arm] [vect] adjust mve-vshr test [PR113281] > > The test was too optimistic, alas. We used to vectorize shifts by > clamping the shift counts below the bit width of the types (e.g. at 15 > for 16-bit vector elements), but (uint16_t)32768 >> (uint16_t)16 is > well defined (because of promotion to 32-bit int) and must yield 0, > not 1 (as before the fix). That makes more sense now. Thanks. > > Unfortunately, in the gimple model of vector units, such large shift > counts wouldn't be well-defined, so we won't vectorize such shifts any > more, unless we can tell they're in range or undefined. > > So the test that expected the vectorization we no longer performed > needs to be adjusted.
Instead of nobbling the test, Richard Earnshaw > suggested annotating the test with the expected ranges so as to enable > the optimization, and Christophe Lyon suggested a further > simplification. > > > Co-Authored-By: Richard Earnshaw > > for gcc/testsuite/ChangeLog > > PR tree-optimization/113281 > * gcc.target/arm/simd/mve-vshr.c: Add expected ranges. I think this is OK now. R. > --- > gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c > b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c > index 8c7adef9ed8f1..03078de49c65e 100644 > --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c > @@ -9,6 +9,8 @@ >void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * > __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \ > int i; \ > for (i=0; i + if ((unsigned)b[i] >= (unsigned)(BITS)) > \ > + __builtin_unreachable();\ >dest[i] = a[i] OP b[i]; > \ > } > \ > } > >
Ping^2 [PATCH-2v4] Value Range: Add range op for builtin isfinite
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html Thanks Gui Haochen 在 2024/6/20 14:57, HAO CHEN GUI 写道: > Hi, > Gently ping it. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html > > Thanks > Gui Haochen > > 在 2024/5/30 10:46, HAO CHEN GUI 写道: >> Hi, >> This patch adds the range op for builtin isfinite. >> >> Compared to previous version, the main change is to set the range to >> 1 if it's finite number otherwise to 0. >> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652220.html >> >> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no >> regressions. Is it OK for the trunk? >> >> Thanks >> Gui Haochen >> >> ChangeLog >> Value Range: Add range op for builtin isfinite >> >> The former patch adds optab for builtin isfinite. Thus builtin isfinite >> might not be folded at front end. So the range op for isfinite is needed >> for value range analysis. This patch adds range op for builtin isfinite. >> >> gcc/ >> * gimple-range-op.cc (class cfn_isfinite): New. >> (op_cfn_finite): New variables. >> (gimple_range_op_handler::maybe_builtin_call): Handle >> CFN_BUILT_IN_ISFINITE. >> >> gcc/testsuite/ >> * gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test. 
>> >> patch.diff >> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc >> index 4e60a42eaac..5ec5c828fa4 100644 >> --- a/gcc/gimple-range-op.cc >> +++ b/gcc/gimple-range-op.cc >> @@ -1233,6 +1233,62 @@ public: >>} >> } op_cfn_isinf; >> >> +//Implement range operator for CFN_BUILT_IN_ISFINITE >> +class cfn_isfinite : public range_operator >> +{ >> +public: >> + using range_operator::fold_range; >> + using range_operator::op1_range; >> + virtual bool fold_range (irange &r, tree type, const frange &op1, >> + const irange &, relation_trio) const override >> + { >> +if (op1.undefined_p ()) >> + return false; >> + >> +if (op1.known_isfinite ()) >> + { >> +wide_int one = wi::one (TYPE_PRECISION (type)); >> +r.set (type, one, one); >> +return true; >> + } >> + >> +if (op1.known_isnan () >> +|| op1.known_isinf ()) >> + { >> +r.set_zero (type); >> +return true; >> + } >> + >> +r.set_varying (type); >> +return true; >> + } >> + virtual bool op1_range (frange &r, tree type, const irange &lhs, >> + const frange &, relation_trio) const override >> + { >> +if (lhs.undefined_p ()) >> + return false; >> + >> +if (lhs.zero_p ()) >> + { >> +// The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented. 
>> +// Set range to varying >> +r.set_varying (type); >> +return true; >> + } >> + >> +if (!range_includes_zero_p (lhs)) >> + { >> +nan_state nan (false); >> +r.set (type, real_min_representable (type), >> + real_max_representable (type), nan); >> +return true; >> + } >> + >> +r.set_varying (type); >> +return true; >> + } >> +} op_cfn_isfinite; >> + >> // Implement range operator for CFN_BUILT_IN_ >> class cfn_parity : public range_operator >> { >> @@ -1330,6 +1386,11 @@ gimple_range_op_handler::maybe_builtin_call () >>m_operator = &op_cfn_isinf; >>break; >> >> +case CFN_BUILT_IN_ISFINITE: >> + m_op1 = gimple_call_arg (call, 0); >> + m_operator = &op_cfn_isfinite; >> + break; >> + >> CASE_CFN_COPYSIGN_ALL: >>m_op1 = gimple_call_arg (call, 0); >>m_op2 = gimple_call_arg (call, 1); >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c >> b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c >> new file mode 100644 >> index 000..f5dce0a0486 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c >> @@ -0,0 +1,31 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -fdump-tree-evrp" } */ >> + >> +#include >> +void link_error(); >> + >> +void test1 (double x) >> +{ >> + if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x)) >> +link_error (); >> +} >> + >> +void test2 (float x) >> +{ >> + if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x)) >> +link_error (); >> +} >> + >> +void test3 (double x) >> +{ >> + if (__builtin_isfinite (x) && __builtin_isinf (x)) >> +link_error (); >> +} >> + >> +void test4 (float x) >> +{ >> + if (__builtin_isfinite (x) && __builtin_isinf (x)) >> +link_error (); >> +} >> + >> +/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
[PATCH v3] RISC-V: Add dg-remove-option for z* extensions
This introduces testsuite support infra for removing extensions. Since z* extensions don't have ordering requirements the logic for adding/removing those extensions has also been consolidated. This fixes RVWMO compile testcases failing on Ztso targets by removing the extension from the -march string. gcc/ChangeLog: * doc/sourcebuild.texi (dg-remove-option): Add documentation. (dg-add-option): Add documentation for riscv_{a,zaamo,zalrsc,ztso} gcc/testsuite/ChangeLog: * gcc.target/riscv/amo/amo-table-a-6-amo-add-1.c: Add dg-remove-options for ztso. * gcc.target/riscv/amo/amo-table-a-6-amo-add-2.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-amo-add-3.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-amo-add-4.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-amo-add-5.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-1.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-2.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-3.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-4.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-5.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-6.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-7.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-fence-1.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-fence-2.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-fence-3.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-fence-4.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-fence-5.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-load-1.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-load-2.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-load-3.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-store-1.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-store-2.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-store-compat-3.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-1.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-2.c: Ditto. 
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-3.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-4.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-5.c: Ditto. * gcc.target/riscv/amo/amo-zalrsc-amo-add-1.c: Replace manually specified -march string with dg-add/remove-options directives. * gcc.target/riscv/amo/amo-zalrsc-amo-add-2.c: Ditto. * gcc.target/riscv/amo/amo-zalrsc-amo-add-3.c: Ditto. * gcc.target/riscv/amo/amo-zalrsc-amo-add-4.c: Ditto. * gcc.target/riscv/amo/amo-zalrsc-amo-add-5.c: Ditto. * lib/target-supports-dg.exp: Add dg-remove-options. * lib/target-supports.exp: Add dg-remove-options and consolidate z* extension add/remove-option code. Signed-off-by: Patrick O'Neill --- Tested using rv64gcv_ztso/rv64id but relying on precommit to run the targets there. Beyond testing Ztso/Zalrsc this is also helpful for the Zabha patch I'm working on. We can continue to test the atomic subword emulation routines without specifying a -march string. --- v2 ChangeLog: Remove spare bracket that pre-commit flagged. Add missing dg-add-options for zalrsc testcases. --- v3 ChangeLog: Add documentation for dg-remove-option and document some existing riscv_* dg-add-option options. Approved here: https://inbox.sourceware.org/gcc-patches/9fa5c829-1e30-444c-a091-62d05f2f2...@gmail.com/ I'll let it sit on the list for an hour or so before committing in case anyone has feedback on the added docs.
--- gcc/doc/sourcebuild.texi | 41 .../riscv/amo/amo-table-a-6-amo-add-1.c | 1 + .../riscv/amo/amo-table-a-6-amo-add-2.c | 1 + .../riscv/amo/amo-table-a-6-amo-add-3.c | 1 + .../riscv/amo/amo-table-a-6-amo-add-4.c | 1 + .../riscv/amo/amo-table-a-6-amo-add-5.c | 1 + .../amo/amo-table-a-6-compare-exchange-1.c| 1 + .../amo/amo-table-a-6-compare-exchange-2.c| 1 + .../amo/amo-table-a-6-compare-exchange-3.c| 1 + .../amo/amo-table-a-6-compare-exchange-4.c| 1 + .../amo/amo-table-a-6-compare-exchange-5.c| 1 + .../amo/amo-table-a-6-compare-exchange-6.c| 1 + .../amo/amo-table-a-6-compare-exchange-7.c| 1 + .../riscv/amo/amo-table-a-6-fence-1.c | 1 + .../riscv/amo/amo-table-a-6-fence-2.c | 1 + .../riscv/amo/amo-table-a-6-fence-3.c | 1 + .../riscv/amo/amo-table-a-6-fence-4.c | 1 + .../riscv/amo/amo-table-a-6-fence-5.c | 1 + .../riscv/amo/amo-table-a-6-load-1.c | 1 + .../riscv/amo/amo-table-a-6-load-2.c | 1 + .../riscv/amo/amo-table-a-6-load-3.c | 1 + .../riscv/amo/amo-table-a-6-store-1.c | 1 + .../riscv/amo/amo-table-a-6-store
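Based on the cover letter, a testcase using the new infrastructure might look like the following sketch. The directive and option names (riscv_a, riscv_ztso) are those described above; the exact spelling in the committed patch may differ:

```c
/* Hypothetical DejaGnu test header sketch, not taken from the patch.  */
/* { dg-do compile } */
/* { dg-options "-O2" } */
/* { dg-add-options riscv_a } */
/* Drop Ztso from the -march string so the RVWMO mapping is tested.  */
/* { dg-remove-options riscv_ztso } */
```

The point of dg-remove-options is the inverse of dg-add-options: rather than hand-writing a full -march string per test, a test can state which z* extensions must be absent and let the harness rewrite the march string accordingly.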
Re: [PATCH] RISC-V: Support -m[no-]unaligned-access
On Fri, 22 Dec 2023 01:23:13 PST (-0800), wangpengcheng...@bytedance.com wrote: These two options are negative aliases of -m[no-]strict-align. This matches the LLVM implementation. gcc/ChangeLog: * config/riscv/riscv.opt: Add option alias. gcc/testsuite/ChangeLog: * gcc.target/riscv/predef-align-10.c: New test. * gcc.target/riscv/predef-align-7.c: New test. * gcc.target/riscv/predef-align-8.c: New test. * gcc.target/riscv/predef-align-9.c: New test. Signed-off-by: Wang Pengcheng Sorry for being slow here. With the scalar/vector alignment split we're cleaning up a bunch of these LLVM/GCC differences, and we're waiting for the LLVM folks to decide how these are going to behave. LLVM will release well before GCC does, so we've got some time. So this isn't lost, just slow. --- gcc/config/riscv/riscv.opt | 4 gcc/testsuite/gcc.target/riscv/predef-align-10.c | 16 gcc/testsuite/gcc.target/riscv/predef-align-7.c | 15 +++ gcc/testsuite/gcc.target/riscv/predef-align-8.c | 16 gcc/testsuite/gcc.target/riscv/predef-align-9.c | 15 +++ 5 files changed, 66 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-10.c create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-9.c diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt index cf207d4dcdf..1e22998ce6e 100644 --- a/gcc/config/riscv/riscv.opt +++ b/gcc/config/riscv/riscv.opt @@ -116,6 +116,10 @@ mstrict-align Target Mask(STRICT_ALIGN) Save Do not generate unaligned memory accesses. +munaligned-access +Target Alias(mstrict-align) NegativeAlias +Enable unaligned memory accesses.
+ Enum Name(code_model) Type(enum riscv_code_model) Known code models (for use with the -mcmodel= option): diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-10.c b/gcc/testsuite/gcc.target/riscv/predef-align-10.c new file mode 100644 index 000..c86b2c7a5ed --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/predef-align-10.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mtune=rocket -munaligned-access" } */ + +int main() { + +/* rocket default is cpu tune param misaligned access slow */ +#if !defined(__riscv_misaligned_slow) +#error "__riscv_misaligned_slow is not set" +#endif + +#if defined(__riscv_misaligned_avoid) || defined(__riscv_misaligned_fast) +#error "__riscv_misaligned_avoid or __riscv_misaligned_fast is unexpectedly set" +#endif + + return 0; +} diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-7.c b/gcc/testsuite/gcc.target/riscv/predef-align-7.c new file mode 100644 index 000..405f3686c2e --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/predef-align-7.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-mtune=thead-c906 -mno-unaligned-access" } */ + +int main() { + +#if !defined(__riscv_misaligned_avoid) +#error "__riscv_misaligned_avoid is not set" +#endif + +#if defined(__riscv_misaligned_fast) || defined(__riscv_misaligned_slow) +#error "__riscv_misaligned_fast or __riscv_misaligned_slow is unexpectedly set" +#endif + + return 0; +} diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-8.c b/gcc/testsuite/gcc.target/riscv/predef-align-8.c new file mode 100644 index 000..64072c04a47 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/predef-align-8.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mtune=thead-c906 -munaligned-access" } */ + +int main() { + +/* thead-c906 default is cpu tune param misaligned access fast */ +#if !defined(__riscv_misaligned_fast) +#error "__riscv_misaligned_fast is not set" +#endif + +#if defined(__riscv_misaligned_avoid) || defined(__riscv_misaligned_slow) +#error 
"__riscv_misaligned_avoid or __riscv_misaligned_slow is unexpectedly set" +#endif + + return 0; +} diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-9.c b/gcc/testsuite/gcc.target/riscv/predef-align-9.c new file mode 100644 index 000..f5418de87cf --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/predef-align-9.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-mtune=rocket -mno-unaligned-access" } */ + +int main() { + +#if !defined(__riscv_misaligned_avoid) +#error "__riscv_misaligned_avoid is not set" +#endif + +#if defined(__riscv_misaligned_fast) || defined(__riscv_misaligned_slow) +#error "__riscv_misaligned_fast or __riscv_misaligned_slow is unexpectedly set" +#endif + + return 0; +}
Re: [PATCH v4 4/6] btf: add -gprune-btf option
On 6/24/24 09:11, David Faust wrote: Ping. Richard: I changed the option name as you asked but forgot to CC you on the updated patch. Is the new option OK? Indu: You had some minor comments on the prior version which I have addressed, not sure whether you meant the rest of the patch was OK or not, or if you had a chance to review it. Hi David, Thanks for making the change in the commit message to clearly state the behavior of the option -gprune-btf with and without LTO build. I did take a look at the V3 version of the patch, had tested it a bit too. While some gaps still remain in my understanding of the algorithm, overall I think this patch as such looks good and makes forward progress. So, LGTM. Thanks Indu Thanks! archive: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654252.html
Ping^2 [PATCH-1v3] Value Range: Add range op for builtin isinf
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html Thanks Gui Haochen 在 2024/6/20 14:56, HAO CHEN GUI 写道: > Hi, > Gently ping it. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html > > Thanks > Gui Haochen > > 在 2024/5/30 10:46, HAO CHEN GUI 写道: >> Hi, >> The builtin isinf is not folded at front end if the corresponding optab >> exists. It causes the range evaluation failed on the targets which has >> optab_isinf. For instance, range-sincos.c will fail on the targets which >> has optab_isinf as it calls builtin_isinf. >> >> This patch fixed the problem by adding range op for builtin isinf. >> >> Compared with previous version, the main change is to set the range to >> 1 if it's infinite number otherwise to 0. >> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652219.html >> >> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no >> regressions. Is it OK for the trunk? >> >> Thanks >> Gui Haochen >> >> >> ChangeLog >> Value Range: Add range op for builtin isinf >> >> The builtin isinf is not folded at front end if the corresponding optab >> exists. So the range op for isinf is needed for value range analysis. >> This patch adds range op for builtin isinf. >> >> gcc/ >> * gimple-range-op.cc (class cfn_isinf): New. >> (op_cfn_isinf): New variables. >> (gimple_range_op_handler::maybe_builtin_call): Handle >> CASE_FLT_FN (BUILT_IN_ISINF). >> >> gcc/testsuite/ >> * gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c: New test. 
>> >> patch.diff >> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc >> index 55dfbb23ce2..4e60a42eaac 100644 >> --- a/gcc/gimple-range-op.cc >> +++ b/gcc/gimple-range-op.cc >> @@ -1175,6 +1175,63 @@ private: >>bool m_is_pos; >> } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true); >> >> +// Implement range operator for CFN_BUILT_IN_ISINF >> +class cfn_isinf : public range_operator >> +{ >> +public: >> + using range_operator::fold_range; >> + using range_operator::op1_range; >> + virtual bool fold_range (irange &r, tree type, const frange &op1, >> + const irange &, relation_trio) const override >> + { >> +if (op1.undefined_p ()) >> + return false; >> + >> +if (op1.known_isinf ()) >> + { >> +wide_int one = wi::one (TYPE_PRECISION (type)); >> +r.set (type, one, one); >> +return true; >> + } >> + >> +if (op1.known_isnan () >> +|| (!real_isinf (&op1.lower_bound ()) >> +&& !real_isinf (&op1.upper_bound ( >> + { >> +r.set_zero (type); >> +return true; >> + } >> + >> +r.set_varying (type); >> +return true; >> + } >> + virtual bool op1_range (frange &r, tree type, const irange &lhs, >> + const frange &, relation_trio) const override >> + { >> +if (lhs.undefined_p ()) >> + return false; >> + >> +if (lhs.zero_p ()) >> + { >> +nan_state nan (true); >> +r.set (type, real_min_representable (type), >> + real_max_representable (type), nan); >> +return true; >> + } >> + >> +if (!range_includes_zero_p (lhs)) >> + { >> +// The range is [-INF,-INF][+INF,+INF], but it can't be represented. 
>> +// Set range to [-INF,+INF] >> +r.set_varying (type); >> +r.clear_nan (); >> +return true; >> + } >> + >> +r.set_varying (type); >> +return true; >> + } >> +} op_cfn_isinf; >> >> // Implement range operator for CFN_BUILT_IN_ >> class cfn_parity : public range_operator >> @@ -1268,6 +1325,11 @@ gimple_range_op_handler::maybe_builtin_call () >>m_operator = &op_cfn_signbit; >>break; >> >> +CASE_FLT_FN (BUILT_IN_ISINF): >> + m_op1 = gimple_call_arg (call, 0); >> + m_operator = &op_cfn_isinf; >> + break; >> + >> CASE_CFN_COPYSIGN_ALL: >>m_op1 = gimple_call_arg (call, 0); >>m_op2 = gimple_call_arg (call, 1); >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c >> b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c >> new file mode 100644 >> index 000..468f1bcf5c7 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c >> @@ -0,0 +1,44 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -fdump-tree-evrp" } */ >> + >> +#include >> +void link_error(); >> + >> +void >> +test1 (double x) >> +{ >> + if (x > __DBL_MAX__ && !__builtin_isinf (x)) >> +link_error (); >> + if (x < -__DBL_MAX__ && !__builtin_isinf (x)) >> +link_error (); >> +} >> + >> +void >> +test2 (float x) >> +{ >> + if (x > __FLT_MAX__ && !__builtin_isinf (x)) >> +link_error (); >> + if (x < -__FLT_MAX__ && !__builtin_isinf (x)) >> +link_error (); >> +} >> + >> +void >> +test3 (double x) >> +{ >> + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__) >> +link_error (); >> + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__) >> +link_error (); >> +} >> + >> +void >> +test4 (float x)
[PATCH v4] RISC-V: Add dg-remove-option for z* extensions
Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
On Jun 24, 2024, "Richard Earnshaw (lists)" wrote: > A signed shift right on a 16-bit vector element by 15 would still > yield -1 Yeah. Indeed, ISTM that we *could* have retained the clamping transformation for *signed* shifts, since the clamping would only make a difference in case of (undefined) overflow. Only for unsigned shifts can well-defined shifts yield different results with clamping. Richard (Sandiford), do you happen to recall why the IRC conversation mentioned in the PR trail decided to drop it entirely, even for signed types? -- Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
Re: [PATCH] Add a late-combine pass [PR106594]
I didn't see this before. Sigh. On Tue, Jan 02, 2024 at 09:47:11AM +, Richard Sandiford wrote: > Segher Boessenkool writes: > > On Tue, Oct 24, 2023 at 07:49:10PM +0100, Richard Sandiford wrote: > >> This patch adds a combine pass that runs late in the pipeline. > > > > But it is not. It is a completely new thing, and much closer to > > fwprop than to combine, too. > > Well, it is a combine pass. No, it is not. In the context of GCC combine is the instruction combiner. Which does something else than this does. So use a different name. Please. It will be NAKked by the combine maintainer otherwise. > It's not a new instance of the pass in > combine.cc, but I don't think that's the implication. We already have > two combine passes: the combine.cc one and the postreload one. There is no postreload-combine pass. There is a postreload pass that does various trivial things. One of those is reload_combine, which is nothing like combine. It is a kind of limited fwprop for memory addressing. > > Could you rename it to something else, please? Something less confusing > > to both users and maintainers :-) > > Do you have any suggestions? Since it is something like fwprop, maybe something like that? Or maybe put "addressing" in the name, if that is the point here. > >> The pass currently has a single objective: remove definitions by > >> substituting into all uses. > > > > The easy case ;-) > > And yet a case that no existing pass handles. :) That's why I'm > trying to add something that does. So, fwprop. Segher
RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
Hi, > -Original Message- > From: pan2...@intel.com > Sent: Monday, June 24, 2024 2:55 PM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > jeffreya...@gmail.com; pins...@gmail.com; Pan Li > Subject: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > From: Pan Li > > The zip benchmark of coremark-pro have one SAT_SUB like pattern but > truncated as below: > > void test (uint16_t *x, unsigned b, unsigned n) > { > unsigned a = 0; > register uint16_t *p = x; > > do { > a = *--p; > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB > } while (--n); > } > > It will have gimple before vect pass, it cannot hit any pattern of > SAT_SUB and then cannot vectorize to SAT_SUB. > > _2 = a_11 - b_12(D); > iftmp.0_13 = (short unsigned int) _2; > _18 = a_11 >= b_12(D); > iftmp.0_5 = _18 ? iftmp.0_13 : 0; > > This patch would like to improve the pattern match to recog above > as truncate after .SAT_SUB pattern. Then we will have the pattern > similar to below, as well as eliminate the first 3 dead stmt. > > _2 = a_11 - b_12(D); > iftmp.0_13 = (short unsigned int) _2; > _18 = a_11 >= b_12(D); > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); > I guess this is because one branch of the cond is a constant so the convert is folded in. I was wondering though, can't we just push in the truncate in this case? i.e. in this case we know both types are unsigned and the difference positive and max value is the max value of the truncate type. It seems like folding as a general rule _1 = *p_10; a_11 = (unsigned int) _1; _2 = a_11 - b_12(D); iftmp.0_13 = (short unsigned int) _2; _18 = a_11 >= b_12(D); iftmp.0_5 = _18 ? iftmp.0_13 : 0; *p_10 = iftmp.0_5; Into _1 = *p_10; a_11 = (unsigned int) _1; _2 = ((short unsigned int) a_11) - ((short unsigned int) b_12(D)); iftmp.0_13 = _2; _18 = a_11 >= b_12(D); iftmp.0_5 = _18 ? iftmp.0_13 : 0; *p_10 = iftmp.0_5; Is valid (though might have missed something). 
This would negate the need for this change to the vectorizer and saturation detection but also should generate better vector code. This is what we do in the general case https://godbolt.org/z/dfoj6fWdv I think here we're just not seeing through the cond. Typically lots of architectures have cheap truncation operations, so truncating before saturation means you do the cheap operation first rather than doing the complex operation on the wider type. That is, _2 = a_11 - b_12(D); iftmp.0_13 = (short unsigned int) _2; _18 = a_11 >= b_12(D); iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int) b_12(D)); is cheaper than _2 = a_11 - b_12(D); iftmp.0_13 = (short unsigned int) _2; _18 = a_11 >= b_12(D); iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); after vectorization. Normally the vectorizer will try to do this through over-widening detection as well, but we haven't taught ranger about the ranges of these new IFNs (probably should at some point). Cheers, Tamar > The below tests are passed for this patch. > 1. The rv64gcv fully regression tests. > 2. The rv64gcv build with glibc. > 3. The x86 bootstrap tests. > 4. The x86 fully regression tests. > > gcc/ChangeLog: > > * match.pd: Add convert description for minus and capture. > * tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add > new logic to handle in_type is incompatibile with out_type, as > well as rename from. > (vect_recog_build_binary_gimple_stmt): Rename to. > (vect_recog_sat_add_pattern): Leverage above renamed func. > (vect_recog_sat_sub_pattern): Ditto. > > Signed-off-by: Pan Li > --- > gcc/match.pd | 4 +-- > gcc/tree-vect-patterns.cc | 51 --- > 2 files changed, 33 insertions(+), 22 deletions(-) > > diff --git a/gcc/match.pd b/gcc/match.pd > index 3d0689c9312..4a4b0b2e72f 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3164,9 +3164,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > /* Unsigned saturation sub, case 2 (branch with ge): > SAT_U_SUB = X >= Y ? X - Y : 0. 
*/ > (match (unsigned_integer_sat_sub @0 @1) > - (cond^ (ge @0 @1) (minus @0 @1) integer_zerop) > + (cond^ (ge @0 @1) (convert? (minus (convert1? @0) (convert1? @1))) > integer_zerop) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) > - && types_match (type, @0, @1 > + && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1 > > /* Unsigned saturation sub, case 3 (branchless with gt): > SAT_U_SUB = (X - Y) * (X > Y). */ > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index cef901808eb..3d887d36050 100644 > --- a/gcc/tree-vect-patterns.cc > +++ b/gcc/tree-vect-patterns.cc > @@ -4490,26 +4490,37 @@ vect_recog_mult_pattern (vec_info *vinfo, > extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree)); > extern bool gimple
[Committed] RISC-V: Add dg-remove-option for z* extensions
Committed. Patrick On 6/24/24 12:06, Patrick O'Neill wrote: This introduces testsuite support infra for removing extensions. Since z* extensions don't have ordering requirements the logic for adding/removing those extensions has also been consolidated. This fixes RVWMO compile testcases failing on Ztso targets by removing the extension from the -march string. gcc/ChangeLog: * doc/sourcebuild.texi (dg-remove-option): Add documentation. (dg-add-option): Add documentation for riscv_{a,zaamo,zalrsc,ztso} gcc/testsuite/ChangeLog: * gcc.target/riscv/amo/amo-table-a-6-amo-add-1.c: Add dg-remove-options for ztso. * gcc.target/riscv/amo/amo-table-a-6-amo-add-2.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-amo-add-3.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-amo-add-4.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-amo-add-5.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-1.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-2.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-3.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-4.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-5.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-6.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-compare-exchange-7.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-fence-1.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-fence-2.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-fence-3.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-fence-4.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-fence-5.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-load-1.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-load-2.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-load-3.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-store-1.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-store-2.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-store-compat-3.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-1.c: Ditto. 
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-2.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-3.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-4.c: Ditto. * gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-5.c: Ditto. * gcc.target/riscv/amo/amo-zalrsc-amo-add-1.c: Replace manually specified -march string with dg-add/remove-options directives. * gcc.target/riscv/amo/amo-zalrsc-amo-add-2.c: Ditto. * gcc.target/riscv/amo/amo-zalrsc-amo-add-3.c: Ditto. * gcc.target/riscv/amo/amo-zalrsc-amo-add-4.c: Ditto. * gcc.target/riscv/amo/amo-zalrsc-amo-add-5.c: Ditto. * lib/target-supports-dg.exp: Add dg-remove-options. * lib/target-supports.exp: Add dg-remove-options and consolidate z* extension add/remove-option code. Signed-off-by: Patrick O'Neill --- Tested using rv64gcv_ztso/rv64id but relying on precommit to run the targets there. Beyond testing Ztso/Zalrsc this is also helpful for the Zabha patch I'm working on. We can continue to test the atomic subword emulation routines without specifying a -march string. --- v2 ChangeLog: Remove spare bracket that pre-commit flagged. Add missing dg-add-options for zalrsc testcases. --- v3 ChangeLog: Add documentation for dg-remove-option and document some existing riscv_* dg-add-option options. Approved here: https://inbox.sourceware.org/gcc-patches/9fa5c829-1e30-444c-a091-62d05f2f2...@gmail.com/ I'll let it sit on the list for an hour or so before committing in case anyone has feedback on the added docs. --- v4 ChangeLog: Add missing @table to new dg-remove-option section in sourcebuild.texi as noted by the linaro pre-commit CI. Approved here: https://inbox.sourceware.org/gcc-patches/9fa5c829-1e30-444c-a091-62d05f2f2...@gmail.com/ Will wait an hour or so before committing. 
--- gcc/doc/sourcebuild.texi | 43 .../riscv/amo/amo-table-a-6-amo-add-1.c | 1 + .../riscv/amo/amo-table-a-6-amo-add-2.c | 1 + .../riscv/amo/amo-table-a-6-amo-add-3.c | 1 + .../riscv/amo/amo-table-a-6-amo-add-4.c | 1 + .../riscv/amo/amo-table-a-6-amo-add-5.c | 1 + .../amo/amo-table-a-6-compare-exchange-1.c| 1 + .../amo/amo-table-a-6-compare-exchange-2.c| 1 + .../amo/amo-table-a-6-compare-exchange-3.c| 1 + .../amo/amo-table-a-6-compare-exchange-4.c| 1 + .../amo/amo-table-a-6-compare-exchange-5.c| 1 + .../amo/amo-table-a-6-compare-exchange-6.c| 1 + .../amo/amo-table-a-6-compare-exchange-7.c| 1 + .../riscv/amo/amo-table-a-6-fence-1.c | 1 + .../riscv/amo/amo-table-a-6-fence-2.c | 1 + .../riscv/amo/amo-table-a-6-fence-3.c | 1 + .../ri
Re: [PATCH v2 2/3] RISC-V: setmem for RISCV with V extension
On 12/19/23 2:53 AM, Sergei Lewis wrote: gcc/ChangeLog * config/riscv/riscv-protos.h (riscv_vector::expand_vec_setmem): New function declaration. * config/riscv/riscv-string.cc (riscv_vector::expand_vec_setmem): New function: this generates an inline vectorised memory set, if and only if we know the entire operation can be performed in a single vector store * config/riscv/riscv.md (setmem): Try riscv_vector::expand_vec_setmem for constant lengths gcc/testsuite/ChangeLog * gcc.target/riscv/rvv/base/setmem-1.c: New tests * gcc.target/riscv/rvv/base/setmem-2.c: New tests * gcc.target/riscv/rvv/base/setmem-3.c: New tests So I've updated this patch to work on the trunk and run it through pre-commit CI. Results are clean and I've pushed this to the trunk. Thanks for your patience. jeff
[libstdc++] [testsuite] no libatomic for vxworks
libatomic hasn't been ported to vxworks. Most of the stdatomic.h and underlying requirements are provided by builtins and libgcc, and the vxworks libc already provides remaining __atomic symbols, so porting libatomic doesn't seem to make sense. However, some of the target arch-only tests in add_options_for_libatomic cover vxworks targets, so we end up attempting to link libatomic in, even though it's not there. Preempt those too-broad tests. We've long been using a workaround very similar to this on ppc, and now that we've made sure there's nothing in libatomic that we'd need on any vxworks targets, we're ready to contribute this change. Regstrapping on x86_64-linux-gnu, just to be sure. Ok to install? Co-Authored-By: Marc Poulhiès for libstdc++-v3/ChangeLog * testsuite/lib/dg-options.exp (add_options_for_libatomic): None for *-*-vxworks*. --- libstdc++-v3/testsuite/lib/dg-options.exp |5 + 1 file changed, 5 insertions(+) diff --git a/libstdc++-v3/testsuite/lib/dg-options.exp b/libstdc++-v3/testsuite/lib/dg-options.exp index 84f9e3ebc730c..0d77fb029b09b 100644 --- a/libstdc++-v3/testsuite/lib/dg-options.exp +++ b/libstdc++-v3/testsuite/lib/dg-options.exp @@ -338,6 +338,11 @@ proc atomic_link_flags { paths } { } proc add_options_for_libatomic { flags } { +# We don't (need to) build libatomic for vxworks. Don't try to +# link it in, even on arches that support libatomic. +if { [istarget *-*-vxworks*] } { + return $flags +} if { [istarget hppa*-*-hpux*] || ([istarget powerpc*-*-*] && [check_effective_target_ilp32]) || [istarget riscv*-*-*] -- Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
Re: [C PATCH] C: Error message for incorrect use of static in array declarations
On Sun, Jun 23, 2024 at 08:42:36PM +0200, Martin Uecker wrote: > > This adds an explicit error message for [static] and [static*] > (the same as clang has) instead of the generic "error: expected > expression before ']' token", which is not entirely accurate. > For function definitions the subsequent error "[*] can not be > used outside function prototype scope" is then suppressed. > > > Bootstrapped and regression tested on x86_64. > > > > commit 1157d04764eeeb51fa1098727813dbc092e11dd2 > Author: Martin Uecker > Date: Sat Nov 4 14:39:19 2023 +0100 > > C: Error message for incorrect use of static in array declarations. Please use "[PATCH] c: ..." for C patches. > Add an explicit error messages when c99's static is > used without a size expression in an array declarator. > > gcc/: > c/c-parser.cc (c_parser_direct_declarator_inner): Add > error message. No "c/" here. > > gcc/testsuite: > gcc.dg/c99-arraydecl-4.c: New test. > > diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc > index e83e9c683f7..91b8d24ca78 100644 > --- a/gcc/c/c-parser.cc > +++ b/gcc/c/c-parser.cc > @@ -4732,41 +4732,29 @@ c_parser_direct_declarator_inner (c_parser *parser, > bool id_present, > false, false, false, false, cla_prefer_id); >if (!quals_attrs->declspecs_seen_p) > quals_attrs = NULL; FWIW, I'd prefer to use const bool static_seen = c_parser_next_token_is_keyword (parser, RID_STATIC); > - /* If "static" is present, there must be an array dimension. > - Otherwise, there may be a dimension, "*", or no > - dimension. */ Why remove the comment? > - if (static_seen) > + > + star_seen = false; bool star_seen = false; would be nicer IMHO. 
> + if (c_parser_next_token_is (parser, CPP_MULT) > + && c_parser_peek_2nd_token (parser)->type == CPP_CLOSE_SQUARE) > { > - star_seen = false; > - dimen = c_parser_expr_no_commas (parser, NULL); > + star_seen = true; > + c_parser_consume_token (parser); > } > - else > + else if (!c_parser_next_token_is (parser, CPP_CLOSE_SQUARE)) > + dimen = c_parser_expr_no_commas (parser, NULL); > + > + if (static_seen && star_seen) > { > - if (c_parser_next_token_is (parser, CPP_CLOSE_SQUARE)) > - { > - dimen.value = NULL_TREE; > - star_seen = false; > - } > - else if (c_parser_next_token_is (parser, CPP_MULT)) > - { > - if (c_parser_peek_2nd_token (parser)->type == CPP_CLOSE_SQUARE) > - { > - dimen.value = NULL_TREE; > - star_seen = true; > - c_parser_consume_token (parser); > - } > - else > - { > - star_seen = false; > - dimen = c_parser_expr_no_commas (parser, NULL); > - } > - } > - else > - { > - star_seen = false; > - dimen = c_parser_expr_no_commas (parser, NULL); > - } > + error_at (c_parser_peek_token (parser)->location, > + "%<static%> may not be used with an unspecified " > + "variable length array size"); The last two lines are not indented enough. > + /* Prevent further errors. */ Two spaces after a '.'. > + star_seen = false; > } > + else if (static_seen && NULL_TREE == dimen.value) Please let's use !dimen.value or dimen.value == NULL_TREE instead. I think it'd be better to do: if (static_seen) { if (star_seen) // ... else if (!dimen.value) // ... } > + error_at (c_parser_peek_token (parser)->location, > + "%<static%> may not be used without an array size"); > + >if (c_parser_next_token_is (parser, CPP_CLOSE_SQUARE)) > c_parser_consume_token (parser); >else > diff --git a/gcc/testsuite/gcc.dg/c99-arraydecl-4.c > b/gcc/testsuite/gcc.dg/c99-arraydecl-4.c > new file mode 100644 > index 000..bfc26196433 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/c99-arraydecl-4.c > @@ -0,0 +1,15 @@ > +/* { dg-do "compile" } */ It's unusual to quote compile here. 
> +/* { dg-options "-std=c99 -pedantic-errors" } */ > + > +void fo(char buf[static]); /* { dg-error "'static' may not be used without > an array size" } */ > +void fo(char buf[static]) { }/* { dg-error "'static' may not be used > without an array size" } */ > + > +void fu(char buf[static *]); /* { dg-error "'static' may not be used with an > unspecified variable length array size" } */ > +void fu(char buf[static *]) { } /* { dg-error "'static' may not be used > with an unspecified variable length array size" } */ > + > +void fe(int n, char buf[static n]); > +void fe(int n, char buf[static *]) { } /* { dg-error "'static' may not > be used with an unspecified variable length array size" } */ With -Wvla-parameter we get: c99-arraydecl-4.c:11:21:
Re: [libstdc++] [testsuite] no libatomic for vxworks
On Mon, 24 Jun 2024 at 21:33, Alexandre Oliva wrote: > > > libatomic hasn't been ported to vxworks. Most of the stdatomic.h and > underlying requirements are provided by builtins and libgcc, > and the vxworks libc already provides remaining __atomic symbols, so > porting libatomic doesn't seem to make sense. > > However, some of the target arch-only tests in > add_options_for_libatomic cover vxworks targets, so we end up > attempting to link libatomic in, even though it's not there. > Preempt those too-broad tests. > > We've long been using a workaround very similar to this on ppc, and now > that we've made sure there's nothing in libatomic that we'd need on any > vxworks targets, we're ready to contribute this change. Regstrapping on > x86_64-linux-gnu, just to be sure. Ok to install? OK, thanks. > > > Co-Authored-By: Marc Poulhiès > > for libstdc++-v3/ChangeLog > > * testsuite/lib/dg-options.exp (add_options_for_libatomic): > None for *-*-vxworks*. > --- > libstdc++-v3/testsuite/lib/dg-options.exp |5 + > 1 file changed, 5 insertions(+) > > diff --git a/libstdc++-v3/testsuite/lib/dg-options.exp > b/libstdc++-v3/testsuite/lib/dg-options.exp > index 84f9e3ebc730c..0d77fb029b09b 100644 > --- a/libstdc++-v3/testsuite/lib/dg-options.exp > +++ b/libstdc++-v3/testsuite/lib/dg-options.exp > @@ -338,6 +338,11 @@ proc atomic_link_flags { paths } { > } > > proc add_options_for_libatomic { flags } { > +# We don't (need to) build libatomic for vxworks. Don't try to > +# link it in, even on arches that support libatomic. > +if { [istarget *-*-vxworks*] } { > + return $flags > +} > if { [istarget hppa*-*-hpux*] > || ([istarget powerpc*-*-*] && [check_effective_target_ilp32]) > || [istarget riscv*-*-*] > > -- > Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ >Free Software Activist GNU Toolchain Engineer > More tolerance and less prejudice are key for inclusion and diversity > Excluding neuro-others for not behaving ""normal"" is *not* inclusive >
[RFC/PATCH] diagnostics: UX: add doc URLs for attributes
In r14-5118-gc5db4d8ba5f3de I added a mechanism to automatically add documentation URLs to quoted strings in diagnostics. In r14-6920-g9e49746da303b8 I added a mechanism to generate URLs for mentions of command-line options in quoted strings in diagnostics. This patch does a similar thing for attributes. It adds a new Python 3 script to scrape the generated HTML looking for documentation of attributes, and uses this to (re)generate a new gcc/attr-urls.def file. Running "make regenerate-attr-urls" after rebuilding the HTML docs will regenerate gcc/attr-urls.def in the source directory. The patch uses this to optionally add doc URLs for attributes in any diagnostic emitted during the lifetime of an auto_urlify_attributes instance, and adds such instances everywhere that a diagnostic refers to an attribute within quotes (based on grepping the source tree for references to attributes in strings and in code). For example, given: $ ./xgcc -B. -S ../../src/gcc/testsuite/gcc.dg/attr-access-2.c ../../src/gcc/testsuite/gcc.dg/attr-access-2.c:14:16: warning: attribute ‘access(read_write, 2, 3)’ positional argument 2 conflicts with previous designation by argument 1 [-Wattributes] with this patch the quoted text `access(read_write, 2, 3)' automatically gains the URL for our docs for "access": https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-access-function-attribute in a sufficiently modern terminal. Two points I'm not sure about: - like r14-6920-g9e49746da303b8 this avoids the Makefile target depending on the generated HTML, since a missing URL is a minor problem, whereas requiring all users to build HTML docs seems more involved. Doing so also avoids Python 3 as a build requirement for everyone, but instead just for developers adding attributes. Like the options, we could add a CI test for this. Is this the right approach? 
- the patch currently gathers target-specific attributes, but doesn't select the appropriate one; it just picks the first one it sees. For example, the function attribute "interrupt" has 19 URLs within our docs: one common, and 18 target-specific ones. I'm not sure what the best approach here is; perhaps some kind of new target hook for identifying target-specific documentation? Thoughts? gcc/ChangeLog: * Makefile.in (ATTR_URLS_HTML_DEPS): New. (regenerate-attr-urls): New. (regenerate-attr-urls-unit-test): New. * attr-urls.def: New file. * attribs.cc: Include "gcc-urlifier.h". (decl_attributes): Use auto_urlify_attributes. * diagnostic.cc (diagnostic_context::override_urlifier): New. * diagnostic.h (diagnostic_context::override_urlifier): New decl. (diagnostic_context::get_urlifier): New accessor. * gcc-urlifier.cc: Include "diagnostic.h" and "attr-urls.def". (gcc_urlifier::make_doc): Convert to... (make_doc_url): ...this. (auto_override_urlifier::auto_override_urlifier): New. (auto_override_urlifier::~auto_override_urlifier): New. (struct attr_url_entry): New. (find_attr_url_entry): New. (attribute_urlifier::get_url_for_quoted_text): New. (attribute_urlifier::get_url_suffix_for_quoted_text): New. (selftest::gcc_urlifier_cc_tests): Split out body into... (selftest::test_gcc_urlifier): ...this, and also call... (selftest::test_attribute_urlifier): ...this new function. * gcc-urlifier.h: Include "pretty-print-urlifier.h" and "label-text.h". (class auto_override_urlifier): New. (class attribute_urlifier): New. (class auto_urlify_attributes): New. * gimple-ssa-warn-access.cc: Include "gcc-urlifier.h". (pass_waccess::execute): Use auto_urlify_attributes. * gimplify.cc: Include "gcc-urlifier.h". (expand_FALLTHROUGH): Use auto_urlify_attributes. * internal-fn.cc: Include "gcc-urlifier.h". (expand_FALLTHROUGH): Use auto_urlify_attributes. * ipa-pure-const.cc: Include "gcc-urlifier.h". (suggest_attribute): Use auto_urlify_attributes. * ipa-strub.cc: Include "gcc-urlifier.h". 
(can_strub_p): Use auto_urlify_attributes. * regenerate-attr-urls.py: New file. * tree-cfg.cc: Include "gcc-urlifier.h". (do_warn_unused_result): Use auto_urlify_attributes. * tree-ssa-uninit.cc: Include "gcc-urlifier.h". (maybe_warn_read_write_only): Use auto_urlify_attributes. (maybe_warn_pass_by_reference): Likewise. gcc/analyzer/ChangeLog: * region-model.cc: Include "gcc-urlifier.h". (reason_attr_access::emit): Use auto_urlify_attributes. * sm-taint.cc: Include "gcc-urlifier.h". (tainted_access_attrib_size::emit): Use auto_urlify_attributes. gcc/c-family/ChangeLog: * c-attribs.cc: Include "gcc-urlifier.h". (positional_argument): Use auto_urlify_attributes. * c-common.cc: Include "gcc-urlifier.h". (parse_opti
Re: [PATCH 10/52] jit: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
On Fri, 2024-06-14 at 10:16 +0800, Kewen.Lin wrote: > Hi David, > > on 2024/6/13 21:44, David Malcolm wrote: > > On Sun, 2024-06-02 at 22:01 -0500, Kewen Lin wrote: > > > Joseph pointed out "floating types should have their mode, > > > not a poorly defined precision value" in the discussion[1], > > > as he and Richi suggested, the existing macros > > > {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a > > > hook mode_for_floating_type. Unlike the other FEs, for the > > > uses in recording::memento_of_get_type::get_size, since > > > {float,{,long_}double}_type_node haven't been initialized > > > yet, this is to replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE > > > with calling hook targetm.c.mode_for_floating_type. > > > > > > [1] > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html > > > > > > gcc/jit/ChangeLog: > > > > > > * jit-recording.cc > > > (recording::memento_of_get_type::get_size): Update > > > macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling > > > targetm.c.mode_for_floating_type with > > > TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE. > > > --- > > > gcc/jit/jit-recording.cc | 12 > > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > > > diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc > > > index 68a2e860c1f..7719b898e57 100644 > > > --- a/gcc/jit/jit-recording.cc > > > +++ b/gcc/jit/jit-recording.cc > > > @@ -21,7 +21,7 @@ along with GCC; see the file COPYING3. 
If not > > > see > > > #include "config.h" > > > #include "system.h" > > > #include "coretypes.h" > > > -#include "tm.h" > > > +#include "target.h" > > > #include "pretty-print.h" > > > #include "toplev.h" > > > > > > @@ -2353,6 +2353,7 @@ size_t > > > recording::memento_of_get_type::get_size () > > > { > > > int size; > > > + machine_mode m; > > > switch (m_kind) > > > { > > > case GCC_JIT_TYPE_VOID: > > > @@ -2399,13 +2400,16 @@ recording::memento_of_get_type::get_size > > > () > > > size = 128; > > > break; > > > case GCC_JIT_TYPE_FLOAT: > > > - size = FLOAT_TYPE_SIZE; > > > + m = targetm.c.mode_for_floating_type (TI_FLOAT_TYPE); > > > + size = GET_MODE_PRECISION (m).to_constant (); > > > break; > > > case GCC_JIT_TYPE_DOUBLE: > > > - size = DOUBLE_TYPE_SIZE; > > > + m = targetm.c.mode_for_floating_type (TI_DOUBLE_TYPE); > > > + size = GET_MODE_PRECISION (m).to_constant (); > > > break; > > > case GCC_JIT_TYPE_LONG_DOUBLE: > > > - size = LONG_DOUBLE_TYPE_SIZE; > > > + m = targetm.c.mode_for_floating_type > > > (TI_LONG_DOUBLE_TYPE); > > > + size = GET_MODE_PRECISION (m).to_constant (); > > > break; > > > case GCC_JIT_TYPE_SIZE_T: > > > size = MAX_BITS_PER_WORD; > > > > [CCing jit mailing list] > > > > Thanks for the patch; sorry for the delay in responding. > > > > Did your testing include jit? Note that --enable-languages=all > > does > > *not* include it (due to it needing --enable-host-shared). > > Thanks for the hints! Yes, as noted in the cover letter, I did test > jit. > Initially I used TYPE_PRECISION ({float,{long_,}double_type_node) to > replace these just like what I proposed for the other FE changes, but > the > testing showed some failures on test-combination.c etc., by looking > into > them, I realized that this call > recording::memento_of_get_type::get_size > can happen before when we set up those type nodes. Then I had to use > the > current approach with the new hook, it made all failures gone (no > regressions). 
btw, test result comparison showed some more lines > with > "NA->PASS: test-threads.c.exe", since it's positive, I didn't look > into > it. > > > > > The jit::recording code runs *very* early - before toplev::main. > > For > > example, a call to gcc_jit_type_get_size can trigger the above code > > path before toplev::main has run. > > > > target.h says each target should have a: > > > > struct gcc_target targetm = TARGET_INITIALIZER; > > > > Has targetm.c.mode_for_floating_type been initialized enough by > > that > > static initialization? > > It depends on how to define "enough". The hook has been initialized > as you pointed out, I just debugged it and confirmed target specific > hook was called as expected (rs6000_c_mode_for_floating_type on > Power) > when this jit::recording function gets called. If "enough" refers to > something like command line options, it's not ready. > > > Could the mode_for_floating_type hook be > > relying on some target-specific dynamic initialization that hasn't > > run > > yet? (e.g. taking account of command-line options?) > > > > Yes, it could. Like rs6000 port, the hook checks > rs6000_long_double_type_size > for long double (it's related to command line option -mlong-double-x) > and > some other targets like i386, also would like to check > TARGET_LONG_DOUBLE_64 > and TARGET_LONG_DOUBLE_128. But I think it isn't worse than befo
Re: [pushed 2/2] testsuite: check that generated .sarif files validate against the SARIF schema [PR109360]
On Sat, 2024-06-22 at 11:26 +0300, Dimitar Dimitrov wrote: > On Fri, Jun 21, 2024 at 08:55:36AM -0400, David Malcolm wrote: > > This patch extends the dg directive verify-sarif-file so that if > > the "jsonschema" tool is available, it will be used to validate the > > generated .sarif file. > > > > Tested with jsonschema 3.2 with Python 3.8 > > Hi David, > > The new testcase fails on my Fedora 40 with jsonschema 4.19.1 and > Python > 3.12.3: > > ``` > Executing on host: jsonschema --version (timeout = 300) > spawn -ignore SIGHUP jsonschema --version > /usr/bin/jsonschema:5: DeprecationWarning: The jsonschema CLI is > deprecated and will be removed in a future version. Please use check- > jsonschema instead, which can be installed from > https://pypi.org/project/check-jsonschema/ > from jsonschema.cli import main > 4.19.1 > FAIL: c-c++-common/analyzer/malloc-sarif-1.c (test .sarif output > against SARIF schema) > ``` Sorry about that. > > The deprecation warning output seems to confuse DejaGnu and it fails > the test. Thanks for the heads-up; I'll upgrade my machine to a less ancient version and update the patch accordingly. Dave
[to-be-committed][V3][RISC-V] cmpmem for RISCV with V extension
So this is the cmpmem patch from Sergei, updated for the trunk. Updates included adjusting the existing cmpmemsi expander to conditionally try expansion via vector. And a minor testsuite adjustment to turn off vector expansion in one test that is primarily focused on vset optimization and ensuring we don't have extras. I've spun this in my tester successfully and just want to see a clean run through precommit CI before moving forward. Jeff gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_vector::expand_vec_cmpmem): New function declaration. * config/riscv/riscv-string.cc (riscv_vector::expand_vec_cmpmem): New function. * config/riscv/riscv.md (cmpmemsi): Try riscv_vector::expand_vec_cmpmem for constant lengths. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/cmpmem-1.c: New codegen tests * gcc.target/riscv/rvv/base/cmpmem-2.c: New execution tests * gcc.target/riscv/rvv/base/cmpmem-3.c: New codegen tests * gcc.target/riscv/rvv/base/cmpmem-4.c: New codegen tests * gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Turn off vector mem* and str* handling. diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index a3380d4250d..a8b76173fa0 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -679,6 +679,7 @@ void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false); bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool); void emit_vec_extract (rtx, rtx, rtx); bool expand_vec_setmem (rtx, rtx, rtx); +bool expand_vec_cmpmem (rtx, rtx, rtx, rtx); /* Rounding mode bitfield for fixed point VXRM. */ enum fixed_point_rounding_mode diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index 1ddebdcee3f..257a514d290 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -1605,4 +1605,104 @@ expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in) return true; } +/* Used by cmpmemsi in riscv.md. 
*/ + +bool +expand_vec_cmpmem (rtx result_out, rtx blk_a_in, rtx blk_b_in, rtx length_in) +{ + HOST_WIDE_INT lmul; + /* Check we are able and allowed to vectorise this operation; + bail if not. */ + if (!check_vectorise_memory_operation (length_in, lmul)) +return false; + + /* Strategy: + load entire blocks at a and b into vector regs + generate mask of bytes that differ + find first set bit in mask + find offset of first set bit in mask, use 0 if none set + result is ((char*)a[offset] - (char*)b[offset]) + */ + + machine_mode vmode + = riscv_vector::get_vector_mode (QImode, BYTES_PER_RISCV_VECTOR * lmul) + .require (); + rtx blk_a_addr = copy_addr_to_reg (XEXP (blk_a_in, 0)); + rtx blk_a = change_address (blk_a_in, vmode, blk_a_addr); + rtx blk_b_addr = copy_addr_to_reg (XEXP (blk_b_in, 0)); + rtx blk_b = change_address (blk_b_in, vmode, blk_b_addr); + + rtx vec_a = gen_reg_rtx (vmode); + rtx vec_b = gen_reg_rtx (vmode); + + machine_mode mask_mode = get_mask_mode (vmode); + rtx mask = gen_reg_rtx (mask_mode); + rtx mismatch_ofs = gen_reg_rtx (Pmode); + + rtx ne = gen_rtx_NE (mask_mode, vec_a, vec_b); + rtx vmsops[] = { mask, ne, vec_a, vec_b }; + rtx vfops[] = { mismatch_ofs, mask }; + + /* If the length is exactly vlmax for the selected mode, do that. + Otherwise, use a predicated store. 
*/ + + if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in))) +{ + emit_move_insn (vec_a, blk_a); + emit_move_insn (vec_b, blk_b); + emit_vlmax_insn (code_for_pred_cmp (vmode), riscv_vector::COMPARE_OP, + vmsops); + + emit_vlmax_insn (code_for_pred_ffs (mask_mode, Pmode), + riscv_vector::CPOP_OP, vfops); +} + else +{ + if (!satisfies_constraint_K (length_in)) + length_in = force_reg (Pmode, length_in); + + rtx memmask = CONSTM1_RTX (mask_mode); + + rtx m_ops_a[] = { vec_a, memmask, blk_a }; + rtx m_ops_b[] = { vec_b, memmask, blk_b }; + + emit_nonvlmax_insn (code_for_pred_mov (vmode), + riscv_vector::UNARY_OP_TAMA, m_ops_a, length_in); + emit_nonvlmax_insn (code_for_pred_mov (vmode), + riscv_vector::UNARY_OP_TAMA, m_ops_b, length_in); + + emit_nonvlmax_insn (code_for_pred_cmp (vmode), riscv_vector::COMPARE_OP, + vmsops, length_in); + + emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode), + riscv_vector::CPOP_OP, vfops, length_in); +} + + /* Mismatch_ofs is -1 if blocks match, or the offset of + the first mismatch otherwise. */ + rtx ltz = gen_reg_rtx (Xmode); + emit_insn (gen_slt_3 (LT, Xmode, Xmode, ltz, mismatch_ofs, const0_rtx)); + /* mismatch_ofs += (mismatch_ofs < 0) ? 1 : 0. */ + emit_insn ( + gen_rtx_SET (mismatch_ofs, gen_rtx_PLUS (Pmode, mi
[PATCH] c++: decltype of by-ref capture proxy of ref [PR115504]
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk/14? -- >8 -- The capture proxy handling in finish_decltype_type added in r14-5330 was stripping the reference type of a capture proxy's captured variable, which is desirable for a by-value capture, but not for a by-ref capture (of a reference). PR c++/115504 gcc/cp/ChangeLog: * semantics.cc (finish_decltype_type): For a by-reference capture proxy, don't strip the reference type (if any) of the captured variable. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/decltype-auto8.C: New test. --- gcc/cp/semantics.cc | 4 +++- gcc/testsuite/g++.dg/cpp1y/decltype-auto8.C | 11 +++ 2 files changed, 14 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto8.C diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 08f5f245e7d..b4f626924af 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -12076,9 +12076,11 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p, { if (is_normal_capture_proxy (expr)) { + bool by_ref = TYPE_REF_P (TREE_TYPE (expr)); expr = DECL_CAPTURED_VARIABLE (expr); type = TREE_TYPE (expr); - type = non_reference (type); + if (!by_ref) + type = non_reference (type); } else { diff --git a/gcc/testsuite/g++.dg/cpp1y/decltype-auto8.C b/gcc/testsuite/g++.dg/cpp1y/decltype-auto8.C new file mode 100644 index 000..9a5e435f14f --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/decltype-auto8.C @@ -0,0 +1,11 @@ +// PR c++/115504 +// { dg-do compile { target c++14 } } + +void f(int& x) { + [&x]() { +decltype(auto) a = x; +using type = decltype(x); +using type = decltype(a); +using type = int&; // not 'int' + }; +} -- 2.45.2.606.g9005149a4a
[PATCH] c++: using non-dep array variable of unknown bound [PR115358]
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? This fixes PR115358 whose testcase used a constexpr static array variable, but it seems the general issue is not specific to constexpr as illustrated by the below testcase. Note that Clang currently rejects the testcase for a similar reason to GCC... -- >8 -- For a non-dependent array variable of unknown bound, it seems we need to try instantiating its definition upon use in a template context for sake of proper checking and typing of its expression context. This seems analogous to deducing the return type of a function which is similarly done upon first use even in a template context. PR c++/115358 gcc/cp/ChangeLog: * decl2.cc (mark_used): Call maybe_instantiate_decl for an array variable with unknown bound. * semantics.cc (finish_decltype_type): Remove now redundant handling of array variables with unknown bound. * typeck.cc (cxx_sizeof_expr): Likewise. gcc/testsuite/ChangeLog: * g++.dg/template/array37.C: New test. --- gcc/cp/decl2.cc | 2 ++ gcc/cp/semantics.cc | 7 --- gcc/cp/typeck.cc| 7 --- gcc/testsuite/g++.dg/template/array37.C | 16 4 files changed, 18 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/g++.dg/template/array37.C diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc index 6c3ef60d51f..cdd2b8aada2 100644 --- a/gcc/cp/decl2.cc +++ b/gcc/cp/decl2.cc @@ -6001,6 +6001,8 @@ mark_used (tree decl, tsubst_flags_t complain /* = tf_warning_or_error */) find out its type. For OpenMP user defined reductions, we need them instantiated for reduction clauses which inline them by hand directly. 
*/ if (undeduced_auto_decl (decl) + || (VAR_P (decl) + && VAR_HAD_UNKNOWN_BOUND (decl)) || (TREE_CODE (decl) == FUNCTION_DECL && DECL_OMP_DECLARE_REDUCTION_P (decl))) maybe_instantiate_decl (decl); diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index b4f626924af..3247521e03e 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -12002,13 +12002,6 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p, return error_mark_node; } - /* To get the size of a static data member declared as an array of - unknown bound, we need to instantiate it. */ - if (VAR_P (expr) - && VAR_HAD_UNKNOWN_BOUND (expr) - && DECL_TEMPLATE_INSTANTIATION (expr)) -instantiate_decl (expr, /*defer_ok*/true, /*expl_inst_mem*/false); - if (id_expression_or_member_access_p) { /* If e is an id-expression or a class member access (5.2.5 diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc index 5970ac3d398..9297948cfa5 100644 --- a/gcc/cp/typeck.cc +++ b/gcc/cp/typeck.cc @@ -2127,13 +2127,6 @@ cxx_sizeof_expr (location_t loc, tree e, tsubst_flags_t complain) location_t e_loc = cp_expr_loc_or_loc (e, loc); STRIP_ANY_LOCATION_WRAPPER (e); - /* To get the size of a static data member declared as an array of - unknown bound, we need to instantiate it. */ - if (VAR_P (e) - && VAR_HAD_UNKNOWN_BOUND (e) - && DECL_TEMPLATE_INSTANTIATION (e)) -instantiate_decl (e, /*defer_ok*/true, /*expl_inst_mem*/false); - if (TREE_CODE (e) == PARM_DECL && DECL_ARRAY_PARAMETER_P (e) && (complain & tf_warning)) diff --git a/gcc/testsuite/g++.dg/template/array37.C b/gcc/testsuite/g++.dg/template/array37.C new file mode 100644 index 000..ed03b955f05 --- /dev/null +++ b/gcc/testsuite/g++.dg/template/array37.C @@ -0,0 +1,16 @@ +// PR c++/115358 +// { dg-do compile { target c++14 } } + +template<class T> +struct A { static int STR[]; }; + +template<class T> +int A<T>::STR[] = {1,2,3}; + +void g(int(&)[3]); + +int main() { + [](auto) { +g(A<int>::STR); // { dg-bogus "int []" } + }; +} -- 2.45.2.606.g9005149a4a
Re: [PATCH 4/13 ver4] rs6000, extend the current vec_{un,}signed{e,o} built-ins
On 6/18/24 20:03, Kewen.Lin wrote: > Hi Carl, > > on 2024/6/14 03:40, Carl Love wrote: >> >> GCC maintainers: >> >> As noted the removal of __builtin_vsx_xvcvdpuxds_uns and >> __builtin_vsx_xvcvspuxws was moved to patch 2 in the series. The patch has >> been updated per the comments from version 3. >> >> Please let me know if this patch is acceptable for mainline. >> >> Carl >> >> -- >> >> rs6000, extend the current vec_{un,}signed{e,o} built-ins >> >> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds >> convert a vector of floats to signed/unsigned long long ints. Extend the > > Nit: s/signed/a vector of signed/ Fixed. > >> existing vec_{un,}signed{e,o} built-ins to handle the argument >> vector of floats to return the even/odd signed/unsigned integers. >> > > Likewise. Fixed. > >> The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf, >> vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o} >> built-ins. >> >> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are >> now for internal use only. They are not documented and they do not >> have testcases. >> > > >> The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by >> vec_signed{e,o}, remove. >> >> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by >> vec_unsigned{e,o}, remove. > > As the comments in 2/13 v4 and the previous review comments, I preferred > these two are moved to 2/13 as well (this patch should focus on extending). > Moved to patch 2. >> >> Add testcases and update documentation. >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvdpsxws, >> __builtin_vsx_xvcvdpuxws): Removed. >> (__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds): Renamed > > Nit: s/Renamed/Rename to/ OK, fixed. > >> __builtin_vsignede_v4sf, __builtin_vunsignede_v4sf respectively. >> (XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF, >> VEC_VUNSIGNEDE_V4SF respectively. > > Likewise. OK, fixed. 
> >> (__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New >> built-in definitions. >> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo, >> vec_unsignede,vec_unsignedo): Add new overloaded specifications. > > Formatting nits: "..,.." -> ".., ..", " " -> " " OK, I fixed the various spacing issues. > >> * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf, >> vunsignede_v4sf, vunsignedo_v4sf): New define_expands. > > Likewise. ditto > >> * doc/extend.texi (vec_signedo, vec_signede): Add documentation >> for new overloaded built-ins. > > Missing vec_unsignedo and vec_unsignede, may be also mention for which > types, like "converting vector float to vector {un,}signed long long". > OK, fixed. >> >> gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/builtins-3-runnable.c >> (test_unsigned_int_result, test_ll_unsigned_int_result): Add >> new argument. >> (vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New >> tests for the overloaded built-ins. >> --- gcc/config/rs6000/rs6000-builtins.def | 20 ++--- >> gcc/config/rs6000/rs6000-overload.def | 8 ++ >> gcc/config/rs6000/vsx.md | 84 +++ >> gcc/doc/extend.texi | 10 +++ >> .../gcc.target/powerpc/builtins-3-runnable.c | 49 +-- >> 5 files changed, 154 insertions(+), 17 deletions(-) >> >> diff --git a/gcc/config/rs6000/rs6000-builtins.def >> b/gcc/config/rs6000/rs6000-builtins.def >> index 322d27b7a0d..29a9deb3410 100644 >> --- a/gcc/config/rs6000/rs6000-builtins.def >> +++ b/gcc/config/rs6000/rs6000-builtins.def >> @@ -1688,26 +1688,26 @@ >>const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int); >> XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {} >> >> - const vsi __builtin_vsx_xvcvdpsxws (vd); >> -XVCVDPSXWS vsx_xvcvdpsxws {} >> - >>const vsll __builtin_vsx_xvcvdpuxds (vd); >> XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {} >> >>const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int); >> XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {} >> >> - const vsi __builtin_vsx_xvcvdpuxws (vd); >> -XVCVDPUXWS vsx_xvcvdpuxws {} >> - 
>>const vd __builtin_vsx_xvcvspdp (vf); >> XVCVSPDP vsx_xvcvspdp {} >> >> - const vsll __builtin_vsx_xvcvspsxds (vf); >> -XVCVSPSXDS vsx_xvcvspsxds {} >> + const vsll __builtin_vsignede_v4sf (vf); >> +VEC_VSIGNEDE_V4SF vsignede_v4sf {} >> + >> + const vsll __builtin_vsignedo_v4sf (vf); >> +VEC_VSIGNEDO_V4SF vsignedo_v4sf {} >> + >> + const vull __builtin_vunsignede_v4sf (vf); >> +VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {} >> >> - const vsll __builtin_vsx_xvcvspuxds (vf); >> -XVCVSPUXDS vsx_xvcvspuxds {} >> + const vull __builtin_vunsignedo_v4sf (vf); >> +VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {} >>
Re: [PATCH] c++: using non-dep array variable of unknown bound [PR115358]
On 6/24/24 21:00, Patrick Palka wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? This fixes PR115358 whose testcase used a constexpr static array variable, but it seems the general issue is not specific to constexpr as illustrated by the below testcase. Note that Clang currently rejects the testcase for a similar reason to GCC... OK. -- >8 -- For a non-dependent array variable of unknown bound, it seems we need to try instantiating its definition upon use in a template context for sake of proper checking and typing of its expression context. This seems analogous to deducing the return type of a function which is similarly done upon first use even in a template context. PR c++/115358 gcc/cp/ChangeLog: * decl2.cc (mark_used): Call maybe_instantiate_decl for an array variable with unknown bound. * semantics.cc (finish_decltype_type): Remove now redundant handling of array variables with unknown bound. * typeck.cc (cxx_sizeof_expr): Likewise. gcc/testsuite/ChangeLog: * g++.dg/template/array37.C: New test. --- gcc/cp/decl2.cc | 2 ++ gcc/cp/semantics.cc | 7 --- gcc/cp/typeck.cc| 7 --- gcc/testsuite/g++.dg/template/array37.C | 16 4 files changed, 18 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/g++.dg/template/array37.C diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc index 6c3ef60d51f..cdd2b8aada2 100644 --- a/gcc/cp/decl2.cc +++ b/gcc/cp/decl2.cc @@ -6001,6 +6001,8 @@ mark_used (tree decl, tsubst_flags_t complain /* = tf_warning_or_error */) find out its type. For OpenMP user defined reductions, we need them instantiated for reduction clauses which inline them by hand directly. 
*/ if (undeduced_auto_decl (decl) + || (VAR_P (decl) + && VAR_HAD_UNKNOWN_BOUND (decl)) || (TREE_CODE (decl) == FUNCTION_DECL && DECL_OMP_DECLARE_REDUCTION_P (decl))) maybe_instantiate_decl (decl); diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index b4f626924af..3247521e03e 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -12002,13 +12002,6 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p, return error_mark_node; } - /* To get the size of a static data member declared as an array of - unknown bound, we need to instantiate it. */ - if (VAR_P (expr) - && VAR_HAD_UNKNOWN_BOUND (expr) - && DECL_TEMPLATE_INSTANTIATION (expr)) -instantiate_decl (expr, /*defer_ok*/true, /*expl_inst_mem*/false); - if (id_expression_or_member_access_p) { /* If e is an id-expression or a class member access (5.2.5 diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc index 5970ac3d398..9297948cfa5 100644 --- a/gcc/cp/typeck.cc +++ b/gcc/cp/typeck.cc @@ -2127,13 +2127,6 @@ cxx_sizeof_expr (location_t loc, tree e, tsubst_flags_t complain) location_t e_loc = cp_expr_loc_or_loc (e, loc); STRIP_ANY_LOCATION_WRAPPER (e); - /* To get the size of a static data member declared as an array of - unknown bound, we need to instantiate it. */ - if (VAR_P (e) - && VAR_HAD_UNKNOWN_BOUND (e) - && DECL_TEMPLATE_INSTANTIATION (e)) -instantiate_decl (e, /*defer_ok*/true, /*expl_inst_mem*/false); - if (TREE_CODE (e) == PARM_DECL && DECL_ARRAY_PARAMETER_P (e) && (complain & tf_warning)) diff --git a/gcc/testsuite/g++.dg/template/array37.C b/gcc/testsuite/g++.dg/template/array37.C new file mode 100644 index 000..ed03b955f05 --- /dev/null +++ b/gcc/testsuite/g++.dg/template/array37.C @@ -0,0 +1,16 @@ +// PR c++/115358 +// { dg-do compile { target c++14 } } + +template<class T> +struct A { static int STR[]; }; + +template<class T> +int A<T>::STR[] = {1,2,3}; + +void g(int(&)[3]); + +int main() { + [](auto) { +g(A<int>::STR); // { dg-bogus "int []" } + }; +}
Re: [PATCH] c++: decltype of by-ref capture proxy of ref [PR115504]
On 6/24/24 21:00, Patrick Palka wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk/14? -- >8 -- The capture proxy handling in finish_decltype_type added in r14-5330 was stripping the reference type of a capture proxy's captured variable, which is desirable for a by-value capture, but not for a by-ref capture (of a reference). I'm not sure why we would want it for by-value, either; regardless of the capture kind, decltype(x) is int&. PR c++/115504 gcc/cp/ChangeLog: * semantics.cc (finish_decltype_type): For a by-reference capture proxy, don't strip the reference type (if any) of the captured variable. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/decltype-auto8.C: New test. --- gcc/cp/semantics.cc | 4 +++- gcc/testsuite/g++.dg/cpp1y/decltype-auto8.C | 11 +++ 2 files changed, 14 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto8.C diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 08f5f245e7d..b4f626924af 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -12076,9 +12076,11 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p, { if (is_normal_capture_proxy (expr)) { + bool by_ref = TYPE_REF_P (TREE_TYPE (expr)); expr = DECL_CAPTURED_VARIABLE (expr); type = TREE_TYPE (expr); - type = non_reference (type); + if (!by_ref) + type = non_reference (type); } else { diff --git a/gcc/testsuite/g++.dg/cpp1y/decltype-auto8.C b/gcc/testsuite/g++.dg/cpp1y/decltype-auto8.C new file mode 100644 index 000..9a5e435f14f --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/decltype-auto8.C @@ -0,0 +1,11 @@ +// PR c++/115504 +// { dg-do compile { target c++14 } } + +void f(int& x) { + [&x]() { +decltype(auto) a = x; +using type = decltype(x); +using type = decltype(a); +using type = int&; // not 'int' + }; +}
RE: [PATCH 1/3 v3] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.
> -Original Message- > From: Tamar Christina > Sent: Monday, June 24, 2024 10:12 PM > To: Richard Biener ; Hu, Lin1 > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; > ubiz...@gmail.com > Subject: RE: [PATCH 1/3 v3] vect: generate suitable convert insn for int -> > int, > float -> float and int <-> float. > > > -Original Message- > > From: Richard Biener > > Sent: Monday, June 24, 2024 1:34 PM > > To: Hu, Lin1 > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; > > ubiz...@gmail.com > > Subject: RE: [PATCH 1/3 v3] vect: generate suitable convert insn for > > int -> int, float > > -> float and int <-> float. > > > > On Thu, 20 Jun 2024, Hu, Lin1 wrote: > > > > > > >else if (ret_elt_bits > arg_elt_bits) > > > > > modifier = WIDEN; > > > > > > > > > > + if (supportable_convert_operation (code, ret_type, arg_type, > > > > > &code1)) > > > > > +{ > > > > > + g = gimple_build_assign (lhs, code1, arg); > > > > > + gsi_replace (gsi, g, false); > > > > > + return; > > > > > +} > > > > > > > > Given the API change I suggest below it might make sense to have > > > > supportable_indirect_convert_operation do the above and represent > > > > it as > > single- > > > > step conversion? > > > > > > > > > > OK, if you want to supportable_indirect_convert_operation can do > > > something like supportable_convert_operation, I'll give it a try. > > > This functionality is really the part that this function can cover. > > > But this would require some changes not only the API change, because > > > supportable_indirect_convert_operation originally only supported > > > Float > > > -> Int or Int ->Float. > > > > I think I'd like to see a single API to handle direct and > > (multi-)indirect-level converts that operate on vectors with all the > > same number of lanes. 
> > > > > > > > > > > + code_helper code2 = ERROR_MARK, code3 = ERROR_MARK; > > > > > + int multi_step_cvt = 0; > > > > > + vec interm_types = vNULL; > > > > > + if (supportable_indirect_convert_operation (NULL, > > > > > + code, > > > > > + ret_type, arg_type, > > > > > + &code2, &code3, > > > > > + &multi_step_cvt, > > > > > + &interm_types, arg)) > > > > > +{ > > > > > + new_rhs = make_ssa_name (interm_types[0]); > > > > > + g = gimple_build_assign (new_rhs, (tree_code) code3, arg); > > > > > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > > > > > + g = gimple_build_assign (lhs, (tree_code) code2, new_rhs); > > > > > + gsi_replace (gsi, g, false); > > > > > + return; > > > > > +} > > > > > + > > > > >if (modifier == NONE && (code == FIX_TRUNC_EXPR || code == > > > > FLOAT_EXPR)) > > > > > { > > > > > - if (supportable_convert_operation (code, ret_type, arg_type, > &code1)) > > > > > - { > > > > > - g = gimple_build_assign (lhs, code1, arg); > > > > > - gsi_replace (gsi, g, false); > > > > > - return; > > > > > - } > > > > >/* Can't use get_compute_type here, as > supportable_convert_operation > > > > >doesn't necessarily use an optab and needs two arguments. 
*/ > > > > >tree vec_compute_type > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > > index 05a169ecb2d..0aa608202ca 100644 > > > > > --- a/gcc/tree-vect-stmts.cc > > > > > +++ b/gcc/tree-vect-stmts.cc > > > > > @@ -5175,7 +5175,7 @@ vectorizable_conversion (vec_info *vinfo, > > > > >tree scalar_dest; > > > > >tree op0, op1 = NULL_TREE; > > > > >loop_vec_info loop_vinfo = dyn_cast (vinfo); > > > > > - tree_code tc1, tc2; > > > > > + tree_code tc1; > > > > >code_helper code, code1, code2; > > > > >code_helper codecvt1 = ERROR_MARK, codecvt2 = ERROR_MARK; > > > > >tree new_temp; > > > > > @@ -5384,92 +5384,17 @@ vectorizable_conversion (vec_info *vinfo, > > > > > break; > > > > >} > > > > > > > > > > - /* For conversions between float and integer types try whether > > > > > - we can use intermediate signed integer types to support the > > > > > - conversion. */ > > > > > - if (GET_MODE_SIZE (lhs_mode) != GET_MODE_SIZE (rhs_mode) > > > > > - && (code == FLOAT_EXPR || > > > > > - (code == FIX_TRUNC_EXPR && !flag_trapping_math))) > > > > > - { > > > > > - bool demotion = GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE > > > > (lhs_mode); > > > > > - bool float_expr_p = code == FLOAT_EXPR; > > > > > - unsigned short target_size; > > > > > - scalar_mode intermediate_mode; > > > > > - if (demotion) > > > > > - { > > > > > - intermediate_mode = lhs_mode; > > > > > - target_size = GET_MODE_SIZE (rhs_mode); > > > > > - } > > > > > - else > > > > > - { > > > > > - target_size
Re: [PATCH] c++: ICE with __dynamic_cast redecl [PR115501]
On 6/18/24 10:58, Marek Polacek wrote: Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? OK. -- >8 -- Since r13-3299, build_dynamic_cast_1 calls pushdecl which calls duplicate_decls and that in this testcase emits the "conflicting declaration" error and returns error_mark_node, so the subsequent build_cxx_call crashes on the error_mark_node. PR c++/115501 gcc/cp/ChangeLog: * rtti.cc (build_dynamic_cast_1): Return if dcast_fn is erroneous. gcc/testsuite/ChangeLog: * g++.dg/rtti/dyncast8.C: New test. --- gcc/cp/rtti.cc | 2 ++ gcc/testsuite/g++.dg/rtti/dyncast8.C | 15 +++ 2 files changed, 17 insertions(+) create mode 100644 gcc/testsuite/g++.dg/rtti/dyncast8.C diff --git a/gcc/cp/rtti.cc b/gcc/cp/rtti.cc index ed69606f4dd..cc006ea927f 100644 --- a/gcc/cp/rtti.cc +++ b/gcc/cp/rtti.cc @@ -794,6 +794,8 @@ build_dynamic_cast_1 (location_t loc, tree type, tree expr, pop_abi_namespace (flags); dynamic_cast_node = dcast_fn; } + if (dcast_fn == error_mark_node) + return error_mark_node; result = build_cxx_call (dcast_fn, 4, elems, complain); SET_EXPR_LOCATION (result, loc); diff --git a/gcc/testsuite/g++.dg/rtti/dyncast8.C b/gcc/testsuite/g++.dg/rtti/dyncast8.C new file mode 100644 index 000..de23433dd9b --- /dev/null +++ b/gcc/testsuite/g++.dg/rtti/dyncast8.C @@ -0,0 +1,15 @@ +// PR c++/115501 +// { dg-do compile } + +struct s{virtual void f();}; +struct s1 : s{}; +namespace __cxxabiv1 +{ + extern "C" void __dynamic_cast(); // { dg-message "previous declaration" } +} +void diagnostic_information_impl(s const *se) +{ + dynamic_cast<s1 const*>(se); +} + +// { dg-error "conflicting declaration" "" { target *-*-* } 0 } base-commit: e4f938936867d8799775d1455e67bd3fb8711afd
Re: [PATCH] c++: ICE with __has_unique_object_representations [PR115476]
On 6/18/24 10:31, Marek Polacek wrote: Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14/13? Makes sense to me, though probably the [meta.unary.prop] table should be adjusted in the same way. Jonathan, what do you think? -- >8 -- Here we started to ICE with r13-25: in check_trait_type, for "X[]" we return true here: if (kind == 1 && TREE_CODE (type) == ARRAY_TYPE && !TYPE_DOMAIN (type)) return true; // Array of unknown bound. Don't care about completeness. and then end up crashing in record_has_unique_obj_representations: 4836 if (cur != wi::to_offset (sz)) because sz is null. https://eel.is/c++draft/type.traits#tab:meta.unary.prop-row-47-column-3-sentence-1 says that the preconditions for __has_unique_object_representations are: "T shall be a complete type, cv void, or an array of unknown bound" and that "For an array type T, the same result as has_unique_object_representations_v<remove_all_extents_t<T>>" so T[] should be treated as T. So we should use kind==2 for the trait. PR c++/115476 gcc/cp/ChangeLog: * semantics.cc (finish_trait_expr) <case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS>: Move below to call check_trait_type with kind==2. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/has-unique-obj-representations4.C: New test. 
--- gcc/cp/semantics.cc | 2 +- .../cpp1z/has-unique-obj-representations4.C | 16 2 files changed, 17 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp1z/has-unique-obj-representations4.C diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 08f5f245e7d..42251b6764b 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -12966,7 +12966,6 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2) case CPTK_HAS_NOTHROW_COPY: case CPTK_HAS_TRIVIAL_COPY: case CPTK_HAS_TRIVIAL_DESTRUCTOR: -case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS: if (!check_trait_type (type1)) return error_mark_node; break; @@ -12976,6 +12975,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2) case CPTK_IS_STD_LAYOUT: case CPTK_IS_TRIVIAL: case CPTK_IS_TRIVIALLY_COPYABLE: +case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS: if (!check_trait_type (type1, /* kind = */ 2)) return error_mark_node; break; diff --git a/gcc/testsuite/g++.dg/cpp1z/has-unique-obj-representations4.C b/gcc/testsuite/g++.dg/cpp1z/has-unique-obj-representations4.C new file mode 100644 index 000..d6949dc7005 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/has-unique-obj-representations4.C @@ -0,0 +1,16 @@ +// PR c++/115476 +// { dg-do compile { target c++11 } } + +struct X; +static_assert(__has_unique_object_representations(X), ""); // { dg-error "invalid use of incomplete type" } +static_assert(__has_unique_object_representations(X[]), ""); // { dg-error "invalid use of incomplete type" } +static_assert(__has_unique_object_representations(X[1]), ""); // { dg-error "invalid use of incomplete type" } +static_assert(__has_unique_object_representations(X[][1]), ""); // { dg-error "invalid use of incomplete type" } + +struct X { + int x; +}; +static_assert(__has_unique_object_representations(X), ""); +static_assert(__has_unique_object_representations(X[]), ""); +static_assert(__has_unique_object_representations(X[1]), ""); 
+static_assert(__has_unique_object_representations(X[][1]), ""); base-commit: 7f9be55a4630134a237219af9cc8143e02080380
Re: [PATCH] c++: ICE with generic lambda and pack expansion [PR115425]
On 6/17/24 14:17, Marek Polacek wrote: Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? OK. -- >8 -- In r13-272 we hardened the *_PACK_EXPANSION and *_ARGUMENT_PACK macros. That trips up here because make_pack_expansion returns error_mark_node and we access that with PACK_EXPANSION_LOCAL_P. PR c++/115425 gcc/cp/ChangeLog: * pt.cc (tsubst_pack_expansion): Return error_mark_node if make_pack_expansion doesn't work out. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/lambda-generic12.C: New test. --- gcc/cp/pt.cc | 2 ++ gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C | 25 +++ 2 files changed, 27 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 607753ae6b7..e676372f75b 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -13775,6 +13775,8 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain, else result = tsubst (pattern, args, complain, in_decl); result = make_pack_expansion (result, complain); + if (result == error_mark_node) + return error_mark_node; PACK_EXPANSION_LOCAL_P (result) = PACK_EXPANSION_LOCAL_P (t); PACK_EXPANSION_SIZEOF_P (result) = PACK_EXPANSION_SIZEOF_P (t); if (PACK_EXPANSION_AUTO_P (t)) diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C b/gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C new file mode 100644 index 000..219529c7c32 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/lambda-generic12.C @@ -0,0 +1,25 @@ +// PR c++/115425 +// { dg-do compile { target c++20 } } + +using size_t = decltype(sizeof(0)); + +template +struct X {}; + +template +void foo(X); + +template +struct S; + +template +auto test() { + constexpr static auto x = foo>(); // { dg-error "no matching function" } + return [](X) { +(typename S::type{}, ...); + }(X<__integer_pack (0)...>{}); +} + +int main() { + test(); +} base-commit: b63c7d92012f92e0517190cf263d29bbef8a06bf
[COMMITTED] Make transitive relations an oracle option
Transitive relations can add processing time to the relation oracle as they need to look up previous relations. This may not be desired, especially for something like fast VRP. This patch adds a flag at oracle creation time which makes processing them optional. Bootstraps on x86_64-pc-linux-gnu with no regressions. Pushed. Andrew From d3088bf565afcad410ce4fd3ebf6c993f63703b6 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Mon, 24 Jun 2024 10:29:06 -0400 Subject: [PATCH 1/2] Make transitive relations an oracle option This patch makes processing of transitive relations configurable at dom_oracle creation. * tree-vrp.cc (execute_fast_vrp): Do not use transitive relations. * value-query.cc (range_query::create_relation_oracle): Add parameter to enable transitive relations. * value-query.h (range_query::create_relation_oracle): Likewise. * value-relation.h (dom_oracle::dom_oracle): Likewise. * value-relation.cc (dom_oracle::dom_oracle): Likewise. (dom_oracle::register_transitives): Check transitive flag. --- gcc/tree-vrp.cc | 3 ++- gcc/value-query.cc| 7 --- gcc/value-query.h | 2 +- gcc/value-relation.cc | 6 +- gcc/value-relation.h | 3 ++- 5 files changed, 14 insertions(+), 7 deletions(-) diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc index 4fc33e63e7d..26979b706e5 100644 --- a/gcc/tree-vrp.cc +++ b/gcc/tree-vrp.cc @@ -1258,7 +1258,8 @@ execute_fast_vrp (struct function *fun, bool final_p) gcc_checking_assert (!fun->x_range_query); fun->x_range_query = &dr; - get_range_query (fun)->create_relation_oracle (); + // Create a relation oracle without transitives. + get_range_query (fun)->create_relation_oracle (false); folder.substitute_and_fold (); if (folder.m_unreachable) diff --git a/gcc/value-query.cc b/gcc/value-query.cc index 0a280be580b..cac2cb5b2bc 100644 --- a/gcc/value-query.cc +++ b/gcc/value-query.cc @@ -223,17 +223,18 @@ range_query::destroy_infer_oracle () } // Create dominance based range oracle for the current query if dom info is
+// available. DO_TRANS_P indicates whether transitive relations should +// be created. This can cost more in compile time. void -range_query::create_relation_oracle () +range_query::create_relation_oracle (bool do_trans_p) { gcc_checking_assert (this != &global_ranges); gcc_checking_assert (m_relation == &default_relation_oracle); if (!dom_info_available_p (CDI_DOMINATORS)) return; - m_relation = new dom_oracle (); + m_relation = new dom_oracle (do_trans_p); gcc_checking_assert (m_relation); } diff --git a/gcc/value-query.h b/gcc/value-query.h index 2572a03095d..78840fd7a78 100644 --- a/gcc/value-query.h +++ b/gcc/value-query.h @@ -76,7 +76,7 @@ public: virtual bool range_on_exit (vrange &r, basic_block bb, tree expr); inline class relation_oracle &relation () const { return *m_relation; } - void create_relation_oracle (); + void create_relation_oracle (bool do_trans_p = true); void destroy_relation_oracle (); inline class infer_range_oracle &infer_oracle () const { return *m_infer; } diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc index d7bc1b72558..9293d9ed65b 100644 --- a/gcc/value-relation.cc +++ b/gcc/value-relation.cc @@ -978,8 +978,9 @@ relation_chain_head::find_relation (const_bitmap b1, const_bitmap b2) const // Instantiate a relation oracle. -dom_oracle::dom_oracle () +dom_oracle::dom_oracle (bool do_trans_p) { + m_do_trans_p = do_trans_p; m_relations.create (0); m_relations.safe_grow_cleared (last_basic_block_for_fn (cfun) + 1); m_relation_set = BITMAP_ALLOC (&m_bitmaps); @@ -1179,6 +1180,9 @@ void dom_oracle::register_transitives (basic_block root_bb, const value_relation &relation) { + // Only register transitives if they are requested. + if (!m_do_trans_p) +return; basic_block bb; // Only apply transitives to certain kinds of operations. 
switch (relation.kind ()) diff --git a/gcc/value-relation.h b/gcc/value-relation.h index cf009e6aa19..f168fd9ed41 100644 --- a/gcc/value-relation.h +++ b/gcc/value-relation.h @@ -216,7 +216,7 @@ public: class dom_oracle : public equiv_oracle { public: - dom_oracle (); + dom_oracle (bool do_trans_p = true); ~dom_oracle (); void record (basic_block bb, relation_kind k, tree op1, tree op2) @@ -229,6 +229,7 @@ public: void dump (FILE *f, basic_block bb) const final override; void dump (FILE *f) const final override; private: + bool m_do_trans_p; bitmap m_tmp, m_tmp2; bitmap m_relation_set; // Index by ssa-name. True if a relation exists vec m_relations; // Index by BB, list of relations. -- 2.45.0
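The shape of the change above — an oracle that only derives transitive facts when asked for them at construction time — can be sketched outside of GCC with a toy recorder. This is an illustrative stand-in, not GCC code: `toy_oracle`, `record_lt`, and `query_lt` are hypothetical names mirroring the `m_do_trans_p` flag and `register_transitives` gating in the patch.

```cpp
#include <set>
#include <utility>
#include <vector>

// Toy stand-in for dom_oracle: derive transitive "<" facts only when
// do_trans_p is set at construction, mirroring the m_do_trans_p flag.
class toy_oracle
{
public:
  explicit toy_oracle (bool do_trans_p = true) : m_do_trans_p (do_trans_p) {}

  // Record "a < b"; optionally register transitive consequences,
  // analogous to dom_oracle::register_transitives.
  void record_lt (int a, int b)
  {
    m_lt.insert ({a, b});
    if (!m_do_trans_p)
      return;  // Fast mode: skip the (potentially expensive) closure walk.
    std::vector<std::pair<int, int>> derived;
    for (const auto &rel : m_lt)
      {
	// x < a and a < b  =>  x < b
	if (rel.second == a)
	  derived.push_back ({rel.first, b});
	// a < b and b < y  =>  a < y
	if (rel.first == b)
	  derived.push_back ({a, rel.second});
      }
    for (const auto &d : derived)
      m_lt.insert (d);
  }

  bool query_lt (int a, int b) const { return m_lt.count ({a, b}) != 0; }

private:
  bool m_do_trans_p;
  std::set<std::pair<int, int>> m_lt;
};
```

With the flag enabled, recording 1 < 2 and then 2 < 3 lets a query for 1 < 3 succeed; with it disabled, only directly recorded relations are found — the extra lookup work at record time is exactly the cost fast VRP opts out of.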
Re: [PATCH] Add param for bb limit to invoke fast_vrp.
On 6/22/24 09:15, Richard Biener wrote: On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod wrote: This patch adds --param=vrp-block-limit=N When the basic block counter for a function exceeded 'N' , VRP is invoked with the new fast_vrp algorithm instead. This algorithm uses a lot less memory and processing power, although it does get a few less things. Primary motivation is cases like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3 VRP passes consume about 600 seconds of the compile time, and a lot of memory. With fast_vrp, it spends less than 10 seconds total in the 3 passes of VRP. This test case has about 400,000 basic blocks. The default for N in this patch is 150,000, arbitrarily chosen. This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0 as well) on x86_64-pc-linux-gnu, with no regressions. What do you think, OK for trunk? + if (last_basic_block_for_fn (fun) > param_vrp_block_limit || + &data == &pass_data_fast_vrp) || goes to the next line. Btw, we have -Wdisabled-optimization for these cases which should say sth like "function has excess of %d number of basic blocks (--param vrp-block-limit=%d), using fast VRP algorithm" or so in this case. As I wrote in the PR the priority should be -O1 compile-time performance and memory use. Yeah, I just wanted to use it as a model for "bad" cases for ranger. Adjusted patch attached which now issues the warning. I also found that the transitive relations were causing a small blowup in time for relation processing now that I turned relations on for fast VRP. I commited a patch and fast_vrp no longer does transitives. If you want to experiment with enabling fast VRP at -O1, it should be fast all the time now. I think :-) This testcase runs in about 95 seconds on my test machine. if I turn on VRP, a single VRP pass takes about 2.5 seconds. Its all set up, you can just add: NEXT_PASS (pass_fast_vrp); at an appropriate spot. Richard. Andrew PS sorry,. 
it doesn't help the threader in that PR :-( It's probably one of the known bottlenecks there - IIRC the path range queries make O(n^2) work. I can look at this at some point as I've dealt with the slowness of this pass in the past. There is param_max_jump_thread_paths that should limit searching but there is IIRC no way to limit the work path ranger actually does when resolving the query. Yeah, Id like to talk to Aldy about revamping the threader now that some of the newer facilities are available that fast_vrp uses. We can calculate all the outgoing ranges for a block at once with : // Fill ssa-cache R with any outgoing ranges on edge E, using QUERY. bool gori_on_edge (class ssa_cache &r, edge e, range_query *query = NULL); This is what the fast_vrp routines uses. We can gather all range restrictions generated from an edge efficiently just once and then intersect them with a known range as we walk the different paths. We don't need the gori exports , nor any of the other on-demand bits where we calculate each export range dynamically.. I suspect it would reduce the workload and memory impact quite a bit, but I'm not really familiar with exactly how the threader uses those things. It'd require some minor tweaking to the lazy_ssa_cache to make the bitmap of names set accessible. This would provide similar functionality to what the gori export () routine provides. Both relations and inferred ranges should only need to be calculated once per block as well and could/should/would be applied the same way if they are present. I don't *think* the threader uses any of the def chains, but Aldy can chip in. Andrew From 15f697aad90c35e42a4416d7db6e7289c0f5aae3 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Mon, 17 Jun 2024 11:38:46 -0400 Subject: [PATCH 2/2] Add param for bb limit to invoke fast_vrp. If the basic block count is too high, simply use fast_vrp for all VRP passes. gcc/doc/ * invoke.texi (vrp-block-limit): Document. 
gcc/ * params.opt (-param=vrp-block-limit): New. * tree-vrp.cc (fvrp_folder::execute): Invoke fast_vrp if block count exceeds limit. --- gcc/doc/invoke.texi | 3 +++ gcc/params.opt | 4 gcc/tree-vrp.cc | 16 ++-- 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index c790e2f3518..80da5e9d306 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -16840,6 +16840,9 @@ this parameter. The default value of this parameter is 50. @item vect-induction-float Enable loop vectorization of floating point inductions. +@item vrp-block-limit +Maximum number of basic blocks before VRP switches to a lower memory algorithm. + @item vrp-sparse-threshold Maximum number of basic blocks before VRP uses a sparse bitmap cache. diff --git a/gcc/params.opt b/gcc/params.opt index d34ef545bf0..c17ba17b91b 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -1198,6 +1198,10
RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
Thanks Tamar for comments. It indeed benefits the vectorized code, for example in RISC-V, we may eliminate some vsetvel insn in loop for widen here. > iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int) > b_12(D)); > is cheaper than > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); I am not sure if it has any correctness problem for this transform, take uint16_t to uint8_t as example. uint16_t a, b; uint8_t result = (uint8_t)(a >= b ? a - b : 0); Given a = 0x100; // 256 b = 0xff; // 255 For iftmp.0_5 = .SAT_SUB ((char unsigned) a, (char unsigned) b) = .SAT_SUB (0, 255) = 0 For iftmp.0_5 = (char unsigned).SAT_SUB (a, b) = (char unsigned).SAT_SUB (256, 255) = 1 Please help to correct me if any misunderstanding, thanks again for enlightening. Pan -Original Message- From: Tamar Christina Sent: Tuesday, June 25, 2024 4:00 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; jeffreya...@gmail.com; pins...@gmail.com Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip Hi, > -Original Message- > From: pan2...@intel.com > Sent: Monday, June 24, 2024 2:55 PM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > jeffreya...@gmail.com; pins...@gmail.com; Pan Li > Subject: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > From: Pan Li > > The zip benchmark of coremark-pro have one SAT_SUB like pattern but > truncated as below: > > void test (uint16_t *x, unsigned b, unsigned n) > { > unsigned a = 0; > register uint16_t *p = x; > > do { > a = *--p; > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB > } while (--n); > } > > It will have gimple before vect pass, it cannot hit any pattern of > SAT_SUB and then cannot vectorize to SAT_SUB. > > _2 = a_11 - b_12(D); > iftmp.0_13 = (short unsigned int) _2; > _18 = a_11 >= b_12(D); > iftmp.0_5 = _18 ? 
iftmp.0_13 : 0; > > This patch would like to improve the pattern match to recog above > as truncate after .SAT_SUB pattern. Then we will have the pattern > similar to below, as well as eliminate the first 3 dead stmt. > > _2 = a_11 - b_12(D); > iftmp.0_13 = (short unsigned int) _2; > _18 = a_11 >= b_12(D); > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); > I guess this is because one branch of the cond is a constant so the convert is folded in. I was wondering though, can't we just push in the truncate in this case? i.e. in this case we know both types are unsigned and the difference positive and max value is the max value of the truncate type. It seems like folding as a general rule _1 = *p_10; a_11 = (unsigned int) _1; _2 = a_11 - b_12(D); iftmp.0_13 = (short unsigned int) _2; _18 = a_11 >= b_12(D); iftmp.0_5 = _18 ? iftmp.0_13 : 0; *p_10 = iftmp.0_5; Into _1 = *p_10; a_11 = (unsigned int) _1; _2 = ((short unsigned int) a_11) - ((short unsigned int) b_12(D)); iftmp.0_13 = _2; _18 = a_11 >= b_12(D); iftmp.0_5 = _18 ? iftmp.0_13 : 0; *p_10 = iftmp.0_5; Is valid (though might have missed something). This would negate the need for this change to the vectorizer and saturation detection but also should generate better vector code. This is what we do in the general case https://godbolt.org/z/dfoj6fWdv I think here we're just not seeing through the cond. Typically lots of architectures have cheap truncation operations, so truncating before saturation means you do the cheap operation first rather than doing the complex operation on the wider type. That is, _2 = a_11 - b_12(D); iftmp.0_13 = (short unsigned int) _2; _18 = a_11 >= b_12(D); iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int) b_12(D)); is cheaper than _2 = a_11 - b_12(D); iftmp.0_13 = (short unsigned int) _2; _18 = a_11 >= b_12(D); iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); after vectorization. 
Normally the vectorizer will try to do this through over-widening detection as well, but we haven't taught ranger about the ranges of these new IFNs (probably should at some point). Cheers, Tamar > The below tests are passed for this patch. > 1. The rv64gcv fully regression tests. > 2. The rv64gcv build with glibc. > 3. The x86 bootstrap tests. > 4. The x86 fully regression tests. > > gcc/ChangeLog: > > * match.pd: Add convert description for minus and capture. > * tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add > new logic to handle in_type is incompatibile with out_type, as > well as rename from. > (vect_recog_build_binary_gimple_stmt): Rename to. > (vect_recog_sat_add_pattern): Leverage above renamed func. > (vect_recog_sat_sub_pattern): Ditto. > > Signed-off-by: Pan Li > --- > gcc/match.pd | 4 +-- > gcc/tree-vect-patterns.cc | 51 --- > 2 f
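Pan's truncation-order counterexample from this thread is easy to check directly. The standalone sketch below (plain C++, not vectorizer code; the function names are illustrative) evaluates both operand orders for a = 0x100, b = 0xff:

```cpp
#include <cstdint>

// Saturating subtract computed in the narrow (8-bit) type, i.e. the
// operands are truncated *before* the .SAT_SUB.
static uint8_t
sat_sub_trunc_first (uint16_t a, uint16_t b)
{
  uint8_t na = (uint8_t) a;  // 0x100 -> 0
  uint8_t nb = (uint8_t) b;  // 0xff  -> 255
  return na >= nb ? na - nb : 0;
}

// Saturating subtract in the wide (16-bit) type, truncated *after*.
static uint8_t
sat_sub_trunc_last (uint16_t a, uint16_t b)
{
  return (uint8_t) (a >= b ? a - b : 0);
}
```

For a = 0x100, b = 0xff the first form yields 0 (the saturation clamp fires on the wrapped operand) while the second yields 1, confirming that pushing the truncate through `.SAT_SUB` changes the result whenever truncation wraps an operand.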
Re: [PATCH] Add param for bb limit to invoke fast_vrp.
On Mon, Jun 24, 2024 at 7:20 PM Andrew MacLeod wrote: > > > On 6/22/24 09:15, Richard Biener wrote: > > On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod wrote: > >> This patch adds > >> > >> --param=vrp-block-limit=N > >> > >> When the basic block counter for a function exceeded 'N' , VRP is > >> invoked with the new fast_vrp algorithm instead. This algorithm uses a > >> lot less memory and processing power, although it does get a few less > >> things. > >> > >> Primary motivation is cases like > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3 VRP > >> passes consume about 600 seconds of the compile time, and a lot of > >> memory. With fast_vrp, it spends less than 10 seconds total in the > >> 3 passes of VRP. This test case has about 400,000 basic blocks. > >> > >> The default for N in this patch is 150,000, arbitrarily chosen. > >> > >> This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0 > >> as well) on x86_64-pc-linux-gnu, with no regressions. > >> > >> What do you think, OK for trunk? > > + if (last_basic_block_for_fn (fun) > param_vrp_block_limit || > > + &data == &pass_data_fast_vrp) > > > > || goes to the next line. > > > > Btw, we have -Wdisabled-optimization for these cases which should > > say sth like "function has excess of %d number of basic blocks > > (--param vrp-block-limit=%d), using fast VRP algorithm" > > or so in this case. > > > > As I wrote in the PR the priority should be -O1 compile-time > > performance and memory use. > > > Yeah, I just wanted to use it as a model for "bad" cases for ranger. > Adjusted patch attached which now issues the warning. > > I also found that the transitive relations were causing a small blowup > in time for relation processing now that I turned relations on for fast > VRP. I commited a patch and fast_vrp no longer does transitives. > > If you want to experiment with enabling fast VRP at -O1, it should be > fast all the time now. 
I think :-)This testcase runs in about 95 > seconds on my test machine. if I turn on VRP, a single VRP pass takes > about 2.5 seconds.Its all set up, you can just add: > > NEXT_PASS (pass_fast_vrp); > > at an appropriate spot. > > > Richard. > > > >> Andrew > >> > >> PS sorry,. it doesn't help the threader in that PR :-( > > It's probably one of the known bottlenecks there - IIRC the path range > > queries make O(n^2) work. I can look at this at some point as I've > > dealt with the slowness of this pass in the past. > > > > There is param_max_jump_thread_paths that should limit searching > > but there is IIRC no way to limit the work path ranger actually does > > when resolving the query. > > > Yeah, Id like to talk to Aldy about revamping the threader now that some > of the newer facilities are available that fast_vrp uses. > > We can calculate all the outgoing ranges for a block at once with : > >// Fill ssa-cache R with any outgoing ranges on edge E, using QUERY. >bool gori_on_edge (class ssa_cache &r, edge e, range_query *query = > NULL); > > This is what the fast_vrp routines uses. We can gather all range > restrictions generated from an edge efficiently just once and then > intersect them with a known range as we walk the different paths. We > don't need the gori exports , nor any of the other on-demand bits where > we calculate each export range dynamically.. I suspect it would reduce > the workload and memory impact quite a bit, but I'm not really familiar > with exactly how the threader uses those things. > > It'd require some minor tweaking to the lazy_ssa_cache to make the > bitmap of names set accessible. This would provide similar > functionality to what the gori export () routine provides. Both > relations and inferred ranges should only need to be calculated once per > block as well and could/should/would be applied the same way if they are > present. I don't *think* the threader uses any of the def chains, but > Aldy can chip in. 
+  warning (OPT_Wdisabled_optimization,
+	   "Using fast VRP algorithm. %d basic blocks"
+	   " exceeds %s%d limit",
+	   n_basic_blocks_for_fn (fun),
+	   "--param=vrp-block-limit=",
+	   param_vrp_block_limit);

This should be:

warning (OPT_Wdisabled_optimization,
	 "Using fast VRP algorithm. %d basic blocks"
	 " exceeds %<--param=vrp-block-limit=%d%> limit",
	 n_basic_blocks_for_fn (fun),
	 param_vrp_block_limit);

I had thought it was mentioned that options should be quoted but it is not mentioned in the coding conventions: https://gcc.gnu.org/codingconventions.html#Diagnostics But it is mentioned in https://inbox.sourceware.org/gcc/2d2bd844-2de4-ecff-7a07-b22350750...@gmail.com/ ; This is why you were getting an error as you mentioned on IRC. > > Andrew
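The dispatch this thread wires up — fall back to the cheaper algorithm once the CFG crosses the limit — can be outlined in a few lines. This is a hypothetical sketch, not the GCC pass manager: `choose_vrp_algo` and `fast_pass_requested` are invented names standing in for the `last_basic_block_for_fn (fun) > param_vrp_block_limit || &data == &pass_data_fast_vrp` test, and 150000 is the default cited earlier in the thread.

```cpp
enum class vrp_algo { full, fast };

// Pick the VRP algorithm: use fast VRP when the fast pass was requested
// explicitly, or when the function's basic-block count exceeds
// --param=vrp-block-limit (default 150000 in the patch).
static vrp_algo
choose_vrp_algo (int n_basic_blocks, int vrp_block_limit = 150000,
		 bool fast_pass_requested = false)
{
  if (fast_pass_requested || n_basic_blocks > vrp_block_limit)
    // This is also the point where the -Wdisabled-optimization warning
    // discussed above would be issued.
    return vrp_algo::fast;
  return vrp_algo::full;
}
```

On the PR 114855 testcase with roughly 400,000 blocks, this selection would take the fast path with the default limit, which matches the motivation given at the top of the thread.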
Re: [PATCH 05/11] Handle const and varible modifiers for CodeView
On 24/6/24 04:39, Jeff Law wrote: So presumably you're freeing these objects elsewhere? I see the free (custom_types), but I don' see where you free an subobjects. Did I miss something? I'll go ahead and commit, but please double check for memory leaks. Thanks Jeff. I just realized I wrote "varible" rather than "volatile" - ah well. See patch 4 - write_custom_types loops through the custom_types linked list, and removes and frees the head until it's empty. Mark
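The freeing pattern Mark describes — pop and free the head until the list is empty — looks like this in outline. This is a generic sketch under that description, not the actual CodeView writer; the real `custom_types` entries carry type payloads elided here.

```cpp
#include <cstdlib>

// Generic singly linked list node; payload fields omitted.
struct node
{
  struct node *next;
};

// Free the whole list by repeatedly detaching and freeing the head,
// the way write_custom_types is described as draining custom_types.
static void
free_all (struct node **head)
{
  while (*head)
    {
      struct node *n = *head;
      *head = n->next;  // detach the head before freeing it
      free (n);
    }
}
```

Draining from the head like this frees every node exactly once and leaves the list pointer null, which is why no separate per-subobject cleanup shows up in the later patch.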
Re: [PATCH 2/13 ver4] rs6000, Remove __builtin_vsx_xvcvspsxws, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.
Kewen: On 6/18/24 20:03, Kewen.Lin wrote: > Hi Carl, > > on 2024/6/14 03:40, Carl Love wrote: >> GCC maintainers: >> >> Per the comments on patch 0004 from version 3, the removal of >> The built-in __builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws was >> moved to this patch. The rest of the patch is unchanged from version 3. >> There were no comments on this patch for version 3. >> >> Please let me know if this patch is acceptable. Thanks. >> >> Carl >> >> >> - >> >> rs6000, Remove __builtin_vsx_xvcvspsxws, >> __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins. > > Nit: Maybe make it shorter like: Remove built-ins > __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns} > >> >> The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed > > Nit: Strictly speaking, not a duplicate of vec_signed but covered by it. > >> built-in that is documented in the PVIPR. The __builtin_vsx_xvcvspsxws >> built-in is not documented and there are no test cases for it. >> >> The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by >> vec_unsigned, remove. >> >> The __builtin_vsx_xvcvspuxws is redundant as it is covered by >> vec_unsigned, remove. > > As mentioned in the previous review, I'd expect patch 4/13 only focuses on > extending vec_{un,}signed{e,o} for vector float (aka. __builtin_vsx_xvcvspsxds > and __builtin_vsx_xvcvspuxds related), and this patch focuses on some built-in > removals which have been covered by the existing vec_{un,}signed{,e,o}, so > it can also drop the built-ins: > > "The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by > vec_signed{e,o}, remove. > > The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by > vec_unsigned{e,o}, remove." > > // copied from 4/13. Not sure why I didn't move these two with the other two??? Sorry. Moved them from patch 4. Carl
Re: Re: [PATCH] RISC-V: Support -m[no-]unaligned-access
Thanks for taking a look! Things have changed after I posted this patch and LLVM doesn't support this option now, so I think we don't need this patch any more. Please see this PR and its references: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/62. On 2024/6/25 2:17, Palmer Dabbelt wrote: > On Fri, 22 Dec 2023 01:23:13 PST (-0800), wangpengcheng...@bytedance.com > wrote: >> These two options are negative aliases of -m[no-]strict-align. >> >> This matches LLVM implementation. >> >> gcc/ChangeLog: >> >> * config/riscv/riscv.opt: Add option alias. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/riscv/predef-align-10.c: New test. >> * gcc.target/riscv/predef-align-7.c: New test. >> * gcc.target/riscv/predef-align-8.c: New test. >> * gcc.target/riscv/predef-align-9.c: New test. >> >> Signed-off-by: Wang Pengcheng > > Sorry for being slow here. With the scalar/vector alignment split we're > cleaning up a bunch of these LLVM/GCC differences, and we're waiting for > the LLVM folks to decide how these are going to behave. LLVM will > release well before GCC does, so we've got some time. > > So this isn't lost, just slow. 
> >> --- >> gcc/config/riscv/riscv.opt | 4 >> gcc/testsuite/gcc.target/riscv/predef-align-10.c | 16 >> gcc/testsuite/gcc.target/riscv/predef-align-7.c | 15 +++ >> gcc/testsuite/gcc.target/riscv/predef-align-8.c | 16 >> gcc/testsuite/gcc.target/riscv/predef-align-9.c | 15 +++ >> 5 files changed, 66 insertions(+) >> create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-10.c >> create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-7.c >> create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-8.c >> create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-9.c >> >> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt >> index cf207d4dcdf..1e22998ce6e 100644 >> --- a/gcc/config/riscv/riscv.opt >> +++ b/gcc/config/riscv/riscv.opt >> @@ -116,6 +116,10 @@ mstrict-align >> Target Mask(STRICT_ALIGN) Save >> Do not generate unaligned memory accesses. >> >> +munaligned-access >> +Target Alias(mstrict-align) NegativeAlias >> +Enable unaligned memory accesses. 
>> + >> Enum >> Name(code_model) Type(enum riscv_code_model) >> Known code models (for use with the -mcmodel= option): >> diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-10.c >> b/gcc/testsuite/gcc.target/riscv/predef-align-10.c >> new file mode 100644 >> index 000..c86b2c7a5ed >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/riscv/predef-align-10.c >> @@ -0,0 +1,16 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-mtune=rocket -munaligned-access" } */ >> + >> +int main() { >> + >> +/* rocket default is cpu tune param misaligned access slow */ >> +#if !defined(__riscv_misaligned_slow) >> +#error "__riscv_misaligned_slow is not set" >> +#endif >> + >> +#if defined(__riscv_misaligned_avoid) || >> defined(__riscv_misaligned_fast) >> +#error "__riscv_misaligned_avoid or __riscv_misaligned_fast is >> unexpectedly set" >> +#endif >> + >> + return 0; >> +} >> diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-7.c >> b/gcc/testsuite/gcc.target/riscv/predef-align-7.c >> new file mode 100644 >> index 000..405f3686c2e >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/riscv/predef-align-7.c >> @@ -0,0 +1,15 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-mtune=thead-c906 -mno-unaligned-access" } */ >> + >> +int main() { >> + >> +#if !defined(__riscv_misaligned_avoid) >> +#error "__riscv_misaligned_avoid is not set" >> +#endif >> + >> +#if defined(__riscv_misaligned_fast) || defined(__riscv_misaligned_slow) >> +#error "__riscv_misaligned_fast or __riscv_misaligned_slow is >> unexpectedly >> set" >> +#endif >> + >> + return 0; >> +} >> diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-8.c >> b/gcc/testsuite/gcc.target/riscv/predef-align-8.c >> new file mode 100644 >> index 000..64072c04a47 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/riscv/predef-align-8.c >> @@ -0,0 +1,16 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-mtune=thead-c906 -munaligned-access" } */ >> + >> +int main() { >> + >> +/* thead-c906 default is cpu tune param 
misaligned access fast */ >> +#if !defined(__riscv_misaligned_fast) >> +#error "__riscv_misaligned_fast is not set" >> +#endif >> + >> +#if defined(__riscv_misaligned_avoid) || >> defined(__riscv_misaligned_slow) >> +#error "__riscv_misaligned_avoid or __riscv_misaligned_slow is >> unexpectedly set" >> +#endif >> + >> + return 0; >> +} >> diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-9.c >> b/gcc/testsuite/gcc.target/riscv/predef-align-9.c >> new file mode 100644 >> index 000..f5418de87cf >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/riscv/predef-align-9.c >> @@ -0,0 +1,15 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-mtune=rocket -mno-unaligned-access" } */ >> + >> +int main() { >> + >> +#if !defined(__riscv_misaligned_avoi
[PATCH 1/3 v4] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.
Hi, This is the current version. I haven't made any major changes to the original code, I think it will have less impact on your code. And I think the current API is sufficient to support the mode selection you mentioned, if you have any concerns you can mention them. I can tweak it further. BRs, Lin gcc/ChangeLog: PR target/107432 * tree-vect-generic.cc (expand_vector_conversion): Support convert for int -> int, float -> float and int <-> float. * tree-vect-stmts.cc (vectorizable_conversion): Wrap the indirect convert part. (supportable_indirect_convert_operation): New function. * tree-vectorizer.h (supportable_indirect_convert_operation): Define the new function. gcc/testsuite/ChangeLog: PR target/107432 * gcc.target/i386/pr107432-1.c: New test. * gcc.target/i386/pr107432-2.c: Ditto. * gcc.target/i386/pr107432-3.c: Ditto. * gcc.target/i386/pr107432-4.c: Ditto. * gcc.target/i386/pr107432-5.c: Ditto. * gcc.target/i386/pr107432-6.c: Ditto. * gcc.target/i386/pr107432-7.c: Ditto. --- gcc/testsuite/gcc.target/i386/pr107432-1.c | 234 +++ gcc/testsuite/gcc.target/i386/pr107432-2.c | 105 + gcc/testsuite/gcc.target/i386/pr107432-3.c | 55 + gcc/testsuite/gcc.target/i386/pr107432-4.c | 56 + gcc/testsuite/gcc.target/i386/pr107432-5.c | 72 ++ gcc/testsuite/gcc.target/i386/pr107432-6.c | 139 +++ gcc/testsuite/gcc.target/i386/pr107432-7.c | 150 gcc/tree-vect-generic.cc | 34 ++- gcc/tree-vect-stmts.cc | 259 ++--- gcc/tree-vectorizer.h | 4 + 10 files changed, 1013 insertions(+), 95 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-5.c create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-6.c create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-7.c diff --git a/gcc/testsuite/gcc.target/i386/pr107432-1.c 
b/gcc/testsuite/gcc.target/i386/pr107432-1.c new file mode 100644 index 000..a4f37447eb4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr107432-1.c @@ -0,0 +1,234 @@ +/* { dg-do compile } */ +/* { dg-options "-march=x86-64 -mavx512bw -mavx512vl -O3" } */ +/* { dg-final { scan-assembler-times "vpmovqd" 6 } } */ +/* { dg-final { scan-assembler-times "vpmovqw" 6 } } */ +/* { dg-final { scan-assembler-times "vpmovqb" 6 } } */ +/* { dg-final { scan-assembler-times "vpmovdw" 6 { target { ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpmovdw" 8 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpmovdb" 6 { target { ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpmovdb" 8 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpmovwb" 8 } } */ + +#include + +typedef short __v2hi __attribute__ ((__vector_size__ (4))); +typedef char __v2qi __attribute__ ((__vector_size__ (2))); +typedef char __v4qi __attribute__ ((__vector_size__ (4))); +typedef char __v8qi __attribute__ ((__vector_size__ (8))); + +typedef unsigned short __v2hu __attribute__ ((__vector_size__ (4))); +typedef unsigned short __v4hu __attribute__ ((__vector_size__ (8))); +typedef unsigned char __v2qu __attribute__ ((__vector_size__ (2))); +typedef unsigned char __v4qu __attribute__ ((__vector_size__ (4))); +typedef unsigned char __v8qu __attribute__ ((__vector_size__ (8))); +typedef unsigned int __v2su __attribute__ ((__vector_size__ (8))); + +__v2si mm_cvtepi64_epi32_builtin_convertvector(__m128i a) +{ + return __builtin_convertvector((__v2di)a, __v2si); +} + +__m128imm256_cvtepi64_epi32_builtin_convertvector(__m256i a) +{ + return (__m128i)__builtin_convertvector((__v4di)a, __v4si); +} + +__m256imm512_cvtepi64_epi32_builtin_convertvector(__m512i a) +{ + return (__m256i)__builtin_convertvector((__v8di)a, __v8si); +} + +__v2hi mm_cvtepi64_epi16_builtin_convertvector(__m128i a) +{ + return __builtin_convertvector((__v2di)a, __v2hi); +} + +__v4hi 
mm256_cvtepi64_epi16_builtin_convertvector(__m256i a) +{ + return __builtin_convertvector((__v4di)a, __v4hi); +} + +__m128imm512_cvtepi64_epi16_builtin_convertvector(__m512i a) +{ + return (__m128i)__builtin_convertvector((__v8di)a, __v8hi); +} + +__v2qi mm_cvtepi64_epi8_builtin_convertvector(__m128i a) +{ + return __builtin_convertvector((__v2di)a, __v2qi); +} + +__v4qi mm256_cvtepi64_epi8_builtin_convertvector(__m256i a) +{ + return __builtin_convertvector((__v4di)a, __v4qi); +} + +__v8qi mm512_cvtepi64_epi8_builtin_convertvector(__m512i a) +{ + return __builtin_convertvector((__v8di)a, __v8qi); +} + +__v2hi mm64_cvtepi32_epi16_builtin_convertvector(__v2si a) +{
RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> -Original Message- > From: Li, Pan2 > Sent: Tuesday, June 25, 2024 3:25 AM > To: Tamar Christina ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > jeffreya...@gmail.com; pins...@gmail.com > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > Thanks Tamar for comments. It indeed benefits the vectorized code, for > example in > RISC-V, we may eliminate some vsetvel insn in loop for widen here. > > > iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int) > > b_12(D)); > > is cheaper than > > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); > > I am not sure if it has any correctness problem for this transform, take > uint16_t to > uint8_t as example. > > uint16_t a, b; > uint8_t result = (uint8_t)(a >= b ? a - b : 0); > > Given a = 0x100; // 256 >b = 0xff; // 255 > For iftmp.0_5 = .SAT_SUB ((char unsigned) a, (char unsigned) b) = .SAT_SUB (0, > 255) = 0 > For iftmp.0_5 = (char unsigned).SAT_SUB (a, b) = (char unsigned).SAT_SUB (256, > 255) = 1 > > Please help to correct me if any misunderstanding, thanks again for > enlightening. Ah, no you're right, those would end up wrong for saturation. Arg.. Sorry should have thought it through more. Tamar. 
> > Pan > > -Original Message- > From: Tamar Christina > Sent: Tuesday, June 25, 2024 4:00 AM > To: Li, Pan2 ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > jeffreya...@gmail.com; pins...@gmail.com > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > Hi, > > > -Original Message- > > From: pan2...@intel.com > > Sent: Monday, June 24, 2024 2:55 PM > > To: gcc-patches@gcc.gnu.org > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > > jeffreya...@gmail.com; pins...@gmail.com; Pan Li > > Subject: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > > > From: Pan Li > > > > The zip benchmark of coremark-pro have one SAT_SUB like pattern but > > truncated as below: > > > > void test (uint16_t *x, unsigned b, unsigned n) > > { > > unsigned a = 0; > > register uint16_t *p = x; > > > > do { > > a = *--p; > > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB > > } while (--n); > > } > > > > It will have gimple before vect pass, it cannot hit any pattern of > > SAT_SUB and then cannot vectorize to SAT_SUB. > > > > _2 = a_11 - b_12(D); > > iftmp.0_13 = (short unsigned int) _2; > > _18 = a_11 >= b_12(D); > > iftmp.0_5 = _18 ? iftmp.0_13 : 0; > > > > This patch would like to improve the pattern match to recog above > > as truncate after .SAT_SUB pattern. Then we will have the pattern > > similar to below, as well as eliminate the first 3 dead stmt. > > > > _2 = a_11 - b_12(D); > > iftmp.0_13 = (short unsigned int) _2; > > _18 = a_11 >= b_12(D); > > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); > > > > I guess this is because one branch of the cond is a constant so the > convert is folded in. I was wondering though, can't we just push > in the truncate in this case? > > i.e. in this case we know both types are unsigned and the difference > positive and max value is the max value of the truncate type. 
> > It seems like folding as a general rule > > _1 = *p_10; > a_11 = (unsigned int) _1; > _2 = a_11 - b_12(D); > iftmp.0_13 = (short unsigned int) _2; > _18 = a_11 >= b_12(D); > iftmp.0_5 = _18 ? iftmp.0_13 : 0; > *p_10 = iftmp.0_5; > > Into > > _1 = *p_10; > a_11 = (unsigned int) _1; > _2 = ((short unsigned int) a_11) - ((short unsigned int) b_12(D)); > iftmp.0_13 = _2; > _18 = a_11 >= b_12(D); > iftmp.0_5 = _18 ? iftmp.0_13 : 0; > *p_10 = iftmp.0_5; > > Is valid (though might have missed something). This would negate the need for > this change to the vectorizer and saturation detection > but also should generate better vector code. This is what we do in the > general case > https://godbolt.org/z/dfoj6fWdv > I think here we're just not seeing through the cond. > > Typically lots of architectures have cheap truncation operations, so > truncating > before saturation means you do the cheap > operation first rather than doing the complex operation on the wider type. > > That is, > > _2 = a_11 - b_12(D); > iftmp.0_13 = (short unsigned int) _2; > _18 = a_11 >= b_12(D); > iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int) > b_12(D)); > > is cheaper than > > _2 = a_11 - b_12(D); > iftmp.0_13 = (short unsigned int) _2; > _18 = a_11 >= b_12(D); > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); > > after vectorization. Normally the vectorizer will try to do this through > over- > widening detection as well, > but we haven't taught ranger about the ranges of these new IFNs (probably > should at some point). > > Cheers, > Tamar > > > The below tests are passed for this patch.
Re: [PATCH] Add param for bb limit to invoke fast_vrp.
On Mon, Jun 24, 2024 at 7:35 PM Andrew Pinski wrote: > > On Mon, Jun 24, 2024 at 7:20 PM Andrew MacLeod wrote: > > > > > > On 6/22/24 09:15, Richard Biener wrote: > > > On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod > > > wrote: > > >> This patch adds > > >> > > >> --param=vrp-block-limit=N > > >> > > >> When the basic block counter for a function exceeded 'N' , VRP is > > >> invoked with the new fast_vrp algorithm instead. This algorithm uses a > > >> lot less memory and processing power, although it does get a few less > > >> things. > > >> > > >> Primary motivation is cases like > > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3 VRP > > >> passes consume about 600 seconds of the compile time, and a lot of > > >> memory. With fast_vrp, it spends less than 10 seconds total in the > > >> 3 passes of VRP. This test case has about 400,000 basic blocks. > > >> > > >> The default for N in this patch is 150,000, arbitrarily chosen. > > >> > > >> This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0 > > >> as well) on x86_64-pc-linux-gnu, with no regressions. > > >> > > >> What do you think, OK for trunk? > > > + if (last_basic_block_for_fn (fun) > param_vrp_block_limit || > > > + &data == &pass_data_fast_vrp) > > > > > > || goes to the next line. > > > > > > Btw, we have -Wdisabled-optimization for these cases which should > > > say sth like "function has excess of %d number of basic blocks > > > (--param vrp-block-limit=%d), using fast VRP algorithm" > > > or so in this case. > > > > > > As I wrote in the PR the priority should be -O1 compile-time > > > performance and memory use. > > > > > > Yeah, I just wanted to use it as a model for "bad" cases for ranger. > > Adjusted patch attached which now issues the warning. > > > > I also found that the transitive relations were causing a small blowup > > in time for relation processing now that I turned relations on for fast > > VRP. 
I committed a patch and fast_vrp no longer does transitives. > > > > If you want to experiment with enabling fast VRP at -O1, it should be > > fast all the time now. I think :-) This testcase runs in about 95 > > seconds on my test machine. If I turn on VRP, a single VRP pass takes > > about 2.5 seconds. It's all set up, you can just add: > > > > NEXT_PASS (pass_fast_vrp); > > > > at an appropriate spot. > > > > > Richard. > > > > > >> Andrew > > >> > > >> PS sorry, it doesn't help the threader in that PR :-( > > > It's probably one of the known bottlenecks there - IIRC the path range > > > queries make O(n^2) work. I can look at this at some point as I've > > > dealt with the slowness of this pass in the past. > > > > > > There is param_max_jump_thread_paths that should limit searching > > > but there is IIRC no way to limit the work path ranger actually does > > > when resolving the query. > > > > > Yeah, I'd like to talk to Aldy about revamping the threader now that some > > of the newer facilities are available that fast_vrp uses. > > > > We can calculate all the outgoing ranges for a block at once with: > > > >// Fill ssa-cache R with any outgoing ranges on edge E, using QUERY. > >bool gori_on_edge (class ssa_cache &r, edge e, range_query *query = > > NULL); > > > > This is what the fast_vrp routines use. We can gather all range > > restrictions generated from an edge efficiently just once and then > > intersect them with a known range as we walk the different paths. We > > don't need the gori exports, nor any of the other on-demand bits where > > we calculate each export range dynamically. I suspect it would reduce > > the workload and memory impact quite a bit, but I'm not really familiar > > with exactly how the threader uses those things. > > > > It'd require some minor tweaking to the lazy_ssa_cache to make the > > bitmap of names set accessible. This would provide similar > > functionality to what the gori export () routine provides. 
Both > > relations and inferred ranges should only need to be calculated once per > > block as well and could/should/would be applied the same way if they are > > present. I don't *think* the threader uses any of the def chains, but > > Aldy can chip in. > > + warning (OPT_Wdisabled_optimization, > +"Using fast VRP algorithm. %d basic blocks" > +" exceeds %s%d limit", > +n_basic_blocks_for_fn (fun), > +"--param=vrp-block-limit=", > +param_vrp_block_limit); > > This should be: > warning (OPT_Wdisabled_optimization, "Using fast VRP algorithm. %d basic > blocks" > " exceeds %<%--param=vrp-block-limit=d%> limit", > n_basic_blocks_for_fn (fun), param_vrp_block_limit); > > I had thought it was mentioned that options should be quoted but it is > not mentioned in the coding conventions: > https://gcc.gnu.org/codingconventions.html#Diagnostics > > But it is mentioned in > https://inbox.sourceware.org/gcc/2d2bd844-2de4-ecff-7a07-b22350750...@gmail.com/ > ; This is why you were get
[PUSHED] c-family: Add Warning property to Wnrvo option [PR115624]
This was missing when Wnrvo was added in r14-1594-g2ae5384d457b9c67586de012816dfc71a6943164 . Pushed to the trunk and GCC 14 branch as obvious after a bootstrap/test on x86_64-linux-gnu. gcc/c-family/ChangeLog: PR C++/115624 * c.opt (Wnrvo): Add Warning property. Signed-off-by: Andrew Pinski --- gcc/c-family/c.opt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index b067369fa7e..864ef4e3b3d 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1137,7 +1137,7 @@ C ObjC Var(warn_override_init_side_effects) Init(1) Warning Warn about overriding initializers with side effects. Wnrvo -C++ ObjC++ Var(warn_nrvo) +C++ ObjC++ Var(warn_nrvo) Warning Warn if the named return value optimization is not performed although it is allowed. Wpacked-bitfield-compat -- 2.43.0
[PATCH] c++: structured bindings and lookup of tuple_size/tuple_element [PR115605]
The problem here is even though we pass std namespace to lookup_template_class as the context, it will look at the current scope for the name too. The fix is to lookup the qualified name first and then use that for lookup_template_class. This is how std::initializer_list is handled in listify. Note g++.dg/cpp1z/decomp22.C testcase now fails correctly with an error, that tuple_size is not in the std namespace. I copied a fixed up testcase into g++.dg/cpp1z/decomp62.C. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR c++/115605 gcc/cp/ChangeLog: * decl.cc (get_tuple_size): Call lookup_qualified_name before calling lookup_template_class. (get_tuple_element_type): Likewise. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/decomp22.C: Expect an error * g++.dg/cpp1z/decomp61.C: New test. * g++.dg/cpp1z/decomp62.C: Copied from decomp22.C and wrap tuple_size/tuple_element inside std namespace. Signed-off-by: Andrew Pinski --- gcc/cp/decl.cc| 16 +--- gcc/testsuite/g++.dg/cpp1z/decomp22.C | 2 +- gcc/testsuite/g++.dg/cpp1z/decomp61.C | 53 +++ gcc/testsuite/g++.dg/cpp1z/decomp62.C | 23 4 files changed, 88 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp1z/decomp61.C create mode 100644 gcc/testsuite/g++.dg/cpp1z/decomp62.C diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index 03deb1493a4..81dde4d51a3 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -9195,10 +9195,13 @@ get_tuple_size (tree type) { tree args = make_tree_vec (1); TREE_VEC_ELT (args, 0) = type; - tree inst = lookup_template_class (tuple_size_identifier, args, + tree std_tuple_size = lookup_qualified_name (std_node, tuple_size_identifier); + if (std_tuple_size == error_mark_node) +return NULL_TREE; + tree inst = lookup_template_class (std_tuple_size, args, /*in_decl*/NULL_TREE, -/*context*/std_node, -tf_none); +/*context*/NULL_TREE, +tf_warning_or_error); inst = complete_type (inst); if (inst == error_mark_node || !COMPLETE_TYPE_P (inst) @@ -9224,9 +9227,12 @@ 
get_tuple_element_type (tree type, unsigned i) tree args = make_tree_vec (2); TREE_VEC_ELT (args, 0) = build_int_cst (integer_type_node, i); TREE_VEC_ELT (args, 1) = type; - tree inst = lookup_template_class (tuple_element_identifier, args, + tree std_tuple_elem = lookup_qualified_name (std_node, tuple_element_identifier); + if (std_tuple_elem == error_mark_node) +return NULL_TREE; + tree inst = lookup_template_class (std_tuple_elem, args, /*in_decl*/NULL_TREE, -/*context*/std_node, +/*context*/NULL_TREE, tf_warning_or_error); return make_typename_type (inst, type_identifier, none_type, tf_warning_or_error); diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp22.C b/gcc/testsuite/g++.dg/cpp1z/decomp22.C index 9e6b8df486a..4131486e292 100644 --- a/gcc/testsuite/g++.dg/cpp1z/decomp22.C +++ b/gcc/testsuite/g++.dg/cpp1z/decomp22.C @@ -17,5 +17,5 @@ int foo (C t) { auto[x0] = t;// { dg-warning "structured bindings only available with" "" { target c++14_down } } - return x0; + return x0; /* { dg-error "cannot convert" } */ } diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp61.C b/gcc/testsuite/g++.dg/cpp1z/decomp61.C new file mode 100644 index 000..874844b2c61 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/decomp61.C @@ -0,0 +1,53 @@ +// PR c++/115605 +// { dg-do compile { target c++17 } } +// { dg-options "" } + +using size_t = decltype(sizeof(0)); + +namespace std +{ + template + struct tuple_size; + template + struct tuple_element; +} + +struct mytuple +{ + int t; + template + int &get() + { +return t; + } +}; + +namespace std +{ + template<> + struct tuple_size + { +static constexpr int value = 3; + }; + template + struct tuple_element + { +using type = int; + }; +} + +/* The tuple_size/tuple_element lookup should only be from std and not + from the current scope so these 2 functions should work. 
*/ +int foo() { +int const tuple_size = 5; +mytuple array; +auto [a, b, c] = array; +return c; +} +int foo1() { +int const tuple_element = 5; +mytuple array; +auto [a, b, c] = array; +return c; +} + diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp62.C b/gcc/testsuite/g++.dg/cpp1z/decomp62.C new file mode 100644 index 000..694f3263bd8 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/decomp62.C @@ -0,0 +1,23 @@ +// PR c++/79205 +// { dg-do compile { target c++11 } } +// { dg-options "" } + +template struct B; +template struct B { int b; }; +template
Re: [PATCH] Hard register asm constraint
Ping. On Mon, Jun 10, 2024 at 07:19:19AM +0200, Stefan Schulze Frielinghaus wrote: > Ping. > > On Fri, May 24, 2024 at 11:13:12AM +0200, Stefan Schulze Frielinghaus wrote: > > This implements hard register constraints for inline asm. A hard register > > constraint is of the form {regname} where regname is any valid register. > > This > > basically renders register asm superfluous. For example, the snippet > > > > int test (int x, int y) > > { > > register int r4 asm ("r4") = x; > > register int r5 asm ("r5") = y; > > unsigned int copy = y; > > asm ("foo %0,%1,%2" : "+d" (r4) : "d" (r5), "d" (copy)); > > return r4; > > } > > > > could be rewritten into > > > > int test (int x, int y) > > { > > asm ("foo %0,%1,%2" : "+{r4}" (x) : "{r5}" (y), "d" (y)); > > return x; > > } > > > > As a side-effect this also solves the problem of call-clobbered registers. > > That being said, I was wondering whether we could utilize this feature in > > order > > to get rid of local register asm automatically? For example, converting > > > > // Result will be in r2 on s390 > > extern int bar (void); > > > > void test (void) > > { > > register int x asm ("r2") = 42; > > bar (); > > asm ("foo %0\n" :: "r" (x)); > > } > > > > into > > > > void test (void) > > { > > int x = 42; > > bar (); > > asm ("foo %0\n" :: "{r2}" (x)); > > } > > > > in order to get rid of the limitation of call-clobbered registers which may > > lead to subtle bugs---especially if you think of non-obvious calls e.g. > > introduced by sanitizer/tracer/whatever. Since such a transformation has > > the > > potential to break existing code do you see any edge cases where this might > > be > > problematic or even show stoppers? Currently, even > > > > int test (void) > > { > > register int x asm ("r2") = 42; > > register int y asm ("r2") = 24; > > asm ("foo %0,%1\n" :: "r" (x), "r" (y)); > > } > > > > is allowed which seems error prone to me. 
Thus, if 100% backwards > > compatibility would be required, then automatically converting every > > register > > asm to the new mechanism isn't viable. Still quite a lot could be > > transformed. > > Any thoughts? > > > > Currently I allow multiple alternatives as demonstrated by > > gcc/testsuite/gcc.target/s390/asm-hard-reg-2.c. However, since a hard > > register > > constraint is pretty specific I could also think of erroring out in case of > > alternatives. Are there any real use cases out there for multiple > > alternatives where one would like to use hard register constraints? > > > > With the current implementation we have a "user visible change" in the sense > > that for > > > > void test (void) > > { > > register int x asm ("r2") = 42; > > register int y asm ("r2") = 24; > > asm ("foo %0,%1\n" : "=r" (x), "=r" (y)); > > } > > > > we do not get the error > > > > "invalid hard register usage between output operands" > > > > anymore but rather > > > > "multiple outputs to hard register: %r2" > > > > This is due to the error handling in gimplify_asm_expr (). Speaking of > > errors, > > I also error out earlier as before which means that e.g. in pr87600-2.c only > > the first error is reported and processing is stopped afterwards which means > > the subsequent tests fail. > > > > I've been skimming through all targets and it looks to me as if none is > > using > > curly brackets for their constraints. Of course, I may have missed > > something. 
> > > > Cheers, > > Stefan > > > > PS: Current state for Clang: https://reviews.llvm.org/D105142 > > > > --- > > gcc/cfgexpand.cc | 42 --- > > gcc/genpreds.cc | 4 +- > > gcc/gimplify.cc | 115 +- > > gcc/lra-constraints.cc| 17 +++ > > gcc/recog.cc | 14 ++- > > gcc/stmt.cc | 102 +++- > > gcc/stmt.h| 10 +- > > .../gcc.target/s390/asm-hard-reg-1.c | 103 > > .../gcc.target/s390/asm-hard-reg-2.c | 29 + > > .../gcc.target/s390/asm-hard-reg-3.c | 24 > > gcc/testsuite/lib/scanasm.exp | 4 + > > 11 files changed, 407 insertions(+), 57 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-1.c > > create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-2.c > > create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-3.c > > > > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc > > index 557cb28733b..47f71a2e803 100644 > > --- a/gcc/cfgexpand.cc > > +++ b/gcc/cfgexpand.cc > > @@ -2955,44 +2955,6 @@ expand_asm_loc (tree string, int vol, location_t > > locus) > >emit_insn (body); > > } > > > > -/* Return the number of times character C occurs in string S. */ > > -static int > > -n_occurrences (int c, const char *s) > > -{ > > - in
[committed][RISC-V] Fix some of the testsuite fallout from late-combine patch
This fixes most, but not all, of the testsuite fallout from the late-combine patch. Specifically in the vector space we're often able to eliminate a broadcast of a scalar element across a vector. That eliminates the vsetvl related to the broadcast, but more importantly from the testsuite standpoint it turns .vv forms into .vf or .vx forms. There were two paths we could have taken here: one to accept .v*, ignoring the actual register operands, or to create new matches for the .vx and .vf variants. I selected the latter as I'd like us to know if the code to avoid the broadcast regresses. I'm pushing this through now so that we've got cleaner results and to prevent duplicate work. I've got a patch for the rest of the testsuite fallout, but I want to think about it a bit. Pushed to the trunk, Jeff

commit 41ff74aa581ed38d04c46e6c8839eab48e1b63de Author: Jeff Law Date: Mon Jun 24 23:22:21 2024 -0600 [committed][RISC-V] Fix some of the testsuite fallout from late-combine patch This fixes most, but not all, of the testsuite fallout from the late-combine patch. Specifically in the vector space we're often able to eliminate a broadcast of a scalar element across a vector. That eliminates the vsetvl related to the broadcast, but more importantly from the testsuite standpoint it turns .vv forms into .vf or .vx forms. There were two paths we could have taken here: one to accept .v*, ignoring the actual register operands, or to create new matches for the .vx and .vf variants. I selected the latter as I'd like us to know if the code to avoid the broadcast regresses. I'm pushing this through now so that we've got cleaner results and to prevent duplicate work. I've got a patch for the rest of the testsuite fallout, but I want to think about it a bit. gcc/testsuite * gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adjust expected test output after late-combine changes. * gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Likewise. 
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Likewise. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Likewise. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv-nofm.c: Likewise. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Likewise. * gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Likewise. * gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Likewise. * gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Likewise. * gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv-nofm.c: Likewise. * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Likewise. * gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv-nofm.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-1.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-2.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-3.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-4.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Likewise. 
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-1.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-2.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-3.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-4.c: Likewise. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnm
Re: [PATCH 07/11] Handle structs and classes for CodeView
On 6/17/24 6:17 PM, Mark Harmstone wrote: Translates DW_TAG_structure_type DIEs into LF_STRUCTURE symbols, and DW_TAG_class_type DIEs into LF_CLASS symbols. gcc/ * dwarf2codeview.cc (struct codeview_type): Add is_fwd_ref member. (struct codeview_subtype): Add lf_member to union. (struct codeview_custom_type): Add lf_structure to union. (struct codeview_deferred_type): New structure. (deferred_types, last_deferred_type): New variables. (get_type_num): Add new args to prototype. (write_lf_fieldlist): Handle LF_MEMBER subtypes. (write_lf_structure): New function. (write_custom_types): Call write_lf_structure. (get_type_num_pointer_type): Add in_struct argument. (get_type_num_const_type): Likewise. (get_type_num_volatile_type): Likewise. (add_enum_forward_def): Fix get_type_num call. (get_type_num_enumeration_type): Add in-struct argument. (add_deferred_type, flush_deferred_types): New functions. (add_struct_forward_def, get_type_num_struct): Likewise. (get_type_num): Handle self-referential structs. (add_variable): Fix get_type_num call. (codeview_debug_early_finish): Call flush_deferred_types. * dwarf2codeview.h (LF_CLASS, LF_STRUCTURE, LF_MEMBER): Define. Thanks. I've pushed this to the trunk. jeff
Re: [PATCH 05/11] Handle const and varible modifiers for CodeView
On 6/24/24 8:49 PM, Mark Harmstone wrote: On 24/6/24 04:39, Jeff Law wrote: So presumably you're freeing these objects elsewhere? I see the free (custom_types), but I don' see where you free an subobjects. Did I miss something? I'll go ahead and commit, but please double check for memory leaks. Thanks Jeff. I just realized I wrote "varible" rather than "volatile" - ah well. Trivially fixable with a follow-up patch. See patch 4 - write_custom_types loops through the custom_types linked list, and removes and frees the head until it's empty. Thanks. I suspected it was the walk down that list, but better to just ask the author to be sure :-) Thanks. jeff
RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> Ah, no you're right, those would end up wrong for saturation. Arg.. Sorry > should have > though it through more. Never mind, but you have enlightened me about a further optimization with some restrictions. I revisited the pattern, for example as below. uint16_t a, b; uint8_t result = (uint8_t)(a >= b ? a - b : 0); => result = (char unsigned).SAT_SUB (a, b) If a has a def like below uint8_t other = 0x1f; a = (uint8_t)other then we can safely convert result = (char unsigned).SAT_SUB (a, b) to result = .SAT_SUB ((char unsigned)a, (char unsigned)b) Then we may have better vectorized code if a is limited to char unsigned. Of course we can do that based on this patch. Pan -Original Message- From: Tamar Christina Sent: Tuesday, June 25, 2024 12:01 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; jeffreya...@gmail.com; pins...@gmail.com Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > -Original Message- > From: Li, Pan2 > Sent: Tuesday, June 25, 2024 3:25 AM > To: Tamar Christina ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > jeffreya...@gmail.com; pins...@gmail.com > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > Thanks Tamar for comments. It indeed benefits the vectorized code, for > example in > RISC-V, we may eliminate some vsetvel insn in loop for widen here. > > > iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int) > > b_12(D)); > > is cheaper than > > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); > > I am not sure if it has any correctness problem for this transform, take > uint16_t to > uint8_t as example. > > uint16_t a, b; > uint8_t result = (uint8_t)(a >= b ? 
a - b : 0); > > Given a = 0x100; // 256 >b = 0xff; // 255 > For iftmp.0_5 = .SAT_SUB ((char unsigned) a, (char unsigned) b) = .SAT_SUB (0, > 255) = 0 > For iftmp.0_5 = (char unsigned).SAT_SUB (a, b) = (char unsigned).SAT_SUB (256, > 255) = 1 > > Please help to correct me if any misunderstanding, thanks again for > enlightening. Ah, no you're right, those would end up wrong for saturation. Arg.. Sorry should have though it through more. Tamar. > > Pan > > -Original Message- > From: Tamar Christina > Sent: Tuesday, June 25, 2024 4:00 AM > To: Li, Pan2 ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > jeffreya...@gmail.com; pins...@gmail.com > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > Hi, > > > -Original Message- > > From: pan2...@intel.com > > Sent: Monday, June 24, 2024 2:55 PM > > To: gcc-patches@gcc.gnu.org > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > > jeffreya...@gmail.com; pins...@gmail.com; Pan Li > > Subject: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > > > From: Pan Li > > > > The zip benchmark of coremark-pro have one SAT_SUB like pattern but > > truncated as below: > > > > void test (uint16_t *x, unsigned b, unsigned n) > > { > > unsigned a = 0; > > register uint16_t *p = x; > > > > do { > > a = *--p; > > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB > > } while (--n); > > } > > > > It will have gimple before vect pass, it cannot hit any pattern of > > SAT_SUB and then cannot vectorize to SAT_SUB. > > > > _2 = a_11 - b_12(D); > > iftmp.0_13 = (short unsigned int) _2; > > _18 = a_11 >= b_12(D); > > iftmp.0_5 = _18 ? iftmp.0_13 : 0; > > > > This patch would like to improve the pattern match to recog above > > as truncate after .SAT_SUB pattern. Then we will have the pattern > > similar to below, as well as eliminate the first 3 dead stmt. 
> > > > _2 = a_11 - b_12(D); > > iftmp.0_13 = (short unsigned int) _2; > > _18 = a_11 >= b_12(D); > > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); > > > > I guess this is because one branch of the cond is a constant so the > convert is folded in. I was wondering though, can't we just push > in the truncate in this case? > > i.e. in this case we know both types are unsigned and the difference > positive and max value is the max value of the truncate type. > > It seems like folding as a general rule > > _1 = *p_10; > a_11 = (unsigned int) _1; > _2 = a_11 - b_12(D); > iftmp.0_13 = (short unsigned int) _2; > _18 = a_11 >= b_12(D); > iftmp.0_5 = _18 ? iftmp.0_13 : 0; > *p_10 = iftmp.0_5; > > Into > > _1 = *p_10; > a_11 = (unsigned int) _1; > _2 = ((short unsigned int) a_11) - ((short unsigned int) b_12(D)); > iftmp.0_13 = _2; > _18 = a_11 >= b_12(D); > iftmp.0_5 = _18 ? iftmp.0_13 : 0; > *p_10 = iftmp.0_5; > > Is valid (though might have missed something). This would negate the need for > this change to the vectorizer and saturation detection > but also should generate better ve
RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> -Original Message- > From: Li, Pan2 > Sent: Tuesday, June 25, 2024 7:06 AM > To: Tamar Christina ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > jeffreya...@gmail.com; pins...@gmail.com > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > > Ah, no you're right, those would end up wrong for saturation. Arg.. Sorry > > should > have > > though it through more. > > Never mind, but you enlighten me for even more optimize with some > restrictions. I > revisited the pattern, for example as below. > > uint16_t a, b; > uint8_t result = (uint8_t)(a >= b ? a - b : 0); > > => result = (char unsigned).SAT_SUB (a, b) > > If a has a def like below > uint8_t other = 0x1f; > a = (uint8_t)other You can in principle do this by querying range information, e.g. gimple_ranger ranger; int_range_max r; if (ranger.range_of_expr (r, oprnd0, stmt) && !r.undefined_p ()) { ... We do this for instance in vect_recog_divmod_pattern. Tamar > > then we can safely convert result = (char unsigned).SAT_SUB (a, b) to > result = .SAT_SUB ((char unsigned)a, (char unsigned).b) > > Then we may have better vectorized code if a is limited to char unsigned. Of > course > we can do that based on this patch. > > Pan > > -Original Message- > From: Tamar Christina > Sent: Tuesday, June 25, 2024 12:01 PM > To: Li, Pan2 ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > jeffreya...@gmail.com; pins...@gmail.com > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > > -Original Message- > > From: Li, Pan2 > > Sent: Tuesday, June 25, 2024 3:25 AM > > To: Tamar Christina ; gcc-patches@gcc.gnu.org > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > > jeffreya...@gmail.com; pins...@gmail.com > > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > > > Thanks Tamar for comments. 
It indeed benefits the vectorized code, for > > example > in > > RISC-V, we may eliminate some vsetvel insn in loop for widen here. > > > > > iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int) > b_12(D)); > > > is cheaper than > > > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); > > > > I am not sure if it has any correctness problem for this transform, take > > uint16_t > to > > uint8_t as example. > > > > uint16_t a, b; > > uint8_t result = (uint8_t)(a >= b ? a - b : 0); > > > > Given a = 0x100; // 256 > >b = 0xff; // 255 > > For iftmp.0_5 = .SAT_SUB ((char unsigned) a, (char unsigned) b) = .SAT_SUB > > (0, > > 255) = 0 > > For iftmp.0_5 = (char unsigned).SAT_SUB (a, b) = (char unsigned).SAT_SUB > > (256, > > 255) = 1 > > > > Please help to correct me if any misunderstanding, thanks again for > > enlightening. > > Ah, no you're right, those would end up wrong for saturation. Arg.. Sorry > should > have > though it through more. > > Tamar. > > > > Pan > > > > -Original Message- > > From: Tamar Christina > > Sent: Tuesday, June 25, 2024 4:00 AM > > To: Li, Pan2 ; gcc-patches@gcc.gnu.org > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > > jeffreya...@gmail.com; pins...@gmail.com > > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > > > Hi, > > > > > -Original Message- > > > From: pan2...@intel.com > > > Sent: Monday, June 24, 2024 2:55 PM > > > To: gcc-patches@gcc.gnu.org > > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > richard.guent...@gmail.com; > > > jeffreya...@gmail.com; pins...@gmail.com; Pan Li > > > Subject: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip > > > > > > From: Pan Li > > > > > > The zip benchmark of coremark-pro have one SAT_SUB like pattern but > > > truncated as below: > > > > > > void test (uint16_t *x, unsigned b, unsigned n) > > > { > > > unsigned a = 0; > > > register uint16_t *p = x; > > > > > > do { > > > a = *--p; > > > *p 
= (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB > > > } while (--n); > > > } > > > > > > It will have gimple before vect pass, it cannot hit any pattern of > > > SAT_SUB and then cannot vectorize to SAT_SUB. > > > > > > _2 = a_11 - b_12(D); > > > iftmp.0_13 = (short unsigned int) _2; > > > _18 = a_11 >= b_12(D); > > > iftmp.0_5 = _18 ? iftmp.0_13 : 0; > > > > > > This patch would like to improve the pattern match to recog above > > > as truncate after .SAT_SUB pattern. Then we will have the pattern > > > similar to below, as well as eliminate the first 3 dead stmt. > > > > > > _2 = a_11 - b_12(D); > > > iftmp.0_13 = (short unsigned int) _2; > > > _18 = a_11 >= b_12(D); > > > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D)); > > > > > > > I guess this is because one branch of the cond is a constant so the > > convert is folded in.
[PING][PATCH v2] rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]
Ping. [Message-ID: <1e003d78-3b2e-4263-830a-7c00a3e9d...@linux.ibm.com>]

Peter

On 6/18/24 5:59 PM, Peter Bergner wrote:
> Updated patch.  This passed bootstrap and regtesting on powerpc64le-linux
> with no regressions.  Ok for trunk?
>
> Changes from v1:
>   1. Moved the disabling of shrink-wrapping to rs6000_emit_prologue
>      and beefed up comment.  Used a more accurate test.
>   2. Added comment to the test case on why rop_ok is needed.
>
> Peter
>
>
> rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]
>
> Only disable shrink-wrapping when using -mrop-protect when we know we
> will be emitting the ROP-protect hash instructions (ie, non-leaf functions).
>
> 2024-06-17  Peter Bergner
>
> gcc/
> 	PR target/114759
> 	* config/rs6000/rs6000.cc (rs6000_override_options_after_change): Move
> 	the disabling of shrink-wrapping from here...
> 	* config/rs6000/rs6000-logue.cc (rs6000_emit_prologue): ...to here.
>
> gcc/testsuite/
> 	PR target/114759
> 	* gcc.target/powerpc/pr114759-1.c: New test.
> ---
>  gcc/config/rs6000/rs6000-logue.cc             |  5 +
>  gcc/config/rs6000/rs6000.cc                   |  4
>  gcc/testsuite/gcc.target/powerpc/pr114759-1.c | 16
>  3 files changed, 21 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114759-1.c
>
> diff --git a/gcc/config/rs6000/rs6000-logue.cc b/gcc/config/rs6000/rs6000-logue.cc
> index 193e2122c0f..c384e48e378 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -3018,6 +3018,11 @@ rs6000_emit_prologue (void)
> 	 && (lookup_attribute ("no_split_stack",
> 			       DECL_ATTRIBUTES (cfun->decl))
> 	     == NULL));
> +  /* If we are inserting ROP-protect hash instructions, disable shrink-wrap
> +     until the bug where the hashst insn is emitted in the wrong location
> +     is fixed.  See PR101324 for details.  */
> +  if (info->rop_hash_size)
> +    flag_shrink_wrap = 0;
>
>    frame_pointer_needed_indeed
>      = frame_pointer_needed && df_regs_ever_live_p (HARD_FRAME_POINTER_REGNUM);
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index e4dc629ddcc..fd6e013c346 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -3427,10 +3427,6 @@ rs6000_override_options_after_change (void)
> 	}
>    else if (!OPTION_SET_P (flag_cunroll_grow_size))
>      flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
> -
> -  /* If we are inserting ROP-protect instructions, disable shrink wrap.  */
> -  if (rs6000_rop_protect)
> -    flag_shrink_wrap = 0;
>  }
>
>  #ifdef TARGET_USES_LINUX64_OPT
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr114759-1.c b/gcc/testsuite/gcc.target/powerpc/pr114759-1.c
> new file mode 100644
> index 000..579e08e920f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr114759-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -mrop-protect -fdump-rtl-pro_and_epilogue" } */
> +/* { dg-require-effective-target rop_ok } Only enable on supported ABIs. */
> +
> +/* Verify we still attempt shrink-wrapping when using -mrop-protect
> +   and there are no function calls.  */
> +
> +long
> +foo (long arg)
> +{
> +  if (arg)
> +    asm ("" ::: "r20");
> +  return 0;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */
RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
Got it, thanks Tamar, will have a try.

Pan

-Original Message-
From: Tamar Christina
Sent: Tuesday, June 25, 2024 2:11 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; jeffreya...@gmail.com; pins...@gmail.com
Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

> -Original Message-
> From: Li, Pan2
> Sent: Tuesday, June 25, 2024 7:06 AM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; jeffreya...@gmail.com; pins...@gmail.com
> Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
>
> > Ah, no you're right, those would end up wrong for saturation.  Arg.. Sorry
> > should have thought it through more.
>
> Never mind, but you enlighten me for even more optimization with some
> restrictions.  I revisited the pattern, for example as below.
>
> uint16_t a, b;
> uint8_t result = (uint8_t)(a >= b ? a - b : 0);
>
> => result = (char unsigned).SAT_SUB (a, b)
>
> If a has a def like below
> uint8_t other = 0x1f;
> a = (uint8_t)other

You can in principle do this by querying range information, e.g.

gimple_ranger ranger;
int_range_max r;
if (ranger.range_of_expr (r, oprnd0, stmt) && !r.undefined_p ())
  {
    ...

We do this for instance in vect_recog_divmod_pattern.

Tamar

> then we can safely convert result = (char unsigned).SAT_SUB (a, b) to
> result = .SAT_SUB ((char unsigned) a, (char unsigned) b)
>
> Then we may have better vectorized code if a is limited to char unsigned.  Of
> course we can do that based on this patch.
>
> Pan
>
> -Original Message-
> From: Tamar Christina
> Sent: Tuesday, June 25, 2024 12:01 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; jeffreya...@gmail.com; pins...@gmail.com
> Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
>
> > -Original Message-
> > From: Li, Pan2
> > Sent: Tuesday, June 25, 2024 3:25 AM
> > To: Tamar Christina ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; jeffreya...@gmail.com; pins...@gmail.com
> > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> >
> > Thanks Tamar for comments.  It indeed benefits the vectorized code, for
> > example in RISC-V, we may eliminate some vsetvl insn in the loop for
> > widening here.
> >
> > > iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int) b_12(D));
> > > is cheaper than
> > > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));
> >
> > I am not sure if it has any correctness problem for this transform, take
> > uint16_t to uint8_t as example.
> >
> > uint16_t a, b;
> > uint8_t result = (uint8_t)(a >= b ? a - b : 0);
> >
> > Given a = 0x100; // 256
> >       b = 0xff;  // 255
> > For iftmp.0_5 = .SAT_SUB ((char unsigned) a, (char unsigned) b) = .SAT_SUB (0, 255) = 0
> > For iftmp.0_5 = (char unsigned).SAT_SUB (a, b) = (char unsigned).SAT_SUB (256, 255) = 1
> >
> > Please help to correct me if any misunderstanding, thanks again for
> > enlightening.
>
> Ah, no you're right, those would end up wrong for saturation.  Arg.. Sorry
> should have thought it through more.
>
> Tamar.
>
> > Pan
> >
> > -Original Message-
> > From: Tamar Christina
> > Sent: Tuesday, June 25, 2024 4:00 AM
> > To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; jeffreya...@gmail.com; pins...@gmail.com
> > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> >
> > Hi,
> >
> > > -Original Message-
> > > From: pan2...@intel.com
> > > Sent: Monday, June 24, 2024 2:55 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; jeffreya...@gmail.com; pins...@gmail.com; Pan Li
> > > Subject: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> > >
> > > From: Pan Li
> > >
> > > The zip benchmark of coremark-pro has one SAT_SUB-like pattern but
> > > truncated as below:
> > >
> > > void test (uint16_t *x, unsigned b, unsigned n)
> > > {
> > >   unsigned a = 0;
> > >   register uint16_t *p = x;
> > >
> > >   do {
> > >     a = *--p;
> > >     *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
> > >   } while (--n);
> > > }
> > >
> > > The gimple before the vect pass is as below; it cannot hit any pattern of
> > > SAT_SUB and then cannot vectorize to SAT_SUB.
> > >
> > > _2 = a_11 - b_12(D);
> > > iftmp.0_13 = (short unsigned int) _2;
> > > _18 = a_11 >= b_12(D);
> > > iftmp.0_5 = _18 ? iftmp.0_13 : 0;
> > >
> > > This patch would like to improve the pattern match to recognize the above
> > > as truncate after .SAT_SUB pattern.  Then we w
RE: [PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass
Hi,

Gently ping for this.  This version has removed the target hook and added a
new optab for cfcmov.

Thanks,
Lingling

From: Kong, Lingling
Sent: Tuesday, June 18, 2024 3:41 PM
To: gcc-patches@gcc.gnu.org
Cc: Alexander Monakov ; Uros Bizjak ; lingling.ko...@gmail.com; Hongtao Liu ; Jeff Law ; Richard Biener
Subject: [PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

The APX CFCMOV feature implements conditional faulting, which means that all
memory faults are suppressed when the condition code evaluates to false for
a load or store of a memory operand.  This lets us emit a conditional move
whose memory operand may trap or fault.  In the middle end we currently do
not support a conditional move if we know that a load from A or B could trap
or fault.  To enable CFCMOV, we added a new optab.

A conditional (fault-suppressing) mem store does not move any arithmetic
calculations.  A conditional mem load currently supports only the case of
one possibly-trapping mem operand and one non-trapping, non-mem operand.

gcc/ChangeLog:

	* ifcvt.cc (noce_try_cmove_load_mem_notrap): Allow convert to cfcmov
	for conditional load.
	(noce_try_cmove_store_mem_notrap): Convert to conditional store.
	(noce_process_if_block): Ditto.
	* optabs.def (OPTAB_D): New optab.
---
 gcc/ifcvt.cc   | 246 -
 gcc/optabs.def |   1 +
 2 files changed, 246 insertions(+), 1 deletion(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..65c069b8cc6 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -783,6 +783,8 @@ static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
 			    rtx, rtx, rtx, rtx = NULL, rtx = NULL);
 static bool noce_try_cmove (struct noce_if_info *);
 static bool noce_try_cmove_arith (struct noce_if_info *);
+static bool noce_try_cmove_load_mem_notrap (struct noce_if_info *);
+static bool noce_try_cmove_store_mem_notrap (struct noce_if_info *, rtx *, rtx);
 static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
 static bool noce_try_minmax (struct noce_if_info *);
 static bool noce_try_abs (struct noce_if_info *);
@@ -2401,6 +2403,233 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   return false;
 }
 
+/* When target support suppress memory fault, try more complex cases involving
+   conditional_move's source or dest may trap or fault.  */
+
+static bool
+noce_try_cmove_load_mem_notrap (struct noce_if_info *if_info)
+{
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+  rtx x = if_info->x;
+
+  if (MEM_P (x))
+    return false;
+  /* Just handle a conditional move from one trap MEM + other non_trap,
+     non mem cases.  */
+  if (!(MEM_P (a) ^ MEM_P (b)))
+    return false;
+  bool a_trap = may_trap_or_fault_p (a);
+  bool b_trap = may_trap_or_fault_p (b);
+
+  if (!(a_trap ^ b_trap))
+    return false;
+  if (a_trap && !MEM_P (a))
+    return false;
+  if (b_trap && !MEM_P (b))
+    return false;
+
+  rtx orig_b;
+  rtx_insn *insn_a, *insn_b;
+  bool a_simple = if_info->then_simple;
+  bool b_simple = if_info->else_simple;
+  basic_block then_bb = if_info->then_bb;
+  basic_block else_bb = if_info->else_bb;
+  rtx target;
+  enum rtx_code code;
+  rtx cond = if_info->cond;
+  rtx_insn *ifcvt_seq;
+
+  /* if (test) x = *a; else x = c - d;
+     => x = c - d;
+	if (test)
+	  x = *a;
+  */
+
+  code = GET_CODE (cond);
+  insn_a = if_info->insn_a;
+  insn_b = if_info->insn_b;
+  machine_mode x_mode = GET_MODE (x);
+
+  /* Because we only handle one trap MEM + other non_trap, non mem cases,
+     just move one trap MEM always in then_bb.  */
+  if (noce_reversed_cond_code (if_info) != UNKNOWN)
+    {
+      bool reversep = false;
+      if (b_trap)
+	reversep = true;
+
+      if (reversep)
+	{
+	  if (if_info->rev_cond)
+	    {
+	      cond = if_info->rev_cond;
+	      code = GET_CODE (cond);
+	    }
+	  else
+	    code = reversed_comparison_code (cond, if_info->jump);
+	  std::swap (a, b);
+	  std::swap (insn_a, insn_b);
+	  std::swap (a_simple, b_simple);
+	  std::swap (then_bb, else_bb);
+	}
+    }
+
+  if (then_bb && else_bb
+      && (!bbs_ok_for_cmove_arith (then_bb, else_bb, if_info->orig_x)
+	  || !bbs_ok_for_cmove_arith (else_bb, then_bb, if_info->orig_x)))
+    return false;
+
+  start_sequence ();
+
+  /* If one of the blocks is empty then the corresponding B or A value
+     came from the test block.  The non-empty complex block that we will
+     emit might clobber the register used by B or A, so move it to a pseudo
+     first.  */
+
+  rtx tmp_b = NULL_RTX;
+
+  /* Don't move trap m