[PATCH] middle-end/32667 - document cpymem and memcpy exact overlap requirement

2023-11-23 Thread Richard Biener
The following amends the cpymem documentation to mention that exact
overlap needs to be handled gracefully, also noting that the target
runtime is expected to behave the same way.

OK?

Thanks,
Richard.

PR middle-end/32667
* md.texi (cpymem): Document that exact overlap of source
and destination needs to work.  Mention the target runtime
may not treat this case as undefined.
---
 gcc/doc/md.texi | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index df6725ffc9c..8743b393b3c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6915,8 +6915,11 @@ individually copied data units in the block.
 
 The @code{cpymem@var{m}} patterns need not give special consideration
 to the possibility that the source and destination strings might
-overlap. These patterns are used to do inline expansion of
-@code{__builtin_memcpy}.
+overlap.  An exeption is the case where source and destination are
+equal.  These patterns are used to do inline expansion of
+@code{__builtin_memcpy}.  The target runtime is expected to handle
+the case of an exact overlap of source and destination gracefully
+as GCC does not consider that undefined behavior.
 
 @cindex @code{movmem@var{m}} instruction pattern
 @item @samp{movmem@var{m}}
-- 
2.35.3
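
To make the requirement concrete, here is a minimal sketch (not part of the patch) of user code that can end up as a block copy whose source and destination are the same object. The `Record` type and all function names are made up for illustration; whether GCC expands the assignment inline through a `cpymem` pattern or emits a call to `memcpy` depends on the target and the size.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical aggregate: 64 bytes with no padding, assuming a 4-byte int.
struct Record {
  int id;
  char payload[60];
};

// Plain aggregate assignment. The language allows dst == src here, so
// whatever block copy the compiler emits must cope with exact overlap.
void copy_record(Record *dst, const Record *src) { *dst = *src; }

// Returns true if copying an object onto itself leaves it unchanged.
bool self_copy_is_noop(Record r) {
  const Record before = r;
  copy_record(&r, &r);  // source and destination are exactly equal
  return std::memcmp(&before, &r, sizeof r) == 0;
}

void example_exact_overlap() {
  Record r{};
  r.id = 32667;
  std::strcpy(r.payload, "exact overlap");
  assert(self_copy_is_noop(r));
}
```

Calling `example_exact_overlap()` exercises the self-copy; a partial overlap (for example `dst` pointing into the middle of `*src`) is still not something the pattern has to handle.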


Re: [PATCH] middle-end/32667 - document cpymem and memcpy exact overlap requirement

2023-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2023 at 08:00:49AM +, Richard Biener wrote:
> The following amends the cpymem documentation to mention that exact
> overlap needs to be handled gracefully, also noting that the target
> runtime is expected to behave the same way.
> 
> OK?
> 
> Thanks,
> Richard.
> 
>   PR middle-end/32667
>   * md.texi (cpymem): Document that exact overlap of source
>   and destination needs to work.  Mention the target runtime
>   may not treat this case as undefined.

The first added sentence is ok, for the second see the spot
Florian mentioned in the PR.

Jakub



Re: [pushed][PATCH] LoongArch: Increase cost of vector aligned store/load.

2023-11-23 Thread Jiahao Xu



On 2023/11/19 at 2:25 AM, Xi Ruoyao wrote:

On Fri, 2023-11-17 at 10:21 +0800, chenglulu wrote:

Pushed to r14-5545.

On 2023/11/16 at 4:44 PM, Jiahao Xu wrote:

Based on SPEC2017 performance evaluation results, it's better to make them equal
to the cost of unaligned store/load so as to avoid odd alignment peeling.

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Adjust.

/* snip */


+  case vector_load:
+  case vector_store:
     case unaligned_load:
     case unaligned_store:
    return 2;

This seems to penalize vectorization, and it causes:

FAIL: gcc.target/loongarch/vector/lasx/lasx-xvstelm.c  -mlasx  scan-assembler-times xvstelm.w 8

Maybe we can make unaligned_load and unaligned_store cost 1 too instead
of increasing vector_load and vector_store?

If we make unaligned_load and unaligned_store cost 1 as well, it
results in a 1.5% overall performance decrease in SPEC CPU 2017
fprate on 3A6000, so I don't think that is a good idea. The
lasx-xvstelm.c file tests whether the vec_extract operation and fst can
be combined into xvstelm. I think it would be better to add a compilation
option to the test, such as "-fno-vect-cost-model", so that it is
unaffected by the cost model.
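
As a rough illustration of the change under discussion, here is a standalone sketch of a cost hook in which aligned and unaligned vector accesses cost the same. The enum and function names are hypothetical stand-ins, not the real GCC declarations, and the default cost of 1 for other statement kinds is an assumption.

```cpp
// Hypothetical stand-in for the statement kinds a vectorizer cost hook sees.
enum class StmtKind {
  scalar_stmt,
  vector_load,
  vector_store,
  unaligned_load,
  unaligned_store
};

// Sketch of the adjusted costs: aligned vector accesses are priced like
// unaligned ones, so peeling a loop for alignment buys nothing.
constexpr int sketch_vectorization_cost(StmtKind kind) {
  switch (kind) {
    case StmtKind::vector_load:
    case StmtKind::vector_store:
    case StmtKind::unaligned_load:
    case StmtKind::unaligned_store:
      return 2;
    default:
      return 1;  // assumed cost for everything else
  }
}
```

With equal costs the alignment-peeling heuristic has no incentive to fire, which is the effect the patch was after; the side effect is that every vector access now looks more expensive relative to scalar code.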




RE: [PATCH v2] gcov: Fix integer types in gen_counter_update()

2023-11-23 Thread Jiang, Haochen
> -Original Message-
> From: Sebastian Huber 
> Sent: Wednesday, November 22, 2023 10:24 PM
> To: Christophe Lyon 
> Cc: Jakub Jelinek ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH v2] gcov: Fix integer types in gen_counter_update()
> 
> On 22.11.23 15:22, Christophe Lyon wrote:
> > On Tue, 21 Nov 2023 at 12:22, Sebastian Huber
> >   wrote:
> >> On 21.11.23 11:46, Jakub Jelinek wrote:
> >>> On Tue, Nov 21, 2023 at 11:42:06AM +0100, Sebastian Huber wrote:
>  On 21.11.23 11:34, Jakub Jelinek wrote:
> >> --- a/gcc/tree-profile.cc
> >> +++ b/gcc/tree-profile.cc
> >> @@ -281,10 +281,13 @@ gen_assign_counter_update
> (gimple_stmt_iterator *gsi, gcall *call, tree func,
> >>   if (result)
> >> {
> >>   tree result_type = TREE_TYPE (TREE_TYPE (func));
> >> -  tree tmp = make_temp_ssa_name (result_type, NULL, name);
> >> -  gimple_set_lhs (call, tmp);
> >> +  tree tmp1 = make_temp_ssa_name (result_type, NULL, name);
> >> +  gimple_set_lhs (call, tmp1);
> >>   gsi_insert_after (gsi, call, GSI_NEW_STMT);
> >> -  gassign *assign = gimple_build_assign (result, tmp);
> >> +  tree tmp2 = make_ssa_name (TREE_TYPE (result));
> >> +  gassign *assign = gimple_build_assign (tmp2, NOP_EXPR, tmp1);
> >> +  gsi_insert_after (gsi, assign, GSI_NEW_STMT);
> >> +  assign = gimple_build_assign (result, gimple_assign_lhs 
> >> (assign));
> > When you use a temporary tmp2 for the lhs of the conversion, you can
> just
> > use it here,
> >  assign = gimple_build_assign (result, tmp2);
> >
> > Ok for trunk with that change.
>  Just a question, could I also use
> 
>  tree tmp2 = make_temp_ssa_name (TREE_TYPE (result), NULL, name);
> 
>  ?
> 
>  This make_temp_ssa_name() is used throughout the file and the new
>  make_ssa_name() would be the first use in this file.
> >>> Yes.  The only difference is that it won't be _234 = (type) something;
> >>> but PROF_time_profile_234 = (type) something; in the dumps, but sure,
> >>> consistency is useful.
> >> Thanks for your help. I checked in an updated version.
> >>
> > Our CI bisected a regression to this commit:
> > Running gcc:gcc.dg/tree-prof/tree-prof.exp ...
> > FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile
> > "Read tp_first_run: 0" 1
> > FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile
> > "Read tp_first_run: 2" 1
> >
> > (on aarch64)
> >
> > Can you check?
> 
> Yes, I will have a look at it.

The same issue also happened on i386. You can also reproduce that on
x86-64 platforms.

Thx,
Haochen

> 
> --
> embedded brains GmbH
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
> 
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-23 Thread chenglulu



On 2023/11/23 at 3:31 PM, chenglulu wrote:


On 2023/11/23 at 3:11 PM, Xi Ruoyao wrote:

On Thu, 2023-11-23 at 14:35 +0800, chenglulu wrote:

Hi,

   I don't quite understand this part. Is it because define_insn would be
duplicated with the above implementation, so define_insn_and_split is used?

Yes, but if you think duplicating the above implementation is better I
can dup it as well (as it's just a single line).

(I wrote it as a define_expand but it didn't work, then I modified it to
define_insn_and_split).


I just thought it was weird when I was looking at the code.

I modified this code to use define_expand:

    (define_expand "fix_trunc2"
      [(set (match_operand: 0 "register_operand" "=f")
        (fix: (match_operand:FVEC 1 "register_operand" "f")))]
      ""
      {
        emit_insn (gen__vftintrz__ (operands[0], operands[1]));
        DONE;
      }
      [(set_attr "type" "simd_fcvt")
       (set_attr "mode" "")])

Here are my test cases:

    typedef float __attribute__ ((mode (SF))) float_t;
    typedef int __attribute__ ((mode (SI))) int_t;

    extern int_t v[4];
    int_t
    lt_fixdfsi (float_t *x)
    {

      for (int i=0;i<4;i++)
        v[i] = x[i];
    }

This still achieves the desired effect, generating the following 
assembly code:


lt_fixdfsi:
.LFB0 = .
    .cfi_startproc

    or    $r13,$r4,$r0     # 16    [c=4 l=4]  *movdi_64bit/0
    la.global    $r12,v     # 8    [c=4 l=12]  *movdi_64bit/1
    vld    $vr0,$r13,0     # 6    [c=12 l=4]  movv4sf_lsx/1
    vftintrz.w.s    $vr0,$vr0     # 7    [c=12 l=4] lsx_vftintrz_w_s
    vst    $vr0,$r12,0     # 9    [c=4 l=4]  movv4si_lsx/2

So I don't know if I'm getting it right?:-(


The fix_truncv4sfv4si2 template is indeed called when debugging with gdb.

So I think we can use define_expand here.
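
For reference, a scalar sketch of what the loop in the test case computes. As I understand it, the `fix_trunc` pattern implements the usual C float-to-integer conversion, which truncates toward zero, and the `rz` in `vftintrz` stands for that rounding direction. The function names below are made up for illustration.

```cpp
#include <cstddef>

// Float-to-int conversion in C and C++ discards the fractional part,
// i.e. it rounds toward zero (for values representable in int).
constexpr int trunc_to_int(float x) { return static_cast<int>(x); }

// Scalar equivalent of the four-element loop in the test case.
void convert4(const float *x, int *out) {
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = trunc_to_int(x[i]);
}
```

The vectorized code should produce the same four integers as `convert4`, including for negative inputs, where truncation and rounding down differ.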


+(define_insn_and_split "fix_trunc2"
+  [(set (match_operand: 0 "register_operand" "=f")
+    (fix: (match_operand:FVEC 1 "register_operand" "f")))]
+  ""
+  "#"
+  ""
+  [(const_int 0)]
+  {
+    emit_insn (gen__vftintrz__ (
+      operands[0], operands[1]));
+    DONE;
+  }
+  [(set_attr "type" "simd_fcvt")
+   (set_attr "mode" "")])




Re: [PATCH] middle-end/32667 - document cpymem and memcpy exact overlap requirement

2023-11-23 Thread Richard Biener
On Thu, 23 Nov 2023, Jakub Jelinek wrote:

> On Thu, Nov 23, 2023 at 08:00:49AM +, Richard Biener wrote:
> > The following amends the cpymem documentation to mention that exact
> > overlap needs to be handled gracefully, also noting that the target
> > runtime is expected to behave the same way.
> > 
> > OK?
> > 
> > Thanks,
> > Richard.
> > 
> > PR middle-end/32667
> > * md.texi (cpymem): Document that exact overlap of source
> > and destination needs to work.  Mention the target runtime
> > may not treat this case as undefined.
> 
> The first added sentence is ok, for the second see the spot
> Florian mentioned in the PR.

Like this?

Thanks,
Richard.

>From 93f4d22374ad2ea8bb5821083d2422c8b0a3313b Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 23 Nov 2023 08:54:56 +0100
Subject: [PATCH] middle-end/32667 - document cpymem and memcpy exact overlap
 requirement
To: gcc-patches@gcc.gnu.org

The following amends the cpymem documentation to mention that exact
overlap needs to be handled gracefully, also noting that the target
runtime is expected to behave the same way where -ffreestanding
docs mention the set of routines required.

PR middle-end/32667
* md.texi (cpymem): Document that exact overlap of source
and destination needs to work.
* standards.texi (ffreestanding): Mention memcpy is required
to handle the exact overlap case.
---
 gcc/doc/md.texi| 5 +++--
 gcc/doc/standards.texi | 4 +++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index df6725ffc9c..87e1c9ed20e 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6915,8 +6915,9 @@ individually copied data units in the block.
 
 The @code{cpymem@var{m}} patterns need not give special consideration
 to the possibility that the source and destination strings might
-overlap. These patterns are used to do inline expansion of
-@code{__builtin_memcpy}.
+overlap.  An exeption is the case where source and destination are
+equal, this case needs to be handled correctly.
+These patterns are used to do inline expansion of @code{__builtin_memcpy}.
 
 @cindex @code{movmem@var{m}} instruction pattern
 @item @samp{movmem@var{m}}
diff --git a/gcc/doc/standards.texi b/gcc/doc/standards.texi
index 4eb43f14f90..6eebb9426f3 100644
--- a/gcc/doc/standards.texi
+++ b/gcc/doc/standards.texi
@@ -184,7 +184,9 @@ GNU C library).  @xref{Standard Libraries,,Standard 
Libraries}.
 Most of the compiler support routines used by GCC are present in
 @file{libgcc}, but there are a few exceptions.  GCC requires the
 freestanding environment provide @code{memcpy}, @code{memmove},
-@code{memset} and @code{memcmp}.
+@code{memset} and @code{memcmp}.  Contrary to the standards
+covering @code{memcpy} GCC expects the case of an exact overlap
+of source and destination to work and not invoke undefined behavior.
 Finally, if @code{__builtin_trap} is used, and the target does
 not implement the @code{trap} pattern, then GCC emits a call
 to @code{abort}.
-- 
2.35.3



Re: libstdc++: Speed up push_back

2023-11-23 Thread Matthias Kretz
On Sunday, 19 November 2023 22:53:37 CET Jan Hubicka wrote:
> Sadly it is really hard to work out this
> from IPA passes, since we basically care whether the iterator points to
> the same place as the end pointer, which are both passed by reference.
> This is inter-procedural value numbering that is quite out of reach.

I've done a fair share of branching on __builtin_constant_p in 
std::experimental::simd to improve code-gen. It's powerful! But maybe we 
also need the other side of the story to tell the optimizer: "I know you 
can't const-prop everything; but this variable / expression, even if you 
need to put in a lot of effort, the performance difference will be worth 
it."

For std::vector, the remaining capacity could be such a value. The 
functions f() and g() are equivalent (their code-gen isn't: 
https://compiler-explorer.com/z/r44ejK1qz):

#include <vector>

auto
f()
{
  std::vector<int> x;
  x.reserve(10);
  for (int i = 0; i < 10; ++i)
    x.push_back(0);
  return x;
}

auto
g()
{ return std::vector(10, 0); }
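
A quick way to confirm that the two functions really are interchangeable at the source level is to compare their results. This sketch assumes the element type is `int` and renames the functions so both can live in one file; it says nothing about the generated code, which is the actual point of the comparison above.

```cpp
#include <cassert>
#include <vector>

// Reserve once, then append ten zeros one at a time.
std::vector<int> f_push_back() {
  std::vector<int> x;
  x.reserve(10);
  for (int i = 0; i < 10; ++i)
    x.push_back(0);
  return x;
}

// Construct ten zeros directly.
std::vector<int> g_fill() { return std::vector<int>(10, 0); }

void example_equivalence() {
  assert(f_push_back() == g_fill());
}
```

Since the capacity is known to be sufficient after `reserve(10)`, none of the ten `push_back` calls needs the reallocation path, which is the knowledge the optimizer currently fails to propagate.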

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
 std::simd
──


Re: [PATCH] tree: Fix up try_catch_may_fallthru [PR112619]

2023-11-23 Thread Richard Biener
On Wed, 22 Nov 2023, Jakub Jelinek wrote:

> On Wed, Nov 22, 2023 at 01:21:12PM +0100, Jakub Jelinek wrote:
> > So, pedantically perhaps just assuming TRY_CATCH_EXPR where second argument
> > is not STATEMENT_LIST to be the CATCH_EXPR/EH_FILTER_EXPR case could work
> > for C++, but there are other FEs and it would be fragile (and weird, given
> > that STATEMENT_LIST with single stmt in it vs. that stmt ought to be
> > generally interchangeable).
> 
> Looking at other FE, e.g. go/go-gcc.cc clearly has:
> stat_tree = build2_loc(location.gcc_location(), TRY_CATCH_EXPR,
>void_type_node, stat_tree,
>build2_loc(location.gcc_location(), CATCH_EXPR,
>   void_type_node, NULL, except_tree));
> so CATCH_EXPR is immediately the second operand of TRY_CATCH_EXPR.
> d/toir.cc has:
> /* Back-end expects all catches in a TRY_CATCH_EXPR to be enclosed in a
>statement list, however pop_stmt_list may optimize away the list
>if there is only a single catch to push.  */
> if (TREE_CODE (catches) != STATEMENT_LIST)
>   {
> tree stmt_list = alloc_stmt_list ();
> append_to_statement_list_force (catches, &stmt_list);
> catches = stmt_list;
>   }
> 
> add_stmt (build2 (TRY_CATCH_EXPR, void_type_node, trybody, catches));
> so I assume it run into the try_catch_may_fallthru issue (because gimplifier
> clearly doesn't require that).
> rust/rust-gcc.cc copies go-gcc.cc and also creates CATCH_EXPR directly in
> TRY_CATCH_EXPR's operand.
> 
> Note, the only time one runs into the ICE is when the first operand (i.e.
> try body) doesn't fall thru, otherwise the function returns true early.

OK, I think this suggests we should be more forgiving here, meaning
your patch is OK.  Unless Jason has any additional comments today.

Thanks,
Richard.


Re: Re: [PATCH] gimple-vr-values:Add constraint for gimple-cond optimization

2023-11-23 Thread Feng Wang
On 2023-11-23 14:34 Andrew Pinski wrote:
>
>On Wed, Nov 22, 2023 at 10:07 PM Feng Wang  wrote:
>>
>> This patch add another condition for gimple-cond optimization. Refer to
>> the following test case.
>> int foo1 (int data, int res)
>> {
>>   res = data & 0xf;
>>   res |= res << 4;
>>   if (res < 0x22)
>> return 0x22;
>>   return res;
>> }
>> with the compilation flag "-march=rv64gc_zba_zbb -mabi=lp64d -O2",
>> before this patch the compilation result is
>> foo1:
>> andi    a0,a0,15
>> slliw   a5,a0,4
>> addw    a3,a5,a0
>> li  a4,33
>> add a0,a5,a0
>> bleu    a3,a4,.L5
>> ret
>> .L5:
>> li  a0,34
>> ret
>> after this patch the compilation result is
>> foo1:
>> andi    a0,a0,15
>> slliw   a5,a0,4
>> add a5,a5,a0
>> li  a0,34
>> max a0,a5,a0
>> ret
>> The reason is in the pass_early_vrp, the arg0 of gimple_cond
>> is replaced,but the PHI node still use the arg0.
>> The some of evrp pass logs are as follows
>>  gimple_assign 
>>   gimple_assign 
>>   gimple_cond 
>> goto ; [INV]
>>   else
>> goto ; [INV]
>>
>>    :
>>   // predicted unlikely by early return (on trees) predictor.
>>
>>    :
>>   # gimple_phi <_2, 34(3), res_5(2)>
>> The arg0 of gimple_cond is replaced by _9,but the gimple_phi still
>> uses res_5,it will cause optimization fail of PHI node to MAX_EXPR.
>> So the next_use_is_phi is added to control the replacement.
>
>I don't think this is the correct appoarch here.
>We end up with the same original issue if we had wrote it like:
>```
>int foo1 (int data, int res)
>{
>  res = data & 0xf;
>  unsigned int r = res;
>  r*=17;
>  res = r;
>  if (r < 0x22)
>    return 0x22;
>  return res;
>}
>```
>I suspect instead we should extend the match.pd patterns to match this max.
>We should be able to extend:
>```
>(for cmp (lt le gt ge eq ne)
> (simplify
>  (cond (cmp (convert1? @1) INTEGER_CST@3) (convert2? @1) INTEGER_CST@2)
>  (with
>```
>To match instead by changing the second @1 with @4 and then using
>bitwise_equal_p . If @1 != @4 but bitwise_equal_p is true, you need to
>make sure the outer convert1/convert2 are nop conversions so that you
>get the same extension I think ...
>
>Note you could instead improve minmax_replacement but I have been in
>the process of moving those changes to match.pd.
>
>Thanks,
>Andrew Pinski

Thanks for your feedback. The minmax replacement happens in the phiopt pass,
where one condition requires that "arg_false" (from the PHI node) be the same
as "smaller" (from the gimple_cond). That is why I made this change. But as
you said, this modification is not very suitable, and I have not considered
it comprehensively. I'm not very familiar with match.pd; can it solve this
matching problem?
Thanks,
Feng Wang
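
For readers following along, the transformation being asked for is simply recognising that the branchy test case is a maximum. A small sketch (function names are made up) that checks the two forms agree; only the low four bits of `data` matter, so sixteen inputs cover every case:

```cpp
#include <algorithm>

// The original test case, with the unused second parameter dropped.
constexpr int foo1_branchy(int data) {
  int res = data & 0xf;
  res |= res << 4;  // same as res * 17 for a 4-bit value
  if (res < 0x22)
    return 0x22;
  return res;
}

// What the optimizer should ideally reduce it to.
constexpr int foo1_max(int data) {
  int res = data & 0xf;
  res |= res << 4;
  return std::max(res, 0x22);
}

// Compare both forms for every possible low nibble.
constexpr bool agree_on_all_nibbles() {
  for (int d = 0; d < 16; ++d)
    if (foo1_branchy(d) != foo1_max(d))
      return false;
  return true;
}
```

The difficulty discussed in the thread is not the equivalence itself but that, after early VRP, the comparison and the PHI node no longer refer to the same SSA name, so the existing min/max matching does not fire.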

>
>>
>> gcc/ChangeLog:
>>
>> * vr-values.cc (next_use_is_phi):
>> (simplify_using_ranges::simplify_casted_compare):
>> add new function next_use_is_phi to control the replacement.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/zbb-min-max-04.c: New test.
>> ---
>>  gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c | 14 ++
>>  gcc/vr-values.cc    | 15 ++-
>>  2 files changed, 28 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c b/gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c
>> new file mode 100644
>> index 000..8c3e87a35e0
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64d" } */
>> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
>> +
>> +int foo1 (int data, int res)
>> +{
>> +  res = data & 0xf;
>> +  res |= res << 4;
>> +  if (res < 0x22)
>> +    return 0x22;
>> +  return res;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "max\t" 1 } } */
>> \ No newline at end of file
>> diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
>> index ecb294131b0..1f7a727c638 100644
>> --- a/gcc/vr-values.cc
>> +++ b/gcc/vr-values.cc
>> @@ -1263,6 +1263,18 @@ simplify_using_ranges::simplify_compare_using_ranges_1 (tree_code &cond_code, tr
>>    return happened;
>>  }
>>
>> +/* Return true if the next use of SSA_NAME is PHI node */
>> +bool
>> +next_use_is_phi (tree arg)
>> +{
>> +  use_operand_p imm = &(SSA_NAME_IMM_USE_NODE (arg));
>> +  use_operand_p next = imm->next;
>> +  if (nex

Re: [PATCH v2] gcov: Fix integer types in gen_counter_update()

2023-11-23 Thread Sebastian Huber

On 23.11.23 09:11, Jiang, Haochen wrote:

-Original Message-
From: Sebastian Huber
Sent: Wednesday, November 22, 2023 10:24 PM
To: Christophe Lyon
Cc: Jakub Jelinek;gcc-patches@gcc.gnu.org
Subject: Re: [PATCH v2] gcov: Fix integer types in gen_counter_update()

On 22.11.23 15:22, Christophe Lyon wrote:

On Tue, 21 Nov 2023 at 12:22, Sebastian Huber
   wrote:

On 21.11.23 11:46, Jakub Jelinek wrote:

On Tue, Nov 21, 2023 at 11:42:06AM +0100, Sebastian Huber wrote:

On 21.11.23 11:34, Jakub Jelinek wrote:

--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -281,10 +281,13 @@ gen_assign_counter_update

(gimple_stmt_iterator *gsi, gcall *call, tree func,

   if (result)
 {
   tree result_type = TREE_TYPE (TREE_TYPE (func));
-  tree tmp = make_temp_ssa_name (result_type, NULL, name);
-  gimple_set_lhs (call, tmp);
+  tree tmp1 = make_temp_ssa_name (result_type, NULL, name);
+  gimple_set_lhs (call, tmp1);
   gsi_insert_after (gsi, call, GSI_NEW_STMT);
-  gassign *assign = gimple_build_assign (result, tmp);
+  tree tmp2 = make_ssa_name (TREE_TYPE (result));
+  gassign *assign = gimple_build_assign (tmp2, NOP_EXPR, tmp1);
+  gsi_insert_after (gsi, assign, GSI_NEW_STMT);
+  assign = gimple_build_assign (result, gimple_assign_lhs (assign));

When you use a temporary tmp2 for the lhs of the conversion, you can

just

use it here,
  assign = gimple_build_assign (result, tmp2);

Ok for trunk with that change.

Just a question, could I also use

tree tmp2 = make_temp_ssa_name (TREE_TYPE (result), NULL, name);

?

This make_temp_ssa_name() is used throughout the file and the new
make_ssa_name() would be the first use in this file.

Yes.  The only difference is that it won't be _234 = (type) something;
but PROF_time_profile_234 = (type) something; in the dumps, but sure,
consistency is useful.

Thanks for your help. I checked in an updated version.


Our CI bisected a regression to this commit:
Running gcc:gcc.dg/tree-prof/tree-prof.exp ...
FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile
"Read tp_first_run: 0" 1
FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile
"Read tp_first_run: 2" 1

(on aarch64)

Can you check?

Yes, I will have a look at it.

The same issue also happened on i386. You can also reproduce that on
x86-64 platforms.


I was able to reproduce it using a native aarch64 GCC on cfarm185, but I 
have some difficulty understanding what this test case actually does.


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact

2023-11-23 Thread chenglulu

I tested it and it was fine. I never knew this could be used like this.

Thank you!

On 2023/11/20 at 8:47 AM, Xi Ruoyao wrote:

No functional change, just a cleanup.

gcc/ChangeLog:

* config/loongarch/loongarch.md (lrint_allow_inexact): Remove.
(2): Check if 
== UNSPEC_FTINT instead of .
---
  gcc/config/loongarch/loongarch.md | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 78ed63f2132..1e019815451 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -585,9 +585,6 @@ (define_int_attr lrint_pattern [(UNSPEC_FTINT "lrint")
  (define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
(UNSPEC_FTINTRM "rm")
(UNSPEC_FTINTRP "rp")])
-(define_int_attr lrint_allow_inexact [(UNSPEC_FTINT "1")
- (UNSPEC_FTINTRM "0")
- (UNSPEC_FTINTRP "0")])
  
  ;; Iterator and attributes for bytepick.d

  (define_int_iterator bytepick_w_ashift_amount [8 16 24])
@@ -2384,7 +2381,7 @@ (define_insn "2"
(unspec:ANYFI [(match_operand:ANYF 1 "register_operand" "f")]
  LRINT))]
"TARGET_HARD_FLOAT &&
-   (
+   ( == UNSPEC_FTINT
  || flag_fp_int_builtin_inexact
  || !flag_trapping_math)"
"ftint.. %0,%1"




Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-23 Thread Richard Biener
On Thu, Nov 23, 2023 at 4:02 AM Kewen.Lin  wrote:
>
> on 2023/11/22 18:25, Richard Biener wrote:
> > On Wed, Nov 22, 2023 at 10:31 AM Kewen.Lin  wrote:
> >>
> >> on 2023/11/17 20:55, Alexander Monakov wrote:
> >>>
> >>> On Fri, 17 Nov 2023, Kewen.Lin wrote:
> > I don't think you can run cleanup_cfg after sched_init. I would suggest
> > to put it early in schedule_insns.
> 
>  Thanks for the suggestion, I placed it at the beginning of 
>  haifa_sched_init
>  instead, since schedule_insns invokes haifa_sched_init, although the
>  calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
>  ahead but they are all "setup" functions, shouldn't affect or be affected
>  by this placement.
> >>>
> >>> I was worried because sched_init invokes df_analyze, and I'm not sure if
> >>> cfg_cleanup can invalidate it.
> >>
> >> Thanks for further explaining!  By scanning cleanup_cfg, it seems that it
> >> considers df, like compact_blocks checks df, try_optimize_cfg invokes
> >> df_analyze etc., but I agree that moving cleanup_cfg before sched_init
> >> makes more sense.
> >>
> >>>
> > I suspect this may be caused by invoking cleanup_cfg too late.
> 
>  By looking into some failures, I found that although cleanup_cfg is 
>  executed
>  there would be still some empty blocks left, by analyzing a few failures 
>  there
>  are at least such cases:
>    1. empty function body
>    2. block holding a label for return.
>    3. block without any successor.
>    4. block which becomes empty after scheduling some other block.
>    5. block which looks mergeable with its always successor but left.
>    ...
> 
>  For 1,2, there is one single successor EXIT block, I think they don't 
>  affect
>  state transition, for 3, it's the same.  For 4, it depends on if we can 
>  have
>  the assumption this kind of empty block doesn't have the chance to have 
>  debug
>  insn (like associated debug insn should be moved along), I'm not sure.  
>  For 5,
>  a reduced test case is:
> >>>
> >>> Oh, I should have thought of cases like these, really sorry about the slip
> >>> of attention, and thanks for showing a testcase for item 5. As Richard as
> >>> saying in his response, cfg_cleanup cannot be a fix here. The thing to 
> >>> check
> >>> would be changing no_real_insns_p to always return false, and see if the
> >>> situation looks recoverable (if it breaks bootstrap, regtest statistics of
> >>> a non-bootstrapped compiler are still informative).
> >>
> >> As you suggested, I forced no_real_insns_p to return false all the time, 
> >> some
> >> issues got exposed, almost all of them are asserting NOTE_P insn shouldn't 
> >> be
> >> encountered in those places, so the adjustments for most of them are just 
> >> to
> >> consider NOTE_P or this kind of special block and so on.  One draft patch 
> >> is
> >> attached, it can be bootstrapped and regress-tested on ppc64{,le} and x86.
> >> btw, it's without the previous cfg_cleanup adjustment (hope it can get more
> >> empty blocks and expose more issues).  The draft isn't qualified for code
> >> review but I hope it can provide some information on what kinds of changes
> >> are needed for the proposal.  If this is the direction which we all agree 
> >> on,
> >> I'll further refine it and post a formal patch.  One thing I want to note 
> >> is
> >> that this patch disable one assertion below:
> >>
> >> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> >> index e5964f54ead..abd334864fb 100644
> >> --- a/gcc/sched-rgn.cc
> >> +++ b/gcc/sched-rgn.cc
> >> @@ -3219,7 +3219,7 @@ schedule_region (int rgn)
> >>  }
> >>
> >>/* Sanity check: verify that all region insns were scheduled.  */
> >> -  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> >> +  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> >>
> >>sched_finish_ready_list ();
> >>
> >> Some cases can cause this assertion to fail, it's due to the mismatch on
> >> to-be-scheduled and scheduled insn counts.  The reason why it happens is 
> >> that
> >> one block previously has only one INSN_P but while scheduling some other 
> >> blocks
> >> it gets moved as well then we ends up with an empty block so that the only
> >> NOTE_P insn was counted then, but since this block isn't empty initially 
> >> and
> >> NOTE_P gets skipped in a normal block, the count to-be-scheduled can't 
> >> count
> >> it in.  It can be fixed with special-casing this kind of block for counting
> >> like initially recording which block is empty and if a block isn't recorded
> >> before then fix up the count for it accordingly.  I'm not sure if someone 
> >> may
> >> have an argument that all the complication make this proposal beaten by
> >> previous special-casing debug insn approach, looking forward to more 
> >> comments.
> >
> > Just a comment that the NOTE_P thing is odd - do we only ever have those for
> > otherwise empty B

Re: [PATCH] middle-end/32667 - document cpymem and memcpy exact overlap requirement

2023-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2023 at 08:13:47AM +, Richard Biener wrote:
> On Thu, 23 Nov 2023, Jakub Jelinek wrote:
> 
> > On Thu, Nov 23, 2023 at 08:00:49AM +, Richard Biener wrote:
> > > The following amends the cpymem documentation to mention that exact
> > > overlap needs to be handled gracefully, also noting that the target
> > > runtime is expected to behave the same way.
> > > 
> > > OK?
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > >   PR middle-end/32667
> > >   * md.texi (cpymem): Document that exact overlap of source
> > >   and destination needs to work.  Mention the target runtime
> > >   may not treat this case as undefined.
> > 
> > The first added sentence is ok, for the second see the spot
> > Florian mentioned in the PR.
> 
> Like this?
> 
> Thanks,
> Richard.
> 
> >From 93f4d22374ad2ea8bb5821083d2422c8b0a3313b Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Thu, 23 Nov 2023 08:54:56 +0100
> Subject: [PATCH] middle-end/32667 - document cpymem and memcpy exact overlap
>  requirement
> To: gcc-patches@gcc.gnu.org
> 
> The following amends the cpymem documentation to mention that exact
> overlap needs to be handled gracefully, also noting that the target
> runtime is expected to behave the same way where -ffreestanding
> docs mention the set of routines required.
> 
>   PR middle-end/32667
>   * md.texi (cpymem): Document that exact overlap of source
>   and destination needs to work.
>   * standards.texi (ffreestanding): Mention memcpy is required
>   to handle the exact overlap case.
> ---
>  gcc/doc/md.texi| 5 +++--
>  gcc/doc/standards.texi | 4 +++-
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index df6725ffc9c..87e1c9ed20e 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6915,8 +6915,9 @@ individually copied data units in the block.
>  
>  The @code{cpymem@var{m}} patterns need not give special consideration
>  to the possibility that the source and destination strings might
> -overlap. These patterns are used to do inline expansion of
> -@code{__builtin_memcpy}.
> +overlap.  An exeption is the case where source and destination are

s/exeption/exception/

> +equal, this case needs to be handled correctly.
> +These patterns are used to do inline expansion of @code{__builtin_memcpy}.
>  
>  @cindex @code{movmem@var{m}} instruction pattern
>  @item @samp{movmem@var{m}}
> diff --git a/gcc/doc/standards.texi b/gcc/doc/standards.texi
> index 4eb43f14f90..6eebb9426f3 100644
> --- a/gcc/doc/standards.texi
> +++ b/gcc/doc/standards.texi
> @@ -184,7 +184,9 @@ GNU C library).  @xref{Standard Libraries,,Standard 
> Libraries}.
>  Most of the compiler support routines used by GCC are present in
>  @file{libgcc}, but there are a few exceptions.  GCC requires the
>  freestanding environment provide @code{memcpy}, @code{memmove},
> -@code{memset} and @code{memcmp}.
> +@code{memset} and @code{memcmp}.  Contrary to the standards
> +covering @code{memcpy} GCC expects the case of an exact overlap
> +of source and destination to work and not invoke undefined behavior.
>  Finally, if @code{__builtin_trap} is used, and the target does
>  not implement the @code{trap} pattern, then GCC emits a call
>  to @code{abort}.

Ok with that typo fix.

Jakub
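
For readers of the archive, here is a minimal sketch of where the exact-overlap case discussed above can arise. It assumes that GCC may expand an aggregate assignment into a memcpy call or an inline cpymem expansion; the names struct big, copy_big and self_check are hypothetical and appear in neither the patch nor GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical aggregate, large enough that a compiler may prefer a
   block copy over member-wise moves.  */
struct big
{
  unsigned char bytes[256];
};

/* An aggregate assignment.  The compiler is free to implement it with
   memcpy or an inline block copy.  When dst == src, source and
   destination overlap exactly, which is the case the documentation
   change is about.  */
static void
copy_big (struct big *dst, const struct big *src)
{
  *dst = *src;
}

/* Self-assignment must leave the object unchanged.  */
static void
self_check (void)
{
  struct big a;
  memset (a.bytes, 0x5a, sizeof a.bytes);
  copy_big (&a, &a);
  for (size_t i = 0; i < sizeof a.bytes; i++)
    assert (a.bytes[i] == 0x5a);
}
```

Calling self_check () from main exercises the self-copy. Whether a library call is actually emitted depends on target and optimization level, so this only illustrates the source-level situation.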



Re: [PATCH] c++, v4: Implement C++26 P2741R3 - user-generated static_assert messages [PR110348]

2023-11-23 Thread Jakub Jelinek
On Wed, Nov 22, 2023 at 04:53:48PM -0500, Jason Merrill wrote:
> I agree it's weird to get two of the same error, but maybe instead of
> duplicating the error, we could look up data only if size succeeded, and
> then error once if either failed?

Here is what I've committed after another bootstrap/regtest on x86_64-linux
and i686-linux.  Besides the above requested change I've tweaked 2 lines
in the test not to rely on a particular std::size_t exact type because
otherwise the test failed on i686-linux.  And accepting there only the
current
unsigned int
long unsigned int
long long unsigned int
unsigned __int20__ (or how exactly is this one spelled in diagnostics)
seems fragile.

Thanks a lot for the review of this (and sorry it took so long on my side
because I've missed the first review).

2023-11-23  Jakub Jelinek  

PR c++/110348
gcc/
* doc/invoke.texi (-Wno-c++26-extensions): Document.
gcc/c-family/
* c.opt (Wc++26-extensions): New option.
* c-cppbuiltin.cc (c_cpp_builtins): For C++26 predefine
__cpp_static_assert to 202306L rather than 201411L.
gcc/cp/
* parser.cc: Implement C++26 P2741R3 - user-generated static_assert
messages.
(cp_parser_static_assert): Parse message argument as
conditional-expression if it is not a pure string literal or
several of them concatenated followed by closing paren.
* semantics.cc (finish_static_assert): Handle message which is not
STRING_CST.  For condition with bare parameter packs return early.
* pt.cc (tsubst_expr) : Also tsubst_expr
message and make sure that if it wasn't originally STRING_CST, it
isn't after tsubst_expr either.
gcc/testsuite/
* g++.dg/cpp26/static_assert1.C: New test.
* g++.dg/cpp26/feat-cxx26.C (__cpp_static_assert): Expect
202306L rather than 201411L.
* g++.dg/cpp0x/udlit-error1.C: Expect different diagnostics for
static_assert with user-defined literal.

--- gcc/doc/invoke.texi.jj  2023-11-22 10:14:56.021376360 +0100
+++ gcc/doc/invoke.texi 2023-11-22 10:17:41.328065157 +0100
@@ -9107,6 +9107,13 @@ Do not warn about C++23 constructs in co
 an older C++ standard.  Even without this option, some C++23 constructs
 will only be diagnosed if @option{-Wpedantic} is used.
 
+@opindex Wc++26-extensions
+@opindex Wno-c++26-extensions
+@item -Wno-c++26-extensions @r{(C++ and Objective-C++ only)}
+Do not warn about C++26 constructs in code being compiled using
+an older C++ standard.  Even without this option, some C++26 constructs
+will only be diagnosed if @option{-Wpedantic} is used.
+
 @opindex Wcast-qual
 @opindex Wno-cast-qual
 @item -Wcast-qual
--- gcc/c-family/c.opt.jj   2023-11-22 10:14:55.963377171 +0100
+++ gcc/c-family/c.opt  2023-11-22 10:17:41.328065157 +0100
@@ -498,6 +498,10 @@ Wc++23-extensions
 C++ ObjC++ Var(warn_cxx23_extensions) Warning Init(1)
 Warn about C++23 constructs in code compiled with an older standard.
 
+Wc++26-extensions
+C++ ObjC++ Var(warn_cxx26_extensions) Warning Init(1)
+Warn about C++26 constructs in code compiled with an older standard.
+
 Wcast-function-type
 C ObjC C++ ObjC++ Var(warn_cast_function_type) Warning EnabledBy(Wextra)
 Warn about casts between incompatible function types.
--- gcc/c-family/c-cppbuiltin.cc.jj 2023-11-22 10:14:55.962377185 +0100
+++ gcc/c-family/c-cppbuiltin.cc2023-11-22 10:17:41.329065143 +0100
@@ -1023,7 +1023,8 @@ c_cpp_builtins (cpp_reader *pfile)
{
  /* Set feature test macros for C++17.  */
  cpp_define (pfile, "__cpp_unicode_characters=201411L");
- cpp_define (pfile, "__cpp_static_assert=201411L");
+ if (cxx_dialect <= cxx23)
+   cpp_define (pfile, "__cpp_static_assert=201411L");
  cpp_define (pfile, "__cpp_namespace_attributes=201411L");
  cpp_define (pfile, "__cpp_enumerator_attributes=201411L");
  cpp_define (pfile, "__cpp_nested_namespace_definitions=201411L");
@@ -1086,6 +1087,7 @@ c_cpp_builtins (cpp_reader *pfile)
{
  /* Set feature test macros for C++26.  */
  cpp_define (pfile, "__cpp_constexpr=202306L");
+ cpp_define (pfile, "__cpp_static_assert=202306L");
}
   if (flag_concepts)
 {
--- gcc/cp/parser.cc.jj 2023-11-22 10:14:55.969377087 +0100
+++ gcc/cp/parser.cc2023-11-22 10:17:41.335065058 +0100
@@ -16616,6 +16616,7 @@ cp_parser_linkage_specification (cp_pars
static_assert-declaration:
  static_assert ( constant-expression , string-literal ) ;
  static_assert ( constant-expression ) ; (C++17)
+ static_assert ( constant-expression, conditional-expression ) ; (C++26)
 
If MEMBER_P, this static_assert is a class member.  */
 
@@ -16646,10 +16647,10 @@ cp_parser_static_assert (cp_parser *pars
 
   /* Parse the constant-expression.  Allow a non-constant expression
  here in order to give better diagnostics in finish_static_assert.  */
-  condition =
- 

Re: [PATCH v3 3/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate shift

2023-11-23 Thread chenglulu

LGTM.

Thanks.

On 2023/11/20 at 8:47 AM, Xi Ruoyao wrote:

Remove unnecessary UNSPECs and make the [x]vrotr[i] instructions useful
with GNU vectors and auto vectorization.

gcc/ChangeLog:

* config/loongarch/lsx.md (bitimm): Move to ...
(UNSPEC_LSX_VROTR): Remove.
(lsx_vrotr_): Remove.
(lsx_vrotri_): Remove.
* config/loongarch/lasx.md (UNSPEC_LASX_XVROTR): Remove.
(lsx_vrotr_): Remove.
(lsx_vrotri_): Remove.
* config/loongarch/simd.md (bitimm): ... here.  Expand it to
cover LASX modes.
(vrotr3): New define_insn.
(vrotri3): New define_insn.
* config/loongarch/loongarch-builtins.cc:
(CODE_FOR_lsx_vrotr_b): Use standard pattern name.
(CODE_FOR_lsx_vrotr_h): Likewise.
(CODE_FOR_lsx_vrotr_w): Likewise.
(CODE_FOR_lsx_vrotr_d): Likewise.
(CODE_FOR_lasx_xvrotr_b): Likewise.
(CODE_FOR_lasx_xvrotr_h): Likewise.
(CODE_FOR_lasx_xvrotr_w): Likewise.
(CODE_FOR_lasx_xvrotr_d): Likewise.
(CODE_FOR_lsx_vrotri_b): Define to standard pattern name.
(CODE_FOR_lsx_vrotri_h): Likewise.
(CODE_FOR_lsx_vrotri_w): Likewise.
(CODE_FOR_lsx_vrotri_d): Likewise.
(CODE_FOR_lasx_xvrotri_b): Likewise.
(CODE_FOR_lasx_xvrotri_h): Likewise.
(CODE_FOR_lasx_xvrotri_w): Likewise.
(CODE_FOR_lasx_xvrotri_d): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-rotr.c: New test.
---
  gcc/config/loongarch/lasx.md  | 22 
  gcc/config/loongarch/loongarch-builtins.cc| 16 +
  gcc/config/loongarch/lsx.md   | 28 ---
  gcc/config/loongarch/simd.md  | 29 +++
  .../gcc.target/loongarch/vect-rotr.c  | 36 +++
  5 files changed, 81 insertions(+), 50 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 023a023b44e..116b30c0774 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -138,7 +138,6 @@ (define_c_enum "unspec" [
UNSPEC_LASX_XVHSUBW_Q_D
UNSPEC_LASX_XVHADDW_QU_DU
UNSPEC_LASX_XVHSUBW_QU_DU
-  UNSPEC_LASX_XVROTR
UNSPEC_LASX_XVADD_Q
UNSPEC_LASX_XVSUB_Q
UNSPEC_LASX_XVREPLVE
@@ -4232,18 +4231,6 @@ (define_insn "lasx_xvhsubw_qu_du"
[(set_attr "type" "simd_int_arith")
 (set_attr "mode" "V4DI")])
  
-;;XVROTR.B   XVROTR.H   XVROTR.W   XVROTR.D
-;;TODO-478
-(define_insn "lasx_xvrotr_"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-  (match_operand:ILASX 2 "register_operand" "f")]
- UNSPEC_LASX_XVROTR))]
-  "ISA_HAS_LASX"
-  "xvrotr.\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "")])
-
  ;;XVADD.Q
  ;;TODO2
  (define_insn "lasx_xvadd_q"
@@ -4426,15 +4413,6 @@ (define_insn "lasx_xvexth_qu_du"
[(set_attr "type" "simd_fcvt")
 (set_attr "mode" "V4DI")])
  
-(define_insn "lasx_xvrotri_"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (rotatert:ILASX (match_operand:ILASX 1 "register_operand" "f")
-  (match_operand 2 "const__operand" "")))]
-  "ISA_HAS_LASX"
-  "xvrotri.\t%u0,%u1,%2"
-  [(set_attr "type" "simd_shf")
-   (set_attr "mode" "")])
-
  (define_insn "lasx_xvextl_q_d"
[(set (match_operand:V4DI 0 "register_operand" "=f")
(unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f")]
diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index a6fcc1c731e..5d037ab7f10 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -369,6 +369,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
  #define CODE_FOR_lsx_vsrli_h CODE_FOR_vlshrv8hi3
  #define CODE_FOR_lsx_vsrli_w CODE_FOR_vlshrv4si3
  #define CODE_FOR_lsx_vsrli_d CODE_FOR_vlshrv2di3
+#define CODE_FOR_lsx_vrotr_b CODE_FOR_vrotrv16qi3
+#define CODE_FOR_lsx_vrotr_h CODE_FOR_vrotrv8hi3
+#define CODE_FOR_lsx_vrotr_w CODE_FOR_vrotrv4si3
+#define CODE_FOR_lsx_vrotr_d CODE_FOR_vrotrv2di3
+#define CODE_FOR_lsx_vrotri_b CODE_FOR_rotrv16qi3
+#define CODE_FOR_lsx_vrotri_h CODE_FOR_rotrv8hi3
+#define CODE_FOR_lsx_vrotri_w CODE_FOR_rotrv4si3
+#define CODE_FOR_lsx_vrotri_d CODE_FOR_rotrv2di3
  #define CODE_FOR_lsx_vsub_b CODE_FOR_subv16qi3
  #define CODE_FOR_lsx_vsub_h CODE_FOR_subv8hi3
  #define CODE_FOR_lsx_vsub_w CODE_FOR_subv4si3
@@ -634,6 +642,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
  #define CODE_FOR_lasx_xvsrli_h CODE_FOR_vlshrv16hi3
  #define CODE_FOR_lasx_xvsrli_w CODE_FOR_vlshrv8si3
  #define CODE_FOR_lasx_xvsrli_d CODE_FOR_vlshrv4di3
+#define CODE_FOR_lasx_xvrotr_b CODE_FOR_vrotrv32qi3
+#define CODE_FOR_lasx_xvrotr_h CODE_FOR_vrotrv16hi3
+#define CODE_FOR_lasx_xvrotr_w CODE_FOR_vrotrv8si3
+#define CODE_FOR_lasx_xvrotr

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-23 Thread Xi Ruoyao
On Thu, 2023-11-23 at 15:31 +0800, chenglulu wrote:
> I modified this code to use define_expand:
> 
>  (define_expand "fix_trunc2"
>        [(set (match_operand: 0 "register_operand" "=f")
>          (fix: (match_operand:FVEC 1 "register_operand" "f")))]
>        ""
>    {
>      emit_insn (gen__vftintrz__ (
>    operands[0], operands[1]));
>      DONE;
>    }
>    [(set_attr "type" "simd_fcvt")
>     (set_attr "mode" "")])

For

float x[4];
int y[4];

void test()
{
  for (int i = 0; i < 4; i++)
    y[i] = __builtin_rintf(x[i]);
}

it produces

	la.local	$r12,.LANCHOR0
	vld	$vr0,$r12,0
	vfrint.s	$vr0,$vr0
	vftintrz.w.s	$vr0,$vr0
	vst	$vr0,$r12,16
	jr	$r1

But with a define_insn or define_insn_and_split:

	la.local	$r12,.LANCHOR0
	vld	$vr0,$r12,0
	vftint.w.s	$vr0,$vr0
	vst	$vr0,$r12,16
	jr	$r1

(Our scalar code also generates sub-optimal frint.s-ftintxx.w.s
sequences.  I guess we should fix the scalar code later as well.)

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact

2023-11-23 Thread Xi Ruoyao
On Thu, 2023-11-23 at 16:23 +0800, chenglulu wrote:
> I tested it and it was fine. I never knew this could be used like
> this.

I remember that when I wrote r13-3920 I tried this but failed.  Maybe
something has been improved in the machine description parser, or perhaps
I just did something stupid at the time...

> Thank you!
> 
> > On 2023/11/20 at 8:47 AM, Xi Ruoyao wrote:
> > No functional change, just a cleanup.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch.md (lrint_allow_inexact):
> > Remove.
> > (2): Check if 
> > == UNSPEC_FTINT instead of .
> > ---
> >   gcc/config/loongarch/loongarch.md | 5 +
> >   1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/gcc/config/loongarch/loongarch.md
> > b/gcc/config/loongarch/loongarch.md
> > index 78ed63f2132..1e019815451 100644
> > --- a/gcc/config/loongarch/loongarch.md
> > +++ b/gcc/config/loongarch/loongarch.md
> > @@ -585,9 +585,6 @@ (define_int_attr lrint_pattern [(UNSPEC_FTINT
> > "lrint")
> >   (define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
> >         (UNSPEC_FTINTRM "rm")
> >         (UNSPEC_FTINTRP "rp")])
> > -(define_int_attr lrint_allow_inexact [(UNSPEC_FTINT "1")
> > -     (UNSPEC_FTINTRM "0")
> > -     (UNSPEC_FTINTRP "0")])
> >   
> >   ;; Iterator and attributes for bytepick.d
> >   (define_int_iterator bytepick_w_ashift_amount [8 16 24])
> > @@ -2384,7 +2381,7 @@ (define_insn
> > "2"
> >     (unspec:ANYFI [(match_operand:ANYF 1 "register_operand"
> > "f")]
> >       LRINT))]
> >     "TARGET_HARD_FLOAT &&
> > -   (
> > +   ( == UNSPEC_FTINT
> >   || flag_fp_int_builtin_inexact
> >   || !flag_trapping_math)"
> >     "ftint.. %0,%1"
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] testsuite, Darwin: Add support for Mach-O function body scans.

2023-11-23 Thread Christophe Lyon
Hi Iain,

On Mon, 6 Nov 2023 at 11:58, Richard Sandiford
 wrote:
>
> Iain Sandoe  writes:
> > Hi Richard,
> >
> >> On 5 Nov 2023, at 12:11, Richard Sandiford  
> >> wrote:
> >>
> >> Iain Sandoe  writes:
> >
>  On 26 Oct 2023, at 21:00, Iain Sandoe  wrote:
> >>>
> > On 26 Oct 2023, at 20:49, Richard Sandiford 
> >>> wrote:
> >
> > Iain Sandoe  writes:
> >> This was written before Thomas' modification to the ELF-handling to 
> >> allow
> >> a config-based change for target details.  I did consider updating this
> >> to try and use that scheme, but I think that it would sit a little
> >> awkwardly, since there are some differences in the start-up scanning 
> >> for
> >> Mach-O.  I would say that in all probability we could improve things 
> >> but
> >> I'd like to put this forward as a well-tested initial implementation.
> >
> > Sorry, I would prefer to extend the existing function instead.
> > E.g. there's already some divergence between the Mach-O version
> > and the default version, in that the Mach-O version doesn't print
> > verbose messages.  I also don't think that the current default code
> > is so watertight that it'll never need to be updated in future.
> 
>  Fair enough, will explore what can be done (as I recall last I looked the
>  primary difference was in the initial start-up scan).
> >>>
> >>> I’ve done this as attached.
> >>>
> >>> For the record, when doing it, it gave rise to the same misgivings that 
> >>> led
> >>> to the separate implementation before.
> >>>
> >>> * as we add formats and uncover asm oddities, they all need to be handled
> >>>   in one set of code, IMO it could become quite convoluted.
> >>>
> >>> * now making a change to the MACH-O code, means I have to check I did not
> >>>   inadvertently break ELF (and likewise, in theory, an ELF change should 
> >>> check
> >>>   MACH-O, but many folks do/can not do that).
> >>>
> >>> Maybe there’s some half-way-house where code can usefully be shared 
> >>> without
> >>> those down-sides.
> >>>
> >>> Anyway, to make progress, is the revised version OK for trunk? (tested on
> >>> aarch64-linux and aarch64-darwin).
> >>
> >> Sorry for the slow reply.  I was hoping we'd be able to share a bit more
> >> code than that, and avoid an isMACHO toggle.  Does something like the
> >> attached adaption of your patch work?  Only spot-checked on
> >> aarch64-linux-gnu so far.
> >>
> >> (The patch tries to avoid capturing the user label prefix, hopefully
> >> avoiding the needsULP thing.)
> >
> > Yes, this works for me too for Arm64 Darwin (and probably is fine for other
> > Darwin archs in case we implement body tests there).  If we decide to emit
> > some comment-based markers to delineate functions without unwind data,
> > we can just amend the start and end.
> >
> > thanks,
> > Iain
> > (doing some wider testing, but for now the only mach-o cases are in the
> >  arm64 code, so the fact that those passed so far is pretty good 
> > indication).
>
> OK, great.  It passed testing for me too, so please go ahead and commit
> if it does for you.
>
> > -
> >
> > As an aside what’s the intention for cases like this?
> >
> >   .data
> > foo:
> >   . ….
> >   .size foo, .-foo
>
> ATM there's no way for the test to say that specific pseudo-ops are
> interesting to it.  Same for labels.  It might be useful to add
> support for that though.
>
> Thanks,
> Richard
>

As you have probably already noticed with the notification from our
CI, this patch causes
FAIL: gcc.target/arm/pr95646.c check-function-bodies __acle_se_bar
At a quick glance it's not obvious to me why check_function_body
does not print the "body" and "against" debug traces, so there's no hint in gcc.log

I guess running the testsuite with -verbose or -v would help?

Can you have a look?

Thanks,

Christophe

> >
> >
> >
> >>
> >> Thanks,
> >> Richard
> >>
> >>
> >> diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
> >> index 5df80325dff..2434550f0c3 100644
> >> --- a/gcc/testsuite/lib/scanasm.exp
> >> +++ b/gcc/testsuite/lib/scanasm.exp
> >> @@ -785,23 +785,34 @@ proc configure_check-function-bodies { config } {
> >>
> >> # Regexp for the start of a function definition (name in \1).
> >> if { [istarget nvptx*-*-*] } {
> >> -set up_config(start) {^// BEGIN(?: GLOBAL|) FUNCTION DEF: 
> >> ([a-zA-Z_]\S+)$}
> >> +set up_config(start) {
> >> +{^// BEGIN(?: GLOBAL|) FUNCTION DEF: ([a-zA-Z_]\S+)$}
> >> +}
> >> +} elseif { [istarget *-*-darwin*] } {
> >> +set up_config(start) {
> >> +{^_([a-zA-Z_]\S+):$}
> >> +{^LFB[0-9]+:}
> >> +}
> >> } else {
> >> -set up_config(start) {^([a-zA-Z_]\S+):$}
> >> +set up_config(start) {{^([a-zA-Z_]\S+):$}}
> >> }
> >>
> >> # Regexp for the end of a function definition.
> >> if { [istarget nvptx*-*-*] } {
> >>  set up_config(end) {^\}$}
> >> +} elseif { [ista

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-23 Thread Xi Ruoyao
On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote:
> The fix_truncv4sfv4si2 template is indeed called when debugging with
> gdb.
> 
> So I think we can use define_expand here.

The problem is cases where we want to combine an rint call with float-
to-int conversion:

float x[4];
int y[4];

void test()
{
  for (int i = 0; i < 4; i++)
    y[i] = __builtin_rintf(x[i]);
}

With define_expand we get "vfrint + vftintrz", but with define_insn we
get a single "vftint".

Arguably the generic code should try to handle this (PR86609), but it's
"not sure if that's a good idea in general" (comment 1 in the PR) so we
can do this in a target-specific way.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH, v4] Fortran: restrictions on integer arguments to SYSTEM_CLOCK [PR112609]

2023-11-23 Thread Mikael Morin

Le 22/11/2023 à 21:36, Harald Anlauf a écrit :

Hi Mikael!

On 11/22/23 10:36, Mikael Morin wrote:

(...)


diff --git a/gcc/fortran/error.cc b/gcc/fortran/error.cc
index 2ac51e95e4d..be715b50469 100644
--- a/gcc/fortran/error.cc
+++ b/gcc/fortran/error.cc
@@ -980,7 +980,11 @@ char const*
 notify_std_msg(int std)
 {

-  if (std & GFC_STD_F2018_DEL)
+  if (std & GFC_STD_F2023_DEL)
+    return _("Fortran 2023 deleted feature:");


As there are officially no deleted features in F2023, maybe use a
slightly different wording?  Say "Not allowed in Fortran 2023" or
"Forbidden in Fortran 2023" or similar?


+  else if (std & GFC_STD_F2023)
+    return _("Fortran 2023:");
+  else if (std & GFC_STD_F2018_DEL)
 return _("Fortran 2018 deleted feature:");
   else if (std & GFC_STD_F2018_OBS)
 return _("Fortran 2018 obsolescent feature:");


I skimmed over existing error messages, and since "forbidden" did
not show up and since "Not allowed" exists but not at the beginning
of a message, I found that

"Prohibited in Fortran 2023"

appeared to be a good alternative.

Not being a native speaker, I hope that someone speaks up if this
is not appropriate.  And since I do not explicitly verify that part
in the testcase, it can be changed.


diff --git a/gcc/fortran/libgfortran.h b/gcc/fortran/libgfortran.h
index bdddb317ab0..af7a170c2b1 100644
--- a/gcc/fortran/libgfortran.h
+++ b/gcc/fortran/libgfortran.h
@@ -19,9 +19,10 @@ along with GCC; see the file COPYING3.  If not see


 /* Flags to specify which standard/extension contains a feature.
-   Note that no features were obsoleted nor deleted in F2003 nor in
F2023.
+   Note that no features were obsoleted nor deleted in F2003.


I think we can add a comment that F2023 has no deleted feature, but some
more stringent restrictions in f2023 forbid some previously valid code.


    Please remember to keep those definitions in sync with
    gfortran.texi.  */
+#define GFC_STD_F2023_DEL    (1<<13)    /* Deleted in F2023.  */
 #define GFC_STD_F2023    (1<<12)    /* New in F2023.  */
 #define GFC_STD_F2018_DEL    (1<<11)    /* Deleted in F2018.  */
 #define GFC_STD_F2018_OBS    (1<<10)    /* Obsolescent in F2018.  */
@@ -41,12 +42,13 @@ along with GCC; see the file COPYING3.  If not see
  * are allowed with a certain -std option.  */
 #define GFC_STD_OPT_F95    (GFC_STD_F77 | GFC_STD_F95 |
GFC_STD_F95_OBS  \
 | GFC_STD_F2008_OBS | GFC_STD_F2018_OBS \
-    | GFC_STD_F2018_DEL)
+    | GFC_STD_F2018_DEL | GFC_STD_F2023_DEL)
 #define GFC_STD_OPT_F03    (GFC_STD_OPT_F95 | GFC_STD_F2003)
 #define GFC_STD_OPT_F08    (GFC_STD_OPT_F03 | GFC_STD_F2008)
 #define GFC_STD_OPT_F18    ((GFC_STD_OPT_F08 | GFC_STD_F2018) \
 & (~GFC_STD_F2018_DEL))

F03, F08 and F18 should have GFC_STD_F2023_DEL (and also F03 and F08
should have GFC_STD_F2018_DEL).


Well, these macros do an incremental bitwise-or, so the bit representing
GFC_STD_F2023_DEL is included everywhere.  I also ran the testcases with
different -std= options to check.


Ah, yes.  I confused the GFC_STD_OPT* values with the GFC_STD_* ones.
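
To make the point about the incremental bitwise-or concrete, here is a small sketch for readers of the archive. The bit positions of GFC_STD_F2023_DEL, GFC_STD_F2023, GFC_STD_F2018_DEL and GFC_STD_F2018_OBS and the four GFC_STD_OPT_* definitions are copied from the quoted patch. The remaining bit positions are assumptions made only so the example compiles, and allows and self_check are hypothetical helpers.

```c
#include <assert.h>

/* Bit positions shown in the quoted patch.  */
#define GFC_STD_F2023_DEL (1 << 13)
#define GFC_STD_F2023     (1 << 12)
#define GFC_STD_F2018_DEL (1 << 11)
#define GFC_STD_F2018_OBS (1 << 10)

/* ASSUMED bit positions, for illustration only; the patch context
   does not show them.  */
#define GFC_STD_F2018     (1 << 9)
#define GFC_STD_F2008_OBS (1 << 8)
#define GFC_STD_F2008     (1 << 7)
#define GFC_STD_F2003     (1 << 4)
#define GFC_STD_F95       (1 << 3)
#define GFC_STD_F95_OBS   (1 << 1)
#define GFC_STD_F77       (1 << 0)

/* Option masks as in the patched header: each one is built from the
   previous one, so a bit set in GFC_STD_OPT_F95 propagates upwards
   unless it is masked out explicitly.  */
#define GFC_STD_OPT_F95 (GFC_STD_F77 | GFC_STD_F95 | GFC_STD_F95_OBS \
                         | GFC_STD_F2008_OBS | GFC_STD_F2018_OBS     \
                         | GFC_STD_F2018_DEL | GFC_STD_F2023_DEL)
#define GFC_STD_OPT_F03 (GFC_STD_OPT_F95 | GFC_STD_F2003)
#define GFC_STD_OPT_F08 (GFC_STD_OPT_F03 | GFC_STD_F2008)
#define GFC_STD_OPT_F18 ((GFC_STD_OPT_F08 | GFC_STD_F2018) \
                         & (~GFC_STD_F2018_DEL))

/* Hypothetical helper: nonzero if FEATURE is accepted under MASK.  */
static int
allows (int mask, int feature)
{
  return (mask & feature) != 0;
}

static void
self_check (void)
{
  /* The F2023_DEL bit is inherited by every later mask.  */
  assert (allows (GFC_STD_OPT_F03, GFC_STD_F2023_DEL));
  assert (allows (GFC_STD_OPT_F08, GFC_STD_F2023_DEL));
  assert (allows (GFC_STD_OPT_F18, GFC_STD_F2023_DEL));
  /* F2018_DEL is inherited by F03 and F08 but masked out for F18.  */
  assert (allows (GFC_STD_OPT_F08, GFC_STD_F2018_DEL));
  assert (!allows (GFC_STD_OPT_F18, GFC_STD_F2018_DEL));
}
```

The result does not depend on the assumed positions as long as every flag occupies a distinct bit.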


OK with this fixed (and the previous comments as you wish), if Steve has
no more comments.

Thanks for the patch.




If there are no further comments, I will commit once I am able to
fully build again with --disable-bootstrap and -march=native ...

Thanks,
Harald


Thanks again.



Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-23 Thread Kewen.Lin
on 2023/11/23 16:20, Richard Biener wrote:
> On Thu, Nov 23, 2023 at 4:02 AM Kewen.Lin  wrote:
>>
>> on 2023/11/22 18:25, Richard Biener wrote:
>>> On Wed, Nov 22, 2023 at 10:31 AM Kewen.Lin  wrote:

 on 2023/11/17 20:55, Alexander Monakov wrote:
>
> On Fri, 17 Nov 2023, Kewen.Lin wrote:
>>> I don't think you can run cleanup_cfg after sched_init. I would suggest
>>> to put it early in schedule_insns.
>>
>> Thanks for the suggestion, I placed it at the beginning of 
>> haifa_sched_init
>> instead, since schedule_insns invokes haifa_sched_init, although the
>> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
>> ahead but they are all "setup" functions, shouldn't affect or be affected
>> by this placement.
>
> I was worried because sched_init invokes df_analyze, and I'm not sure if
> cfg_cleanup can invalidate it.

 Thanks for further explaining!  By scanning cleanup_cfg, it seems that it
 considers df, like compact_blocks checks df, try_optimize_cfg invokes
 df_analyze etc., but I agree that moving cleanup_cfg before sched_init
 makes more sense.

>
>>> I suspect this may be caused by invoking cleanup_cfg too late.
>>
>> By looking into some failures, I found that although cleanup_cfg is 
>> executed
>> there would be still some empty blocks left, by analyzing a few failures 
>> there
>> are at least such cases:
>>   1. empty function body
>>   2. block holding a label for return.
>>   3. block without any successor.
>>   4. block which becomes empty after scheduling some other block.
>>   5. block which looks mergeable with its always successor but left.
>>   ...
>>
>> For 1,2, there is one single successor EXIT block, I think they don't 
>> affect
>> state transition, for 3, it's the same.  For 4, it depends on if we can 
>> have
>> the assumption this kind of empty block doesn't have the chance to have 
>> debug
>> insn (like associated debug insn should be moved along), I'm not sure.  
>> For 5,
>> a reduced test case is:
>
> Oh, I should have thought of cases like these, really sorry about the slip
> of attention, and thanks for showing a testcase for item 5. As Richard is
> saying in his response, cfg_cleanup cannot be a fix here. The thing to 
> check
> would be changing no_real_insns_p to always return false, and see if the
> situation looks recoverable (if it breaks bootstrap, regtest statistics of
> a non-bootstrapped compiler are still informative).

 As you suggested, I forced no_real_insns_p to return false all the time, 
 some
 issues got exposed, almost all of them are asserting NOTE_P insn shouldn't 
 be
 encountered in those places, so the adjustments for most of them are just 
 to
 consider NOTE_P or this kind of special block and so on.  One draft patch 
 is
 attached, it can be bootstrapped and regress-tested on ppc64{,le} and x86.
 btw, it's without the previous cfg_cleanup adjustment (hope it can get more
 empty blocks and expose more issues).  The draft isn't qualified for code
 review but I hope it can provide some information on what kinds of changes
 are needed for the proposal.  If this is the direction which we all agree 
 on,
 I'll further refine it and post a formal patch.  One thing I want to note 
 is
 that this patch disables one assertion below:

 diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
 index e5964f54ead..abd334864fb 100644
 --- a/gcc/sched-rgn.cc
 +++ b/gcc/sched-rgn.cc
 @@ -3219,7 +3219,7 @@ schedule_region (int rgn)
  }

/* Sanity check: verify that all region insns were scheduled.  */
 -  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
 +  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);

sched_finish_ready_list ();

 Some cases can cause this assertion to fail, it's due to the mismatch on
 to-be-scheduled and scheduled insn counts.  The reason why it happens is 
 that
 one block previously has only one INSN_P but while scheduling some other 
 blocks
 it gets moved as well then we end up with an empty block so that the only
 NOTE_P insn was counted then, but since this block isn't empty initially 
 and
 NOTE_P gets skipped in a normal block, the count to-be-scheduled can't 
 count
 it in.  It can be fixed with special-casing this kind of block for counting
 like initially recording which block is empty and if a block isn't recorded
 before then fix up the count for it accordingly.  I'm not sure if someone 
 may
 have an argument that all the complication make this proposal beaten by
 previous special-casing debug insn approach, looking forward to more 
 comments.
>>>
>>> Just a comment that the NOTE_P thing is odd - do we

Re: [PATCH] testsuite, Darwin: Add support for Mach-O function body scans.

2023-11-23 Thread Iain Sandoe
Hi Christophe.

> On 23 Nov 2023, at 09:02, Christophe Lyon  wrote:
> 
> Hi Iain,
> 
> On Mon, 6 Nov 2023 at 11:58, Richard Sandiford
>  wrote:
>> 
>> Iain Sandoe  writes:
>>> Hi Richard,
>>> 
 On 5 Nov 2023, at 12:11, Richard Sandiford  
 wrote:
 
 Iain Sandoe  writes:
>>> 
>> On 26 Oct 2023, at 21:00, Iain Sandoe  wrote:
> 
>>> On 26 Oct 2023, at 20:49, Richard Sandiford 
> wrote:
>>> 
>>> Iain Sandoe  writes:
 This was written before Thomas' modification to the ELF-handling to 
 allow
 a config-based change for target details.  I did consider updating this
 to try and use that scheme, but I think that it would sit a little
 awkwardly, since there are some differences in the start-up scanning 
 for
 Mach-O.  I would say that in all probability we could improve things 
 but
 I'd like to put this forward as a well-tested initial implementation.
>>> 
>>> Sorry, I would prefer to extend the existing function instead.
>>> E.g. there's already some divergence between the Mach-O version
>>> and the default version, in that the Mach-O version doesn't print
>>> verbose messages.  I also don't think that the current default code
>>> is so watertight that it'll never need to be updated in future.
>> 
>> Fair enough, will explore what can be done (as I recall last I looked the
>> primary difference was in the initial start-up scan).
> 
> I’ve done this as attached.
> 
> For the record, when doing it, it gave rise to the same misgivings that 
> led
> to the separate implementation before.
> 
> * as we add formats and uncover asm oddities, they all need to be handled
>  in one set of code, IMO it could become quite convoluted.
> 
> * now making a change to the MACH-O code, means I have to check I did not
>  inadvertently break ELF (and likewise, in theory, an ELF change should 
> check
>  MACH-O, but many folks do/can not do that).
> 
> Maybe there’s some half-way-house where code can usefully be shared 
> without
> those down-sides.
> 
> Anyway, to make progress, is the revised version OK for trunk? (tested on
> aarch64-linux and aarch64-darwin).
 
 Sorry for the slow reply.  I was hoping we'd be able to share a bit more
 code than that, and avoid an isMACHO toggle.  Does something like the
 attached adaption of your patch work?  Only spot-checked on
 aarch64-linux-gnu so far.
 
 (The patch tries to avoid capturing the user label prefix, hopefully
 avoiding the needsULP thing.)
>>> 
>>> Yes, this works for me too for Arm64 Darwin (and probably is fine for other
>>> Darwin archs in case we implement body tests there).  If we decide to emit
>>> some comment-based markers to delineate functions without unwind data,
>>> we can just amend the start and end.
>>> 
>>> thanks,
>>> Iain
>>> (doing some wider testing, but for now the only mach-o cases are in the
>>> arm64 code, so the fact that those passed so far is pretty good indication).
>> 
>> OK, great.  It passed testing for me too, so please go ahead and commit
>> if it does for you.
>> 
>>> -
>>> 
>>> As an aside what’s the intention for cases like this?
>>> 
>>>  .data
>>> foo:
>>>  . ….
>>>  .size foo, .-foo
>> 
>> ATM there's no way for the test to say that specific pseudo-ops are
>> interesting to it.  Same for labels.  It might be useful to add
>> support for that though.
>> 
>> Thanks,
>> Richard
>> 
> 
> As you have probably already noticed with the notification from our
> CI, this patch causes
> FAIL: gcc.target/arm/pr95646.c check-function-bodies __acle_se_bar
> At a quick glance it's not obvious to me why check_function_body
> does not print the "body" and "against" debug traces, so there's no hint in
> gcc.log

Yeah, I’ve reproduced this (it did not show up in either Richard’s or my
aarch64 testing) ... and have a potential fix.

the problem is this:

.global bar
 …
.global __acle_se_bar

foo:
__acle_se_bar:
  …

=

The change in the code prevents the second label from overriding the first
(but the scan checks for the second).

Actually, that’s not legal Mach-O (two global labels cannot have the same 
address).

I have a fix that re-allows the override (thinking if I should assume Mach-O 
will never do this or skip the change for mach-o)
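
For readers of the archive, the override behaviour described above can be sketched as follows. This is not the Tcl code in scanasm.exp. It is a simplified C model under the assumption that a function body is filed under whichever label was recorded when the first non-label line is reached. The names parse_label, first_body_label and self_check, the MAX_LABEL limit, the sample assembly text and the use of "bar" as the first label are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LABEL 64

/* Return true if the LEN-byte line at LINE has the shape "name:", where
   name starts with a letter or underscore, is at least two characters
   long and contains no whitespace.  On success copy name into LABEL.  */
static bool
parse_label (const char *line, size_t len, char label[MAX_LABEL])
{
  if (len < 3 || len > MAX_LABEL || line[len - 1] != ':')
    return false;
  if (!isalpha ((unsigned char) line[0]) && line[0] != '_')
    return false;
  for (size_t i = 0; i < len - 1; i++)
    if (isspace ((unsigned char) line[i]))
      return false;
  memcpy (label, line, len - 1);
  label[len - 1] = '\0';
  return true;
}

/* Scan ASM_TEXT line by line and store in NAME the label under which
   the first function body would be filed.  With ALLOW_OVERRIDE a later
   label line replaces an earlier one; without it the first label wins.
   Return false if no label was seen.  */
static bool
first_body_label (const char *asm_text, bool allow_override,
                  char name[MAX_LABEL])
{
  bool have_label = false;
  for (const char *p = asm_text; *p != '\0';)
    {
      const char *eol = strchr (p, '\n');
      size_t len = eol ? (size_t) (eol - p) : strlen (p);
      char label[MAX_LABEL];

      if (parse_label (p, len, label))
        {
          if (!have_label || allow_override)
            strcpy (name, label);
          have_label = true;
        }
      else if (have_label && len > 0)
        return true;	/* First body line reached.  */

      p += len + (eol ? 1 : 0);
    }
  return have_label;
}

static void
self_check (void)
{
  /* Two labels at the same address, as in the failing Arm test.  */
  const char *text = "\t.global bar\n\t.global __acle_se_bar\n"
                     "bar:\n__acle_se_bar:\n\tbx\tlr\n";
  char name[MAX_LABEL];

  assert (first_body_label (text, true, name));
  assert (strcmp (name, "__acle_se_bar") == 0);
  assert (first_body_label (text, false, name));
  assert (strcmp (name, "bar") == 0);
}
```

With the override disabled the body is filed under the first label, so a test that checks for __acle_se_bar finds nothing, which matches the reported failure.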

——
  

Iain



Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-23 Thread chenglulu



On 2023/11/23 at 5:02 PM, Xi Ruoyao wrote:

On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote:

The fix_truncv4sfv4si2 template is indeed called when debugging with
gdb.

So I think we can use define_expand here.

The problem is cases where we want to combine an rint call with float-
to-int conversion:

float x[4];
int y[4];

void test()
{
  for (int i = 0; i < 4; i++)
    y[i] = __builtin_rintf(x[i]);
}

With define_expand we get "vfrint + vftintrz", but with define_insn we
get a single "vftint".

Arguably the generic code should try to handle this (PR86609), but it's
"not sure if that's a good idea in general" (comment 1 in the PR) so we
can do this in a target-specific way.

I tried compiling with -Ofast and found that a single vftint was generated;
the difference already appears in the .006t.gimple dump.


With -O2 the dump contains __builtin_rintf, but with -Ofast it contains
__builtin_irintf.




Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact

2023-11-23 Thread chenglulu



On 2023/11/23 at 4:58 PM, Xi Ruoyao wrote:

On Thu, 2023-11-23 at 16:23 +0800, chenglulu wrote:

I tested it and it was fine. I never knew this could be used like
this.

I remember when I wrote r13-3920 I tried this but failed.  Maybe
something has been improved in machine description parser, or perhaps I
just did some stupid thing that time...


But I think this is a really cool implementation!

When I look at this code and compare it to our scalar implementation, it
seems that our scalar implementation still lacks an "lround".




Thank you!

On 2023/11/20 at 8:47 AM, Xi Ruoyao wrote:

No functional change, just a cleanup.

gcc/ChangeLog:

* config/loongarch/loongarch.md (lrint_allow_inexact):
Remove.
(<lrint_pattern><ANYF:mode><ANYFI:mode>2): Check if <LRINT>
== UNSPEC_FTINT instead of <lrint_allow_inexact>.
---
   gcc/config/loongarch/loongarch.md | 5 +
   1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md
b/gcc/config/loongarch/loongarch.md
index 78ed63f2132..1e019815451 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -585,9 +585,6 @@ (define_int_attr lrint_pattern [(UNSPEC_FTINT
"lrint")
   (define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
        (UNSPEC_FTINTRM "rm")
        (UNSPEC_FTINTRP "rp")])
-(define_int_attr lrint_allow_inexact [(UNSPEC_FTINT "1")
-     (UNSPEC_FTINTRM "0")
-     (UNSPEC_FTINTRP "0")])
   
   ;; Iterator and attributes for bytepick.d

   (define_int_iterator bytepick_w_ashift_amount [8 16 24])
@@ -2384,7 +2381,7 @@ (define_insn
"<lrint_pattern><ANYF:mode><ANYFI:mode>2"
    (unspec:ANYFI [(match_operand:ANYF 1 "register_operand"
"f")]
      LRINT))]
     "TARGET_HARD_FLOAT &&
-   (<lrint_allow_inexact>
+   (<LRINT> == UNSPEC_FTINT
    || flag_fp_int_builtin_inexact
    || !flag_trapping_math)"
     "ftint<lrint_submenmonic>.<ANYFI:ifmt>.<ANYF:fmt> %0,%1"




Re: [PATCH] testsuite, Darwin: Add support for Mach-O function body scans.

2023-11-23 Thread Christophe Lyon
On Thu, 23 Nov 2023 at 10:09, Iain Sandoe  wrote:
>
> Hi Christophe.
>
> > On 23 Nov 2023, at 09:02, Christophe Lyon  
> > wrote:
> >
> > Hi Iain,
> >
> > On Mon, 6 Nov 2023 at 11:58, Richard Sandiford
> >  wrote:
> >>
> >> Iain Sandoe  writes:
> >>> Hi Richard,
> >>>
>  On 5 Nov 2023, at 12:11, Richard Sandiford  
>  wrote:
> 
>  Iain Sandoe  writes:
> >>>
> >> On 26 Oct 2023, at 21:00, Iain Sandoe  wrote:
> >
> >>> On 26 Oct 2023, at 20:49, Richard Sandiford 
> >>> 
> > wrote:
> >>>
> >>> Iain Sandoe  writes:
>  This was written before Thomas' modification to the ELF-handling to 
>  allow
>  a config-based change for target details.  I did consider updating 
>  this
>  to try and use that scheme, but I think that it would sit a little
>  awkwardly, since there are some differences in the start-up scanning 
>  for
>  Mach-O.  I would say that in all probability we could improve things 
>  but
>  I'd like to put this forward as a well-tested initial implementation.
> >>>
> >>> Sorry, I would prefer to extend the existing function instead.
> >>> E.g. there's already some divergence between the Mach-O version
> >>> and the default version, in that the Mach-O version doesn't print
> >>> verbose messages.  I also don't think that the current default code
> >>> is so watertight that it'll never need to be updated in future.
> >>
> >> Fair enough, will explore what can be done (as I recall last I looked 
> >> the
> >> primary difference was in the initial start-up scan).
> >
> > I’ve done this as attached.
> >
> > For the record, when doing it, it gave rise to the same misgivings that 
> > led
> > to the separate implementation before.
> >
> > * as we add formats and uncover asm oddities, they all need to be 
> > handled
> >  in one set of code, IMO it could become quite convoluted.
> >
> > * now making a change to the MACH-O code, means I have to check I did 
> > not
> >  inadvertently break ELF (and likewise, in theory, an ELF change should 
> > check
> >  MACH-O, but many folks do/can not do that).
> >
> > Maybe there’s some half-way-house where code can usefully be shared 
> > without
> > those down-sides.
> >
> > Anyway, to make progress, is the revised version OK for trunk? (tested 
> > on
> > aarch64-linux and aarch64-darwin).
> 
>  Sorry for the slow reply.  I was hoping we'd be able to share a bit more
>  code than that, and avoid an isMACHO toggle.  Does something like the
>  attached adaption of your patch work?  Only spot-checked on
>  aarch64-linux-gnu so far.
> 
>  (The patch tries to avoid capturing the user label prefix, hopefully
>  avoiding the needsULP thing.)
> >>>
> >>> Yes, this works for me too for Arm64 Darwin (and probably is fine for 
> >>> other
> >>> Darwin archs in case we implement body tests there).  If we decide to emit
> >>> some comment-based markers to delineate functions without unwind data,
> >>> we can just amend the start and end.
> >>>
> >>> thanks,
> >>> Iain
> >>> (doing some wider testing, but for now the only mach-o cases are in the
> >>> arm64 code, so the fact that those passed so far is pretty good 
> >>> indication).
> >>
> >> OK, great.  It passed testing for me too, so please go ahead and commit
> >> if it does for you.
> >>
> >>> -
> >>>
> >>> As an aside what’s the intention for cases like this?
> >>>
> >>>  .data
> >>> foo:
> >>>  . ….
> >>>  .size foo, .-foo
> >>
> >> ATM there's no way for the test to say that specific pseudo-ops are
> >> interesting to it.  Same for labels.  It might be useful to add
> >> support for that though.
> >>
> >> Thanks,
> >> Richard
> >>
> >
> > As you have probably already noticed with the notification from our
> > CI, this patch causes
> > FAIL: gcc.target/arm/pr95646.c check-function-bodies __acle_se_bar
> > At quick glance it's not obvious to me why check_function_body
> > does not print "body" and "against" debug traces, so there's no hint in
> > gcc.log
>
> Yeah, I’ve reproduced this (it did not show in either Richard’s or my
> aarch64 testing)
> ... and have a potential fix.
>

It makes sense, aarch64 and arm are different targets.

> the problem is this:
>
> .global bar
>  …
> .global __acle_se_bar
>
> bar:
> __acle_se_bar:
>   …
>
> =
>
> The change in code prevents the second label from overriding the first (but the
> scan checks for the second).
>
> Actually, that’s not legal Mach-O (two global labels cannot have the same 
> address).
>
> I have a fix that re-allows the override (I am still deciding whether to assume Mach-O
> will never do this or to skip the change for Mach-O).
>
Good news, thanks!

Christophe

> ——
>
>
> Iain
>


RE: [PATCH v4] libgfortran: Replace mutex with rwlock

2023-11-23 Thread Zhu, Lipeng
> [CCing Ian as libgcc maintainer]
> 
> On Wed, 1 Nov 2023 10:14:37 +
> "Zhu, Lipeng"  wrote:
> 
> > > >
> > > > Hi Lipeng,
> > > >
> > > > >>> Sure, as your comments, in the patch V6, I added 3 test cases
> > > > >>> with OpenMP to test different cases in concurrency respectively:
> > > > >>> 1. find and create unit very frequently to stress read lock and 
> > > > >>> write
> lock.
> > > > >>> 2. only access the unit which exist in cache to stress read lock.
> > > > >>> 3. access the same unit in concurrency.
> > > > >>> For the third test case, it also help to find a bug:  When
> > > > >>> unit can't be found in cache nor unit list in read phase, then
> > > > >>> threads will try to acquire write lock to insert the same
> > > > >>> unit, this will cause duplicate key
> > > > >> error.
> > > > >>> To fix this bug, I get the unit from unit list once again
> > > > >>> before insert in write
> > > > >> lock.
> > > > >>> More details you can refer the patch v6.
> > > > >>>
> > > > >>
> > > > >> Could you help to review this update? I really appreciate your
> assistance.
> > > > >>
> > > >
> > > > > Could you help to review this update?  Any concern will be
> appreciated.
> > > >
> > > > Fortran parts are OK (I think I wrote that already), we need
> > > > somebody for the non-Fortran parts.
> > > >
> > > Hi Thomas,
> > >
> > > Thanks for your response. Very appreciate for your patience and help.
> > >
> > > > Jakub, could you maybe take a look?
> > > >
> > > > Best regards
> > > >
> > > > Thomas
> > >
> > > Hi Jakub,
> > >
> > > Can you help to take a look at the change for libgcc part that added
> > > several rwlock macros in libgcc/gthr-posix.h?
> > >
> >
> > Hi Jakub,
> >
> > Could you help to review this, any comment will be greatly appreciated.
> 
> Latest version is at
> https://inbox.sourceware.org/gcc-patches/20230818031818.2161842-1-
> lipeng@intel.com/
> 
Thanks Bernhard.

Hi Ian,
Could you help review the changes for the libgcc part?
I am very much looking forward to your help.

> >
> > > Best Regards,
> > > Lipeng Zhu
> >
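The duplicate-key bug described in the quoted text above (a unit missing in the read phase, then several threads inserting it under the write lock) can be illustrated with a small model. This is only a sketch in Python, not the libgfortran code: the UnitTable class and its method names are hypothetical, and a plain threading.Lock stands in for the write side of the rwlock.

```python
import threading


class UnitTable:
    """Toy stand-in for the unit list; a plain Lock models the write lock."""

    def __init__(self):
        self._units = {}
        self._write_lock = threading.Lock()
        self.inserts = 0

    def find_or_create(self, number):
        # Read phase (the real code would hold the read lock here).
        unit = self._units.get(number)
        if unit is not None:
            return unit
        # Write phase: another thread may have inserted the unit between
        # the failed lookup and acquiring the write lock, so look again
        # before inserting.
        with self._write_lock:
            unit = self._units.get(number)
            if unit is None:
                unit = {"number": number}
                self._units[number] = unit
                self.inserts += 1
            return unit


table = UnitTable()
results = []
threads = [
    threading.Thread(target=lambda: results.append(table.find_or_create(10)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the second lookup inside the write-locked region, two threads that both missed in the read phase would each try to insert unit 10.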



Re: [PATCH, testsuite, fortran] fix invalid testcases (missing MOLD argument to NULL)

2023-11-23 Thread Mikael Morin

Hello,

On 22/11/2023 at 22:02, Harald Anlauf wrote:

Dear all,

testcases assumed_rank_8.f90 and assumed_rank_10.f90 are invalid:
NULL() is passed without MOLD to an assumed-rank dummy argument.

This is detected by NAG, but not yet by gfortran (see pr104819).
gfortran even ignores the MOLD argument; the dump-tree is identical
if MOLD is there or not.

Now these testcases are { dg-do run }.  Therefore I would like to
fix these testcases, independent of the work on fixing pr104819.

Comments?


Makes sense; OK from my point of view.

Mikael


[PATCH] lower-bitint: Fix up -fnon-call-exceptions bit-field load lowering [PR112668]

2023-11-23 Thread Jakub Jelinek
Hi!

As the following testcase shows, there are some bugs in the
-fnon-call-exceptions bit-field load lowering.  In particular, there
is a case where we want to emit a load early in the initialization
(before m_init_gsi) and because that load might throw exception, need
to split block after the load so that it has an EH edge.
Now, across this splitting, we have m_init_gsi, save_gsi (something
we put back into m_gsi afterwards) statement iterators and m_preheader_bb
which is used to determine the pre-header edge of a loop (if any).
As the testcase shows, both of these statement iterators and m_preheader_bb
as well need adjustments if the block was split.  If the stmt iterators
refer to a statement, they need to be updated so that if the statement is
in the bb after the split gsi_bb and gsi_seq is updated, otherwise they
ought to be the start of the new (second) bb.
Similarly, m_preheader_bb should be updated to the second bb if it was
the first before.  Other spots where we insert something before m_init_gsi
don't split blocks in there and are fine.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-11-23  Jakub Jelinek  

PR middle-end/112668
* gimple-lower-bitint.cc (bitint_large_huge::handle_load): When
splitting gsi_bb (m_init_gsi) basic block, update m_preheader_bb
if needed, fix up update of m_init_gsi and update saved m_gsi
as well if needed.

* gcc.dg/bitint-40.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2023-11-14 10:52:16.0 +0100
+++ gcc/gimple-lower-bitint.cc  2023-11-22 14:34:17.327140002 +0100
@@ -1687,7 +1687,22 @@ bitint_large_huge::handle_load (gimple *
  edge e = split_block (gsi_bb (m_gsi), g);
  make_edge (e->src, eh_edge->dest, EDGE_EH)->probability
= profile_probability::very_unlikely ();
- m_init_gsi.bb = e->dest;
+ m_init_gsi = gsi_last_bb (e->dest);
+ if (!gsi_end_p (m_init_gsi))
+   gsi_next (&m_init_gsi);
+ if (gsi_bb (save_gsi) == e->src)
+   {
+ if (gsi_end_p (save_gsi))
+   {
+ save_gsi = gsi_last_bb (e->dest);
+ if (!gsi_end_p (save_gsi))
+   gsi_next (&save_gsi);
+   }
+ else
+   save_gsi = gsi_for_stmt (gsi_stmt (save_gsi));
+   }
+ if (m_preheader_bb == e->src)
+   m_preheader_bb = e->dest;
}
}
  m_gsi = save_gsi;
--- gcc/testsuite/gcc.dg/bitint-40.c.jj 2023-11-22 13:47:12.380580107 +0100
+++ gcc/testsuite/gcc.dg/bitint-40.c  2023-11-22 14:35:50.225842768 +0100
@@ -0,0 +1,29 @@
+/* PR middle-end/112668 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=c23 -fnon-call-exceptions" } */
+
+#if __BITINT_MAXWIDTH__ >= 156
+struct T156 { _BitInt(156) a : 2; unsigned _BitInt(156) b : 135; _BitInt(156) c : 2; };
+extern void foo156 (struct T156 *);
+
+unsigned _BitInt(156)
+bar156 (int i)
+{
+  struct T156 r156[12];
+  foo156 (&r156[0]);
+  return r156[i].b;
+}
+#endif
+
+#if __BITINT_MAXWIDTH__ >= 495
+struct T495 { _BitInt(495) a : 2; unsigned _BitInt(495) b : 471; _BitInt(495) c : 2; };
+extern void foo495 (struct T495 *r495);
+
+unsigned _BitInt(495)
+bar495 (int i)
+{
+  struct T495 r495[12];
+  foo495 (r495);
+  return r495[i].b;
+}
+#endif

Jakub



[PATCH] testsuite, lib: Re-allow multiple function start labels.

2023-11-23 Thread Iain Sandoe
Tested on a cross to armv8l-unknown-linux-gnueabihf where the failing
testcase is restored, and on aarch64-linux-gnu where no change is seen
on the aarch64.exp suite.  Also tested on arm64 Darwin for aarch64.exp
and aarch64-darwin.exp.

OK for trunk, or some alternative would be better?
Iain

--- 8< ---

The change applied in r14-5760-g2a46e0e7e20 changed the behaviour of
functions with assembly like:

bar:
__acle_se_bar:

Where both bar and __acle_se_bar are globals referring to the same
function body.  The old behaviour overrides 'bar' with '__acle_se_bar'
and the scan tests for that label.

The change here re-allows the override.

Cases like this are not legal Mach-O (where two global symbols cannot
have the same address in the assembler output).  However, given the
constraints on the Mach-O scanning, it does not seem that it is
necessary to skip the change (any incorrect case should be easily
evident in the assembler).

gcc/testsuite/ChangeLog:

* lib/scanasm.exp: Allow multiple function start symbols,
taking the last as the function name.

Signed-off-by: Iain Sandoe 
---
 gcc/testsuite/lib/scanasm.exp | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 85ee54ff9a8..7ec3cfce02b 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -877,7 +877,15 @@ proc parse_function_bodies { config filename result } {
set in_function 0
}
} elseif { $in_function } {
-   if { [regexp $up_config(end) $line] } {
+   # We allow multiple function start labels, taking the last one seen
+   # as the function name.
+   if { [regexp [lindex $up_config(start) 0] \
+$line dummy maybe_function_name] } {
+   verbose "parse_function_bodies: overriding $function_name with $maybe_function_name"
+   set function_name $maybe_function_name
+   set in_function 1
+   set function_body ""
+   } elseif { [regexp $up_config(end) $line] } {
verbose "parse_function_bodies: $function_name:\n$function_body"
set up_result($function_name) $function_body
set in_function 0
-- 
2.39.2 (Apple Git-143)
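The override behaviour that the patch restores can be sketched outside of Tcl. The following is a simplified Python model, not the scanasm.exp implementation: the START and END patterns are hypothetical stand-ins for the target-specific start/end regexps in the real config, and prefix/comment handling is omitted.

```python
import re

# Hypothetical, simplified patterns; the real ones come from the
# per-target configuration in scanasm.exp.
START = re.compile(r"^([A-Za-z_]\w*):")
END = re.compile(r"^\s*\.size\b")


def parse_function_bodies(lines):
    """Map each function name to its body, taking the last start label."""
    bodies = {}
    name = None
    body = []
    for line in lines:
        m = START.match(line)
        if m:
            # A start label seen while already inside a function overrides
            # the earlier one, e.g. "bar:" followed by "__acle_se_bar:".
            name = m.group(1)
            body = []
        elif name is not None:
            if END.match(line):
                bodies[name] = "\n".join(body)
                name = None
            else:
                body.append(line)
    return bodies


asm = [
    "bar:",
    "__acle_se_bar:",
    "\tpush\t{r4, lr}",
    "\tbx\tlr",
    "\t.size\tbar, .-bar",
]
result = parse_function_bodies(asm)
```

In this model the body ends up recorded under __acle_se_bar only, which is the name the pr95646.c scan looks for.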



[PATCH] expr: Fix &bitint_var handling in initializers [PR112336]

2023-11-23 Thread Jakub Jelinek
Hi!

As the following testcase shows, we ICE when trying to emit ADDR_EXPR of
a bitint variable which doesn't have mode width.
The problem is in the EXTEND_BITINT stuff which makes sure we treat the
padding bits on memory reads from user bitint vars as undefined.
When expanding ADDR_EXPR on such vars outside of initializers,
expand_expr_addr* uses EXPAND_CONST_ADDRESS modifier and EXTEND_BITINT
does nothing, but in initializers it keeps using EXPAND_INITIALIZER
modifier.  So, we need to treat EXPAND_INITIALIZER the same as
EXPAND_CONST_ADDRESS for this regard.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-11-23  Jakub Jelinek  

PR middle-end/112336
* expr.cc (EXTEND_BITINT): Don't call reduce_to_bit_field_precision
if modifier is EXPAND_INITIALIZER.

* gcc.dg/bitint-41.c: New test.

--- gcc/expr.cc.jj  2023-11-14 18:26:05.401613476 +0100
+++ gcc/expr.cc 2023-11-22 19:03:59.121599029 +0100
@@ -10698,6 +10698,7 @@ expand_expr_real_1 (tree exp, rtx target
 && mode != BLKmode \
 && modifier != EXPAND_MEMORY   \
 && modifier != EXPAND_WRITE    \
+&& modifier != EXPAND_INITIALIZER  \
 && modifier != EXPAND_CONST_ADDRESS)   \
? reduce_to_bit_field_precision ((expr), NULL_RTX, type) : (expr))
 
--- gcc/testsuite/gcc.dg/bitint-41.c.jj 2023-11-22 19:09:48.986726861 +0100
+++ gcc/testsuite/gcc.dg/bitint-41.c  2023-11-22 19:09:29.804993983 +0100
@@ -0,0 +1,36 @@
+/* PR middle-end/112336 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=c2x" } */
+
+unsigned _BitInt(1) v1;
+unsigned _BitInt(1) *p1 = &v1;
+signed _BitInt(2) v2;
+signed _BitInt(2) *p2 = &v2;
+unsigned _BitInt(11) v11;
+unsigned _BitInt(11) *p11 = &v11;
+signed _BitInt(12) v12;
+signed _BitInt(12) *p12 = &v12;
+unsigned _BitInt(21) v21;
+unsigned _BitInt(21) *p21 = &v21;
+signed _BitInt(22) v22;
+signed _BitInt(22) *p22 = &v22;
+unsigned _BitInt(31) v31;
+unsigned _BitInt(31) *p31 = &v31;
+signed _BitInt(32) v32;
+signed _BitInt(32) *p32 = &v32;
+unsigned _BitInt(41) v41;
+unsigned _BitInt(41) *p41 = &v41;
+signed _BitInt(42) v42;
+signed _BitInt(42) *p42 = &v42;
+#if __BITINT_MAXWIDTH__ >= 128
+unsigned _BitInt(127) v127;
+unsigned _BitInt(127) *p127 = &v127;
+signed _BitInt(128) v128;
+signed _BitInt(128) *p128 = &v128;
+#endif
+#if __BITINT_MAXWIDTH__ >= 258
+unsigned _BitInt(257) v257;
+unsigned _BitInt(257) *p257 = &v257;
+signed _BitInt(258) v258;
+signed _BitInt(258) *p258 = &v258;
+#endif

Jakub



Re: [PATCH] lower-bitint: Fix up -fnon-call-exceptions bit-field load lowering [PR112668]

2023-11-23 Thread Richard Biener
On Thu, Nov 23, 2023 at 10:43 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As the following testcase shows, there are some bugs in the
> -fnon-call-exceptions bit-field load lowering.  In particular, there
> is a case where we want to emit a load early in the initialization
> (before m_init_gsi) and because that load might throw exception, need
> to split block after the load so that it has an EH edge.
> Now, across this splitting, we have m_init_gsi, save_gsi (something
> we put back into m_gsi afterwards) statement iterators and m_preheader_bb
> which is used to determine the pre-header edge of a loop (if any).
> As the testcase shows, both of these statement iterators and m_preheader_bb
> as well need adjustments if the block was split.  If the stmt iterators
> refer to a statement, they need to be updated so that if the statement is
> in the bb after the split gsi_bb and gsi_seq is updated, otherwise they
> ought to be the start of the new (second) bb.
> Similarly, m_preheader_bb should be updated to the second bb if it was
> the first before.  Other spots where we insert something before m_init_gsi
> don't split blocks in there and are fine.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Wheee ...

> 2023-11-23  Jakub Jelinek  
>
> PR middle-end/112668
> * gimple-lower-bitint.cc (bitint_large_huge::handle_load): When
> splitting gsi_bb (m_init_gsi) basic block, update m_preheader_bb
> if needed, fix up update of m_init_gsi and update saved m_gsi
> as well if needed.
>
> * gcc.dg/bitint-40.c: New test.
>
> --- gcc/gimple-lower-bitint.cc.jj   2023-11-14 10:52:16.0 +0100
> +++ gcc/gimple-lower-bitint.cc  2023-11-22 14:34:17.327140002 +0100
> @@ -1687,7 +1687,22 @@ bitint_large_huge::handle_load (gimple *
>   edge e = split_block (gsi_bb (m_gsi), g);
>   make_edge (e->src, eh_edge->dest, EDGE_EH)->probability
> = profile_probability::very_unlikely ();
> - m_init_gsi.bb = e->dest;
> + m_init_gsi = gsi_last_bb (e->dest);

shouldn't that be gsi_start_bb (e->dest) if we want to continue inserting
before the "old" stmt?

> + if (!gsi_end_p (m_init_gsi))
> +   gsi_next (&m_init_gsi);

That would always put it at the end?

> + if (gsi_bb (save_gsi) == e->src)
> +   {
> + if (gsi_end_p (save_gsi))
> +   {
> + save_gsi = gsi_last_bb (e->dest);
> + if (!gsi_end_p (save_gsi))
> +   gsi_next (&save_gsi);
> +   }
> + else
> +   save_gsi = gsi_for_stmt (gsi_stmt (save_gsi));

uhm.  It might be better to instead of doing save_gsi = m_gsi
save gsi_stmt () and gsi_bb () to avoid accessing the now
possibly invalid iterator?

If there were only one iterator I'd say we want a

  split_block_{after,before} (&gsi);

which hides the detail of updating the iterator.  But you have the
additional issue of possibly updating another iterator where as said
the better solution would be to reconstruct it from a gimple *
(or basic_block if at end).  Maybe we can have a
gsi_update_after_split_block (&gsi, basic_block-that-was-split)?

If you think any of this would be an improvement (but also see
the gsi_last_bb vs gsi_start issue) feel free to improve.

Otherwise OK as-is.

Richard.

> +   }
> + if (m_preheader_bb == e->src)
> +   m_preheader_bb = e->dest;
> }
> }
>   m_gsi = save_gsi;
> --- gcc/testsuite/gcc.dg/bitint-40.c.jj 2023-11-22 13:47:12.380580107 +0100
> +++ gcc/testsuite/gcc.dg/bitint-40.c  2023-11-22 14:35:50.225842768 +0100
> @@ -0,0 +1,29 @@
> +/* PR middle-end/112668 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -fnon-call-exceptions" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 156
> +struct T156 { _BitInt(156) a : 2; unsigned _BitInt(156) b : 135; _BitInt(156) c : 2; };
> +extern void foo156 (struct T156 *);
> +
> +unsigned _BitInt(156)
> +bar156 (int i)
> +{
> +  struct T156 r156[12];
> +  foo156 (&r156[0]);
> +  return r156[i].b;
> +}
> +#endif
> +
> +#if __BITINT_MAXWIDTH__ >= 495
> +struct T495 { _BitInt(495) a : 2; unsigned _BitInt(495) b : 471; _BitInt(495) c : 2; };
> +extern void foo495 (struct T495 *r495);
> +
> +unsigned _BitInt(495)
> +bar495 (int i)
> +{
> +  struct T495 r495[12];
> +  foo495 (r495);
> +  return r495[i].b;
> +}
> +#endif
>
> Jakub
>


Re: [PATCH] c: Add __builtin_stdc_bit_{width,floor,ceil} builtins

2023-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2023 at 12:09:05AM +, Joseph Myers wrote:
> OK with tests added for unsigned _BitInt(1).  Specifically, unsigned 
> _BitInt(1) is a bit of a degenerate case for stdc_bit_ceil (always 
> returning 1 after evaluating the argument's side effects); I think the 
> code that builds of constant 2 of that type (a constant only used in dead 
> code) should still work (and produce a constant 0), and that the 
> documentation is also still correct in the case where converting 2 to the 
> type produces 0, but given those degeneracies I think it's worth testing 
> unsigned _BitInt(1) with these functions to make sure they do behave as 
> expected.

Thanks, here is incremental diff between what was posted and what was
committed:

--- gcc/testsuite/gcc.dg/builtin-stdc-bit-1.c   2023-11-20 16:25:22.548758830 +0100
+++ gcc/testsuite/gcc.dg/builtin-stdc-bit-1.c   2023-11-23 10:08:50.133761681 +0100
@@ -668,6 +668,87 @@
 __builtin_abort ();
   if (__builtin_stdc_has_single_bit (b++) || b != 14)
 __builtin_abort ();
+#if __BITINT_MAXWIDTH__ >= 64
+  if (__builtin_stdc_leading_zeros (0uwb) != 1
+  || !expr_has_type (__builtin_stdc_leading_zeros (0uwb), unsigned int)
+  || __builtin_stdc_leading_zeros (1uwb) != 0
+  || !expr_has_type (__builtin_stdc_leading_zeros (1uwb), unsigned int))
+__builtin_abort ();
+  if (__builtin_stdc_leading_ones (0uwb) != 0
+  || !expr_has_type (__builtin_stdc_leading_ones (0uwb), unsigned int)
+  || __builtin_stdc_leading_ones (1uwb) != 1
+  || !expr_has_type (__builtin_stdc_leading_ones (1uwb), unsigned int))
+__builtin_abort ();
+  if (__builtin_stdc_trailing_zeros (0uwb) != 1
+  || !expr_has_type (__builtin_stdc_trailing_zeros (0uwb), unsigned int)
+  || __builtin_stdc_trailing_zeros (1uwb) != 0
+  || !expr_has_type (__builtin_stdc_trailing_zeros (1uwb), unsigned int))
+__builtin_abort ();
+  if (__builtin_stdc_trailing_ones (0uwb) != 0
+  || !expr_has_type (__builtin_stdc_trailing_ones (0uwb), unsigned int)
+  || __builtin_stdc_trailing_ones (1uwb) != 1
+  || !expr_has_type (__builtin_stdc_trailing_ones (1uwb), unsigned int))
+__builtin_abort ();
+  if (__builtin_stdc_first_leading_zero (0uwb) != 1
+  || !expr_has_type (__builtin_stdc_first_leading_zero (0uwb), unsigned int)
+  || __builtin_stdc_first_leading_zero (1uwb) != 0
+  || !expr_has_type (__builtin_stdc_first_leading_zero (1uwb), unsigned int))
+__builtin_abort ();
+  if (__builtin_stdc_first_leading_one (0uwb) != 0
+  || !expr_has_type (__builtin_stdc_first_leading_one (0uwb), unsigned int)
+  || __builtin_stdc_first_leading_one (1uwb) != 1
+  || !expr_has_type (__builtin_stdc_first_leading_one (1uwb), unsigned int))
+__builtin_abort ();
+  if (__builtin_stdc_first_trailing_zero (0uwb) != 1
+  || !expr_has_type (__builtin_stdc_first_trailing_zero (0uwb), unsigned int)
+  || __builtin_stdc_first_trailing_zero (1uwb) != 0
+  || !expr_has_type (__builtin_stdc_first_trailing_zero (1uwb), unsigned int))
+__builtin_abort ();
+  if (__builtin_stdc_first_trailing_one (0uwb) != 0
+  || !expr_has_type (__builtin_stdc_first_trailing_one (0uwb), unsigned int)
+  || __builtin_stdc_first_trailing_one (1uwb) != 1
+  || !expr_has_type (__builtin_stdc_first_trailing_one (1uwb), unsigned int))
+__builtin_abort ();
+  if (__builtin_stdc_count_zeros (0uwb) != 1
+  || !expr_has_type (__builtin_stdc_count_zeros (0uwb), unsigned int)
+  || __builtin_stdc_count_zeros (1uwb) != 0
+  || !expr_has_type (__builtin_stdc_count_zeros (1uwb), unsigned int))
+__builtin_abort ();
+  if (__builtin_stdc_count_ones (0uwb) != 0
+  || !expr_has_type (__builtin_stdc_count_ones (0uwb), unsigned int)
+  || __builtin_stdc_count_ones (1uwb) != 1
+  || !expr_has_type (__builtin_stdc_count_ones (1uwb), unsigned int))
+__builtin_abort ();
+  if (__builtin_stdc_has_single_bit (0uwb)
+  || !expr_has_type (__builtin_stdc_has_single_bit (0uwb), _Bool)
+  || !__builtin_stdc_has_single_bit (1uwb)
+  || !expr_has_type (__builtin_stdc_has_single_bit (1uwb), _Bool))
+__builtin_abort ();
+  if (__builtin_stdc_bit_width (0uwb) != 0
+  || !expr_has_type (__builtin_stdc_bit_width (0uwb), unsigned int)
+  || __builtin_stdc_bit_width (1uwb) != 1
+  || !expr_has_type (__builtin_stdc_bit_width (1uwb), unsigned int))
+__builtin_abort ();
+  if (__builtin_stdc_bit_floor (0uwb) != 0
+  || !expr_has_type (__builtin_stdc_bit_floor (0uwb), unsigned _BitInt(1))
+  || __builtin_stdc_bit_floor (1uwb) != 1
+  || !expr_has_type (__builtin_stdc_bit_floor (1uwb), unsigned _BitInt(1)))
+__builtin_abort ();
+  if (__builtin_stdc_bit_ceil (0uwb) != 1
+  || !expr_has_type (__builtin_stdc_bit_ceil (0uwb), unsigned _BitInt(1))
+  || __builtin_stdc_bit_ceil (1uwb) != 1
+  || !expr_has_type (__builtin_stdc_bit_ceil (1uwb), unsigned _BitInt(1)))
+__builtin_abort 
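The degenerate unsigned _BitInt(1) case discussed above can be checked against a small arithmetic model. This is a hedged sketch in Python, not GCC's implementation: the helper names are hypothetical, and the wrap to `width` bits in bit_ceil is an assumption that models converting the power of two back to the operand type (so the constant 2 becomes 0 for a 1-bit type, as the review comment describes).

```python
def leading_zeros(x, width):
    """Number of zero bits above the most significant set bit."""
    return width - x.bit_length()


def bit_width(x):
    """Smallest number of bits needed to represent x (0 for x == 0)."""
    return x.bit_length()


def bit_floor(x):
    """Largest power of two not greater than x, or 0 for x == 0."""
    return 0 if x == 0 else 1 << (x.bit_length() - 1)


def bit_ceil(x, width):
    """Smallest power of two not less than x, wrapped to `width` bits."""
    if x <= 1:
        # For a 1-bit type this branch is always taken, so the result is 1.
        return 1
    return (1 << (x - 1).bit_length()) % (1 << width)


# Values of an unsigned 1-bit type are just 0 and 1.
one_bit = {x: (leading_zeros(x, 1), bit_width(x), bit_floor(x), bit_ceil(x, 1))
           for x in (0, 1)}
```

For the two possible 1-bit values the model agrees with the expectations in the added tests: bit_ceil is 1 for both, while bit_floor and bit_width are 0 for 0 and 1 for 1.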

Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-23 Thread Xi Ruoyao
On Thu, 2023-11-23 at 17:12 +0800, chenglulu wrote:
> 
> 在 2023/11/23 下午5:02, Xi Ruoyao 写道:
> > On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote:
> > > The fix_truncv4sfv4si2 template is indeed called when debugging with
> > > gdb.
> > > 
> > > So I think we can use define_expand here.
> > The problem is cases where we want to combine an rint call with float-
> > to-int conversion:
> > 
> > float x[4];
> > int y[4];
> > 
> > void test()
> > {
> > for (int i = 0; i < 4; i++)
> > y[i] = __builtin_rintf(x[i]);
> > }
> > 
> > With define_expand we get "vfrint + vftintrz", but with define_insn we
> > get a single "vftint".
> > 
> > Arguably the generic code should try to handle this (PR86609), but it's
> > "not sure if that's a good idea in general" (comment 1 in the PR) so we
> > can do this in a target-specific way.
> > 
> I tried compiling with -Ofast and found that a vftint was generated; this
> also shows up in the .006t.gimple dump.
> 
> With -O2, __builtin_rintf is generated, but with -Ofast __builtin_irintf
> is generated instead.

Indeed...  It seems the FE will only generate __builtin_irintf with
-fno-math-errno -funsafe-math-optimizations.

But I cannot see why this is necessary (at least for us): the rintf
function does not set errno at all, and to me using vftint.w.s here is
safe: if the rounded result can be represented as a 32-bit int,
obviously there is no issue;  otherwise, per C23 section F.4 we should
raise FE_INVALID and produce unspecified result.  It seems our ftint.w.s
instruction has the required semantics.

+Uros and Joseph for some comment about the expected behavior of
(int)rintf(x).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
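The semantics argued for above, for (int)rintf(x), can be written down as a tiny model. This is only an illustration in Python under stated assumptions: it uses double-precision floats rather than single precision, assumes the default round-to-nearest-even mode, and signals the out-of-range case with an exception where C would raise FE_INVALID and leave the result unspecified. The function name irint is hypothetical.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def irint(x):
    """Model (int)rintf(x) in the default round-to-nearest-even mode."""
    # Python's round() on a float rounds halfway cases to even, which
    # matches rint in the default rounding mode.
    r = round(x)
    if not INT_MIN <= r <= INT_MAX:
        # In C this is where FE_INVALID would be raised; the model uses
        # an exception to mark the unspecified-result case.
        raise OverflowError("rounded value does not fit in a 32-bit int")
    return r


samples = [irint(v) for v in (2.5, 3.5, -1.5, 7.2)]
```

Whether a single instruction may be used for this sequence then comes down to whether it rounds and reports the out-of-range case in the same way.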


[Committed] RISC-V: Refine some codes of riscv-v.cc[NFC]

2023-11-23 Thread Juzhe-Zhong
This is an NFC patch that refines some unreasonable code I noticed.

Tested on zvl128b/zvl256b/zvl512b/zvl1024b no regression.

Committed.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vlmax_gather_insn): Refine codes.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(modulo_sel_indices): Ditto.
(expand_vec_perm): Ditto.
(shuffle_generic_patterns): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 54 +
 1 file changed, 18 insertions(+), 36 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 7d3e8038dab..24b09c0dd2d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -294,8 +294,6 @@ public:
   "vsetvl zero, rs1/imm".  */
poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
len = gen_int_mode (nunits, Pmode);
-   if (!satisfies_constraint_K (len))
- len = force_reg (Pmode, len);
vls_p = true;
  }
else if (can_create_pseudo_p ())
@@ -846,24 +844,6 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
 }
   else if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
 icode = code_for_pred_gatherei16 (data_mode);
-  else if (CONST_VECTOR_P (sel)
-   && GET_MODE_BITSIZE (GET_MODE_INNER (sel_mode)) > 16
-   && riscv_get_v_regno_alignment (data_mode) > 1)
-{
-  /* If the inner mode of data is not QI or HI and data_lmul > 1,
- emitting vrgatherei16.vv instruction will lower register
- pressure.
- data_mode  sel_mode  ei16
- RVVM1QIRVVM1QI   RVVM2HI  not needed
- RVVM2QIRVVM2QI   RVVM4HI  not needed
- RVVM2HIRVVM2HI   RVVM2HI  not needed
- RVVM2SIRVVM2SI   RVVM1HI  need
- RVVM4SIRVVM4SI   RVVM2HI  need
- RVVM8DIRVVM8DI   RVVM2HI  need */
-  PUT_MODE (sel, get_vector_mode (HImode,
-GET_MODE_NUNITS (data_mode)).require ());
-  icode = code_for_pred_gatherei16 (data_mode);
-}
   else
 icode = code_for_pred_gather (data_mode);
   rtx ops[] = {target, op, sel};
@@ -877,13 +857,13 @@ emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx sel, rtx mask)
   insn_code icode;
   machine_mode data_mode = GET_MODE (target);
   machine_mode sel_mode = GET_MODE (sel);
-  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
-icode = code_for_pred_gatherei16 (data_mode);
-  else if (const_vec_duplicate_p (sel, &elt))
+  if (const_vec_duplicate_p (sel, &elt))
 {
   icode = code_for_pred_gather_scalar (data_mode);
   sel = elt;
 }
+  else if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
+icode = code_for_pred_gatherei16 (data_mode);
   else
 icode = code_for_pred_gather (data_mode);
   rtx ops[] = {target, mask, target, op, sel};
@@ -2703,15 +2683,21 @@ expand_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1,
   return false;
 }
 
-/* Modulo all SEL indices to ensure they are all in range if [0, MAX_SEL].  */
+/* Modulo all SEL indices to ensure they are all in range if [0, MAX_SEL].
+   MAX_SEL is nunits - 1 if rtx_equal_p (op0, op1). Otherwise, it is
+   2 * nunits - 1.  */
 static rtx
-modulo_sel_indices (rtx sel, poly_uint64 max_sel)
+modulo_sel_indices (rtx op0, rtx op1, rtx sel)
 {
   rtx sel_mod;
   machine_mode sel_mode = GET_MODE (sel);
   poly_uint64 nunits = GET_MODE_NUNITS (sel_mode);
-  /* If SEL is variable-length CONST_VECTOR, we don't need to modulo it.  */
-  if (!nunits.is_constant () && CONST_VECTOR_P (sel))
+  poly_uint64 max_sel = rtx_equal_p (op0, op1) ? nunits - 1 : 2 * nunits - 1;
+  /* If SEL is variable-length CONST_VECTOR, we don't need to modulo it.
+ Or if SEL is constant-length within [0, MAX_SEL], no need to modulo the
+ indice.  */
+  if (CONST_VECTOR_P (sel)
+  && (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, max_sel)))
 sel_mod = sel;
   else
 {
@@ -2761,9 +2747,7 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
  out-of-range indices, so we need to modulo all the vec_perm indices
  to ensure they are all in range of [0, nunits - 1] when op0 == op1
  or all in range of [0, 2 * nunits - 1] when op0 != op1.  */
-  rtx sel_mod
-= modulo_sel_indices (sel,
- rtx_equal_p (op0, op1) ? nunits - 1 : 2 * nunits - 1);
+  rtx sel_mod = modulo_sel_indices (op0, op1, sel);
 
   /* Check if the two values vectors are the same.  */
   if (rtx_equal_p (op0, op1))
@@ -2772,15 +2756,13 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
   return;
 }
 
-  rtx max_sel = gen_const_vector_dup (sel_mode, 2 * nunits - 1);
-
   /* This following sequence is handling the case that:
  __builtin_shufflevector (vec1, vec2, index...), the index can be any
  value in range of [0, 2 * nunits - 1].  */
   machine_mode mask_mode;
   mask_mode = get_mask_mode (data_mode);
   

[PATCH] c++: Make g++.dg/opt/pr110879.C require C++11 [PR110879]

2023-11-23 Thread Jonathan Wakely
Jakub noted in the PR that this test fails for -std=c++98.

Tested x86_64-linux.

OK for trunk?

-- >8 --

The _M_realloc_insert member does not have the trivial relocation
optimization for C++98, which seems to be why the _M_end_of_storage
member does not get optimized away. Make this test unsupported for
C++98.

gcc/testsuite/ChangeLog:

PR libstdc++/110879
* g++.dg/opt/pr110879.C: Require C++11 or later.
---
 gcc/testsuite/g++.dg/opt/pr110879.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/opt/pr110879.C b/gcc/testsuite/g++.dg/opt/pr110879.C
index 7f0a0a80b8a..57556605a51 100644
--- a/gcc/testsuite/g++.dg/opt/pr110879.C
+++ b/gcc/testsuite/g++.dg/opt/pr110879.C
@@ -1,4 +1,4 @@
-// { dg-do compile }
+// { dg-do compile { target c++11 } }
 // { dg-options "-O3 -fdump-tree-optimized" }
 
 #include 
-- 
2.42.0



Re: [PATCH v4] Introduce strub: machine-independent stack scrubbing

2023-11-23 Thread Alexandre Oliva
Hello, Richi,

Thanks for the extensive review!

On Nov 22, 2023, Richard Biener  wrote:

> On Mon, Nov 20, 2023 at 1:40 PM Alexandre Oliva  wrote:
>> 
>> On Oct 26, 2023, Alexandre Oliva  wrote:
>> 
>> >> This is a refreshed and improved version of the version posted back in
>> >> June.  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621936.html
>> 
>> > Ping? https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633675.html
>> > I'm combining the gcc/ipa-strub.cc bits from
>> > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633526.html
>> 
>> Ping?
>> Retested on x86_64-linux-gnu, with and without -fstrub=all.

> @@ -898,7 +899,24 @@ decl_attributes (tree *node, tree attributes, int flags,
>TYPE_NAME (tt) = *node;
>  }

> -  *anode = cur_and_last_decl[0];
> +  if (*anode != cur_and_last_decl[0])
> +{
> +  /* Even if !spec->function_type_required, allow the attribute
> + handler to request the attribute to be applied to the function
> + type, rather than to the function pointer type, by setting
> + cur_and_last_decl[0] to the function type.  */
> +  if (!fn_ptr_tmp
> +  && POINTER_TYPE_P (*anode)
> +  && TREE_TYPE (*anode) == cur_and_last_decl[0]
> +  && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (*anode)))
> + {
> +  fn_ptr_tmp = TREE_TYPE (*anode);
> +  fn_ptr_quals = TYPE_QUALS (*anode);
> +  anode = &fn_ptr_tmp;
> + }
> +  *anode = cur_and_last_decl[0];
> +}
> +

> what is this a workaround for?

For the fact that the strub attribute attaches to types, whether data or
function types, so we can't have fn_type_req, but when it's a function
or pointer-to-function type, we want to affect the function type, rather
than the pointer type, when the attribute has an argument.  The argument
names the strub mode for a function; that only applies to function
types, never to data types.

The hunk above introduces the means for the attribute handler to choose
what to attach the attribute to.

> Isn't there a suitable parsing position for placing the attribute?

It's been a while, but IIRC the need for this first came up in Ada,
where attributes can't just go anywhere, and it was further complicated
by the fact that Ada doesn't have first-class function or procedure
types, only access-to-them, but we needed some means for the attributes
to apply to the function type.

> +#ifndef STACK_GROWS_DOWNWARD
> +# define STACK_TOPS GT
> +#else
> +# define STACK_TOPS LT
> +#endif

> according to docs this is defined to 0 or 1 so the above looks wrong
> (it's always defined).

Ugh.  Thanks, will fix.  (I'm pretty sure I had notes somewhere stating
that stack-grows-upwards hadn't been tested, and that was for the sheer
lack of platforms making that choice, but I hoped it wasn't that broken
:-(
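
The difference Richard points out can be seen in isolation with a small sketch.  It assumes, as the review says, that the macro is always defined and only its value (0 or 1) varies; the definition to 0 below is a hypothetical stand-in for an upward-growing target, and the function names are made up for illustration.

```c
/* Hypothetical stand-in for the target macro: always defined, value 0 or 1.
   Here it is set to 0, i.e. a stack that grows upward.  */
#define STACK_GROWS_DOWNWARD 0

/* Presence test: true whenever the macro exists, whatever its value.  */
int grows_down_by_ifdef (void)
{
#ifdef STACK_GROWS_DOWNWARD
  return 1;
#else
  return 0;
#endif
}

/* Value test: follows the 0/1 value, which is what the docs describe.  */
int grows_down_by_value (void)
{
#if STACK_GROWS_DOWNWARD
  return 1;
#else
  return 0;
#endif
}
```

With the macro defined to 0, the presence test still reports a downward-growing stack, while the value test does not.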

> +  if (optimize < 2 || optimize_size || flag_no_inline)
> +return NULL_RTX;

> I'm wondering about these checks in the expansions of the builtins,
> I think this is about inline expanding or emitting a libcall, right?

Yeah.

> I wonder if you should use optimize_function_for_speed (cfun) instead?
> Usually -fno-inline shouldn't affect such calls, but -fno-builtin-FOO would.
> I have no strong opinion here though.

I've occasionally wondered whether builtins were the best framework for
these semi-internal calls.

> The new builtins seem undocumented - usually those are documented
> within extend.texi

Erhm...  Weird.  I had documentation for them.

(checks)

No, it's there, in extend.texi, right after __builtin_stack_address.
It's admittedly a big patch :-/

> I guess placing __builtin___strub_enter calls in the code manually
> will break in interesting ways - if that's not supposed to happen the
> trick is to embed a space in the name of the built-in.

Yeah, I was a little torn between the choices here.  On the one hand, I
needed visible symbols for the out-of-line implementations, so I figured
that trying to hide the builtins wouldn't bring any advantage.

However, I've also designed the builtins with interfaces that would
avoid disruption even with explicit calls.  __strub_enter and
__strub_update only initialize or adjust a pointer handed to them.
__strub_leave will erase things from the top of the stack to the
pointer, so if the watermark is "active stack", nothing happens, and
things only get cleared if it points to "unused stack space".  There's
potential for disruption if one passes a statically-allocated pointer to
it, but nothing much different from memsetting that memory range, core
wars-style.

> -symtab_node::reset (void)
> +symtab_node::reset (bool preserve_comdat_group)

> not sure what for, I'll leave Honza to comment.

This restores the possibility of getting the pre-PR107897 behavior, that
the strub wrapper/wrapped splitting relied on.  Conceptually, the
original function becomes the wrapped one, and the wrapper that calls it
is kind of an implementation detail to preserve the exposed API/ABI
while introducing strubbing around the body, so preserving the comdat
group makes sense. 

Re: [PATCH] lower-bitint: Fix up -fnon-call-exceptions bit-field load lowering [PR112668]

2023-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2023 at 11:01:08AM +0100, Richard Biener wrote:
> > --- gcc/gimple-lower-bitint.cc.jj   2023-11-14 10:52:16.0 +0100
> > +++ gcc/gimple-lower-bitint.cc  2023-11-22 14:34:17.327140002 +0100
> > @@ -1687,7 +1687,22 @@ bitint_large_huge::handle_load (gimple *
> >   edge e = split_block (gsi_bb (m_gsi), g);
> >   make_edge (e->src, eh_edge->dest, EDGE_EH)->probability
> > = profile_probability::very_unlikely ();
> > - m_init_gsi.bb = e->dest;
> > + m_init_gsi = gsi_last_bb (e->dest);
> 
> shouldn't that be gsi_start_bb (e->dest) if we want to continue inserting
> before the "old" stmt?
> 
> > + if (!gsi_end_p (m_init_gsi))
> > +   gsi_next (&m_init_gsi);
> 
> That would always put it at the end?
> 
> > + if (gsi_bb (save_gsi) == e->src)
> > +   {
> > + if (gsi_end_p (save_gsi))
> > +   {
> > + save_gsi = gsi_last_bb (e->dest);
> > + if (!gsi_end_p (save_gsi))
> > +   gsi_next (&save_gsi);
> > +   }
> > + else
> > +   save_gsi = gsi_for_stmt (gsi_stmt (save_gsi));
> 
> uhm.  It might be better to instead of doing save_gsi = m_gsi
> save gsi_stmt () and gsi_bb () to avoid accessing the now
> possibly invalid iterator?
> 
> If there were only one iterator I'd say we want a
> 
>   split_block_{after,before} (&gsi);
> 
> which hides the detail of updating the iterator.  But you have the
> additional issue of possibly updating another iterator where as said
> the better solution would be to reconstruct it from a gimple *
> (or basic_block if at end).  Maybe we can have a
> gsi_update_after_split_block (&gsi, basic_block-that-was-split)?
> 
> If you think any of this would be an improvement (but also see
> the gsi_last_bb vs gsi_start issue) feel free to improve.
> 
> Otherwise OK as-is.

The code has 2 iterators and pretty much everything in the pass inserts
statements before an iterator.
m_gsi is the iterator before which everything is inserted, initially
initialized to
  m_gsi = gsi_for_stmt (stmt);
where stmt is the statement we are lowering, but updated in many places
when splitting a bb etc.  So, it needs to behave right for the insert
before behavior, gsi_end_p means insert at the end of bb.
Then m_init_gsi is initially one statement earlier, so insert code
after that statement instead:
  m_init_gsi = m_gsi;
  gsi_prev (&m_init_gsi);
with gsi_end_p meaning insert at the start of bb.

Because all the pass infrastructure is for inserting before rather than
after, when inserting (temporarily) after m_init_gsi, it does
  gimple_stmt_iterator save_gsi = m_gsi;
  m_gsi = m_init_gsi;
  if (gsi_end_p (m_gsi))
m_gsi = gsi_after_labels (gsi_bb (m_gsi));
  else
gsi_next (&m_gsi);
and then it can insert before m_gsi and finally when done there
  m_gsi = save_gsi;
The problematic splitting of the bb is during this temporary override
to insert stuff after m_init_gsi.
For the save_gsi update, I believe I'm reconstructing it from
the stmt if any and set to gsi_end_p if it was gsi_end_p before:
+ if (gsi_end_p (save_gsi))
+   {
+ save_gsi = gsi_last_bb (e->dest);
+ if (!gsi_end_p (save_gsi))
+   gsi_next (&save_gsi);
+   }
+ else
+   save_gsi = gsi_for_stmt (gsi_stmt (save_gsi));
I don't know how else to re-create a gsi_end_p iterator from a bb
(other option I know of would be gsi_start_bb and gsi_prev if !gsi_end_p,
but stmt after gsi_last_bb seems to match more the intent).

Now, regarding m_init_gsi, I think I'll need to play around, maybe
in the end I should have insert-after-and-update behavior rather than
plain insert-after, and that could be achieved by adding
  m_init_gsi = m_gsi;
  gsi_prev (&m_init_gsi);
before the
  m_gsi = save_gsi;
restore in all the 3 places and then no other updates of m_init_gsi would be
needed.  Except gsi_prev likely won't like the gsi_end_p (m_gsi) case,
so maybe
  m_init_gsi = m_gsi;
  if (gsi_end_p (m_init_gsi))
m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
  else
gsi_prev (&m_init_gsi);
  m_gsi = save_gsi;
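
To make the insert-after-and-update idea concrete, here is a toy model in plain C.  It is only a sketch: it uses an array of ints in place of a basic block and integer positions in place of gimple_stmt_iterator, and all the toy_* names are hypothetical, not GCC APIs.  A position equal to the length plays the role of gsi_end_p for the insert-before iterator, and an init position of -1 plays the role of "insert at the start of the block".

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX 16

/* Toy basic block: statements are just ints.  */
struct toy_block {
  int stmts[TOY_MAX];
  size_t len;
};

/* Insert VAL before position POS (POS == len appends).  Returns the
   position that still designates the statement formerly at POS.  */
size_t toy_insert_before (struct toy_block *bb, size_t pos, int val)
{
  if (bb->len >= TOY_MAX || pos > bb->len)
    return pos;  /* Toy model: ignore overflow and bad positions.  */
  memmove (&bb->stmts[pos + 1], &bb->stmts[pos],
           (bb->len - pos) * sizeof bb->stmts[0]);
  bb->stmts[pos] = val;
  bb->len++;
  return pos + 1;
}

/* Temporarily switch to "insert before the statement after INIT",
   insert N values, then return the updated INIT: the position just
   before the insertion point, i.e. the last statement added.  */
long toy_insert_after_init (struct toy_block *bb, long init,
                            const int *vals, size_t n)
{
  size_t cur = (size_t) (init + 1);
  for (size_t i = 0; i < n; i++)
    cur = toy_insert_before (bb, cur, vals[i]);
  return (long) cur - 1;
}
```

Because the returned init position is recomputed from the insertion point, a later batch lands after the earlier batch rather than in front of it.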

Jakub



Re: [PATCH] c++: Make g++.dg/opt/pr110879.C require C++11 [PR110879]

2023-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2023 at 10:54:56AM +, Jonathan Wakely wrote:
> Jakub noted in the PR that this test fails for -std=c++98.
> 
> Tested x86_64-linux.
> 
> OK for trunk?
> 
> -- >8 --
> 
> The _M_realloc_insert member does not have the trivial relocation
> optimization for C++98, which seems to be why the _M_end_of_storage
> member does not get optimized away. Make this test unsupported for
> C++98.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR libstdc++/110879
>   * g++.dg/opt/pr110879.C: Require C++11 or later.

Ok, thanks.

> diff --git a/gcc/testsuite/g++.dg/opt/pr110879.C b/gcc/testsuite/g++.dg/opt/pr110879.C
> index 7f0a0a80b8a..57556605a51 100644
> --- a/gcc/testsuite/g++.dg/opt/pr110879.C
> +++ b/gcc/testsuite/g++.dg/opt/pr110879.C
> @@ -1,4 +1,4 @@
> -// { dg-do compile }
> +// { dg-do compile { target c++11 } }
>  // { dg-options "-O3 -fdump-tree-optimized" }
>  
>  #include 
> -- 
> 2.42.0

Jakub



Re: [PATCH v3] RISC-V: Implement TLS Descriptors.

2023-11-23 Thread Florian Weimer
* Tatsuyuki Ishi:

> I've considered gating this behind a GAS feature test, but it seems
> nontrivial especially for restricting the variants available at runtime.
> Since TLS descriptors is not selected by default, I've decided to leave it
> ungated.
>
> In other news, I have made some progress on binutils side, and I'll try to
> update the GAS / ld patch set with relaxation included, by the end of this
> month.

Is there a glibc patch with the run-time implementation already?

I'm curious how you are going to implement saving the vector register
file

Thanks,
Florian



[PATCH] RISC-V: Disable AVL propagation of vrgather instruction

2023-11-23 Thread Juzhe-Zhong
This patch fixes the following FAILs in zvl1024b of both RV32/RV64:

FAIL: gcc.c-torture/execute/990128-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/990128-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/990128-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/990128-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/990128-1.c   -O3 -g  execution test
FAIL: gcc.dg/torture/pr58955-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

The root cause can be simply described by the following small case:

https://godbolt.org/z/7GaxbEGzG

#include "riscv_vector.h"

typedef int64_t v1024b __attribute__ ((vector_size (128)));

void foo (void *out, void *in, int64_t a, int64_t b)
{
  v1024b v = {a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a};
  v1024b v2 = {b,b,b,b,b,b,b,b,b,b,b,b,b,b,b,b};
  v1024b index = *(v1024b*)in;
  v1024b v3 = __builtin_shuffle (v, v2, index);
  __riscv_vse64_v_i64m1 (out, (vint64m1_t)v3, 10);
}

Incorrect ASM:

foo:
li  a5,31
vsetivli zero,10,e64,m1,ta,mu
vmv.v.x v2,a5
vl1re64.v   v1,0(a1)
vmv.v.x v4,a2
vand.vv v1,v1,v2
vmv.v.x v3,a3
vmsgeu.vi   v0,v1,16
vrgather.vv v2,v4,v1   --> AVL = VLMAX according to codes.
vadd.vi v1,v1,-16
vrgather.vv v2,v3,v1,v0.t  --> AVL = VLMAX according to codes.
vse64.v v2,0(a0)   --> AVL = 10 according to codes.
ret

For a "vrgather dest, source, index" instruction, the index may hold values
greater than or equal to the AVL of the following store (10 in this case).
If that AVL is propagated into the code above, the source vector of vrgather
ends up with undefined values in the elements at index >= AVL.

So disable AVL propagation for vrgather instruction.
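
The hazard can be illustrated with a toy scalar model in C.  This is only a sketch under stated assumptions: the toy_* names are hypothetical, the "undefined" tail elements are modelled with a sentinel value, and the gather is run with the store AVL of 10 as it would be after propagation.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NUNITS 16        /* Elements per vector, as in the v1024b case.  */
#define TOY_UNDEF INT64_MIN  /* Sentinel for an undefined tail element.  */

/* Splat X into the first AVL elements; elements at index >= AVL are
   left undefined, modelled here by the sentinel.  */
void toy_splat (int64_t *dst, int64_t x, size_t avl)
{
  for (size_t i = 0; i < TOY_NUNITS; i++)
    dst[i] = i < avl ? x : TOY_UNDEF;
}

/* dst[i] = src[idx[i]] for i < AVL; out-of-range indices read as 0.  */
void toy_gather (int64_t *dst, const int64_t *src, const uint64_t *idx,
                 size_t avl)
{
  for (size_t i = 0; i < avl; i++)
    dst[i] = idx[i] < TOY_NUNITS ? src[idx[i]] : 0;
}

/* Element 0 of gather (splat (X, SRC_AVL), idx) where idx[0] == INDEX0
   and the gather itself runs with the store AVL of 10.  */
int64_t toy_first_lane (int64_t x, uint64_t index0, size_t src_avl)
{
  int64_t src[TOY_NUNITS], dst[TOY_NUNITS];
  uint64_t idx[TOY_NUNITS] = { index0 };
  toy_splat (src, x, src_avl);
  toy_gather (dst, src, idx, 10);
  return dst[0];
}
```

With the source splat done at VLMAX (16), index 12 reads a valid element; once the source is only written for 10 elements, the same index reads an undefined one, even though the result lane itself is below the AVL.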


PR target/112599

gcc/ChangeLog:

* config/riscv/riscv-avlprop.cc (alv_can_be_propagated_p): New function.
(vlmax_ta_p): Disable vrgather AVL propagation.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112599-1.c: New test.

---
 gcc/config/riscv/riscv-avlprop.cc   | 13 -
 .../gcc.target/riscv/rvv/autovec/pr112599-1.c   | 17 +
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112599-1.c

diff --git a/gcc/config/riscv/riscv-avlprop.cc b/gcc/config/riscv/riscv-avlprop.cc
index 1f6ba405342..68b9af07d99 100644
--- a/gcc/config/riscv/riscv-avlprop.cc
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -104,10 +104,21 @@ avlprop_type_to_str (enum avlprop_type type)
 }
 }
 
+/* Return true if the AVL of the INSN can be propagated.  */
+static bool
+alv_can_be_propagated_p (rtx_insn *rinsn)
+{
+  /* The index of "vrgather dest, source, index" may pick up the
+ element which has index >= AVL, so we can't strip the elements
+ that has index >= AVL of source register.  */
+  return get_attr_type (rinsn) != TYPE_VGATHER;
+}
+
 static bool
 vlmax_ta_p (rtx_insn *rinsn)
 {
-  return vlmax_avl_type_p (rinsn) && tail_agnostic_p (rinsn);
+  return vlmax_avl_type_p (rinsn) && tail_agnostic_p (rinsn)
+&& alv_can_be_propagated_p (rinsn);
 }
 
 static machine_mode
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112599-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112599-1.c
new file mode 100644
index 000..911b6922b4a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112599-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zfh_zvl1024b -mabi=lp64d -O3 --param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include "riscv_vector.h"
+
+typedef int64_t v1024b __attribute__ ((vector_size (128)));
+
+void foo (void *out, void *in, int64_t a, int64_t b)
+{
+  v1024b v = {a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a};
+  v1024b v2 = {b,b,b,b,b,b,b,b,b,b,b,b,b,b,b,b};
+  v1024b index = *(v1024b*)in;
+  v1024b v3 = __builtin_shuffle (v, v2, index);
+  __riscv_vse64_v_i64m1 (out, (vint64m1_t)v3, 10);
+}
+
+/* { dg-final { scan-assembler {vsetivli\s+zero,\s*16} } } */
-- 
2.36.3



Re: [PATCH v3] RISC-V: Implement TLS Descriptors.

2023-11-23 Thread Tatsuyuki Ishi
> On Nov 23, 2023, at 19:57, Florian Weimer  wrote:
> 
> * Tatsuyuki Ishi:
> 
>> I've considered gating this behind a GAS feature test, but it seems
>> nontrivial especially for restricting the variants available at runtime.
>> Since TLS descriptors is not selected by default, I've decided to leave it
>> ungated.
>> 
>> In other news, I have made some progress on binutils side, and I'll try to
>> update the GAS / ld patch set with relaxation included, by the end of this
>> month.
> 
> Is there a glibc patch with the run-time implementation already?
> 
> I'm curious how you are going to implement saving the vector register
> file

There is, please see [1].  The vector register file handling is missing right
now as I’m not sure if we have agreed upon a calling convention for RVV.

In the spec, I have already specified the interaction with RVV:

> Any other registers are callee-saved. This includes any vector registers
> when the vector extension is supported.

Once the calling convention is decided, I will add saving of all caller-saved
registers into the TLSDESC stub.

[1]: 
https://inbox.sourceware.org/libc-alpha/20230914084033.222120-1-ishitatsuy...@gmail.com/

> Thanks,
> Florian
> 



[aarch64] PR111702 - ICE in insert_regs after interleave+zip1 vector initialization patch

2023-11-23 Thread Prathamesh Kulkarni
Hi Richard,
For the test-case mentioned in PR111702, compiling with -O2
-frounding-math -fstack-protector-all results in the following ICE during
the cse2 pass:

test.c: In function 'foo':
test.c:119:1: internal compiler error: in insert_regs, at cse.cc:1120
  119 | }
  | ^
0xb7ebb0 insert_regs
../../gcc/gcc/cse.cc:1120
0x1f95134 merge_equiv_classes
../../gcc/gcc/cse.cc:1764
0x1f9b9ab cse_insn
../../gcc/gcc/cse.cc:4793
0x1f9fe30 cse_extended_basic_block
../../gcc/gcc/cse.cc:6577
0x1f9fe30 cse_main
../../gcc/gcc/cse.cc:6722
0x1fa0984 rest_of_handle_cse2
../../gcc/gcc/cse.cc:7620
0x1fa0984 execute
../../gcc/gcc/cse.cc:7675

This happens only with interleave+zip1 vector initialization with
-frounding-math -fstack-protector-all, while it compiles OK without
-fstack-protector-all. Also, it compiles OK with fallback sequence
code-gen (with or without -fstack-protector-all). Unfortunately, I
haven't been able to reduce the test-case further :/

From the test-case, it seems only the vector initializer for type J
uses the interleave+zip1 approach, while the rest of the vector initializers
use the fallback sequence.

J is defined as:
typedef _Float16 __attribute__((__vector_size__ (16))) J;

and the initializer is:
(J) { 11654, 4801, 5535, 9743, 61680}

interleave+zip1 sequence for above initializer J:
mode = V8HF

vals: (parallel:V8HF [
(reg:HF 642)
(reg:HF 645)
(reg:HF 648)
(reg:HF 651)
(reg:HF 654)
(const_double:HF 0.0 [0x0.0p+0]) repeated x3
])

target: (reg:V8HF 641)
seq:
(insn 1058 0 1059 (set (reg:V4HF 657)
(const_vector:V4HF [
(const_double:HF 0.0 [0x0.0p+0]) repeated x4
])) "test.c":81:8 -1
 (nil))
(insn 1059 1058 1060 (set (reg:V4HF 657)
(vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 642))
(reg:V4HF 657)
(const_int 1 [0x1]))) "test.c":81:8 -1
 (nil))
(insn 1060 1059 1061 (set (reg:V4HF 657)
(vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 648))
(reg:V4HF 657)
(const_int 2 [0x2]))) "test.c":81:8 -1
 (nil))
(insn 1061 1060 1062 (set (reg:V4HF 657)
(vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 654))
(reg:V4HF 657)
(const_int 4 [0x4]))) "test.c":81:8 -1
 (nil))
(insn 1062 1061 1063 (set (reg:V4HF 658)
(const_vector:V4HF [
(const_double:HF 0.0 [0x0.0p+0]) repeated x4
])) "test.c":81:8 -1
 (nil))
(insn 1063 1062 1064 (set (reg:V4HF 658)
(vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 645))
(reg:V4HF 658)
(const_int 1 [0x1]))) "test.c":81:8 -1
 (nil))
(insn 1064 1063 1065 (set (reg:V4HF 658)
(vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 651))
(reg:V4HF 658)
(const_int 2 [0x2]))) "test.c":81:8 -1
 (nil))
(insn 1065 1064 0 (set (reg:V8HF 641)
(unspec:V8HF [
(subreg:V8HF (reg:V4HF 657) 0)
(subreg:V8HF (reg:V4HF 658) 0)
] UNSPEC_ZIP1)) "test.c":81:8 -1
 (nil))

It seems to me that the above sequence correctly initializes the
vector into r641 ?
insns 1058-1061 construct r657 = { r642, r648, r654, 0 }
insns 1062-1064 construct r658 = { r645, r651, 0, 0 }
and zip1 will create r641 = { r642, r645, r648, r651, r654, 0, 0, 0 }
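
As a sanity check of that reading, here is a toy scalar model of the interleave in C.  It is a sketch only: toy_zip1 is a hypothetical helper, and the pseudo register numbers are used as stand-in element values.  It models zip1 on N-element vectors as interleaving the low N/2 elements of the two operands, so the upper halves of the inputs do not matter.

```c
#include <stddef.h>

/* Toy model of zip1 on vectors of N elements (N even): interleave the
   low N/2 elements of A and B into DST.  */
void toy_zip1 (int *dst, const int *a, const int *b, size_t n)
{
  for (size_t i = 0; i < n / 2; i++)
    {
      dst[2 * i] = a[i];
      dst[2 * i + 1] = b[i];
    }
}
```

Feeding it { 642, 648, 654, 0 } and { 645, 651, 0, 0 } as the low halves gives { 642, 645, 648, 651, 654, 0, 0, 0 }, matching the vals listed above.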

For the above test, it seems that with interleave+zip1 approach and
-fstack-protector-all,
in cse pass, there are two separate equivalence classes created for
(const_int 1), that need
to be merged in cse_insn:

   if (elt->first_same_value != src_eqv_elt->first_same_value)
{
  /* The REG_EQUAL is indicating that two formerly distinct
 classes are now equivalent.  So merge them.  */
  merge_equiv_classes (elt, src_eqv_elt);

elt equivalence chain:
Equivalence chain for (subreg:QI (reg:V16QI 671) 0):
(subreg:QI (reg:V16QI 671) 0)
(const_int 1 [0x1])

src_eqv_elt equivalence chain:
Equivalence chain for (const_int 1 [0x1]):
(reg:QI 34 v2)
(reg:QI 32 v0)
(reg:QI 34 v2)
(const_int 1 [0x1])
(vec_select:QI (reg:V16QI 671)
(parallel [
(const_int 1 [0x1])
]))
(vec_select:QI (reg:V16QI 32 v0)
(parallel [
(const_int 1 [0x1])
]))
(vec_select:QI (reg:V16QI 33 v1)
(parallel [
(const_int 2 [0x2])
]))
(vec_select:QI (reg:V16QI 33 v1)
(parallel [
(const_int 1 [0x1])
]))

The issue is that merge_equiv_classes doesn't seem to deal correctly with
multiple occurrences of the same register in class2 (src_eqv_elt), which
has two occurrences of
(reg:QI 34 v2)

In merge_equiv_classes, on the first iteration, it will remove (reg:QI 34)
from reg_eqv_table
by calling delete_reg_equiv (34), and in insert_regs it will create an
entry for (reg:QI 34) in qty_table with a new quantity number, and
create a new equivalence in reg_eqv_table.

When we again come across (reg:QI 34) in class2, it will
unconditional

Re: [PATCH v3] RISC-V: Implement TLS Descriptors.

2023-11-23 Thread Florian Weimer
* Tatsuyuki Ishi:

> There is, please see [1].  The vector register file handling is missing right
> now as I’m not sure if we have agreed upon a calling convention for RVV.

> [1]: 
> https://inbox.sourceware.org/libc-alpha/20230914084033.222120-1-ishitatsuy...@gmail.com/

Thank you, I have raised my concern on the other thread.

Thanks,
Florian



Re: PR111754

2023-11-23 Thread Prathamesh Kulkarni
On Wed, 15 Nov 2023 at 20:44, Prathamesh Kulkarni
 wrote:
>
> On Wed, 8 Nov 2023 at 21:57, Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 26 Oct 2023 at 09:43, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Thu, 26 Oct 2023 at 04:09, Richard Sandiford
> > >  wrote:
> > > >
> > > > Prathamesh Kulkarni  writes:
> > > > > On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
> > > > >  wrote:
> > > > >>
> > > > >> Hi,
> > > > >>
> > > > >> Sorry for the slow review.  I clearly didn't think this through properly
> > > > >> when doing the review of the original patch, so I wanted to spend
> > > > >> some time working on the code to get a better understanding of
> > > > >> the problem.
> > > > >>
> > > > >> Prathamesh Kulkarni  writes:
> > > > >> > Hi,
> > > > >> > For the following test-case:
> > > > >> >
> > > > >> > typedef float __attribute__((__vector_size__ (16))) F;
> > > > >> > F foo (F a, F b)
> > > > >> > {
> > > > >> >   F v = (F) { 9 };
> > > > >> >   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > > > >> > }
> > > > >> >
> > > > >> > Compiling with -O2 results in following ICE:
> > > > >> > foo.c: In function ‘foo’:
> > > > >> > foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> > > > >> > 6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > > > >> >   |  ^~
> > > > >> > 0x7f3185 wi::int_traits
> > > > >> >>::decompose(long*, unsigned int, std::pair
> > > > >> > const&)
> > > > >> > ../../gcc/gcc/rtl.h:2314
> > > > >> > 0x7f3185 wide_int_ref_storage > > > >> > false>::wide_int_ref_storage
> > > > >> >>(std::pair const&)
> > > > >> > ../../gcc/gcc/wide-int.h:1089
> > > > >> > 0x7f3185 generic_wide_int
> > > > >> >>::generic_wide_int
> > > > >> >>(std::pair const&)
> > > > >> > ../../gcc/gcc/wide-int.h:847
> > > > >> > 0x7f3185 poly_int<1u, generic_wide_int > > > >> > false> > >::poly_int
> > > > >> >>(poly_int_full, std::pair const&)
> > > > >> > ../../gcc/gcc/poly-int.h:467
> > > > >> > 0x7f3185 poly_int<1u, generic_wide_int > > > >> > false> > >::poly_int
> > > > >> >>(std::pair const&)
> > > > >> > ../../gcc/gcc/poly-int.h:453
> > > > >> > 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> > > > >> > ../../gcc/gcc/rtl.h:2383
> > > > >> > 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> > > > >> > ../../gcc/gcc/rtx-vector-builder.h:122
> > > > >> > 0xfd4e1b vector_builder > > > >> > rtx_vector_builder>::elt(unsigned int) const
> > > > >> > ../../gcc/gcc/vector-builder.h:253
> > > > >> > 0xfd4d11 rtx_vector_builder::build()
> > > > >> > ../../gcc/gcc/rtx-vector-builder.cc:73
> > > > >> > 0xc21d9c const_vector_from_tree
> > > > >> > ../../gcc/gcc/expr.cc:13487
> > > > >> > 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > > > >> > expand_modifier, rtx_def**, bool)
> > > > >> > ../../gcc/gcc/expr.cc:11059
> > > > >> > 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, 
> > > > >> > expand_modifier)
> > > > >> > ../../gcc/gcc/expr.h:310
> > > > >> > 0xaee682 expand_return
> > > > >> > ../../gcc/gcc/cfgexpand.cc:3809
> > > > >> > 0xaee682 expand_gimple_stmt_1
> > > > >> > ../../gcc/gcc/cfgexpand.cc:3918
> > > > >> > 0xaee682 expand_gimple_stmt
> > > > >> > ../../gcc/gcc/cfgexpand.cc:4044
> > > > >> > 0xaf28f0 expand_gimple_basic_block
> > > > >> > ../../gcc/gcc/cfgexpand.cc:6100
> > > > >> > 0xaf4996 execute
> > > > >> > ../../gcc/gcc/cfgexpand.cc:6835
> > > > >> >
> > > > >> > IIUC, the issue is that fold_vec_perm returns a vector having 
> > > > >> > float element
> > > > >> > type with res_nelts_per_pattern == 3, and later ICE's when it tries
> > > > >> > to derive element v[3], not present in the encoding, while trying 
> > > > >> > to
> > > > >> > build rtx vector
> > > > >> > in rtx_vector_builder::build():
> > > > >> >  for (unsigned int i = 0; i < nelts; ++i)
> > > > >> > RTVEC_ELT (v, i) = elt (i);
> > > > >> >
> > > > >> > The attached patch tries to fix this by returning false from
> > > > >> > valid_mask_for_fold_vec_perm_cst if sel has a stepped sequence and
> > > > >> > input vector has non-integral element type, so for VLA vectors, it
> > > > >> > will only build result with dup sequence (nelts_per_pattern < 3) 
> > > > >> > for
> > > > >> > non-integral element type.
> > > > >> >
> > > > >> > For VLS vectors, this will still work for stepped sequence since it
> > > > >> > will then use the "VLS exception" in fold_vec_perm_cst, and set:
> > > > >> > res_npattern = res_nelts and
> > > > >> > res_nelts_per_pattern = 1
> > > > >> >
> > > > >> > and fold the above case to:
> > > > >> > F foo (F a, F b)
> > > > >> > {
> > > > >> >[local count: 1073741824]:
> > > > >> >   return { 0.0, 9.0e+0, 0.0, 0.0 };
> > > > >> > }
> > > > >> >
> > > > >> > But I am not sure if this is entirely correct, since:
> > > > >> > tree res = out_elts.buil

Re: [PATCH] expr: Fix &bitint_var handling in initializers [PR112336]

2023-11-23 Thread Richard Biener
On Thu, Nov 23, 2023 at 11:05 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As the following testcase shows, we ICE when trying to emit ADDR_EXPR of
> a bitint variable which doesn't have mode width.
> The problem is in the EXTEND_BITINT stuff which makes sure we treat the
> padding bits on memory reads from user bitint vars as undefined.
> When expanding ADDR_EXPR on such vars outside of initializers,
> expand_expr_addr* uses EXPAND_CONST_ADDRESS modifier and EXTEND_BITINT
> does nothing, but in initializers it keeps using EXPAND_INITIALIZER
> modifier.  So, we need to treat EXPAND_INITIALIZER the same as
> EXPAND_CONST_ADDRESS for this regard.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2023-11-23  Jakub Jelinek  
>
> PR middle-end/112336
> * expr.cc (EXTEND_BITINT): Don't call reduce_to_bit_field_precision
> if modifier is EXPAND_INITIALIZER.
>
> * gcc.dg/bitint-41.c: New test.
>
> --- gcc/expr.cc.jj  2023-11-14 18:26:05.401613476 +0100
> +++ gcc/expr.cc 2023-11-22 19:03:59.121599029 +0100
> @@ -10698,6 +10698,7 @@ expand_expr_real_1 (tree exp, rtx target
>  && mode != BLKmode \
>  && modifier != EXPAND_MEMORY   \
>  && modifier != EXPAND_WRITE    \
> +&& modifier != EXPAND_INITIALIZER  \
>  && modifier != EXPAND_CONST_ADDRESS)   \
> ? reduce_to_bit_field_precision ((expr), NULL_RTX, type) : (expr))
>
> --- gcc/testsuite/gcc.dg/bitint-41.c.jj 2023-11-22 19:09:48.986726861 +0100
> +++ gcc/testsuite/gcc.dg/bitint-41.c2023-11-22 19:09:29.804993983 +0100
> @@ -0,0 +1,36 @@
> +/* PR middle-end/112336 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c2x" } */
> +
> +unsigned _BitInt(1) v1;
> +unsigned _BitInt(1) *p1 = &v1;
> +signed _BitInt(2) v2;
> +signed _BitInt(2) *p2 = &v2;
> +unsigned _BitInt(11) v11;
> +unsigned _BitInt(11) *p11 = &v11;
> +signed _BitInt(12) v12;
> +signed _BitInt(12) *p12 = &v12;
> +unsigned _BitInt(21) v21;
> +unsigned _BitInt(21) *p21 = &v21;
> +signed _BitInt(22) v22;
> +signed _BitInt(22) *p22 = &v22;
> +unsigned _BitInt(31) v31;
> +unsigned _BitInt(31) *p31 = &v31;
> +signed _BitInt(32) v32;
> +signed _BitInt(32) *p32 = &v32;
> +unsigned _BitInt(41) v41;
> +unsigned _BitInt(41) *p41 = &v41;
> +signed _BitInt(42) v42;
> +signed _BitInt(42) *p42 = &v42;
> +#if __BITINT_MAXWIDTH__ >= 128
> +unsigned _BitInt(127) v127;
> +unsigned _BitInt(127) *p127 = &v127;
> +signed _BitInt(128) v128;
> +signed _BitInt(128) *p128 = &v128;
> +#endif
> +#if __BITINT_MAXWIDTH__ >= 258
> +unsigned _BitInt(257) v257;
> +unsigned _BitInt(257) *p257 = &v257;
> +signed _BitInt(258) v258;
> +signed _BitInt(258) *p258 = &v258;
> +#endif
>
> Jakub
>


Re: [PATCH] lower-bitint: Fix up -fnon-call-exceptions bit-field load lowering [PR112668]

2023-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2023 at 11:56:33AM +0100, Jakub Jelinek wrote:
> Now, regarding m_init_gsi, I think I'll need to play around, maybe
> I should have in the end insert after and update behavior rather than
> insert after, and that could be achieved by adding
> m_init_gsi = m_gsi;
> gsi_prev (&m_init_gsi);
> before the
> m_gsi = save_gsi;
> restore in all the 3 places and then no other updates of m_init_gsi would be
> needed.  Except gsi_prev likely won't like the gsi_end_p (m_gsi) case,
> so maybe
> m_init_gsi = m_gsi;
> if (gsi_end_p (m_init_gsi))
>   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
> else
>   gsi_prev (&m_init_gsi);
> m_gsi = save_gsi;

That seems to work and I think it is the right thing to do.

Here is an updated patch to do that.  Passed
make check-gcc -j32 -k GCC_TEST_RUN_EXPENSIVE=1 
RUNTESTFLAGS='GCC_TEST_RUN_EXPENSIVE=1 dg.exp=*bitint* dg-torture.exp=*bitint*'
so far.

2023-11-23  Jakub Jelinek  

PR middle-end/112668
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): After
temporarily adding statements after m_init_gsi, update m_init_gsi
such that later additions after it will be after the added statements.
(bitint_large_huge::handle_load): Likewise.  When splitting
gsi_bb (m_init_gsi) basic block, update m_preheader_bb if needed
and update saved m_gsi as well if needed.

* gcc.dg/bitint-40.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2023-11-22 22:55:14.260164718 +0100
+++ gcc/gimple-lower-bitint.cc  2023-11-23 12:39:35.030364243 +0100
@@ -1294,6 +1294,11 @@ bitint_large_huge::handle_cast (tree lhs
  g = gimple_build_assign (n, RSHIFT_EXPR, t, lpm1);
  insert_before (g);
  m_data[save_data_cnt + 1] = add_cast (m_limb_type, n);
+ m_init_gsi = m_gsi;
+ if (gsi_end_p (m_init_gsi))
+   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
+ else
+   gsi_prev (&m_init_gsi);
  m_gsi = save_gsi;
}
  else if (m_upwards_2limb * limb_prec < TYPE_PRECISION (rhs_type))
@@ -1523,6 +1528,11 @@ bitint_large_huge::handle_cast (tree lhs
  insert_before (g);
  rext = add_cast (m_limb_type, gimple_assign_lhs (g));
}
+ m_init_gsi = m_gsi;
+ if (gsi_end_p (m_init_gsi))
+   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
+ else
+   gsi_prev (&m_init_gsi);
  m_gsi = save_gsi;
}
   tree t;
@@ -1687,9 +1697,27 @@ bitint_large_huge::handle_load (gimple *
  edge e = split_block (gsi_bb (m_gsi), g);
  make_edge (e->src, eh_edge->dest, EDGE_EH)->probability
= profile_probability::very_unlikely ();
- m_init_gsi.bb = e->dest;
+ m_gsi = gsi_after_labels (e->dest);
+ if (gsi_bb (save_gsi) == e->src)
+   {
+ if (gsi_end_p (save_gsi))
+   {
+ save_gsi = gsi_last_bb (e->dest);
+ if (!gsi_end_p (save_gsi))
+   gsi_next (&save_gsi);
+   }
+ else
+   save_gsi = gsi_for_stmt (gsi_stmt (save_gsi));
+   }
+ if (m_preheader_bb == e->src)
+   m_preheader_bb = e->dest;
}
}
+ m_init_gsi = m_gsi;
+ if (gsi_end_p (m_init_gsi))
+   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
+ else
+   gsi_prev (&m_init_gsi);
  m_gsi = save_gsi;
  tree out;
  prepare_data_in_out (iv, idx, &out);
--- gcc/testsuite/gcc.dg/bitint-40.c.jj 2023-11-22 13:47:12.380580107 +0100
+++ gcc/testsuite/gcc.dg/bitint-40.c  2023-11-22 14:35:50.225842768 +0100
@@ -0,0 +1,29 @@
+/* PR middle-end/112668 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=c23 -fnon-call-exceptions" } */
+
+#if __BITINT_MAXWIDTH__ >= 156
+struct T156 { _BitInt(156) a : 2; unsigned _BitInt(156) b : 135; _BitInt(156) c : 2; };
+extern void foo156 (struct T156 *);
+
+unsigned _BitInt(156)
+bar156 (int i)
+{
+  struct T156 r156[12];
+  foo156 (&r156[0]);
+  return r156[i].b;
+}
+#endif
+
+#if __BITINT_MAXWIDTH__ >= 495
+struct T495 { _BitInt(495) a : 2; unsigned _BitInt(495) b : 471; _BitInt(495) c : 2; };
+extern void foo495 (struct T495 *r495);
+
+unsigned _BitInt(495)
+bar495 (int i)
+{
+  struct T495 r495[12];
+  foo495 (r495);
+  return r495[i].b;
+}
+#endif


Jakub



Re: [PATCH] RISC-V: Disable AVL propagation of vrgather instruction

2023-11-23 Thread Robin Dapp
I was just about to post a similar-ish patch that fixes pr65518.c
but you were faster ;)

Therefore LGTM.  You can add PR/target 112670.

Regards
 Robin


Re: Re: [PATCH] RISC-V: Disable AVL propagation of vrgather instruction

2023-11-23 Thread juzhe.zh...@rivai.ai
Oh, you mean this patch also fixes the failing -flto case?



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-23 19:55
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Disable AVL propagation of vrgather instruction
I was just about to post a similar-ish patch that fixes pr65518.c
but you were faster ;)
 
Therefore LGTM.  You can add PR/target 112670.
 
Regards
Robin
 


Re: [PATCH] RISC-V: Disable AVL propagation of vrgather instruction

2023-11-23 Thread Robin Dapp
> Oh, you mean this patch also fixes the failing -flto case?

Yes, it's the same issue.  There we have a fixed vl (known via LTO)
that is being propagated "into" gathers and we end up missing
gather elements.

Regards
 Robin


[Committed V2] RISC-V: Disable AVL propagation of vrgather instruction

2023-11-23 Thread Juzhe-Zhong


This patch fixes the following FAILs with zvl1024b on both RV32 and RV64:

FAIL: gcc.c-torture/execute/990128-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/990128-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/990128-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/990128-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/990128-1.c   -O3 -g  execution test
FAIL: gcc.dg/torture/pr58955-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

The root cause can be described with the following small case:

https://godbolt.org/z/7GaxbEGzG

#include "riscv_vector.h"

typedef int64_t v1024b __attribute__ ((vector_size (128)));

void foo (void *out, void *in, int64_t a, int64_t b)
{
  v1024b v = {a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a};
  v1024b v2 = {b,b,b,b,b,b,b,b,b,b,b,b,b,b,b,b};
  v1024b index = *(v1024b*)in;
  v1024b v3 = __builtin_shuffle (v, v2, index);
  __riscv_vse64_v_i64m1 (out, (vint64m1_t)v3, 10);
}

Incorrect ASM:

foo:
li  a5,31
vsetivli zero,10,e64,m1,ta,mu
vmv.v.x v2,a5
vl1re64.v   v1,0(a1)
vmv.v.x v4,a2
vand.vv v1,v1,v2
vmv.v.x v3,a3
vmsgeu.vi   v0,v1,16
vrgather.vv v2,v4,v1   --> AVL = VLMAX according to codes.
vadd.vi v1,v1,-16
vrgather.vv v2,v3,v1,v0.t  --> AVL = VLMAX according to codes.
vse64.v v2,0(a0)   --> AVL = 10 according to codes.
ret

For the "vrgather dest, source, index" instruction, the index vector may hold
values greater than or equal to the AVL of the following store (10 in this
case).  If that AVL is propagated, the source vector of vrgather has undefined
values at element positions >= AVL, so the code above ends up gathering
undefined elements for any such index.

So disable AVL propagation for vrgather instruction.


PR target/112599
PR target/112670

gcc/ChangeLog:

* config/riscv/riscv-avlprop.cc (alv_can_be_propagated_p): New function.
(vlmax_ta_p): Disable vrgather AVL propagation.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112599-1.c: New test.

---
 gcc/config/riscv/riscv-avlprop.cc   | 13 -
 .../gcc.target/riscv/rvv/autovec/pr112599-1.c   | 17 +
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112599-1.c

diff --git a/gcc/config/riscv/riscv-avlprop.cc b/gcc/config/riscv/riscv-avlprop.cc
index 1f6ba405342..68b9af07d99 100644
--- a/gcc/config/riscv/riscv-avlprop.cc
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -104,10 +104,21 @@ avlprop_type_to_str (enum avlprop_type type)
 }
 }
 
+/* Return true if the AVL of the INSN can be propagated.  */
+static bool
+alv_can_be_propagated_p (rtx_insn *rinsn)
+{
+  /* The index of "vrgather dest, source, index" may pick up the
+ element which has index >= AVL, so we can't strip the elements
+ that has index >= AVL of source register.  */
+  return get_attr_type (rinsn) != TYPE_VGATHER;
+}
+
 static bool
 vlmax_ta_p (rtx_insn *rinsn)
 {
-  return vlmax_avl_type_p (rinsn) && tail_agnostic_p (rinsn);
+  return vlmax_avl_type_p (rinsn) && tail_agnostic_p (rinsn)
+&& alv_can_be_propagated_p (rinsn);
 }
 
 static machine_mode
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112599-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112599-1.c
new file mode 100644
index 000..911b6922b4a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112599-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zfh_zvl1024b -mabi=lp64d -O3 --param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include "riscv_vector.h"
+
+typedef int64_t v1024b __attribute__ ((vector_size (128)));
+
+void foo (void *out, void *in, int64_t a, int64_t b)
+{
+  v1024b v = {a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a};
+  v1024b v2 = {b,b,b,b,b,b,b,b,b,b,b,b,b,b,b,b};
+  v1024b index = *(v1024b*)in;
+  v1024b v3 = __builtin_shuffle (v, v2, index);
+  __riscv_vse64_v_i64m1 (out, (vint64m1_t)v3, 10);
+}
+
+/* { dg-final { scan-assembler {vsetivli\s+zero,\s*16} } } */
-- 
2.36.3



Re: Re: [PATCH] RISC-V: Disable AVL propagation of vrgather instruction

2023-11-23 Thread juzhe.zh...@rivai.ai
Thanks Robin.

I have sent V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637921.html
with PR target/112670 added.

Could you commit it for me?

I am sorry for making you do redundant work.
I didn't realize they were the same issue :)


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-23 19:58
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Disable AVL propagation of vrgather instruction
> Oh, you mean this patch also fixes the failing -flto case?
 
Yes, it's the same issue.  There we have a fixed vl (known via LTO)
that is being propagated "into" gathers and we end up missing
gather elements.
 
Regards
Robin
 


Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-23 Thread Xi Ruoyao
On Thu, 2023-11-23 at 18:12 +0800, Xi Ruoyao wrote:
> On Thu, 2023-11-23 at 17:12 +0800, chenglulu wrote:
> > 
> > 在 2023/11/23 下午5:02, Xi Ruoyao 写道:
> > > On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote:
> > > > The fix_truncv4sfv4si2 template is indeed called when debugging with
> > > > gdb.
> > > > 
> > > > So I think we can use define_expand here.
> > > The problem is cases where we want to combine an rint call with float-
> > > to-int conversion:
> > > 
> > > float x[4];
> > > int y[4];
> > > 
> > > void test()
> > > {
> > >   for (int i = 0; i < 4; i++)
> > >   y[i] = __builtin_rintf(x[i]);
> > > }
> > > 
> > > With define_expand we get "vfrint + vftintrz", but with define_insn we
> > > get a single "vftint".
> > > 
> > > Arguably the generic code should try to handle this (PR86609), but it's
> > > "not sure if that's a good idea in general" (comment 1 in the PR) so we
> > > can do this in a target-specific way.
> > > 
> > I tried to use Ofast to compile, and found that a vftint was generated, 
> > and at.006t.gimple appeared.
> > 
> > If O2 was compiled, __builtin_rintf would be generated, but Ofast would 
> > generate __builtin_irintf
> 
> Indeed...  It seems the FE will only generate __builtin_irintf when -
> fno-math-errno -funsafe-math-optimizations.
> 
> But I cannot see why this is necessary (at least for us): the rintf
> function does not set errno at all, and to me using vftint.w.s here is
> safe: if the rounded result can be represented as a 32-bit int,
> obviously there is no issue;  otherwise, per C23 section F.4 we should
> raise FE_INVALID and produce unspecified result.  It seems our ftint.w.s
> instruction has the required semantics.
> 
> +Uros and Joseph for some comment about the expected behavior of
> (int)rintf(x).

I've spent some time reading the code and got some results.

For -fno-math-errno, the purpose is to prevent converting (int)rintf(x)
to a call to the *external* function irintf(x).  The problem is that rintf
never sets errno, but irintf may set errno; this was PR 61876.  However,
that does not prevent us from using ftint.w.s, because this
instruction does not set errno.

For -funsafe-math-optimizations, there seems to be a logic error in
convert_to_integer_1:

  /* Convert e.g. (long)round(d) -> lround(d).  */
  /* If we're converting to char, we may encounter differing behavior
 between converting from double->char vs double->long->char.
 We're in "undefined" territory but we prefer to be conservative,
 so only proceed in "unsafe" math mode.  */
  if (optimize
  && (flag_unsafe_math_optimizations
  || (long_integer_type_node
  && outprec >= TYPE_PRECISION (long_integer_type_node))))

But shouldn't we compare against integer_type_node here as we're
handling __builtin_irint etc. of which the output is int (not long) in
this block?

Anyway, neither constraint applies to our ftint.w.s instruction.
And IMO the second constraint is a target-independent bug which should
be fixed.  The first constraint must remain there, but it only exists to
prevent mistakenly using an external irint (which may set
errno), not the ftint.w.s instruction (which knows nothing about errno).  So
we should use the target-specific way, i.e. a define_insn, to ensure
the optimization happens even with -fmath-errno.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[Committed] RISC-V: Add wrapper for emit vec_extract[NFC]

2023-11-23 Thread Juzhe-Zhong
Add a wrapper for emitting vec_extract, since my following patch will need to call it.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (emit_vec_extract): New function.
* config/riscv/riscv-v.cc (emit_vec_extract): Ditto.
* config/riscv/riscv.cc (riscv_legitimize_move): Refine codes.

---
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv-v.cc | 22 ++
 gcc/config/riscv/riscv.cc   | 12 +---
 3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 52097fe48cf..c74c2e94a4f 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -558,6 +558,7 @@ void expand_cond_binop (unsigned, rtx *);
 void expand_cond_ternop (unsigned, rtx *);
 void expand_popcount (rtx *);
 void expand_rawmemchr (machine_mode, rtx, rtx, rtx);
+void emit_vec_extract (rtx, rtx, poly_int64);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 24b09c0dd2d..72b96d8339d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4636,4 +4636,26 @@ can_be_broadcasted_p (rtx op)
   return can_create_pseudo_p () && nonmemory_operand (op, mode);
 }
 
+/* Helper function to emit vec_extract_optab.  */
+void
+emit_vec_extract (rtx target, rtx src, poly_int64 index)
+{
+  machine_mode vmode = GET_MODE (src);
+  machine_mode smode = GET_MODE (target);
+  class expand_operand ops[3];
+  enum insn_code icode
+= convert_optab_handler (vec_extract_optab, vmode, smode);
+  gcc_assert (icode != CODE_FOR_nothing);
+  create_output_operand (&ops[0], target, smode);
+  ops[0].target = 1;
+  create_input_operand (&ops[1], src, vmode);
+  if (index.is_constant ())
+create_integer_operand (&ops[2], index);
+  else
+create_input_operand (&ops[2], gen_int_mode (index, Pmode), Pmode);
+  expand_insn (icode, 3, ops);
+  if (ops[0].value != target)
+emit_move_insn (target, ops[0].value);
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d0efb939bf2..425463ebb18 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2616,14 +2616,10 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
  nunits = nunits * 2;
}
   vmode = riscv_vector::get_vector_mode (smode, nunits).require ();
-  enum insn_code icode
-   = convert_optab_handler (vec_extract_optab, vmode, smode);
-  gcc_assert (icode != CODE_FOR_nothing);
   rtx v = gen_lowpart (vmode, SUBREG_REG (src));
 
   for (unsigned int i = 0; i < num; i++)
{
- class expand_operand ops[3];
  rtx result;
  if (num == 1)
result = dest;
@@ -2631,13 +2627,7 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
result = gen_lowpart (smode, dest);
  else
result = gen_reg_rtx (smode);
- create_output_operand (&ops[0], result, smode);
- ops[0].target = 1;
- create_input_operand (&ops[1], v, vmode);
- create_integer_operand (&ops[2], index + i);
- expand_insn (icode, 3, ops);
- if (ops[0].value != result)
-   emit_move_insn (result, ops[0].value);
+ riscv_vector::emit_vec_extract (result, v, index + i);
 
  if (i == 1)
{
-- 
2.36.3



Re: [PATCH v3 2/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX muh instructions

2023-11-23 Thread chenglulu

LGTM.

Thanks!

在 2023/11/20 上午8:47, Xi Ruoyao 写道:

Remove unnecessary UNSPECs and make the muh instructions useful with
GNU vectors or auto vectorization.

gcc/ChangeLog:

* config/loongarch/simd.md (muh): New code attribute mapping
any_extend to smul_highpart or umul_highpart.
(mul3_highpart): New define_insn.
* config/loongarch/lsx.md (UNSPEC_LSX_VMUH_S): Remove.
(UNSPEC_LSX_VMUH_U): Remove.
(lsx_vmuh_s_): Remove.
(lsx_vmuh_u_): Remove.
* config/loongarch/lasx.md (UNSPEC_LASX_XVMUH_S): Remove.
(UNSPEC_LASX_XVMUH_U): Remove.
(lasx_xvmuh_s_): Remove.
(lasx_xvmuh_u_): Remove.
* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vmuh_b):
Redefine to standard pattern name.
(CODE_FOR_lsx_vmuh_h): Likewise.
(CODE_FOR_lsx_vmuh_w): Likewise.
(CODE_FOR_lsx_vmuh_d): Likewise.
(CODE_FOR_lsx_vmuh_bu): Likewise.
(CODE_FOR_lsx_vmuh_hu): Likewise.
(CODE_FOR_lsx_vmuh_wu): Likewise.
(CODE_FOR_lsx_vmuh_du): Likewise.
(CODE_FOR_lasx_xvmuh_b): Likewise.
(CODE_FOR_lasx_xvmuh_h): Likewise.
(CODE_FOR_lasx_xvmuh_w): Likewise.
(CODE_FOR_lasx_xvmuh_d): Likewise.
(CODE_FOR_lasx_xvmuh_bu): Likewise.
(CODE_FOR_lasx_xvmuh_hu): Likewise.
(CODE_FOR_lasx_xvmuh_wu): Likewise.
(CODE_FOR_lasx_xvmuh_du): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-muh.c: New test.
---
  gcc/config/loongarch/lasx.md  | 22 
  gcc/config/loongarch/loongarch-builtins.cc| 32 -
  gcc/config/loongarch/lsx.md   | 22 
  gcc/config/loongarch/simd.md  | 16 +
  gcc/testsuite/gcc.target/loongarch/vect-muh.c | 36 +++
  5 files changed, 68 insertions(+), 60 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index d4a56c307c4..023a023b44e 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -68,8 +68,6 @@ (define_c_enum "unspec" [
UNSPEC_LASX_BRANCH
UNSPEC_LASX_BRANCH_V
  
-  UNSPEC_LASX_XVMUH_S
-  UNSPEC_LASX_XVMUH_U
UNSPEC_LASX_MXVEXTW_U
UNSPEC_LASX_XVSLLWIL_S
UNSPEC_LASX_XVSLLWIL_U
@@ -2823,26 +2821,6 @@ (define_insn "neg2"
[(set_attr "type" "simd_logic")
 (set_attr "mode" "")])
  
-(define_insn "lasx_xvmuh_s_"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-  (match_operand:ILASX 2 "register_operand" "f")]
- UNSPEC_LASX_XVMUH_S))]
-  "ISA_HAS_LASX"
-  "xvmuh.\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "")])
-
-(define_insn "lasx_xvmuh_u_"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-  (match_operand:ILASX 2 "register_operand" "f")]
- UNSPEC_LASX_XVMUH_U))]
-  "ISA_HAS_LASX"
-  "xvmuh.\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "")])
-
  (define_insn "lasx_xvsllwil_s__"
[(set (match_operand: 0 "register_operand" "=f")
(unspec: [(match_operand:ILASX_WHB 1 "register_operand" "f")
diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index cbd833aa283..a6fcc1c731e 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -319,6 +319,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
  #define CODE_FOR_lsx_vmod_hu CODE_FOR_umodv8hi3
  #define CODE_FOR_lsx_vmod_wu CODE_FOR_umodv4si3
  #define CODE_FOR_lsx_vmod_du CODE_FOR_umodv2di3
+#define CODE_FOR_lsx_vmuh_b CODE_FOR_smulv16qi3_highpart
+#define CODE_FOR_lsx_vmuh_h CODE_FOR_smulv8hi3_highpart
+#define CODE_FOR_lsx_vmuh_w CODE_FOR_smulv4si3_highpart
+#define CODE_FOR_lsx_vmuh_d CODE_FOR_smulv2di3_highpart
+#define CODE_FOR_lsx_vmuh_bu CODE_FOR_umulv16qi3_highpart
+#define CODE_FOR_lsx_vmuh_hu CODE_FOR_umulv8hi3_highpart
+#define CODE_FOR_lsx_vmuh_wu CODE_FOR_umulv4si3_highpart
+#define CODE_FOR_lsx_vmuh_du CODE_FOR_umulv2di3_highpart
  #define CODE_FOR_lsx_vmul_b CODE_FOR_mulv16qi3
  #define CODE_FOR_lsx_vmul_h CODE_FOR_mulv8hi3
  #define CODE_FOR_lsx_vmul_w CODE_FOR_mulv4si3
@@ -439,14 +447,6 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
  #define CODE_FOR_lsx_vfnmsub_s CODE_FOR_vfnmsubv4sf4_nmsub4
  #define CODE_FOR_lsx_vfnmsub_d CODE_FOR_vfnmsubv2df4_nmsub4
  
-#define CODE_FOR_lsx_vmuh_b CODE_FOR_lsx_vmuh_s_b
-#define CODE_FOR_lsx_vmuh_h CODE_FOR_lsx_vmuh_s_h
-#define CODE_FOR_lsx_vmuh_w CODE_FOR_lsx_vmuh_s_w
-#define CODE_FOR_lsx_vmuh_d CODE_FOR_lsx_vmuh_s_d
-#define CODE_FOR_lsx_vmuh_bu CODE_FOR_lsx_vmuh_u_bu
-#define CODE_FOR_lsx_vmuh_hu CODE_FOR_lsx_vmuh_u_hu
-#define CODE_FOR_lsx_vmuh_wu CODE_FOR_lsx_vmuh_u_wu

Re: [PATCH v4] Introduce strub: machine-independent stack scrubbing

2023-11-23 Thread Richard Biener
On Thu, Nov 23, 2023 at 11:56 AM Alexandre Oliva  wrote:
>
> Hello, Richi,
>
> Thanks for the extensive review!
>
> On Nov 22, 2023, Richard Biener  wrote:
>
> > On Mon, Nov 20, 2023 at 1:40 PM Alexandre Oliva  wrote:
> >>
> >> On Oct 26, 2023, Alexandre Oliva  wrote:
> >>
> >> >> This is a refreshed and improved version of the version posted back in
> >> >> June.  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621936.html
> >>
> >> > Ping? https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633675.html
> >> > I'm combining the gcc/ipa-strub.cc bits from
> >> > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633526.html
> >>
> >> Ping?
> >> Retested on x86_64-linux-gnu, with and without -fstrub=all.
>
> > @@ -898,7 +899,24 @@ decl_attributes (tree *node, tree attributes, int flags,
> >TYPE_NAME (tt) = *node;
> >  }
>
> > -  *anode = cur_and_last_decl[0];
> > +  if (*anode != cur_and_last_decl[0])
> > +{
> > +  /* Even if !spec->function_type_required, allow the attribute
> > + handler to request the attribute to be applied to the function
> > + type, rather than to the function pointer type, by setting
> > + cur_and_last_decl[0] to the function type.  */
> > +  if (!fn_ptr_tmp
> > +  && POINTER_TYPE_P (*anode)
> > +  && TREE_TYPE (*anode) == cur_and_last_decl[0]
> > +  && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (*anode)))
> > + {
> > +  fn_ptr_tmp = TREE_TYPE (*anode);
> > +  fn_ptr_quals = TYPE_QUALS (*anode);
> > +  anode = &fn_ptr_tmp;
> > + }
> > +  *anode = cur_and_last_decl[0];
> > +}
> > +
>
> > what is this a workaround for?
>
> For the fact that the strub attribute attaches to types, whether data or
> function types, so we can't have fn_type_req, but when it's a function
> or pointer-to-function type, we want to affect the function type, rather
> than the pointer type, when the attribute has an argument.  The argument
> names the strub mode for a function; that only applies to function
> types, never to data types.
>
> The hunk above introduces the means for the attribute handler to choose
> what to attach the attribute to.
>
> > Isn't there a suitable parsing position for placing the attribute?
>
> It's been a while, but IIRC the need for this first came up in Ada,
> where attributes can't just go anywhere, and it was further complicated
> by the fact that Ada doesn't have first-class function or procedure
> types, only access-to-them, but we needed some means for the attributes
> to apply to the function type.
>
> > +#ifndef STACK_GROWS_DOWNWARD
> > +# define STACK_TOPS GT
> > +#else
> > +# define STACK_TOPS LT
> > +#endif
>
> > according to docs this is defined to 0 or 1 so the above looks wrong
> > (it's always defined).
>
> Ugh.  Thanks, will fix.  (I'm pretty sure I had notes somewhere stating
> that stack-grows-upwards hadn't been tested, and that was for the sheer
> lack of platforms making that choice, but I hoped it wasn't that broken
> :-(
>
> > +  if (optimize < 2 || optimize_size || flag_no_inline)
> > +return NULL_RTX;
>
> > I'm wondering about these checks in the expansions of the builtins,
> > I think this is about inline expanding or emitting a libcall, right?
>
> Yeah.
>
> > I wonder if you should use optimize_function_for_speed (cfun) instead?
> > Usually -fno-inline shouldn't affect such calls, but -fno-builtin-FOO would.
> > I have no strong opinion here though.
>
> I've occasionally wondered whether builtins were the best framework for
> these semi-internal calls.
>
> > The new builtins seem undocumented - usually those are documented
> > within extend.texi
>
> Erhm...  Weird.  I had documentation for them.
>
> (checks)
>
> No, it's there, in extend.texi, right after __builtin_stack_address.
> It's admittedly a big patch :-/
>
> > I guess placing __builtin___strub_enter calls in the code manually
> > will break in interesting ways - if that's not supposed to happen the
> > trick is to embed a space in the name of the built-in.
>
> Yeah, I was a little torn between the choices here.  On the one hand, I
> needed visible symbols for the out-of-line implementations, so I figured
> that trying to hide the builtins wouldn't bring any advantage.
>
> However, I've also designed the builtins with interfaces that would
> avoid disruption even with explicit calls.  __strub_enter and
> __strub_update only initialize or adjust a pointer handed to them.
> __strub_leave will erase things from the top of the stack to the
> pointer, so if the watermark is "active stack", nothing happens, and
> things only get cleared if it points to "unused stack space".  There's
> potential for disruption if one passes a statically-allocated pointer to
> it, but nothing much different from memsetting that memory range, core
> wars-style.
>
> > -symtab_node::reset (void)
> > +symtab_node::reset (bool preserve_comdat_group)
>
> > not sure what for, I'll leave Honza to comment.
>
> This restores the possibility of getting the pre-PR107897 behavior, that
> the stru

Re: [PATCH] lower-bitint: Fix up -fnon-call-exceptions bit-field load lowering [PR112668]

2023-11-23 Thread Richard Biener
On Thu, Nov 23, 2023 at 12:55 PM Jakub Jelinek  wrote:
>
> On Thu, Nov 23, 2023 at 11:56:33AM +0100, Jakub Jelinek wrote:
> > Now, regarding m_init_gsi, I think I'll need to play around, maybe
> > I should have in the end insert after and update behavior rather than
> > insert after, and that could be achieved by adding
> > m_init_gsi = m_gsi;
> > gsi_prev (&m_init_gsi);
> > before the
> > m_gsi = save_gsi;
> > restore in all the 3 places and then no other updates of m_init_gsi would be
> > needed.  Except gsi_prev likely won't like the gsi_end_p (m_gsi) case,
> > so maybe
> > m_init_gsi = m_gsi;
> > if (gsi_end_p (m_init_gsi))
> >   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
> > else
> >   gsi_prev (&m_init_gsi);
> > m_gsi = save_gsi;
>
> That seems to work and I think it is the right thing to do.
>
> Here is an updated patch to do that.  Passed
> make check-gcc -j32 -k GCC_TEST_RUN_EXPENSIVE=1 
> RUNTESTFLAGS='GCC_TEST_RUN_EXPENSIVE=1 dg.exp=*bitint* dg-torture.exp=*bitint*'
> so far.

Looks a bit better.  As for constructing a gsi_end_p () iterator for a
basic-block, I'd simply add a new gsi_end_{bb,seq} ({basic_block,gimple_seq}).

Richard.

> 2023-11-23  Jakub Jelinek  
>
> PR middle-end/112668
> * gimple-lower-bitint.cc (bitint_large_huge::handle_cast): After
> temporarily adding statements after m_init_gsi, update m_init_gsi
> such that later additions after it will be after the added statements.
> (bitint_large_huge::handle_load): Likewise.  When splitting
> gsi_bb (m_init_gsi) basic block, update m_preheader_bb if needed
> and update saved m_gsi as well if needed.
>
> * gcc.dg/bitint-40.c: New test.
>
> --- gcc/gimple-lower-bitint.cc.jj   2023-11-22 22:55:14.260164718 +0100
> +++ gcc/gimple-lower-bitint.cc  2023-11-23 12:39:35.030364243 +0100
> @@ -1294,6 +1294,11 @@ bitint_large_huge::handle_cast (tree lhs
>   g = gimple_build_assign (n, RSHIFT_EXPR, t, lpm1);
>   insert_before (g);
>   m_data[save_data_cnt + 1] = add_cast (m_limb_type, n);
> + m_init_gsi = m_gsi;
> + if (gsi_end_p (m_init_gsi))
> +   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
> + else
> +   gsi_prev (&m_init_gsi);
>   m_gsi = save_gsi;
> }
>   else if (m_upwards_2limb * limb_prec < TYPE_PRECISION (rhs_type))
> @@ -1523,6 +1528,11 @@ bitint_large_huge::handle_cast (tree lhs
>   insert_before (g);
>   rext = add_cast (m_limb_type, gimple_assign_lhs (g));
> }
> + m_init_gsi = m_gsi;
> + if (gsi_end_p (m_init_gsi))
> +   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
> + else
> +   gsi_prev (&m_init_gsi);
>   m_gsi = save_gsi;
> }
>tree t;
> @@ -1687,9 +1697,27 @@ bitint_large_huge::handle_load (gimple *
>   edge e = split_block (gsi_bb (m_gsi), g);
>   make_edge (e->src, eh_edge->dest, EDGE_EH)->probability
> = profile_probability::very_unlikely ();
> - m_init_gsi.bb = e->dest;
> + m_gsi = gsi_after_labels (e->dest);
> + if (gsi_bb (save_gsi) == e->src)
> +   {
> + if (gsi_end_p (save_gsi))
> +   {
> + save_gsi = gsi_last_bb (e->dest);
> + if (!gsi_end_p (save_gsi))
> +   gsi_next (&save_gsi);
> +   }
> + else
> +   save_gsi = gsi_for_stmt (gsi_stmt (save_gsi));
> +   }
> + if (m_preheader_bb == e->src)
> +   m_preheader_bb = e->dest;
> }
> }
> + m_init_gsi = m_gsi;
> + if (gsi_end_p (m_init_gsi))
> +   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
> + else
> +   gsi_prev (&m_init_gsi);
>   m_gsi = save_gsi;
>   tree out;
>   prepare_data_in_out (iv, idx, &out);
> --- gcc/testsuite/gcc.dg/bitint-40.c.jj 2023-11-22 13:47:12.380580107 +0100
> +++ gcc/testsuite/gcc.dg/bitint-40.c2023-11-22 14:35:50.225842768 +0100
> @@ -0,0 +1,29 @@
> +/* PR middle-end/112668 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -fnon-call-exceptions" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 156
> +struct T156 { _BitInt(156) a : 2; unsigned _BitInt(156) b : 135; _BitInt(156) c : 2; };
> +extern void foo156 (struct T156 *);
> +
> +unsigned _BitInt(156)
> +bar156 (int i)
> +{
> +  struct T156 r156[12];
> +  foo156 (&r156[0]);
> +  return r156[i].b;
> +}
> +#endif
> +
> +#if __BITINT_MA

Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact

2023-11-23 Thread Xi Ruoyao
On Thu, 2023-11-23 at 17:14 +0800, chenglulu wrote:
> When I look at this code and compare it to our scalar implementation, it 
> seems
> 
> that our scalar implementation still lacks an "lround".

Should be "lroundeven".  We don't have an instruction for lround :(.

I tried this but it does not work:

-(define_int_iterator LRINT [UNSPEC_FTINT UNSPEC_FTINTRM UNSPEC_FTINTRP])
+(define_int_iterator LRINT
+  [UNSPEC_FTINT UNSPEC_FTINTRM UNSPEC_FTINTRP UNSPEC_FTINTRNE])
 (define_int_attr lrint_pattern [(UNSPEC_FTINT "lrint")
(UNSPEC_FTINTRM "lfloor")
-   (UNSPEC_FTINTRP "lceil")])
+   (UNSPEC_FTINTRP "lceil")
+   (UNSPEC_FTINTRNE "lroundeven")])
 (define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
(UNSPEC_FTINTRM "rm")
-   (UNSPEC_FTINTRP "rp")])
+   (UNSPEC_FTINTRP "rp")
+   (UNSPEC_FTINTRNE "rne")])

The problem is "lroundevenMN2" is not a standard pattern name.  The SIMD
version of ftintrne in patch 1 only works because we are expanding
"roundevenM2" (it's a standard pattern name) to UNSPEC_SIMD_FRINTRNE,
and then a define_insn can match (fix (UNSPEC_SIMD_FRINTRNE op)).  But
for non-SIMD we don't have roundevenM2.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] c++: Clear uninstantiated friend flag when instantiating [PR104234]

2023-11-23 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
access.

-- >8 --

Otherwise attempting to get the originating module declaration ICEs
because the DECL_CHAIN of an instantiated friend template is no longer
its context.

PR c++/104234
PR c++/112580

gcc/cp/ChangeLog:

* pt.cc (tsubst_template_decl):

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr104234.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/pt.cc|  2 ++
 gcc/testsuite/g++.dg/modules/pr104234.C | 16 
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr104234.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index ed681afb5d4..5e10a523e1a 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -14789,6 +14789,8 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t complain,
   if (PRIMARY_TEMPLATE_P (t))
 DECL_PRIMARY_TEMPLATE (r) = r;
 
+  DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (r) = false;
+
   if (!lambda_fntype && !class_p)
 {
   /* Record this non-type partial instantiation.  */
diff --git a/gcc/testsuite/g++.dg/modules/pr104234.C b/gcc/testsuite/g++.dg/modules/pr104234.C
new file mode 100644
index 000..d81f0d435bc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr104234.C
@@ -0,0 +1,16 @@
+// PR c++/104234
+// { dg-additional-options "-fmodules-ts" }
+
+template  struct _Node_handle_common {
+  template  friend class _Rb_tree;
+};
+struct _Hashtable {
+  using node_type = _Node_handle_common;
+  node_type __trans_tmp_1;
+};
+template  class _Rb_tree {
+  struct _Rb_tree_impl {
+_Rb_tree_impl();
+  } _M_impl;
+};
+_Rb_tree _M_tmap_;
-- 
2.42.0



[PATCH] RISC-V: Optimize a special case of VLA SLP

2023-11-23 Thread Juzhe-Zhong
While fixing zvl1024b bugs, I noticed a special VLA SLP case that
can be optimized better.

v = vec_perm (op1, op2, { nunits - 1, nunits, nunits + 1, ... })

Before this patch, we use the generic approach (vrgather):

vid
vadd.vx
vrgather
vmsgeu
vrgather

With this patch, we use vec_extract + slide1up:

scalar = vec_extract (last element of op1)
v = slide1up (op2, scalar)
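
To see why the two forms agree, here is a scalar C model with plain arrays standing in for vectors. The helper names `perm_reference` and `extract_slide1up` are made up for illustration and are not GCC or RVV APIs; the model assumes nunits is at least 1.

```c
#include <stddef.h>

/* Reference semantics of
   v = vec_perm (op1, op2, { nunits - 1, nunits, nunits + 1, ... }):
   indices below nunits select from op1, the rest select from op2.  */
void
perm_reference (const int *op1, const int *op2, int *v, size_t nunits)
{
  for (size_t i = 0; i < nunits; i++)
    {
      size_t sel = nunits - 1 + i;
      v[i] = sel < nunits ? op1[sel] : op2[sel - nunits];
    }
}

/* The same result computed as vec_extract + slide1up:
   take the last element of op1, then shift op2 up by one
   element and place the scalar in element 0.  */
void
extract_slide1up (const int *op1, const int *op2, int *v, size_t nunits)
{
  int scalar = op1[nunits - 1];
  v[0] = scalar;
  for (size_t i = 1; i < nunits; i++)
    v[i] = op2[i - 1];
}
```

Only element 0 comes from op1; every other element is op2 shifted by one position, which is what slide1up produces.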

I am going to run tests on zvl128b/zvl256b/zvl512b/zvl1024b for both RV32 and
RV64.

OK for trunk if those tests show no regressions?

PR target/112599

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_extract_and_slide1up_patterns):
(expand_vec_perm_const_1):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112599-2.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 33 
 .../gcc.target/riscv/rvv/autovec/pr112599-2.c | 51 +++
 2 files changed, 84 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112599-2.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 72b96d8339d..2ee1bf5191b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3232,6 +3232,37 @@ shuffle_bswap_pattern (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* Recognize the pattern that can be shuffled by vec_extract and slide1up
+   approach.  */
+
+static bool
+shuffle_extract_and_slide1up_patterns (struct expand_vec_perm_d *d)
+{
+  poly_uint64 nunits = GET_MODE_NUNITS (d->vmode);
+
+  /* Recognize { nunits - 1, nunits, nunits + 1, ... }.  */
+  if (!d->perm.series_p (0, 2, nunits - 1, 2)
+  || !d->perm.series_p (1, 2, nunits, 2))
+return false;
+
+  /* Success! */
+  if (d->testing_p)
+return true;
+
+  /* Extract the last element of the first vector.  */
+  scalar_mode smode = GET_MODE_INNER (d->vmode);
+  rtx tmp = gen_reg_rtx (smode);
+  emit_vec_extract (tmp, d->op0, nunits - 1);
+
+  /* Insert the scalar into element 0.  */
+  unsigned int unspec
+= FLOAT_MODE_P (d->vmode) ? UNSPEC_VFSLIDE1UP : UNSPEC_VSLIDE1UP;
+  insn_code icode = code_for_pred_slide (unspec, d->vmode);
+  rtx ops[] = {d->target, d->op1, tmp};
+  emit_vlmax_insn (icode, BINARY_OP, ops);
+  return true;
+}
+
 /* Recognize the pattern that can be shuffled by generic approach.  */
 
 static bool
@@ -3310,6 +3341,8 @@ expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
return true;
  if (shuffle_bswap_pattern (d))
return true;
+ if (shuffle_extract_and_slide1up_patterns (d))
+   return true;
  if (shuffle_generic_patterns (d))
return true;
  return false;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112599-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112599-2.c
new file mode 100644
index 000..fd87565b054
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112599-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl1024b -mabi=lp64d -O3" } */
+
+struct s { struct s *n; } *p;
+struct s ss;
+#define MAX 10
+struct s sss[MAX];
+int count = 0;
+
+int look( struct s *p, struct s **pp )
+{
+for ( ; p; p = p->n )
+;
+*pp = p;
+count++;
+return( 1 );
+}
+
+void sub( struct s *p, struct s **pp )
+{
+   for ( ; look( p, pp ); ) {
+if ( p )
+p = p->n;
+else
+break;
+   }
+}
+
+int
+foo(void)
+{
+struct s *pp;
+struct s *next;
+int i;
+
+p = &ss;
+next = p;
+for ( i = 0; i < MAX; i++ ) {
+next->n = &sss[i];
+next = next->n;
+}
+next->n = 0;
+
+sub( p, &pp );
+if (count != MAX+2)
+  __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not {vrgather} } } */
+/* { dg-final { scan-assembler-times {vslide1up\.vx} 1 } } */
-- 
2.36.3



Re: [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores

2023-11-23 Thread Jan-Benedict Glaw
On Thu, 2023-11-16 15:26:14 +, Christophe Lyon  
wrote:
> diff --git a/gcc/config/arm/arm-mve-builtins-functions.h 
> b/gcc/config/arm/arm-mve-builtins-functions.h
> index eba1f071af0..6d234a2dd7c 100644
> --- a/gcc/config/arm/arm-mve-builtins-functions.h
> +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> @@ -966,6 +966,62 @@ public:
[...]

> +class full_width_access : public multi_vector_function
> +{
> +public:
> +  CONSTEXPR full_width_access (unsigned int vectors_per_tuple = 1)
> +: multi_vector_function (vectors_per_tuple) {}
> +
> +  tree
> +  memory_scalar_type (const function_instance &fi) const override
> +  {
> +return fi.scalar_type (0);
> +  }
> +
> +  machine_mode
> +  memory_vector_mode (const function_instance &fi) const override
> +  {
> +machine_mode mode = fi.vector_mode (0);
> +/* Vectors of floating-point are managed in memory as vectors of
> +   integers.  */
> +switch (mode)
> +  {
> +  case E_V4SFmode:
> + mode = E_V4SImode;
> + break;
> +  case E_V8HFmode:
> + mode = E_V8HImode;
> + break;
> +  }

This introduces warnings about many enum values not being handled, so
a default would be good I think. (I do automated builds with
--enable-werror-always, see eg.
http://toolchain.lug-owl.de/laminar/log/gcc-arm-eabi/48)
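
A minimal sketch of the suggested fix, using a small hypothetical enum (`demo_mode`) in place of `machine_mode`; the function name is likewise invented for illustration. With `-Wswitch` (part of `-Wall`), a switch over an enum that has no `default` label warns about every enumerator it does not handle; an explicit `default` silences that.

```c
/* Hypothetical stand-in for machine_mode with a few extra values.  */
enum demo_mode
{
  DEMO_V4SF,
  DEMO_V8HF,
  DEMO_V4SI,
  DEMO_V8HI,
  DEMO_V16QI
};

/* Map floating-point vector modes to the integer mode of the same
   layout; every other mode is returned unchanged.  */
enum demo_mode
demo_memory_vector_mode (enum demo_mode mode)
{
  switch (mode)
    {
    case DEMO_V4SF:
      return DEMO_V4SI;
    case DEMO_V8HF:
      return DEMO_V8HI;
    default:
      /* Explicit default: states that the remaining modes are
	 intentionally left alone and avoids the -Wswitch warning.  */
      return mode;
    }
}
```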

Kind regards, JBG

-- 




Re: [PATCH] c++: Clear uninstantiated friend flag when instantiating [PR104234]

2023-11-23 Thread Nathaniel Shead
Sorry, I just noticed I hadn't actually filled in the changelog. It
should say "Clear DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P."

On Thu, Nov 23, 2023 at 11:54 PM Nathaniel Shead
 wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
> access.
>
> -- >8 --
>
> Otherwise attempting to get the originating module declaration ICEs
> because the DECL_CHAIN of an instantiated friend template is no longer
> its context.
>
> PR c++/104234
> PR c++/112580
>
> gcc/cp/ChangeLog:
>
> * pt.cc (tsubst_template_decl):
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/modules/pr104234.C: New test.
>
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/pt.cc|  2 ++
>  gcc/testsuite/g++.dg/modules/pr104234.C | 16 
>  2 files changed, 18 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/modules/pr104234.C
>
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index ed681afb5d4..5e10a523e1a 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -14789,6 +14789,8 @@ tsubst_template_decl (tree t, tree args, 
> tsubst_flags_t complain,
>if (PRIMARY_TEMPLATE_P (t))
>  DECL_PRIMARY_TEMPLATE (r) = r;
>
> +  DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (r) = false;
> +
>if (!lambda_fntype && !class_p)
>  {
>/* Record this non-type partial instantiation.  */
> diff --git a/gcc/testsuite/g++.dg/modules/pr104234.C 
> b/gcc/testsuite/g++.dg/modules/pr104234.C
> new file mode 100644
> index 000..d81f0d435bc
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/pr104234.C
> @@ -0,0 +1,16 @@
> +// PR c++/104234
> +// { dg-additional-options "-fmodules-ts" }
> +
> +template  struct _Node_handle_common {
> +  template  friend class _Rb_tree;
> +};
> +struct _Hashtable {
> +  using node_type = _Node_handle_common;
> +  node_type __trans_tmp_1;
> +};
> +template  class _Rb_tree {
> +  struct _Rb_tree_impl {
> +_Rb_tree_impl();
> +  } _M_impl;
> +};
> +_Rb_tree _M_tmap_;
> --
> 2.42.0
>


[PATCH] gcov: No atomic ops for -fprofile-update=single

2023-11-23 Thread Sebastian Huber
gcc/ChangeLog:

PR tree-optimization/112678

* tree-profile.cc (tree_profiling): Do not use atomic operations
for -fprofile-update=single.
---
 gcc/tree-profile.cc | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index 1ac0fdb3bc98..9c8fdb8b18f4 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -767,6 +767,7 @@ tree_profiling (void)
 = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
   bool have_atomic_8
 = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+  bool needs_split = gcov_type_size == 8 && !have_atomic_8 && have_atomic_4;
   if (!can_support_atomic)
 {
   if (gcov_type_size == 4)
@@ -775,6 +776,9 @@ tree_profiling (void)
can_support_atomic = have_atomic_8;
 }
 
+  if (flag_profile_update != PROFILE_UPDATE_SINGLE && needs_split)
+counter_update = COUNTER_UPDATE_ATOMIC_PARTIAL;
+
   if (flag_profile_update == PROFILE_UPDATE_ATOMIC
   && !can_support_atomic)
 {
@@ -788,13 +792,11 @@ tree_profiling (void)
 
   if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
 {
-  if (gcov_type_size == 8 && !have_atomic_8 && have_atomic_4)
+  if (needs_split)
counter_update = COUNTER_UPDATE_ATOMIC_SPLIT;
   else
counter_update = COUNTER_UPDATE_ATOMIC_BUILTIN;
 }
-  else if (gcov_type_size == 8 && have_atomic_4)
-counter_update = COUNTER_UPDATE_ATOMIC_PARTIAL;
 
   /* This is a small-ipa pass that gets called only once, from
  cgraphunit.cc:ipa_passes().  */
-- 
2.35.3



[PATCH] lower-bitint, v3: Fix up -fnon-call-exceptions bit-field load lowering [PR112668]

2023-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2023 at 01:10:02PM +0100, Richard Biener wrote:
> Looks a bit better.  As for constructing a gsi_end_p () iterator for a
> basic-block
> I'd simply add a new gsi_end_{bb,seq} ({basic_block,gimple_seq}).

Ok, here it is (just used gsi_end without _seq suffix for gimple_seq &
because it is then consistent with gsi_start/gsi_last etc.).

2023-11-23  Jakub Jelinek  

PR middle-end/112668
* gimple-iterator.h (gsi_end, gsi_end_bb): New inline functions.
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): After
temporarily adding statements after m_init_gsi, update m_init_gsi
such that later additions after it will be after the added statements.
(bitint_large_huge::handle_load): Likewise.  When splitting
gsi_bb (m_init_gsi) basic block, update m_preheader_bb if needed
and update saved m_gsi as well if needed.
(bitint_large_huge::lower_mergeable_stmt,
bitint_large_huge::lower_comparison_stmt,
bitint_large_huge::lower_mul_overflow,
bitint_large_huge::lower_bit_query): Use gsi_end_bb.

* gcc.dg/bitint-40.c: New test.

--- gcc/gimple-iterator.h.jj2023-04-22 10:23:40.628612517 +0200
+++ gcc/gimple-iterator.h   2023-11-23 14:46:28.371861488 +0100
@@ -169,6 +169,41 @@ gsi_last_bb (basic_block bb)
   return i;
 }
 
+/* Return a new iterator pointing to before the first statement or after
+   last statement (depending on whether adding statements after it or before it)
+   in a GIMPLE_SEQ.  */
+
+inline gimple_stmt_iterator
+gsi_end (gimple_seq &seq)
+{
+  gimple_stmt_iterator i;
+  gimple *g = gimple_seq_last (seq);
+
+  i.ptr = NULL;
+  i.seq = &seq;
+  i.bb = g ? gimple_bb (g) : NULL;
+
+  return i;
+}
+
+/* Return a new iterator pointing to before the first statement or after
+   last statement (depending on whether adding statements after it or before it)
+   in basic block BB.  */
+
+inline gimple_stmt_iterator
+gsi_end_bb (basic_block bb)
+{
+  gimple_stmt_iterator i;
+  gimple_seq *seq;
+
+  seq = bb_seq_addr (bb);
+  i.ptr = NULL;
+  i.seq = seq;
+  i.bb = bb;
+
+  return i;
+}
+
 /* Return true if I is at the end of its sequence.  */
 
 inline bool
--- gcc/gimple-lower-bitint.cc.jj   2023-11-23 12:55:16.967225422 +0100
+++ gcc/gimple-lower-bitint.cc  2023-11-23 14:30:02.830662509 +0100
@@ -1294,6 +1294,11 @@ bitint_large_huge::handle_cast (tree lhs
  g = gimple_build_assign (n, RSHIFT_EXPR, t, lpm1);
  insert_before (g);
  m_data[save_data_cnt + 1] = add_cast (m_limb_type, n);
+ m_init_gsi = m_gsi;
+ if (gsi_end_p (m_init_gsi))
+   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
+ else
+   gsi_prev (&m_init_gsi);
  m_gsi = save_gsi;
}
  else if (m_upwards_2limb * limb_prec < TYPE_PRECISION (rhs_type))
@@ -1523,6 +1528,11 @@ bitint_large_huge::handle_cast (tree lhs
  insert_before (g);
  rext = add_cast (m_limb_type, gimple_assign_lhs (g));
}
+ m_init_gsi = m_gsi;
+ if (gsi_end_p (m_init_gsi))
+   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
+ else
+   gsi_prev (&m_init_gsi);
  m_gsi = save_gsi;
}
   tree t;
@@ -1687,9 +1697,23 @@ bitint_large_huge::handle_load (gimple *
  edge e = split_block (gsi_bb (m_gsi), g);
  make_edge (e->src, eh_edge->dest, EDGE_EH)->probability
= profile_probability::very_unlikely ();
- m_init_gsi.bb = e->dest;
+ m_gsi = gsi_after_labels (e->dest);
+ if (gsi_bb (save_gsi) == e->src)
+   {
+ if (gsi_end_p (save_gsi))
+   save_gsi = gsi_end_bb (e->dest);
+ else
+   save_gsi = gsi_for_stmt (gsi_stmt (save_gsi));
+   }
+ if (m_preheader_bb == e->src)
+   m_preheader_bb = e->dest;
}
}
+ m_init_gsi = m_gsi;
+ if (gsi_end_p (m_init_gsi))
+   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
+ else
+   gsi_prev (&m_init_gsi);
  m_gsi = save_gsi;
  tree out;
  prepare_data_in_out (iv, idx, &out);
@@ -2359,11 +2383,7 @@ bitint_large_huge::lower_mergeable_stmt
   edge e = split_block (gsi_bb (gsi), gsi_stmt (gsi));
   edge_bb = e->src;
   if (kind == bitint_prec_large)
-   {
- m_gsi = gsi_last_bb (edge_bb);
- if (!gsi_end_p (m_gsi))
-   gsi_next (&m_gsi);
-   }
+   m_gsi = gsi_end_bb (edge_bb);
 }
   else
 m_after_stmt = stmt;
@@ -2816,9 +2836,7 @@ bitint_large_huge::lower_comparison_stmt
   gsi_prev (&gsi);
   edge e = split_block (gsi_bb (gsi), gsi_stmt (gsi

Re: [PATCH V2] introduce light expander sra

2023-11-23 Thread Richard Biener
On Fri, Oct 27, 2023 at 3:51 AM Jiufu Guo  wrote:
>
> Hi,
>
> Compare with previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632399.html
> This version supports TI/VEC mode of the access.
>
> There are a few PRs (meta-bug PR101926) on various targets.
> The root causes of them are similar: the aggregate param/
> returns are passed by multi-registers, but they are stored
> to stack from registers first; and then, access the
> parameter through stack slot.
>
> A general idea to enhance this: accessing the aggregate
> parameters/returns directly through incoming/outgoing
> scalar registers.  This idea would be a kind of SRA.
>
> This experimental patch for light-expander-sra contains
> below parts:
>
> a. Check if the parameters/returns are ok/profitable to
>scalarize, and set the incoming/outgoing registers(
>pseudos) for the parameter/return.
>   - This is done in "expand_function_start", after the
> incoming/outgoing hard registers are determined for the
> parameter/return.
> The scalarized registers are recorded in DECL_RTL for
> the parameter/return in parallel form.
>   - At the time when setting DECL_RTL, "scalarizable_aggregate"
> is called to check the accesses are ok/profitable to
> scalarize.
> We can continue to enhance this function, to support
> more cases.  For example:
> - 'reverse storage order'.
> - 'writing to parameter'/'overlap accesses'.
>
> b. When expanding the accesses of the parameters/returns,
>according to the info of the access(e.g. bitpos,bitsize,
>mode), the scalar(pseudos) can be figured out to expand
>the access.  This may happen when expand below accesses:
>   - The component access of a parameter: "_1 = arg.f1".
> Or whole parameter access: rhs of "_2 = arg"
>   - The assignment to a return val:
> "D.xx = yy; or D.xx.f = zz" where D.xx occurs on return
> stmt.
>   - This is mainly done in expr.cc(expand_expr_real_1, and
> expand_assignment).  Function "extract_sub_member" is
> used to figure out the scalar rtxs(pseudos).
>
> Besides the above two parts, some work are done in the GIMPLE
> tree:  collect sra candidates for parameters/returns, and
> collect the SRA access info.
> This is mainly done at the beginning of the expander pass.
> Below are two major items of this part.
>  - Collect light-expand-sra candidates.
>   Each parameter is checked if it has the proper aggregate
>   type.  Collect return val (VAR_P) on each return stmts if
>   the function is returning via registers.
>   This is implemented in expand_sra::collect_sra_candidates.
>
>  - Build/collect/manage all the access on the candidates.
>   The function "scan_function" is used to do this work, it
>   goes through all basicblocks, and all interesting stmts (
>   phi, return, assign, call, asm) are checked.
>   If there is an interesting expression (e.g. COMPONENT_REF
>   or PARM_DECL), then record the required info for the access
>   (e.g. pos, size, type, base).
>   And if it is risky to do SRA, the candidates may be removed.
>   e.g. address-taken and accessed via memory.
>   "foo(struct S arg) {bar (&arg);}"
>
> This patch is tested on ppc64{,le} and x86_64.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
> PR target/65421
>
> gcc/ChangeLog:
>
> * cfgexpand.cc (struct access): New class.
> (struct expand_sra): New class.
> (expand_sra::collect_sra_candidates): New member function.
> (expand_sra::add_sra_candidate): Likewise.
> (expand_sra::build_access): Likewise.
> (expand_sra::analyze_phi): Likewise.
> (expand_sra::analyze_assign): Likewise.
> (expand_sra::visit_base): Likewise.
> (expand_sra::protect_mem_access_in_stmt): Likewise.
> (expand_sra::expand_sra):  Class constructor.
> (expand_sra::~expand_sra): Class destructor.
> (expand_sra::scalarizable_access):  New member function.
> (expand_sra::scalarizable_accesses):  Likewise.
> (scalarizable_aggregate):  New function.
> (set_scalar_rtx_for_returns):  New function.
> (expand_value_return): Updated.
> (expand_debug_expr): Updated.
> (pass_expand::execute): Updated to use expand_sra.
> * cfgexpand.h (scalarizable_aggregate): New declare.
> (set_scalar_rtx_for_returns): New declare.
> * expr.cc (expand_assignment): Updated.
> (expand_constructor): Updated.
> (query_position_in_parallel): New function.
> (extract_sub_member): New function.
> (expand_expr_real_1): Updated.
> * expr.h (query_position_in_parallel): New declare.
> * function.cc (assign_parm_setup_block): Updated.
> (assign_parms): Updated.
> (expand_function_start): Updated.
> * tree-sra.h (struct sra_base_access): New class.
> (struct sra_default_analyzer): New class.
> (scan_function): New template function.
> * va

Re: [PATCH] libgcc: mark __hardcfr_check_fail as always_inline

2023-11-23 Thread Richard Biener
On Wed, Nov 22, 2023 at 3:39 PM Jose E. Marchesi
 wrote:
>
> The function __hardcfr_check_fail in hardcfr.c is internal and static
> inline.  It receives many arguments, which require more than five
> registers to be passed in bpf-none-unknown targets.  BPF is limited to
> that number of registers to pass arguments, and therefore libgcc fails
> to build in that target.  This patch marks the function with the
> always_inline attribute, fixing the bpf build.
>
> Tested in bpf-unknown-none target and x86_64-linux-gnu host.
>
> libgcc/ChangeLog:
>
> * hardcfr.c (__hardcfr_check_fail): Mark as always_inline.
> ---
>  libgcc/hardcfr.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/libgcc/hardcfr.c b/libgcc/hardcfr.c
> index 25ff06742cb..48a87a5a87a 100644
> --- a/libgcc/hardcfr.c
> +++ b/libgcc/hardcfr.c
> @@ -206,7 +206,8 @@ __hardcfr_debug_cfg (size_t const blocks,
> enabled, it also forces __hardcfr_debug_cfg (above) to be compiled into an
> out-of-line function, that could be called from a debugger.
> */
> -static inline void
> +
> +static inline  __attribute__((__always_inline__)) void

can we gate this with

#ifdef __BPF

or so?
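
A sketch of what such gating could look like. It assumes the BPF target predefines a macro named `__BPF__` (the exact spelling should be checked against the backend), and `DEMO_CHECK_FAIL_INLINE`, `demo_check_fail` and `demo_caller` are invented names, not part of hardcfr.c.

```c
/* Force inlining only where the target cannot pass this many
   arguments in registers; elsewhere leave the decision to the
   compiler as before.  */
#ifdef __BPF__
# define DEMO_CHECK_FAIL_INLINE \
  static inline __attribute__ ((__always_inline__))
#else
# define DEMO_CHECK_FAIL_INLINE static inline
#endif

/* Stand-in for a helper with more than five arguments.  */
DEMO_CHECK_FAIL_INLINE int
demo_check_fail (int blocks, int visited, int cfg,
		 int block, int part, int caller)
{
  return blocks + visited + cfg + block + part + caller;
}

/* A caller in the same translation unit, so the helper can be
   inlined into it.  */
int
demo_caller (void)
{
  return demo_check_fail (1, 2, 3, 4, 5, 6);
}
```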

>  __hardcfr_check_fail (size_t const blocks ATTRIBUTE_UNUSED,
>   vword const *const visited ATTRIBUTE_UNUSED,
>   vword const *const cfg ATTRIBUTE_UNUSED,
> --
> 2.30.2
>


[Patch] OpenMP: Accept argument to depobj's destroy clause

2023-11-23 Thread Tobias Burnus

I stumbled over this trivial omission which blocks some testcases.

I am not sure whether I have solved the is-same-expr most elegantly,
but I did loosely follow the duplicated-entry check for 'map'. As that's
a restriction to the user, we don't have to catch all and I hope the code
catches the most important violations, doesn't ICE and does not reject
valid code. At least for all real-world code it should™ work, but I
guess for lvalue expressions involving function calls it probably doesn't.

Thoughts, comments?

Tobias

PS: GCC accepts an lvalue expression in C/C++ and only an identifier
for a scalar variable in Fortran, i.e. neither array elements nor
structure components.

Which variant is right depends on whether one reads OpenMP 5.1 (lvalue expr,
scalar variable) or 5.2 (variable without permitting array sections or
structure components) - whereas TR12 has the same but talks about
locator list items in one restriction. For the OpenMP mess, see spec
issue #3739.
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company; Managing directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Commercial register:
München, HRB 106955
OpenMP: Accept argument to depobj's destroy clause

Since OpenMP 5.2, the destroy clause takes a depend object as its argument;
for the depobj directive, the new argument is optional but, if present,
it must be identical to the directive's argument.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_depobj): Accept optionally an argument
	to the destroy clause.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_depobj): Accept optionally an argument
	to the destroy clause.

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_depobj): Accept optionally an argument
	to the destroy clause.

libgomp/ChangeLog:

	* libgomp.texi (5.2 Impl. Status): An argument to the destroy clause
	is now supported.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/depobj-3.c: New test.
	* gfortran.dg/gomp/depobj-3.f90: New test.

 gcc/c/c-parser.cc   | 57 ++-
 gcc/cp/parser.cc| 60 -
 gcc/fortran/openmp.cc   | 15 +++-
 gcc/testsuite/c-c++-common/gomp/depobj-3.c  | 40 +++
 gcc/testsuite/gfortran.dg/gomp/depobj-3.f90 | 18 +
 libgomp/libgomp.texi|  2 +-
 6 files changed, 188 insertions(+), 4 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 371dd29557b..378647c1a67 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -21605,6 +21605,9 @@ c_parser_omp_critical (location_t loc, c_parser *parser, bool *if_p)
  destroy
  update (dependence-type)
 
+   OpenMP 5.2 additionally:
+ destroy ( depobj )
+
dependence-type:
  in
  out
@@ -21663,7 +21666,59 @@ c_parser_omp_depobj (c_parser *parser)
 	clause = error_mark_node;
 	}
   else if (!strcmp ("destroy", p))
-	kind = OMP_CLAUSE_DEPEND_LAST;
+	{
+	  matching_parens c_parens;
+	  kind = OMP_CLAUSE_DEPEND_LAST;
+	  if (c_parser_next_token_is (parser, CPP_OPEN_PAREN)
+	  && c_parens.require_open (parser))
+	{
+	  tree destobj = c_parser_expr_no_commas (parser, NULL).value;
+	  /* OpenMP requires that the two expressions are identical; catch
+		 the most common mismatches.  */
+	  if (!lvalue_p (destobj))
+		error_at (EXPR_LOC_OR_LOC (destobj, c_loc),
+			  "% expression is not lvalue expression");
+	  else if (depobj != error_mark_node)
+		{
+		  tree t = depobj;
+		  tree t2 = build_unary_op (EXPR_LOC_OR_LOC (destobj, c_loc),
+	ADDR_EXPR, destobj, false);
+		  if (t2 != error_mark_node)
+		t2 = build_indirect_ref (EXPR_LOC_OR_LOC (t2, c_loc),
+	 t2, RO_UNARY_STAR);
+		  while (TREE_CODE (t) == COMPONENT_REF
+			 || TREE_CODE (t) == ARRAY_REF)
+{
+		  t = TREE_OPERAND (t, 0);
+		  if (TREE_CODE (t) == MEM_REF || INDIRECT_REF_P (t))
+			{
+			  t = TREE_OPERAND (t, 0);
+			  STRIP_NOPS (t);
+			  if (TREE_CODE (t) == POINTER_PLUS_EXPR)
+			t = TREE_OPERAND (t, 0);
+}
+		}
+		  while (TREE_CODE (t2) == COMPONENT_REF
+			 || TREE_CODE (t2) == ARRAY_REF)
+{
+		  t2 = TREE_OPERAND (t2, 0);
+		  if (TREE_CODE (t2) == MEM_REF || INDIRECT_REF_P (t2))
+			{
+			  t2 = TREE_OPERAND (t2, 0);
+			  STRIP_NOPS (t2);
+			  if (TREE_CODE (t2) == POINTER_PLUS_EXPR)
+			t2 = TREE_OPERAND (t2, 0);
+}
+		}
+		  if (DECL_UID (t) != DECL_UID (t2))
+		error_at (EXPR_LOC_OR_LOC (destobj, c_loc),
+			  "the % expression %qE must be the same "
+			  "as the % argument %qE",
+			  destobj, depobj);
+		}
+	  c_parens.skip_until_found_close (parser);
+	}
+	}
   else if (!strcmp ("update", p))
 	{
 	  matching_parens c_parens;
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f6d088bc7

Re: [PATCH] gcov: No atomic ops for -fprofile-update=single

2023-11-23 Thread Richard Biener
On Thu, Nov 23, 2023 at 2:47 PM Sebastian Huber
 wrote:
>
> gcc/ChangeLog:
> PR tree-optimization/112678
>
> * tree-profile.cc (tree_profiling): Do not use atomic operations
> for -fprofile-update=single.
> ---
>  gcc/tree-profile.cc | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index 1ac0fdb3bc98..9c8fdb8b18f4 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -767,6 +767,7 @@ tree_profiling (void)
>  = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
>bool have_atomic_8
>  = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
> +  bool needs_split = gcov_type_size == 8 && !have_atomic_8 && have_atomic_4;
>if (!can_support_atomic)
>  {
>if (gcov_type_size == 4)
> @@ -775,6 +776,9 @@ tree_profiling (void)
> can_support_atomic = have_atomic_8;
>  }
>
> +  if (flag_profile_update != PROFILE_UPDATE_SINGLE && needs_split)
> +counter_update = COUNTER_UPDATE_ATOMIC_PARTIAL;
> +

I wonder if it's cleaner to set can_support_atomic when we can support
it with splitting instead, avoiding a != PROFILE_UPDATE_SINGLE check
here?

Otherwise looks OK.

Richard.

>if (flag_profile_update == PROFILE_UPDATE_ATOMIC
>&& !can_support_atomic)
>  {
> @@ -788,13 +792,11 @@ tree_profiling (void)
>
>if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
>  {
> -  if (gcov_type_size == 8 && !have_atomic_8 && have_atomic_4)
> +  if (needs_split)
> counter_update = COUNTER_UPDATE_ATOMIC_SPLIT;
>else
> counter_update = COUNTER_UPDATE_ATOMIC_BUILTIN;
>  }
> -  else if (gcov_type_size == 8 && have_atomic_4)
> -counter_update = COUNTER_UPDATE_ATOMIC_PARTIAL;
>
>/* This is a small-ipa pass that gets called only once, from
>   cgraphunit.cc:ipa_passes().  */
> --
> 2.35.3
>


Re: [PATCH] lower-bitint, v3: Fix up -fnon-call-exceptions bit-field load lowering [PR112668]

2023-11-23 Thread Richard Biener
On Thu, Nov 23, 2023 at 2:56 PM Jakub Jelinek  wrote:
>
> On Thu, Nov 23, 2023 at 01:10:02PM +0100, Richard Biener wrote:
> > Looks a bit better.  As for constructing a gsi_end_p () iterator for a
> > basic-block
> > I'd simply add a new gsi_end_{bb,seq} ({basic_block,gimple_seq}).
>
> Ok, here it is (just used gsi_end without _seq suffix for gimple_seq &
> because it is then consistent with gsi_start/gsi_last etc.).

OK.  It occurs to me that start + last and now end is somewhat
inconsistent and it should have been first + last, but well ...

Richard.

> 2023-11-23  Jakub Jelinek  
>
> PR middle-end/112668
> * gimple-iterator.h (gsi_end, gsi_end_bb): New inline functions.
> * gimple-lower-bitint.cc (bitint_large_huge::handle_cast): After
> temporarily adding statements after m_init_gsi, update m_init_gsi
> such that later additions after it will be after the added statements.
> (bitint_large_huge::handle_load): Likewise.  When splitting
> gsi_bb (m_init_gsi) basic block, update m_preheader_bb if needed
> and update saved m_gsi as well if needed.
> (bitint_large_huge::lower_mergeable_stmt,
> bitint_large_huge::lower_comparison_stmt,
> bitint_large_huge::lower_mul_overflow,
> bitint_large_huge::lower_bit_query): Use gsi_end_bb.
>
> * gcc.dg/bitint-40.c: New test.
>
> --- gcc/gimple-iterator.h.jj2023-04-22 10:23:40.628612517 +0200
> +++ gcc/gimple-iterator.h   2023-11-23 14:46:28.371861488 +0100
> @@ -169,6 +169,41 @@ gsi_last_bb (basic_block bb)
>return i;
>  }
>
> +/* Return a new iterator pointing to before the first statement or after
> +   last statement (depending on whether adding statements after it or before 
> it)
> +   in a GIMPLE_SEQ.  */
> +
> +inline gimple_stmt_iterator
> +gsi_end (gimple_seq &seq)
> +{
> +  gimple_stmt_iterator i;
> +  gimple *g = gimple_seq_last (seq);
> +
> +  i.ptr = NULL;
> +  i.seq = &seq;
> +  i.bb = g ? gimple_bb (g) : NULL;
> +
> +  return i;
> +}
> +
> +/* Return a new iterator pointing to before the first statement or after
> +   last statement (depending on whether adding statements after it or before 
> it)
> +   in basic block BB.  */
> +
> +inline gimple_stmt_iterator
> +gsi_end_bb (basic_block bb)
> +{
> +  gimple_stmt_iterator i;
> +  gimple_seq *seq;
> +
> +  seq = bb_seq_addr (bb);
> +  i.ptr = NULL;
> +  i.seq = seq;
> +  i.bb = bb;
> +
> +  return i;
> +}
> +
>  /* Return true if I is at the end of its sequence.  */
>
>  inline bool
> --- gcc/gimple-lower-bitint.cc.jj   2023-11-23 12:55:16.967225422 +0100
> +++ gcc/gimple-lower-bitint.cc  2023-11-23 14:30:02.830662509 +0100
> @@ -1294,6 +1294,11 @@ bitint_large_huge::handle_cast (tree lhs
>   g = gimple_build_assign (n, RSHIFT_EXPR, t, lpm1);
>   insert_before (g);
>   m_data[save_data_cnt + 1] = add_cast (m_limb_type, n);
> + m_init_gsi = m_gsi;
> + if (gsi_end_p (m_init_gsi))
> +   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
> + else
> +   gsi_prev (&m_init_gsi);
>   m_gsi = save_gsi;
> }
>   else if (m_upwards_2limb * limb_prec < TYPE_PRECISION (rhs_type))
> @@ -1523,6 +1528,11 @@ bitint_large_huge::handle_cast (tree lhs
>   insert_before (g);
>   rext = add_cast (m_limb_type, gimple_assign_lhs (g));
> }
> + m_init_gsi = m_gsi;
> + if (gsi_end_p (m_init_gsi))
> +   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
> + else
> +   gsi_prev (&m_init_gsi);
>   m_gsi = save_gsi;
> }
>tree t;
> @@ -1687,9 +1697,23 @@ bitint_large_huge::handle_load (gimple *
>   edge e = split_block (gsi_bb (m_gsi), g);
>   make_edge (e->src, eh_edge->dest, EDGE_EH)->probability
> = profile_probability::very_unlikely ();
> - m_init_gsi.bb = e->dest;
> + m_gsi = gsi_after_labels (e->dest);
> + if (gsi_bb (save_gsi) == e->src)
> +   {
> + if (gsi_end_p (save_gsi))
> +   save_gsi = gsi_end_bb (e->dest);
> + else
> +   save_gsi = gsi_for_stmt (gsi_stmt (save_gsi));
> +   }
> + if (m_preheader_bb == e->src)
> +   m_preheader_bb = e->dest;
> }
> }
> + m_init_gsi = m_gsi;
> + if (gsi_end_p (m_init_gsi))
> +   m_init_gsi = gsi_last_bb (gsi_bb (m_init_gsi));
> + else
> +   gsi_prev (&m_init_gsi);
>   m_gsi = save_gsi;
>   tree out;
>   prepare_data_in_out (iv, idx, &out);
> @@ -2359,11 +2383,7 @@ bitint_large_huge::lower_mergeable_stmt
>edge e = split_block (gs

Re: [PATCH] gcov: No atomic ops for -fprofile-update=single

2023-11-23 Thread Sebastian Huber

On 23.11.23 15:19, Richard Biener wrote:

On Thu, Nov 23, 2023 at 2:47 PM Sebastian Huber
  wrote:

gcc/ChangeLog:
 PR tree-optimization/112678

 * tree-profile.cc (tree_profiling): Do not use atomic operations
 for -fprofile-update=single.
---
  gcc/tree-profile.cc | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index 1ac0fdb3bc98..9c8fdb8b18f4 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -767,6 +767,7 @@ tree_profiling (void)
  = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
bool have_atomic_8
  = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+  bool needs_split = gcov_type_size == 8 && !have_atomic_8 && have_atomic_4;
if (!can_support_atomic)
  {
if (gcov_type_size == 4)
@@ -775,6 +776,9 @@ tree_profiling (void)
 can_support_atomic = have_atomic_8;
  }

+  if (flag_profile_update != PROFILE_UPDATE_SINGLE && needs_split)
+counter_update = COUNTER_UPDATE_ATOMIC_PARTIAL;
+

I wonder if it's cleaner to set can_support_atomic when we can support
it with splitting instead, avoiding a != PROFILE_UPDATE_SINGLE check
here?


The bug was that counter_update was set to COUNTER_UPDATE_ATOMIC_PARTIAL 
for -fprofile-update=single. I don't think we can get rid of the 
flag_profile_update != PROFILE_UPDATE_SINGLE without changing this whole 
code block considerably.
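
To make the behavior change concrete, here is a simplified C model of the selection logic as it appears in the hunks above. It is a sketch, not the real code: the enum and function names are invented, `flag` stands for the value of flag_profile_update at the checks shown in the diff, the fallback between the hunks (atomic requested but unsupported) is omitted, and the initial counter kind is assumed to be the single-threaded one.

```c
#include <stdbool.h>

enum demo_profile_update { DEMO_UPDATE_SINGLE, DEMO_UPDATE_ATOMIC };

enum demo_counter_update
{
  DEMO_COUNTER_SINGLE_THREAD,
  DEMO_COUNTER_ATOMIC_SPLIT,
  DEMO_COUNTER_ATOMIC_BUILTIN,
  DEMO_COUNTER_ATOMIC_PARTIAL
};

/* Logic before the patch: the else-branch picks the partial atomic
   update even for -fprofile-update=single.  */
enum demo_counter_update
demo_before (enum demo_profile_update flag, int gcov_type_size,
	     bool have_atomic_4, bool have_atomic_8)
{
  if (flag == DEMO_UPDATE_ATOMIC)
    {
      if (gcov_type_size == 8 && !have_atomic_8 && have_atomic_4)
	return DEMO_COUNTER_ATOMIC_SPLIT;
      return DEMO_COUNTER_ATOMIC_BUILTIN;
    }
  else if (gcov_type_size == 8 && have_atomic_4)
    return DEMO_COUNTER_ATOMIC_PARTIAL;
  return DEMO_COUNTER_SINGLE_THREAD;
}

/* Logic after the patch: the partial update is only considered when
   the update mode is not "single".  */
enum demo_counter_update
demo_after (enum demo_profile_update flag, int gcov_type_size,
	    bool have_atomic_4, bool have_atomic_8)
{
  bool needs_split
    = gcov_type_size == 8 && !have_atomic_8 && have_atomic_4;
  enum demo_counter_update counter_update = DEMO_COUNTER_SINGLE_THREAD;

  if (flag != DEMO_UPDATE_SINGLE && needs_split)
    counter_update = DEMO_COUNTER_ATOMIC_PARTIAL;

  if (flag == DEMO_UPDATE_ATOMIC)
    counter_update = needs_split ? DEMO_COUNTER_ATOMIC_SPLIT
				 : DEMO_COUNTER_ATOMIC_BUILTIN;
  return counter_update;
}
```

In this model, a target with 8-byte counters and only 4-byte atomics gets the partial atomic update under "single" before the patch and the plain single-threaded update after it.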


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Court of registration: Amtsgericht München
Registration number: HRB 157899
Managing directors authorized to represent the company: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/


Re: [Patch] OpenMP: Accept argument to depobj's destroy clause

2023-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2023 at 03:21:41PM +0100, Tobias Burnus wrote:
> I stumbled over this trivial omission which blocks some testcases.
> 
> I am not sure whether I have solved the is-same-expr most elegantly,
> but I did loosely follow the duplicated-entry check for 'map'. As that's
> a restriction to the user, we don't have to catch all and I hope the code
> catches the most important violations, doesn't ICE and does not reject
> valid code. At least for all real-world code it should™ work, but I
> guess for lvalue expressions involving function calls it probably doesn't.
> 
> Thoughts, comments?
> 
> Tobias
> 
> PS: GCC accepts an lvalue expression in C/C++ and only a identifier
> for a scalar variable in Fortran, i.e. neither array elements nor
> structure components.
> 
> Which variant is right depends whether one reads OpenMP 5.1 (lvalue expr,
> scalar variable) or 5.2 (variable without permitting array sections or
> structure components) - whereas TR12 has the same but talks about
> locator list items in one restriction. For the OpenMP mess, see spec
> issue #3739.
> -
> Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
> München; limited liability company; Managing directors: Thomas
> Heurung, Frank Thürauf; Registered office: München; Court of registration:
> München, HRB 106955

> OpenMP: Accept argument to depobj's destroy clause
> 
> Since OpenMP 5.2, the destroy clause takes a depend object as its argument;
> for the depobj directive, the new argument is optional but, if present,
> it must be identical to the directive's argument.
> 
> gcc/c/ChangeLog:
> 
>   * c-parser.cc (c_parser_omp_depobj): Accept optionally an argument
>   to the destroy clause.
> 
> gcc/cp/ChangeLog:
> 
>   * parser.cc (cp_parser_omp_depobj): Accept optionally an argument
>   to the destroy clause.
> 
> gcc/fortran/ChangeLog:
> 
>   * openmp.cc (gfc_match_omp_depobj): Accept optionally an argument
>   to the destroy clause.
> 
> libgomp/ChangeLog:
> 
>   * libgomp.texi (5.2 Impl. Status): An argument to the destroy clause
>   is now supported.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/gomp/depobj-3.c: New test.
>   * gfortran.dg/gomp/depobj-3.f90: New test.
> 
>  gcc/c/c-parser.cc   | 57 ++-
>  gcc/cp/parser.cc| 60 
> -
>  gcc/fortran/openmp.cc   | 15 +++-
>  gcc/testsuite/c-c++-common/gomp/depobj-3.c  | 40 +++
>  gcc/testsuite/gfortran.dg/gomp/depobj-3.f90 | 18 +
>  libgomp/libgomp.texi|  2 +-
>  6 files changed, 188 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> index 371dd29557b..378647c1a67 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -21605,6 +21605,9 @@ c_parser_omp_critical (location_t loc, c_parser 
> *parser, bool *if_p)
>   destroy
>   update (dependence-type)
>  
> +   OpenMP 5.2 additionally:
> + destroy ( depobj )
> +
> dependence-type:
>   in
>   out
> @@ -21663,7 +21666,59 @@ c_parser_omp_depobj (c_parser *parser)
>   clause = error_mark_node;
>   }
>else if (!strcmp ("destroy", p))
> - kind = OMP_CLAUSE_DEPEND_LAST;
> + {
> +   matching_parens c_parens;
> +   kind = OMP_CLAUSE_DEPEND_LAST;
> +   if (c_parser_next_token_is (parser, CPP_OPEN_PAREN)
> +   && c_parens.require_open (parser))
> + {
> +   tree destobj = c_parser_expr_no_commas (parser, NULL).value;
> +   /* OpenMP requires that the two expressions are identical; catch
> +  the most common mismatches.  */
> +   if (!lvalue_p (destobj))
> + error_at (EXPR_LOC_OR_LOC (destobj, c_loc),
> +   "% expression is not lvalue expression");
> +   else if (depobj != error_mark_node)
> + {
> +   tree t = depobj;
> +   tree t2 = build_unary_op (EXPR_LOC_OR_LOC (destobj, c_loc),
> + ADDR_EXPR, destobj, false);
> +   if (t2 != error_mark_node)
> + t2 = build_indirect_ref (EXPR_LOC_OR_LOC (t2, c_loc),
> +  t2, RO_UNARY_STAR);

Please watch indentation, seems there is a mix of 8 spaces vs. tabs:

> +   while (TREE_CODE (t) == COMPONENT_REF
> +  || TREE_CODE (t) == ARRAY_REF)
> +{
> +   t = TREE_OPERAND (t, 0);
> +   if (TREE_CODE (t) == MEM_REF || INDIRECT_REF_P (t))
> + {
> +   t = TREE_OPERAND (t, 0);
> +   STRIP_NOPS (t);
> +   if (TREE_CODE (t) == POINTER_PLUS_EXPR)
> + t = TREE_OPERAND (t, 0);
> +}
> + }
> +  

Re: [PATCH] s390: Fix ICE in testcase pr89233

2023-11-23 Thread Andreas Krebbel
On 11/15/23 14:12, Juergen Christ wrote:
> When using GNU vector extensions, an access outside of the vector size
> caused an ICE on s390.  Fix this by aligning with the vec_extract
> builtin, i.e., computing constant index modulo number of lanes.
> 
> Fixes testcase gcc.target/s390/pr89233.c.
> 
> Bootstrapped and tested on s390.  OK for mainline?
> 
> gcc/ChangeLog:
> 
>   * config/s390/vector.md: (*vec_extract) Fix.

Committed to mainline. Thanks!

Andreas
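
For readers who have not seen the convention, the "constant index modulo number of lanes" rule from the patch description can be sketched on a plain array.  This is only a model of the index wrapping, with a hypothetical helper name; it is not the s390 vector.md pattern and does not use the GNU vector extension.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: pick a lane the way the description says the
// vec_extract builtin does, i.e. with the index reduced modulo the
// number of lanes, so an out-of-range index wraps around.
template <typename T, std::size_t Lanes>
T
extract_lane (const std::array<T, Lanes> &v, std::size_t index)
{
  static_assert (Lanes > 0, "a vector needs at least one lane");
  return v[index % Lanes];
}
```

With four lanes, index 5 selects lane 1 in this model.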



Re: [PATCH] s390: split int128 load

2023-11-23 Thread Andreas Krebbel
On 11/15/23 14:15, Juergen Christ wrote:
> Issue two loads when using GPRs instead of one load-multiple.
> 
> Bootstrapped and tested on s390.  OK for mainline?
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.md: Split TImode loads.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/int128load.c: New test.
> 
> Signed-off-by: Juergen Christ 

Since the testcase uses __int128, it needs to be gated like this to prevent it
from being tested with -m31:

/* { dg-do compile { target int128 } } */

Committed to mainline with that change. Thanks!

Andreas



Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact

2023-11-23 Thread chenglulu



On 2023/11/23 at 8:24 PM, Xi Ruoyao wrote:

On Thu, 2023-11-23 at 17:14 +0800, chenglulu wrote:

When I look at this code and compare it to our scalar implementation, it seems
that our scalar implementation still lacks an "lround".

Should be "lroundeven".  We don't have an instruction for lround :(.

I tried this but it does not work:

-(define_int_iterator LRINT [UNSPEC_FTINT UNSPEC_FTINTRM UNSPEC_FTINTRP])
+(define_int_iterator LRINT
+  [UNSPEC_FTINT UNSPEC_FTINTRM UNSPEC_FTINTRP UNSPEC_FTINTRNE])
  (define_int_attr lrint_pattern [(UNSPEC_FTINT "lrint")
(UNSPEC_FTINTRM "lfloor")
-   (UNSPEC_FTINTRP "lceil")])
+   (UNSPEC_FTINTRP "lceil")
+   (UNSPEC_FTINTRNE "lroundeven")])
  (define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
(UNSPEC_FTINTRM "rm")
-   (UNSPEC_FTINTRP "rp")])
+   (UNSPEC_FTINTRP "rp")
+   (UNSPEC_FTINTRNE "rne")])

The problem is "lroundevenMN2" is not a standard pattern name.  The SIMD
version of ftintrne in patch 1 only works because we are expanding
"roundevenM2" (it's a standard pattern name) to UNSPEC_SIMD_FRINTRNE,
and then a define_insn can match (fix (UNSPEC_SIMD_FRINTRNE op)).  But
for non-SIMD we don't have roundevenM2.
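
The distinction between lround and lroundeven matters only for halfway cases.  A small sketch in terms of the standard C++ library, assuming the default rounding mode (round to nearest, ties to even), under which std::lrint gives the same results an "lroundeven" would always give; the wrapper names are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Ties are rounded away from zero, independent of the rounding mode.
// This is what an "lround" pattern would have to implement.
long
round_ties_away (double x)
{
  return std::lround (x);
}

// Rounds according to the current rounding mode.  Under the default mode
// (assumed here) ties go to the even neighbour, which matches what
// ftintrne / "lroundeven" would do regardless of the mode.
long
round_current_mode (double x)
{
  return std::lrint (x);
}
```

So 2.5 becomes 3 with the first function and 2 with the second, while 1.5 becomes 2 with both.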


Okay, I understand. I think this is a bit unfortunate.




Re: [PATCH] s390: implement flags output

2023-11-23 Thread Andreas Krebbel
On 11/15/23 14:15, Juergen Christ wrote:
> Implement flags output for inline assemblies.  Only use one output constraint
> that captures the whole condition code.  No breakout into different condition
> codes is allowed.  Also, only one condition code variable is allowed.
> 
> Add further logic to canonicalize various cases where we combine different
> cases of possible condition codes.
> 
> Bootstrapped and tested on s390.  OK for mainline?
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390-c.cc (s390_cpu_cpp_builtins): Define
>   __GCC_ASM_FLAG_OUTPUTS__.
>   * config/s390/s390.cc (s390_canonicalize_comparison): More
>   UNSPEC_CC_TO_INT cases.
>   (s390_md_asm_adjust): Implement flags output.
>   * config/s390/s390.md (ccstore4): Allow mask operands.
>   * doc/extend.texi: Document flags output.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/ccor.c: New test.
> 
> Signed-off-by: Juergen Christ 

Committed to mainline with a few minor formatting fixes. Thanks!

Andreas



Re: [PATCH] RISC-V: Optimize a special case of VLA SLP

2023-11-23 Thread Robin Dapp
LGTM (and harmless enough) but I'd rather wait for a second look or a
maintainer's OK as we're past stage 1 and it's not a real bugfix.
(On top, it's Thanksgiving so not many people will even notice).

On a related note, this should probably be a middle-end optimization
but before a variable-index vec extract most likely nobody bothered. 

Regards
 Robin


Re: libstdc++: Speed up push_back

2023-11-23 Thread Jan Hubicka
> On Sunday, 19 November 2023 22:53:37 CET Jan Hubicka wrote:
> > Sadly it is really hard to work out this
> > from IPA passes, since we basically care whether the iterator points to
> > the same place as the end pointer, which are both passed by reference.
> > This is inter-procedural value numbering that is quite out of reach.
> 
> I've done a fair share of branching on __builtin_constant_p in 
> std::experimental::simd to improve code-gen. It's powerful! But maybe we 
> also need the other side of the story to tell the optimizer: "I know you 
> can't const-prop everything; but this variable / expression, even if you 
> need to put in a lot of effort, the performance difference will be worth 
> it."
> 
> For std::vector, the remaining capacity could be such a value. The 
> functions f() and g() are equivalent (their code-gen isn't
> https://compiler-explorer.com/z/r44ejK1qz):
> 
> #include <vector>
> 
> auto
> f()
> {
>   std::vector<int> x;
>   x.reserve(10);
>   for (int i = 0; i < 10; ++i)
>     x.push_back(0);
>   return x;
> }
> auto
> g()
> { return std::vector<int>(10, 0); }

With my changes at -O3 we now inline push_back, so we could optimize the
first loop to the second. However with 
~/trunk-install/bin/gcc -O3  auto.C  -S -fdump-tree-all-details -fno-exceptions 
-fno-store-merging -fno-tree-slp-vectorize
the first problem is right at the beginning:

   [local count: 97603128]:
  MEM[(struct _Vector_impl_data *)x_4(D)]._M_start = 0B;
  MEM[(struct _Vector_impl_data *)x_4(D)]._M_finish = 0B;
  MEM[(struct _Vector_impl_data *)x_4(D)]._M_end_of_storage = 0B;
  _37 = operator new (40);
  _22 = x_4(D)->D.26019._M_impl.D.25320._M_finish;
  _23 = x_4(D)->D.26019._M_impl.D.25320._M_start;
  _24 = _22 - _23;
  if (_24 > 0)
goto ; [41.48%]
  else
goto ; [58.52%]

So the vector is first initialized with _M_start=_M_finish=0, but after the
call to new we are already unable to propagate this.

This is because x is returned and PTA considers it escaping.  This is the
problem discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653
which shows that it is likely worthwhile to fix PTA to handle this
correctly.
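
The shape of the dump above can be reproduced at source level.  The sketch below is a reduced model with a hypothetical VecImpl struct, not libstdc++ code: three pointer stores, a call to operator new, then reloads of the same fields.  The result is always 0; whether the compiler folds the reloads depends on it proving that the allocation call cannot modify the escaping object.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for the three pointers a vector keeps.
struct VecImpl
{
  int *start;
  int *finish;
  int *end_of_storage;
};

std::ptrdiff_t
size_seen_after_allocation (VecImpl &v)
{
  v.start = nullptr;
  v.finish = nullptr;
  v.end_of_storage = nullptr;

  // v is visible to the caller, so without extra knowledge about
  // operator new the compiler has to assume this call may change it.
  void *storage = ::operator new (40);

  // Semantically this is 0 - 0; the question in the thread is whether
  // the optimizer can see that across the call.
  std::ptrdiff_t used = v.finish - v.start;

  ::operator delete (storage);
  return used;
}
```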


Re: [PATCH] libgcc: mark __hardcfr_check_fail as always_inline

2023-11-23 Thread Jose E. Marchesi


> On Wed, Nov 22, 2023 at 3:39 PM Jose E. Marchesi
>  wrote:
>>
>> The function __hardcfr_check_fail in hardcfr.c is internal and static
>> inline.  It receives many arguments, which require more than five
>> registers to be passed in bpf-none-unknown targets.  BPF is limited to
>> that number of registers to pass arguments, and therefore libgcc fails
>> to build in that target.  This patch marks the function with the
>> always_inline attribute, fixing the bpf build.
>>
>> Tested in bpf-unknown-none target and x86_64-linux-gnu host.
>>
>> libgcc/ChangeLog:
>>
>> * hardcfr.c (__hardcfr_check_fail): Mark as always_inline.
>> ---
>>  libgcc/hardcfr.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/libgcc/hardcfr.c b/libgcc/hardcfr.c
>> index 25ff06742cb..48a87a5a87a 100644
>> --- a/libgcc/hardcfr.c
>> +++ b/libgcc/hardcfr.c
>> @@ -206,7 +206,8 @@ __hardcfr_debug_cfg (size_t const blocks,
>> enabled, it also forces __hardcfr_debug_cfg (above) to be compiled into 
>> an
>> out-of-line function, that could be called from a debugger.
>> */
>> -static inline void
>> +
>> +static inline  __attribute__((__always_inline__)) void
>
> can we gate this with
>
> #ifdef __BPF
>
> or so?

Yep.  Just sent V2.

>>  __hardcfr_check_fail (size_t const blocks ATTRIBUTE_UNUSED,
>>   vword const *const visited ATTRIBUTE_UNUSED,
>>   vword const *const cfg ATTRIBUTE_UNUSED,
>> --
>> 2.30.2
>>


[PATCH V2] libgcc: mark __hardcfr_check_fail as always_inline

2023-11-23 Thread Jose E. Marchesi
[Changes from V1:
- Use always_inline only in BPF target.]

The function __hardcfr_check_fail in hardcfr.c is internal and static
inline.  It receives many arguments, which require more than five
registers to be passed in bpf-unknown-none targets.  BPF is limited to
that number of registers to pass arguments, and therefore libgcc fails
to build in that target.  This patch marks the function with the
always_inline attribute, fixing the bpf build.

Tested in bpf-unknown-none target and x86_64-linux-gnu host.

libgcc/ChangeLog:

* hardcfr.c (__hardcfr_check_fail): Mark as always_inline.
---
 libgcc/hardcfr.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/libgcc/hardcfr.c b/libgcc/hardcfr.c
index 25ff06742cb..376a36202c8 100644
--- a/libgcc/hardcfr.c
+++ b/libgcc/hardcfr.c
@@ -206,6 +206,10 @@ __hardcfr_debug_cfg (size_t const blocks,
enabled, it also forces __hardcfr_debug_cfg (above) to be compiled into an
out-of-line function, that could be called from a debugger.
*/
+
+#ifdef __BPF__
+__attribute__((__always_inline__))
+#endif
 static inline void
 __hardcfr_check_fail (size_t const blocks ATTRIBUTE_UNUSED,
  vword const *const visited ATTRIBUTE_UNUSED,
-- 
2.30.2



Re: [RFC PATCH] i386: Fix ICE with -mforce-indirect-call and -fsplit-stack [PR89316]

2023-11-23 Thread Uros Bizjak
On Mon, Nov 20, 2023 at 5:33 PM Uros Bizjak  wrote:
>
> With the above two options, use a temporary register regno (as returned
> from split_stack_prologue_scratch_regno) as an indirect call scratch
> register to hold __morestack function address.  On 64-bit targets, two
> temporary registers are always available, so load the function address in
> %r11 and call __morestack_large_model with its one-argument-register value
> in %r10.  On 32-bit targets, bail out with a "sorry" if the temporary
> register can not be obtained.
>
> On 32-bit targets, also emit a PIC sequence that re-uses the obtained indirect
> call scratch register before moving the function address to it.  We can
> not set up %ebx PIC register in this case, but __morestack is prepared
> for this situation and sets it up by itself.
>
> PR target/89316
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_expand_split_stack_prologue): Obtain
> scratch regno when flag_force_indirect_call is set.  On 64-bit
> targets, call __morestack_large_model when flag_force_indirect_call
> is set and on 32-bit targets with -fpic, manually expand PIC sequence
> to call __morestack.  Move the function address to an indirect
> call scratch register.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr89316.C: New test.
> * gcc.target/i386/pr112605-1.c: New test.
> * gcc.target/i386/pr112605-2.c: New test.
> * gcc.target/i386/pr112605.c: New test.
>
> Jakub, I'm not entirely sure x86_32 PIC sequence is 100% correct
> (please note that the missing %ebx setup situation is handled in
> __morestack), so I'd be very grateful for your review of this part.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.


Re: [Patch] OpenMP: Accept argument to depobj's destroy clause

2023-11-23 Thread Tobias Burnus

Hi Jakub,

On 23.11.23 15:32, Jakub Jelinek wrote:

On Thu, Nov 23, 2023 at 03:21:41PM +0100, Tobias Burnus wrote:

I stumbled over this trivial omission which blocks some testcases.
I am not sure whether I have solved the is-same-expr most elegantly,

Answer: I didn't - as expected.

+ if (DECL_UID (t) != DECL_UID (t2))
Nothing checks that t and t2 here are decls.  Use operand_equal_p instead?


Yes, I think I can simply use this instead of all the other checks. That is
the function I was looking for but couldn't find before.

(I have even used that function before, for PR108545.)

I decided that volatileness is fine and that using a volatile function twice
is okay according to the spec; hence, I permit this in addition. (One can
argue about it, but as specifying both is mandatory in OpenMP 6.0, it
seems to make sense.)
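
To illustrate why a structural comparison is the right tool here and a declaration-ID comparison is not, here is a toy model.  It is not GCC's tree representation and same_expr is not operand_equal_p; the Expr type and all helpers are hypothetical.  The point is only that expressions such as s[1].obj are not declarations, so they must be compared node by node.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy expression node: a declaration, an array element or a struct member.
struct Expr
{
  enum Kind { Decl, ArrayRef, ComponentRef } kind;
  std::string name;            // decl name, index text or field name
  std::shared_ptr<Expr> base;  // null for Decl
};

std::shared_ptr<Expr>
decl (const std::string &name)
{
  return std::make_shared<Expr> (Expr{Expr::Decl, name, nullptr});
}

std::shared_ptr<Expr>
array_ref (std::shared_ptr<Expr> base, const std::string &index)
{
  return std::make_shared<Expr> (Expr{Expr::ArrayRef, index, base});
}

std::shared_ptr<Expr>
component_ref (std::shared_ptr<Expr> base, const std::string &field)
{
  return std::make_shared<Expr> (Expr{Expr::ComponentRef, field, base});
}

// Structural equality: same kind, same name and recursively equal bases.
bool
same_expr (const Expr *a, const Expr *b)
{
  if (!a || !b)
    return a == b;
  return a->kind == b->kind && a->name == b->name
         && same_expr (a->base.get (), b->base.get ());
}
```

In this model s[1].obj compares equal to a second, separately built s[1].obj and differs from s[2].obj, without ever asking whether the outermost node is a declaration.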

Revised version attached.

Tobias

OpenMP: Accept argument to depobj's destroy clause

Since OpenMP 5.2, the destroy clause takes a depend object as its argument;
for the depobj directive, the new argument is optional but, if present,
it must be identical to the directive's argument.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_depobj): Accept optionally an argument
	to the destroy clause.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_depobj): Accept optionally an argument
	to the destroy clause.

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_depobj): Accept optionally an argument
	to the destroy clause.

libgomp/ChangeLog:

	* libgomp.texi (5.2 Impl. Status): An argument to the destroy clause
	is now supported.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/depobj-3.c: New test.
	* gfortran.dg/gomp/depobj-3.f90: New test.

 gcc/c/c-parser.cc   | 23 +-
 gcc/cp/parser.cc| 24 ++-
 gcc/fortran/openmp.cc   | 15 -
 gcc/testsuite/c-c++-common/gomp/depobj-3.c  | 47 +
 gcc/testsuite/gfortran.dg/gomp/depobj-3.f90 | 18 +++
 libgomp/libgomp.texi|  2 +-
 6 files changed, 125 insertions(+), 4 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 371dd29557b..006aee3e93f 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -21605,6 +21605,9 @@ c_parser_omp_critical (location_t loc, c_parser *parser, bool *if_p)
  destroy
  update (dependence-type)
 
+   OpenMP 5.2 additionally:
+ destroy ( depobj )
+
dependence-type:
  in
  out
@@ -21663,7 +21666,25 @@ c_parser_omp_depobj (c_parser *parser)
 	clause = error_mark_node;
 	}
   else if (!strcmp ("destroy", p))
-	kind = OMP_CLAUSE_DEPEND_LAST;
+	{
+	  matching_parens c_parens;
+	  kind = OMP_CLAUSE_DEPEND_LAST;
+	  if (c_parser_next_token_is (parser, CPP_OPEN_PAREN)
+	  && c_parens.require_open (parser))
+	{
+	  tree destobj = c_parser_expr_no_commas (parser, NULL).value;
+	  if (!lvalue_p (destobj))
+		error_at (EXPR_LOC_OR_LOC (destobj, c_loc),
+			  "% expression is not lvalue expression");
+	  else if (depobj != error_mark_node
+		   && !operand_equal_p (destobj, depobj,
+	OEP_MATCH_SIDE_EFFECTS))
+		error_at (EXPR_LOC_OR_LOC (destobj, c_loc),
+			  "the % expression %qE must be the same as "
+			  "the % argument %qE", destobj, depobj);
+	  c_parens.skip_until_found_close (parser);
+	}
+	}
   else if (!strcmp ("update", p))
 	{
 	  matching_parens c_parens;
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f6d088bc73f..1fca6bff795 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -43173,6 +43173,9 @@ cp_parser_omp_critical (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
  destroy
  update (dependence-type)
 
+   OpenMP 5.2 additionally:
+ destroy ( depobj )
+
dependence-type:
  in
  out
@@ -43219,7 +43222,26 @@ cp_parser_omp_depobj (cp_parser *parser, cp_token *pragma_tok)
 	clause = error_mark_node;
 	}
   else if (!strcmp ("destroy", p))
-	kind = OMP_CLAUSE_DEPEND_LAST;
+	{
+	  kind = OMP_CLAUSE_DEPEND_LAST;
+	  matching_parens c_parens;
+	  if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)
+	  && c_parens.require_open (parser))
+	{
+	  tree destobj = cp_parser_assignment_expression (parser);
+	  if (depobj != error_mark_node
+		  && destobj != error_mark_node
+		  && !operand_equal_p (destobj, depobj, OEP_MATCH_SIDE_EFFECTS))
+		error_at (EXPR_LOC_OR_LOC (destobj, c_loc),
+			  "the % expression %qE must be the same as "
+			  "the % argument %qE", destobj, depobj);
+	  if (!c_parens.require_close (parser))
+		cp_parser_skip_to_closing_parenthesis (parser,
+		   /*recovering=*/true,
+		   /*or_comma=*/false,
+		   /*consume_pa

[committed] i386: Wrong code with __builtin_parityl [PR112672]

2023-11-23 Thread Uros Bizjak
gen_parityhi2_cmp instruction clobbers its input operand, so use
a temporary register in the call to gen_parityhi2_cmp.

PR target/112672

gcc/ChangeLog:

* config/i386/i386.md (parityhi2):
Use temporary register in the call to gen_parityhi2_cmp.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112672.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 99bb909b244..41de9537a40 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -20031,8 +20031,10 @@ (define_expand "parityhi2"
   "! TARGET_POPCNT"
 {
   rtx scratch = gen_reg_rtx (QImode);
+  rtx tmp = gen_reg_rtx (HImode);
 
-  emit_insn (gen_parityhi2_cmp (operands[1]));
+  emit_move_insn (tmp, operands[1]);
+  emit_insn (gen_parityhi2_cmp (tmp));
 
   ix86_expand_setcc (scratch, ORDERED,
 gen_rtx_REG (CCmode, FLAGS_REG), const0_rtx);
diff --git a/gcc/testsuite/gcc.target/i386/pr112672.c 
b/gcc/testsuite/gcc.target/i386/pr112672.c
new file mode 100644
index 000..583e9fdfb8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112672.c
@@ -0,0 +1,23 @@
+/* PR target/112672 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+typedef unsigned short u16;
+
+u16 g = 254;
+
+static inline u16
+foo (u16 u)
+{
+  u *= g;
+  return u + __builtin_parityl (u);
+}
+
+int
+main (void)
+{
+  u16 x = foo (4);
+  if (x != 4 * 254 + 1)
+__builtin_abort ();
+  return 0;
+}
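
The value the new test expects can be checked by hand with a portable reference.  The sketch below does not use __builtin_parityl; parity16 and expected_foo are hypothetical helpers that mirror the arithmetic in foo from the testcase.  4 * 254 is 1016, which is 1111111000 in binary with seven bits set, so the parity is 1 and the expected result is 1017.

```cpp
#include <cassert>
#include <cstdint>

// Portable reference: 1 if an odd number of bits is set, 0 otherwise.
int
parity16 (std::uint16_t v)
{
  int p = 0;
  while (v)
    {
      p ^= v & 1;
      v >>= 1;
    }
  return p;
}

// Mirrors foo () from the testcase, with the multiplier passed explicitly.
std::uint16_t
expected_foo (std::uint16_t u, std::uint16_t g)
{
  u = static_cast<std::uint16_t> (u * g);
  return static_cast<std::uint16_t> (u + parity16 (u));
}
```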


Re: [Patch] OpenMP: Accept argument to depobj's destroy clause

2023-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2023 at 04:21:50PM +0100, Tobias Burnus wrote:
> @@ -21663,7 +21666,25 @@ c_parser_omp_depobj (c_parser *parser)
>   clause = error_mark_node;
>   }
>else if (!strcmp ("destroy", p))
> - kind = OMP_CLAUSE_DEPEND_LAST;
> + {
> +   matching_parens c_parens;
> +   kind = OMP_CLAUSE_DEPEND_LAST;
> +   if (c_parser_next_token_is (parser, CPP_OPEN_PAREN)
> +   && c_parens.require_open (parser))
> + {
> +   tree destobj = c_parser_expr_no_commas (parser, NULL).value;
> +   if (!lvalue_p (destobj))
> + error_at (EXPR_LOC_OR_LOC (destobj, c_loc),
> +   "% expression is not lvalue expression");
> +   else if (depobj != error_mark_node
> +&& !operand_equal_p (destobj, depobj,
> + OEP_MATCH_SIDE_EFFECTS))

There is also OEP_LEXICOGRAPHIC which could be used in addition to that.
The question is if we want to consider say
#pragma depobj (a[++i]) destroy (a[++i])
as same or different (similarly a[foo ()] in both cases).
A function could at least in theory return the same value; for other
side-effects there is some wiggle room, since the number of times the
side-effects of clauses are evaluated is unspecified (and for destroy we really
don't intend to evaluate them at all for the clause, just for the directive
argument).

Jakub



Re: libstdc++: Speed up push_back

2023-11-23 Thread Jan Hubicka
> > On Sunday, 19 November 2023 22:53:37 CET Jan Hubicka wrote:
> > > Sadly it is really hard to work out this
> > > from IPA passes, since we basically care whether the iterator points to
> > > the same place as the end pointer, which are both passed by reference.
> > > This is inter-procedural value numbering that is quite out of reach.
> > 
> > I've done a fair share of branching on __builtin_constant_p in 
> > std::experimental::simd to improve code-gen. It's powerful! But maybe we 
> > also need the other side of the story to tell the optimizer: "I know you 
> > can't const-prop everything; but this variable / expression, even if you 
> > need to put in a lot of effort, the performance difference will be worth 
> > it."
> > 
> > For std::vector, the remaining capacity could be such a value. The 
> > functions f() and g() are equivalent (their code-gen isn't
> > https://compiler-explorer.com/z/r44ejK1qz):
> > 
> > #include <vector>
> > 
> > auto
> > f()
> > {
> >   std::vector<int> x;
> >   x.reserve(10);
> >   for (int i = 0; i < 10; ++i)
> >     x.push_back(0);
> >   return x;
> > }
> > auto
> > g()
> > { return std::vector<int>(10, 0); }
> 
> With my changes at -O3 we now inline push_back, so we could optimize the
> first loop to the second. However with 
> ~/trunk-install/bin/gcc -O3  auto.C  -S -fdump-tree-all-details 
> -fno-exceptions -fno-store-merging -fno-tree-slp-vectorize
> the first problem is right at the beginning:
> 
>[local count: 97603128]:
>   MEM[(struct _Vector_impl_data *)x_4(D)]._M_start = 0B;
>   MEM[(struct _Vector_impl_data *)x_4(D)]._M_finish = 0B;
>   MEM[(struct _Vector_impl_data *)x_4(D)]._M_end_of_storage = 0B;
>   _37 = operator new (40);

I also wonder, if default operator new and malloc can be handled as not
reading/modifying anything visible to the user code.  That would help
us to propagate here even if we lose track of points-to information.

We have:

  /* If the call is to a replaceable operator delete and results
 from a delete expression as opposed to a direct call to
 such operator, then we can treat it as free.  */
  if (fndecl
  && DECL_IS_OPERATOR_DELETE_P (fndecl)
  && DECL_IS_REPLACEABLE_OPERATOR (fndecl)
  && gimple_call_from_new_or_delete (stmt))
return ". o ";
  /* Similarly operator new can be treated as malloc.  */
  if (fndecl
  && DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl)
  && gimple_call_from_new_or_delete (stmt))
return "m ";
This informs alias analysis that new returns a pointer to memory
not aliasing with anything and that free does not read anything
from its parameter (but it is modelled as a write to make it clear
that the memory dies).

stmt_kills_ref_p special cases BUILT_IN_FREE but not OPERATOR delete
to make it clear that everything pointed to by it dies.   This is needed
because 'o' only means that some data may be overwritten, but it does
not make it clear that all data dies.

Not handling operator delete seems like an omission, but maybe it is not
too critical since we emit clobbers around destructors that are usually
right before the call to delete.  Also, ipa-modref kill analysis understands
neither BUILT_IN_FREE nor delete, and could.

I wonder if we can handle both as const except for side-effects
described.

Honza
>   _22 = x_4(D)->D.26019._M_impl.D.25320._M_finish;
>   _23 = x_4(D)->D.26019._M_impl.D.25320._M_start;
>   _24 = _22 - _23;
>   if (_24 > 0)
> goto ; [41.48%]
>   else
> goto ; [58.52%]
> 
> So the vector is first initialized with _M_start=_M_finish=0, but after the
> call to new we are already unable to propagate this.
> 
> This is because x is returned and PTA considers it escaping.  This is the
> problem discussed in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653
> which shows that it is likely worthwhile to fix PTA to handle this
> correctly.


Re: libstdc++: Speed up push_back

2023-11-23 Thread Jan Hubicka
Hi,
so if I understand it right, it should be safe to simply replace memmove
by memcpy.  I wonder if we can get rid of the count != 0 check at least
for glibc systems.  In general push_back now needs max-inline-insns-auto to
be 33 to be inlined at -O2.


jh@ryzen4:/tmp> cat ~/tt.C
#include <vector>
typedef unsigned int uint32_t;
struct pair_t {uint32_t first, second;};
struct pair_t pair;
void
test()
{
  std::vector<pair_t> stack;
  stack.push_back (pair);
  while (!stack.empty()) {
    pair_t cur = stack.back();
    stack.pop_back();
    if (!cur.first)
      {
        cur.second++;
        stack.push_back (cur);
      }
    if (cur.second > 1)
      break;
  }
}
int
main()
{
  for (int i = 0; i < 1; i++)
    test();
}

jh@ryzen4:/tmp> ~/trunk-install/bin/g++ ~/tt.C -O2 --param max-inline-insns-auto=32 ; time ./a.out

real    0m0.399s
user    0m0.399s
sys     0m0.000s
jh@ryzen4:/tmp> ~/trunk-install/bin/g++ ~/tt.C -O2 --param max-inline-insns-auto=33 ; time ./a.out

real    0m0.039s
user    0m0.039s
sys     0m0.000s

Current inline limit is 15. We can save
 - 2 insns if the inliner knows that the conditional guarding
   builtin_unreachable will die (I have a patch for this)
 - 4 insns if we work out that on 64-bit hosts allocating a vector with
   2^63 elements is impossible
 - 2 insns if we allow a NULL parameter on memcpy
 - 2 insns if we allow a NULL parameter on delete
So this is 23 instructions. The inliner has hinting which could make
push_back a reasonable candidate for -O2 inlining, and then we could be
able to propagate interesting stuff across repeated calls to push_back.
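
As a standalone illustration of the change below, relocating a range of trivially copyable objects into fresh storage can be sketched like this.  It is not the libstdc++ implementation; relocate_trivial is a hypothetical name.  memcpy is valid here only because source and destination do not overlap, and the count check stays because passing a null pointer to memcpy is formally undefined even for a size of zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: move [first, last) to result by raw byte copy.
// Precondition (assumed): the two ranges do not overlap.
template <typename T>
T *
relocate_trivial (T *first, T *last, T *result)
{
  static_assert (std::is_trivially_copyable<T>::value,
                 "byte-wise relocation needs a trivially copyable type");
  std::ptrdiff_t count = last - first;
  // Guard the call: an empty range may come with null pointers.
  if (count > 0)
    std::memcpy (result, first,
                 static_cast<std::size_t> (count) * sizeof (T));
  return result + count;
}
```

The two insns mentioned in the list for "NULL parameter on memcpy" correspond to the count check in this sketch.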

libstdc++-v3/ChangeLog:

* include/bits/stl_uninitialized.h (relocate_a_1): Use memcpy instead 
of memmove.

diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h 
b/libstdc++-v3/include/bits/stl_uninitialized.h
index 1282af3bc43..a9b802774c6 100644
--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -1119,14 +1119,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef __cpp_lib_is_constant_evaluated
  if (std::is_constant_evaluated())
{
- // Can't use memmove. Wrap the pointer so that __relocate_a_1
+ // Can't use memcpy. Wrap the pointer so that __relocate_a_1
  // resolves to the non-trivial overload above.
  __gnu_cxx::__normal_iterator<_Tp*, void> __out(__result);
  __out = std::__relocate_a_1(__first, __last, __out, __alloc);
  return __out.base();
}
 #endif
- __builtin_memmove(__result, __first, __count * sizeof(_Tp));
+ __builtin_memcpy(__result, __first, __count * sizeof(_Tp));
}
   return __result + __count;
 }


Re: [PATCH V2] libgcc: mark __hardcfr_check_fail as always_inline

2023-11-23 Thread Richard Biener



> On 23.11.2023 at 16:17, Jose E. Marchesi wrote:
> 
> [Changes from V1:
> - Use always_inline only in BPF target.]
> 
> The function __hardcfr_check_fail in hardcfr.c is internal and static
> inline.  It receives many arguments, which require more than five
> registers to be passed in bpf-none-unknown targets.  BPF is limited to
> that number of registers to pass arguments, and therefore libgcc fails
> to build in that target.  This patch marks the function with the
> always_inline attribute, fixing the bpf build.
> 
> Tested in bpf-unknown-none target and x86_64-linux-gnu host.

Ok

Richard 

> libgcc/ChangeLog:
> 
>* hardcfr.c (__hardcfr_check_fail): Mark as always_inline.
> ---
> libgcc/hardcfr.c | 4 
> 1 file changed, 4 insertions(+)
> 
> diff --git a/libgcc/hardcfr.c b/libgcc/hardcfr.c
> index 25ff06742cb..376a36202c8 100644
> --- a/libgcc/hardcfr.c
> +++ b/libgcc/hardcfr.c
> @@ -206,6 +206,10 @@ __hardcfr_debug_cfg (size_t const blocks,
>enabled, it also forces __hardcfr_debug_cfg (above) to be compiled into an
>out-of-line function, that could be called from a debugger.
>*/
> +
> +#ifdef __BPF__
> +__attribute__((__always_inline__))
> +#endif
> static inline void
> __hardcfr_check_fail (size_t const blocks ATTRIBUTE_UNUSED,
>  vword const *const visited ATTRIBUTE_UNUSED,
> --
> 2.30.2
> 


[PATCH] arm: [MVE intrinsics] Add default clause to full_width_access::memory_vector_mode

2023-11-23 Thread Christophe Lyon
My recent commit 0c2037d9d93a8f768cb11698ff794278246bb31f added a
switch statement lacking a default clause, leading to warnings or
errors when building with --enable-werror-always.

Fix by adding an empty default.

Committed as obvious.
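
For context, a reduced model of the situation: a switch over an enum that handles only two enumerators makes the compiler warn about every unhandled value (-Wswitch, which -Wall enables), unless a default label is present.  The enum below is a hypothetical five-value subset, not machine_mode.

```cpp
#include <cassert>

// Hypothetical subset of vector modes, for illustration only.
enum Mode { V16QI, V8HI, V4SI, V8HF, V4SF };

// Floating-point vectors are handled in memory as integer vectors of the
// same shape; every other mode is returned unchanged.
Mode
memory_vector_mode (Mode mode)
{
  switch (mode)
    {
    case V4SF:
      mode = V4SI;
      break;
    case V8HF:
      mode = V8HI;
      break;
    default:
      // Without this label the unhandled enumerators trigger a warning,
      // which becomes an error with --enable-werror-always.
      break;
    }
  return mode;
}
```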

2023-11-23  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-functions.h
(full_width_access::memory_vector_mode): Add default clause.
---
 gcc/config/arm/arm-mve-builtins-functions.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-functions.h 
b/gcc/config/arm/arm-mve-builtins-functions.h
index 6d234a2dd7c..1c93e6436b5 100644
--- a/gcc/config/arm/arm-mve-builtins-functions.h
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -1013,6 +1013,8 @@ public:
   case E_V8HFmode:
mode = E_V8HImode;
break;
+  default:
+   break;
   }
 
 if (m_vectors_per_tuple != 1)
-- 
2.34.1



Re: [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores

2023-11-23 Thread Christophe Lyon
Hi!

On Thu, 23 Nov 2023 at 14:29, Jan-Benedict Glaw  wrote:
>
> On Thu, 2023-11-16 15:26:14 +, Christophe Lyon 
>  wrote:
> > diff --git a/gcc/config/arm/arm-mve-builtins-functions.h 
> > b/gcc/config/arm/arm-mve-builtins-functions.h
> > index eba1f071af0..6d234a2dd7c 100644
> > --- a/gcc/config/arm/arm-mve-builtins-functions.h
> > +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> > @@ -966,6 +966,62 @@ public:
> [...]
>
> > +class full_width_access : public multi_vector_function
> > +{
> > +public:
> > +  CONSTEXPR full_width_access (unsigned int vectors_per_tuple = 1)
> > +: multi_vector_function (vectors_per_tuple) {}
> > +
> > +  tree
> > +  memory_scalar_type (const function_instance &fi) const override
> > +  {
> > +return fi.scalar_type (0);
> > +  }
> > +
> > +  machine_mode
> > +  memory_vector_mode (const function_instance &fi) const override
> > +  {
> > +machine_mode mode = fi.vector_mode (0);
> > +/* Vectors of floating-point are managed in memory as vectors of
> > +   integers.  */
> > +switch (mode)
> > +  {
> > +  case E_V4SFmode:
> > + mode = E_V4SImode;
> > + break;
> > +  case E_V8HFmode:
> > + mode = E_V8HImode;
> > + break;
> > +  }
>
> This introduces warnings about many enum values not being handled, so
> a default would be good I think. (I do automated builds with
> --enable-werror-always, see eg.
> http://toolchain.lug-owl.de/laminar/log/gcc-arm-eabi/48)
>

Ha right, thanks for catching this.

Fixed by commit b9dbdefac626ba20222ca534b58f7e493d713b9a

Christophe

> MfG, JBG
>
> --


Re: [Patch] OpenMP: Accept argument to depobj's destroy clause

2023-11-23 Thread Tobias Burnus

Hi Jakub,

On 23.11.23 16:32, Jakub Jelinek wrote:

> On Thu, Nov 23, 2023 at 04:21:50PM +0100, Tobias Burnus wrote:
> > @@ -21663,7 +21666,25 @@ c_parser_omp_depobj (c_parser *parser)
> > +  else if (depobj != error_mark_node
> > +   && !operand_equal_p (destobj, depobj,
> > +OEP_MATCH_SIDE_EFFECTS))
>
> There is also OEP_LEXICOGRAPHIC which could be used in addition to that.
> The question is if we want to consider say
> #pragma depobj (a[++i]) destroy (a[++i])
> as same or different (similarly a[foo ()] in both cases).


I don't think that we want to permit those; I think there is (a) the
question whether both expressions have to be evaluated or not and (b),
if so, in which order and (c), if the run-time result is different,
whether both have to be 'destroy'ed or only one of them (which one?).

Additionally, 'destroy-var must refer to the same depend object as the
depobj argument of the construct.' cannot be fulfilled if one is
evaluated before the other and both use the same 'i' in your case.

Thus, I do not really see an argument for permitting OEP_LEXICOGRAPHIC.
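
A plain C++ sketch of the problem (no OpenMP involved; the function and variable names are made up for illustration): two textually identical operands with side effects designate different objects once both are evaluated, so they cannot "refer to the same depend object".

```cpp
#include <cstddef>

// Returns true if two textually identical operands `a[++i]`, evaluated
// one after the other, designate the same array element.
inline bool same_object_when_evaluated_twice()
{
  int a[4] = {0, 0, 0, 0};
  std::size_t i = 0;
  int *first = &a[++i];   // i becomes 1, so this is &a[1]
  int *second = &a[++i];  // i becomes 2, so this is &a[2]
  return first == second;
}
```

The function returns false, which is why a purely lexicographic comparison would accept operands that do not name the same object at run time.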

I think permitting 'volatile' does make sense, in a way, as a
hyper-careful user might actually write such code.

[I wonder whether the OpenMP wording would permit 'omp depobj(obj)
destroy(f())' with 'auto f() { return obj; }' – but I am sure we don't
want to permit it in the compiler.]

Tobias

PS: In any case, I find it confusing to require that the same
variable/lvalue-expression has to be specified twice. The (only) pro is
that for 'omp interop destroy(...)' the argument is required and for
consistency of the 'destroy' clause, an argument now must be (always)
specified. But that leads to the odd 'omp depobj(obj) destroy(obj)',
which is really ugly. (In 5.2 the arg to destroy is optional but
omitting it is deprecated; hence, in OpenMP 6.0 (TR11, TR12) the
argument must be specified twice.)



Re: [PATCH] testsuite, lib: Re-allow mulitple function start labels.

2023-11-23 Thread Christophe Lyon
Hi Iain,

Thanks for dealing with this :-)

On Thu, 23 Nov 2023 at 10:58, Iain Sandoe  wrote:
>
> Tested on a cross to armv8l-unknown-linux-gnueabihf where the failing
> testcase is restored, and on aarch64-linux-gnu where no change is seen
> on the aarch64.exp suite.  Also tested on arm64 Darwin for aarch64.exp
> and aarch64-darwin.exp.
>
> OK for trunk, or some alternative would be better?
> Iain
>
> --- 8< ---
>
> The change applied in r14-5760-g2a46e0e7e20 changed the behaviour of
> functions with assembly like:
>
> bar:
> __acle_se_bar:
>
> Where both bar and __acle_se_bar are globals referring to the same
> function body.  The old behaviour overrides 'bar' with '__acle_se_bar'
> and the scan tests for that label.
>
> The change here re-allows the override.
>
> Cases like this are not legal Mach-O (where two global symbols cannot
> have the same address in the assembler output).  However, given the
> constraints on the Mach-O scanning, it does not seem that it is
> necessary to skip the change (any incorrect case should be easily
> evident in the assembler).
>
> gcc/testsuite/ChangeLog:
>
> * lib/scanasm.exp: Allow multiple function start symbols,
> taking the last as the function name.
>
> Signed-off-by: Iain Sandoe 
> ---
>  gcc/testsuite/lib/scanasm.exp | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
> index 85ee54ff9a8..7ec3cfce02b 100644
> --- a/gcc/testsuite/lib/scanasm.exp
> +++ b/gcc/testsuite/lib/scanasm.exp
> @@ -877,7 +877,15 @@ proc parse_function_bodies { config filename result } {
> set in_function 0
> }
> } elseif { $in_function } {
> -   if { [regexp $up_config(end) $line] } {
> +   # We allow multiple function start labels, taking the last one 
> seen
> +   # as the function name.
> +   if { [regexp [lindex $up_config(start) 0] \
> +$line dummy maybe_function_name] } {
> +   verbose "parse_function_bodies: overriding $function_name 
> with $maybe_function_name"
> +   set function_name $maybe_function_name
> +   set in_function 1
this is not necessary, since we are already inside if ($in_function) ?

> +   set function_body ""
> +   } elseif { [regexp $up_config(end) $line] } {
> verbose "parse_function_bodies: 
> $function_name:\n$function_body"
> set up_result($function_name) $function_body
> set in_function 0
> --
> 2.39.2 (Apple Git-143)
>

Thanks,

Christophe


Re: [PATCH] AArch64/testsuite: Use non-capturing parentheses with ccmp_1.c

2023-11-23 Thread Maciej W. Rozycki
On Wed, 22 Nov 2023, Richard Earnshaw (lists) wrote:

> > Use non-capturing parentheses for the subexpressions used with 
> > `scan-assembler-times', to avoid a quirk with double-counting.
> > 
> > gcc/testsuite/
> > * gcc.target/aarch64/ccmp_1.c: Use non-capturing parentheses 
> > with `scan-assembler-times'.
> 
> OK

 Thank you for your review.  I have applied all the three changes now.

  Maciej


Re: libstdc++: Speed up push_back

2023-11-23 Thread Jonathan Wakely
On Thu, 23 Nov 2023 at 15:34, Jan Hubicka  wrote:
>
> > > On Sunday, 19 November 2023 22:53:37 CET Jan Hubicka wrote:
> > > > Sadly it is really hard to work out this
> > > > from IPA passes, since we basically care whether the iterator points to
> > > > the same place as the end pointer, which are both passed by reference.
> > > > This is inter-procedural value numbering that is quite out of reach.
> > >
> > > I've done a fair share of branching on __builtin_constant_p in
> > > std::experimental::simd to improve code-gen. It's powerful! But maybe we
> > > also need the other side of the story to tell the optimizer: "I know you
> > > can't const-prop everything; but this variable / expression, even if you
> > > need to put in a lot of effort, the performance difference will be worth
> > > it."
> > >
> > > For std::vector, the remaining capacity could be such a value. The
> > > > functions f() and g() are equivalent (their code-gen isn't
> > > > https://compiler-explorer.com/z/r44ejK1qz):
> > >
> > > > #include <vector>
> > >
> > > auto
> > > f()
> > > {
> > > >   std::vector<int> x;
> > >   x.reserve(10);
> > >   for (int i = 0; i < 10; ++i)
> > > x.push_back(0);
> > >   return x;
> > > }
> > > auto
> > > g()
> > > { return std::vector(10, 0); }
> >
> > With my changes at -O3 we now inline push_back, so we could optimize the
> > first loop to the second. However with
> > ~/trunk-install/bin/gcc -O3  auto.C  -S -fdump-tree-all-details 
> > -fno-exceptions -fno-store-merging -fno-tree-slp-vectorize
> > the first problem is right at the beginning:
> >
> >[local count: 97603128]:
> >   MEM[(struct _Vector_impl_data *)x_4(D)]._M_start = 0B;
> >   MEM[(struct _Vector_impl_data *)x_4(D)]._M_finish = 0B;
> >   MEM[(struct _Vector_impl_data *)x_4(D)]._M_end_of_storage = 0B;
> >   _37 = operator new (40);
>
> I also wonder, if default operator new and malloc can be handled as not
> reading/modifying anything visible to the user code.

No, there's no way to know if the default operator new is being used.
A replacement operator new could be provided at link-time.

That's why we need -fsane-operator-new
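
To make the point concrete, here is a minimal sketch (the counter and helper names are made up for illustration) of a program replacing the global allocation function. The replacement has a side effect that user code can observe, which is why the compiler cannot assume in general that a call to operator new leaves user-visible state alone.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Counter visible to user code: how often the global operator new ran.
static std::size_t g_new_calls = 0;

// A program-provided replacement for the global allocation function.
void *operator new(std::size_t size)
{
  ++g_new_calls;  // user-visible side effect of every allocation
  if (void *p = std::malloc(size ? size : 1))
    return p;
  throw std::bad_alloc();
}

// Matching replacements for the deallocation functions.
void operator delete(void *p) noexcept { std::free(p); }
void operator delete(void *p, std::size_t) noexcept { std::free(p); }

// Like the `operator new (40)` call in the dump above, this is a direct
// call to the allocation function rather than a new-expression.
inline std::size_t calls_after_one_allocation()
{
  std::size_t before = g_new_calls;
  void *p = ::operator new(10 * sizeof(int));
  ::operator delete(p);
  return g_new_calls - before;
}
```

Nothing at the call site reveals that the replacement exists; it could equally live in another translation unit and only appear at link time.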

> That would help
> us to propagate here even if we lose track of points-to information.
>
> We have:
>
>   /* If the call is to a replaceable operator delete and results
>  from a delete expression as opposed to a direct call to
>  such operator, then we can treat it as free.  */
>   if (fndecl
>   && DECL_IS_OPERATOR_DELETE_P (fndecl)
>   && DECL_IS_REPLACEABLE_OPERATOR (fndecl)
>   && gimple_call_from_new_or_delete (stmt))
> return ". o ";
>   /* Similarly operator new can be treated as malloc.  */
>   if (fndecl
>   && DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl)
>   && gimple_call_from_new_or_delete (stmt))
> return "m ";
> Which informs alias analysis that new returns pointer to memory
> not aliasing with anything and that free is not reading anything
> from its parameter (but it is modelled as a write to make it clear
> that the memory dies).

But this only applies to new T[n] not to operator new(n * sizeof(T)).
So it's not relevant to std::vector.

> stmt_kills_ref_p special cases BUILT_IN_FREE but not OPERATOR delete
> to make it clear that everything pointed to by it dies.   This is needed
> because 'o' only means that some data may be overwritten, but it does
> not make it clear that all data dies.
>
> Not handling operator delete seems like an omission, but maybe it is not
> too critical since we emit clobbers around destructors that are usually
> right before call to delete.  Also ipa-modref kill analysis does not
> understand BUILT_IN_FREE nor delete and could.
>
> I wonder if we can handle both as const except for side-effects
> described.
>
> Honza
> >   _22 = x_4(D)->D.26019._M_impl.D.25320._M_finish;
> >   _23 = x_4(D)->D.26019._M_impl.D.25320._M_start;
> >   _24 = _22 - _23;
> >   if (_24 > 0)
> > goto ; [41.48%]
> >   else
> > goto ; [58.52%]
> >
> > So the vector is first initialized with _M_start=_M_finish=0, but after
> > call to new we already are not able to propagate this.
> >
> > This is because x is returned and PTA considers it escaping.  This is
> > problem discussed in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653
> > Which shows that it is likely worthwhile to fix PTA to handle this
> > correctly.
>



Re: libstdc++: Speed up push_back

2023-11-23 Thread Jonathan Wakely
On Thu, 23 Nov 2023 at 15:44, Jan Hubicka  wrote:
>
> Hi,
> so if I understand it right, it should be safe to simply replace memmove
> by memcpy.  I wonder if we can get rid of the count != 0 check at least
> for glibc systems.

I don't think we can do that. It's still undefined with glibc, and
glibc marks it with __attribute__((nonnull)), and ubsan will diagnose
it.
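
A minimal sketch of the guard in question (the helper name and shape are assumptions for illustration, not the actual libstdc++ code): the count check keeps null pointers away from memcpy, whose pointer arguments must be valid even when the size is zero.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Copy `count` trivially copyable objects into non-overlapping storage
// and return the end of the destination range.
template <typename T>
T *relocate_trivial(const T *first, std::size_t count, T *result)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy is only valid for trivially copyable types");
  // An empty range may be described by null pointers, so skip the call.
  if (count != 0)
    std::memcpy(result, first, count * sizeof(T));
  return result + count;
}
```

With the check in place, relocating an empty range never reaches memcpy, so the nonnull attribute and ubsan have nothing to complain about.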

>  In general push_back now needs inline-insns-auto to
> be 33 to be inlined at -O2
>
>
> jh@ryzen4:/tmp> cat ~/tt.C
> #include <vector>
> typedef unsigned int uint32_t;
> struct pair_t {uint32_t first, second;};
> struct pair_t pair;
> void
> test()
> {
> std::vector<pair_t> stack;
> stack.push_back (pair);
> while (!stack.empty()) {
> pair_t cur = stack.back();
> stack.pop_back();
> if (!cur.first)
> {
> cur.second++;
> stack.push_back (cur);
> }
> if (cur.second > 1)
> break;
> }
> }
> int
> main()
> {
> for (int i = 0; i < 1; i++)
>   test();
> }
>
> jh@ryzen4:/tmp> ~/trunk-install/bin/g++ ~/tt.C -O2 --param 
> max-inline-insns-auto=32 ; time ./a.out
>
> real    0m0.399s
> user    0m0.399s
> sys     0m0.000s
> jh@ryzen4:/tmp> ~/trunk-install/bin/g++ ~/tt.C -O2 --param 
> max-inline-insns-auto=33 ; time ./a.out
>
> real    0m0.039s
> user    0m0.039s
> sys     0m0.000s
>
> Current inline limit is 15. We can save
>  - 2 insns if inliner knows that conditional guarding
>builtin_unreachable will die (I have patch for this)
>  - 4 insns if we work out that on 64bit hosts allocating vector with
>    2^63 elements is impossible
>  - 2 insns if we allow NULL parameter on memcpy

I don't think we can do that.

>  - 2 insns if we allow NULL parameter on delete

That's allowed, I think we just check first to avoid making a function
call if it's null, because we know operator delete will do nothing.

But if it's hurting inlining, maybe that's the wrong choice.

> So this is 23 instructions. Inliner has hinting which could make
> push_back reasonable candidate for -O2 inlining and then we could be
> able to propagate interesting stuff across repeated calls to push_back.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/stl_uninitialized.h (relocate_a_1): Use memcpy instead 
> of memmove.

This patch is OK for trunk.

>
> diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h 
> b/libstdc++-v3/include/bits/stl_uninitialized.h
> index 1282af3bc43..a9b802774c6 100644
> --- a/libstdc++-v3/include/bits/stl_uninitialized.h
> +++ b/libstdc++-v3/include/bits/stl_uninitialized.h
> @@ -1119,14 +1119,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #ifdef __cpp_lib_is_constant_evaluated
>   if (std::is_constant_evaluated())
> {
> - // Can't use memmove. Wrap the pointer so that __relocate_a_1
> + // Can't use memcpy. Wrap the pointer so that __relocate_a_1
>   // resolves to the non-trivial overload above.
>   __gnu_cxx::__normal_iterator<_Tp*, void> __out(__result);
>   __out = std::__relocate_a_1(__first, __last, __out, __alloc);
>   return __out.base();
> }
>  #endif
> - __builtin_memmove(__result, __first, __count * sizeof(_Tp));
> + __builtin_memcpy(__result, __first, __count * sizeof(_Tp));
> }
>return __result + __count;
>  }
>



Re: libstdc++: Turn memmove to memcpy in vector reallocations

2023-11-23 Thread Jonathan Wakely
On Tue, 21 Nov 2023 at 18:11, Marc Glisse  wrote:
>
> On Tue, 21 Nov 2023, Jonathan Wakely wrote:
>
> > CC Marc Glisse who added the relocation support. He might recall why
> > we use memmove when all uses are for newly-allocated storage, which
> > cannot overlap the existing storage.
>
> Going back a bit:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2019-April/520658.html
>
> "I think the call to memmove in __relocate_a_1 could probably be
> memcpy (I don't remember why I chose memmove)"
>
> Going back a bit further:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2018-September/505800.html
>
> "I had to add a special case for trivial types, using memmove, to avoid
> perf regressions, since relocation takes precedence over the old path that
> is specialized to call memmove."
>
> So the reason seems to be because vector already used memmove before my
> patch. You can dig further if you want to check why that is ;-)


Thanks for the quick archaeology, Marc!



Re: [PATCH V2] libgcc: mark __hardcfr_check_fail as always_inline

2023-11-23 Thread Jose E. Marchesi


>> Am 23.11.2023 um 16:17 schrieb Jose E. Marchesi :
>> 
>> [Changes from V1:
>> - Use always_inline only in BPF target.]
>> 
>> The function __hardcfr_check_fail in hardcfr.c is internal and static
>> inline.  It receives many arguments, which require more than five
>> registers to be passed in bpf-unknown-none targets.  BPF is limited to
>> that number of registers to pass arguments, and therefore libgcc fails
>> to build in that target.  This patch marks the function with the
>> always_inline attribute, fixing the bpf build.
>> 
>> Tested in bpf-unknown-none target and x86_64-linux-gnu host.
>
> Ok

Pushed.  Thanks.
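
For illustration, a minimal sketch of the same pattern outside libgcc (the function names are hypothetical): an internal helper with more than five arguments that is forced inline only when compiling for BPF, so that no out-of-line call with that many arguments is ever emitted there.

```cpp
// On BPF, force inlining so the six arguments never have to be passed
// in registers; on other targets this is an ordinary static inline.
#ifdef __BPF__
__attribute__((__always_inline__))
#endif
static inline long sum_six(long a, long b, long c, long d, long e, long f)
{
  return a + b + c + d + e + f;
}

// A caller with few arguments of its own; after inlining, the body of
// sum_six is expanded here.
long sum_from(long x) { return sum_six(x, 1, 2, 3, 4, 5); }
```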

> Richard 
>
>> libgcc/ChangeLog:
>> 
>>* hardcfr.c (__hardcfr_check_fail): Mark as always_inline.
>> ---
>> libgcc/hardcfr.c | 4 
>> 1 file changed, 4 insertions(+)
>> 
>> diff --git a/libgcc/hardcfr.c b/libgcc/hardcfr.c
>> index 25ff06742cb..376a36202c8 100644
>> --- a/libgcc/hardcfr.c
>> +++ b/libgcc/hardcfr.c
>> @@ -206,6 +206,10 @@ __hardcfr_debug_cfg (size_t const blocks,
>>enabled, it also forces __hardcfr_debug_cfg (above) to be compiled into an
>>out-of-line function, that could be called from a debugger.
>>*/
>> +
>> +#ifdef __BPF__
>> +__attribute__((__always_inline__))
>> +#endif
>> static inline void
>> __hardcfr_check_fail (size_t const blocks ATTRIBUTE_UNUSED,
>>  vword const *const visited ATTRIBUTE_UNUSED,
>> --
>> 2.30.2
>> 


Re: [PATCH] testsuite, lib: Re-allow mulitple function start labels.

2023-11-23 Thread Iain Sandoe
Hi 

> On 23 Nov 2023, at 16:11, Christophe Lyon  wrote:
> 
> Hi Iain,
> 
> Thanks for dealing with this :-)
> 
> On Thu, 23 Nov 2023 at 10:58, Iain Sandoe  wrote:
>> 
>> Tested on a cross to armv8l-unknown-linux-gnueabihf where the failing
>> testcase is restored, and on aarch64-linux-gnu where no change is seen
>> on the aarch64.exp suite.  Also tested on arm64 Darwin for aarch64.exp
>> and aarch64-darwin.exp.
>> 
>> OK for trunk, or some alternative would be better?
>> Iain
>> 
>> --- 8< ---
>> 
>> The change applied in r14-5760-g2a46e0e7e20 changed the behaviour of
>> functions with assembly like:
>> 
>> bar:
>> __acle_se_bar:
>> 
>> Where both bar and __acle_se_bar are globals referring to the same
>> function body.  The old behaviour overrides 'bar' with '__acle_se_bar'
>> and the scan tests for that label.
>> 
>> The change here re-allows the override.
>> 
>> Cases like this are not legal Mach-O (where two global symbols cannot
>> have the same address in the assembler output).  However, given the
>> constraints on the Mach-O scanning, it does not seem that it is
>> necessary to skip the change (any incorrect case should be easily
>> evident in the assembler).
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>* lib/scanasm.exp: Allow multiple function start symbols,
>>taking the last as the function name.
>> 
>> Signed-off-by: Iain Sandoe 
>> ---
>> gcc/testsuite/lib/scanasm.exp | 10 +-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>> 
>> diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
>> index 85ee54ff9a8..7ec3cfce02b 100644
>> --- a/gcc/testsuite/lib/scanasm.exp
>> +++ b/gcc/testsuite/lib/scanasm.exp
>> @@ -877,7 +877,15 @@ proc parse_function_bodies { config filename result } {
>>set in_function 0
>>}
>>} elseif { $in_function } {
>> -   if { [regexp $up_config(end) $line] } {
>> +   # We allow multiple function start labels, taking the last one 
>> seen
>> +   # as the function name.
>> +   if { [regexp [lindex $up_config(start) 0] \
>> +$line dummy maybe_function_name] } {
>> +   verbose "parse_function_bodies: overriding $function_name 
>> with $maybe_function_name"
>> +   set function_name $maybe_function_name
>> +   set in_function 1
> this is not necessary, since we are already inside if ($in_function) ?

It resets the state to “first line matched” for multi-line function start
cases.  Currently, those cases arise only on Darwin and, as noted in the
changelog, such code is not actually legal Mach-O, so in practice we could
remove it at present, but it might be better to be consistent (I am OK either way).



Of course, an alternate fix would be to match the first label always and change 
the testcase to scan for ‘bar’ - since that seems to be the only instance 
needing this facility  - but I’ll leave that to you folks to consider.
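
For anyone weighing the two options, here is a rough C++ sketch of the "last label wins" behaviour the patch restores. It is a simplified model, not a port of scanasm.exp: the regular expressions, the `.size` end marker and the function name are assumptions made for this example.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Collect function bodies keyed by name.  When several start labels
// appear before any body line, the last label seen names the function.
std::map<std::string, std::string>
parse_function_bodies(const std::vector<std::string> &lines)
{
  // Hypothetical start/end patterns; the real ones are per-target.
  static const std::regex start(R"(^([A-Za-z_][A-Za-z0-9_]*):$)");
  static const std::regex end(R"(^\s*\.size\s.*$)");

  std::map<std::string, std::string> result;
  std::string name, body;
  bool in_function = false;
  std::smatch m;

  for (const std::string &line : lines)
    {
      if (std::regex_match(line, m, start))
        {
          // A label either starts a function or, if we are already in
          // one, overrides the name and restarts the body.
          name = m[1].str();
          body.clear();
          in_function = true;
        }
      else if (in_function)
        {
          if (std::regex_match(line, end))
            {
              result[name] = body;
              in_function = false;
            }
          else
            body += line + "\n";
        }
    }
  return result;
}
```

Fed the lines `bar:`, `__acle_se_bar:`, a body and an end marker, this records the body under `__acle_se_bar` only, which is what the existing testcase scans for.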

Iain

> 
>> +   set function_body ""
>> +   } elseif { [regexp $up_config(end) $line] } {
>>verbose "parse_function_bodies: 
>> $function_name:\n$function_body"
>>set up_result($function_name) $function_body
>>set in_function 0
>> --
>> 2.39.2 (Apple Git-143)
>> 
> 
> Thanks,
> 
> Christophe



Re: [PATCH] c++/modules: check mismatching exports for class tags [PR98885]

2023-11-23 Thread Nathan Sidwell

On 11/12/23 07:00, Nathaniel Shead wrote:

I think the error message is still a little bit unclear but I couldn't
come up with something clearer that was similarly concise and matching
the existing style.

(Also I noticed that the linked PR was assigned to Nathan but there
hadn't been activity for a while, and I've been looking into these kinds
of issues recently anyway so I thought I'd give it a go.)

Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
access.


ok



-- >8 --

The check for exporting a declaration that was previously declared as not
exported is implemented in 'duplicate_decls', but this doesn't handle
declarations of classes. This patch adds these checks and slightly
adjusts the associated error messages for clarity.

PR c++/98885

gcc/cp/ChangeLog:

* decl.cc (duplicate_decls): Adjust error message.
(xref_tag): Adjust error message. Check exporting decl that is
already declared as non-exporting.

gcc/testsuite/ChangeLog:

* g++.dg/modules/export-1.C: Adjust error messages. Remove
xfails for working case. Add new test case.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/decl.cc  | 21 ++---
  gcc/testsuite/g++.dg/modules/export-1.C | 16 +---
  2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 4a07c7e879b..bde9bd79d58 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -2236,8 +2236,10 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
hiding, bool was_hidden)
  if (DECL_MODULE_EXPORT_P (STRIP_TEMPLATE (newdecl))
  && !DECL_MODULE_EXPORT_P (not_tmpl))
{
- error ("conflicting exporting declaration %qD", newdecl);
- inform (olddecl_loc, "previous declaration %q#D here", olddecl);
+ auto_diagnostic_group d;
+ error ("conflicting exporting for declaration %qD", newdecl);
+ inform (olddecl_loc,
+ "previously declared here without exporting");
}
}
else if (DECL_MODULE_EXPORT_P (newdecl))
@@ -16249,11 +16251,24 @@ xref_tag (enum tag_types tag_code, tree name,
  tree decl = TYPE_NAME (t);
  if (!module_may_redeclare (decl))
{
+ auto_diagnostic_group d;
  error ("cannot declare %qD in a different module", decl);
- inform (DECL_SOURCE_LOCATION (decl), "declared here");
+ inform (DECL_SOURCE_LOCATION (decl), "previously declared here");
  return error_mark_node;
}
  
+	  tree not_tmpl = STRIP_TEMPLATE (decl);

+ if (DECL_LANG_SPECIFIC (not_tmpl)
+ && DECL_MODULE_ATTACH_P (not_tmpl)
+ && !DECL_MODULE_EXPORT_P (not_tmpl)
+ && module_exporting_p ())
+   {
+ auto_diagnostic_group d;
+ error ("conflicting exporting for declaration %qD", decl);
+ inform (DECL_SOURCE_LOCATION (decl),
+ "previously declared here without exporting");
+   }
+
  tree maybe_tmpl = decl;
  if (CLASS_TYPE_P (t) && CLASSTYPE_IS_TEMPLATE (t))
maybe_tmpl = CLASSTYPE_TI_TEMPLATE (t);
diff --git a/gcc/testsuite/g++.dg/modules/export-1.C 
b/gcc/testsuite/g++.dg/modules/export-1.C
index 8ca696ebee0..3f93814d270 100644
--- a/gcc/testsuite/g++.dg/modules/export-1.C
+++ b/gcc/testsuite/g++.dg/modules/export-1.C
@@ -4,19 +4,21 @@ export module frob;
  // { dg-module-cmi !frob }
  
  int x ();

-export int x (); // { dg-error "conflicting exporting declaration" }
+export int x (); // { dg-error "conflicting exporting for declaration" }
  
  int y;

-export extern int y; // { dg-error "conflicting exporting declaration" }
+export extern int y; // { dg-error "conflicting exporting for declaration" }
  
  typedef int z;

-export typedef int z; // { dg-error "conflicting exporting declaration" }
+export typedef int z; // { dg-error "conflicting exporting for declaration" }
  
  template  int f (T);

-export template  int f (T); // { dg-error "conflicting exporting 
declaration" }
+export template  int f (T); // { dg-error "conflicting exporting for 
declaration" }
  
-// doesn't go via duplicate_decls so we miss this for now

  class A;
-export class A; // { dg-error "conflicting exporting declaration" "" { xfail 
*-*-* } }
+export class A; // { dg-error "conflicting exporting for declaration" }
  
-// { dg-warning  "due to errors" "" { target *-*-* } 0 }

+template  struct B;
+export template  struct B {};  // { dg-error "conflicting exporting for 
declaration" }
+
+// { dg-warning "due to errors" "" { target *-*-* } 0 }


--
Nathan Sidwell



[PATCH v2 1/11] rtl-ssa: Support for inserting new insns

2023-11-23 Thread Alex Coplan
Hi,

This is a v2, original patch is here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637606.html

This addresses review feedback and:
 - Fixes a bug in the previous version in
   function_info::finalize_new_accesses; we should now correctly handle
   the case where properties.refs () has two writes to a resource and we're
   adding a new (temporary) set for that resource.
 - Drops some handling for new uses which isn't needed now that RTL-SSA can
   infer uses of mem (since g:505f1202e3a1a1aecd0df10d0f1620df6fea4ab5).

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

-- >8 --

The upcoming aarch64 load pair pass needs to form store pairs, and can
re-order stores over loads when alias analysis determines this is safe.
In the case that both mem defs have uses in the RTL-SSA IR, and both
stores require re-ordering over their uses, we represent that as
(tentative) deletion of the original store insns and creation of a new
insn, to prevent requiring repeated re-parenting of uses during the
pass.  We then update all mem uses that require re-parenting in one go
at the end of the pass.

To support this, RTL-SSA needs to handle inserting new insns (rather
than just changing existing ones), so this patch adds support for that.

New insns (and new accesses) are temporaries, allocated above a temporary
obstack_watermark, such that the user can easily back out of a change without
awkward bookkeeping.
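
As a rough illustration of the watermark idea only (a toy arena with made-up names, not GCC's obstack or the RTL-SSA API): everything allocated after a recorded mark can be discarded in one step, so an abandoned change needs no per-object cleanup.

```cpp
#include <cstddef>
#include <vector>

// Toy arena: items are appended in order, and a watermark is simply the
// number of items that existed when it was taken.
class scratch_arena
{
public:
  using watermark = std::size_t;

  // Remember the current high-water point.
  watermark mark() const { return m_items.size(); }

  // Allocate a temporary item above the current watermark.
  std::size_t add(int value)
  {
    m_items.push_back(value);
    return m_items.size() - 1;
  }

  // Back out of a change: drop everything allocated since the mark.
  void rollback(watermark w) { m_items.resize(w); }

  std::size_t size() const { return m_items.size(); }

private:
  std::vector<int> m_items;
};
```

A caller would take a mark, create its temporaries, and either keep them when the change is committed or call rollback to discard them all at once.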

gcc/ChangeLog:

* rtl-ssa/accesses.cc (function_info::create_set): New.
* rtl-ssa/accesses.h (access_info::is_temporary): New.
* rtl-ssa/changes.cc (move_insn): Handle new (temporary) insns.
(function_info::finalize_new_accesses): Handle new/temporary
user-created accesses.
(function_info::apply_changes_to_insn): Ensure m_is_temp flag
on new insns gets cleared.
(function_info::change_insns): Handle new/temporary insns.
(function_info::create_insn): New.
* rtl-ssa/changes.h (class insn_change): Make function_info a
friend class.
* rtl-ssa/functions.h (function_info): Declare new entry points:
create_set, create_insn.  Declare new change_alloc helper.
* rtl-ssa/insns.cc (insn_info::print_full): Identify temporary insns in
dump.
* rtl-ssa/insns.h (insn_info): Add new m_is_temp flag and accompanying
is_temporary accessor.
* rtl-ssa/internals.inl (insn_info::insn_info): Initialize m_is_temp to
false.
* rtl-ssa/member-fns.inl (function_info::change_alloc): New.
* rtl-ssa/movement.h (restrict_movement_for_defs_ignoring): Add
handling for temporary defs.
diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index 510545a8bad..76d70fd8bd3 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -1456,6 +1456,16 @@ function_info::make_uses_available (obstack_watermark 
&watermark,
   return use_array (new_uses, num_uses);
 }
 
+set_info *
+function_info::create_set (obstack_watermark &watermark,
+  insn_info *insn,
+  resource_info resource)
+{
+  auto set = change_alloc (watermark, insn, resource);
+  set->m_is_temp = true;
+  return set;
+}
+
 // Return true if ACCESS1 can represent ACCESS2 and if ACCESS2 can
 // represent ACCESS1.
 static bool
diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h
index fce31d46717..7e7a90ece97 100644
--- a/gcc/rtl-ssa/accesses.h
+++ b/gcc/rtl-ssa/accesses.h
@@ -204,6 +204,10 @@ public:
   // in the main instruction pattern.
   bool only_occurs_in_notes () const { return m_only_occurs_in_notes; }
 
+  // Return true if this is a temporary access, e.g. one created for
+  // an insn that is about to be inserted.
+  bool is_temporary () const { return m_is_temp; }
+
 protected:
   access_info (resource_info, access_kind);
 
diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index aab532b9f26..2f2d12d5f30 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -394,14 +394,20 @@ move_insn (insn_change &change, insn_info *after)
   // At the moment we don't support moving instructions between EBBs,
   // but this would be worth adding if it's useful.
   insn_info *insn = change.insn ();
-  gcc_assert (after->ebb () == insn->ebb ());
+
   bb_info *bb = after->bb ();
   basic_block cfg_bb = bb->cfg_bb ();
 
-  if (insn->bb () != bb)
-// Force DF to mark the old block as dirty.
-df_insn_delete (rtl);
-  ::remove_insn (rtl);
+  if (!insn->is_temporary ())
+{
+  gcc_assert (after->ebb () == insn->ebb ());
+
+  if (insn->bb () != bb)
+   // Force DF to mark the old block as dirty.
+   df_insn_delete (rtl);
+  ::remove_insn (rtl);
+}
+
   ::add_insn_after (rtl, after_rtl, cfg_bb);
 }
 
@@ -437,12 +443,33 @@ function_info::finalize_new_accesses (insn_change 
&change, insn_info *pos)
   {
def_info *def = find_access (change.new_defs, ref.regno

Re: [PATCH 2/1] c++/modules: Allow exporting a typedef redeclaration

2023-11-23 Thread Nathan Sidwell

On 11/13/23 01:09, Nathaniel Shead wrote:

I happened to be browsing the standard a bit later and noticed that we
incorrectly reject the example given below.

Bootstrapped on x86_64-pc-linux-gnu; regtesting ongoing but modules.exp
completed with no errors.

-- >8 --

A typedef doesn't create a new entity, and thus should be allowed to be
exported even if it has been previously declared un-exported. See the
example in [module.interface] p6:


ok.  Could you put a reference to [module.interface]/p6 in the comment though?

nathan



   export module M;
   struct S { int n; };
   typedef S S;
   export typedef S S; // OK, does not redeclare an entity

PR c++/102341

gcc/cp/ChangeLog:

* decl.cc (duplicate_decls): Allow exporting a redeclaration of
a typedef.

gcc/testsuite/ChangeLog:

* g++.dg/modules/export-1.C: Adjust test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/decl.cc  | 5 -
  gcc/testsuite/g++.dg/modules/export-1.C | 6 +-
  2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index bde9bd79d58..5e175d3e835 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -2231,7 +2231,10 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
hiding, bool was_hidden)
}
  
tree not_tmpl = STRIP_TEMPLATE (olddecl);

-  if (DECL_LANG_SPECIFIC (not_tmpl) && DECL_MODULE_ATTACH_P (not_tmpl))
+  if (DECL_LANG_SPECIFIC (not_tmpl)
+ && DECL_MODULE_ATTACH_P (not_tmpl)
+ /* Typedefs are not entities and so can be exported later.  */
+ && TREE_CODE (olddecl) != TYPE_DECL)
{
  if (DECL_MODULE_EXPORT_P (STRIP_TEMPLATE (newdecl))
  && !DECL_MODULE_EXPORT_P (not_tmpl))
diff --git a/gcc/testsuite/g++.dg/modules/export-1.C 
b/gcc/testsuite/g++.dg/modules/export-1.C
index 3f93814d270..598814370ec 100644
--- a/gcc/testsuite/g++.dg/modules/export-1.C
+++ b/gcc/testsuite/g++.dg/modules/export-1.C
@@ -9,8 +9,12 @@ export int x (); // { dg-error "conflicting exporting for 
declaration" }
  int y;
  export extern int y; // { dg-error "conflicting exporting for declaration" }
  
+// A typedef is not an entity so the following is OK; see [module.interface] example 4

  typedef int z;
-export typedef int z; // { dg-error "conflicting exporting for declaration" }
+export typedef int z; // { dg-bogus "conflicting exporting for declaration" }
+
+template  using w = T;
+export template  using w = T;  // { dg-error "conflicting exporting for 
declaration" }
  
  template  int f (T);

  export template  int f (T); // { dg-error "conflicting exporting for 
declaration" }


--
Nathan Sidwell



[PATCH v5] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-11-23 Thread Marek Polacek
On Mon, Nov 13, 2023 at 09:06:09PM -0500, Jason Merrill wrote:
> On 11/6/23 17:34, Marek Polacek wrote:
> > On Fri, Nov 03, 2023 at 01:51:07PM -0400, Jason Merrill wrote:
> > > On 11/2/23 11:28, Marek Polacek wrote:
> > > > On Sat, Oct 14, 2023 at 12:56:11AM -0400, Jason Merrill wrote:
> > > > > On 10/10/23 13:20, Marek Polacek wrote:
> > > > > > I suppose some
> > > > > > functions cannot possibly be promoted because they don't contain
> > > > > > any CALL_EXPRs.  So we may be able to rule them out while doing
> > > > > > cp_fold_r early.
> > > > > 
> > > > > > Yes.  Or, the only immediate-escalating functions referenced have
> > > > > > already been checked.
> > > 
> > > It looks like you haven't pursued this yet?  One implementation thought:
> > 
> > Oops, I'd forgotten to address that.
> > 
> > > maybe_store_cfun... could stop skipping immediate_escalating_function_p
> > > (current_function_decl), and after we're done folding if the current
> > > function isn't in the hash_set we can go ahead and set
> > > DECL_ESCALATION_CHECKED_P?
> > 
> > Clever, I see what you mean.  IOW, we store c_f_d iff the function contains
> > an i-e expr.  If not, it can't possibly become consteval.  I've added that
> > into cp_fold_function, and it seems to work well...
> > 
> > ...except it revealed a different problem: cp_fold_r -> cp_fold will, since
> > https://gcc.gnu.org/pipermail/gcc-patches/2016-March/443993.html, remove
> > UNARY_PLUS_EXPR, leading us into this problem:
> > 
> >// stmt = +id(i)
> >cp_fold (...);
> >// stmt = id(i)
> > 
> > and the subsequent tree walk walks the CALL_EXPR's operands, so
> > cp_fold_immediate_r will never see the CALL_EXPR, so we miss an i-e expr.
> > 
> > Perhaps a better solution than the kludge I added would be to only call
> > cp_fold_immediate_r after cp_fold.  Or potentially before /and/ after if
> > cp_fold changes the expression?
> 
> Or walk everything with cp_fold_immediate_r before walking again with
> cp_fold_r?

Maybe, but I might run into complexity issues again unless I add a new
pset into cp_fold_data.  I did something else: only re-run
cp_fold_immediate_r if cp_fold turned an expression into a CALL_EXPR.
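
To make the missed-CALL_EXPR problem concrete, here is a toy model of the walk. It is a sketch under loud assumptions: Node, Kind, fold and count_calls_seen are invented for illustration and are not GCC's tree representation, cp_fold or cp_fold_immediate_r. It only shows why checking a node before folding it can miss a call that the fold exposes, and how re-checking after the fold catches it.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy expression tree; NOT GCC's tree representation.
enum class Kind { Call, UnaryPlus, Leaf };

struct Node {
  Kind kind;
  std::vector<std::unique_ptr<Node>> ops;
};

std::unique_ptr<Node> make(Kind k) {
  auto n = std::make_unique<Node>();
  n->kind = k;
  return n;
}

// Builds the model of "+id(i)": UnaryPlus -> Call -> Leaf.
std::unique_ptr<Node> build_plus_call() {
  auto call = make(Kind::Call);
  call->ops.push_back(make(Kind::Leaf));
  auto plus = make(Kind::UnaryPlus);
  plus->ops.push_back(std::move(call));
  return plus;
}

// Stand-in for the fold step: replaces a unary plus by its operand.
// Returns true if the node was rewritten.
bool fold(std::unique_ptr<Node> &n) {
  if (n->kind != Kind::UnaryPlus)
    return false;
  assert(!n->ops.empty());
  std::unique_ptr<Node> inner = std::move(n->ops[0]);
  n = std::move(inner);
  return true;
}

// Pre-order walk: check the node, fold it, then descend into the operands
// of whatever the node has become.  Without RECHECK, a call exposed by the
// fold is never checked, because only its operands are visited afterwards.
int count_calls_seen(std::unique_ptr<Node> &n, bool recheck) {
  int seen = (n->kind == Kind::Call) ? 1 : 0;  // the "immediate" check
  bool changed = fold(n);
  if (recheck && changed && n->kind == Kind::Call)
    ++seen;  // re-run the check only when folding produced a call
  for (auto &op : n->ops)
    seen += count_calls_seen(op, recheck);
  return seen;
}
```

In this model, walking the tree for +id(i) without the re-check sees no call at all, while the re-check variant sees exactly one.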
 
> > > > > It also seems odd that the ADDR_EXPR case calls vec_safe_push
> > > > > (deferred_escalating_exprs, while the CALL_EXPR case calls
> > > > > maybe_store_cfun_for_late_checking, why the different handling?
> > > > 
> > > > maybe_store_cfun_for_late_checking saves current_function_decl
> > > > so that we can check:
> > > > 
> > > > void g (int i) {
> > > > fn (i); // error if fn promotes to consteval
> > > > }
> > > 
> > > Yes, but why don't we want the same handling for ADDR_EXPR?
> > 
> > The handling can't be exactly the same due to global vars like
> > 
> >auto p1 = &f5;
> > 
> > ...but it's wrong to only save the ADDR_EXPR if it's enclosed in
> > a function, because the ADDR_EXPR could be inside a consteval if
> > block, in which case I think we're not supposed to error.  Tested
> > in consteval-prop20.C.  Thanks,
> 
> And we don't need the !current_function_decl handling for CALL_EXPR?

I don't think so: we should see the call when cp_fold_function gets to
__static_i_and_d.  I added a comment.
 
> The only significant difference I see between &f and f() for escalation is
> that the latter might be an immediate invocation.  Once we've determined
> that it's not, so we are in fact looking at an immediate-escalating
> expression, I'd expect the promotion handling to be identical.

I'm not certain if I understand the immediate invocation remark but in
v5 I've tried to unify both as much as possible.

> > +  /* Whether cp_fold_immediate_r is looking for immediate-escalating
> > + expressions.  */
> 
> Isn't that always what it's doing?
> 
> The uses of ff_escalating in maybe_explain_promoted_consteval and
> maybe_escalate_decl_and_cfun seem to have different purposes that I'm having
> trouble following.
> 
> For the former, it seems to control returning the offending expression
> rather than error_mark_node.  Why don't we always do that?
> 
> For the latter, it seems to control recursion, which seems redundant with
> the recursion in that latter function itself.  And the use of the flag seems
> redundant with at_eof.
 
You're absolutely correct, the code was a knotty mess.  In v5, I tried to
untangle it to make it something more straightforward.

I still need a new ff_ flag to signal that we can return immediately
after seeing an i-e expr.

> > +/* Remember that the current function declaration contains a call to
> > +   a function that might be promoted to consteval later.  */
> > +
> > +static void
> > +maybe_store_cfun_for_late_checking ()
> 
> This name could say more about escalation?  Maybe
> ...for_escalation_checking?
> 
> Or, better, merge this with maybe_store_immediate_escalating_fn?

I got rid of the former function.
 
> > +/* Figure out if DECL should be promoted to consteval and if so, maybe also
> > +   promote

Re: [Patch] OpenMP: Accept argument to depobj's destroy clause

2023-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2023 at 04:59:16PM +0100, Tobias Burnus wrote:
> > There is also OEP_LEXICOGRAPHIC which could be used in addition to that.
> > The question is if we want to consider say
> > #pragma omp depobj (a[++i]) destroy (a[++i])
> > as same or different (similarly a[foo ()] in both cases).
> 
> I don't think that we want to permit those; I think there is (a) the
> question whether both expressions have to be evaluated or not and (b),
> if so, in which order and (c), if the run-time result is different,
> whether both have to be 'destroy'ed or only one of them (which one?).

Well, we don't need to destroy two, because it would be UB if the two
aren't the same.  This is only about diagnostics for the case where the
user messed things up unintentionally.
The function call case can be the same very easily, just
int foo () { return 0; }
omp_depend_t a[2];
...
#pragma omp depobj (a[foo ()]) destroy (a[foo ()])
or
int i = 0;
#pragma omp depobj (a[((++i) * 2) & 1]) destroy (a[((++i) * 2) & 1])
The former may evaluate the function call multiple times, but the user
arranges for it to do the same thing each time.  In the second case there
are side-effects, but they don't really matter for the value, only for
whether i has the value 0, 1, 2 or something else after this pragma (and
if nothing cares about that value afterwards...).
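
A small sketch of the second case, with no OpenMP involved: (++i) * 2 is always even, so the subscript is 0 on every evaluation even though i keeps changing. The names subscript, same_element_twice and subscript_self_check are invented for illustration.

```cpp
#include <cassert>

// Hypothetical helper: the subscript expression from the example above.
int subscript(int &i) { return ((++i) * 2) & 1; }

// Evaluates the subscript twice, as the directive argument and the clause
// argument each might, and reports whether both pick the same element.
bool same_element_twice(int (&a)[2], int &i) {
  int *first = &a[subscript(i)];
  int *second = &a[subscript(i)];
  return first == second;
}

// Hypothetical self-check for the sketch above.
void subscript_self_check() {
  int a[2] = {0, 0};
  int i = 0;
  assert(same_element_twice(a, i));  // same element both times
  assert(i == 2);                    // but i was incremented twice
}
```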

The question is whether "same" (I admit I haven't looked up the exact
wording now) means lexically the same, or anything that has the same value, etc.
Because e.g.
omp_depend_t a;
...
omp_depend_t *p = &a;
#pragma omp depobj (a) destroy (p[0])
is the same value but not lexically the same.
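
The same point as a plain C++ sketch, again without OpenMP: the struct handle is only a stand-in for omp_depend_t, and designates_same_object and alias_self_check are made-up names. A value-based comparison treats a and p[0] as the same, which no lexical comparison of the two spellings would.

```cpp
#include <cassert>

// Stand-in for omp_depend_t; an assumption for illustration only.
struct handle { int payload; };

// Value-based notion of "same": do both operands designate one object?
bool designates_same_object(handle &x, handle *q) { return &x == &q[0]; }

// Hypothetical self-check for the sketch above.
void alias_self_check() {
  handle a{0};
  handle other{0};
  handle *p = &a;
  assert(designates_same_object(a, p));       // "a" vs "p[0]": same object
  assert(!designates_same_object(other, p));  // different objects
}
```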

IMHO the argument to the destroy clause should never have been allowed; it
is only unnecessary extra pain.

Jakub


