date:20240814

Re: [PATCH-1v4] Value Range: Add range op for builtin isinf

2024-08-14 Thread HAO CHEN GUI

Hi Jeff,

  May I know your final decision on this patch?
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656937.html

Thanks
Gui Haochen

On 2024/8/5 22:51, Jeff Law wrote:
> 
> 
> On 7/23/24 4:39 PM, Andrew MacLeod wrote:
>> the range is in r, and is set to [0,0].  this is the false part of what is 
>> being returned for the range.
>>
>> the "return true" indicates we determined a range, so use what is in R.
>>
>> returning false means we did not find a range to return, so r is garbage.
> Duh.  I guess I should have realized that.  I'll have to take another look at 
> Hao's patch.  It's likely OK, but let me take another looksie.
> 
> jeff
>
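
The return convention Andrew describes can be sketched in a few lines. This is only an illustration: IntRange, fold_isinf_sketch and result_range_or_varying are hypothetical names invented here, not GCC's range-op classes or API.

```cpp
#include <climits>

// Hypothetical stand-in for a value range; not GCC's irange/frange.
struct IntRange {
  long lo;
  long hi;
};

// Hypothetical fold routine following the convention described above:
// return true and fill R when a range was determined, return false
// (leaving R unspecified) when nothing could be determined.
bool fold_isinf_sketch(IntRange &r, bool operand_may_be_inf) {
  if (!operand_may_be_inf) {
    r = {0, 0};   // isinf can only be false, so the result range is [0,0]
    return true;  // R is valid and the caller should use it
  }
  return false;   // no range found; the caller must ignore R
}

// Caller side: only read R when the fold reported success.
IntRange result_range_or_varying(bool operand_may_be_inf) {
  IntRange r{};
  if (fold_isinf_sketch(r, operand_may_be_inf))
    return r;
  return {INT_MIN, INT_MAX};  // assumed fallback: full range of int
}
```

The boolean reports whether a range was found, not the value of isinf, so a successful fold to [0,0] returns true.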

Re: [PATCH, gfortran] libgfortran: implement fpu-macppc for Darwin, support IEEE arithmetic

2024-08-14 Thread Sergey Fedorov

Thank you for responding.
I have added a changelog (is this the correct way?).

Sergey

On Wed, Aug 14, 2024 at 12:58 AM FX Coudert  wrote:

> Hi,
>
> > I dropped a change to the test file, since you have fixed it
> appropriately, and switched to Apple libm convention for flags, as you have
> suggested.
> > Please let me know if I should do anything further to improve it and
> make it acceptable for a merge.
>
> The patch itself is OK. Please add a ChangeLog entry fitting the GCC
> format (see prior commits in libgfortran/ for examples).
>
> FX


0001-libgfortran-implement-fpu-macppc-for-Darwin-support-.patch
Description: Binary data

Re: [Fortran, Patch, PR116292, v1] Fix 15-regression in move_alloc

2024-08-14 Thread Andre Vehreschild

Hi Thomas,

thanks for the review. Committed as gcc-15-2910-gbb2324769c5

Thanks again,
Andre

On Tue, 13 Aug 2024 13:34:44 +0200
Thomas Koenig  wrote:

> Hi Andre,
>
> > attached patch fixes a regression introduced by my previous patch on
> > handling _vptr's more consistently. The patch gets the _vptr only of a
> > derived or class type now and not of every type.
> >
> > Regression tested ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
>
> OK (and looks obvious, too).
>
> Best regards
>
>   Thomas
>


--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: Ping^4 [PATCH-2v4] Value Range: Add range op for builtin isfinite

2024-08-14 Thread HAO CHEN GUI

Hi Vineet,

  This patch (test cases) relies on the former patch (range op for isinf), which
hasn't been approved yet. I will commit them as soon as the former patch gets
approval.

Thanks
Gui Haochen

On 2024/8/14 1:24, Vineet Gupta wrote:
> Hi Hao Gui,
> 
> Can you commit this soon - some of the arch patches might be waiting on this.
> 
> Thx,
> -Vineet
> 
> On 8/5/24 07:59, Jeff Law wrote:
>> On 7/21/24 8:10 PM, HAO CHEN GUI wrote:
>>> Hi,
>>>Gently ping it.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html
>> OK.  Sorry for the delays.

[PATCH] c++: Partially implement CWG 2867 - Order of initialization for structured bindings [PR115769]

2024-08-14 Thread Jakub Jelinek

Hi!

The following patch partially implements CWG 2867
- Order of initialization for structured bindings.
The DR requires that the initialization of e is sequenced before that of
each r_i, and that the r_i initialization is sequenced before the r_j
initialization for j > i.  We already do it that way: the former ordering
is a necessity so that the get calls are actually emitted on an already
initialized variable, and the latter holds simply because we implemented it
that way, by going through the structured binding vars in ascending order
and doing their initialization.
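
The ordering of e before r_0 before r_1 can be observed with a small tuple-like type. This is a sketch under assumptions: Pair, event_log and binding_order are hypothetical names made up for illustration and are not taken from the patch's testcases. It needs C++17 or later.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>

// Records the order of events; purely illustrative.
inline std::string &event_log() {
  static std::string log;
  return log;
}

// Hypothetical tuple-like type with two int elements.
struct Pair {
  int a, b;
  Pair(int x, int y) : a(x), b(y) { event_log() += "e;"; }
  // Member get<I>() is what the structured binding calls for each r_i.
  template <std::size_t I> int get() const {
    event_log() += (I == 0 ? "get0;" : "get1;");
    return I == 0 ? a : b;
  }
};

// Opt Pair into the tuple-like protocol used by structured bindings.
namespace std {
template <> struct tuple_size<Pair> : integral_constant<size_t, 2> {};
template <size_t I> struct tuple_element<I, Pair> { using type = int; };
}

// Returns the recorded order: construction of e, then get<0>, then get<1>.
std::string binding_order() {
  event_log().clear();
  auto [x, y] = Pair(1, 2);
  (void)x;
  (void)y;
  return event_log();
}
```

Calling binding_order() returns "e;get0;get1;", which is the ordering the DR asks for. The sketch does not exercise the lifetime of temporaries, which is the part discussed next.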

The hard part not implemented yet is the lifetime extension of the
temporaries from the e initialization to after the get calls (if any).
Unlike the range-for lifetime extension patch which I've posted recently
where IMO we can just ignore lifetime extension of reference bound
temporaries because all the temporaries are extended to the same spot,
here lifetime extension of reference bound temporaries should last until
the end of lifetime of e, while other temporaries only after all the get
calls.

The patch just attempts to deal with automatic structured bindings for now,
I'll post a patch for static locals incrementally and I don't have a patch
for namespace scope structured bindings yet, this patch should just keep
existing behavior for both static locals and namespace scope structured
bindings.

What GCC currently emits is a CLEANUP_POINT_EXPR around the e
initialization, followed optionally by nested CLEANUP_STMTs for cleanups
like the e dtor if any and dtors of lifetime extended temporaries from
reference binding; inside of the CLEANUP_STMT CLEANUP_BODY then the
initialization of the individual variables for the tuple case, again with
optional CLEANUP_STMT if e.g. lifetime extended temporaries from reference
binding are needed in those.

The following patch drops that first CLEANUP_POINT_EXPR and instead
wraps the whole sequence of the e initialization and the individual variable
initialization with get calls after it into a single CLEANUP_POINT_EXPR.
If there are any CLEANUP_STMTs needed, they are all emitted first, with
the CLEANUP_POINT_EXPR for e initialization and the individual variable
initialization inside of those, and a guard variable set after different
phases in those expressions guarding the corresponding cleanups, so that
they aren't invoked until the respective variables are constructed.
This is implemented by cp_finish_decl doing cp_finish_decomp on its own
when !processing_template_decl (otherwise we often don't cp_finish_decl
or process it at a different time from when we want to call
cp_finish_decomp) or unless the decl is erroneous (cp_finish_decl has
too many early returns for erroneous cases, and for those we can actually
call it even multiple times, for the non-erroneous
non-processing_template_decl cases we need to call it just once).

The two testcases try to construct various temporaries and variables and
verify the order in which the temporaries and variables are constructed and
destructed.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-08-14  Jakub Jelinek  

PR c++/115769
* cp-tree.h: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decomp): Add bool argument defaulted to false.
* decl.cc (initialize_local_var): Add DECOMP argument, if true,
don't build cleanup and temporarily override stmts_are_full_exprs_p
to 0 rather than 1.  Formatting fix.
(cp_finish_decl): Invoke cp_finish_decomp for structured bindings
here if !processing_template_decl, first with TEST_P true.  For
automatic structured binding bases if the test cp_finish_decomp
returned true wrap the initialization together with what non-test
cp_finish_decomp emits with a CLEANUP_POINT_EXPR, and if there are
any CLEANUP_STMTs needed, emit them around the whole
CLEANUP_POINT_EXPR with guard variables for the cleanups.
(cp_finish_decomp): Add TEST_P argument, change return type from
void to bool, if TEST_P, return true instead of emitting actual
code for the tuple case, otherwise return false.
* parser.cc (cp_convert_range_for): Don't call cp_finish_decomp
unless range_decl is erroneous.
(cp_parser_decomposition_declaration): Set DECL_DECOMP_BASE
before cp_finish_decl call, call cp_finish_decomp after it only
if processing_template_decl or decl is erroneous.
(cp_finish_omp_range_for): Call cp_finish_decomp only if
processing_template_decl or decl is erroneous.
* pt.cc (tsubst_stmt): Likewise.

* g++.dg/DRs/dr2867-1.C: New test.
* g++.dg/DRs/dr2867-2.C: New test.

--- gcc/cp/cp-tree.h.jj 2024-08-12 10:49:12.355612381 +0200
+++ gcc/cp/cp-tree.h 2024-08-13 12:54:04.013233029 +0200
@@ -7021,7 +7021,7 @@ extern void omp_declare_variant_finalize
 struct cp_decomp { tree decl; unsigned int count; };
 extern void cp_finish_dec

[RFC PATCH] c++: Partially implement for static locals CWG 2867 - Order of initialization for structured bindings [PR115769]

2024-08-14 Thread Jakub Jelinek

Hi!

The following patch extends the CWG 2867 support to function scope
static structured bindings.
The reason why I'm sending this separately is that I'm afraid it is an
ABI change (admittedly for a C++20 P1091R3 feature, but we accept it with
warning already in C++11).
The current state in both GCC and clang is that e.g. on the dr2867-3.C
bar case, there are _ZGVZ3barvEDC1x1y1z1wE and _ZGVZ3barvE1{x,y,z,w}
guard variables, _ZGVZ3barvEDC1x1y1z1wE guards the initialization of
the base (e in the standard), temporaries in that initialization are
destroyed before the guard is released, followed by acquiring
_ZGVZ3barvE1x guard, doing get for x and perhaps destroying its temporaries,
releasing that guard, acquiring _ZGVZ3barvE1y guard, etc.
So, in a multi-threaded program one thread can initialize e, another thread
x, another thread y, another thread z and another thread w.  The
initialization is still correctly sequenced, other threads wait on the
prior guard variables before they can acquire another one, but if the
e initialization creates any temporaries (other than lifetime extended
reference binding which promotes the vars also to static), I wonder if it
is ok to initialize x etc. in a different thread.

The following patch takes the easy path and just does the cp_finish_decomp
initialization inside of a CLEANUP_POINT_EXPR which initializes e,
so guarded by _ZGVZ3barvEDC1x1y1z1wE.  The ABI change is that if some
other thread in different compiler compiled code initializes e and
releases _ZGVZ3barvEDC1x1y1z1wE but doesn't initialize x, y, z and w,
while the current thread would expect _ZGVZ3barvEDC1x1y1z1wE to also
initialize x, y, z and w, nothing will initialize it.

Though, now that I think about it again, perhaps what we could do instead
is just make sure the _ZGVZ3barvEDC1x1y1z1wE initialization doesn't have
a CLEANUP_POINT_EXPR in it and wrap both the _ZGVZ3barvEDC1x1y1z1wE
and cp_finish_decomp created stuff into a single CLEANUP_POINT_EXPR.
That way, perhaps _ZGVZ3barvEDC1x1y1z1wE could be initialized by one thread
and _ZGVZ3barvE1x by a different, but the temporaries from 
_ZGVZ3barvEDC1x1y1z1wE
initialization would be only destructed after the _ZGVZ3barvE1w guard
was released by the thread which initialized _ZGVZ3barvEDC1x1y1z1wE.

Anyway, the following passed bootstrap/regtest on x86_64-linux and
i686-linux, but I'll try the above variant next.

2024-08-14  Jakub Jelinek  

PR c++/115769
* decl.cc: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(expand_static_init): Add DECOMP argument.  If true, call
cp_finish_decomp and if needed integrate it into the guarded
sequence wrapped with CLEANUP_POINT_EXPR.
(cp_finish_decl): Adjust expand_static_init caller, for
tuple cases of structured bindings at function scope pass
non-NULL decomp and ensure cp_finish_decomp isn't called later.

* g++.dg/DRs/dr2867-3.C: New test.
* g++.dg/DRs/dr2867-4.C: New test.

--- gcc/cp/decl.cc.jj   2024-08-13 19:18:42.170052535 +0200
+++ gcc/cp/decl.cc  2024-08-13 21:05:42.875446138 +0200
@@ -103,7 +103,7 @@ static tree push_cp_library_fn (enum tre
 static tree build_cp_library_fn (tree, enum tree_code, tree, int);
 static void store_parm_decls (tree);
 static void initialize_local_var (tree, tree, bool);
-static void expand_static_init (tree, tree);
+static void expand_static_init (tree, tree, cp_decomp *);
 static location_t smallest_type_location (const cp_decl_specifier_seq*);
 static bool identify_goto (tree, location_t, const location_t *,
   diagnostic_t, bool);
@@ -9121,7 +9121,15 @@ cp_finish_decl (tree decl, tree init, bo
 initializer.  It is not legal to redeclare a static data
 member, so this issue does not arise in that case.  */
   else if (var_definition_p && TREE_STATIC (decl))
-   expand_static_init (decl, init);
+   {
+ if (need_decomp_init && init && DECL_FUNCTION_SCOPE_P (decl))
+   {
+ expand_static_init (decl, init, decomp);
+ need_decomp_init = false;
+   }
+ else
+   expand_static_init (decl, init, NULL);
+   }
 }
 
   /* If a CLEANUP_STMT was created to destroy a temporary bound to a
@@ -10183,7 +10191,7 @@ register_dtor_fn (tree decl)
and destruction of DECL.  */
 
 static void
-expand_static_init (tree decl, tree init)
+expand_static_init (tree decl, tree init, cp_decomp *decomp)
 {
   gcc_assert (VAR_P (decl));
   gcc_assert (TREE_STATIC (decl));
@@ -10214,6 +10222,8 @@ expand_static_init (tree decl, tree init
  "initialization and destruction");
  informed = true;
}
+  if (decomp)
+   cp_finish_decomp (decl, decomp);
   return;
 }
 
@@ -10323,11 +10333,25 @@ expand_static_init (tree decl, tree init
 variable.  Do this before calling __cxa_guard_release.  */
  init = add_stmt_to_c

Re: [Fortran, Patch, PR102973, v1] Reset flag for parsing proc_ptrs in associate in error case

2024-08-14 Thread Andre Vehreschild

Hi Harald,

thanks for the review. Committed as gcc-15-2911-g54be14bfd6e.

Do you have time for reviewing the pr110033 fix? See
here: https://gcc.gnu.org/pipermail/fortran/2024-August/060814.html

Yes, it looks lengthy, but it is always the same propagation of the corank, plus
a bit of meat in the second part of the patch. This would then bring the
dependencies of the ASSOCIATE PR down to zero.

Thanks again and regards,
Andre

On Tue, 13 Aug 2024 18:29:31 +0200
Harald Anlauf  wrote:

> Hi Andre,
>
> On 13.08.24 at 15:15, Andre Vehreschild wrote:
> > Hi all,
> >
> > attached patch is the last one the meta-bug 87477 ASSOCIATE depends on. The
> > resolution was already given in the PR, so I just beautified it and made a
> > patch for it. I tried to come up with a testcase as well, as Harald has, but
> > had no luck with it. I see less harm in resetting the flag in the error case
> > than in not doing it.
>
> this is much simpler than Bernhard's patch while functionally equivalent
> and good for mainline.
>
> Thanks for taking care of the issue!
>
> Harald
>
> > Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
> >
> > Regards,
> > Andre
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>

--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [patch, fortran] First part of Fortran's unsigned implementation

2024-08-14 Thread Andre Vehreschild

Hi Thomas,

> > may I ask you to run contrib/check_GNU_style.py on your patch? At least on
> > my system more than 50 lines are reported. I am drawn to these style issues
> > and find it hard to digest the beef of the patch. That's my personal OCD
> > unfortunately.
>
> I did so, and fixed most of what it complained about.  Not all - I will
> not "fix" Fortran code or things like UNSIGNED(4) in error messages :-)

Well, in Fortran that would be futile. No, I found some `a,a` in parameter lists on
several occasions, e.g. in arith.cc function gfc_arith_init_1 (void) near the
end of the function. I.e. no space after the comma. That was the background of
my question.

Thanks for the adaptions.

- Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [RFC][PATCH, aarch64] Implement 16-byte vector mode const0 store by TImode

2024-08-14 Thread Richard Sandiford

HAO CHEN GUI  writes:
> Hi,
>   I submitted a patch to change the mode checking for
> CLEAR_BY_PIECES.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660344.html
>
>   It causes some regressions on aarch64. With the patch,
> V2x8QImode is used to do clear by pieces instead of TImode as
> vector mode is preferable and V2x8QImode supports const0 store.
> Thus the efficient "stp" instructions can't be generated.
>
>   I drafted following patch to fix the problem. It can fix
> regressions found in memset-corner-cases.c, memset-q-reg.c,
> auto-init-padding-11.c and auto-init-padding-5.c.
>
>   Not sure if it should be done on all 16-byte vector modes.
> Also not sure if the patch is proper. So I send this RFC email.
>
> Thanks
> Gui Haochen
>
> ChangeLog
> aarch64: Implement 16-byte vector mode const0 store by TImode
>
> gcc/
>   * config/aarch64/aarch64-simd.md (mov for VSTRUCT_QD):
>   Expand V2x8QImode const0 store by TImode.
>
>
> patch.diff
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 01b084d8ccb..8aa72940b12 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -7766,7 +7766,14 @@ (define_expand "mov"
>   (match_operand:VSTRUCT_QD 1 "general_operand"))]
>"TARGET_FLOAT"
>  {
> -  if (can_create_pseudo_p ())
> +  if (mode == V2x8QImode
> +  && operands[1] == CONST0_RTX (V2x8QImode)
> +  && MEM_P (operands[0]))
> +{
> +  operands[0] = adjust_address (operands[0], TImode, 0);
> +  operands[1] = CONST0_RTX (TImode);
> +}

Interesting idea.  And the current handling of zeros certainly isn't
optimised.  For:

  void f(int8x8x2_t *ptr) { *ptr = (int8x8x2_t) {}; }

the patch changes:

adrp x1, .LC0
add x1, x1, :lo12:.LC0
ld1 {v30.8b - v31.8b}, [x1]
st1 {v30.8b - v31.8b}, [x0]
ret
...
.LC0:
...lots of zeros...

to:

stp xzr, xzr, [x0]
ret

which is a vast improvement.  We could of course fix that in the move
patterns (and maybe we should), but the point remains that zeroing N
bytes doesn't carry any real mode information.  We should just use the
best N-byte mode.
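
The claim that a zero store carries no mode information can be checked in portable code. This is a minimal sketch: the two helpers are hypothetical and only model two different views of the same 16 bytes, they say nothing about which instructions a compiler emits.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using Bytes16 = std::array<unsigned char, 16>;

// Zero the buffer as sixteen 8-bit lanes (models a vector-style view).
Bytes16 zero_as_byte_lanes() {
  Bytes16 buf;
  buf.fill(0xff);
  std::uint8_t lanes[16] = {};
  std::memcpy(buf.data(), lanes, sizeof lanes);
  return buf;
}

// Zero the buffer as two 64-bit words (models the "stp xzr, xzr" view).
Bytes16 zero_as_two_words() {
  Bytes16 buf;
  buf.fill(0xff);
  std::uint64_t words[2] = {0, 0};
  std::memcpy(buf.data(), words, sizeof words);
  return buf;
}
```

Both functions leave identical memory contents, so the choice of mode for a zero store is purely a question of which store sequence is cheapest.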

The only difficulty I can see is that, for big-endian targets, we
allow V2x8QImode addresses to be any 7-bit scaled offset:

  if (aarch64_advsimd_partial_struct_mode_p (mode)
  && known_eq (GET_MODE_SIZE (mode), 16))
return aarch64_offset_7bit_signed_scaled_p (DImode, offset);

whereas for TImode we require:

  if (mode == TImode || mode == TFmode || mode == TDmode)
return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
&& (aarch64_offset_9bit_signed_unscaled_p (mode, offset)
|| offset_12bit_unsigned_scaled_p (mode, offset)));

So for big-endian, there are some immediate offsets that are valid
for V2x8QImode but not for TImode.  This isn't a problem before register
allocation, because the adjust_address will take care of it.  But it
could lead to an ICE after register allocation.  Testing:

  && (can_create_pseudo_p ()
  || memory_address_p (TImode, XEXP (operands[0], 0)))

would avoid that.  (TBH I'm not sure this would ever trigger, i.e.
whether the move patterns would ever be asked to store zero to
memory after register allocation, but it does test the precondition
on using adjust_address.)
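
To make the big-endian mismatch concrete, here is a simplified model of the two offset checks quoted above. The function names echo those in the mail, but the bodies are assumptions written for illustration (signed 7-bit scaled, signed 9-bit unscaled, unsigned 12-bit scaled ranges), not GCC's implementation. It needs C++17 for the message-less static_assert.

```cpp
#include <cstdint>

// Assumed range: multiple of SIZE in [-64 * SIZE, 63 * SIZE].
constexpr bool offset_7bit_signed_scaled(std::int64_t off, std::int64_t size) {
  return off % size == 0 && off >= -64 * size && off <= 63 * size;
}

// Assumed range: any byte offset in [-256, 255].
constexpr bool offset_9bit_signed_unscaled(std::int64_t off) {
  return off >= -256 && off <= 255;
}

// Assumed range: multiple of SIZE in [0, 4095 * SIZE].
constexpr bool offset_12bit_unsigned_scaled(std::int64_t off,
                                            std::int64_t size) {
  return off % size == 0 && off >= 0 && off <= 4095 * size;
}

// 16-byte partial-struct mode on big-endian: 7-bit offset scaled by 8.
constexpr bool valid_for_struct16(std::int64_t off) {
  return offset_7bit_signed_scaled(off, 8);
}

// TImode: the same test, plus a 9-bit unscaled or 12-bit scaled-by-16 form.
constexpr bool valid_for_timode(std::int64_t off) {
  return offset_7bit_signed_scaled(off, 8)
         && (offset_9bit_signed_unscaled(off)
             || offset_12bit_unsigned_scaled(off, 16));
}

// Offsets accepted for the struct mode but rejected for TImode.
static_assert(valid_for_struct16(264) && !valid_for_timode(264));
static_assert(valid_for_struct16(-512) && !valid_for_timode(-512));
// An offset accepted by both.
static_assert(valid_for_struct16(256) && valid_for_timode(256));
```

Under these assumed ranges, offsets such as 264 or -512 are the kind that would need the extra memory_address_p test once new pseudos can no longer be created.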

Like you say, the same approach should work for all 16-byte modes.
And since it's such an improvement, I think we should use it :)

Taking all that together, could you change the condition to:

  if (known_eq (GET_MODE_SIZE (mode), 16)
  && operands[1] == CONST0_RTX (mode)
  && MEM_P (operands[0])
  && (can_create_pseudo_p ()
  || memory_address_p (TImode, XEXP (operands[0], 0))))

The patch is OK from my POV with that change, independently of the
CLEAR_BY_PIECES patch, but please give others 24 hours to comment.

(Once the patch is in, I'll follow up with some tests for arm_neon.h,
to defend the improvement above.)

Thanks,
Richard

> +  else if (can_create_pseudo_p ())
>  {
>if (GET_CODE (operands[0]) != REG)
>   operands[1] = force_reg (mode, operands[1]);

[PATCH v2] i386: Fix some vex insns that prohibit egpr

2024-08-14 Thread Kong, Lingling




-Original Message-
From: Kong, Lingling  
Sent: Wednesday, August 14, 2024 4:20 PM
To: Kong, Lingling 
Subject: [PATCH v2] i386: Fix some vex insns that prohibit egpr

Although these VEX insns have EVEX counterparts, they should not support
APX EGPR when they use the explicit VEX prefix, as with TARGET_AVXVNNI,
TARGET_AVXIFMA and TARGET_AVXNECONVERT.
TARGET_AVXVNNIINT8 and TARGET_AVXVNNIINT16 are also VEX insns and should not
support EGPR.

gcc/ChangeLog:

* config/i386/sse.md (vpmadd52):
Prohibit egpr for vex version.
(vpdpbusd_): Ditto.
(vpdpbusds_): Ditto.
(vpdpwssd_): Ditto.
(vpdpwssds_): Ditto.
(*vcvtneps2bf16_v4sf): Ditto.
(vcvtneps2bf16_v8sf): Ditto.
(vpdp_): Ditto.
(vbcstnebf162ps_): Ditto.
(vbcstnesh2ps_): Ditto.
(vcvtnee2ps_): Ditto.
(vcvtneo2ps_): Ditto.
(vpdp_): Ditto.
---
 gcc/config/i386/sse.md | 49 +++---
 1 file changed, 32 insertions(+), 17 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d1010bc5682..f0d94bba4e7 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -29886,7 +29886,7 @@
(unspec:VI8_AVX2
  [(match_operand:VI8_AVX2 1 "register_operand" "0,0")
   (match_operand:VI8_AVX2 2 "register_operand" "x,v")
-  (match_operand:VI8_AVX2 3 "nonimmediate_operand" "xm,vm")]
+  (match_operand:VI8_AVX2 3 "nonimmediate_operand" "xjm,vm")]
  VPMADD52))]
   "TARGET_AVXIFMA || (TARGET_AVX512IFMA && TARGET_AVX512VL)"
   "@
@@ -29894,6 +29894,7 @@
   vpmadd52\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "isa" "avxifma,avx512ifmavl")
(set_attr "type" "ssemuladd")
+   (set_attr "addr" "gpr16,*")
(set_attr "prefix" "vex,evex")
(set_attr "mode" "")])
 
@@ -30253,13 +30254,14 @@
(unspec:VI4_AVX2
  [(match_operand:VI4_AVX2 1 "register_operand" "0,0")
   (match_operand:VI4_AVX2 2 "register_operand" "x,v")
-  (match_operand:VI4_AVX2 3 "nonimmediate_operand" "xm,vm")]
+  (match_operand:VI4_AVX2 3 "nonimmediate_operand" "xjm,vm")]
  UNSPEC_VPDPBUSD))]
   "TARGET_AVXVNNI || (TARGET_AVX512VNNI && TARGET_AVX512VL)"
   "@
   %{vex%} vpdpbusd\t{%3, %2, %0|%0, %2, %3}
   vpdpbusd\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr ("prefix") ("vex,evex"))
+   (set_attr "addr" "gpr16,*")
(set_attr ("isa") ("avxvnni,avx512vnnivl"))])
 
 (define_insn "vpdpbusd__mask"
@@ -30321,13 +30323,14 @@
(unspec:VI4_AVX2
  [(match_operand:VI4_AVX2 1 "register_operand" "0,0")
   (match_operand:VI4_AVX2 2 "register_operand" "x,v")
-  (match_operand:VI4_AVX2 3 "nonimmediate_operand" "xm,vm")]
+  (match_operand:VI4_AVX2 3 "nonimmediate_operand" "xjm,vm")]
  UNSPEC_VPDPBUSDS))]
   "TARGET_AVXVNNI || (TARGET_AVX512VNNI && TARGET_AVX512VL)"
   "@
%{vex%} vpdpbusds\t{%3, %2, %0|%0, %2, %3}
vpdpbusds\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr ("prefix") ("vex,evex"))
+   (set_attr "addr" "gpr16,*")
(set_attr ("isa") ("avxvnni,avx512vnnivl"))])
 
 (define_insn "vpdpbusds__mask"
@@ -30389,13 +30392,14 @@
(unspec:VI4_AVX2
  [(match_operand:VI4_AVX2 1 "register_operand" "0,0")
   (match_operand:VI4_AVX2 2 "register_operand" "x,v")
-  (match_operand:VI4_AVX2 3 "nonimmediate_operand" "xm,vm")]
+  (match_operand:VI4_AVX2 3 "nonimmediate_operand" "xjm,vm")]
  UNSPEC_VPDPWSSD))]
   "TARGET_AVXVNNI || (TARGET_AVX512VNNI && TARGET_AVX512VL)"
   "@
   %{vex%} vpdpwssd\t{%3, %2, %0|%0, %2, %3}
   vpdpwssd\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr ("prefix") ("vex,evex"))
+   (set_attr "addr" "gpr16,*")
(set_attr ("isa") ("avxvnni,avx512vnnivl"))])
 
 (define_insn "vpdpwssd__mask"
@@ -30457,13 +30461,14 @@
(unspec:VI4_AVX2
  [(match_operand:VI4_AVX2 1 "register_operand" "0,0")
   (match_operand:VI4_AVX2 2 "register_operand" "x,v")
-  (match_operand:VI4_AVX2 3 "nonimmediate_operand" "xm,vm")]
+  (match_operand:VI4_AVX2 3 "nonimmediate_operand" "xjm,vm")]
  UNSPEC_VPDPWSSDS))]
   "TARGET_AVXVNNI || (TARGET_AVX512VNNI && TARGET_AVX512VL)"
   "@
   %{vex%} vpdpwssds\t{%3, %2, %0|%0, %2, %3}
   vpdpwssds\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr ("prefix") ("vex,evex"))
+   (set_attr "addr" "gpr16,*")
(set_attr ("isa") ("avxvnni,avx512vnnivl"))])
 
 (define_insn "vpdpwssds__mask"
@@ -30681,13 +30686,14 @@
   [(set (match_operand:V8BF 0 "register_operand" "=x,v")
(vec_concat:V8BF
  (float_truncate:V4BF
-   (match_operand:V4SF 1 "nonimmediate_operand" "xm,vm"))
+   (match_operand:V4SF 1 "nonimmediate_operand" "xjm,vm"))
  (match_operand:V4BF 2 "const0_operand")))]
   "TARGET_AVXNECONVERT || (TARGET_AVX512BF16 && TARGET_AVX512VL)"
   "@
   %{vex%} vcvtneps2bf16{x}\t{%1, %0|%0, %1}
   vcvtneps2bf16{x}\t{%1, %0|%0, %1}"
   [(set_attr "isa" "avxneconvert,avx512bf16vl")
+   (set_attr "a

[PATCH V4] fsra: gimple final sra pass for parameters and returns

2024-08-14 Thread Jiufu Guo

There are a few PRs (meta-bug PR101926) about accessing aggregate
parameters/returns which are passed through registers.

A major reason for those issues is that, when accessing the aggregate,
temporary stack slots are used without leveraging the information about
the incoming/outgoing registers.

We could use the current SRA pass in a special mode right before
GIMPLE->RTL expansion for the parameters/returns, and scalarize
the access according to the incoming/outgoing registers.
Some discussion in:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637935.html

This patch adds a FINAL mode for tree-sra and introduces the IFNs ARG_PARTS
and SET_RET_PARTS for scalar access on parameters/returns.
The IFNs are expanded according to the incoming/outgoing registers.

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658224.html
this version supports more features:
* Allow access on parameters with grp_write
* Allow unscalarized holes in 'return var'
* Allow address-taken to occur on call statements

Again, there are things to be enhanced for more cases, e.g.
* More optimization for accessing parameters in memory

For 'return var' with outgoing registers, I checked whether the
gimplify phase or the NRV pass could be used to optimize the code.  Compared
with the SRA pass, the gimplify phase and the NRV pass are relatively
straightforward for replacing 'return var' with ''(DECL_RESULT), but there are
some challenges (e.g. address-taken, conversion to MEM...) for them.
They also need additional code in the expander (like expand_SET_RET).
So SRA would still be a good choice to optimize those 'returns'.

Bootstrapped/regtested on ppc64{,le}, x86_64.
With known behavior changes in pr88873.c, pr101908-3.c.

Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

PR target/108073
PR target/65421
PR target/69143

gcc/ChangeLog:

* calls.cc (precompute_register_parameters): Prepare callee argument
from caller parameter.
* cfgexpand.cc (expand_value_return):  Update 'rtx eq' checking.
(expand_return): Check scalarized returns.
* internal-fn.cc (query_position_in_series): New function.
(assign_from_regs): New function.
(reference_alias_ptr_type): Extern declare.
(expand_ARG_PARTS): New IFN expand.
(assign_to_regs): New function.
(expand_SET_RET_PARTS): New IFN expand.
* internal-fn.def (ARG_PARTS): New IFN.
(SET_RET_PARTS): New IFN.
* passes.def (pass_sra_final): Add new pass.
* tree-pass.h (make_pass_sra_final): New function.
* tree-sra.cc (enum sra_mode): New enum item SRA_MODE_FINAL_INTRA.
(build_access_from_expr_1): Check TARGET_MEM_REF.
(build_accesses_from_assign): Accept SRA_MODE_FINAL_INTRA.
(find_var_candidates): Add candidates for fsra.
(fsra_analyze): New function.
(analyze_access_subtree): Check fsra_analyze.
(propagate_subaccesses_from_rhs): Check return-var for fsra.
(generate_subtree_copies):  Generate IFNs and add a new parameter.
(sra_modify_assign): Gen SET_RET_PARTS for assign to 'return'.
(initialize_parameter_reductions): Pass fsra info.
(final_intra_sra): New function
(class pass_sra_final): New pass class.
(make_pass_sra_final): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr102024.C: Update instructions checking.
* gcc.target/powerpc/pr108073-1.c: New test.
* gcc.target/powerpc/pr108073.c: New test.
* gcc.target/powerpc/pr65421.c: New test.
* gcc.target/powerpc/pr69143.c: New test.

---
 gcc/calls.cc  |  10 +
 gcc/cfgexpand.cc  |   6 +-
 gcc/internal-fn.cc| 434 ++
 gcc/internal-fn.def   |   6 +
 gcc/passes.def|   1 +
 gcc/tree-pass.h   |   1 +
 gcc/tree-sra.cc   | 207 -
 gcc/testsuite/g++.target/powerpc/pr102024.C   |   3 +-
 gcc/testsuite/gcc.target/powerpc/pr108073-1.c |  76 +++
 gcc/testsuite/gcc.target/powerpc/pr108073.c   |  74 +++
 gcc/testsuite/gcc.target/powerpc/pr65421.c|  26 ++
 gcc/testsuite/gcc.target/powerpc/pr69143.c|  22 +
 12 files changed, 852 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr69143.c

diff --git a/gcc/calls.cc b/gcc/calls.cc
index f28c58217fd..d15996b8de8 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -996,6 +996,16 @@ precompute_register_parameters (int num_actuals, struct 
arg_data *args,
pop_temp_slots ();
  }
 
+   /* Put pseudos to temp stack for argument */
+   if (GET_CODE (args[i].value) == PARALLEL)
+

Re: [PATCH V7 1/3] split complicate 64bit constant to memory

2024-08-14 Thread Jiufu Guo



Hi,

I would like to ping these patches.

BR,
Jeff(Jiufu Guo)

Jiufu Guo  writes:

> Hi,
>
> Sometimes, a complicated constant is built via 3 (or more)
> instructions.  Generally speaking, that would not be as fast
> as loading it from the constant pool (see the discussion in
> PR63281):
> "ld" is one instruction.  If we consider the "address/toc" adjustment,
> we may count it as 2 instructions. And "pld" may need fewer
> cycles.
>
> Adding --param=rs6000-min-insns-constant-in-pool helps to
> control the instruction number threshold for different scenarios.
>
> Because the constant is loaded from memory with this
> patch, this functionality may affect cache misses.
> Still, IMHO, this patch does the right thing.
>
> Compared with the previous version:
> This patch series adds one more patch to tune the threshold
> for power10.
>
> Bootstrap & regtest pass on ppc64{,le}.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
>   PR target/63281
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000.cc (rs6000_emit_set_const): Split constant to
>   memory for -m64.
>   * config/rs6000/rs6000.opt (rs6000-min-insns-constant-in-pool): New
>   parameter.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/const_anchors.c: Test final-rtl.
>   * gcc.target/powerpc/parall_5insn_const.c: Add option
>   --param=rs6000-min-insns-constant-in-pool=5 to keep the original test.
>   * gcc.target/powerpc/pr106550.c: Likewise.
>   * gcc.target/powerpc/pr106550_1.c: Likewise.
>   * gcc.target/powerpc/pr93012.c: Likewise.
>   * gcc.target/powerpc/pr87870.c: Update instruction counts.
>   * gcc.target/powerpc/pr63281.c: New test.
>
>
> ---
>  gcc/config/rs6000/rs6000.cc   | 19 +++
>  gcc/config/rs6000/rs6000.opt  |  5 +
>  .../gcc.target/powerpc/const_anchors.c|  5 +++--
>  .../gcc.target/powerpc/parall_5insn_const.c   |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr106550.c   |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr106550_1.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr63281.c| 11 +++
>  gcc/testsuite/gcc.target/powerpc/pr87870.c|  5 -
>  gcc/testsuite/gcc.target/powerpc/pr93012.c|  2 +-
>  9 files changed, 46 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 2046a831938..ec384e87868 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10240,6 +10240,25 @@ rs6000_emit_set_const (rtx dest, rtx source)
> c = sext_hwi (c, 32);
> emit_move_insn (lo, GEN_INT (c));
>   }
> +
> +  /* Use base_reg_operand to avoid spliting "r0=xxx" to "r0=[r0+off]"
> +  after RA when reusing the DEST register to build the value.  */
> +  else if ((can_create_pseudo_p () || base_reg_operand (dest, mode))
> +&& num_insns_constant (source, mode)
> + > rs6000_min_insns_constant_in_pool
> +&& TARGET_64BIT)
> + {
> +   rtx sym = force_const_mem (mode, source);
> +   if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
> +   && use_toc_relative_ref (XEXP (sym, 0), mode))
> + {
> +   rtx toc = create_TOC_reference (XEXP (sym, 0), dest);
> +   sym = gen_const_mem (mode, toc);
> +   set_mem_alias_set (sym, get_TOC_alias_set ());
> + }
> +
> +   emit_move_insn (dest, sym);
> + }
>else
>   rs6000_emit_set_long_const (dest, c);
>break;
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index e8ca70340df..a1c0d1e89c5 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -679,3 +679,8 @@ default value is 4.
>  Target Undocumented Joined UInteger Var(rs6000_vect_unroll_reduc_threshold) 
> Init(1) Param
>  When reduction factor computed for a loop exceeds the threshold specified by
>  this parameter, prefer to unroll this loop.  The default value is 1.
> +
> +-param=rs6000-min-insns-constant-in-pool=
> +Target Undocumented Joined UInteger Var(rs6000_min_insns_constant_in_pool) Init(2) IntegerRange(2, 5) Param
> +The minimum instruction number of building a constant to force loading it from
> +the constant pool.
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> index 542e2674b12..682e773d506 100644
> --- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target has_arch_ppc64 } } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -fdump-rtl-final" } */
>  
>  #define C1 0x2351847027482577ULL
>  #define C2 0x2351847027482578ULL
> @@ -17,4 +17,5 @@ void __attribute__ ((noinline)) foo1 (long long *a, long long b)
>  *a++ = C2;
>  }
>

Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Alejandro Colomar

Hi Jens, Martin,

On Wed, Aug 14, 2024 at 08:11:20AM GMT, Jens Gustedt wrote:
> > Checking , I see that while several
> > projects have a lengthof() macro, all of them use it with semantics
> > compatible with this keyword, so it shouldn't break too much.  Maybe
> > those projects will start receiving diagnostics that they're redefining
> > a standard keyword, but that's not too bad.
> 
> For a WG14 paper you should add these findings to support that choice.
> Another option would be for WG14 to standardize the then existing 
> implementation with the double underscores.

Makes sense; I'll add that to new "Prior art" and "Backwards
compatibility" sections in the paper.

> > > As for the parentheses, I personally think lengthof should follow
> > > similar rules compared to sizeof.
> > 
> > I think most people agree with this.
> 
> I still don't, in particular not for standardisation.
> 
> We have to remember that there are many small C compilers out there. 
> I would not want unnecessary burden on them. So my preferred choice would be
> a standardisation as a macro, similar to offsetof.
> gcc (and clang) could then just map that to their builtin, other compilers 
> could use
> whatever they have at the moment, even just the macros that you have in the 
> paper as a starting point. 
> 
> The rest would be "quality of implementation"

Hmmm, sounds reasonable.

Some doubts:

If we allow a compiler to implement it as a predefined macro that
expands to the usual sizeof division, it might produce double evaluation
in some VLA cases.  That would be surprising to some programs, which may
expect either 0 or 1 evaluations, but not 2.  Maybe we can leave it as
unspecified behavior, and an implementation may document that double
evaluation may happen if the input is a VLA?

> What time horizon do you see to add the feature for array parameters?

Martin, what do you think?  I think the only blocking thing for me is
what you mentioned about turning function parameters into arrays that
decay almost everywhere.  Once that's set up, my code will probably work
with them without modification, or maybe with just a little tweak.  Do
you have an idea of how much time that can take you?

I expect it to be well before C2y.  Maybe a year or two?

Have a lovely day!
Alex

> Thanks
> Jens


[PATCH 06/22] AVX10.2 ymm rounding: Support vcvtps2{, u}{dq, qq} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SI_FTYPE_V8SF_V8SI_UQI_INT, V4DI_FTYPE_V4SF_V4DI_UQI_INT.
* config/i386/sse.md
(_fix_notrunc):
Extend to round.

(_fixuns_notrunc):
Add round condition check.
* config/i386/subst.md (round_constraint4): New.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 226 ++
 gcc/config/i386/i386-builtin-types.def|   2 +
 gcc/config/i386/i386-builtin.def  |   4 +
 gcc/config/i386/i386-expand.cc|   2 +
 gcc/config/i386/sse.md|  10 +-
 gcc/config/i386/subst.md  |   1 +
 gcc/testsuite/gcc.target/i386/avx-1.c |   4 +
 .../gcc.target/i386/avx10_2-rounding-1.c  |  32 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   4 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  12 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  12 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   4 +
 12 files changed, 308 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h b/gcc/config/i386/avx10_2roundingintrin.h
index bc3f92a7d1a..fca10a6b586 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -863,6 +863,146 @@ _mm256_maskz_cvtx_roundps_ph (__mmask8 __U, __m256 __A, const int __R)
(__mmask8) __U,
__R);
 }
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundps_epi32 (__m256 __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_vcvtps2dq256_mask_round ((__v8sf) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundps_epi32 (__m256i __W, __mmask8 __U, __m256 __A,
+  const int __R)
+{
+  return (__m256i) __builtin_ia32_vcvtps2dq256_mask_round ((__v8sf) __A,
+  (__v8si) __W,
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvt_roundps_epi32 (__mmask8 __U, __m256 __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_vcvtps2dq256_mask_round ((__v8sf) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundps_epi64 (__m128 __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_cvtps2qq256_mask_round ((__v4sf) __A,
+(__v4di)
+_mm256_setzero_si256 (),
+(__mmask8) -1,
+__R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundps_epi64 (__m256i __W, __mmask8 __U, __m128 __A,
+  const int __R)
+{
+  return (__m256i) __builtin_ia32_cvtps2qq256_mask_round ((__v4sf) __A,
+ (__v4di) __W,
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvt_roundps_epi64 (__mmask8 __U, __m128 __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_cvtps2qq256_mask_round ((__v4sf) __A,
+(__v4di)
+

[PATCH 12/22] AVX10.2 ymm rounding: Support vfmadd{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(_fmadd__mask3): Add condition check.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: New test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 176 ++
 gcc/config/i386/i386-builtin.def  |   9 +
 gcc/config/i386/sse.md|   2 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |   9 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  31 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   9 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  12 ++
 gcc/testsuite/gcc.target/i386/sse-22.c|  12 ++
 gcc/testsuite/gcc.target/i386/sse-23.c|   9 +
 9 files changed, 268 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h b/gcc/config/i386/avx10_2roundingintrin.h
index d5ea6bc57da..9015095144e 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -2092,6 +2092,146 @@ _mm256_maskz_fixupimm_round_ps (__mmask8 __U, __m256 __A, __m256 __B,
(__mmask8) __U,
__R);
 }
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmadd_round_pd (__m256d __A, __m256d __B, __m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfmaddpd256_mask_round ((__v4df) __A,
+ (__v4df) __B,
+ (__v4df) __D,
+ (__mmask8) -1,
+ __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmadd_round_pd (__m256d __A, __mmask8 __U, __m256d __B,
+   __m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfmaddpd256_mask_round ((__v4df) __A,
+ (__v4df) __B,
+ (__v4df) __D,
+ (__mmask8) __U, __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fmadd_round_pd (__m256d __A, __m256d __B, __m256d __D,
+__mmask8 __U, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfmaddpd256_mask3_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df) __D,
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fmadd_round_pd (__mmask8 __U, __m256d __A, __m256d __B,
+__m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfmaddpd256_maskz_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df) __D,
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmadd_round_ph (__m256h __A, __m256h __B, __m256h __D, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfmaddph256_mask_round ((__v16hf) __A,
+ (__v16hf) __B,
+ (__v16hf) __D,
+ (__mmask16) -1,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmadd_round_ph (__m256h __A, __mmask16 __U, __m256h __B,
+   __m256h __D, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfmaddph256_mask_round ((__v16hf) __A,
+ (__v16hf) __B,
+ (__v16hf) __D,
+ (__mmask16) __U,
+ __R);
+}
+
+extern __inline __m256h
+__attri

[PATCH 02/22] AVX10.2 ymm rounding: Support vcvtdq2p{s, h} and vcvtpd2p{s, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: Add new intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SF_FTYPE_V8SI_V8SF_UQI_INT, V4SF_FTYPE_V4DF_V4SF_UQI_INT,
V8HF_FTYPE_V8SI_V8HF_UQI_INT, V8HF_FTYPE_V4DF_V8HF_UQI_INT.
* config/i386/sse.md:

(avx512fp16_vcvt2ph_):
Add condition check.
(avx512fp16_vcvtpd2ph_v4df_mask_round): New expand.
(*avx512fp16_vcvt2ph__mask): Change name to
avx512fp16_vcvt2ph__mask_1
and extend pattern to generate 256bit insns.
(avx_cvtpd2ps256): Change name to
avx_cvtpd2ps256 and extend pattern to
generate 256bit insns.
* config/i386/subst.md (round_applied): New condition.
(round_suff): New iterator.
(round_mode_condition): Add V32HI check for 512bit.
(round_saeonly_mode_condition): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add new macro test.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 210 ++
 gcc/config/i386/i386-builtin-types.def|   4 +
 gcc/config/i386/i386-builtin.def  |   4 +
 gcc/config/i386/i386-expand.cc|   4 +
 gcc/config/i386/sse.md|  32 ++-
 gcc/config/i386/subst.md  |   4 +
 gcc/testsuite/gcc.target/i386/avx-1.c |   4 +
 .../gcc.target/i386/avx10_2-rounding-1.c  |  44 +++-
 gcc/testsuite/gcc.target/i386/sse-13.c|   4 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  12 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  12 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   4 +
 12 files changed, 322 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h b/gcc/config/i386/avx10_2roundingintrin.h
index 5698ed05c1d..09285c1ffcd 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -216,6 +216,138 @@ _mm256_mask_cmp_round_ps_mask (__mmask8 __U, __m256 __A, __m256 __B,
(__mmask8) __U,
__R);
 }
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundepi32_ph (__m256i __A, const int __R)
+{
+  return (__m128h) __builtin_ia32_vcvtdq2ph256_mask_round ((__v8si) __A,
+  (__v8hf)
+  _mm_setzero_ph (),
+  (__mmask8) -1,
+  __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundepi32_ph (__m128h __W, __mmask8 __U, __m256i __A,
+  const int __R)
+{
+  return (__m128h) __builtin_ia32_vcvtdq2ph256_mask_round ((__v8si) __A,
+  (__v8hf) __W,
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvt_roundepi32_ph (__mmask8 __U, __m256i __A, const int __R)
+{
+  return (__m128h) __builtin_ia32_vcvtdq2ph256_mask_round ((__v8si) __A,
+  (__v8hf)
+  _mm_setzero_ph (),
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundepi32_ps (__m256i __A, const int __R)
+{
+  return (__m256) __builtin_ia32_cvtdq2ps256_mask_round ((__v8si) __A,
+(__v8sf)
+_mm256_undefined_ps (),
+(__mmask8) -1,
+__R);
+}
+
+extern __inline __m256
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundepi32_ps (__m256 __W, __mmask8 __U, __m256i __A,
+  const int __R)
+{
+  return (__m256) __builtin_ia32_cvtdq2ps256_mask_round ((__v8si) __A,
+

[PATCH 10/22] AVX10.2 ymm rounding: Support vcvt{, u}w2ph and vdivp{s, d, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V16HF_FTYPE_V16HI_V16HF_UHI_INT.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: New test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 286 ++
 gcc/config/i386/i386-builtin-types.def|   1 +
 gcc/config/i386/i386-builtin.def  |   5 +
 gcc/config/i386/i386-expand.cc|   1 +
 gcc/testsuite/gcc.target/i386/avx-1.c |   5 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  58 
 gcc/testsuite/gcc.target/i386/sse-13.c|   5 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  15 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  15 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   5 +
 10 files changed, 396 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-rounding-3.c

diff --git a/gcc/config/i386/avx10_2roundingintrin.h b/gcc/config/i386/avx10_2roundingintrin.h
index 384facb424c..15ea46b5983 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -1757,6 +1757,183 @@ _mm256_maskz_cvt_roundepu64_ps (__mmask8 __U, __m256i __A, const int __R)
  (__mmask8) __U,
  __R);
 }
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundepu16_ph (__m256i __A, const int __R)
+{
+  return (__m256h) __builtin_ia32_vcvtuw2ph256_mask_round ((__v16hi) __A,
+  (__v16hf)
+  _mm256_setzero_ph (),
+  (__mmask16) -1,
+  __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundepu16_ph (__m256h __W, __mmask16 __U, __m256i __A,
+  const int __R)
+{
+  return (__m256h) __builtin_ia32_vcvtuw2ph256_mask_round ((__v16hi) __A,
+  (__v16hf) __W,
+  (__mmask16) __U,
+  __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvt_roundepu16_ph (__mmask16 __U, __m256i __A, const int __R)
+{
+  return (__m256h) __builtin_ia32_vcvtuw2ph256_mask_round ((__v16hi) __A,
+  (__v16hf)
+  _mm256_setzero_ph (),
+  (__mmask16) __U,
+  __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundepi16_ph (__m256i __A, const int __R)
+{
+  return (__m256h) __builtin_ia32_vcvtw2ph256_mask_round ((__v16hi) __A,
+ (__v16hf)
+ _mm256_setzero_ph (),
+ (__mmask16) -1,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundepi16_ph (__m256h __W, __mmask16 __U, __m256i __A,
+  const int __R)
+{
+  return (__m256h) __builtin_ia32_vcvtw2ph256_mask_round ((__v16hi) __A,
+ (__v16hf) __W,
+ (__mmask16) __U,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvt_roundepi16_ph (__mmask16 __U, __m256i __A, const int __R)
+{
+  return (__m256h) __builtin_ia32_vcvtw2ph256_mask_round ((__v16hi) __A,
+ (__v16hf)
+ _mm256_setzero_ph (),
+ (__mmask16) __U,
+ __R);
+}
+
+extern __inline __m256d
+__attri

[PATCH 08/22] AVX10.2 ymm rounding: Support vcvttph2{, u}{dq, qq, w} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md 
(avx512fp16_fix_trunc2):
Extend round control for 256bit.
(unspec_avx512fp16_fix_trunc2):
Ditto.

(avx512fp16_fix_trunc2):
Add condition check.
* config/i386/subst.md
(round_saeonly_mode_condition): Add V16HI check for 256bit.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-2.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 335 ++
 gcc/config/i386/i386-builtin.def  |   6 +
 gcc/config/i386/sse.md|  10 +-
 gcc/config/i386/subst.md  |   1 +
 gcc/testsuite/gcc.target/i386/avx-1.c |   6 +
 .../gcc.target/i386/avx10_2-rounding-2.c  |  46 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   6 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  18 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  18 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   6 +
 10 files changed, 447 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h b/gcc/config/i386/avx10_2roundingintrin.h
index 25efd9d7b96..45a04e5a7a8 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -1241,6 +1241,216 @@ _mm256_maskz_cvtt_roundpd_epu64 (__mmask8 __U, __m256d __A, const int __R)
   (__mmask8) __U,
   __R);
 }
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtt_roundph_epi32 (__m128h __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_vcvttph2dq256_mask_round ((__v8hf) __A,
+  (__v8si)
+  _mm256_setzero_si256 (),
+  (__mmask8) -1,
+  __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtt_roundph_epi32 (__m256i __W, __mmask8 __U, __m128h __A,
+   const int __R)
+{
+  return (__m256i) __builtin_ia32_vcvttph2dq256_mask_round ((__v8hf) __A,
+   (__v8si) __W,
+   (__mmask8) __U,
+   __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtt_roundph_epi32 (__mmask8 __U, __m128h __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_vcvttph2dq256_mask_round ((__v8hf) __A,
+  (__v8si)
+  _mm256_setzero_si256 (),
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtt_roundph_epi64 (__m128h __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_vcvttph2qq256_mask_round ((__v8hf) __A,
+  (__v4di)
+  _mm256_setzero_si256 (),
+  (__mmask8) -1,
+  __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtt_roundph_epi64 (__m256i __W, __mmask8 __U, __m128h __A,
+   const int __R)
+{
+  return (__m256i) __builtin_ia32_vcvttph2qq256_mask_round ((__v8hf) __A,
+   (__v4di) __W,
+   (__mmask8) __U,
+   __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtt_roundph_epi64 (__mmask8 __U, __m128h __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_vcvttph2qq256_mask_round ((__v8hf) __A,
+  (__v4di)
+  _mm256_setzero_si256 (),
+

[PATCH 05/22] AVX10.2 ymm rounding: Support vcvtph2{, u}w and vcvtps2p{d, hx} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V16HI_FTYPE_V16HF_V16HI_UHI_INT, V4DF_FTYPE_V4SF_V4DF_UQI_INT
V8HF_FTYPE_V8SF_V8HF_UQI_INT.
* config/i386/sse.md
(avx512fp16_vcvt2ph_):
Add round condition check.
* config/i386/subst.md (round_mode_condition): Add V16HI check for
256bit.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 220 ++
 gcc/config/i386/i386-builtin-types.def|   3 +
 gcc/config/i386/i386-builtin.def  |   4 +
 gcc/config/i386/i386-expand.cc|   3 +
 gcc/config/i386/sse.md|   2 +-
 gcc/config/i386/subst.md  |   1 +
 gcc/testsuite/gcc.target/i386/avx-1.c |   4 +
 .../gcc.target/i386/avx10_2-rounding-1.c  |  36 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   4 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  12 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  12 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   4 +
 12 files changed, 304 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h b/gcc/config/i386/avx10_2roundingintrin.h
index 29966f5e1bf..bc3f92a7d1a 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -726,6 +726,143 @@ _mm256_maskz_cvt_roundph_epu64 (__mmask8 __U, __m128h __A, const int __R)
   (__mmask8) __U,
   __R);
 }
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundph_epu16 (__m256h __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_vcvtph2uw256_mask_round ((__v16hf) __A,
+ (__v16hi)
+ _mm256_undefined_si256 (),
+ (__mmask16) -1,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundph_epu16 (__m256i __W, __mmask16 __U, __m256h __A,
+  const int __R)
+{
+  return (__m256i) __builtin_ia32_vcvtph2uw256_mask_round ((__v16hf) __A,
+  (__v16hi) __W,
+  (__mmask16) __U,
+  __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvt_roundph_epu16 (__mmask16 __U, __m256h __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_vcvtph2uw256_mask_round ((__v16hf) __A,
+ (__v16hi)
+ _mm256_setzero_si256 (),
+ (__mmask16) __U,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundph_epi16 (__m256h __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_vcvtph2w256_mask_round ((__v16hf) __A,
+(__v16hi)
+_mm256_undefined_si256 (),
+(__mmask16) -1,
+__R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundph_epi16 (__m256i __W, __mmask16 __U, __m256h __A,
+  const int __R)
+{
+  return (__m256i) __builtin_ia32_vcvtph2w256_mask_round ((__v16hf) __A,
+ (__v16hi) __W,
+ (__mmask16) __U,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvt_roundph_epi16 (__mmask16 __U, __m256h __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_vcvtph2w256_mask_round ((__v16hf) __A,
+

Re: [PATCH] c++/coroutines: fix passing *this to promise type, again [PR116327]

2024-08-14 Thread Iain Sandoe

> On 14 Aug 2024, at 05:46, Jason Merrill  wrote:
> 
> On 8/13/24 7:52 PM, Patrick Palka wrote:
>> On Tue, 13 Aug 2024, Jason Merrill wrote:
>>> On 8/12/24 10:01 PM, Patrick Palka wrote:
>>>> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/14?
>>>>
>>>> -- >8 --
>>>>
>>>> In r15-2210 we got rid of the unnecessary cast to lvalue reference when
>>>> passing *this to the promise type ctor, and as a drive-by change we also
>>>> simplified the code to use cp_build_fold_indirect_ref.
>>>>
>>>> But cp_build_fold_indirect_ref apparently does too much here, namely
>>>> it has a shortcut for returning current_class_ref if the operand is
>>>> current_class_ptr.  The problem with that shortcut is current_class_ref
>>>> might have gotten clobbered earlier if it appeared in the function body,
>>>> since rewrite_param_uses walks and rewrites in-place all local variable
>>>> uses to their corresponding frame copy.
>>>>
>>>> So later this cp_build_fold_indirect_ref for *__closure will instead return
>>>> the mutated current_class_ref i.e. *frame_ptr->__closure, which doesn't
>>>> make sense here since we're in the ramp function and not the actor function
>>>> where frame_ptr is in scope.
>>>>
>>>> This patch fixes this by building INDIRECT_REF directly instead of using
>>>> cp_build_fold_indirect_ref.  (Another approach might be to restore an
>>>> unshare_expr'd current_class_ref after doing coro_rewrite_function_body
>>>> to avoid it remaining clobbered after the rewriting process.  Yet
>>>> another more ambitious approach might be to avoid this tree sharing in
>>>> the first place by returning unshared versions of current_class_ref from
>>>> maybe_dummy_object etc.)
>>> 
>>> Maybe clear current_class_ptr/ref in coro rewriting so we don't hit the
>>> shortcut?
>> That seems to work, but I'm kind of worried about what other code paths
>> that'd disable, particularly semantic code paths vs just optimizations
>> code paths such as the cp_build_fold_indirect_ref shortcut.  IIUC the
>> ramp function has the same signature as the original presumably non-static
>> member function so ideally current class ref should remain set when
>> building the ramp function body and cleared only when building/rewriting
>> the actor function body (which is never a non-static member function and
>> so doesn't have a this pointer, I think?).

Yes, that is what I expect to be needed, and …

>> We do the actor body stuff first however, so even if we clear
>> current_class_ref then, the restored current_class_ref during the
>> later ramp function body stuff (including during the call to
>> cp_build_fold_indirect_ref) will still be clobbered :(

… I have a patch set (soon to be posted) that splits the analysis (needed to
complete the ramp) and the synthesis (of the actor / destroy) so that the latter
can happen after the context of the ramp is exited (i.e. right at the end of
finish_function).  The synthesis can thus be treated as a stand-alone
definition, which resolves similar issues with global state that we saw with
Arsen’s patch to resolve label contexts.

Iain

>> So ISTM this more narrow approach might be preferable unless we ever run
>> into another instance of this current_class_ref clobbering issue?
> 
> Fair enough.

> Is there a reason not to use build_fold_indirect_ref (without cp_)?
> 
> Jason
>

[PATCH 13/22] AVX10.2 ymm rounding: Support vfmaddcph and vfmaddsub{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(_fmaddsub__mask): Add condition check.
(_fmaddsub__mask3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 238 ++
 gcc/config/i386/i386-builtin.def  |  13 +
 gcc/config/i386/sse.md|   4 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |  13 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  43 
 gcc/testsuite/gcc.target/i386/sse-13.c|  13 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  16 ++
 gcc/testsuite/gcc.target/i386/sse-22.c|  15 ++
 gcc/testsuite/gcc.target/i386/sse-23.c|  13 +
 9 files changed, 366 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h b/gcc/config/i386/avx10_2roundingintrin.h
index 9015095144e..95e42410a10 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -2232,6 +2232,193 @@ _mm256_maskz_fmadd_round_ps (__mmask8 __U, __m256 __A, __m256 __B,
  (__mmask8) __U,
  __R);
 }
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmadd_round_pch (__m256h __A, __m256h __B, __m256h __D, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfmaddcph256_round ((__v16hf) __A,
+ (__v16hf) __B,
+ (__v16hf) __D,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmadd_round_pch (__m256h __A, __mmask16 __U, __m256h __B,
+__m256h __D, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfmaddcph256_mask_round ((__v16hf) __A,
+  (__v16hf) __B,
+  (__v16hf) __D,
+  __U,
+  __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fmadd_round_pch (__m256h __A, __m256h __B, __m256h __D,
+ __mmask16 __U, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfmaddcph256_mask3_round ((__v16hf) __A,
+   (__v16hf) __B,
+   (__v16hf) __D,
+   __U,
+   __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fmadd_round_pch (__mmask16 __U, __m256h __A, __m256h __B,
+ __m256h __D, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfmaddcph256_maskz_round ((__v16hf) __A,
+   (__v16hf) __B,
+   (__v16hf) __D,
+   __U,
+   __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmaddsub_round_pd (__m256d __A, __m256d __B, __m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfmaddsubpd256_mask_round ((__v4df) __A,
+(__v4df) __B,
+(__v4df) __D,
+(__mmask8) -1,
+__R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmaddsub_round_pd (__m256d __A, __mmask8 __U, __m256d __B,
+  __m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfmaddsubpd256_mask_round ((__v4df) __A,
+(__v4df) __B,
+(__v4df) __D,
+(__mmask8) __

[PATCH 00/22] Support AVX10.2 ymm rounding

2024-08-14 Thread Haochen Jiang

Hi all,

The initial patch for AVX10.2 has been merged this week.

For the upcoming patches, we will first upstream the ymm rounding control part.

In the ymm rounding part, ALL the AVX512 instructions that have 512-bit
rounding control will also have 256-bit rounding control in AVX10.2.

For clarity, the patches are in alphabetical order. Each patch includes its
intrinsic definitions and the related tests. In some patches no pattern is
changed, because an earlier patch in the series has already enabled 256-bit
rounding in that pattern.

Bootstrapped on x86_64-pc-linux-gnu. Ok for trunk?

Thx,
Haochen

Ref: Intel Advanced Vector Extensions 10.2 Architecture Specification
https://cdrdv2.intel.com/v1/dl/getContent/828965
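
The intrinsics in this series come in three forms: unmasked, mask and maskz.
In the visible hunks, the unmasked form passes an all-ones mask, the mask form
passes a pass-through vector __W, and the maskz form passes a zeroed vector.
The scalar sketch below models that selection for four int32 lanes. It is an
illustration written for this digest and not code from the patches; all
function names in it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_LANES 4

/* Core selection: lane i takes the computed result when bit i of the
   mask is set, and the pass-through value otherwise.  */
void
model_select (int32_t dst[MODEL_LANES],
	      const int32_t result[MODEL_LANES],
	      const int32_t passthru[MODEL_LANES],
	      uint8_t mask)
{
  assert (dst && result && passthru);
  for (int i = 0; i < MODEL_LANES; i++)
    dst[i] = ((mask >> i) & 1) ? result[i] : passthru[i];
}

/* "mask" form: lanes with a clear mask bit keep the value from w.  */
void
model_mask (int32_t dst[MODEL_LANES], const int32_t w[MODEL_LANES],
	    uint8_t u, const int32_t result[MODEL_LANES])
{
  model_select (dst, result, w, u);
}

/* "maskz" form: lanes with a clear mask bit become zero.  */
void
model_maskz (int32_t dst[MODEL_LANES], uint8_t u,
	     const int32_t result[MODEL_LANES])
{
  const int32_t zero[MODEL_LANES] = { 0, 0, 0, 0 };
  model_select (dst, result, zero, u);
}

/* Unmasked form: an all-ones mask selects every computed lane, so the
   pass-through operand is never observed.  */
void
model_unmasked (int32_t dst[MODEL_LANES],
		const int32_t result[MODEL_LANES])
{
  const int32_t unused[MODEL_LANES] = { 0, 0, 0, 0 };
  model_select (dst, result, unused, (uint8_t) -1);
}
```

With result {1, 2, 3, 4}, w {9, 9, 9, 9} and mask 0x5, the mask form yields
{1, 9, 3, 9} and the maskz form yields {1, 0, 3, 0}.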

[PATCH 03/22] AVX10.2 ymm rounding: Support vcvtpd2{, u}{dq, qq} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: Add new intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V4DI_FTYPE_V4DF_V4DI_UQI_INT, V4SI_FTYPE_V4DF_V4SI_UQI_INT.
* config/i386/sse.md:
(avx_cvtpd2dq256): Change name to
avx_cvtpd2dq256 and extend pattern to
generate 256bit insns.
(fixuns_notrunc2):
Add round_mode_condition.
* config/i386/subst.md (round_pd2udqsuff): New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add new macro test.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 218 ++
 gcc/config/i386/i386-builtin-types.def|   2 +
 gcc/config/i386/i386-builtin.def  |   4 +
 gcc/config/i386/i386-expand.cc|   2 +
 gcc/config/i386/sse.md|  13 +-
 gcc/config/i386/subst.md  |   1 +
 gcc/testsuite/gcc.target/i386/avx-1.c |   4 +
 .../gcc.target/i386/avx10_2-rounding-1.c  |  33 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   4 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  12 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  12 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   4 +
 12 files changed, 303 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index 09285c1ffcd..3e5e9f3ba0e 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -348,6 +348,144 @@ _mm256_maskz_cvt_roundpd_ps (__mmask8 __U, __m256d __A, 
const int __R)
 (__mmask8) __U,
 __R);
 }
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundpd_epi32 (__m256d __A, const int __R)
+{
+  return
+(__m128i) __builtin_ia32_cvtpd2dq256_mask_round ((__v4df) __A,
+(__v4si)
+_mm_undefined_si128 (),
+(__mmask8) -1,
+__R);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundpd_epi32 (__m128i __W, __mmask8 __U, __m256d __A,
+  const int __R)
+{
+  return (__m128i) __builtin_ia32_cvtpd2dq256_mask_round ((__v4df) __A,
+ (__v4si) __W,
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvt_roundpd_epi32 (__mmask8 __U, __m256d __A, const int __R)
+{
+  return (__m128i) __builtin_ia32_cvtpd2dq256_mask_round ((__v4df) __A,
+ (__v4si)
+ _mm_setzero_si128 (),
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundpd_epi64 (__m256d __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_cvtpd2qq256_mask_round ((__v4df) __A,
+(__v4di)
+_mm256_setzero_si256 (),
+(__mmask8) -1,
+__R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundpd_epi64 (__m256i __W, __mmask8 __U, __m256d __A,
+  const int __R)
+{
+  return (__m256i) __builtin_ia32_cvtpd2qq256_mask_round ((__v4df) __A,
+ (__v4di) __W,
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvt_roundpd_epi64 (__mmask8 __U, __m256d __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_cvtpd2qq256_mask_round ((__v4df) __A,
+

[PATCH 07/22] AVX10.2 ymm rounding: Support vcvtqq2p{s, d, h} and vcvttpd2{, u}{dq, qq} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V4DF_FTYPE_V4DI_V4DF_UQI_INT, V4SF_FTYPE_V4DI_V4SF_UQI_INT,
V8HF_FTYPE_V4DI_V8HF_UQI_INT.
* config/i386/sse.md:
(avx512fp16_vcvtqq2ph_v4di_mask_round): New expand.
(*avx512fp16_vcvt2ph__mask):
Extend round control and add "_1" suffix.

(float2):
Add condition check.

(float2):
Ditto.
(float2):
Limit suffix output.
(unspec_fix_truncv4dfv4si2): Extend round control.
(unspec_fixuns_truncv4dfv4si2): Ditto.
* config/i386/subst.md (round_qq2pssuff): New iterator.
(round_saeonly_suff): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-2.c: New test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 390 ++
 gcc/config/i386/i386-builtin-types.def|   3 +
 gcc/config/i386/i386-builtin.def  |   7 +
 gcc/config/i386/i386-expand.cc|   3 +
 gcc/config/i386/sse.md|  43 +-
 gcc/config/i386/subst.md  |   2 +
 gcc/testsuite/gcc.target/i386/avx-1.c |   7 +
 .../gcc.target/i386/avx10_2-rounding-2.c  |  72 
 gcc/testsuite/gcc.target/i386/sse-13.c|   7 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  21 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  21 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   7 +
 12 files changed, 569 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-rounding-2.c

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index fca10a6b586..25efd9d7b96 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -1003,6 +1003,244 @@ _mm256_maskz_cvt_roundps_epu64 (__mmask8 __U, __m128 
__A, const int __R)
  (__mmask8) __U,
  __R);
 }
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundepi64_pd (__m256i __A, const int __R)
+{
+  return (__m256d) __builtin_ia32_cvtqq2pd256_mask_round ((__v4di) __A,
+ (__v4df)
+ _mm256_setzero_pd (),
+ (__mmask8) -1,
+ __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundepi64_pd (__m256d __W, __mmask8 __U, __m256i __A,
+  const int __R)
+{
+  return (__m256d) __builtin_ia32_cvtqq2pd256_mask_round ((__v4di) __A,
+ (__v4df) __W,
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvt_roundepi64_pd (__mmask8 __U, __m256i __A, const int __R)
+{
+  return (__m256d) __builtin_ia32_cvtqq2pd256_mask_round ((__v4di) __A,
+ (__v4df)
+ _mm256_setzero_pd (),
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundepi64_ph (__m256i __A, const int __R)
+{
+  return (__m128h) __builtin_ia32_vcvtqq2ph256_mask_round ((__v4di) __A,
+  (__v8hf)
+  _mm_setzero_ph (),
+  (__mmask8) -1,
+  __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundepi64_ph (__m128h __W, __mmask8 __U, __m256i __A,
+  const int __R)
+{
+  return (__m128h) __builtin_ia32_vcvtqq2ph256_mask_round ((__v4di) __A,
+  (__v8hf) __W,
+

[PATCH 09/22] AVX10.2 ymm rounding: Support vcvttps2{, u}{dq, qq} and vcvtu{dq, qq}2p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md
(unspec_fix_truncv8sfv8si2): Extend rounding control.
(fixuns_trunc2):
Ditto.

(floatuns2):
Add condition check.

(fix_trunc2):
Remove round_saeonly_name.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-2.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 492 ++
 gcc/config/i386/i386-builtin.def  |   9 +
 gcc/config/i386/sse.md|  27 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |   9 +
 .../gcc.target/i386/avx10_2-rounding-2.c  |  75 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   9 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  26 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  27 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   9 +
 9 files changed, 670 insertions(+), 13 deletions(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index 45a04e5a7a8..384facb424c 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -1451,6 +1451,312 @@ _mm256_maskz_cvtt_roundph_epi16 (__mmask16 __U, __m256h 
__A, const int __R)
  (__mmask16) __U,
  __R);
 }
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtt_roundps_epi32 (__m256 __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_cvttps2dq256_mask_round ((__v8sf) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtt_roundps_epi32 (__m256i __W, __mmask8 __U, __m256 __A,
+   const int __R)
+{
+  return (__m256i) __builtin_ia32_cvttps2dq256_mask_round ((__v8sf) __A,
+  (__v8si) __W,
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtt_roundps_epi32 (__mmask8 __U, __m256 __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_cvttps2dq256_mask_round ((__v8sf) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtt_roundps_epi64 (__m128 __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_cvttps2qq256_mask_round ((__v4sf) __A,
+ (__v4di)
+ _mm256_setzero_si256 (),
+ (__mmask8) -1,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtt_roundps_epi64 (__m256i __W, __mmask8 __U, __m128 __A,
+   const int __R)
+{
+  return (__m256i) __builtin_ia32_cvttps2qq256_mask_round ((__v4sf) __A,
+  (__v4di) __W,
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtt_roundps_epi64 (__mmask8 __U, __m128 __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_cvttps2qq256_mask_round ((__v4sf) __A,
+ (__v4di)
+ _mm256_setzero_si256 (),
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline_

[PATCH 16/22] AVX10.2 ymm rounding: Support vfnmsub{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(_fnmsub__mask3): Add condition check.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 181 ++
 gcc/config/i386/i386-builtin.def  |   9 +
 gcc/config/i386/sse.md|   2 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |   9 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  31 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   9 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  12 ++
 gcc/testsuite/gcc.target/i386/sse-22.c|  12 ++
 gcc/testsuite/gcc.target/i386/sse-23.c|   9 +
 9 files changed, 273 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index 3f833bffa54..afc1220fea4 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -2876,6 +2876,151 @@ _mm256_maskz_fnmadd_round_ps (__mmask8 __U, __m256 __A, 
__m256 __B,
   (__mmask8) __U,
   __R);
 }
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fnmsub_round_pd (__m256d __A, __m256d __B, __m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfnmsubpd256_mask_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df) __D,
+  (__mmask8) -1,
+  __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fnmsub_round_pd (__m256d __A, __mmask8 __U, __m256d __B,
+__m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfnmsubpd256_mask_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df) __D,
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fnmsub_round_pd (__m256d __A, __m256d __B, __m256d __D,
+ __mmask8 __U, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfnmsubpd256_mask3_round ((__v4df) __A,
+   (__v4df) __B,
+   (__v4df) __D,
+   (__mmask8) __U,
+   __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fnmsub_round_pd (__mmask8 __U, __m256d __A, __m256d __B,
+ __m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfnmsubpd256_maskz_round ((__v4df) __A,
+   (__v4df) __B,
+   (__v4df) __D,
+   (__mmask8) __U,
+   __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fnmsub_round_ph (__m256h __A, __m256h __B, __m256h __D, const int __R)
+{
+  return (__m256h)
+__builtin_ia32_vfnmsubph256_mask_round ((__v16hf) __A,
+   (__v16hf) __B,
+   (__v16hf) __D,
+   (__mmask16) -1,
+   __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fnmsub_round_ph (__m256h __A, __mmask16 __U, __m256h __B,
+__m256h __D, const int __R)
+{
+  return (__m256h)
+__builtin_ia32_vfnmsubph256_mask_round ((__v16hf) __A,
+   (__v16hf) __B,
+   (__v16hf) __D,
+   (__mmask16) __U,
+   __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_

[PATCH 11/22] AVX10.2 ymm rounding: Support vfc{madd, mul}cph, vfixupimmp{s, d} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V16HF_FTYPE_V16HF_V16HF_INT, V16HF_FTYPE_V16HF_V16HF_V16HF_INT,
V16HF_FTYPE_V16HF_V16HF_V16HF_UQI_INT,
V4DF_FTYPE_V4DF_V4DF_V4DI_INT_UQI_INT,
V8SF_FTYPE_V8SF_V8SF_V8SI_INT_UQI_INT.
* config/i386/sse.md:
(_fixupimm):
Add condition check.
(_fixupimm_mask): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: New test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 247 ++
 gcc/config/i386/i386-builtin-types.def|   5 +
 gcc/config/i386/i386-builtin.def  |  10 +
 gcc/config/i386/i386-expand.cc|   5 +
 gcc/config/i386/sse.md|   4 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |  10 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  49 
 gcc/testsuite/gcc.target/i386/sse-13.c|  10 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  13 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  13 +
 gcc/testsuite/gcc.target/i386/sse-23.c|  10 +
 11 files changed, 374 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index 15ea46b5983..d5ea6bc57da 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -1934,6 +1934,164 @@ _mm256_maskz_div_round_ps (__mmask8 __U, __m256 __A, 
__m256 __B,
  (__mmask8) __U,
  __R);
 }
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fcmadd_round_pch (__m256h __A, __m256h __B, __m256h __D, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfcmaddcph256_round ((__v16hf) __A,
+  (__v16hf) __B,
+  (__v16hf) __D,
+  __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fcmadd_round_pch (__m256h __A, __mmask8 __U, __m256h __B,
+ __m256h __D, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfcmaddcph256_mask_round ((__v16hf) __A,
+   (__v16hf) __B,
+   (__v16hf) __D,
+   __U,
+   __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fcmadd_round_pch (__m256h __A, __m256h __B, __m256h __D,
+  __mmask8 __U, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfcmaddcph256_mask3_round ((__v16hf) __A,
+(__v16hf) __B,
+(__v16hf) __D,
+__U,
+__R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fcmadd_round_pch (__mmask8 __U, __m256h __A, __m256h __B,
+  __m256h __D, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfcmaddcph256_maskz_round ((__v16hf) __A,
+(__v16hf) __B,
+(__v16hf) __D,
+__U,
+__R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fcmul_round_pch (__m256h __A, __m256h __B, const int __R)
+{
+  return
+(__m256h) __builtin_ia32_vfcmulcph256_round ((__v16hf) __A,
+(__v16hf) __B,
+__R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fcmul_round_pch (__m256h __W, __mmask8 __U, __m256h __A,
+__m256h __B, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfcmulcph256_mask_round ((__v16hf) __A,
+

[PATCH 15/22] AVX10.2 ymm rounding: Support vfmulcph and vfnmadd{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 241 ++
 gcc/config/i386/i386-builtin.def  |  11 +
 gcc/testsuite/gcc.target/i386/avx-1.c |  11 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  50 
 gcc/testsuite/gcc.target/i386/sse-13.c|  11 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  14 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  14 +
 gcc/testsuite/gcc.target/i386/sse-23.c|  11 +
 8 files changed, 363 insertions(+)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index 346a32c1a8a..3f833bffa54 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -2697,6 +2697,185 @@ _mm256_maskz_fmsubadd_round_ps (__mmask8 __U, __m256 
__A, __m256 __B,
 (__mmask8) __U,
 __R);
 }
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmul_round_pch (__m256h __B, __m256h __D, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfmulcph256_round ((__v16hf) __B,
+(__v16hf) __D,
+__R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmul_round_pch (__m256h __A, __mmask8 __U, __m256h __B,
+   __m256h __D, const int __R)
+{
+  return (__m256h) __builtin_ia32_vfmulcph256_mask_round ((__v16hf) __B,
+ (__v16hf) __D,
+ (__v16hf) __A,
+ (__mmask16) __U,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fmul_round_pch (__mmask8 __U, __m256h __B, __m256h __D,
+const int __R)
+{
+  return (__m256h) __builtin_ia32_vfmulcph256_mask_round ((__v16hf) __B,
+ (__v16hf) __D,
+ (__v16hf)
+ _mm256_setzero_ph (),
+ (__mmask16) __U,
+ __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fnmadd_round_pd (__m256d __A, __m256d __B, __m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfnmaddpd256_mask_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df) __D,
+  (__mmask8) -1,
+  __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fnmadd_round_pd (__m256d __A, __mmask8 __U, __m256d __B,
+__m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfnmaddpd256_mask_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df) __D,
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fnmadd_round_pd (__m256d __A, __m256d __B, __m256d __D,
+ __mmask8 __U, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfnmaddpd256_mask3_round ((__v4df) __A,
+   (__v4df) __B,
+   (__v4df) __D,
+   (__mmask8) __U,
+   __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fnmadd_round_pd (__mmask8 __U, __m256d __A, __m256d __B,
+ __m256d __D, const in

[PATCH 19/22] AVX10.2 ymm rounding: Support vmulp{s, d, h} and vrangep{s, d} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin):
Handle V8SF_FTYPE_V8SF_V8SF_INT_V8SF_UQI_INT,
V4DF_FTYPE_V4DF_V4DF_INT_V4DF_UQI_INT.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 313 ++
 gcc/config/i386/i386-builtin-types.def|   2 +
 gcc/config/i386/i386-builtin.def  |   5 +
 gcc/config/i386/i386-expand.cc|   2 +
 gcc/testsuite/gcc.target/i386/avx-1.c |   5 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  43 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   5 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  15 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  15 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   5 +
 10 files changed, 410 insertions(+)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index a5712f5230a..ac0914415c9 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -3454,6 +3454,198 @@ _mm256_maskz_min_round_ps (__mmask8 __U, __m256 __A, 
__m256 __B,
  (__mmask8) __U,
  __R);
 }
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mul_round_pd (__m256d __A, __m256d __B, const int __R)
+{
+  return (__m256d) __builtin_ia32_mulpd256_mask_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df)
+  _mm256_undefined_pd (),
+  (__mmask8) -1,
+  __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_mul_round_pd (__m256d __W, __mmask8 __U, __m256d __A,
+ __m256d __B, const int __R)
+{
+  return (__m256d) __builtin_ia32_mulpd256_mask_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df) __W,
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_mul_round_pd (__mmask8 __U, __m256d __A, __m256d __B,
+  const int __R)
+{
+  return (__m256d) __builtin_ia32_mulpd256_mask_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df)
+  _mm256_setzero_pd (),
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mul_round_ph (__m256h __A, __m256h __B, const int __R)
+{
+  return (__m256h) __builtin_ia32_mulph256_mask_round ((__v16hf) __A,
+  (__v16hf) __B,
+  (__v16hf)
+  _mm256_undefined_ph (),
+  (__mmask16) -1,
+  __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_mul_round_ph (__m256h __W, __mmask16 __U, __m256h __A,
+ __m256h __B, const int __R)
+{
+  return (__m256h) __builtin_ia32_mulph256_mask_round ((__v16hf) __A,
+  (__v16hf) __B,
+  (__v16hf) __W,
+  (__mmask16) __U,
+  __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_mul_round_ph (__mmask16 __U, __m256h __A, __m256h __B,
+  const int __R)
+{
+  return (__m256h) __builtin_ia32_mulph256_mask_round ((__v16hf) __A,
+

[PATCH 04/22] AVX10.2 ymm rounding: Support vcvtph2p{s, d, sx} and vcvtph2{, u}{dq, qq} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SF_FTYPE_V8HF_V8SF_UQI_INT, V8SI_FTYPE_V8HF_V8SI_UQI_INT,
V4DF_FTYPE_V8HF_V4DF_UQI_INT, V4DI_FTYPE_V8HF_V4DI_UQI_INT.
* config/i386/sse.md:
(avx512fp16_float_extend_ph2):
Add condition check.
(avx512fp16_vcvtph2_
 ):
Ditto.
(avx512fp16_float_extend_ph2): Extend round saeonly.
(vcvtph2ps256): Ditto.
* config/i386/subst.md
(round_saeonly_applied): New condition.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 384 ++
 gcc/config/i386/i386-builtin-types.def|   4 +
 gcc/config/i386/i386-builtin.def  |   7 +
 gcc/config/i386/i386-expand.cc|   4 +
 gcc/config/i386/sse.md|  19 +-
 gcc/config/i386/subst.md  |   1 +
 gcc/testsuite/gcc.target/i386/avx-1.c |   7 +
 .../gcc.target/i386/avx10_2-rounding-1.c  |  57 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   7 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  20 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  21 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   7 +
 12 files changed, 529 insertions(+), 9 deletions(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index 3e5e9f3ba0e..29966f5e1bf 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -486,6 +486,246 @@ _mm256_maskz_cvt_roundpd_epu64 (__mmask8 __U, __m256d 
__A, const int __R)
  (__mmask8) __U,
  __R);
 }
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundph_epi32 (__m128h __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_vcvtph2dq256_mask_round ((__v8hf) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ (__mmask8) -1,
+ __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundph_epi32 (__m256i __W, __mmask8 __U, __m128h __A,
+  const int __R)
+{
+  return (__m256i) __builtin_ia32_vcvtph2dq256_mask_round ((__v8hf) __A,
+  (__v8si) __W,
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvt_roundph_epi32 (__mmask8 __U, __m128h __A, const int __R)
+{
+  return
+(__m256i) __builtin_ia32_vcvtph2dq256_mask_round ((__v8hf) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvt_roundph_pd (__m128h __A, const int __R)
+{
+  return (__m256d) __builtin_ia32_vcvtph2pd256_mask_round ((__v8hf) __A,
+  (__v4df)
+  _mm256_setzero_pd (),
+  (__mmask8) -1,
+  __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvt_roundph_pd (__m256d __W, __mmask8 __U, __m128h __A,
+   const int __R)
+{
+  return (__m256d) __builtin_ia32_vcvtph2pd256_mask_round ((__v8hf) __A,
+  (__v4df) __W,
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvt_roundph_pd (__mmask8 __U, __m128h __A

[PATCH 18/22] AVX10.2 ymm rounding: Support v{max, min}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 360 ++
 gcc/config/i386/i386-builtin.def  |   6 +
 gcc/testsuite/gcc.target/i386/avx-1.c |   6 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  50 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   6 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  18 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  18 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   6 +
 8 files changed, 470 insertions(+)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index 07729a6cc04..a5712f5230a 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -3232,6 +3232,228 @@ _mm256_maskz_getmant_round_ps (__mmask8 __U, __m256 __A,
  _mm256_setzero_ps (),
  __U, __R);
 }
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_max_round_pd (__m256d __A, __m256d __B, const int __R)
+{
+  return (__m256d) __builtin_ia32_maxpd256_mask_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df)
+  _mm256_undefined_pd (),
+  (__mmask8) -1,
+  __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_max_round_pd (__m256d __W, __mmask8 __U, __m256d __A,
+ __m256d __B, const int __R)
+{
+  return (__m256d) __builtin_ia32_maxpd256_mask_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df) __W,
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_max_round_pd (__mmask8 __U, __m256d __A, __m256d __B,
+  const int __R)
+{
+  return (__m256d) __builtin_ia32_maxpd256_mask_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df)
+  _mm256_setzero_pd (),
+  (__mmask8) __U,
+  __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_max_round_ph (__m256h __A, __m256h __B, const int __R)
+{
+  return (__m256h) __builtin_ia32_maxph256_mask_round ((__v16hf) __A,
+  (__v16hf) __B,
+  (__v16hf)
+  _mm256_undefined_ph (),
+  (__mmask16) -1,
+  __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_max_round_ph (__m256h __W, __mmask16 __U, __m256h __A,
+ __m256h __B, const int __R)
+{
+  return (__m256h) __builtin_ia32_maxph256_mask_round ((__v16hf) __A,
+  (__v16hf) __B,
+  (__v16hf) __W,
+  (__mmask16) __U,
+  __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_max_round_ph (__mmask16 __U, __m256h __A, __m256h __B,
+  const int __R)
+{
+  return (__m256h) __builtin_ia32_maxph256_mask_round ((__v16hf) __A,
+  (__v16hf) __B,
+  (__v16hf)
+  _mm256_setzero_ph (),
+  (__mmask16) __U,
+  __R);
+}
+
+exter

[PATCH 01/22] AVX10.2 ymm rounding: Support vadd{s, d, h} and vcmp{s, d, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config.gcc: Add avx10_2roundingintrin.h.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V4DF_FTYPE_V4DF_V4DF_V4DF_UQI_INT, V8SF_FTYPE_V8SF_V8SF_V8SF_UQI_INT,
V16HF_FTYPE_V16HF_V16HF_V16HF_UHI_INT, UQI_FTYPE_V4DF_V4DF_INT_UQI_INT,
UHI_FTYPE_V16HF_V16HF_INT_UHI_INT, UQI_FTYPE_V8SF_V8SF_INT_UQI_INT.
* config/i386/immintrin.h: Include avx10_2roundingintrin.h.
* config/i386/sse.md: Change subst_attr name due to renaming.
* config/i386/subst.md:
(): Add condition check for avx10.2
rounding control 256bit intrins and renamed to ...
(): ...this.
(round_saeonly_mode512bit_condition): Add condition check for
avx10.2 rounding control 256 bit intrins and renamed to ...
(round_saeonly_mode_condition): ...this.
* config/i386/avx10_2roundingintrin.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add -mavx10.2 and new builtin test.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/sse-13.c: Add new tests.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: New test.
---
 gcc/config.gcc|   2 +-
 gcc/config/i386/avx10_2roundingintrin.h   | 337 ++
 gcc/config/i386/i386-builtin-types.def|   8 +
 gcc/config/i386/i386-builtin.def  |   8 +
 gcc/config/i386/i386-expand.cc|   6 +
 gcc/config/i386/immintrin.h   |   2 +
 gcc/config/i386/sse.md| 100 +++---
 gcc/config/i386/subst.md  |  32 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |  10 +-
 gcc/testsuite/gcc.target/i386/avx-2.c |   2 +-
 .../gcc.target/i386/avx10_2-rounding-1.c  |  64 
 gcc/testsuite/gcc.target/i386/sse-13.c|   8 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  17 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  17 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   8 +
 15 files changed, 558 insertions(+), 63 deletions(-)
 create mode 100644 gcc/config/i386/avx10_2roundingintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-rounding-1.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a36dd1bcbc6..2c0f4518638 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -452,7 +452,7 @@ i[34567]86-*-* | x86_64-*-*)
   cmpccxaddintrin.h amxfp16intrin.h prfchiintrin.h
   raointintrin.h amxcomplexintrin.h avxvnniint16intrin.h
   sm3intrin.h sha512intrin.h sm4intrin.h
-  usermsrintrin.h"
+  usermsrintrin.h avx10_2roundingintrin.h"
;;
 ia64-*-*)
extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
new file mode 100644
index 000..5698ed05c1d
--- /dev/null
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -0,0 +1,337 @@
+/* Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#error "Never use <avx10_2roundingintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AVX10_2ROUNDINGINTRIN_H_INCLUDED
+#define _AVX10_2ROUNDINGINTRIN_H_INCLUDED
+
+#ifndef __AVX10_2_256__
+#pragma GCC push_options
+#pragma GCC target("avx10.2-256")
+#define __DISABLE_AVX10_2_256__
+#endif /* __AVX10_2_256__ */
+
+#ifdef  __OPTIMIZE__
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_add_round_pd (__m256d __A, __m256d __B, const int __R)
+{
+  return (__m256d) __builtin_ia32_addpd256_mask_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df

[PATCH 14/22] AVX10.2 ymm rounding: Support vfm{sub, subadd}{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(_fmsub__mask): Add condition check.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 350 ++
 gcc/config/i386/i386-builtin.def  |  18 +
 gcc/config/i386/sse.md|   2 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |  18 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  62 
 gcc/testsuite/gcc.target/i386/sse-13.c|  18 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  24 ++
 gcc/testsuite/gcc.target/i386/sse-22.c|  24 ++
 gcc/testsuite/gcc.target/i386/sse-23.c|  18 +
 9 files changed, 533 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index 95e42410a10..346a32c1a8a 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -2419,6 +2419,284 @@ _mm256_maskz_fmaddsub_round_ps (__mmask8 __U, __m256 
__A, __m256 __B,
 (__mmask8) __U,
 __R);
 }
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmsub_round_pd (__m256d __A, __m256d __B, __m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfmsubpd256_mask_round ((__v4df) __A,
+ (__v4df) __B,
+ (__v4df) __D,
+ (__mmask8) -1, __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmsub_round_pd (__m256d __A, __mmask8 __U, __m256d __B,
+   __m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfmsubpd256_mask_round ((__v4df) __A,
+ (__v4df) __B,
+ (__v4df) __D,
+ (__mmask8) __U, __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fmsub_round_pd (__m256d __A, __m256d __B, __m256d __D,
+__mmask8 __U, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfmsubpd256_mask3_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df) __D,
+  (__mmask8) __U, __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fmsub_round_pd (__mmask8 __U, __m256d __A, __m256d __B,
+__m256d __D, const int __R)
+{
+  return (__m256d) __builtin_ia32_vfmsubpd256_maskz_round ((__v4df) __A,
+  (__v4df) __B,
+  (__v4df) __D,
+  (__mmask8) __U, __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmsub_round_ph (__m256h __A, __m256h __B, __m256h __D, const int __R)
+{
+  return (__m256h)
+__builtin_ia32_vfmsubph256_mask_round ((__v16hf) __A,
+  (__v16hf) __B,
+  (__v16hf) __D,
+  (__mmask16) -1, __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmsub_round_ph (__m256h __A, __mmask16 __U, __m256h __B,
+   __m256h __D, const int __R)
+{
+  return (__m256h)
+__builtin_ia32_vfmsubph256_mask_round ((__v16hf) __A,
+  (__v16hf) __B,
+  (__v16hf) __D,
+  (__mmask16) __U, __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fmsub_round_ph (__m256h __A, __m256h __B, __m256h __D,
+__mmask16 __U, const int __R)
+{
+  return (__m256h)
+__builtin_ia32_vfmsubph256_mask3_round ((__v16hf) __A,
+

[PATCH 21/22] AVX10.2 ymm rounding: Support vscalefp{s,d,h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/sse.md:
(_scalef): Add condition check.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 182 ++
 gcc/config/i386/i386-builtin.def  |   3 +
 gcc/config/i386/sse.md|   2 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |   3 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  25 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   3 +
 gcc/testsuite/gcc.target/i386/sse-14.c|   9 +
 gcc/testsuite/gcc.target/i386/sse-22.c|   9 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   3 +
 9 files changed, 238 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index d6b8e2695de..f35f2337858 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -3873,6 +3873,119 @@ _mm256_maskz_roundscale_round_ps (__mmask8 __U, __m256 
__A, const int __C,
   (__mmask8) __U,
   __R);
 }
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_scalef_round_pd (__m256d __A, __m256d __B, const int __R)
+{
+  return
+(__m256d) __builtin_ia32_scalefpd256_mask_round ((__v4df) __A,
+(__v4df) __B,
+(__v4df)
+_mm256_undefined_pd (),
+(__mmask8) -1,
+__R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_scalef_round_pd (__m256d __W, __mmask8 __U, __m256d __A,
+__m256d __B, const int __R)
+{
+  return (__m256d) __builtin_ia32_scalefpd256_mask_round ((__v4df) __A,
+ (__v4df) __B,
+ (__v4df) __W,
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_scalef_round_pd (__mmask8 __U, __m256d __A, __m256d __B,
+ const int __R)
+{
+  return (__m256d) __builtin_ia32_scalefpd256_mask_round ((__v4df) __A,
+ (__v4df) __B,
+ (__v4df)
+ _mm256_setzero_pd (),
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_scalef_round_ph (__m256h __A, __m256h __B, const int __R)
+{
+  return
+(__m256h) __builtin_ia32_scalefph256_mask_round ((__v16hf) __A,
+(__v16hf) __B,
+(__v16hf)
+_mm256_undefined_ph (),
+(__mmask16) -1,
+__R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_scalef_round_ph (__m256h __W, __mmask16 __U, __m256h __A,
+__m256h __B, const int __R)
+{
+  return (__m256h) __builtin_ia32_scalefph256_mask_round ((__v16hf) __A,
+ (__v16hf) __B,
+ (__v16hf) __W,
+ (__mmask16) __U,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_scalef_round_ph (__mmask16 __U, __m256h __A, __m256h __B,
+ const int __R)
+{
+  return (__m256h) __builtin_ia32_scalefph256_mask_round ((__v16hf) __A,
+ (__v16hf) __B,
+

[PATCH 20/22] AVX10.2 ymm rounding: Support vreducep{s, d, h} and vrndscalep{s, d, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(reducep):
Add condition check.
(_rndscale): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 367 ++
 gcc/config/i386/i386-builtin.def  |   6 +
 gcc/config/i386/sse.md|   4 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |   6 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  50 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   6 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  18 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  18 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   6 +
 9 files changed, 479 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index ac0914415c9..d6b8e2695de 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -3646,6 +3646,233 @@ _mm256_maskz_range_round_ps (__mmask8 __U, __m256 __A, 
__m256 __B,
(__mmask8) __U,
__R);
 }
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_reduce_round_pd (__m256d __A, const int __C, const int __R)
+{
+  return (__m256d) __builtin_ia32_reducepd256_mask_round ((__v4df) __A,
+ __C,
+ (__v4df)
+ _mm256_setzero_pd (),
+ (__mmask8) -1,
+ __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_reduce_round_pd (__m256d __W, __mmask8 __U, __m256d __A,
+const int __C, const int __R)
+{
+  return (__m256d) __builtin_ia32_reducepd256_mask_round ((__v4df) __A,
+ __C,
+ (__v4df) __W,
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_reduce_round_pd (__mmask8 __U, __m256d __A, const int __C,
+ const int __R)
+{
+  return (__m256d) __builtin_ia32_reducepd256_mask_round ((__v4df) __A,
+ __C,
+ (__v4df)
+ _mm256_setzero_pd (),
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_reduce_round_ph (__m256h __A, const int __C, const int __R)
+{
+  return (__m256h) __builtin_ia32_reduceph256_mask_round ((__v16hf) __A,
+ __C,
+ (__v16hf)
+ _mm256_setzero_ph (),
+ (__mmask16) -1,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_reduce_round_ph (__m256h __W, __mmask16 __U, __m256h __A,
+const int __C, const int __R)
+{
+  return (__m256h) __builtin_ia32_reduceph256_mask_round ((__v16hf) __A,
+ __C,
+ (__v16hf) __W,
+ (__mmask16) __U,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_reduce_round_ph (__mmask16 __U, __m256h __A, const int __C,
+ const int __R)
+{
+  return (__m256h) __builtin_ia32_reduceph256_mask_round ((__v16hf) __A,
+ __C,
+

[PATCH 17/22] AVX10.2 ymm rounding: Support vgetexpp{s, d, h} and vgetmantp{s, d, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SF_FTYPE_V8SF_V8SF_UQI_INT, V4DF_FTYPE_V4DF_V4DF_UQI_INT,
V16HF_FTYPE_V16HF_V16HF_UHI_INT, V16HF_FTYPE_V16HF_INT_V16HF_UHI_INT,
V4DF_FTYPE_V4DF_INT_V4DF_UQI_INT, V8SF_FTYPE_V8SF_INT_V8SF_UQI_INT.
* config/i386/sse.md:
(_getexp):
Add condition check.
(_getmant):
Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 341 ++
 gcc/config/i386/i386-builtin-types.def|   6 +
 gcc/config/i386/i386-builtin.def  |   6 +
 gcc/config/i386/i386-expand.cc|   6 +
 gcc/config/i386/sse.md|   4 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |   6 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  59 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   6 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  18 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  18 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   6 +
 11 files changed, 474 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index afc1220fea4..07729a6cc04 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -3021,6 +3021,217 @@ _mm256_maskz_fnmsub_round_ps (__mmask8 __U, __m256 __A, 
__m256 __B,
   (__mmask8) __U,
   __R);
 }
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_getexp_round_pd (__m256d __A, const int __R)
+{
+  return
+(__m256d) __builtin_ia32_getexppd256_mask_round ((__v4df) __A,
+(__v4df)
+_mm256_undefined_pd (),
+(__mmask8) -1,
+__R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_getexp_round_pd (__m256d __W, __mmask8 __U, __m256d __A,
+const int __R)
+{
+  return (__m256d) __builtin_ia32_getexppd256_mask_round ((__v4df) __A,
+ (__v4df) __W,
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_getexp_round_pd (__mmask8 __U, __m256d __A, const int __R)
+{
+  return (__m256d) __builtin_ia32_getexppd256_mask_round ((__v4df) __A,
+ (__v4df)
+ _mm256_setzero_pd (),
+ (__mmask8) __U,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_getexp_round_ph (__m256h __A, const int __R)
+{
+  return (__m256h) __builtin_ia32_getexpph256_mask_round ((__v16hf) __A,
+ (__v16hf)
+ _mm256_setzero_ph (),
+ (__mmask16) -1,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_getexp_round_ph (__m256h __W, __mmask16 __U, __m256h __A,
+const int __R)
+{
+  return (__m256h) __builtin_ia32_getexpph256_mask_round ((__v16hf) __A,
+ (__v16hf) __W,
+ (__mmask16) __U,
+ __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_getexp_round_ph (__mmask16 __U, __m256h __A, const int __R)
+{
+  return (__m256h) __builtin_ia32_getexpph256_mask_round ((__v16hf) __A,
+ (__v16hf)
+

[PATCH 22/22] AVX10.2 ymm rounding: Support vsqrtp{s, d, h} and vsubp{s, d, h} intrins

2024-08-14 Thread Haochen Jiang

From: "Hu, Lin1" 

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
---
 gcc/config/i386/avx10_2roundingintrin.h   | 339 ++
 gcc/config/i386/i386-builtin.def  |   6 +
 gcc/testsuite/gcc.target/i386/avx-1.c |   6 +
 .../gcc.target/i386/avx10_2-rounding-3.c  |  50 +++
 gcc/testsuite/gcc.target/i386/sse-13.c|   7 +
 gcc/testsuite/gcc.target/i386/sse-14.c|  18 +
 gcc/testsuite/gcc.target/i386/sse-22.c|  15 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   6 +
 8 files changed, 447 insertions(+)

diff --git a/gcc/config/i386/avx10_2roundingintrin.h 
b/gcc/config/i386/avx10_2roundingintrin.h
index f35f2337858..c7146e37ec9 100644
--- a/gcc/config/i386/avx10_2roundingintrin.h
+++ b/gcc/config/i386/avx10_2roundingintrin.h
@@ -3986,6 +3986,216 @@ _mm256_maskz_scalef_round_ps (__mmask8 __U, __m256 __A, 
__m256 __B,
 (__mmask8) __U,
 __R);
 }
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_sqrt_round_pd (__m256d __A, const int __R)
+{
+  return (__m256d) __builtin_ia32_sqrtpd256_mask_round ((__v4df) __A,
+   (__v4df)
+   _mm256_undefined_pd (),
+   (__mmask8) -1,
+   __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_sqrt_round_pd (__m256d __W, __mmask8 __U, __m256d __A,
+  const int __R)
+{
+  return (__m256d) __builtin_ia32_sqrtpd256_mask_round ((__v4df) __A,
+   (__v4df) __W,
+   (__mmask8) __U,
+   __R);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_sqrt_round_pd (__mmask8 __U, __m256d __A, const int __R)
+{
+  return (__m256d) __builtin_ia32_sqrtpd256_mask_round ((__v4df) __A,
+   (__v4df)
+   _mm256_setzero_pd (),
+   (__mmask8) __U,
+   __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_sqrt_round_ph (__m256h __A, const int __R)
+{
+  return (__m256h) __builtin_ia32_sqrtph256_mask_round ((__v16hf) __A,
+   (__v16hf)
+   _mm256_undefined_ph (),
+   (__mmask16) -1,
+   __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_sqrt_round_ph (__m256h __W, __mmask16 __U, __m256h __A,
+  const int __R)
+{
+  return (__m256h) __builtin_ia32_sqrtph256_mask_round ((__v16hf) __A,
+   (__v16hf) __W,
+   (__mmask16) __U,
+   __R);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_sqrt_round_ph (__mmask16 __U, __m256h __A, const int __R)
+{
+  return (__m256h) __builtin_ia32_sqrtph256_mask_round ((__v16hf) __A,
+   (__v16hf)
+   _mm256_setzero_ph (),
+   (__mmask16) __U,
+   __R);
+}
+
+extern __inline __m256
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_sqrt_round_ps (__m256 __A, const int __R)
+{
+  return (__m256) __builtin_ia32_sqrtps256_mask_round ((__v8sf) __A,
+  (__v8sf)
+  _mm256_undefined_ps (),
+  (__mmask8) -1,
+  __R);
+}
+
+extern __inline __m256
+

Re: [PATCH v3 03/12] libgomp: runtime support for target_device selector

2024-08-14 Thread Jakub Jelinek

On Sat, Jul 20, 2024 at 02:42:22PM -0600, Sandra Loosemore wrote:
> This patch implements the libgomp runtime support for the dynamic
> target_device selector via the GOMP_evaluate_target_device function.

For kind, isa and arch traits in the device sets we decide based on
compiler flags and overrides through target attribute etc., not on actual
hw capabilities (and I think we have to, it shouldn't be a dynamic
selection).

Now for kind, isa and arch traits in the target_device set this patch
decides based on compiler flags used to compile some routine in libgomp.so
or libgomp.a.

While this can work in the (very unfortunate) GCN state of things where
only exact isa match is possible (I really hope we can one day generalize
it by being able to compile for a set of isas by supporting lowest
denominator and patching the EM_* in the ELF header or something similar,
perhaps with runtime decisions on what to do for different CPUs), deciding
what to do based on how libgomp.a or libgomp.so.1 has been compiled for the
rest is IMHO wrong.

Now, at least in 5.2 I don't see a restriction that target_device trait
can't be used inside of selectors in a target region.
IMHO that is a bug in the standard.  E.g. it says that
"The expression of a device_num trait must evaluate to a non-negative integer 
value that is
less than or equal to the value returned by omp_get_num_devices."
but it is unspecified what happens when omp_get_num_devices is
called in the target region.
Not really sure if in the patch you actually support say metadirective
with target_device from inside of a target region querying properties of
say the host device or something similar.

If (hopefully) one can only query target_device on the host, then I think
the best would be that at least for the initial device we actually use
the ISAs etc. of whatever function queries that trait, rather than what
compiler flags were used to compile libgomp.so.1.  That would mean
returning from the function something to the caller to say it is actually
a host device and in the emitted code do the matching based on that rather
than on what the function would otherwise match.
That would then mean we don't need to supply special x86 etc. versions (and
whatever other host, powerpc, ..., where we just didn't define enough
details).
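
The host-device idea above can be sketched in C as follows. This is only
an illustration under stated assumptions: td_eval, TD_HOST_DEFER and
sketch_evaluate_target_device are hypothetical names, not part of libgomp
or of the patch, and the plugin_matches flag stands in for whatever a real
offload plugin would report. It relies on the OpenMP convention that the
initial (host) device number equals the value returned by
omp_get_num_devices; treating an out-of-range device number as "no match"
is likewise an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical tri-state result; not a libgomp type.  */
enum td_eval
{
  TD_NO_MATCH,   /* Selector does not match the queried device.  */
  TD_MATCH,      /* Selector matches an offload device.  */
  TD_HOST_DEFER  /* Device is the host: the caller should match against
		    the ISA of the function that issued the query.  */
};

/* Hypothetical runtime entry point.  DEVICE_NUM is the evaluated
   device_num trait, NUM_DEVICES is what omp_get_num_devices would
   return, and PLUGIN_MATCHES stands in for the answer of an offload
   plugin for a non-host device.  */
static enum td_eval
sketch_evaluate_target_device (int device_num, int num_devices,
			       bool plugin_matches)
{
  /* Assumption: out-of-range device numbers never match.  */
  if (device_num < 0 || device_num > num_devices)
    return TD_NO_MATCH;

  /* The initial device has number num_devices; defer to the caller
     instead of using the flags libgomp itself was built with.  */
  if (device_num == num_devices)
    return TD_HOST_DEFER;

  return plugin_matches ? TD_MATCH : TD_NO_MATCH;
}
```

With such a result, the emitted code could fall back to ordinary
compile-time matching whenever TD_HOST_DEFER comes back, so no
host-specific selector.c variants would be needed.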

For other devices, this is harder because there is no specific offload code
associated with the target_device trait use.  Guess it would be best if
it could be picked from the minimum ISA actually supported in the offloading
code or something similar, by the time this is invoked libgomp should have
the offloading code (if any) already registered (unless it is in some
library dlopened later, that is fuzzy thing), so best would be e.g. for PTX
to watch the minimum required SM level of the code that is being registered,
whether stored in the offload section separately or figured out from the PTX
code being loaded.  But perhaps initially what you do for offloading devices
might be still ok.

> include/ChangeLog
>   * cuda/cuda.h (CUdevice_attribute): Add definitions for
>   CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and
>   CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.
> 
> libgomp/ChangeLog
>   * Makefile.am (libgomp_la_SOURCES): Add selector.c.
>   * Makefile.in: Regenerate.
>   * config/gcn/selector.c: New.
>   * config/linux/selector.c: New.
>   * config/linux/x86/selector.c: New.
>   * config/nvptx/selector.c: New.
>   * libgomp-plugin.h (GOMP_OFFLOAD_evaluate_device): New.
>   * libgomp.h (struct gomp_device_descr): Add evaluate_device_func field.
>   * libgomp.map (GOMP_5.1.3): New, add GOMP_evaluate_target_device.
>   * libgomp.texi (OpenMP Context Selectors): Document dynamic selector
>   matching of kind/arch/isa.
>   * libgomp_g.h (GOMP_evaluate_current_device): New.
>   (GOMP_evaluate_target_device): New.
>   * oacc-host.c (host_evaluate_device): New.
>   (host_openacc_exec): Initialize evaluate_device_func field to
>   host_evaluate_device.
>   * plugin/plugin-gcn.c (gomp_match_selectors): New.
>   (gomp_match_isa): New.
>   (GOMP_OFFLOAD_evaluate_device): New.
>   * plugin/plugin-nvptx.c (struct ptx_device): Add compute_major and
>   compute_minor fields.
>   (nvptx_open_device): Read compute capability information from device.
>   (gomp_match_selectors): New.
>   (gomp_match_selector): New.
>   (CHECK_ISA): New macro.
>   (GOMP_OFFLOAD_evaluate_device): New.
>   * selector.c: New.
>   * target.c (GOMP_evaluate_target_device): New.
>   (gomp_load_plugin_for_device): Load evaluate_device plugin function.
> 
> Co-Authored-By: Kwok Cheung Yeung 
> Co-Authored-By: Sandra Loosemore 

> --- /dev/null
> +++ b/libgomp/config/gcn/selector.c
> @@ -0,0 +1,102 @@
> +/* Copyright (C) 2022 Free Software Foundation, Inc.

2022-2024

> +
> +/* The selectors are passed as strings, but are actually sets of multiple
> +   trait

Re: [Fortran, Patch, PR110033, v1] Fix associate for coarrays

2024-08-14 Thread Paul Richard Thomas

Hi Andre,

From a very rapid scan (in the style of somebody on vacation :-) ) of the
two patches, it all looks good to me. Adding the corank structure to
gfc_expr is long overdue. Thanks also for rolling select type into the
second patch. It would be good if you would check if PRs 46371 and 56496
are fixed by the patch.

Regards

Paul


On Mon, 12 Aug 2024 at 13:11, Andre Vehreschild  wrote:

> Hi all,
>
> the attached two patches fix ASSOCIATE for coarrays, i.e. that a coarray
> associated to a variable is also a coarray in the block of the ASSOCIATE
> command. The patch has two parts:
>
> 1. pr110033p1_1.patch: Adds a corank member to the gfc_expr structure. I
> decided to add it here and keep track of the corank of an expression,
> because calling gfc_get_corank was getting too expensive with the
> associate patch. This patch also improves the usage of coarrays in
> select type/rank constructs.
>
> 2. pr110033p2_1.patch: The changes and testcase for PR 110033. In essence
> the coarray is not detected correctly on the expression to associate to
> and therefore not propagated correctly into the block of the ASSOCIATE
> command. The patch adds correct treatment for propagating the coarray
> token into the block, too.
>
> The costs of tracking the corank alongside the rank of an expression
> are about 30 seconds real user time (i.e. time's "real" row) on a rather
> old Intel i7-5775C@3.3GHz with 24G RAM that was used for work during the
> test. If need be I can tune that more.
> Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
>
> Regards,
> Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>

Re: [PATCH v3 03/12] libgomp: runtime support for target_device selector

2024-08-14 Thread Jakub Jelinek

On Wed, Aug 14, 2024 at 12:25:23PM +0200, Jakub Jelinek wrote:
> Now, at least in 5.2 I don't see a restriction that target_device trait
> can't be used inside of selectors in a target region.
> IMHO that is a bug in the standard.  E.g. it says that
> "The expression of a device_num trait must evaluate to a non-negative integer 
> value that is
> less than or equal to the value returned by omp_get_num_devices."
> but it is unspecified what happens when omp_get_num_devices is
> called in the target region.
> Not really sure if in the patch you actually support say metadirective
> with target_device from inside of a target region querying properties of
> say the host device or something similar.

I've filed https://github.com/OpenMP/spec/issues/4133 for this (only
accessible to OpenMP language committee members unfortunately).

Jakub

RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Ballman, Aaron

Sorry for top-posting, my work account is stuck on Outlook. :-/

> For a WG14 paper you should add these findings to support that choice.
> Another option would be for WG14 to standardize the then existing 
> implementation with the double underscores.

+1, it's always good to explain prior art and existing uses as part of the 
paper. However, please also point out that C++ has prior art as well, which is
slightly different and very much worth considering: they have one API for 
getting the array's rank, and another for getting a specific rank's extent. 
This is a general solution that doesn't require the programmer to have deep 
knowledge of C's declarator syntax and how it relates to multidimensional 
arrays.
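
To make the rank/extent distinction concrete, here is a minimal C sketch
using the sizeof-division idiom. The macro names SKETCH_LENGTHOF and
SKETCH_EXTENT1 are hypothetical and are not taken from the paper or from
any proposal; they only show that the element count of the outermost
dimension and the extent of an inner dimension are separate queries.

```c
#include <stddef.h>

/* Hypothetical helper: element count of the outermost array dimension.
   Only meaningful for true arrays, not for pointers.  */
#define SKETCH_LENGTHOF(a) (sizeof (a) / sizeof ((a)[0]))

/* Hypothetical helper: extent of the second dimension, obtained by
   applying the same idiom to the first row.  */
#define SKETCH_EXTENT1(a) SKETCH_LENGTHOF ((a)[0])

/* Multiply the two extents of a 3x5 array to get its element count.  */
static size_t
sketch_total_elements (void)
{
  int grid[3][5];
  return SKETCH_LENGTHOF (grid) * SKETCH_EXTENT1 (grid);
}
```

The caller has to know how the declarator nests in order to reach the
inner dimension by hand, which is the knowledge a separate extent
interface would not require.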

That said, I suspect WG14 would not be keen on standardizing `lengthof` without 
an ugly keyword given that there are plenty of other uses of it that would 
break: 

https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
(and many, many others)

>> > As for the parentheses, I personally think lengthof should follow 
>> > similar rules compared to sizeof.
>> 
>> I think most people agree with this.
>
> I still don't, in particular not for standardisation.
> 
> We have to remember that there are many small C compilers out there.

Those compilers already have to handle parsing this for sizeof, so that's not 
particularly compelling (even if we wanted to design C for the lowest common 
denominator of implementation effort, which I'm not convinced is a good 
approach these days). That said, if we went with a rank/extent design, I think 
we'd *have* to use parens because the extent interface would take two operands 
(the array and the rank you're interested in getting the extent of) and it 
would be inconsistent for the rank interface to then not require parens.

~Aaron

-Original Message-
From: Jens Gustedt  
Sent: Wednesday, August 14, 2024 2:11 AM
To: Alejandro Colomar ; Xavier Del Campo Romero 

Cc: Gcc Patches ; Daniel Plakosh ; 
Martin Uecker ; Joseph Myers ; Gabriel 
Ravier ; Jakub Jelinek ; Kees Cook 
; Qing Zhao ; David Brown 
; Florian Weimer ; Andreas Schwab 
; Timm Baeder ; A. Jiang 
; Eugene Zelenko ; Ballman, Aaron 

Subject: Re: v2.1 Draft for a lengthof paper

On 14 August 2024 01:27:33 CEST, Alejandro Colomar wrote:
> Hi Xavier,
> 
> On Wed, Aug 14, 2024 at 12:38:53AM GMT, Xavier Del Campo Romero wrote:
> > I have been overseeing these last emails -
> 
> Ahhh, good to know; thanks!  :)
> 
> > thank you very much for your
> > efforts, Alex!
> 
> :-)
> 
> > I did not reply until now because I do not have prior experience 
> > with gcc internals, so my feedback would probably have not been that 
> > useful.
> 
> Ok.
> 
> > Those emails from 2020 were in fact discussing two completely 
> > different proposals at once:
> > 
> > 1. Add _Lengthof + #include
> > 2. Allow static qualifier on compound literals
> 
> Yup.
> 
> > Whereas proposal #2 made it into C23 (kudos to Jens Gustedt!), and 
> > as you already know by now, proposal #1 received some negative 
> > feedback, suggesting _Typeof/typeof + some macro magic as a 
> > pragmatic workaround instead.
> 
> The original author of that negative feedback talked to me in private 
> a week ago, and said he likes my proposal.  We have no negative 
> feedback anymore.  :)
> 
> > Since the proposal did not get much traction and I would have been 
> > unable to contribute to gcc myself, I just gave up on it. IIRC the 
> > deadline for new proposals closed soon after, anyway.
> 
> Ok.
> 
> > But I am glad that someone with proper experience took the initiative.
> 
> Fun fact: this is my second non-trivial patch to GCC.  I wouldn't say 
> I had the proper experience with GCC internals when I started this 
> patch set.  But I'm unemployed at the moment, which gives me all the 
> time I need for learning those.  :)
> 
> > I still think the proposal is relevant and has interesting use cases.
> > 
> > > I have only added lengthof for now, not _Lengthof, as suggested by Jens.
> > > Depending on feedback, I'll propose the uglified version.
> > 
> > Probably, all of us know why the uglified version is the usual 
> > approach preferred by the C standard: we do not know how many 
> > applications would break otherwise.
> 
> Yup.
> 
> > However, we see that this trend is now changing with C23, so 
> > probably it makes sense to define lengthof directly.
> 
> Yeah, since Jens is in WG14 and he suggested to follow this trend, 
> maybe we can.  If not, it's trivial to change the proposal to use the 
> uglified name plus a macro.
> 
> Checking , I see that while several 
> projects have a lengthof() macro, all of them use it with semantics 
> compatible with this keyword, so it shouldn't

Re: [PATCH, gfortran] libgfortran: implement fpu-macppc for Darwin, support IEEE arithmetic

2024-08-14 Thread FX Coudert

> Thank you for responding.
> I have added a changelog (is this a correct way?).

Content seems ok, lines are maybe too long. Check with 
contrib/gcc-changelog/git_check_commit.py before pushing.
Once that is fine, OK to push.

FX

RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Jens Gustedt

Hi Aaron,

On 14 August 2024 13:31:19 CEST, "Ballman, Aaron" wrote:
> Sorry for top-posting, my work account is stuck on Outlook. :-/
> 
> > For a WG14 paper you should add these findings to support that choice.
> > Another option would be for WG14 to standardize the then existing 
> > implementation with the double underscores.
> 
> +1, it's always good to explain prior art and existing uses as part of the 
> paper. However, please also point out that C++ has a prior art as well which 
> is slightly different and very much worth considering: they have one API for 
> getting the array's rank, and another for getting a specific rank's extent. 
> This is a general solution that doesn't require the programmer to have deep 
> knowledge of C's declarator syntax and how it relates to multidimensional 
> arrays.
> 
> That said, I suspect WG14 would not be keen on standardizing `lengthof` 
> without an ugly keyword given that there are plenty of other uses of it that 
> would break: 
> 
> https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
> (and many, many others)
> 
> >> > As for the parentheses, I personally think lengthof should follow 
> >> > similar rules compared to sizeof.
> >> 
> >> I think most people agree with this.
> >
> > I still don't, in particular not for standardisation.
> > 
> > We have to remember that there are many small C compilers out there.
> 
> Those compilers already have to handle parsing this for sizeof, so that's not 
> particularly compelling (even if we wanted to design C for the lowest common 
> denominator of implementation effort, which I'm not convinced is a good 
> approach these days). That said, if we went with a rank/extent design, I 
> think we'd *have* to use parens because the extent interface would take two 
> operands (the array and the rank you're interested in getting the extent of) 
> and it would be inconsistent for the rank interface to then not require 
> parens.

I think that this argument falls short. E.g., implementations that already 
have compound expressions (or lambdas ;-) may provide a quality implementation 
using `static_assert` and `typeof` alone, and don't have to touch their 
compiler at all.

We should not impose an implementation in the language where doing it in a 
header can be completely sufficient.

Plus, implementing it as a macro in a header (probably ) also provides a 
feature test for those applications that already have something similar. 
This was basically what we did for `unreachable` and I think it worked out fine.
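
(A minimal sketch of how a macro doubles as a feature test, assuming a hypothetical `lengthof` macro that some header may or may not already provide. The fallback definition is the classic sizeof division, not the proposed operator.)

```c
#include <assert.h>
#include <stddef.h>

/* If a header (or the application itself) already defines lengthof,
   keep that definition; otherwise fall back to sizeof division.
   The name and the fallback are assumptions for illustration. */
#ifndef lengthof
# define lengthof(a)  (sizeof(a) / sizeof((a)[0]))
#endif

/* Returns the number of entries in a small static table. */
static size_t primes_table_length(void)
{
    static const int primes[] = { 2, 3, 5, 7, 11 };

    return lengthof(primes);
}
```

An application that already ships its own lengthof() keeps working, because the `#ifndef` check sees its definition first.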

Jens

> ~Aaron
> 
> -Original Message-
> From: Jens Gustedt  
> Sent: Wednesday, August 14, 2024 2:11 AM
> To: Alejandro Colomar ; Xavier Del Campo Romero 
> 
> Cc: Gcc Patches ; Daniel Plakosh 
> ; Martin Uecker ; Joseph Myers 
> ; Gabriel Ravier ; Jakub Jelinek 
> ; Kees Cook ; Qing Zhao 
> ; David Brown ; Florian 
> Weimer ; Andreas Schwab ; Timm 
> Baeder ; A. Jiang ; Eugene Zelenko 
> ; Ballman, Aaron 
> Subject: Re: v2.1 Draft for a lengthof paper
> 
> On 14 August 2024 01:27:33 CEST, Alejandro Colomar wrote:
> > Hi Xavier,
> > 
> > On Wed, Aug 14, 2024 at 12:38:53AM GMT, Xavier Del Campo Romero wrote:
> > > I have been overseeing these last emails -
> > 
> > Ahhh, good to know; thanks!  :)
> > 
> > > thank you very much for your
> > > efforts, Alex!
> > 
> > :-)
> > 
> > > I did not reply until now because I do not have prior experience 
> > > with gcc internals, so my feedback would probably have not been that 
> > > useful.
> > 
> > Ok.
> > 
> > > Those emails from 2020 were in fact discussing two completely 
> > > different proposals at once:
> > > 
> > > 1. Add _Lengthof + #include
> > > 2. Allow static qualifier on compound literals
> > 
> > Yup.
> > 
> > > Whereas proposal #2 made it into C23 (kudos to Jens Gustedt!), and 
> > > as you already know by now, proposal #1 received some negative 
> > > feedback, suggesting _Typeof/typeof + some macro magic as a 
> > > pragmatic workaround instead.
> > 
> > The original author of that negative feedback talked to me in private 
> > a week ago, and said he likes my proposal.  We have no negative 
> > feedback anymore.  :)
> > 
> > > Since the proposal did not get much traction and I would have been 
> > > unable to contribute to gcc myself, I just gave up on it. IIRC the 
> > > deadline for new proposals closed soon after, anyway.
> > 
> > Ok.
> > 
> > > But I am glad that someone with proper experience took the initiative.
> > 
> > Fun fact: this is my second non-trivial patch to GCC.  I wouldn't say 
> > I had the proper experience with GCC internals when I started this 
> > patch set.  But I'm unemployed at the moment, which gives me all the 
> > time I need for learning those.  :)
> > 
> > > I still think the pro

RE: [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs

2024-08-14 Thread Tamar Christina

Hi Victor,

> -Original Message-
> From: Victor Do Nascimento 
> Sent: Tuesday, August 13, 2024 1:42 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina ; claz...@gmail.com;
> hongtao@intel.com; s...@gcc.gnu.org; bernds_...@t-online.de;
> al...@redhat.com; Victor Do Nascimento 
> Subject: [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs
> 
> Given the shift from modeling dot products as direct optabs to
> treating them as conversion optabs, we make necessary changes to the
> autovectorizer code to ensure that given the relevant tree code,
> together with the input and output data modes, we can retrieve the
> relevant optab and subsequently the insn_code for it.
> 
> gcc/ChangeLog:
> 
>   * gimple-match-exports.cc (directly_supported_p): Add overload
>   for conversion-type optabs.
>   * gimple-match.h (directly_supported_p): Add new function
>   prototype.
>   * optabs.cc (expand_widen_pattern_expr): Make the
>   DOT_PROD_EXPR tree code use `find_widening_optab_handler' to
>   retrieve icode.
>   * tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): make it
>   call conversion-type overloaded `directly_supported_p'.
>   * tree-vect-patterns.cc (vect_supportable_conv_optab_p): New.
>   (vect_recog_dot_prod_pattern): s/direct/conv/ in call to
>   `vect_supportable_direct_optab_p'.
> ---
>  gcc/gimple-match-exports.cc | 23 
>  gcc/gimple-match.h  |  2 ++
>  gcc/optabs.cc   |  3 ++-
>  gcc/tree-vect-loop.cc   |  1 +
>  gcc/tree-vect-patterns.cc   | 43 +++--
>  5 files changed, 69 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index aacf3ff0414..d18497e7c83 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -1381,6 +1381,29 @@ directly_supported_p (code_helper code, tree type,
> optab_subtype query_type)
> && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED));
>  }
> 
> +/* As above, overloading the function for conversion-type optabs.  */
> +bool
> +directly_supported_p (code_helper code, tree type_out, tree type_in,
> +   optab_subtype query_type)
> +{
> +  if (code.is_tree_code ())
> +{
> +  convert_optab optab = optab_for_tree_code (tree_code (code), type_in,
> + query_type);
> +  return (optab != unknown_optab
> +   && convert_optab_handler (optab, TYPE_MODE (type_out),
> + TYPE_MODE (type_in)) !=
> CODE_FOR_nothing);
> +}
> +  gcc_assert (query_type == optab_default
> +   || (query_type == optab_vector && VECTOR_TYPE_P (type_in))
> +   || (query_type == optab_scalar && !VECTOR_TYPE_P (type_in)));
> +  internal_fn ifn = associated_internal_fn (combined_fn (code), type_in);
> +  return (direct_internal_fn_p (ifn)
> +   && direct_internal_fn_supported_p (ifn, tree_pair (type_out, type_in),
> +  OPTIMIZE_FOR_SPEED));
> +}
> +
> +
>  /* A wrapper around the internal-fn.cc versions of 
> get_conditional_internal_fn
> for a code_helper CODE operating on type TYPE.  */
> 
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index d710fcbace2..0333a5db00a 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -419,6 +419,8 @@ code_helper canonicalize_code (code_helper, tree);
> 
>  #ifdef GCC_OPTABS_TREE_H
>  bool directly_supported_p (code_helper, tree, optab_subtype = optab_default);
> +bool directly_supported_p (code_helper, tree, tree,
> +optab_subtype = optab_default);
>  #endif
> 
>  internal_fn get_conditional_internal_fn (code_helper, tree);
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 185c5b1a705..32737fb80e8 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -317,7 +317,8 @@ expand_widen_pattern_expr (const_sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>  widen_pattern_optab
>= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
>if (ops->code == WIDEN_MULT_PLUS_EXPR
> -  || ops->code == WIDEN_MULT_MINUS_EXPR)
> +  || ops->code == WIDEN_MULT_MINUS_EXPR
> +  || ops->code == DOT_PROD_EXPR)
>  icode = find_widening_optab_handler (widen_pattern_optab,
>TYPE_MODE (TREE_TYPE (ops->op2)),
>tmode0);
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 6456220cdc9..5f3de7b72a8 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5289,6 +5289,7 @@ vect_is_emulated_mixed_dot_prod (stmt_vec_info
> stmt_info)
> 
>gcc_assert (STMT_VINFO_REDUC_VECTYPE_IN (stmt_info));
>return !directly_supported_p (DOT_PROD_EXPR,
> + STMT_VINFO_VECTYPE (stmt_info),
>   STMT_VINFO_REDUC_VECTYPE_IN (stmt_info),
>

RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Ballman, Aaron

> I think that this argument falls short. E.g., implementations that already 
> have compound expressions (or lambdas ;-) may provide a quality 
> implementation using `static_assert` and `typeof` alone, and don't have to 
> touch their compiler at all.
>
> We should not impose an implementation in the language where doing it in a 
> header can be completely sufficient.

But can doing this in a header be completely sufficient in practice? e.g., the 
user who passes a pointer rather than an array is in for quite a surprise, or 
passing a struct, or passing a FAM, etc. If we want to put constraints on the 
interface, that may be more challenging to do from a header file than from the 
compiler. offsetof is a cautionary tale in that compilers that want a 
reasonable QoI basically all implement this as a builtin rather than the 
header-only version.
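
(A minimal sketch of the pointer pitfall, assuming the naive sizeof-division macro; NAIVE_COUNT is a hypothetical name. Some compilers warn about the pointer case, which is exactly the kind of check that is easier to do in the compiler than in a header.)

```c
#include <assert.h>
#include <stddef.h>

/* Naive header-only macro: no check that the operand is an array. */
#define NAIVE_COUNT(a)  (sizeof(a) / sizeof((a)[0]))

/* Applied to a real array, the macro yields the element count. */
static size_t count_via_array(void)
{
    int values[10] = { 0 };

    return NAIVE_COUNT(values);
}

/* Applied to a pointer, it silently computes
   sizeof(int *) / sizeof(int), which is unrelated to the array length. */
static size_t count_via_pointer(void)
{
    int values[10] = { 0 };
    int *p = values;

    return NAIVE_COUNT(p);
}
```

The second function compiles without error but returns a platform-dependent value rather than 10.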

> Plus, implementing it as a macro in a header (probably ) also provides a 
> feature test for those applications that already have something similar. 
> This was basically what we did for `unreachable` and I think it worked out 
> fine.

True!

I'm still thinking on how important rank + extent is vs overall array length. 
If C had constexpr functions, then I'd almost certainly want array rank and 
extent to be the building blocks and then lengthof can be a constexpr function 
looping over rank and summing extents. But we don't have that yet, and "bird in 
hand" vs "bird in bush"... :-D

~Aaron

-Original Message-
From: Jens Gustedt  
Sent: Wednesday, August 14, 2024 8:18 AM
To: Ballman, Aaron ; Alejandro Colomar 
; Xavier Del Campo Romero 
Cc: Gcc Patches ; Daniel Plakosh ; 
Martin Uecker ; Joseph Myers ; Gabriel 
Ravier ; Jakub Jelinek ; Kees Cook 
; Qing Zhao ; David Brown 
; Florian Weimer ; Andreas Schwab 
; Timm Baeder ; A. Jiang 
; Eugene Zelenko 
Subject: RE: v2.1 Draft for a lengthof paper

Hi Aaron,

On 14 August 2024 13:31:19 CEST, "Ballman, Aaron" wrote:
> Sorry for top-posting, my work account is stuck on Outlook. :-/
> 
> > For a WG14 paper you should add these findings to support that choice.
> > Another option would be for WG14 to standardize the then existing 
> > implementation with the double underscores.
> 
> +1, it's always good to explain prior art and existing uses as part of the 
> paper. However, please also point out that C++ has a prior art as well which 
> is slightly different and very much worth considering: they have one API for 
> getting the array's rank, and another for getting a specific rank's extent. 
> This is a general solution that doesn't require the programmer to have deep 
> knowledge of C's declarator syntax and how it relates to multidimensional 
> arrays.
> 
> That said, I suspect WG14 would not be keen on standardizing `lengthof` 
> without an ugly keyword given that there are plenty of other uses of it that 
> would break: 
> 
> https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
> (and many, many others)
> 
> >> > As for the parentheses, I personally think lengthof should follow 
> >> > similar rules compared to sizeof.
> >> 
> >> I think most people agree with this.
> >
> > I still don't, in particular not for standardisation.
> > 
> > We have to remember that there are many small C compilers out there.
> 
> Those compilers already have to handle parsing this for sizeof, so that's not 
> particularly compelling (even if we wanted to design C for the lowest common 
> denominator of implementation effort, which I'm not convinced is a good 
> approach these days). That said, if we went with a rank/extent design, I 
> think we'd *have* to use parens because the extent interface would take two 
> operands (the array and the rank you're interested in getting the extent of) 
> and it would be inconsistent for the rank interface to then not require 
> parens.

I think that this argument falls short. E.g., implementations that already 
have compound expressions (or lambdas ;-) may provide a quality implementation 
using `static_assert` and `typeof` alone, and don't have to touch their 
compiler at all.

We should not impose an implementation in the language where doing it in a 
header can be completely sufficient.

Plus, implementing it as a macro in a header (probably ) also provides a 
feature test for those applications that already have something similar. 
This was basically what we did for `unreachable` and I think it worked out fine.

Jens

> ~Aaron
> 
> -Original Message-
> From: Jens Gustedt 
> Sent: Wednesday, August 14, 2024 2:11 AM
> To: Alejandro Colomar ; Xavier Del Campo Romero 
> 
> Cc: Gcc Patches ; Daniel Plakosh 
> ; Martin Uecker ; Joseph Myers 
> ; Gabriel Ravier ; Jakub 
> Jelinek ; Kees Cook ; Qing 
> Zhao ; David Brown ; 
> Florian Weimer

[PATCH] ltmain.sh: allow more flags at link-time

2024-08-14 Thread Sam James

libtool defaults to filtering flags passed at link-time.

This brings the filtering in GCC's 'fork' of libtool into sync with
upstream libtool commit 22a7e547e9857fc94fe5bc7c921d9a4b49c09f8e.

In particular, this now allows some harmless diagnostic flags (especially
useful for things like -Werror=odr), more optimization flags, and some
Clang-specific options.

GCC's -flto documentation mentions:
> To use the link-time optimizer, -flto and optimization options should be
> specified at compile time and during the final link. It is recommended
> that you compile all the files participating in the same link with the
> same options and also specify those options at link time.

This allows compliance with that.
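
(A minimal sketch of the pass-through idea in C, using POSIX fnmatch() to mimic the shell `case` globbing. The pattern list is only a small subset of the ones in the patch, and the function name is hypothetical. It does not model what libtool does with flags that fail to match this particular case arm.)

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* A small, illustrative subset of the patterns from the case statement. */
static const char *const passthrough_patterns[] = {
    "-O*", "-g*", "-flto*", "-fsanitize=*", "-fuse-ld=*",
    "-Wa,*", "-Werror", "-Werror=*",
};

/* Returns 1 if FLAG matches one of the pass-through patterns, else 0. */
static int flag_passes_filter(const char *flag)
{
    size_t n = sizeof(passthrough_patterns) / sizeof(passthrough_patterns[0]);

    for (size_t i = 0; i < n; i++)
        if (fnmatch(passthrough_patterns[i], flag, 0) == 0)
            return 1;
    return 0;
}
```

With this subset, `-flto=auto` and `-Werror=odr` match, while `-Wall` does not.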

* ltmain.sh (func_mode_link): Allow various flags through filter.
---
We have been using this for a while now downstream.

H.J., please take a look.

I think this also explains 
https://src.fedoraproject.org/rpms/binutils/blob/rawhide/f/binutils.spec#_947.

 ltmain.sh | 46 ++
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/ltmain.sh b/ltmain.sh
index 493e83c36f14..79cd7c57f42e 100644
--- a/ltmain.sh
+++ b/ltmain.sh
@@ -4966,19 +4966,41 @@ func_mode_link ()
arg="$func_quote_for_eval_result"
;;
 
-  # -64, -mips[0-9] enable 64-bit mode on the SGI compiler
-  # -r[0-9][0-9]* specifies the processor on the SGI compiler
-  # -xarch=*, -xtarget=* enable 64-bit mode on the Sun compiler
-  # +DA*, +DD* enable 64-bit mode on the HP compiler
-  # -q* pass through compiler args for the IBM compiler
-  # -m*, -t[45]*, -txscale* pass through architecture-specific
-  # compiler args for GCC
-  # -F/path gives path to uninstalled frameworks, gcc on darwin
-  # -p, -pg, --coverage, -fprofile-* pass through profiling flag for GCC
-  # @file GCC response files
-  # -tp=* Portland pgcc target processor selection
+  # Flags to be passed through unchanged, with rationale:
+  # -64, -mips[0-9]  enable 64-bit mode for the SGI compiler
+  # -r[0-9][0-9]*specify processor for the SGI compiler
+  # -xarch=*, -xtarget=* enable 64-bit mode for the Sun compiler
+  # +DA*, +DD*   enable 64-bit mode for the HP compiler
+  # -q*  compiler args for the IBM compiler
+  # -m*, -t[45]*, -txscale* architecture-specific flags for GCC
+  # -F/path  path to uninstalled frameworks, gcc on darwin
+  # -p, -pg, --coverage, -fprofile-*  profiling flags for GCC
+  # -fstack-protector*   stack protector flags for GCC
+  # @fileGCC response files
+  # -tp=*Portland pgcc target processor selection
+  # -O*, -g*, -flto*, -fwhopr*, -fuse-linker-plugin GCC link-time optimization
+  # -specs=* GCC specs files
+  # -stdlib=*select c++ std lib with clang
+  # -fdiagnostics-color* simply affects output
+  # -frecord-gcc-switches used to verify flags were respected
+  # -fsanitize=* Clang/GCC memory and address sanitizer
+  # -fno-sanitize*   Clang/GCC memory and address sanitizer
+  # -shared-libsan   Link with shared sanitizer runtimes (Clang)
+  # -static-libsan   Link with static sanitizer runtimes (Clang)
+  # -fuse-ld=*   Linker select flags for GCC
+  # -rtlib=* select c runtime lib with clang
+  # --unwindlib=*select unwinder library with clang
+  # -f{file|debug|macro|profile}-prefix-map=* needed for lto linking
+  # -Wa,*Pass flags directly to the assembler
+  # -Werror, -Werror=*   Report (specified) warnings as errors
   -64|-mips[0-9]|-r[0-9][0-9]*|-xarch=*|-xtarget=*|+DA*|+DD*|-q*|-m*| \
-  -t[45]*|-txscale*|-p|-pg|--coverage|-fprofile-*|-F*|@*|-tp=*)
+  -t[45]*|-txscale*|-p|-pg|--coverage|-fprofile-*|-F*|@*|-tp=*| \
+  -O*|-g*|-flto*|-fwhopr*|-fuse-linker-plugin|-fstack-protector*| \
+  -stdlib=*|-rtlib=*|--unwindlib=*| \
+  -specs=*|-fsanitize=*|-fno-sanitize*|-shared-libsan|-static-libsan| \
+  -ffile-prefix-map=*|-fdebug-prefix-map=*|-fmacro-prefix-map=*|-fprofile-prefix-map=*| \
+  -fdiagnostics-color*|-frecord-gcc-switches| \
+  -fuse-ld=*|-Wa,*|-Werror|-Werror=*)
 func_quote_for_eval "$arg"
arg="$func_quote_for_eval_result"
 func_append compile_command " $arg"
-- 
2.45.2

Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Alejandro Colomar

Hi Aaron, Jens,

On Wed, Aug 14, 2024 at 02:17:52PM GMT, Jens Gustedt wrote:
> On 14 August 2024 13:31:19 CEST, "Ballman, Aaron" wrote:
> > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > 
> > > For a WG14 paper you should add these findings to support that choice.
> > > Another option would be for WG14 to standardize the then existing 
> > > implementation with the double underscores.
> > 
> > +1, it's always good to explain prior art and existing uses as part
> > of the paper. However, please also point out that C++ has a prior
> > art as well which is slightly different and very much worth
> > considering: they have one API for getting the array's rank,
> > and another for getting a specific rank's extent. This is a general
> > solution that doesn't require the programmer to have deep knowledge
> > of C's declarator syntax and how it relates to multidimensional
> > arrays.

I have added that to my draft.  I'll publish it soon as a reply to the
GCC mailing list.  See below for details of what I have added for now.

> > 
> > That said, I suspect WG14 would not be keen on standardizing
> > `lengthof` without an ugly keyword given that there are plenty of other 
> > uses of it that would break: 
> > 
> > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
> > (and many, many others)

What regex did you use for searching?

I was thinking of renaming the proposal to elementsof(), to avoid
confusion between length of an array and length of a string.  Would you
mind checking if elementsof() is ok?

> > >> > As for the parentheses, I personally think lengthof should follow 
> > >> > similar rules compared to sizeof.
> > >> 
> > >> I think most people agree with this.
> > >
> > > I still don't, in particular not for standardisation.
> > > 
> > > We have to remember that there are many small C compilers out there.
> > 
> > Those compilers already have to handle parsing this for sizeof, so
> > that's not particularly compelling

Agree.  I suspect it will be simpler for existing compilers to follow
sizeof than to have new syntax.  However, it's easy to keep it as a QoI
detail, so I've temporarily changed the wording to require parentheses,
and let implementations lift that restriction.
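
(For reference, a minimal sketch of the sizeof rules being discussed: parentheses are optional around an expression operand and required around a type name. The function names are hypothetical.)

```c
#include <assert.h>
#include <stddef.h>

/* Expression operand: parentheses are optional. */
static size_t size_without_parens(void)
{
    int buf[4];

    return sizeof buf;
}

/* Type-name operand: parentheses are required. */
static size_t size_with_type_name(void)
{
    return sizeof(int[4]);
}
```

Requiring parentheses in both cases, as in the temporary wording, would accept only the second spelling style.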

> > (even if we wanted to design C
> > for the lowest common denominator of implementation effort, which
> > I'm not convinced is a good approach these days).

Off-topic, but I wish that had been the approach when a few
implementations (I suspect proprietary vendors; this was never
disclosed) rejected redefining NULL as the right thing: (void *) 0.

I fixed one of the last free-software implementations of NULL that
expanded to 0, and nullptr would probably never have been added if WG14
had not accepted the pressure from such horrible implementations.

> > That said, if we went with a rank/extent design, I think we'd *have*
> > to use parens because the extent interface would take two operands
> > (the array and the rank you're interested in getting the extent of)
> > and it would be inconsistent for the rank interface to then not
> > require parens.

   Prior art
     C
       It is common in C programs to get the number of elements of
       an array via the usual sizeof division and wrap it in a
       macro.  Common names include:

       •  ARRAY_SIZE()
       •  NELEM()
       •  NELEMS()
       •  NITEMS()
       •  NELTS()
       •  elementsof()
       •  lengthof()

     C++
       In C++, there are several standard features to determine
       the number of elements of an array:

       std::size()   (since C++17)
       std::ssize()  (since C++20)
          The syntax of these is identical to the usual C
          macros named above.

          It’s a bit different, since it’s a general purpose
          sizing template, which works on non‐array types too,
          with different semantics.

          But when applied to an array, it has the same
          semantics as the macros above.

       std::extent  (since C++11)
          The syntax of this is quite different.  It uses a
          numeric index as a second parameter to determine the
          dimension in which the number of elements should be
          counted.

          C arrays are much simpler than C++’s many array‐like
          types, and I don’t see a reason why we would need
          something as complex as std::extent in C.
          Certainly, existing projects have not developed such a

Re: [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs

2024-08-14 Thread Victor Do Nascimento


On 8/14/24 13:24, Tamar Christina wrote:

Hi Victor,


-Original Message-
From: Victor Do Nascimento 
Sent: Tuesday, August 13, 2024 1:42 PM
To: gcc-patches@gcc.gnu.org
Cc: Tamar Christina ; claz...@gmail.com;
hongtao@intel.com; s...@gcc.gnu.org; bernds_...@t-online.de;
al...@redhat.com; Victor Do Nascimento 
Subject: [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs

Given the shift from modeling dot products as direct optabs to
treating them as conversion optabs, we make necessary changes to the
autovectorizer code to ensure that given the relevant tree code,
together with the input and output data modes, we can retrieve the
relevant optab and subsequently the insn_code for it.
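
(A toy model of the distinction described above, not GCC code: a conversion-style lookup is keyed on an (output mode, input mode) pair rather than on a single mode. All names below are hypothetical stand-ins, and the V16QI to V4SI dot-product entry is an assumed example.)

```c
#include <assert.h>

/* Toy stand-ins; these are NOT GCC's real types or tables. */
enum toy_mode { TOY_V16QI, TOY_V4SI, TOY_MODE_COUNT };
enum toy_icode { TOY_CODE_FOR_NOTHING = 0, TOY_CODE_FOR_SDOT };

/* Conversion-style table: one handler per (output mode, input mode) pair.
   Entries that are not listed default to TOY_CODE_FOR_NOTHING. */
static const enum toy_icode toy_convert_table[TOY_MODE_COUNT][TOY_MODE_COUNT] = {
    [TOY_V4SI][TOY_V16QI] = TOY_CODE_FOR_SDOT,
};

/* Looks up the handler for a given output/input mode pair. */
static enum toy_icode toy_convert_handler(enum toy_mode out, enum toy_mode in)
{
    return toy_convert_table[out][in];
}

/* Mirrors the shape of the new overload: supported iff a handler exists. */
static int toy_directly_supported_p(enum toy_mode out, enum toy_mode in)
{
    return toy_convert_handler(out, in) != TOY_CODE_FOR_NOTHING;
}
```

A single-mode lookup could not tell the two operand widths apart, which is why both types are passed to the new overload.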

gcc/ChangeLog:

* gimple-match-exports.cc (directly_supported_p): Add overload
for conversion-type optabs.
* gimple-match.h (directly_supported_p): Add new function
prototype.
* optabs.cc (expand_widen_pattern_expr): Make the
DOT_PROD_EXPR tree code use `find_widening_optab_handler' to
retrieve icode.
* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): make it
call conversion-type overloaded `directly_supported_p'.
* tree-vect-patterns.cc (vect_supportable_conv_optab_p): New.
(vect_recog_dot_prod_pattern): s/direct/conv/ in call to
`vect_supportable_direct_optab_p'.
---
  gcc/gimple-match-exports.cc | 23 
  gcc/gimple-match.h  |  2 ++
  gcc/optabs.cc   |  3 ++-
  gcc/tree-vect-loop.cc   |  1 +
  gcc/tree-vect-patterns.cc   | 43 +++--
  5 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index aacf3ff0414..d18497e7c83 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -1381,6 +1381,29 @@ directly_supported_p (code_helper code, tree type,
optab_subtype query_type)
  && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED));
  }

+/* As above, overloading the function for conversion-type optabs.  */
+bool
+directly_supported_p (code_helper code, tree type_out, tree type_in,
+ optab_subtype query_type)
+{
+  if (code.is_tree_code ())
+{
+  convert_optab optab = optab_for_tree_code (tree_code (code), type_in,
+   query_type);
+  return (optab != unknown_optab
+ && convert_optab_handler (optab, TYPE_MODE (type_out),
+   TYPE_MODE (type_in)) !=
CODE_FOR_nothing);
+}
+  gcc_assert (query_type == optab_default
+ || (query_type == optab_vector && VECTOR_TYPE_P (type_in))
+ || (query_type == optab_scalar && !VECTOR_TYPE_P (type_in)));
+  internal_fn ifn = associated_internal_fn (combined_fn (code), type_in);
+  return (direct_internal_fn_p (ifn)
+ && direct_internal_fn_supported_p (ifn, tree_pair (type_out, type_in),
+OPTIMIZE_FOR_SPEED));
+}
+
+
  /* A wrapper around the internal-fn.cc versions of get_conditional_internal_fn
 for a code_helper CODE operating on type TYPE.  */

diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index d710fcbace2..0333a5db00a 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -419,6 +419,8 @@ code_helper canonicalize_code (code_helper, tree);

  #ifdef GCC_OPTABS_TREE_H
  bool directly_supported_p (code_helper, tree, optab_subtype = optab_default);
+bool directly_supported_p (code_helper, tree, tree,
+  optab_subtype = optab_default);
  #endif

  internal_fn get_conditional_internal_fn (code_helper, tree);
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 185c5b1a705..32737fb80e8 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -317,7 +317,8 @@ expand_widen_pattern_expr (const_sepops ops, rtx op0,
rtx op1, rtx wide_op,
  widen_pattern_optab
= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
if (ops->code == WIDEN_MULT_PLUS_EXPR
-  || ops->code == WIDEN_MULT_MINUS_EXPR)
+  || ops->code == WIDEN_MULT_MINUS_EXPR
+  || ops->code == DOT_PROD_EXPR)
  icode = find_widening_optab_handler (widen_pattern_optab,
 TYPE_MODE (TREE_TYPE (ops->op2)),
 tmode0);
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6456220cdc9..5f3de7b72a8 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5289,6 +5289,7 @@ vect_is_emulated_mixed_dot_prod (stmt_vec_info
stmt_info)

gcc_assert (STMT_VINFO_REDUC_VECTYPE_IN (stmt_info));
return !directly_supported_p (DOT_PROD_EXPR,
+   STMT_VINFO_VECTYPE (stmt_info),
STMT_VINFO_REDUC_VECTYPE_IN (stmt_info),
optab_vector_mixed_sign);
  }
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-v

Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Alejandro Colomar

Hi Aaron,

On Wed, Aug 14, 2024 at 12:40:41PM GMT, Ballman, Aaron wrote:
> > We should not impose an implementation in the language where doing
> > it in a header can be completely sufficient.
> 
> But can doing this in a header be completely sufficient in practice?
> e.g., the user who passes a pointer rather than an array is in for
> quite a surprise, or passing a struct, or passing a FAM, etc. If we
> want to put constraints on the interface, that may be more challenging
> to do from a header file than from the compiler.

I've provided a C23-portable and safe implementation of lengthof() as a
macro:

   Portability
     Prior to C23 it was impossible to do this portably, but since C23
     it is possible to portably write a macro that determines the
     number of elements of an array.

#define must_be(e)  \
(   \
0 * (int) sizeof(   \
struct {\
static_assert(e);   \
int ISO_C_forbids_a_struct_with_no_members; \
}   \
)   \
)
#define is_array(a) \
(   \
_Generic(&(a),  \
typeof((a)[0]) **:  0,  \
default:1   \
)   \
)
#define sizeof_array(a)  (sizeof(a) + must_be(is_array(a)))
#define nitems(a)(sizeof_array(a) / sizeof((a)[0]))

 While diagnostics could be better, with good  helper‐macro  names,
 they are decent.

The issues with this implementation are also listed in the paper.
Here's a TL;DR:

-  It doesn't accept type names.

-  It results unnecessarily in run-time values where a keyword could
   result in an integer constant expression:

int  a[7][n];
int  (*p)[7][n];

p = &a;
nitems(*p++);

-  Double evaluation: not only does the macro evaluate its operand in
   more cases than a keyword would, it evaluates it twice (due to the
   two sizeof operators).

-  Fewer diagnostics.  Since there are fewer constant expressions, there
   are fewer opportunities to catch UB.

So far, we've lived with all of those issues (plus the lack of
portability, since this could only be implemented via compiler
extensions until C23).
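
The double-evaluation point can be observed without relying on undefined
behavior by using a function call as the side effect.  A small sketch,
assuming a compiler that supports VLAs; NITEMS, tick() and the two helper
functions are hypothetical names used only for illustration:

```c
#include <stddef.h>

/* Classic macro: two sizeof operators, no check that 'a' is an array. */
#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

static int calls;  /* counts how often the macro operand is evaluated */

static size_t tick(void)
{
    calls++;
    return 0;
}

/* Inner dimension known only at run time: both sizeof operands have a
   variable length array type, so both are evaluated. */
int evaluations_with_vla(size_t n)
{
    int a[3][7][n];   /* a[i] has type int[7][n], a[i][0] has type int[n] */

    calls = 0;
    (void) NITEMS(a[tick()]);
    return calls;
}

/* All dimensions constant: neither sizeof operand is evaluated. */
int evaluations_with_constant_dims(void)
{
    int a[3][7][4];

    calls = 0;
    (void) NITEMS(a[tick()]);
    return calls;
}
```

Under the sizeof rules, evaluations_with_vla(5) should report two
evaluations and evaluations_with_constant_dims() none.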

But ideally, I'd like to avoid the wording juggling that would be
required to allow such an implementation.  Here's an example of the
difference in wording that would be required:

 The elementsof operator yields the number of elements
 of its operand.
 The number of elements is determined from the type of the operand.
 The result is an integer.
 If the number of elements of the array type is variable,
 the operand is evaluated;
+otherwise,
+if the operand is a variable-length array,
+it is unspecified whether the operand is evaluated;
 otherwise,
 the operand is not evaluated and the result is an integer constant.
+If the operand is evaluated,
+it is unspecified the number of times it is evaluated.

Which sounds very suspicious.

> I'm still thinking on how important rank + extent is vs overall array
> length. If C had constexpr functions, then I'd almost certainly want
> array rank and extent to be the building blocks and then lengthof can
> be a constexpr function looping over rank and summing extents. But we
> don't have that yet, and "bird hand" vs "bird in bush"... :-D

Or you can build it the other way around: define extent() as a macro
that wraps lengthof().

About rank, I suspect you could also develop something with _Generic(3),
but I didn't try.

Cheers,
Alex

-- 


[PATCH] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook

2024-08-14 Thread H.J. Lu

The new hook allows the linker plugin to distinguish calls to
claim_file_handler that know the object is being used by the linker
(from ldmain.c:add_archive_element), from calls that don't know it's
being used by the linker (from elf_link_is_defined_archive_symbol); in
the latter case, the plugin should avoid including the unused LTO archive
members in linker output.  To get the proper support for archives with
LTO common symbols, the linker fix for

https://sourceware.org/bugzilla/show_bug.cgi?id=32083

is required.

PR lto/116361
* lto-plugin.c (claim_file_handler_v2): Include the LTO object
only if it is known to be used for link output.

Signed-off-by: H.J. Lu 
---
 lto-plugin/lto-plugin.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index 152648338b9..2d2bfa60d42 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -1286,13 +1286,17 @@ claim_file_handler_v2 (const struct ld_plugin_input_file *file, int *claimed,
  lto_file.symtab.syms);
   check (status == LDPS_OK, LDPL_FATAL, "could not add symbols");
 
-  LOCK_SECTION;
-  num_claimed_files++;
-  claimed_files =
-   xrealloc (claimed_files,
- num_claimed_files * sizeof (struct plugin_file_info));
-  claimed_files[num_claimed_files - 1] = lto_file;
-  UNLOCK_SECTION;
+  /* Include it only if it is known to be used for link output.  */
+  if (known_used)
+   {
+ LOCK_SECTION;
+ num_claimed_files++;
+ claimed_files =
+   xrealloc (claimed_files,
+ num_claimed_files * sizeof (struct plugin_file_info));
+ claimed_files[num_claimed_files - 1] = lto_file;
+ UNLOCK_SECTION;
+   }
 
   *claimed = 1;
 }
@@ -1313,7 +1317,7 @@ claim_file_handler_v2 (const struct ld_plugin_input_file *file, int *claimed,
   if (*claimed && !obj.offload && offload_files_last_lto == NULL)
 offload_files_last_lto = offload_files_last;
 
-  if (obj.offload && (known_used || obj.found > 0))
+  if (obj.offload && known_used && obj.found > 0)
 {
   /* Add file to the list.  The order must be exactly the same as the final
 order after recompilation and linking, otherwise host and target tables
-- 
2.46.0
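
The effect of the first hunk can be modelled outside the plugin roughly as
follows.  This is a simplified sketch, not the plugin's code: struct
file_info, record_claim(), claimed_count() and the globals are hypothetical
stand-ins, plain realloc replaces xrealloc, and locking is omitted.

```c
#include <stdlib.h>

struct file_info { int id; };   /* stand-in for the plugin's file record */

static struct file_info *claimed_list;
static size_t num_claimed;

/* Always reports the file as claimed, but only remembers it for the
   final link when known_used is nonzero.  Returns 0 on success, -1 if
   the list could not grow. */
int record_claim(struct file_info f, int known_used, int *claimed)
{
    if (known_used) {
        struct file_info *grown =
            realloc(claimed_list, (num_claimed + 1) * sizeof *grown);
        if (grown == NULL)
            return -1;
        claimed_list = grown;
        claimed_list[num_claimed++] = f;
    }
    *claimed = 1;
    return 0;
}

size_t claimed_count(void)
{
    return num_claimed;
}
```

In this model a file seen with known_used == 0 still sets *claimed, but it
does not appear in the list that would later be passed on for recompilation.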

Re: [PATCH-1v4] Value Range: Add range op for builtin isinf

2024-08-14 Thread Jeff Law





On 8/14/24 1:04 AM, HAO CHEN GUI wrote:

Hi Jeff,

   May I know your final decision on this patch?
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656937.html

I thought I acked the whole set.

Jeff

RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Ballman, Aaron

> What regex did you use for searching?

I went cheap and easy rather than trying to narrow down:
https://sourcegraph.com/search?q=context:global+lang:C+lengthof&patternType=regexp&sm=0

> I was thinking of renaming the proposal to elementsof(), to avoid confusion 
> between length of an array and length of a string.  Would you mind checking 
> if elementsof() is ok?

From what I was seeing, it looks to be used more uniformly as a function-like 
macro accepting a single argument.

~Aaron

-Original Message-
From: Alejandro Colomar  
Sent: Wednesday, August 14, 2024 8:58 AM
To: Jens Gustedt ; Ballman, Aaron 

Cc: Xavier Del Campo Romero ; Gcc Patches 
; Daniel Plakosh ; Martin Uecker 
; Joseph Myers ; Gabriel Ravier 
; Jakub Jelinek ; Kees Cook 
; Qing Zhao ; David Brown 
; Florian Weimer ; Andreas Schwab 
; Timm Baeder ; A. Jiang 
; Eugene Zelenko 
Subject: Re: v2.1 Draft for a lengthof paper

Hi Aaron, Jens,

On Wed, Aug 14, 2024 at 02:17:52PM GMT, Jens Gustedt wrote:
> On 14 August 2024 13:31:19 CEST, "Ballman, Aaron" wrote:
> > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > 
> > > For a WG14 paper you should add these findings to support that choice.
> > > Another option would be for WG14 to standardize the then existing 
> > > implementation with the double underscores.
> > 
> > +1, it's always good to explain prior art and existing uses as part
> > of the paper. However, please also point out that C++ has a prior 
> > art as well which is slightly different and very much worth
> > considering: they have one API for getting the array's rank, and 
> > another for getting a specific rank's extent. This is a general 
> > solution that doesn't require the programmer to have deep knowledge 
> > of C's declarator syntax and how it relates to multidimensional 
> > arrays.

I have added that to my draft.  I'll publish it soon as a reply to the GCC 
mailing list.  See below for details of what I have added for now.

> > 
> > That said, I suspect WG14 would not be keen on standardizing 
> > `lengthof` without an ugly keyword given that there are plenty of other 
> > uses of it that would break:
> > 
> > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/s
> > rc/cmd/mailx/names.c?L53-55
> > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod
> > _fw.c?L292-294
> > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/
> > blob/src/spur64.stack/validImage.c?L7014-7018
> > (and many, many others)

What regex did you use for searching?

I was thinking of renaming the proposal to elementsof(), to avoid confusion 
between length of an array and length of a string.  Would you mind checking if 
elementsof() is ok?

> > >> > As for the parentheses, I personally think lengthof should 
> > >> > follow similar rules compared to sizeof.
> > >> 
> > >> I think most people agree with this.
> > >
> > > I still don't, in particular not for standardisation.
> > > 
> > > We have to remember that there are many small C compilers out there.
> > 
> > Those compilers already have to handle parsing this for sizeof, so 
> > that's not particularly compelling

Agree.  I suspect it will be simpler for existing compilers to follow sizeof 
than to have new syntax.  However, it's easy to keep it as a QoI detail, so 
I've temporarily changed the wording to require parentheses, and let 
implementations lift that restriction.

> > (even if we wanted to design C
> > for the lowest common denominator of implementation effort, which 
> > I'm not convinced is a good approach these days).

Off-topic, but I wish that had been the approach when a few implementations (I 
suspect proprietary vendors; this was never
disclosed) rejected redefining NULL as the right thing: (void *) 0.

I fixed one of the last free-software implementations of NULL that expanded to 
0, and nullptr would probably never have been added if WG14 had not accepted 
the pressure from such horrible implementations.

> > That said, if we went with a rank/extent design, I think we'd *have* 
> > to use parens because the extent interface would take two operands 
> > (the array and the rank you're interested in getting the extent of) 
> > and it would be inconsistent for the rank interface to then not 
> > require parens.

   Prior art
 C
It is common in C programs to get the number of elements of
an array via the usual sizeof division and  wrap  it  in  a
macro.  Common names include:

•  ARRAY_SIZE()
•  NELEM()
•  NELEMS()
•  NITEMS()
•  NELTS()
•  elementsof()
•  lengthof()

 C++
In  C++,  there  are several standard features to determine
the number of elements of an array:

std::size()   (since C++17)
std::ssize()  (since C++20)
   The syntax of these is  identical  to

RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Jens Gustedt

On 14 August 2024 14:40:41 CEST, "Ballman, Aaron" wrote:
> > I think that this argument falls short. E.g. implementations that 
> > already have compound expressions (or lambdas ;-) may provide a quality 
> > implementation using `static_assert` and `typeof` alone, and don't have to 
> > touch their compiler at all.
> >
> > We should not impose an implementation in the language where doing it in a 
> > header can be completely sufficient.
> 
> But can doing this in a header be completely sufficient in practice? 

I think so.

> e.g., the user who passes a pointer rather than an array is in for quite a 
> surprise, or passing a struct, or passing a FAM, etc. If we want to put 
> constraints on the interface, that may be more challenging to do from a 
> header file than from the compiler. offsetof is a cautionary tale in that 
> compilers that want a reasonable QoI basically all implement this as a 
> builtin rather than the header-only version.

Yes, with the tools that I listed and the ideas that are already in the
paper you can basically do all that, including giving valuable feedback
in case of failure.

I am currently on a summer bike trip, so not able to provide
a full reference implementation. But could do so, once I am back.


> > Plus, implementing as a macro in a header (probably ) makes also 
> > a feature test, for those applications that already have something similar. 
> > this was basically what we did for `unreachable` and I think it worked out 
> > fine.
> 
> True!
> 
> I'm still thinking on how important rank + extent is vs overall array length. 
> If C had constexpr functions, then I'd almost certainly want array rank and 
> extent to be the building blocks and then lengthof can be a constexpr 
> function looping over rank and summing extents. But we don't have that yet, 
> and "bird hand" vs "bird in bush"... :-D

Why would you be looping? lengthof only addresses the outer dimension;
sizeof would need a loop, no?

Generally I would be opposed to imposing a complicated solution for a simple
feature.

Jens

> 
> ~Aaron
> 
> -Original Message-
> From: Jens Gustedt  
> Sent: Wednesday, August 14, 2024 8:18 AM
> To: Ballman, Aaron ; Alejandro Colomar 
> ; Xavier Del Campo Romero 
> Cc: Gcc Patches ; Daniel Plakosh 
> ; Martin Uecker ; Joseph Myers 
> ; Gabriel Ravier ; Jakub Jelinek 
> ; Kees Cook ; Qing Zhao 
> ; David Brown ; Florian 
> Weimer ; Andreas Schwab ; Timm 
> Baeder ; A. Jiang ; Eugene Zelenko 
> 
> Subject: RE: v2.1 Draft for a lengthof paper
> 
> Hi Aaron,
> 
> On 14 August 2024 13:31:19 CEST, "Ballman, Aaron" wrote:
> > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > 
> > > For a WG14 paper you should add these findings to support that choice.
> > > Another option would be for WG14 to standardize the then existing 
> > > implementation with the double underscores.
> > 
> > +1, it's always good to explain prior art and existing uses as part of the 
> > paper. However, please also point out that C++ has a prior art as well 
> > which is slightly different and very much worth considering: they have one 
> > API for getting the array's rank, and another for getting a specific rank's 
> > extent. This is a general solution that doesn't require the programmer to 
> > have deep knowledge of C's declarator syntax and how it relates to 
> > multidimensional arrays.
> > 
> > That said, I suspect WG14 would not be keen on standardizing `lengthof` 
> > without an ugly keyword given that there are plenty of other uses of it 
> > that would break: 
> > 
> > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src
> > /cmd/mailx/names.c?L53-55
> > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_f
> > w.c?L292-294
> > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/bl
> > ob/src/spur64.stack/validImage.c?L7014-7018
> > (and many, many others)
> > 
> > >> > As for the parentheses, I personally think lengthof should follow 
> > >> > similar rules compared to sizeof.
> > >> 
> > >> I think most people agree with this.
> > >
> > > I still don't, in particular not for standardisation.
> > > 
> > > We have to remember that there are many small C compilers out there.
> > 
> > Those compilers already have to handle parsing this for sizeof, so that's 
> > not particularly compelling (even if we wanted to design C for the lowest 
> > common denominator of implementation effort, which I'm not convinced is a 
> > good approach these days). That said, if we went with a rank/extent design, 
> > I think we'd *have* to use parens because the extent interface would take 
> > two operands (the array and the rank you're interested in getting the 
> > extent of) and it would be inconsistent for the rank interface to then not 
> > require parens.
> 
> I think that this argument falls short. E.g. implementations that already 
> have compound expressions (or lambdas ;-) may provide a quality 
> implementation using `sta

Re: [PATCH, gfortran] libgfortran: implement fpu-macppc for Darwin, support IEEE arithmetic

2024-08-14 Thread Sergey Fedorov

Thank you, Iain.
I have shortened the overlong line and added an intro sentence before the
changelog record.



On Wed, Aug 14, 2024 at 8:24 PM Iain Sandoe  wrote:

>
>
> > On 14 Aug 2024, at 13:17, Sergey Fedorov  wrote:
> >
> >
> >
> > On Wed, Aug 14, 2024 at 8:03 PM FX Coudert  wrote:
> > > Thank you for responding.
> > > I have added a changelog (is this a correct way?).
> >
> > Content seems ok, lines are maybe too long. Check with
> contrib/gcc-changelog/git_check_commit.py before pushing.
> > Once that is fine, OK to push.
> >
> > Looks like the script is okay with formatting:
> >
> > 36-25% /opt/local/bin/python3.11
> /Users/svacchanda/Github_barracuda156/gcc-git/contrib/gcc-changelog/git_check_commit.py
>
> > Checking 16e8ea376ada59306583decf1a218b2281a48638: OK
>
> * config/fpu-macppc.h (new file): initial support for
> powerpc-darwin.
> * configure.host: enable ieee_support for powerpc-darwin case, set
> fpu_host='fpu-macppc'.
>
> The description lines should begin with a capital letter and the lines
> should not exceed 80 chars (some people prefer if they do not exceed 76
> chars so that “git show” output fits into 80 columns).
>
> hth
> Iain
>
>
> >
> > Sergey
> >
>
>


0001-libgfortran-implement-fpu-macppc-for-Darwin-support-.patch
Description: Binary data

Re: [PATCH] c++/coroutines: fix passing *this to promise type, again [PR116327]

2024-08-14 Thread Patrick Palka

On Tue, 13 Aug 2024, Jason Merrill wrote:

> On 8/13/24 7:52 PM, Patrick Palka wrote:
> > On Tue, 13 Aug 2024, Jason Merrill wrote:
> > 
> > > On 8/12/24 10:01 PM, Patrick Palka wrote:
> > > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk/14?
> > > > 
> > > > -- >8 --
> > > > 
> > > > In r15-2210 we got rid of the unnecessary cast to lvalue reference when
> > > > passing *this to the promise type ctor, and as a drive-by change we also
> > > > simplified the code to use cp_build_fold_indirect_ref.
> > > > 
> > > > But cp_build_fold_indirect_ref apparently does too much here, namely
> > > > it has a shortcut for returning current_class_ref if the operand is
> > > > current_class_ptr.  The problem with that shortcut is current_class_ref
> > > > might have gotten clobbered earlier if it appeared in the function body,
> > > > since rewrite_param_uses walks and rewrites in-place all local variable
> > > > uses to their corresponding frame copy.
> > > > 
> > > > So later this cp_build_fold_indirect_ref for *__closure will instead
> > > > return
> > > > the mutated current_class_ref i.e. *frame_ptr->__closure, which doesn't
> > > > make sense here since we're in the ramp function and not the actor
> > > > function
> > > > where frame_ptr is in scope.
> > > > 
> > > > This patch fixes this by building INDIRECT_REF directly instead of using
> > > > cp_build_fold_indirect_ref.  (Another approach might be to restore an
> > > > unshare_expr'd current_class_ref after doing coro_rewrite_function_body
> > > > to avoid it remaining clobbered after the rewriting process.  Yet
> > > > another more ambitious approach might be to avoid this tree sharing in
> > > > the first place by returning unshared versions of current_class_ref from
> > > > maybe_dummy_object etc.)
> > > 
> > > Maybe clear current_class_ptr/ref in coro rewriting so we don't hit the
> > > shortcut?
> > 
> > That seems to work, but I'm kind of worried about what other code paths
> > that'd disable, particularly semantic code paths vs just optimizations
> > code paths such as the cp_build_fold_indirect_ref shortcut.  IIUC the
> > ramp function has the same signature as the original presumably non-static
> > member function so ideally current class ref should remain set when
> > building the ramp function body and cleared only when building/rewriting
> > the actor function body (which is never a non-static member function and
> > so doesn't have a this pointer, I think?).
> > 
> > We do the actor body stuff first however, so even if we clear
> > current_class_ref then, the restored current_class_ref during the
> > later ramp function body stuff (including during the call to
> > cp_build_fold_indirect_ref) will still be clobbered :(
> > 
> > So ISTM this more narrow approach might be preferable unless we ever run
> > into another instance of this current_class_ref clobbering issue?
> 
> Fair enough.
> 
> Is there a reason not to use build_fold_indirect_ref (without cp_)?

Not AFAICT, works for me.  Like so?  I also extended the 104981 test so
that it too triggers the issue.

-- >8 --

Subject: [PATCH] c++/coroutines: fix passing *this to promise type, again
 [PR116327]

In r15-2210 we got rid of the unnecessary cast to lvalue reference when
passing *this to the promise type ctor, and as a drive-by change we also
simplified the code to use cp_build_fold_indirect_ref.

But cp_build_fold_indirect_ref apparently does too much here, namely
it has a shortcut for returning current_class_ref if the operand is
current_class_ptr.  The problem with that shortcut is current_class_ref
might have gotten clobbered earlier if it appeared in the function body,
since rewrite_param_uses walks and rewrites in-place all local variable
uses to their corresponding frame copy.

So later this cp_build_fold_indirect_ref for *__closure will instead return
the mutated current_class_ref i.e. *frame_ptr->__closure, which doesn't
make sense here since we're in the ramp function and not the actor function
where frame_ptr is in scope.

This patch fixes this by using build_fold_indirect_ref instead of
cp_build_fold_indirect_ref.

PR c++/116327
PR c++/104981
PR c++/115550

gcc/cp/ChangeLog:

* coroutines.cc (morph_fn_to_coro): Use build_fold_indirect_ref
instead of cp_build_fold_indirect_ref.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr104981-preview-this.C: Improve coverage by
adding a non-static data member use within the coroutine member
function.
* g++.dg/coroutines/pr116327-preview-this.C: New test.
---
 gcc/cp/coroutines.cc  |  4 ++--
 .../g++.dg/coroutines/pr104981-preview-this.C |  4 +++-
 .../g++.dg/coroutines/pr116327-preview-this.C | 22 +++
 3 files changed, 27 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116327-preview-this.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 145ec4b1d16..b1eae94a957

Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Martin Uecker

On Wednesday, 14.08.2024 at 12:40 +, Ballman, Aaron wrote:
> > I think that this argument falls short. E.g. implementations that 
> > already have compound expressions (or lambdas ;-) may provide a quality 
> > implementation using `static_assert` and `typeof` alone, and don't have to 
> > touch their compiler at all.
> > 
> > We should not impose an implementation in the language where doing it in a 
> > header can be completely sufficient.
> 
> But can doing this in a header be completely sufficient in practice? e.g., 
> the user who passes a pointer rather than
> an array is in for quite a surprise, or passing a struct, or passing a FAM, 
> etc. If we want to put constraints on the
> interface, that may be more challenging to do from a header file than from 
> the compiler. offsetof is a cautionary tale
> in that compilers that want a reasonable QoI basically all implement this as 
> a builtin rather than the header-only
> version.
> 
> > Plus, implementing as a macro in a header (probably ) makes also 
> > a feature test, for those applications
> > that already have something similar. 
> > this was basically what we did for `unreachable` and I think it worked out 
> > fine.
> 
> True!
> 
> I'm still thinking on how important rank + extent is vs overall array length. 
> If C had constexpr functions, then I'd
> almost certainly want array rank and extent to be the building blocks and 
> then lengthof can be a constexpr function
> looping over rank and summing extents. But we don't have that yet, and "bird 
> hand" vs "bird in bush"... :-D

An operator that returns an array with all dimensions of a multi-dimensional
array would make a lot of sense to me.


double array[4][3][2];

// array_dims(array) = (constexpr size_t[3]){ 4, 3, 2 }

int dim1 = (array_dims(array))[0];
int dim2 = (array_dims(array))[1];
int dim3 = (array_dims(array))[2];
 
You can then implement lengthof in terms of this operator:

#define lengthof(x) (array_dims(x)[0])

and you can obtain the rank by applying lengthof to the array:

#define rank(x) lengthof(array_dims(x))


If the array is constexpr for regular arrays and array
indexing returns a constant again for constexpr arrays, this
would all work out.
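
For comparison, today's C can already recover the individual extents of an
array whose rank is known in advance by chaining sizeof divisions.  A minimal
sketch; DIM0/DIM1/DIM2 and dims_of_example() are hypothetical names chosen
for illustration, not part of any proposal:

```c
#include <stddef.h>

/* One macro per dimension: each divides the size of one level of the
   array by the size of the next level down. */
#define DIM0(a)  (sizeof(a) / sizeof((a)[0]))
#define DIM1(a)  (sizeof((a)[0]) / sizeof((a)[0][0]))
#define DIM2(a)  (sizeof((a)[0][0]) / sizeof((a)[0][0][0]))

/* Stores the three extents of a double[4][3][2] into out[0..2]. */
void dims_of_example(size_t out[3])
{
    double array[4][3][2];

    out[0] = DIM0(array);
    out[1] = DIM1(array);
    out[2] = DIM2(array);
}
```

The caller has to know the rank to pick the right macro, which is the
limitation an array_dims-style operator would remove.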

Martin


> 
> ~Aaron
> 
> -Original Message-
> From: Jens Gustedt  
> Sent: Wednesday, August 14, 2024 8:18 AM
> To: Ballman, Aaron ; Alejandro Colomar 
> ; Xavier Del Campo Romero
> 
> Cc: Gcc Patches ; Daniel Plakosh 
> ; Martin Uecker ;
> Joseph Myers ; Gabriel Ravier ; 
> Jakub Jelinek ; Kees Cook
> ; Qing Zhao ; David Brown 
> ; Florian Weimer
> ; Andreas Schwab ; Timm Baeder 
> ; A. Jiang
> ; Eugene Zelenko 
> Subject: RE: v2.1 Draft for a lengthof paper
> 
> Hi Aaron,
> 
> On 14 August 2024 13:31:19 CEST, "Ballman, Aaron" wrote:
> > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > 
> > > For a WG14 paper you should add these findings to support that choice.
> > > Another option would be for WG14 to standardize the then existing 
> > > implementation with the double underscores.
> > 
> > +1, it's always good to explain prior art and existing uses as part of the 
> > paper. However, please also point out
> > that C++ has a prior art as well which is slightly different and very much 
> > worth considering: they have one API for
> > getting the array's rank, and another for getting a specific rank's extent. 
> > This is a general solution that doesn't
> > require the programmer to have deep knowledge of C's declarator syntax and 
> > how it relates to multidimensional
> > arrays.
> > 
> > That said, I suspect WG14 would not be keen on standardizing `lengthof` 
> > without an ugly keyword given that there are
> > plenty of other uses of it that would break: 
> > 
> > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src
> > /cmd/mailx/names.c?L53-55
> > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_f
> > w.c?L292-294
> > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/bl
> > ob/src/spur64.stack/validImage.c?L7014-7018
> > (and many, many others)
> > 
> > > > > As for the parentheses, I personally think lengthof should follow 
> > > > > similar rules compared to sizeof.
> > > > 
> > > > I think most people agree with this.
> > > 
> > > I still don't, in particular not for standardisation.
> > > 
> > > We have to remember that there are many small C compilers out there.
> > 
> > Those compilers already have to handle parsing this for sizeof, so that's 
> > not particularly compelling (even if we
> > wanted to design C for the lowest common denominator of implementation 
> > effort, which I'm not convinced is a good
> > approach these days). That said, if we went with a rank/extent design, I 
> > think we'd *have* to use parens because the
> > extent interface would take two operands (the array and the rank you're 
> > interested in getting the extent of) and it
> > would be inconsistent for the rank interface to then not

Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Jens Gustedt

On 14 August 2024 14:58:16 CEST, Alejandro Colomar wrote:
> Hi Aaron, Jens,
> 
> On Wed, Aug 14, 2024 at 02:17:52PM GMT, Jens Gustedt wrote:
> > On 14 August 2024 13:31:19 CEST, "Ballman, Aaron" wrote:
> > > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > > 
> > > > For a WG14 paper you should add these findings to support that choice.
> > > > Another option would be for WG14 to standardize the then existing 
> > > > implementation with the double underscores.
> > > 
> > > +1, it's always good to explain prior art and existing uses as part
> > > of the paper. However, please also point out that C++ has a prior
> > > art as well which is slightly different and very much worth
> > > considering: they have one API for getting the array's rank,
> > > and another for getting a specific rank's extent. This is a general
> > > solution that doesn't require the programmer to have deep knowledge
> > > of C's declarator syntax and how it relates to multidimensional
> > > arrays.
> 
> I have added that to my draft.  I'll publish it soon as a reply to the
> GCC mailing list.  See below for details of what I have added for now.
> 
> > > 
> > > That said, I suspect WG14 would not be keen on standardizing
> > > `lengthof` without an ugly keyword given that there are plenty of other 
> > > uses of it that would break: 
> > > 
> > > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> > > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> > > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
> > > (and many, many others)
> 
> What regex did you use for searching?
> 
> I was thinking of renaming the proposal to elementsof(), to avoid
> confusion between length of an array and length of a string.  Would you
> mind checking if elementsof() is ok?

No, not for me. I really want us to consistently talk about
array length for this. Consistent terminology is important.


> > > >> > As for the parentheses, I personally think lengthof should follow 
> > > >> > similar rules compared to sizeof.
> > > >> 
> > > >> I think most people agree with this.
> > > >
> > > > I still don't, in particular not for standardisation.
> > > > 
> > > > We have to remember that there are many small C compilers out there.
> > > 
> > > Those compilers already have to handle parsing this for sizeof, so
> > > that's not particularly compelling
> 
> Agree.  I suspect it will be simpler for existing compilers to follow
> sizeof than to have new syntax.  However, it's easy to keep it as a QoI
> detail, so I've temporarily changed the wording to require parentheses,
> and let implementations lift that restriction.

Great! That is a reasonable approach, I think.

> > > (even if we wanted to design C
> > > for the lowest common denominator of implementation effort, which
> > > I'm not convinced is a good approach these days).
> 
> Off-topic, but I wish that had been the approach when a few
> implementations (I suspect proprietary vendors; this was never
> disclosed) rejected redefining NULL as the right thing: (void *) 0.
> 
> I fixed one of the last free-software implementations of NULL that
> expanded to 0, and nullptr would probably never have been added if WG14
> had not accepted the pressure from such horrible implementations.
> 
> 
> 
> > > That said, if we went with a rank/extent design, I think we'd *have*
> > > to use parens because the extent interface would take two operands
> > > (the array and the rank you're interested in getting the extent of)
> > > and it would be inconsistent for the rank interface to then not
> > > require parens.
> 
>Prior art
>  C
> It is common in C programs to get the number of elements of
> an array via the usual sizeof division and  wrap  it  in  a
> macro.  Common names include:
> 
> •  ARRAY_SIZE()
> •  NELEM()
> •  NELEMS()
> •  NITEMS()
> •  NELTS()
> •  elementsof()
> •  lengthof()
> 
>  C++
> In  C++,  there  are several standard features to determine
> the number of elements of an array:
> 
> std::size()   (since C++17)
> std::ssize()  (since C++20)
>The syntax of these is  identical  to  the  usual  C
>macros named above.
> 
>It’s  a  bit different, since it’s a general purpose
>sizing template, which works on non‐array types too,
>with different semantics.
> 
>But when applied to an array, it has the same seman‐
>tics as the macros above.
> 
> std::extent  (since C++23)
>The syntax of this is quite different.   It  uses  a
>numer

Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Alejandro Colomar

Hi Aaron,

On Wed, Aug 14, 2024 at 01:21:18PM GMT, Ballman, Aaron wrote:
> > What regex did you use for searching?
> 
> I went cheap and easy rather than trying to narrow down:
> https://sourcegraph.com/search?q=context:global+lang:C+lengthof&patternType=regexp&sm=0

Ahh, context:global seems to be what I wanted.  Where is that
documented?

> > I was thinking of renaming the proposal to elementsof(), to avoid confusion 
> > between length of an array and length of a string.  Would you mind checking 
> > if elementsof() is ok?
> 
> From what I was seeing, it looks to be used more uniformly as a
> function-like macro accepting a single argument.

Thanks!  I'll rename it to elementsof().

Cheers,
Alex

> ~Aaron

-- 




RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Ballman, Aaron

> I am currently on a summer bike trip, so not able to provide a full reference 
> implementation. But could do so, once I am back.

No need (after thinking on this a bit more, I believe you're right that this 
can be done in a macro-only implementation; we might not go that route in Clang 
because of AST matching needs and whatnot, but that's not an issue), but thank 
you for the offer. Please enjoy your summer bike trip! 😊

> Why would you be looping? lengthof only addresses the outer dimension; sizeof 
> would need a loop, no?

Due to poor reading comprehension, I missed in the paper that lengthof works on 
the outer dimension. 😉 I think having a way to get the flattened size of a 
multidimensional array is a useful feature.

~Aaron

-Original Message-
From: Jens Gustedt  
Sent: Wednesday, August 14, 2024 9:25 AM
To: Ballman, Aaron ; Alejandro Colomar 
; Xavier Del Campo Romero 
Cc: Gcc Patches ; Daniel Plakosh ; 
Martin Uecker ; Joseph Myers ; Gabriel 
Ravier ; Jakub Jelinek ; Kees Cook 
; Qing Zhao ; David Brown 
; Florian Weimer ; Andreas Schwab 
; Timm Baeder ; A. Jiang 
; Eugene Zelenko 
Subject: RE: v2.1 Draft for a lengthof paper

On 14 August 2024 14:40:41 CEST, "Ballman, Aaron" wrote:
> > I think that this argument falls short. E.g., implementations that 
> > already have compound expressions (or lambdas ;-) may provide a quality 
> > implementation using `static_assert` and `typeof` alone, and don't have to 
> > touch their compiler at all.
> >
> > We should not impose an implementation in the language where doing it in a 
> > header can be completely sufficient.
> 
> But can doing this in a header be completely sufficient in practice? 

I think so.

> e.g., the user who passes a pointer rather than an array is in for quite a 
> surprise, or passing a struct, or passing a FAM, etc. If we want to put 
> constraints on the interface, that may be more challenging to do from a 
> header file than from the compiler. offsetof is a cautionary tale in that 
> compilers that want a reasonable QoI basically all implement this as a 
> builtin rather than the header-only version.

Yes,  with the tools that I listed and the ideas that are already in the paper 
you can basically do all that, including giving valuable feedback in case of 
failure. 
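
As a rough illustration of the header-only approach described here, the sketch below rejects pointers at compile time using only `_Static_assert`, `_Generic` and `__typeof__` (the GNU spelling of C23 `typeof`). The names `IS_ARRAY`, `MUST_BE_ARRAY`, `NELEMS` and `nelems_demo` are made up for this example and are not taken from the paper; VLAs and flexible array members are not considered.

```c
#include <assert.h>
#include <stddef.h>

/* 1 if `a` is an array, 0 if it is a pointer.  For an array T[N], &(a) has
   type T (*)[N], which never matches T **; for a pointer T *, &(a) is T **.
   __typeof__ is a GNU extension (assumed available; `typeof` in C23).  */
#define IS_ARRAY(a) \
    (!_Generic(&(a), __typeof__(&(a)[0]) *: 1, default: 0))

/* Evaluates to (size_t) 0, but fails to compile with a readable message
   when `a` is not an array.  */
#define MUST_BE_ARRAY(a) \
    (0 * sizeof(struct { \
        _Static_assert(IS_ARRAY(a), "argument is not an array"); \
        int unused_; \
    }))

/* Number of elements in the outermost dimension of array `a`.  */
#define NELEMS(a) (sizeof(a) / sizeof((a)[0]) + MUST_BE_ARRAY(a))

/* Hypothetical demo: packs both results into one number.  */
size_t nelems_demo(void)
{
    int values[5] = {0};
    double grid[4][3] = {{0}};
    (void) values;
    (void) grid;
    return NELEMS(values) * 10 + NELEMS(grid);   /* 5 * 10 + 4 */
}
```

Passing a pointer, e.g. `int *p; NELEMS(p);`, should stop compilation at the `_Static_assert` rather than silently returning a wrong count.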

I am currently on a summer bike trip, so not able to provide a full reference 
implementation. But could do so, once I am back. 


> > Plus, implementing it as a macro in a header (probably ) also provides 
> > a feature test, for those applications that already have something similar. 
> > This was basically what we did for `unreachable` and I think it worked out 
> > fine.
> 
> True!
> 
> I'm still thinking on how important rank + extent is vs overall array 
> length. If C had constexpr functions, then I'd almost certainly want 
> array rank and extent to be the building blocks and then lengthof can 
> be a constexpr function looping over rank and summing extents. But we 
> don't have that yet, and "bird hand" vs "bird in bush"... :-D

Why would you be looping? lengthof only addresses the outer dimension; sizeof 
would need a loop, no?

Generally I would be opposed to imposing a complicated solution for a simple 
feature.

Jens

> 
> ~Aaron
> 
> -Original Message-
> From: Jens Gustedt 
> Sent: Wednesday, August 14, 2024 8:18 AM
> To: Ballman, Aaron ; Alejandro Colomar 
> ; Xavier Del Campo Romero 
> Cc: Gcc Patches ; Daniel Plakosh 
> ; Martin Uecker ; Joseph Myers 
> ; Gabriel Ravier ; Jakub 
> Jelinek ; Kees Cook ; Qing 
> Zhao ; David Brown ; 
> Florian Weimer ; Andreas Schwab 
> ; Timm Baeder ; A. Jiang 
> ; Eugene Zelenko 
> Subject: RE: v2.1 Draft for a lengthof paper
> 
> Hi Aaron,
> 
> On 14 August 2024 13:31:19 CEST, "Ballman, Aaron" wrote:
> > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > 
> > > For a WG14 paper you should add these findings to support that choice.
> > > Another option would be for WG14 to standardize the then existing 
> > > implementation with the double underscores.
> > 
> > +1, it's always good to explain prior art and existing uses as part of the 
> > paper. However, please also point out that C++ has a prior art as well 
> > which is slightly different and very much worth considering: they have one 
> > API for getting the array's rank, and another for getting a specific rank's 
> > extent. This is a general solution that doesn't require the programmer to 
> > have deep knowledge of C's declarator syntax and how it relates to 
> > multidimensional arrays.
> > 
> > That said, I suspect WG14 would not be keen on standardizing `lengthof` 
> > without an ugly keyword given that there are plenty of other uses of it 
> > that would break: 
> > 
> > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> > https://sourcegraph.com/github.com/OpenSmallta

RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Ballman, Aaron

> Ahh, context:global seems to be what I wanted.  Where is that documented?

For me it is the default when I go to https://sourcegraph.com/search but 
there's documentation at 
https://sourcegraph.com/docs/code-search/working/search_contexts

> Thanks!  I'll rename it to elementsof().

Rather than renaming it, I'd say that the name chosen in the proposed text is a 
placeholder, and have a section in the prose that describes different naming 
choices, pros and cons, suggests a name from you as the author, but asks WG14 
to pick the final name. I know Jens mentioned he doesn’t like the name 
`elementsof` and I suspect if we ask five more people we'll get about seven 
more opinions on what the name could/should be. 😝

~Aaron

-Original Message-
From: Alejandro Colomar  
Sent: Wednesday, August 14, 2024 10:00 AM
To: Ballman, Aaron 
Cc: Jens Gustedt ; Xavier Del Campo Romero 
; Gcc Patches ; Daniel Plakosh 
; Martin Uecker ; Joseph Myers 
; Gabriel Ravier ; Jakub Jelinek 
; Kees Cook ; Qing Zhao 
; David Brown ; Florian Weimer 
; Andreas Schwab ; Timm Baeder 
; A. Jiang ; Eugene Zelenko 

Subject: Re: v2.1 Draft for a lengthof paper

Hi Aaron,

On Wed, Aug 14, 2024 at 01:21:18PM GMT, Ballman, Aaron wrote:
> > What regex did you use for searching?
> 
> I went cheap and easy rather than trying to narrow down:
> https://sourcegraph.com/search?q=context:global+lang:C+lengthof&patternType=regexp&sm=0

Ahh, context:global seems to be what I wanted.  Where is that documented?

> > I was thinking of renaming the proposal to elementsof(), to avoid confusion 
> > between length of an array and length of a string.  Would you mind checking 
> > if elementsof() is ok?
> 
> From what I was seeing, it looks to be used more uniformly as a 
> function-like macro accepting a single argument.

Thanks!  I'll rename it to elementsof().

Cheers,
Alex

> ~Aaron

--

Re: [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs

2024-08-14 Thread Victor Do Nascimento


On 8/14/24 13:24, Tamar Christina wrote:


It seems to me that this should take a code_helper, create the vector modes and 
call directly_supported_p, or am I missing something?


Ok. Having done some digging around in the git history, I see that 
`vect_supportable_direct_optab_p', upon which I based my implementation 
of `vect_supportable_conv_optab_p', was committed before you wrote the 
`directly_supported_p' function, which I guess explains why 
`vect_supportable_direct_optab_p' does not take advantage of the latter 
function to avoid code duplication.


I'd wrongly presumed that we'd have been aware of the existence of 
`directly_supported_p' when writing `vect_supportable_direct_optab_p', 
such that any apparent code duplication would have been a conscious 
choice by the author, thus making it a relevant design consideration in 
my own implementation.


Anyway, will submit updated patch shortly.

Cheers,
Victor

Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Alejandro Colomar

Hi Martin,

On Wed, Aug 14, 2024 at 03:50:00PM GMT, Martin Uecker wrote:
> An operator that returns an array with all dimensions of a multi-dimensional
> array would make a lot of sense to me. 
> 
> 
> double array[4][3][2];
> 
> // array_dims(array) = (constexpr size_t[3]){ 4, 3, 2 }

And what if array[4][n][2]?  No constexpr anymore, which is bad.

> 
> int dim1 = (array_dims(array))[0];
> int dim2 = (array_dims(array))[1];
> int dim3 = (array_dims(array))[2];
>  
> You can then implement lengthof in terms of this operator:
> 
> #define lengthof(x) (array_dims(x)[0])

Not really.  This implementation would result in fewer constant
expressions than my proposal.  That's detrimental for diagnostics and
usability.

And the fundamental operator would be very complex, to allow users
implementing simpler wrappers.  I think the fundamental operators should
be as simple as possible, in the spirit of C, and let users build on top
of those basic tools.

This reminds me of the 'static' specifier for array parameters, which is
conflated with two meanings: nonnull and length.  I'd rather have a way
to specify nullness, and another one to specify length, and let users
compose them.

At first glance I oppose this array_dims operator.

> and you can obtain the rank by applying lengthof to the array:
> 
> #define rank(x) lengthof(array_dims(x))

I'm curious to see what kind of code would be enabled by a rank()
operator in C that we can't write at the moment.

> If the array is constexpr for regular arrays and array
> indexing returns a constant again for constexpr arrays, this
> would all work out.
> 
> Martin

Have a lovely day!
Alex

-- 


Re: [PATCH] PR tree-optimization/101390: Vectorize modulo operator

2024-08-14 Thread Richard Sandiford

Jennifer Schmitz  writes:
> This patch adds a new vectorization pattern that detects the modulo
> operation where the second operand is a variable.
> It replaces the statement by division, multiplication, and subtraction.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> Ok for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>
>   PR tree-optimization/101390
>   * tree-vect-patterns.cc (vect_recog_mod_var_pattern): Add new pattern.
>
> gcc/testsuite/
>   PR tree-optimization/101390
>   * gcc.dg/vect/vect-mod-var.c: New test.
>
> From 15e7abe690ba4f2702b02b29a3198a3309aeb48e Mon Sep 17 00:00:00 2001
> From: Jennifer Schmitz 
> Date: Wed, 7 Aug 2024 08:56:45 -0700
> Subject: [PATCH] PR tree-optimization/101390: Vectorize modulo operator
>
> This patch adds a new vectorization pattern that detects the modulo
> operation where the second operand is a variable.
> It replaces the statement by division, multiplication, and subtraction.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> Ok for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>
>   PR tree-optimization/101390
>   * tree-vect-patterns.cc (vect_recog_mod_var_pattern): Add new pattern.
>
> gcc/testsuite/
>   PR tree-optimization/101390
>   * gcc.dg/vect/vect-mod-var.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-mod-var.c | 40 ++
>  gcc/tree-vect-patterns.cc| 66 
>  2 files changed, 106 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-mod-var.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mod-var.c b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
> new file mode 100644
> index 000..023ee43f51f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
> @@ -0,0 +1,40 @@
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-march=armv8.2-a+sve" { target aarch64-*-* } } */

Since this is ordinarily a run test (good!), we can't add extra target
flags unless we're sure that the target hardware supports it.  Also:

> +
> +#include "tree-vect.h"
> +
> +#define N 64
> +
> +__attribute__ ((noinline)) int
> +f (int *restrict a, int *restrict b, int *restrict c)
> +{
> +  for (int i = 0; i < N; ++i)
> +c[i] = a[i] % b[i];
> +}
> +
> +#define BASE1 -126
> +#define BASE2 116
> +
> +int
> +main (void)
> +{
> +  check_vect ();
> +
> +  int a[N], b[N], c[N];
> +
> +  for (int i = 0; i < N; ++i)
> +{
> +  a[i] = BASE1 + i * 5;
> +  b[i] = BASE2 - i * 4;
> +  __asm__ volatile ("");
> +}
> +
> +  f (a, b, c);
> +
> +#pragma GCC novector
> +  for (int i = 0; i < N; ++i)
> +if (c[i] != a[i] % b[i])
> +  __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "vect_recog_mod_var_pattern: detected" "vect" } } */

...it looks like this wouldn't pass on non-aarch64 targets.

I think for the vect testsuite, we should add a new:

  check_effective_target_vect_int_div

that returns true if [check_effective_target_aarch64_sve].  We can then
drop the dg-additional-options and instead add { target vect_int_div }
to the scan-tree-dump test.  We wouldn't need the:

  /* { dg-require-effective-target vect_int } */

since the test should pass (without the scan) on all targets.

This means that the vect testsuite will only cover SVE if the testsuite
is run with SVE enabled.  That's usual practice for the vect testsuite
though, and doesn't seem too bad IMO.  We should sometimes run the
testsuite with SVE enabled by default if we want to get good test coverage
for SVE.

That said, if we want a test for SVE that runs on all aarch64 targets,
it might be worth adding a compile-only test to gcc.target/aarch64/sve,
say "mod_1.c", in addition to the test above.  mod_1.c can then test for
code quality, not just whether vectorisation takes place.  It could cover
both 32-bit and 64-bit cases, and both signed and unsigned; see
div_1.c for an example.

> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index f52de2b6972..8ea31510f6f 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -5264,6 +5264,71 @@ vect_recog_divmod_pattern (vec_info *vinfo,
>return pattern_stmt;
>  }
>  
> +/* Detects pattern with a modulo operation (S1) where both arguments
> +  are variables of integral type.
> +  The statement is replaced by division, multiplication, and subtraction.
> +  The last statement (S4) is returned.
> +
> +  Example:
> +  S1 c_t = a_t % b_t;
> +
> +  is replaced by
> +  S2 x_t = a_t / b_t;
> +  S3 y_t = x_t * b_t;
> +  S4 z_t = a_t - y_t;  */

Minor formatting nit, but: the comment should be indented under the
"Detects".
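
For readers following along, the S2-S4 replacement quoted above can be checked in scalar C. This is only a sketch of the arithmetic identity, not of the vectorizer code; `mod_via_div` and `count_mismatches` are names invented for the example, and the ranges are arbitrary. Divisors equal to zero are skipped because both forms are undefined there.

```c
#include <assert.h>

/* Computes a % b as the pattern does: divide, multiply, subtract.  */
int mod_via_div(int a, int b)
{
    int x = a / b;   /* S2: x_t = a_t / b_t */
    int y = x * b;   /* S3: y_t = x_t * b_t */
    return a - y;    /* S4: z_t = a_t - y_t */
}

/* Counts inputs for which the decomposition disagrees with the % operator
   over a small range of signed operands.  */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int a = -130; a <= 130; ++a)
        for (int b = -20; b <= 20; ++b) {
            if (b == 0)
                continue;   /* division by zero is undefined */
            if (mod_via_div(a, b) != a % b)
                ++mismatches;
        }
    return mismatches;
}
```

Since C division truncates toward zero, `(a / b) * b + a % b == a` holds whenever `a / b` is representable, so no mismatches are expected.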

> +
> +static gimple *
> +vect_recog_mod_var_pattern (vec_info *vinfo,
> + stmt_vec_info stmt_vinfo, tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +  tree oprnd0, oprnd1, vectype, itype;
> +  gimple *pattern_s

[PATCH] c: Enable -f{,no-}char8_t option for C/ObjC

2024-08-14 Thread Jakub Jelinek

Hi!

The N2653 paper contains:
"The proposed change to the type of UTF-8 string literals impacts backward
compatibility as described in the following sections. Implementors are
encouraged to offer options to disable char8_t support when necessary to
preserve compatibility with C17."
While we do have such an option for C++ (where we default to -fchar8_t
for C++20 and later, and to -fno-char8_t otherwise, but let the user
override it), for C we just use that option under the hood (similarly
set flag_char8_t to true for C23 and to false otherwise) but don't actually
allow the users to tweak that.  Such an override can help with incremental
porting, by allowing either -std=c17 -fchar8_t or -std=c23 -fno-char8_t.

The following patch enables the option also for C/ObjC.
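
As a small illustration of what the option changes for C, the sketch below reports the element type of a UTF-8 string literal with `_Generic`. The function name `u8_literal_is_unsigned` is made up for this example. Assuming the patch behaves as described, the result would be expected to follow -fchar8_t / -fno-char8_t rather than only the -std= level.

```c
#include <assert.h>

/* Returns 1 if elements of a UTF-8 string literal have type unsigned char
   (what char8_t is a typedef for in C23), 0 if they have type char (the
   C17 behaviour), and -1 for anything else.  */
int u8_literal_is_unsigned(void)
{
    return _Generic(u8"x"[0],
                    unsigned char: 1,
                    char: 0,
                    default: -1);
}
```

Code that assigns `u8"..."` to a `const char *` is the kind of thing that may need attention when moving between the two modes.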

Ok for trunk if it passes bootstrap/regtest?

2024-08-14  Jakub Jelinek  

gcc/
* doc/invoke.texi (-fchar8_t): Move to C section from C++,
document behavior for both C and C++.
gcc/c-family/
* c.opt (fchar8_t): Also enable for C and ObjC.
* c.opt.urls: Regenerate.
gcc/testsuite/
* gcc.dg/c17-utf8str-type-2.c: New test.
* gcc.dg/c23-utf8str-type-2.c: New test.
* gcc.dg/c23-utf8char-4.c: New test.

--- gcc/doc/invoke.texi.jj  2024-08-12 10:49:12.521610231 +0200
+++ gcc/doc/invoke.texi 2024-08-14 16:15:06.824421678 +0200
@@ -208,12 +208,12 @@ in the following sections.
 -fpermitted-flt-eval-methods=@var{standard}
 -fplan9-extensions  -fsigned-bitfields  -funsigned-bitfields
 -fsigned-char  -funsigned-char  -fstrict-flex-arrays[=@var{n}]
--fsso-struct=@var{endianness}}
+-fsso-struct=@var{endianness}  -fchar8_t}
 
 @item C++ Language Options
 @xref{C++ Dialect Options,,Options Controlling C++ Dialect}.
 @gccoptlist{-fabi-version=@var{n}  -fno-access-control
--faligned-new=@var{n}  -fargs-in-order=@var{n}  -fchar8_t  -fcheck-new
+-faligned-new=@var{n}  -fargs-in-order=@var{n}  -fcheck-new
 -fconstexpr-depth=@var{n}  -fconstexpr-cache-depth=@var{n}
 -fconstexpr-loop-limit=@var{n}  -fconstexpr-ops-limit=@var{n}
 -fno-elide-constructors
@@ -3013,6 +3013,62 @@ the target (the default).  This option i
 @strong{Warning:} the @option{-fsso-struct} switch causes GCC to generate
 code that is not binary compatible with code generated without it if the
 specified endianness is not the native endianness of the target.
+
+@opindex fchar8_t
+@opindex fno-char8_t
+@item -fchar8_t
+@itemx -fno-char8_t
+Enable support for @code{char8_t} as adopted for C++20 and C23.  This includes
+the addition of a new @code{char8_t} fundamental type (for C++ only), changes to the
+types of UTF-8 string and character literals, and for C++ only also new signatures for
+user-defined literals, associated standard library updates, and new
+@code{__cpp_char8_t} and @code{__cpp_lib_char8_t} feature test macros.
+For C @code{char8_t} is a typedef to @code{unsigned char} in
+@code{} header.
+
+For C++ this option enables functions to be overloaded for ordinary and UTF-8
+strings:
+
+@smallexample
+int f(const char *);// #1
+int f(const char8_t *); // #2
+int v1 = f("text"); // Calls #1
+int v2 = f(u8"text");   // Calls #2
+@end smallexample
+
+@noindent
+and introduces new signatures for user-defined literals:
+
+@smallexample
+int operator""_udl1(char8_t);
+int v3 = u8'x'_udl1;
+int operator""_udl2(const char8_t*, std::size_t);
+int v4 = u8"text"_udl2;
+template int operator""_udl3();
+int v5 = u8"text"_udl3;
+@end smallexample
+
+@noindent
+The change to the types of UTF-8 string and character literals
+introduces incompatibilities with ISO C++11 and later standards.  For example,
+the following code is well-formed under ISO C++11, but is ill-formed when
+@option{-fchar8_t} is specified.
+
+@smallexample
+const char *cp = u8"xx";// error: invalid conversion from
+//`const char8_t*' to `const char*'
+int f(const char*);
+int v = f(u8"xx");  // error: invalid conversion from
+//`const char8_t*' to `const char*'
+#ifdef __cplusplus
+std::string s@{u8"xx"@};  // error: no matching function for call to
+//`std::basic_string::basic_string()'
+using namespace std::literals;
+s = u8"xx"s;// error: conversion from
+//`basic_string' to non-scalar
+//type `basic_string' requested
+#endif
+@end smallexample
 @end table
 
 @node C++ Dialect Options
@@ -3157,58 +3213,6 @@ but few users will need to override the
 
 This flag is enabled by default for @option{-std=c++17}.
 
-@opindex fchar8_t
-@opindex fno-char8_t
-@item -fchar8_t
-@itemx -fno-char8_t
-Enable support for @code{char8_t} as adopted for C++20.  This includes
-the addition of a new @code{char8_t} fundamental type, changes to the
-types of UTF-8 string and character literals, new signatures for
-user-defined literals, associated standard library updates, and new
-@code{__cpp_char8_t} and @code{__cpp_lib_char8_t} featur

Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Alejandro Colomar

Hi Aaron,

On Wed, Aug 14, 2024 at 01:59:58PM GMT, Ballman, Aaron wrote:
> > Why would you be looping? lengthof only addresses the outer dimension; 
> > sizeof would need a loop, no?
> 
> Due to poor reading comprehension, I missed in the paper that lengthof
> works on the outer dimension. 😉 I think having a way to get the
> flattened size of a multidimensional array is a useful feature.

As long as you know the type of the inner-most element, you can do it.
This excludes auto, but I think you usually know this.

double x[4][5][6][7];
size_t n = sizeof(x) / sizeof(double);

This hard-codes 'double', but should be good enough usually.
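
Spelled out as a self-contained sketch (the macro `FLAT_NELEMS` and the two demo functions are names invented here, not part of the proposal), next to the outer-dimension count for comparison:

```c
#include <assert.h>
#include <stddef.h>

/* Total number of innermost elements; the caller names the element type.  */
#define FLAT_NELEMS(arr, elem_type) (sizeof(arr) / sizeof(elem_type))

/* Flattened element count of a rank-4 array.  */
size_t flat_count_demo(void)
{
    double x[4][5][6][7];
    (void) x;
    return FLAT_NELEMS(x, double);      /* 4 * 5 * 6 * 7 */
}

/* Outer-dimension count, i.e. what the usual sizeof division gives.  */
size_t outer_count_demo(void)
{
    double x[4][5][6][7];
    (void) x;
    return sizeof(x) / sizeof(x[0]);    /* 4 */
}
```

If the wrong element type is passed, the division silently gives a wrong answer, so this is only as safe as the caller's knowledge of the type.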

Cheers,
Alex

-- 




Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Martin Uecker

On Wednesday, 14.08.2024 at 16:12 +0200, Alejandro Colomar wrote:
> Hi Martin,
> 
> On Wed, Aug 14, 2024 at 03:50:00PM GMT, Martin Uecker wrote:
> > An operator that returns an array with all dimensions of a multi-dimensional
> > array would make a lot of sense to me. 
> > 
> > 
> > double array[4][3][2];
> > 
> > // array_dims(array) = (constexpr size_t[3]){ 4, 3, 2 }
> 
> And what if array[4][n][2]?  No constexpr anymore, which is bad.

> > 
> > int dim1 = (array_dims(array))[0];
> > int dim2 = (array_dims(array))[1];
> > int dim3 = (array_dims(array))[2];
> >  
> > You can then implement lengthof in terms of this operator:
> > 
> > #define lengthof(x) (array_dims(x)[0])
> 
> Not really.  This implementation would result in fewer constant
> expressions than my proposal.  That's detrimental for diagnostics and
> usability.

Yes, this would be a downside when implementing lengthof
in this way.

> 
> And the fundamental operator would be very complex, to allow users
> implementing simpler wrappers.  I think the fundamental operators should
> be as simple as possible, in the spirit of C, and let users build on top
> of those basic tools.
> 
> This reminds me of the 'static' specifier for array parameters, which is
> conflated with two meanings: nonnull and length.  I'd rather have a way
> to specify nullness, and another one to specify length, and let users
> compose them.
> 
> At first glance I oppose this array_dims operator.

Opinionated as usual ;-)

> > and you can obtain the rank by applying lengthof to the array:
> > 
> > #define rank(x) lengthof(array_dims(x))
> 
> I'm curious to see what kind of code would be enabled by a rank()
> operator in C that we can't write at the moment.

There seems to be no generic way to get all dimensions from
a multi-dimensional array of arbitrary rank.
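
To make the limitation concrete, here is a sketch of what can be written today: the extents can be collected with the sizeof division, but only by hard-coding the rank. `NELEMS`, `DIMS3` and `dims3_demo` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Stores the three extents of a rank-3 array into dims[0..2].  A rank-4
   array would need a separate DIMS4, and so on; the rank cannot be
   discovered by the macro itself.  */
#define DIMS3(a, dims)                      \
    do {                                    \
        (dims)[0] = NELEMS(a);              \
        (dims)[1] = NELEMS((a)[0]);         \
        (dims)[2] = NELEMS((a)[0][0]);      \
    } while (0)

/* Packs the extents of double[4][3][2] into one number for checking.  */
size_t dims3_demo(void)
{
    double array[4][3][2];
    size_t dims[3];
    (void) array;
    DIMS3(array, dims);
    return dims[0] * 100 + dims[1] * 10 + dims[2];   /* 4, 3, 2 */
}
```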


Martin

> 
> > If the array is constexpr for regular arrays and array
> > indexing returns a constant again for constexpr arrays, this
> > would all work out.
> > 
> > Martin
> 
> Have a lovely day!
> Alex
> 

-- 
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging

[Patch, rs6000, middle-end] v8: Add implementation for different targets for pair mem fusion

2024-08-14 Thread Ajit Agarwal

Hello Richard:

This patch addresses all the review comments.

Common infrastructure using generic code for pair mem fusion of different
targets.

rs6000 target specific code implements virtual functions defined by generic code.

Target specific code is added in rs6000-mem-fusion.cc.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


rs6000, middle-end: Add implementation for different targets for pair mem fusion

Common infrastructure using generic code for pair mem fusion of different
targets.

rs6000 target specific code implements virtual functions defined by generic code.

Target specific code is added in rs6000-mem-fusion.cc.

2024-08-14  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000-passes.def: New mem fusion pass
before pass_early_remat.
* pair-fusion.h: Add additional pure virtual function
required for rs6000 target implementation.
* pair-fusion.cc: Use of virtual functions for additional
virtual function added for rs6000 target.
* config/rs6000/rs6000-mem-fusion.cc: Add new pass.
Add target specific implementation for generic pure virtual
functions.
* config/rs6000/mma.md: Modify movoo machine description.
Add new machine description movoo1.
* config/rs6000/rs6000.cc: Modify rs6000_split_multireg_move
to expand movoo machine description for all constraints.
* config.gcc: Add new object file.
* config/rs6000/rs6000-protos.h: Add new prototype for mem
fusion pass.
* config/rs6000/t-rs6000: Add new rule.
* rtl-ssa/functions.h: Move out allocate function from private
to public and add get_m_temp_defs function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/mem-fusion.C: New test.
* g++.target/powerpc/mem-fusion-1.C: New test.
* gcc.target/powerpc/mma-builtin-1.c: Modify test.
---
 gcc/config.gcc|   2 +
 gcc/config/rs6000/mma.md  |  26 +-
 gcc/config/rs6000/rs6000-mem-fusion.cc| 677 ++
 gcc/config/rs6000/rs6000-passes.def   |   4 +-
 gcc/config/rs6000/rs6000-protos.h |   1 +
 gcc/config/rs6000/rs6000.cc   |  58 +-
 gcc/config/rs6000/rs6000.md   |   1 +
 gcc/config/rs6000/t-rs6000|   5 +
 gcc/pair-fusion.cc|  33 +-
 gcc/pair-fusion.h |  48 ++
 gcc/rtl-ssa/functions.h   |  16 +-
 .../g++.target/powerpc/mem-fusion-1.C |  22 +
 gcc/testsuite/g++.target/powerpc/mem-fusion.C |  15 +
 .../gcc.target/powerpc/mma-builtin-1.c|   4 +-
 14 files changed, 880 insertions(+), 32 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-mem-fusion.cc
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion.C

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a36dd1bcbc6..e794d6b62b6 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -524,6 +524,7 @@ powerpc*-*-*)
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -560,6 +561,7 @@ rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
;;
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 04e2d0066df..88413926a02 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -294,7 +294,31 @@
 
 (define_insn_and_split "*movoo"
   [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
-   (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
+(match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
+  "TARGET_MMA
+   && (gpc_reg_operand (operands[0], OOmode)
+   || gpc_reg_operand (operands[1], OOmode))"
+;;""
+  "@
+   #
+   #
+   #"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "type" "vecload,vecstore,veclogical")
+   (set_attr "length" "*,*,8")])
+;;   (set_attr "max_prefixed_insns" "2,2,*")])
+
+
+(define_insn_and_split "*movoo1"
+  [(set (matc

Re: [Patch, rs6000, middle-end] v7: Add implementation for different targets for pair mem fusion

2024-08-14 Thread Ajit Agarwal

Hello Richard:

On 12/08/24 5:30 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> [...]
>> +static void
>> +update_change (set_info *set)
>> +{
>> +  if (!set->has_any_uses ())
>> +return;
>> +
>> +  auto *use = *set->all_uses ().begin ();
>> +  do
>> +{
>> +  auto *next_use = use->next_use ();
>> +  if (use->is_in_phi ())
>> +{
>> +  update_change (use->phi ());
>> +}
>> +  else
>> +{
>> +  crtl->ssa->remove_use (use);
> 
> This isn't right.  AFAICT it's simply removing uses from a debug
> instruction even though the debug instruction pattern still refers
> to the register.
> 
> Instead, the debug instruction needs to be updated to refer to the
> correct half of the new double-width load result.  If the update fails
> for any reason, the debug insn should be "reset" (i.e. changed to
> 
>   INSN_VAR_LOCATION_LOC (use_rtl) = gen_rtx_UNKNOWN_VAR_LOC ();
> 
> see insn_combination::substitute_debug_use in late-combine.cc for
> an example).
> 

Addressed in v8 of the patch.

>> +}
>> +  use = next_use;
>> +}
>> +  while (use);
>> +}
>> [...]
>> +// Generate new reg rtx with copy of OLD_DEST for OOmode pair.
>> +static rtx
>> +new_reg_rtx (rtx old_dest)
>> +{
>> +  rtx new_dest_exp = gen_reg_rtx (OOmode);
>> +  ORIGINAL_REGNO (new_dest_exp) = ORIGINAL_REGNO (old_dest);
>> +  REG_USERVAR_P (new_dest_exp) = REG_USERVAR_P (old_dest);
>> +  REG_POINTER (new_dest_exp) = REG_POINTER (old_dest);
>> +  REG_ATTRS (new_dest_exp) = REG_ATTRS (old_dest);
>> +  max_regno = max_reg_num ();
>> +  return new_dest_exp;
>> +}
> 
> The new register is a different size and mode from OLD_DEST, so it isn't
> appropriate to copy across this much information.  The caller should just
> use gen_reg_rtx directly.
>

Addressed in v8 of the patch.
 
>> +
>> +// Set subreg with use of INSN given SRC rtx instruction.
>> +static void
>> +set_load_subreg (insn_info *i1, rtx src)
>> +{
>> +  rtx set = single_set (i1->rtl());
>> +  rtx old_dest = SET_DEST (set);
>> +
>> +  for (auto def : i1->defs ())
>> +{
>> +  auto set = dyn_cast (def);
>> +  for (auto use : set->nondebug_insn_uses ())
>> +{
>> +  insn_info *info = use->insn ();
>> +  if (!info || !info->rtl ())
>> +continue;
>> +
>> +  rtx_insn *rtl_insn = info->rtl ();
>> +  df_ref ref;
>> +
>> +  FOR_EACH_INSN_USE (ref, rtl_insn)
>> +{
>> +  rtx dest_exp = SET_DEST (PATTERN (i1->rtl ()));
>> +  if (REG_P (dest_exp)
>> +  && DF_REF_REGNO (ref) == REGNO (dest_exp))
>> +{
>> +  rtx *loc = DF_REF_LOC (ref);
>> +  insn_propagation prop (rtl_insn, old_dest, src);
>> +  if (GET_CODE (*loc) == SUBREG)
>> +{
>> +  if (!prop.apply_to_pattern (loc))
>> +{
>> +  if (dump_file != NULL)
>> +{
>> +  fprintf (dump_file,
>> +   "Cannot propagate insn \n");
>> +  print_rtl_single (dump_file, rtl_insn);
> 
> We can't simply ignore the failure, since it would leave the instruction
> referring to the wrong register.  We either need to assert that the failure
> can't happen (dangerous) or bail out of the load fusion if propagation fails.
> 

Addressed in v8 of the patch.

>> +}
>> +  return;
>> +}
>> +}
>> +  else
>> +*loc = copy_rtx (src);
>> +}
>> +}
>> +}
> 
> We shouldn't have two ways of doing this.  The above FOR_EACH_INSN_USE
> loop should be replaced by something like:
> 
>   insn_propagation prop (rtl_insn, old_dest, src);
>   if (!prop.apply_to_pattern (loc))
> ...
> 
> so that the substitution is done in one step, and only done using
> insn_propagation.
> 
> Like I mentioned before, this change needs to be described as an
> insn_change as well, so that rtl-ssa framework will see it.  This is
> important because later insn movement decisions require the def-use
> information to be up-to-date.
> 

Addressed in v8 of the patch.

>> +}
>> +}
>> +
>> +// Set subreg for OO mode store pair to generate registers in pairs
>> +// given insn_info I1 and I2.
>> +static void
>> +set_multiword_subreg_store (insn_info *i1, insn_info *i2)
>> +{
>> +  rtx_insn *insn1 = i1->rtl ();
>> +  rtx_insn *insn2 = i2->rtl ();
>> +  rtx body = PATTERN (insn1);
>> +  rtx src_exp = SET_SRC (body);
>> +  rtx insn2_body = PATTERN (insn2);
>> +  rtx insn2_dest_exp = SET_DEST (insn2_body);
>> +  machine_mode mode = GET_MODE (src_exp);
>> +  int regoff;
>> +  rtx src;
>> +  rtx addr = XEXP (insn2_dest_exp, 0);
>> +
>> +  PUT_MODE_RAW (src_exp, OOmode);
>> +  if (GET_CODE (addr) == PLUS
>> +  && XEXP (addr, 1) && CONST_INT_P (XEXP (addr, 1)))
>> +regoff = 16;
>> +  else
>> +regoff = 0;
>> +
>> +  src = simplify_gen_

Re: [PATCH] (Re: Splitting up 27_io/basic_istream/ignore/wchar_t/94749.cc (takes too long))

2024-08-14 Thread Hans-Peter Nilsson

> From: Vaseeharan Vinayagamoorthy 
> Date: Sat, 22 Jun 2024 01:38:09 +

Sorry for the late reply.  I sort of hoped somebody else
would chime in.  Maybe the issue has resolved itself in the
meantime?

> Hi,
> 
> I have noticed that in gcc-13, test05 (in the 94749.cc
> testcase) is still enabled for simulators, and I have
> noticed that because of test05, the
> 27_io/basic_istream/ignore/char/94749.cc execution test is
> not terminating on our simulator for armv8.1-m.main+mve,
> even after 3 hours.
> 
> The execution test was passing before this commit :
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e30211cb0b3a2b88959e9bc40626a17461de52de
> 
> Could you please provide some hints or ideas as to what
> might be causing this regression?

No, sorry.  If I were you and in this situation: IIUC with a
noticeable codegen regression around a specific commit (not
"just" say a percent pushing it over a timeout), I'd analyze
it with regards to the actual code regression around that
commit.  Though, that's just the usual leg-work when this
kind of regression happens: I have no insight specific to
this test.  I see no easy way around that hard work here.

> I imagine that the issue could be with the simulator or
> with code-gen. However, could this also highlight a
> different issue in test05?

It could, but I'm guessing that commit just caused a codegen
regression, perhaps even generating incorrect code to the
effect of an infinite loop.  Or is it somehow that usual ARM
caveat: default unsigned char?
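
For reference, that caveat in isolation (only an illustration of
plain-char signedness, not a diagnosis of test05; the helper names are
made up for this sketch): a value such as EOF stored in a plain char
compares equal to EOF again only where plain char is signed.

```c
#include <limits.h>
#include <stdio.h>

/* Nonzero if plain char is signed on this target (the usual x86
   default); zero if it is unsigned (the usual Arm default).  */
static int plain_char_is_signed(void)
{
  return CHAR_MIN < 0;
}

/* Stores EOF in a plain char and compares it with EOF again.  With an
   unsigned plain char the stored value is positive, so the comparison
   fails and a loop waiting for it would never terminate.  */
static int eof_survives_plain_char(void)
{
  char c = (char) EOF;
  return c == EOF;
}
```

Building the same file with -fsigned-char and -funsigned-char shows
both outcomes on one host.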

> Is this testing a commonly used
> feature or area of the compiler?

All I know is that the intent is specific to functionality
in libstdc++-v3.

> And would it be worth
> re-including it for simulators?

IMHO: only if you somehow make it ARM-specific.  It'd be bad
practice to "re-enable" it for all simulator targets only
because it exposes an uninvestigated issue for one specific
configuration, and a timeout at that.

Alternatively (after analysis), the SOP is to put a derived
minimal testcase in the *generic* parts of the test-suite (C
or C++, as a runtime test) unless the compiled code really
only runs on an ARM, in which case it goes in gcc.target/arm
or g++.target/arm.

HTH.

brgds, H-P


> 
> Kind regards,
> Vasee
> 
> 
> From: Libstdc++  on behalf of 
> Jonathan Wakely via Libstdc++ 
> Sent: 10 June 2023 08:12
> To: Hans-Peter Nilsson
> Cc: Jonathan Wakely; libstdc++; gcc-patches
> Subject: Re: [PATCH] (Re: Splitting up 
> 27_io/basic_istream/ignore/wchar_t/94749.cc (takes too long))
> 
> On Sat, 10 Jun 2023, 06:18 Hans-Peter Nilsson via Libstdc++, <
> libstd...@gcc.gnu.org> wrote:
> 
> > Thank you for your consideration.  (Or is that phrase only used
> > negatively?)
> >
> > > From: Jonathan Wakely 
> > > Date: Fri, 9 Jun 2023 21:40:15 +0100
> >
> > > test01, test02, test03 and test04 should run almost instantly. On my system
> > > they take about 5 microseconds each. So I don't think splitting those up
> > > will help.
> >
> > Right.
> >
> > > I thought it would help to avoid re-allocating the buffer and zeroing it
> > > again. If we reuse the same buffer, then we just have to loop until we
> > > overflow the 32-bit counter. That would make the whole test run much
> > > faster, which would reduce the total time for a testsuite run. Splitting
> > > the file up into smaller files would not decrease the total time, only
> > > decrease the time for that single test so it doesn't time out.
> > >
> > > I've attached a patch that does that. It makes very little difference for
> > > me, probably because allocating zero-filled pages isn't actually expensive
> > > on Linux. Maybe it will make a difference for your simulator though?
> >
> > Nope, just some five seconds down (from about 10min 21s).
> >
> 
> Bah, worth a try :)
> 
> 
> > > You could also try reducing the size of the buffer:
> > > +#ifdef SIMULATOR_TEST
> > > +  static const streamsize bufsz = 16 << limits::digits10;
> > > +#else
> > >   static const streamsize bufsz = 2048 << limits::digits10;
> > > +#endif
> >
> > Was that supposed to be with or without the patch?  Anyway;
> > both: 606s.  Only smaller bufsz: 614s.  (All numbers subject
> > to usual system jitter.)
> >
> > > test06 is the really slow part, that takes 10+ seconds for me. But that
> > > entire function should already be skipped for simulators.
> >
> > Yep, we may have been here before...  I certainly get a
> > deja-vu feeling here, but visiting old email conversations
> > of ours, it seems I easily conflate several similar ones.
> > I see that here, test06 was always #ifndef SIMULATOR_TEST.
> >
> > > We can probably skip test05 for simulators too, none of the code it tests
> > > is platform-specific, so as long as it's being tested on x86 we don't
> > > really need to test it on cris-elf too.
> >
> > Thanks.  Let's do that, then.  The similar s/wchar_t/char/
> > test clocks in at "only" 3m30s, but I suggest treating it
> > the same, if nothi

Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Alejandro Colomar

On Wed, Aug 14, 2024 at 03:50:21PM GMT, Jens Gustedt wrote:
> > > > 
> > > > That said, I suspect WG14 would not be keen on standardizing
> > > > `lengthof` without an ugly keyword given that there are plenty of other 
> > > > uses of it that would break: 
> > > > 
> > > > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> > > > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> > > > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
> > > > (and many, many others)
> > 
> > What regex did you use for searching?
> > 
> > I was thinking of renaming the proposal to elementsof(), to avoid
> > confusion between length of an array and length of a string.  Would you
> > mind checking if elementsof() is ok?
> 
> No, not for me. I really want us to talk consistently about
> array length for this. Consistent terminology is important.

I understand your desire for consistency.  I think your paper is a net
improvement over the status quo (which is a mix of length, size, and
number of elements).  After your proposal, there will be only length and
number of elements.  That's great.

However, strlen(3) came first, and we must respect it.

Since you haven't proposed eliminating "number of elements" from the
standard, and it would still be used alongside length, I think
elementsof() would be consistent with your view (consistent with "number
of elements").

Alternatively, you could use a new term, for example extent, for
referring to the number of elements of an array.  That would be more
respectful to strlen(3), keeping a strong distinction between string
length and array **.

Or how about always referring to it as "number of elements"?  It's
longer to type, but would be the most consistent approach.

Also, elementsof() is free to use, while lengthof() has several
existing incompatible uses (as Aaron has shown), so we can't use that
name so freely.

> > I have concerns about a libc (or a predefined macro) implementation:
> > the sizeof division causes double evaluation with any VLAs, while my
> > implementation for GCC has fewer cases of evaluation, and when it needs
> > to evaluate, it only does it once.  It would be hard to find a good
> > wording that would allow an implementation to implement this as a macro.
> 
> No, we should not allow double evaluation.
> 
> putting this in a `({})`

I would love to see a proposal for adding this GNU extension to ISO C.
Did nobody do it yet?  I could try to, if I find some time.  (But I'll
take a longish time for that; if anyone else does it, it would be
great.)

> and doing a `typedef typeof(X) _my_type;` with the macro parameter `X` at the 
> beginning completely avoids double evaluation. So quality implementations are
> possible, but perhaps differently and with other builtins than we are
> imagining. Don't impose the view of one particular implementation onto others.

Ahhh, good.  I hadn't thought of that possibility.  Sure, that makes
sense now.  It gives more strength to your proposal of allowing libc
implementations, and thus requiring parens in the standard.
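
A minimal GNU C sketch of the difference, under stated assumptions: it
uses GCC's statement expressions together with `__auto_type` rather
than the `typedef typeof(X)` form described above, and the names
NELEMS_DIV, NELEMS_ONCE, next_index, evals_div and evals_once are made
up for the illustration.  The operand's element type is itself a VLA,
which is the case where the plain sizeof division may evaluate its
operand more than once.

```c
#include <stddef.h>

/* Counts how often the index expression below gets evaluated.  */
static int calls;
static int next_index(void) { return calls++; }

/* Classic division: X appears twice in the expansion.  */
#define NELEMS_DIV(X)  (sizeof(X) / sizeof((X)[0]))

/* Hypothetical single-evaluation variant: X appears exactly once;
   everything else goes through the captured pointer.  */
#define NELEMS_ONCE(X)                                  \
  ({                                                    \
    __auto_type nelems_p_ = &(X);                       \
    sizeof(*nelems_p_) / sizeof((*nelems_p_)[0]);       \
  })

/* Both helpers return the number of times next_index() ran and store
   the computed number of elements (n) in *len.  */
static int evals_div(int n, size_t *len)
{
  int m[4][n][n];
  calls = 0;
  *len = NELEMS_DIV(m[next_index()]);
  return calls;
}

static int evals_once(int n, size_t *len)
{
  int m[4][n][n];
  calls = 0;
  *len = NELEMS_ONCE(m[next_index()]);
  return calls;
}
```

evals_once() reports a single evaluation; evals_div() may report more,
since both sizeof operands have VLA type there.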

> Somewhere was brought in an argument with `offsetof`. 
> This is exactly what we need. Implementations being able to start
> with a simple solution (as everybody did in the beginning of
> `offsetof`), and improve that implementation at their pace when they
> are ready for it. 

Agree.

> > > this was basically what we did for `unreachable` and I think it worked
> > > out fine.
> 
> I still think that the different options that we had there can be used
> to ask the right questions for WG14. 

I'm looking at it.  I've already taken some parts of it.  :)

Cheers,
Alex

-- 

signature.asc
Description: PGP signature

[PATCH] c++: c->B::m access resolved through current inst [PR116320]

2024-08-14 Thread Patrick Palka

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and later backports?

-- >8 --

Here when checking the access of (the injected-class-name) B in c->B::m
at parse time, we notice its scope B (now the type) is a base of the
object type C<T>, so we proceed to use C<T> as qualifying type.  But
this C<T> is the dependent specialization not the primary template type,
so it has empty TYPE_BINFO which leads to a segfault later from
perform_or_defer_access_check.

The reason DERIVED_FROM_P / lookup_base returns true despite the object
type having empty TYPE_BINFO is because of its currently_open_class logic
(added in r9-713-gd9338471b91bbe) which replaces a dependent specialization
with the primary template type if we're inside the latter.  So the safest
fix seems to be to use currently_open_class in the caller as well.

PR c++/116320

gcc/cp/ChangeLog:

* semantics.cc (check_accessibility_of_qualified_id): Use
currently_open_class when the object type is derived from the
scope of the declaration being accessed.

gcc/testsuite/ChangeLog:

* g++.dg/template/access42.C: New test.
---
 gcc/cp/semantics.cc  | 11 ---
 gcc/testsuite/g++.dg/template/access42.C | 17 +
 2 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/access42.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index e58612660c9..5ab2076b673 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -2516,9 +2516,14 @@ check_accessibility_of_qualified_id (tree decl,
 OBJECT_TYPE.  */
   && CLASS_TYPE_P (object_type)
   && DERIVED_FROM_P (scope, object_type))
-/* If we are processing a `->' or `.' expression, use the type of the
-   left-hand side.  */
-qualifying_type = object_type;
+{
+  /* If we are processing a `->' or `.' expression, use the type of the
+left-hand side.  */
+  if (tree open = currently_open_class (object_type))
+   qualifying_type = open;
+  else
+   qualifying_type = object_type;
+}
   else if (nested_name_specifier)
 {
   /* If the reference is to a non-static member of the
diff --git a/gcc/testsuite/g++.dg/template/access42.C b/gcc/testsuite/g++.dg/template/access42.C
new file mode 100644
index 000..f1dcbce80c2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access42.C
@@ -0,0 +1,17 @@
+// PR c++/116320
+// { dg-do compile { target c++11 } }
+
+template<class T> struct C;
+template<class T> using C_ptr = C<T>*;
+
+struct B { int m; using B_typedef = B; };
+
+template<class T>
+struct C : B {
+  void f(C_ptr<T> c) {
+    c->B::m;
+    c->B_typedef::m;
+  }
+};
+
+template struct C<int>;
-- 
2.46.0.77.g25673b1c47

RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Ballman, Aaron

> I would love to see a proposal for adding this GNU extension to ISO C.
> Did nobody do it yet?  I could try to, if I find some time.  (But I'll take a
> longish time for that; if anyone else does it, it would be great.)

It's been discussed but hasn't moved forward because there are design issues 
with it (the odd way in which it produces a resulting value, sometimes 
surprising behavior with how it interacts with flow control, the fact that it 
can't be used in all contexts, etc). The committee was leaning more towards 
lambdas despite those being a bit orthogonal.

~Aaron

-Original Message-
From: Alejandro Colomar  
Sent: Wednesday, August 14, 2024 10:48 AM
To: Jens Gustedt 
Cc: Ballman, Aaron ; Xavier Del Campo Romero 
; Gcc Patches ; Daniel Plakosh 
; Martin Uecker ; Joseph Myers 
; Gabriel Ravier ; Jakub Jelinek 
; Kees Cook ; Qing Zhao 
; David Brown ; Florian Weimer 
; Andreas Schwab ; Timm Baeder 
; A. Jiang ; Eugene Zelenko 

Subject: Re: v2.1 Draft for a lengthof paper

On Wed, Aug 14, 2024 at 03:50:21PM GMT, Jens Gustedt wrote:
> > > > 
> > > > That said, I suspect WG14 would not be keen on standardizing 
> > > > `lengthof` without an ugly keyword given that there are plenty of other 
> > > > uses of it that would break:
> > > > 
> > > > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/u
> > > > sr/src/cmd/mailx/names.c?L53-55
> > > > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/
> > > > ipod_fw.c?L292-294
> > > > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-v
> > > > m/-/blob/src/spur64.stack/validImage.c?L7014-7018
> > > > (and many, many others)
> > 
> > What regex did you use for searching?
> > 
> > I was thinking of renaming the proposal to elementsof(), to avoid 
> > confusion between length of an array and length of a string.  Would 
> > you mind checking if elementsof() is ok?
> 
> No, not for me. I really want us to talk consistently about
> array length for this. Consistent terminology is important.

I understand your desire for consistency.  I think your paper is a net 
improvement over the status quo (which is a mix of length, size, and number of 
elements).  After your proposal, there will be only length and number of 
elements.  That's great.

However, strlen(3) came first, and we must respect it.

Since you haven't proposed eliminating "number of elements" from the standard, 
and it would still be used alongside length, I think
elementsof() would be consistent with your view (consistent with "number of 
elements").

Alternatively, you could use a new term, for example extent, for referring to 
the number of elements of an array.  That would be more respectful to 
strlen(3), keeping a strong distinction between string length and array **.

Or how about always referring to it as "number of elements"?  It's longer to 
type, but would be the most consistent approach.

Also, elementsof() is free to use, while lengthof() has several existing
incompatible uses (as Aaron has shown), so we can't use that name so freely.

> > I have concerns about a libc (or a predefined macro) implementation:
> > the sizeof division causes double evaluation with any VLAs, while my 
> > implementation for GCC has fewer cases of evaluation, and when it 
> > needs to evaluate, it only does it once.  It would be hard to find a 
> > good wording that would allow an implementation to implement this as a 
> > macro.
> 
> No, we should not allow double evaluation.
> 
> putting this in a `({})`

I would love to see a proposal for adding this GNU extension to ISO C.
Did nobody do it yet?  I could try to, if I find some time.  (But I'll take a 
longish time for that; if anyone else does it, it would be
great.)

> and doing a `typedef typeof(X) _my_type;` with the macro parameter `X` 
> at the beginning completely avoids double evaluation. So quality 
> implementations are possible, but perhaps differently and with other builtins 
> than we are imagining. Don't impose the view of one particular implementation 
> onto others.

Ahhh, good.  I hadn't thought of that possibility.  Sure, that makes sense
now.  It gives more strength to your proposal of allowing libc implementations,
and thus requiring parens in the standard.

> Somewhere was brought in an argument with `offsetof`. 
> This is exactly what we need. Implementations being able to start with 
> a simple solution (as everybody did in the beginning of `offsetof`), 
> and improve that implementation at their pace when they are ready for 
> it.

Agree.

> > > this was basically what we did for `unreachable` and I think it 
> > > worked out fine.
> 
> I still think that the different options that we had there can be used 
> to ask the right questions for WG14.

I'm looking at it.  I've already taken some parts of it.  :)

Cheers,
Alex

--

[PATCH] libstdc++-v3: Handle iconv as optional for newlib builds [PR116362]

2024-08-14 Thread Hans-Peter Nilsson

Regtested cris-elf, both an older newlib (FWIW: before the
getentropy issue that I hoped to investigate before
summer...maybe next summer) and a fresh checkout, both
with/without --enable-newlib-iconv.  I'm pleasantly
surprised that it works (there are no regressions) with
newlib iconv enabled compared to without: I had to
double-check the different libstdc++-v3/config.log that it
actually *was* enabled.

Ok to commit?

-- >8 --
Support for iconv in newlib seems to have been always
assumed present by libstdc++-v3, but is default off.

Though, it hasn't been used before recent libstdc++ changes
that actually call iconv functions.  This now leads to
failures exposed by running the test-suite, unless the
newlib being used has been explicitly configured with
--enable-newlib-iconv.  When failing, there are undefined
references to iconv, iconv_open or iconv_close for multiple
tests.

Thankfully there's a macro in newlib.h that we can check to
detect presence of iconv support for the newlib build that's
used.
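
A small sketch of what that check amounts to in plain C.  The macro
_ICONV_ENABLED and the header newlib.h are the ones named in this
patch; the `__has_include` guard and the function name are additions
for the illustration, so that the file also compiles where newlib is
not the C library.

```c
/* Pull in newlib's configuration header only where it exists.  */
#if defined(__has_include)
#  if __has_include(<newlib.h>)
#    include <newlib.h>
#  endif
#endif

/* Returns 1 when the newlib in use was configured with
   --enable-newlib-iconv, 0 otherwise (including non-newlib hosts).  */
static int newlib_iconv_enabled(void)
{
#ifdef _ICONV_ENABLED
  return 1;
#else
  return 0;
#endif
}
```

The configure test below does the same thing at configure time by
making compilation fail with #error when the macro is missing.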

libstdc++-v3:
PR libstdc++/116362
* configure.ac: Check newlib configuration whether iconv is enabled.
* configure: Regenerate.
---
 libstdc++-v3/configure| 26 +-
 libstdc++-v3/configure.ac | 10 +-
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index ccb24a82be79..4049f54bd5a3 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -376,7 +376,15 @@ dnl # rather than hardcoding that information.
   frexpl hypotl ldexpl log10l logl modfl powl sinhl sinl sqrtl
   tanhl tanl])
 
-AC_DEFINE(HAVE_ICONV)
+# Support for iconv in newlib is configurable.
+AC_TRY_COMPILE([#include <newlib.h>], [
+  #ifndef _ICONV_ENABLED
+  #error
+  #endif], [ac_newlib_iconv_enabled=yes], [ac_newlib_iconv_enabled=no])
+if test "$ac_newlib_iconv_enabled" = yes; then
+  AC_DEFINE(HAVE_ICONV)
+fi
+
 AC_DEFINE(HAVE_MEMALIGN)
 
 case "${target}" in
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index fe525308ae28..305675eaa1e1 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -28571,7 +28571,31 @@ _ACEOF
 
 
 
-$as_echo "#define HAVE_ICONV 1" >>confdefs.h
+# Support for iconv in newlib is configurable.
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <newlib.h>
+int
+main ()
+{
+
+  #ifndef _ICONV_ENABLED
+  #error
+  #endif
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  ac_newlib_iconv_enabled=yes
+else
+  ac_newlib_iconv_enabled=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+if test "$ac_newlib_iconv_enabled" = yes; then
+  $as_echo "#define HAVE_ICONV 1" >>confdefs.h
+
+fi
 
 $as_echo "#define HAVE_MEMALIGN 1" >>confdefs.h
 
-- 
2.30.2

Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Alejandro Colomar

Hi Aaron,

On Wed, Aug 14, 2024 at 02:07:16PM GMT, Ballman, Aaron wrote:
> > Ahh, context:global seems to be what I wanted.  Where is that documented?
> 
> For me it is the default when I go to https://sourcegraph.com/search but 
> there's documentation at 
> https://sourcegraph.com/docs/code-search/working/search_contexts

Ahh, no, it was a red herring.  I thought that was restricting the search
to global definitions.  There's no way to restrict to definitions,
right?  I'd like a way to discard uses, since those don't give much
info.

But for lengthof() it seems to quickly find incompatible cases, so we
were lucky that we don't need to restrict it.

> 
> > Thanks!  I'll rename it to elementsof().
> 
> Rather than renaming it, I'd say that the name chosen in the proposed
> text is a placeholder, and have a section in the prose that describes
> different naming choices, pros and cons, suggests a name from you as
> the author, but asks WG14 to pick the final name.
> I know Jens mentioned he doesn’t like the name `elementsof` and I
> suspect if we ask five more people we'll get about seven more opinions
> on what the name could/should be. 😝

Yup, but I want to have a placeholder that would be a name that I would
like, and a defensible one.  :-)

I'll add questions at the bottom, proposing alternatives.

Cheers,
Alex

-- 

signature.asc
Description: PGP signature

Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Martin Uecker

On Wednesday, 14.08.2024 at 14:52 +, Ballman, Aaron wrote:
> > I would love to see a proposal for adding this GNU extension to ISO C.
> > Did nobody do it yet?  I could try to, if I find some time.  (But I'll take
> > a longish time for that; if anyone else does it, it would be great.)
> 
> It's been discussed but hasn't moved forward because there are design issues
> with it (the odd way in which it produces a resulting value, sometimes
> surprising behavior with how it interacts with flow control, the fact that it
> can't be used in all contexts, etc). The committee was leaning more towards
> lambdas despite those being a bit orthogonal.

I do not think this is a fair characterization. We did not see any proposal
for ({ }) so it is not clear which way the committee is leaning.

Lambdas ultimately failed because they were too complex for a feature
without any implementation or user experience in C.

I agree though that lambdas could be nicer, but I still have issues
with the last type-generic version and I do not have similar objections
against ({ }).

Martin

[PATCH] match: Fix A || B not optimized to true when !B implies A [PR114326]

2024-08-14 Thread Konstantinos Eleftheriou

From: kelefth 

In expressions like (a != b || ((a ^ b) & CST0) == CST1) and
(a != b || (a ^ b) == CST), (a ^ b) is folded to 0, because the
right-hand side is only reached when a == b.
In the equivalent expressions (((a ^ b) & CST0) == CST1 || a != b) and
((a ^ b) == CST || a != b) this is not happening.

This patch adds the following simplifications in match.pd:
((a ^ b) & CST0) == CST1 || a != b --> 0 == CST1 || a != b
(a ^ b) == CST || a != b --> 0 == CST || a != b
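
A brute-force sanity check of the two rewrites, as a sketch only: it
reads the first one as (0 == CST1) || a != b, which is what the pattern
builds, uses == as the comparison, and covers only a small range of
values.  All function names are made up for the check.

```c
#include <stdbool.h>

/* Before/after forms of the first simplification.  */
static bool and_before(unsigned a, unsigned b, unsigned cst0, unsigned cst1)
{
  return ((a ^ b) & cst0) == cst1 || a != b;
}

static bool and_after(unsigned a, unsigned b, unsigned cst1)
{
  return 0 == cst1 || a != b;
}

/* Before/after forms of the second simplification.  */
static bool xor_before(unsigned a, unsigned b, unsigned cst)
{
  return (a ^ b) == cst || a != b;
}

static bool xor_after(unsigned a, unsigned b, unsigned cst)
{
  return 0 == cst || a != b;
}

/* Counts inputs in [0, 16) for which a rewrite changes the result.  */
static int count_mismatches(void)
{
  int mismatches = 0;
  for (unsigned a = 0; a < 16; a++)
    for (unsigned b = 0; b < 16; b++)
      for (unsigned c0 = 0; c0 < 16; c0++)
        for (unsigned c1 = 0; c1 < 16; c1++)
          {
            if (and_before (a, b, c0, c1) != and_after (a, b, c1))
              mismatches++;
            if (xor_before (a, b, c1) != xor_after (a, b, c1))
              mismatches++;
          }
  return mismatches;
}
```

When a != b both sides are true, and when a == b the xor is 0, so the
two forms agree on every input.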

PR tree-optimization/114326

gcc/ChangeLog:

* match.pd: Add two patterns to fold a ^ b to 0, when a == b.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/fold-xor-and-or-1.c: New test.
* gcc.dg/tree-ssa/fold-xor-and-or-2.c: New test.
* gcc.dg/tree-ssa/fold-xor-or-1.c: New test.
* gcc.dg/tree-ssa/fold-xor-or-2.c: New test.

Reviewed-by: Christoph Müllner 
Signed-off-by: Philipp Tomsich 
Signed-off-by: Konstantinos Eleftheriou 
---
 gcc/match.pd  | 30 +++
 .../gcc.dg/tree-ssa/fold-xor-and-or-1.c   | 17 +++
 .../gcc.dg/tree-ssa/fold-xor-and-or-2.c   | 19 
 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-1.c | 17 +++
 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-2.c | 19 
 5 files changed, 102 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index c9c8478d286..1c55bd72f09 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -10680,3 +10680,33 @@ and,
   }
   (if (full_perm_p)
(vec_perm (op@3 @0 @1) @3 @2))
+
+/* ((a ^ b) & CST0) == CST1 || a != b --> 0 == CST1 || a != b. */
+(for cmp (simple_comparison)
+  (simplify
+(bit_ior
+  (cmp
+   (bit_and
+ (bit_xor @0 @1)
+ INTEGER_CST)
+   @3)
+(ne@4 @0 @1))
+  (bit_ior
+(cmp
+  { build_zero_cst (TREE_TYPE (@0)); }
+  @3)
+@4)))
+
+/* (a ^ b) == CST || a != b --> 0 == CST || (a != b). */
+(for cmp (simple_comparison)
+  (simplify
+(bit_ior
+  (cmp
+   (bit_xor @0 @1)
+   @2)
+  (ne@3 @0 @1))
+(bit_ior
+  (cmp
+   {build_zero_cst (TREE_TYPE (@0)); }
+   @2)
+  @3)))
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-1.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-1.c
new file mode 100644
index 000..2bcf98d93c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int cmp1(int d1, int d2) {
+  if (((d1 ^ d2) & 0xabcd) == 0 || d1 != d2)
+return 0;
+  return 1;
+}
+
+int cmp2(int d1, int d2) {
+  if (d1 != d2 || ((d1 ^ d2) & 0xabcd) == 0)
+return 0;
+  return 1;
+}
+
+/* The if should be removed, so the condition should not exist */
+/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c
new file mode 100644
index 000..8771897181a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+typedef unsigned long int uint64_t;
+
+int cmp1(uint64_t d1, uint64_t d2) {
+  if (((d1 ^ d2) & 0xabcd) == 0 || d1 != d2)
+return 0;
+  return 1;
+}
+
+int cmp2(uint64_t d1, uint64_t d2) {
+  if (d1 != d2 || ((d1 ^ d2) & 0xabcd) == 0)
+return 0;
+  return 1;
+}
+
+/* The if should be removed, so the condition should not exist */
+/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-1.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-1.c
new file mode 100644
index 000..eb08e086014
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int cmp1(int d1, int d2) {
+  if ((d1 ^ d2) == 0xabcd || d1 != d2)
+return 0;
+  return 1;
+}
+
+int cmp2(int d1, int d2) {
+  if (d1 != d2 || (d1 ^ d2) == 0xabcd)
+return 0;
+  return 1;
+}
+
+/* The if should be removed, so the condition should not exist */
+/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-2.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-2.c
new file mode 100644
index 000..b621651e636
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+typedef unsigned long int uint64_t;
+
+int cmp1(uint64_t d1, uint64_t d2) {
+  if ((d1 ^ d2) == 0xabcd || d1 != d2)

Re: [PATCH] c: Enable -f{,no-}char8_t option for C/ObjC

2024-08-14 Thread Joseph Myers

On Wed, 14 Aug 2024, Jakub Jelinek wrote:

> The following patch enables the option also for C/ObjC.

I still disapprove of creating the combinatorial complexity of language 
variants with individual small features like this enabled / disabled, as I 
said in  
and .

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH] c: Enable -f{,no-}char8_t option for C/ObjC

2024-08-14 Thread Jakub Jelinek

On Wed, Aug 14, 2024 at 03:29:24PM +, Joseph Myers wrote:
> On Wed, 14 Aug 2024, Jakub Jelinek wrote:
> 
> > The following patch enables the option also for C/ObjC.
> 
> I still disapprove of creating the combinatorial complexity of language 
> variants with individual small features like this enabled / disabled, as I 
> said in  
> and .

Ok, patch withdrawn.

Jakub

Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Jens Gustedt

On 14 August 2024 at 16:47:32 CEST, Alejandro Colomar wrote:
> On Wed, Aug 14, 2024 at 03:50:21PM GMT, Jens Gustedt wrote:
> > > > > 
> > > > > That said, I suspect WG14 would not be keen on standardizing
> > > > > `lengthof` without an ugly keyword given that there are plenty of 
> > > > > other uses of it that would break: 
> > > > > 
> > > > > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
> > > > > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
> > > > > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
> > > > > (and many, many others)
> > > 
> > > What regex did you use for searching?
> > > 
> > > I was thinking of renaming the proposal to elementsof(), to avoid
> > > confusion between length of an array and length of a string.  Would you
> > > mind checking if elementsof() is ok?
> > 
> > No, not for me. I really want us to talk consistently about
> > array length for this. Consistent terminology is important.
> 
> I understand your desire for consistency.  I think your paper is a net
> improvement over the status quo (which is a mix of length, size, and
> number of elements).  After your proposal, there will be only length and
> number of elements.  That's great.
> 
> However, strlen(3) came first, and we must respect it.

Sure, string length, a dynamic feature, and array length are two different
features.

But we also have VLA and not VNEA in the standard, so we should respect this ;-)

> Since you haven't proposed eliminating "number of elements" from the
> standard, and it would still be used alongside length, I think
> elementsof() would be consistent with your view (consistent with "number
> of elements").

Didn't we?  Then it is actually a good idea to do so; thanks for the idea!

"elements of" is a stretch, linguistically, because you don't mean the
elements themselves, you are referring to their number. "elementsof" for
me would refer to a list of these elements.

> Alternatively, you could use a new term, for example extent, for
> referring to the number of elements of an array.  That would be more
> respectful to strlen(3), keeping a strong distinction between string
> length and array **.

Except that this separation doesn't exist, even now; as said, it is called
"variable length array".

> Or how about always referring to it as "number of elements"?  It's
> longer to type, but would be the most consistent approach.
> 
> Also, elementsof() is free to use, while lengthof() has several
> existing incompatible uses (as Aaron has shown), so we can't use that
> name so freely.
> 
> > > I have concerns about a libc (or a predefined macro) implementation:
> > > the sizeof division causes double evaluation with any VLAs, while my
> > > implementation for GCC has fewer cases of evaluation, and when it needs
> > > to evaluate, it only does it once.  It would be hard to find a good
> > > wording that would allow an implementation to implement this as a macro.
> > 
> > No, we should not allow double evaluation.
> >  
> > putting this in a `({})`
> 
> I would love to see a proposal for adding this GNU extension to ISO C.
> Did nobody do it yet?  I could try to, if I find some time.  (But I'll
> take a longish time for that; if anyone else does it, it would be
> great.)
> 
> > and doing a `typedef typeof(X) _my_type;` with the macro parameter `X` at
> > the beginning completely avoids double evaluation. So quality
> > implementations are possible, but perhaps differently and with other
> > builtins than we are imagining. Don't impose the view of one particular
> > implementation onto others.
> 
> Ahhh, good.  I hadn't thought of that possibility.  Sure, that makes
> sense now.  It gives more strength to your proposal of allowing libc
> implementations, and thus requiring parens in the standard.
> 
> > Somewhere was brought in an argument with `offsetof`. 
> > This is exactly what we need. Implementations being able to start
> > with a simple solution (as everybody did in the beginning of
> > `offsetof`), and improve that implementation at their pace when they
> > are ready for it. 
> 
> Agree.
> 
> > > > this was basically what we did for `unreachable` and I think it worked
> > > > out fine.
> > 
> > I still think that the different options that we had there can be used
> > to ask the right questions for WG14. 
> 
> I'm looking at it.  I've already taken some parts of it.  :)
> 
> Cheers,
> Alex
> 


-- 
Jens Gustedt - INRIA & ICube, Strasbourg, France
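
[Editor's sketch] The macro approach discussed above (a `({})` statement
expression that starts with `typedef typeof(X) ...`) could look roughly like
the following.  This is only an illustration under stated assumptions: it
relies on the GNU C statement-expression and __typeof__ extensions, and the
names ELEMENTSOF_ONCE, counted(), fixed_demo() and vla_demo() are
hypothetical.  It is not the proposed operator and not GCC's builtin
implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts how often an array bound is evaluated.  */
static int eval_count;

static int
counted (int n)
{
  assert (n > 0);
  ++eval_count;
  return n;
}

/* Hypothetical macro (GNU C extensions).  X is named exactly once, in the
   typedef, so a variably modified operand is evaluated only there.  Both
   sizeof operands below refer to the typedef, not to X.  */
#define ELEMENTSOF_ONCE(X)                                   \
  ({                                                         \
    typedef __typeof__ (X) elementsof_once_t_;               \
    sizeof (elementsof_once_t_)                              \
      / sizeof ((*(elementsof_once_t_ *) 0)[0]);             \
  })

/* Fixed-size array: nothing is evaluated at run time.  */
static size_t
fixed_demo (void)
{
  int a[7];
  return ELEMENTSOF_ONCE (a);
}

/* VLA type whose bound has a side effect (the call to counted).  */
static size_t
vla_demo (int n)
{
  return ELEMENTSOF_ONCE (int[counted (n)]);
}
```

One limitation of this sketch: if the element type is itself variably
modified (e.g. a two-dimensional VLA), the second sizeof operand would be
evaluated, so a production-quality macro would need more care there.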

[PATCH] c++: Implement for static locals CWG 2867 - Order of initialization for structured bindings [PR115769]

2024-08-14 Thread Jakub Jelinek

On Wed, Aug 14, 2024 at 10:06:24AM +0200, Jakub Jelinek wrote:
> Though, now that I think about it again, perhaps what we could do instead
> is just make sure the _ZGVZ3barvEDC1x1y1z1wE initialization doesn't have
> a CLEANUP_POINT_EXPR in it and wrap both the _ZGVZ3barvEDC1x1y1z1wE
> and cp_finish_decomp created stuff into a single CLEANUP_POINT_EXPR.
> That way, perhaps _ZGVZ3barvEDC1x1y1z1wE could be initialized by one thread
> and _ZGVZ3barvE1x by a different, but the temporaries from
> _ZGVZ3barvEDC1x1y1z1wE initialization would be only destructed after the
> _ZGVZ3barvE1w guard was released by the thread which initialized
> _ZGVZ3barvEDC1x1y1z1wE.

Here is what I believe is the ABI-compatible version, which uses the separate
guard variables, so different structured binding variables can be
initialized in different threads, but the thread that did the artificial
base initialization will keep temporaries live at least until the last
guard variable is released (i.e. when even that variable has been
initialized).

Bootstrapped/regtested on x86_64-linux and i686-linux on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660354.html
patch, ok for trunk?

As for namespace scope structured bindings and this DR, all of
set_up_extended_ref_temp, cp_finish_decl -> expand_static_init and
cp_finish_decl -> cp_finish_decomp -> cp_finish_decl -> expand_static_init
in that case just push some decls into the static_aggregates or
tls_aggregates chains.
So, we can end up e.g. with the most important decl for an extended ref
temporary (which initializes some temporaries), then perhaps some more
of those, then DECL_DECOMPOSITION_P base, then n times optionally some further
extended refs and DECL_DECOMPOSITION_P non-base and I think we need
to one_static_initialization_or_destruction all of them together, by
omitting CLEANUP_POINT_EXPR from the very first one (or all until the
DECL_DECOMPOSITION_P base?), say through temporarily clearing
stmts_are_full_exprs_p and then wrapping whatever
one_static_initialization_or_destruction produces for all of those into
a single CLEANUP_POINT_EXPR argument.
Perhaps remember static_aggregates or tls_aggregates early before any
check_initializer etc. calls and then after cp_finish_decomp cut those
TREE_LIST nodes and pass them as a separate TREE_VALUE in the list.
Though, not sure what to do about modules.cc uses of these, it needs
to save/restore that stuff somehow too.

2024-08-14  Jakub Jelinek  

PR c++/115769
* decl.cc: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decl): If need_decomp_init, for function scope structure
binding bases, temporarily clear stmts_are_full_exprs_p before
calling expand_static_init, after it call cp_finish_decomp and wrap
code emitted by both into maybe_cleanup_point_expr_void and ensure
cp_finish_decomp isn't called again.

* g++.dg/DRs/dr2867-3.C: New test.
* g++.dg/DRs/dr2867-4.C: New test.

--- gcc/cp/decl.cc.jj   2024-08-13 19:18:42.170052535 +0200
+++ gcc/cp/decl.cc  2024-08-14 10:15:18.021182513 +0200
@@ -9121,7 +9121,24 @@ cp_finish_decl (tree decl, tree init, bo
 initializer.  It is not legal to redeclare a static data
 member, so this issue does not arise in that case.  */
   else if (var_definition_p && TREE_STATIC (decl))
-   expand_static_init (decl, init);
+   {
+ if (need_decomp_init && DECL_FUNCTION_SCOPE_P (decl))
+   {
+ tree sl = push_stmt_list ();
+ auto saved_stmts_are_full_exprs_p = stmts_are_full_exprs_p ();
+ current_stmt_tree ()->stmts_are_full_exprs_p = 0;
+ expand_static_init (decl, init);
+ current_stmt_tree ()->stmts_are_full_exprs_p
+   = saved_stmts_are_full_exprs_p;
+ cp_finish_decomp (decl, decomp);
+ sl = pop_stmt_list (sl);
+ sl = maybe_cleanup_point_expr_void (sl);
+ add_stmt (sl);
+ need_decomp_init = false;
+   }
+ else
+   expand_static_init (decl, init);
+   }
 }

   /* If a CLEANUP_STMT was created to destroy a temporary bound to a
--- gcc/testsuite/g++.dg/DRs/dr2867-3.C.jj  2024-08-13 21:05:42.876446125 +0200
+++ gcc/testsuite/g++.dg/DRs/dr2867-3.C 2024-08-13 21:05:42.876446125 +0200
@@ -0,0 +1,159 @@
+// CWG2867 - Order of initialization for structured bindings.
+// { dg-do run { target c++11 } }
+// { dg-options "" }
+
+#define assert(X) do { if (!(X)) __builtin_abort(); } while (0)
+
+namespace std {
+  template <typename T> struct tuple_size;
+  template <int, typename> struct tuple_element;
+}
+
+int a, c, d, i;
+
+struct A {
+  A () { assert (c == 3); ++c; }
+  ~A () { ++a; }
+  template <int I> int &get () const { assert (c == 5 + I); ++c; return i; }
+};
+
+template <> struct std::tuple_size <A> { static const int value = 4; };
+template <int I> struct std::tuple_element <I,A> { using type = int; };
+template <>

Re: [PATCH] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

2024-08-14 Thread Richard Sandiford

Pengxuan Zheng  writes:
> SVE's INDEX instruction can be used to populate vectors by values starting
> from "base" and incremented by "step" for each subsequent value. We can take
> advantage of it to generate vector constants if TARGET_SVE is available and
> the base and step values are within [-16, 15].
>
> For example, with the following function:
>
> typedef int v4si __attribute__ ((vector_size (16)));
> v4si
> f_v4si (void)
> {
>   return (v4si){ 0, 1, 2, 3 };
> }
>
> GCC currently generates:
>
> f_v4si:
>   adrp    x0, .LC4
>   ldr q0, [x0, #:lo12:.LC4]
>   ret
>
> .LC4:
>   .word   0
>   .word   1
>   .word   2
>   .word   3
>
> With this patch, we generate an INDEX instruction instead if TARGET_SVE is
> available.
>
> f_v4si:
>   index   z0.s, #0, #1
>   ret
>
> [...]
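
[Editor's sketch] The condition described in the patch summary above (a
constant vector that is a linear series whose base and step both fit in
[-16, 15]) can be modelled in plain C as below.  This is a simplified
stand-in, not the GCC code: the patch itself relies on const_vec_series_p
and aarch64_sve_index_immediate_p, whereas index_series_p here is a
hypothetical helper working on an ordinary array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if ELTS[0..N-1] equals
   base, base + step, base + 2*step, ... with both base and step
   inside the immediate range [-16, 15].  On success, store the
   base and step through BASE and STEP.  */
static bool
index_series_p (const long *elts, size_t n, long *base, long *step)
{
  assert (elts != NULL && base != NULL && step != NULL);

  if (n < 2)
    return false;

  long b = elts[0];
  long s = elts[1] - elts[0];

  /* Every pair of neighbours must differ by the same step.  */
  for (size_t i = 2; i < n; i++)
    if (elts[i] - elts[i - 1] != s)
      return false;

  /* Both immediates must fit the range quoted in the patch summary.  */
  if (b < -16 || b > 15 || s < -16 || s > 15)
    return false;

  *base = b;
  *step = s;
  return true;
}
```

With this model, { 0, 1, 2, 3 } is accepted with base 0 and step 1, and
{ 3, -1, -5, -9 } with base 3 and step -4, matching the two test vectors
quoted in this thread; { 0, 16, 32, 48 } is rejected because the step is
out of range.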
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 9e12bd9711c..01bfb8c52e4 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -22960,8 +22960,7 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
>if (CONST_VECTOR_P (op)
>&& CONST_VECTOR_DUPLICATE_P (op))
>  n_elts = CONST_VECTOR_NPATTERNS (op);
> -  else if ((vec_flags & VEC_SVE_DATA)
> -&& const_vec_series_p (op, &base, &step))
> +  else if (TARGET_SVE && const_vec_series_p (op, &base, &step))

I think we need to check which == AARCH64_CHECK_MOV too.  (Previously that
wasn't necessary, because native SVE only uses this routine for moves.)

FTR: I was initially a bit nervous about testing TARGET_SVE without looking
at vec_flags at all.  But looking at the previous handling of predicates
and structures, I agree it looks like the correct thing to do.

>  {
>gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
>if (!aarch64_sve_index_immediate_p (base)
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> index 216699b0536..3d6a0160f95 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> @@ -10,7 +10,6 @@ dupq (int x)
>return svdupq_s32 (x, 1, 2, 3);
>  }
>  
> -/* { dg-final { scan-assembler {\tldr\tq[0-9]+,} } } */
> +/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #1, #2} } } */
>  /* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
>  /* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
> -/* { dg-final { scan-assembler {\t\.word\t1\n\t\.word\t2\n\t\.word\t3\n} } } */

This seems to be a regression of sorts.  Previously we had:

adrp    x1, .LC0
ldr q0, [x1, #:lo12:.LC0]
ins v0.s[0], w0
dup z0.q, z0.q[0]

whereas now we have:

movi    v0.2s, 0x2
index   z31.s, #1, #2
ins     v0.s[0], w0
zip1    v0.4s, v0.4s, v31.4s
dup z0.q, z0.q[0]

I think we should try to aim for:

index   z0.s, #0, #1
ins v0.s[0], w0
dup z0.q, z0.q[0]

instead.

> [...]
> +/*
> +** g_v4si:
> +**   index   z0\.s, #3, #\-4

The backslash looks redundant here.

Thanks,
Richard

> +**   ret
> +*/
> +v4si
> +g_v4si (void)
> +{
> +  return (v4si){ 3, -1, -5, -9 };
> +}

Re: [PATCH] Re-add calling emit_clobber in lower-subreg.cc's resolve_simple_move.

2024-08-14 Thread Xianmiao Qu

On Tue, Aug 13, 2024 at 09:13:37AM +0100, Roger Sayle wrote:
> 
> Hi Xianmiao,
> I have no objection to reverting that original patch, if it was indeed made
> obsolete by later changes to the i386 backend.
>
> The theory at the time was that it was possible for backends to define mov
> instructions that emitted clobbers if necessary, but it's very difficult for
> a backend or any of the RTL middle-end passes to eliminate/remove these
> clobbers (that interfere with some passes).  In the x86_64 case, the high and
> low parts were already in the correct registers, but the clobber caused
> reload/register allocation to copy them somewhere else, then copy them back
> again after the clobber.

I agree that we should try to eliminate clobber to reduce the generation of
redundant instructions; I just think that removing clobber in lower-subreg
is not well-suited for the current GCC framework.

As I described in the commit message, the absence of clobber could
potentially lead to the register's lifetime occupying the entire function,
according to the algorithm of the 'df_lr_bb_local_compute' function.

I tried to create a use case on X86 that could trigger this side effect,
but I wasn't able to construct one. However, my analysis revealed that
there is a coincidental aspect that causes X86 to avoid this situation.
Let me first illustrate the coincidental point I discovered on X86,
which will help us better understand the issue. Afterwards, I'll provide
examples in RISC-V to demonstrate scenarios that could lead to side effects.

For the test case:
  extern __int128 foo1(__int128, __int128);
  __int128 foo(__int128 x, __int128 y)
  {
if (x > y)
  return foo1 (x+y, x);
else
  return foo1 (y, y -x);
  }

As the CLOBBER expression has been thoroughly eliminated by your later patch,
I have reverted the commit to
  https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d8a6945c6ea22efa4d5e42fe1922d2b27953c8cd
to observe the phenomenon.
The X86 architecture will expand the multi-register mode move operation into
subreg assignments during the 'expand' phase (which may differ from other
architectures, particularly from certain implementations in RISC-V), as shown
below.
  ;; basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe hot
  ;;  prev block 0, next block 4, flags: (NEW, REACHABLE, RTL, VISITED)
  ;;  pred:   ENTRY [always]  count:1073741824 (estimated locally) (FALLTHRU)
  (note 9 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
  (insn 2 9 3 2 (set (reg:DI 87)
  (reg:DI 5 di [ x ])) "test.c":4:1 -1
   (nil))
  (insn 3 2 4 2 (set (reg:DI 88)
  (reg:DI 4 si [ x+8 ])) "test.c":4:1 -1
   (nil))
  (insn 4 3 5 2 (set (reg:TI 86)
  (subreg:TI (reg:DI 87) 0)) "test.c":4:1 -1
   (nil))
  (insn 5 4 6 2 (set (subreg:DI (reg:TI 86) 8)
  (reg:DI 88)) "test.c":4:1 -1
   (nil))
  (insn 6 5 7 2 (set (reg/v:TI 85 [ x ])
  (reg:TI 86)) "test.c":4:1 -1
   (nil))

For register 86, there is only an assignment via subreg, without any clobber
or an assignment of the entire register. According to the algorithm of
'df_lr_bb_local_compute', even if register 86 has had its value completely
updated through subreg assignments, it is still not removed by the formula
  IN = (OUT & ~DEF) | USE
It is still considered part of the LR IN set, and its lifetime spans the
entire function.
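
[Editor's sketch] The effect described above can be reproduced with a small
bitmask model of the formula IN = (OUT & ~DEF) | USE.  This is a simplified
illustration, not the code of df_lr_bb_local_compute: it assumes that a
partial (subreg) write is treated as a read of the old value and is never
added to DEF, while a full definition such as a clobber both sets DEF and
cancels later uses.  All type and function names are hypothetical, and
register numbers are limited to 0..63 so that they fit in one word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ref_kind { REF_USE, REF_FULL_DEF, REF_PARTIAL_DEF };

/* One register reference per "insn" keeps the model small.  */
struct ref
{
  enum ref_kind kind;
  unsigned regno;
};

/* Compute the live-in set of a block from its live-out set by walking
   the references backwards and building the local DEF and USE sets.  */
static uint64_t
block_live_in (const struct ref *insns, size_t n, uint64_t live_out)
{
  uint64_t def = 0, use = 0;

  for (size_t i = n; i-- > 0;)
    {
      assert (insns[i].regno < 64);
      uint64_t bit = UINT64_C (1) << insns[i].regno;

      switch (insns[i].kind)
        {
        case REF_FULL_DEF:
          /* A whole-register write (or clobber) kills the register:
             it enters DEF and hides any use that follows it.  */
          def |= bit;
          use &= ~bit;
          break;
        case REF_PARTIAL_DEF:
          /* Assumption of this model: a partial write keeps part of the
             old value, so it counts as a use and does not enter DEF.  */
          use |= bit;
          break;
        case REF_USE:
          use |= bit;
          break;
        }
    }

  return (live_out & ~def) | use;
}
```

In this model, two partial writes followed by a use leave the register in
the live-in set, whereas the same sequence preceded by a full definition
(standing in for the clobber) does not.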
Below is the LR information and instructions for the same block after the
'reginfo' pass (since register 86 has been eliminated, we can focus on
register 85).
  ;; lr  in    1 [dx] 2 [cx] 4 [si] 5 [di] 6 [bp] 7 [sp] 16 [argp] 19 [frame] 85
  ;; lr  use   1 [dx] 2 [cx] 4 [si] 5 [di] 6 [bp] 7 [sp] 16 [argp] 19 [frame] 85
  ;; lr  def   17 [flags] 85 87 88 89
  ;; live  in  1 [dx] 2 [cx] 4 [si] 5 [di] 6 [bp] 7 [sp] 16 [argp] 19 [frame]
  ;; live  gen 17 [flags] 85 87 88 89
  ;; live  kill
  (note 9 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
  (insn 2 9 3 2 (set (reg:DI 87 [ x ])
  (reg:DI 5 di [ x ])) "test.c":4:1 88 {*movdi_internal}
   (expr_list:REG_DEAD (reg:DI 5 di [ x ])
  (nil)))
  (insn 3 2 48 2 (set (reg:DI 88 [ x+8 ])
  (reg:DI 4 si [ x+8 ])) "test.c":4:1 88 {*movdi_internal}
   (expr_list:REG_DEAD (reg:DI 4 si [ x+8 ])
  (nil)))
  (insn 48 3 49 2 (set (subreg:DI (reg/v:TI 85 [ x ]) 0)
  (reg:DI 87 [ x ])) "test.c":4:1 88 {*movdi_internal}
   (expr_list:REG_DEAD (reg:DI 101 [ x ])
  (nil)))
  (insn 49 48 7 2 (set (subreg:DI (reg/v:TI 85 [ x ]) 8)
  (reg:DI 88 [ x+8 ])) "test.c":4:1 88 {*movdi_internal}
   (expr_list:REG_DEAD (reg:DI 102 [+8 ])
  (nil)))
  (insn 7 49 8 2 (set (reg/v:TI 89 [ y ])
  (reg:TI 1 dx [ y ])) "test.c":4:1 87 {*movti_internal}
   (expr_list:REG_DEAD (reg:TI 1 dx [ y ])
  (nil)))
  (note 8 7 11 2 NOTE_INSN_FUNCTION_BEG)
  (insn 11 8 12 2 (set (reg:CC 17 flags)
  (compare:CC (subreg:DI (reg/v:TI 89 [ y ]) 0)
  (subreg:DI (reg/v:TI 85 [

[PATCH] libstdc++-v3: testsuite: Prune uncapitalized "in function" linker warning

2024-08-14 Thread Hans-Peter Nilsson

(CC to the dejagnu project as a heads-up)

Regtested cris-elf with a fresh newlib checkout where 2640
libstdc++-v3 tests otherwise fail due to the stubbed newlib
_getentropy.  Ok to commit?

-- >8 --
Newer newlib triggers warnings about certain functions not being implemented
(_getentropy) when testing libstdc++-v3.

Since 2018 (circa binutils-2.31) the "in function" prefix isn't
capitalized for those "not implemented" warnings when generated from
the linker (a GNU ld feature used by newlib).  Dejagnu up to and
including at least dejagnu-1.6.3 (and git @ 42979bd3b9) assumes a
capital "In function", leaving that part unpruned, and boom we have
thousands of "excess errors" from the libstdc++-v3 testsuite.

While gcc/testsuite/lib/prune.exp:prune_gcc_output already deals with
this quirk with a vastly more generic pattern, I choose this simpler
tweak.

libstdc++-v3:
* testsuite/lib/prune.exp (libstdc++-dg-prune): Prune
uncapitalized "in function" warning from linker.
---
 libstdc++-v3/testsuite/lib/prune.exp | 9 +
 1 file changed, 9 insertions(+)

diff --git a/libstdc++-v3/testsuite/lib/prune.exp b/libstdc++-v3/testsuite/lib/prune.exp
index 071dcf34c1e8..4250e2d39e7d 100644
--- a/libstdc++-v3/testsuite/lib/prune.exp
+++ b/libstdc++-v3/testsuite/lib/prune.exp
@@ -80,6 +80,15 @@ proc libstdc++-dg-prune { system text } {
 # Ignore dsymutil warning (tool bug is actually in the linker)
 regsub -all "(^|\n)\[^\n\]*could not find object file symbol for symbol\[^\n\]*" $text "" text
 
+# This pattern, except requiring a capitalized "In" and with a
+# sub-pattern matching a subsequent line "is not implemented and will
+# always fail", is part of the standard dejagnu prune_warnings function.
+# There's also a separate single-line pattern pruning the "is not
+# implemented and will always fail".  Since that pattern is processed
+# before this ${tool}-dg-prune function is called, we have to handle
+# the single uncapitalized "in function" line.
+regsub -all "(^|\n)\[^\n\]*: in function\[^\n\]*" $text "" text
+
 # If exceptions are disabled, mark tests expecting exceptions to be enabled
 # as unsupported.
 if { ![check_effective_target_exceptions_enabled] } {
-- 
2.30.2
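
[Editor's sketch] For readers less familiar with Tcl, the effect of the new
regsub (dropping every output line that contains the uncapitalized
": in function") can be approximated in C as below.  This is a line-based
approximation written for illustration only: the function name is
hypothetical, and unlike the Tcl pattern, which removes the newline in front
of a matching line, this version removes the newline that ends it.

```c
#include <assert.h>
#include <string.h>

/* Copy SRC to DST, leaving out every line that contains the text
   ": in function".  DST must have room for strlen (SRC) + 1 bytes.
   Lines with a capitalized "In function" are deliberately kept, since
   this sketch only covers the uncapitalized spelling.  */
static void
prune_in_function_lines (const char *src, char *dst)
{
  static const char needle[] = ": in function";

  assert (src != NULL && dst != NULL);

  while (*src != '\0')
    {
      /* Length of the current line, including its newline if present.  */
      const char *eol = strchr (src, '\n');
      size_t len = eol ? (size_t) (eol - src) + 1 : strlen (src);

      /* The needle contains no newline, so a hit that starts inside
         the current line also ends inside it.  */
      const char *hit = strstr (src, needle);
      if (hit == NULL || hit >= src + len)
        {
          memcpy (dst, src, len);
          dst += len;
        }
      src += len;
    }
  *dst = '\0';
}
```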

Re: [PATCH] c++/coroutines: fix passing *this to promise type, again [PR116327]

2024-08-14 Thread Jason Merrill


On 8/14/24 9:47 AM, Patrick Palka wrote:

On Tue, 13 Aug 2024, Jason Merrill wrote:


On 8/13/24 7:52 PM, Patrick Palka wrote:

On Tue, 13 Aug 2024, Jason Merrill wrote:


On 8/12/24 10:01 PM, Patrick Palka wrote:

Tested on x86_64-pc-linux-gnu, does this look OK for trunk/14?

-- >8 --

In r15-2210 we got rid of the unnecessary cast to lvalue reference when
passing *this to the promise type ctor, and as a drive-by change we also
simplified the code to use cp_build_fold_indirect_ref.

But cp_build_fold_indirect_ref apparently does too much here, namely
it has a shortcut for returning current_class_ref if the operand is
current_class_ptr.  The problem with that shortcut is current_class_ref
might have gotten clobbered earlier if it appeared in the function body,
since rewrite_param_uses walks and rewrites in-place all local variable
uses to their corresponding frame copy.

So later this cp_build_fold_indirect_ref for *__closure will instead return
the mutated current_class_ref i.e. *frame_ptr->__closure, which doesn't
make sense here since we're in the ramp function and not the actor function
where frame_ptr is in scope.

This patch fixes this by building INDIRECT_REF directly instead of using
cp_build_fold_indirect_ref.  (Another approach might be to restore an
unshare_expr'd current_class_ref after doing coro_rewrite_function_body
to avoid it remaining clobbered after the rewriting process.  Yet
another more ambitious approach might be to avoid this tree sharing in
the first place by returning unshared versions of current_class_ref from
maybe_dummy_object etc.)


Maybe clear current_class_ptr/ref in coro rewriting so we don't hit the
shortcut?


That seems to work, but I'm kind of worried about what other code paths
that'd disable, particularly semantic code paths vs just optimizations
code paths such as the cp_build_fold_indirect_ref shortcut.  IIUC the
ramp function has the same signature as the original presumably non-static
member function so ideally current class ref should remain set when
building the ramp function body and cleared only when building/rewriting
the actor function body (which is never a non-static member function and
so doesn't have a this pointer, I think?).

We do the actor body stuff first however, so even if we clear
current_class_ref then, the restored current_class_ref during the
later ramp function body stuff (including during the call to
cp_build_fold_indirect_ref) will still be clobbered :(

So ISTM this more narrow approach might be preferable unless we ever run
into another instance of this current_class_ref clobbering issue?


Fair enough.

Is there a reason not to use build_fold_indirect_ref (without cp_)?


Not AFAICT, works for me.  Like so?  I also extended the 104981 test so
that it too triggers the issues.


OK with a comment about why the cp_ version is problematic.
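
[Editor's sketch] The hazard discussed in this thread is a general one: a
"dereference" helper with a shortcut that returns a cached, shared node can
hand back a node that an earlier in-place rewrite has already changed.  The
toy model below illustrates only that aliasing pattern.  It is not GCC code:
the node layout and all names are hypothetical, and class_ptr / class_ref
merely stand in for current_class_ptr / current_class_ref.

```c
#include <assert.h>
#include <stddef.h>

enum code { PARM, INDIRECT, FRAME_COPY };

struct node
{
  enum code code;
  const struct node *op;   /* operand, or NULL for a leaf */
};

/* Stand-ins for the "this" pointer and the cached "*this" expression.  */
static struct node class_ptr = { PARM, NULL };
static struct node class_ref = { INDIRECT, &class_ptr };

/* Models a pass that rewrites a use in place to refer to its frame copy.  */
static void
rewrite_to_frame_copy (struct node *use)
{
  assert (use != NULL);
  use->code = FRAME_COPY;
}

/* Dereference with a shortcut: for class_ptr, return the cached node,
   whatever state earlier passes have left it in.  */
static struct node
deref_with_shortcut (const struct node *ptr)
{
  if (ptr == &class_ptr)
    return class_ref;
  return (struct node) { INDIRECT, ptr };
}

/* Dereference without a shortcut: always build a fresh INDIRECT node.  */
static struct node
deref_fresh (const struct node *ptr)
{
  return (struct node) { INDIRECT, ptr };
}
```

After rewrite_to_frame_copy (&class_ref), the shortcut version yields the
rewritten node, while the fresh version still yields a plain indirection of
class_ptr.  That is the behaviour the patch wants in the ramp function.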


-- >8 --

Subject: [PATCH] c++/coroutines: fix passing *this to promise type, again
  [PR116327]

In r15-2210 we got rid of the unnecessary cast to lvalue reference when
passing *this to the promise type ctor, and as a drive-by change we also
simplified the code to use cp_build_fold_indirect_ref.

But cp_build_fold_indirect_ref apparently does too much here, namely
it has a shortcut for returning current_class_ref if the operand is
current_class_ptr.  The problem with that shortcut is current_class_ref
might have gotten clobbered earlier if it appeared in the function body,
since rewrite_param_uses walks and rewrites in-place all local variable
uses to their corresponding frame copy.

So later this cp_build_fold_indirect_ref for *__closure will instead return
the mutated current_class_ref i.e. *frame_ptr->__closure, which doesn't
make sense here since we're in the ramp function and not the actor function
where frame_ptr is in scope.

This patch fixes this by using build_fold_indirect_ref instead of
cp_build_fold_indirect_ref.

PR c++/116327
PR c++/104981
PR c++/115550

gcc/cp/ChangeLog:

* coroutines.cc (morph_fn_to_coro): Use build_fold_indirect_ref
instead of cp_build_fold_indirect_ref.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr104981-preview-this.C: Improve coverage by
adding a non-static data member use within the coroutine member
function.
* g++.dg/coroutines/pr116327-preview-this.C: New test.
---
  gcc/cp/coroutines.cc  |  4 ++--
  .../g++.dg/coroutines/pr104981-preview-this.C |  4 +++-
  .../g++.dg/coroutines/pr116327-preview-this.C | 22 +++
  3 files changed, 27 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116327-preview-this.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 145ec4b1d16..b1eae94a957 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4850,7 +4850,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
  if (parm_i->this_ptr || parm_i->lambda_cobj)
{

Re: [PATCH] c++: c->B::m access resolved through current inst [PR116320]

2024-08-14 Thread Jason Merrill


On 8/14/24 10:50 AM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and later backports?


Yes.


-- >8 --

Here when checking the access of (the injected-class-name) B in c->B::m
at parse time, we notice its scope B (now the type) is a base of the
object type C, so we proceed to use C as qualifying type.  But
this C is the dependent specialization not the primary template type,
so it has empty TYPE_BINFO which leads to a segfault later from
perform_or_defer_access_check.

The reason DERIVED_FROM_P / lookup_base returns true despite the object
type having empty TYPE_BINFO is because of its currently_open_class logic
(added in r9-713-gd9338471b91bbe) which replaces a dependent specialization
with the primary template type if we're inside the latter.  So the safest
fix seems to be to use currently_open_class in the caller as well.

PR c++/116320

gcc/cp/ChangeLog:

* semantics.cc (check_accessibility_of_qualified_id): Use
currently_open_class when the object type is derived from the
scope of the declaration being accessed.

gcc/testsuite/ChangeLog:

* g++.dg/template/access42.C: New test.
---
  gcc/cp/semantics.cc  | 11 ---
  gcc/testsuite/g++.dg/template/access42.C | 17 +
  2 files changed, 25 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/access42.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index e58612660c9..5ab2076b673 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -2516,9 +2516,14 @@ check_accessibility_of_qualified_id (tree decl,
 OBJECT_TYPE.  */
&& CLASS_TYPE_P (object_type)
&& DERIVED_FROM_P (scope, object_type))
-/* If we are processing a `->' or `.' expression, use the type of the
-   left-hand side.  */
-qualifying_type = object_type;
+{
+  /* If we are processing a `->' or `.' expression, use the type of the
+left-hand side.  */
+  if (tree open = currently_open_class (object_type))
+   qualifying_type = open;
+  else
+   qualifying_type = object_type;
+}
else if (nested_name_specifier)
  {
/* If the reference is to a non-static member of the
diff --git a/gcc/testsuite/g++.dg/template/access42.C b/gcc/testsuite/g++.dg/template/access42.C
new file mode 100644
index 000..f1dcbce80c2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access42.C
@@ -0,0 +1,17 @@
+// PR c++/116320
+// { dg-do compile { target c++11 } }
+
+template struct C;
+template using C_ptr = C*;
+
+struct B { int m; using B_typedef = B; };
+
+template
+struct C : B {
+  void f(C_ptr c) {
+c->B::m;
+c->B_typedef::m;
+  }
+};
+
+template struct C;

Re: [PATCH] s390: Fix high-level builtins vec_gfmsum{,_accum}_128

2024-08-14 Thread Andreas Krebbel


On 8/8/24 20:28, Stefan Schulze Frielinghaus wrote:

Starting with r14-9449-g9f2b16ce1efef0 builtins were streamlined with
those in LLVM.  In particular s390_vgfm{,a}g have been changed from
UV16QI to UINT128 in order to match those in LLVM.  However, these
low-level builtins are directly used by the high-level builtins
vec_gfmsum{,_accum}_128 which expect UV16QI instead.  Therefore,
introduce new low-level builtins s390_vgfm{,a}g_128 and make use of
them, respectively.

Bootstrapped on s390.  Ok for mainline and releases/gcc-14?

gcc/ChangeLog:

* config/s390/s390-builtin-types.def (BT_FN_UV16QI_UV2DI_UV2DI):
New.
(BT_FN_UV16QI_UV2DI_UV2DI_UV16QI): New.
* config/s390/s390-builtins.def (s390_vgfmg_128): New.
(s390_vgfmag_128): New.
* config/s390/vecintrin.h (vec_gfmsum_128): Use s390_vgfmg_128.
(vec_gfmsum_accum_128): Use s390_vgfmag_128.


Ok. Thanks!

Andreas

Re: [PATCH] s390: Remove vector intrinsics

2024-08-14 Thread Andreas Krebbel


On 8/8/24 20:29, Stefan Schulze Frielinghaus wrote:

The following intrinsics are not implemented.  Thus, remove them.

Ok for mainline?

gcc/ChangeLog:

* config/s390/vecintrin.h (vec_vstbrh): Remove.
(vec_vstbrf): Remove.
(vec_vstbrg): Remove.
(vec_vstbrq): Remove.
(vec_vstbrf_flt): Remove.
(vec_vstbrg_dbl): Remove.
(vec_vsterb): Remove.
(vec_vsterh): Remove.
(vec_vsterf): Remove.
(vec_vsterg): Remove.
(vec_vsterf_flt): Remove.
(vec_vsterg_dbl): Remove.


Ok. Thanks!

Andreas

Re: [PATCH-1v4] Value Range: Add range op for builtin isinf

2024-08-14 Thread Vineet Gupta

Ping - looks like this is blocking the patches for builtin_isnormal and
builtin_isfinite!

Thx,
-Vineet

On 8/5/24 07:51, Jeff Law wrote:
>
> On 7/23/24 4:39 PM, Andrew MacLeod wrote:
>> the range is in r, and is set to [0,0].  this is the false part of what 
>> is being returned for the range.
>>
>> the "return true" indicates we determined a range, so use what is in R.
>>
>> returning false means we did not find a range to return, so r is garbage.
> Duh.  I guess I should have realized that.  I'll have to take another 
> look at Hao's patch.  It's likely OK, but let me take another looksie.
>
> jeff
>
>

Re: [PATCH-1v4] Value Range: Add range op for builtin isinf

2024-08-14 Thread Sam James

Vineet Gupta  writes:

> Ping - looks like this is blocking the patches for builtin_isnormal and 
> builtin_isfinite !
>

See 
https://inbox.sourceware.org/gcc-patches/d9459db0-7301-40f6-a3cf-077017b8c...@gmail.com/.

It appears to be approved.

(Please also avoid top-posting.)

> Thx,
> -Vineet
>
> On 8/5/24 07:51, Jeff Law wrote:
>>
>> On 7/23/24 4:39 PM, Andrew MacLeod wrote:
>>> the range is in r, and is set to [0,0].  this is the false part of what 
>>> is being returned for the range.
>>>
>>> the "return true" indicates we determined a range, so use what is in R.
>>>
>>> returning false means we did not find a range to return, so r is garbage.
>> Duh.  I guess I should have realized that.  I'll have to take another 
>> look at Hao's patch.  It's likely OK, but let me take another looksie.
>>
>> jeff
>>
>>

Re: [PATCH-1v4] Value Range: Add range op for builtin isinf

2024-08-14 Thread Vineet Gupta




On 8/14/24 11:38, Sam James wrote:
>> Ping - looks like this is blocking the patches for builtin_isnormal and 
>> builtin_isfinite !
>>
> See 
> https://inbox.sourceware.org/gcc-patches/d9459db0-7301-40f6-a3cf-077017b8c...@gmail.com/.
>
> It appears to be approved.

Sorry, should have refreshed my newsreader feed before posting.


> (Please also avoid top-posting.)

Sure, I generally do, except gcc pings seem to be top posted.

Thx,
-Vineet

>
>> Thx,
>> -Vineet
>>
>> On 8/5/24 07:51, Jeff Law wrote:
>>> On 7/23/24 4:39 PM, Andrew MacLeod wrote:
>>>> the range is in r, and is set to [0,0].  this is the false part of what
>>>> is being returned for the range.
>>>>
>>>> the "return true" indicates we determined a range, so use what is in R.
>>>>
>>>> returning false means we did not find a range to return, so r is garbage.
>>> Duh.  I guess I should have realized that.  I'll have to take another 
>>> look at Hao's patch.  It's likely OK, but let me take another looksie.
>>>
>>> jeff
>>>
>>>

Re: [PATCH] Fortran: fix minor frontend GMP leaks

2024-08-14 Thread Harald Anlauf


Hi Andre,

On 14.08.24 at 07:53, Andre Vehreschild wrote:

Hi Harald,

I had a hard time figuring out why this is correct when gfc_array_size()
returned false, but now I get it. Ok to commit.


I know that reading code is always twice as hard as writing it... ;-)

Thanks for checking.  Pushed as r15-2917-ga82c4dfe52dac3 .

Harald


- Andre

On Tue, 13 Aug 2024 21:25:31 +0200
Harald Anlauf  wrote:


Dear all,

while running f951 under valgrind on testcase gfortran.dg/sizeof_6.f90
I found two minor memleaks with GMP variables that were not cleared.

Regtested on x86_64-pc-linux-gnu.

I intend to commit to mainline soon unless there are comments.

(And no, this does not address the recent intermittent runtime failures
reported in pr116261).

Thanks,
Harald




--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [Fortran, Patch, PR110033, v1] Fix associate for coarrays

2024-08-14 Thread Harald Anlauf


Hi Andre,

On 12.08.24 at 14:11, Andre Vehreschild wrote:

Hi all,

the attached two patches fix ASSOCIATE for coarrays, i.e. that a coarray
associated to a variable is also a coarray in the block of the ASSOCIATE
command. The patch has two parts:

1. pr110033p1_1.patch: Adds a corank member to the gfc_expr structure. I
decided to add it here and keep track of the corank of an expression, because
calling gfc_get_corank was getting too expensive with the associate patch. This
patch also improves the usage of coarrays in select type/rank constructs.

2. pr110033p2_1.patch: The changes and testcase for PR 110033. In essence the
coarray is not detected correctly on the expression to associate to and
therefore not propagated correctly into the block of the ASSOCIATE command. The
patch adds correct treatment for propagating the coarray token into the block,
too.

The costs of tracking the corank alongside the rank of an expression are
about 30 seconds real user time (i.e. time's "real" row) on a rather old Intel
i7-5775C@3.3GHz with 24G RAM that was used for work during the test. If need be
I can tune that further.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?


Paul already gave a basic OK, and I won't object.

However, the testcase should be fixed.  It is only correct for
single-image runs!  (Verified with Intel ifx).

You have:

  associate (y => x)
y = -1
y[1] = 35
  end associate

and check:

  if (x /= 35) stop 1

This should rather be

  if (x[1] /= 35) stop 1

or for number of images > 1:

  if (this_image() == 1) then
 if (x /= 35) stop 1
  else
 if (x /= -1) stop 99
  end if

and similarly

  if (.NOT. c%l) stop 3

needs to be adjusted accordingly.

Thanks,
Harald


Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [PATCH] ltmain.sh: allow more flags at link-time

2024-08-14 Thread Eric Gallager

On Wed, Aug 14, 2024 at 8:50 AM Sam James  wrote:
>
> libtool defaults to filtering flags passed at link-time.
>
> This brings the filtering in GCC's 'fork' of libtool into sync with
> upstream libtool commit 22a7e547e9857fc94fe5bc7c921d9a4b49c09f8e.

I think it'd be worthwhile to link to the upstream commit in the
ChangeLog / commit message, too. Also, are you sure that's the right
one? It looks just like a version revbump commit to me:
https://git.savannah.gnu.org/cgit/libtool.git/commit/?id=22a7e547e9857fc94fe5bc7c921d9a4b49c09f8e

>
> In particular, this now allows some harmless diagnostic flags (especially
> useful for things like -Werror=odr), more optimization flags, and some
> Clang-specific options.
>
> GCC's -flto documentation mentions:
> > To use the link-time optimizer, -flto and optimization options should be
> > specified at compile time and during the final link. It is recommended
> > that you compile all the files participating in the same link with the
> > same options and also specify those options at link time.
>
> This allows compliance with that.
>
> * ltmain.sh (func_mode_link): Allow various flags through filter.
> ---
> We have been using this for a while now downstream.
>
> H.J., please take a look.
>
> I think this also explains https://src.fedoraproject.org/rpms/binutils/blob/rawhide/f/binutils.spec#_947.
>
>  ltmain.sh | 46 ++
>  1 file changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/ltmain.sh b/ltmain.sh
> index 493e83c36f14..79cd7c57f42e 100644
> --- a/ltmain.sh
> +++ b/ltmain.sh
> @@ -4966,19 +4966,41 @@ func_mode_link ()
> arg="$func_quote_for_eval_result"
> ;;
>
> -  # -64, -mips[0-9] enable 64-bit mode on the SGI compiler
> -  # -r[0-9][0-9]* specifies the processor on the SGI compiler
> -  # -xarch=*, -xtarget=* enable 64-bit mode on the Sun compiler
> -  # +DA*, +DD* enable 64-bit mode on the HP compiler
> -  # -q* pass through compiler args for the IBM compiler
> -  # -m*, -t[45]*, -txscale* pass through architecture-specific
> -  # compiler args for GCC
> -  # -F/path gives path to uninstalled frameworks, gcc on darwin
> -  # -p, -pg, --coverage, -fprofile-* pass through profiling flag for GCC
> -  # @file GCC response files
> -  # -tp=* Portland pgcc target processor selection
> +  # Flags to be passed through unchanged, with rationale:
> +  # -64, -mips[0-9]      enable 64-bit mode for the SGI compiler
> +  # -r[0-9][0-9]*        specify processor for the SGI compiler
> +  # -xarch=*, -xtarget=* enable 64-bit mode for the Sun compiler
> +  # +DA*, +DD*           enable 64-bit mode for the HP compiler
> +  # -q*                  compiler args for the IBM compiler
> +  # -m*, -t[45]*, -txscale* architecture-specific flags for GCC
> +  # -F/path              path to uninstalled frameworks, gcc on darwin
> +  # -p, -pg, --coverage, -fprofile-*  profiling flags for GCC
> +  # -fstack-protector*   stack protector flags for GCC
> +  # @file                GCC response files
> +  # -tp=*                Portland pgcc target processor selection
> +  # -O*, -g*, -flto*, -fwhopr*, -fuse-linker-plugin GCC link-time optimization
> +  # -specs=*             GCC specs files
> +  # -stdlib=*            select c++ std lib with clang
> +  # -fdiagnostics-color* simply affects output
> +  # -frecord-gcc-switches used to verify flags were respected
> +  # -fsanitize=*         Clang/GCC memory and address sanitizer
> +  # -fno-sanitize*       Clang/GCC memory and address sanitizer
> +  # -shared-libsan       Link with shared sanitizer runtimes (Clang)
> +  # -static-libsan       Link with static sanitizer runtimes (Clang)
> +  # -fuse-ld=*           Linker select flags for GCC
> +  # -rtlib=*             select c runtime lib with clang
> +  # --unwindlib=*        select unwinder library with clang
> +  # -f{file|debug|macro|profile}-prefix-map=* needed for lto linking
> +  # -Wa,*                Pass flags directly to the assembler
> +  # -Werror, -Werror=*   Report (specified) warnings as errors
>-64|-mips[0-9]|-r[0-9][0-9]*|-xarch=*|-xtarget=*|+DA*|+DD*|-q*|-m*| \
> -  -t[45]*|-txscale*|-p|-pg|--coverage|-fprofile-*|-F*|@*|-tp=*)
> +  -t[45]*|-txscale*|-p|-pg|--coverage|-fprofile-*|-F*|@*|-tp=*| \
> +  -O*|-g*|-flto*|-fwhopr*|-fuse-linker-plugin|-fstack-protector*| \
> +  -stdlib=*|-rtlib=*|--unwindlib=*| \
> +  -specs=*|-fsanitize=*|-fno-sanitize*|-shared-libsan|-static-libsan| \
> +  -ffile-prefix-map=*|-fdebug-prefix-map=*|-fmacro-prefix-map=*|-fprofile-prefix-map=*| \
> +  -fdiagnostics-color*|-frecord-gcc-switches| \
> +  -fuse-ld=*|-Wa,*|-Werror|-Werror=*)
>  func_quote_for_eval "$arg"
> arg="$func_quote_for_eval_result"
>  func_append compile_command " $arg"
> --
> 2.45.2
>
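
For readers following the filter change: the case patterns in the patch are
ordinary shell globs, so which flags now reach the link line can be checked
mechanically.  The following is a minimal sketch in C, assuming POSIX
fnmatch() as a stand-in for the shell's case matching.  The name
flag_passes_through() and the pattern table are hypothetical, and the table
copies only a subset of the patterns added by the patch.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* A subset of the glob patterns from the patched case statement.
   This table is illustrative only; it is not part of libtool.  */
static const char *const passthrough_patterns[] = {
  "-O*", "-g*", "-flto*", "-fwhopr*", "-fuse-linker-plugin",
  "-fstack-protector*", "-fsanitize=*", "-fno-sanitize*",
  "-fuse-ld=*", "-Wa,*", "-Werror", "-Werror=*",
};

/* Hypothetical helper: return true if ARG matches one of the
   patterns above, i.e. the sketch would pass it through.  */
bool
flag_passes_through (const char *arg)
{
  size_t n = sizeof passthrough_patterns / sizeof passthrough_patterns[0];

  for (size_t i = 0; i < n; i++)
    if (fnmatch (passthrough_patterns[i], arg, 0) == 0)
      return true;
  return false;
}
```

With this table, "-O2", "-flto=auto" and "-Werror=odr" match, while "-Wall"
does not match any pattern listed in the sketch.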

Re: [PATCH v2] c++: ICE with NSDMIs and fn arguments [PR116015]

2024-08-14 Thread Marek Polacek

On Tue, Aug 13, 2024 at 03:12:01PM -0700, Jason Merrill wrote:
> On 8/12/24 7:21 PM, Marek Polacek wrote:
> > On Fri, Aug 09, 2024 at 05:15:05PM -0400, Jason Merrill wrote:
> > > On 8/9/24 4:21 PM, Marek Polacek wrote:
> > > > On Fri, Aug 09, 2024 at 12:58:34PM -0400, Jason Merrill wrote:
> > > > > On 8/8/24 1:37 PM, Marek Polacek wrote:
> > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > 
> > > > > > -- >8 --
> > > > > > The problem in this PR is that we ended up with
> > > > > > 
> > > > > >  {.rows=(&)->n,
> > > > > >   .outer_stride=(&)->rows}
> > > > > > 
> > > > > > > that is, two PLACEHOLDER_EXPRs for different types on the same level
> > > > > > > in one { }.  That should not happen; we may, for instance, neglect to
> > > > > > > replace a PLACEHOLDER_EXPR due to CONSTRUCTOR_PLACEHOLDER_BOUNDARY on
> > > > > > > the constructor.
> > > > > > > 
> > > > > > > The same problem happened in PR100252, which I fixed by introducing
> > > > > > > replace_placeholders_for_class_temp_r.  That didn't work here, though,
> > > > > > > because r_p_for_c_t_r only works for non-eliding TARGET_EXPRs: replacing
> > > > > > > a PLACEHOLDER_EXPR with a temporary that is going to be elided will
> > > > > > > result in a crash in gimplify_var_or_parm_decl when it encounters such
> > > > > > > a loose decl.
> > > > > > > 
> > > > > > > But leaving the PLACEHOLDER_EXPRs in is also bad because then we end
> > > > > > > up with this PR.
> > > > > > > 
> > > > > > > TARGET_EXPRs for function arguments are elided in gimplify_arg.  The
> > > > > > > argument will get a real temporary only in get_formal_tmp_var.  One
> > > > > > > idea was to use the temporary that is going to be elided anyway, and
> > > > > > > then replace_decl it with the real object once we get it.  But that
> > > > > > > didn't work out: one problem is that we elide the TARGET_EXPR for an
> > > > > > > argument before we create the real temporary for the argument, and
> > > > > > > when we get it, the context that this was a TARGET_EXPR for an argument
> > > > > > > has been lost.  We're also in the middle end territory now, even though
> > > > > > > this is a C++-specific problem.
> > > > > 
> > > > > How complex!
> > > > > 
> > > > > > > I figured that since the to-be-elided temporary is going to stay around
> > > > > > > until gimplification, the front end is free to use it.  Once we're done
> > > > > > > with things like store_init_value, which replaces PLACEHOLDER_EXPRs with
> > > > > > > the decl it is initializing, we can turn those to-be-elided temporaries
> > > > > > > into PLACEHOLDER_EXPRs again, so that cp_gimplify_init_expr can replace
> > > > > > > them with the real object once available.  The context is not lost so we
> > > > > > > do not need an extra flag for these makeshift temporaries.
> > > > > > 
> > > > > > Clever, that makes a lot of sense.  But I wonder if we can avoid the problem
> > > > > > more simply than working around it?
> > > > > > 
> > > > > > I see that the get_formal_tmp_var happens directly from gimplify_arg, so it
> > > > > > strips the TARGET_EXPR to avoid a temporary...and then immediately turns
> > > > > > around and creates a new temporary.
> > > > > > 
> > > > > > Would it work to stop stripping the TARGET_EXPR in gimplify_arg and
> > > > > > therefore stop setting TARGET_EXPR_ELIDING_P in convert_for_arg_passing?
> > > > > 
> > > > > Well, it does fix the ICE.  But then a number of testcases fail :(.
> > > > > For instance, pr23372.C.  .gimple diff w/ and w/o stripping the TARGET_EXPR:
> > > > 
> > > > @@ -1,6 +1,9 @@
> > > >void g (struct A * a)
> > > >{
> > > > -  f (MEM[(const struct A &)a]);
> > > > +  struct A D.2829;
> > > > +
> > > > +  D.2829 = MEM[(const struct A &)a];
> > > > +  f (D.2829);
> > > >}
> > > > 
> > > > The extra copy is there even in .optimized with -O2.
> > > > 
> > > > 
> > > > It's always sad when we have to add complicated code just to work around
> > > > a corner case, but the above pessimization looks pretty important :(.
> > > 
> > > Ah, good point.  In that case, the stripping avoids the copy because the
> > > TARGET_EXPR_INITIAL is already (adjustable into) a suitable lvalue.  The
> > > current code already fails to avoid the redundant copy when _INITIAL is a
> > > CONSTRUCTOR:
> > > 
> > > void g (struct A * a)
> > > {
> > >struct A D.2805;
> > > 
> > >D.2805 = {}; // boo
> > >f (D.2805);
> > > }
> > > 
> > > I'm failing to find the PR about this issue.
> > 
> > I also haven't found it (that doesn't mean it doesn't exist :)).  I can file
> > one if you'd like...
> > 
> > Note that if we do fix that, we may be facing this problem again.
> 
> Please do.  The way I would expect the bug to get fixed would be to assign
> the temporary the location of the argument slot, and then recognize th

Re: [PATCH] ltmain.sh: allow more flags at link-time

2024-08-14 Thread Sam James

Eric Gallager  writes:

> On Wed, Aug 14, 2024 at 8:50 AM Sam James  wrote:
>>
>> libtool defaults to filtering flags passed at link-time.
>>
>> This brings the filtering in GCC's 'fork' of libtool into sync with
>> upstream libtool commit 22a7e547e9857fc94fe5bc7c921d9a4b49c09f8e.
>
> I think it'd be worthwhile to link to the upstream commit in the
> ChangeLog / commit message, too. Also, are you sure that's the right
> one? It looks just like a version revbump commit to me:
> https://git.savannah.gnu.org/cgit/libtool.git/commit/?id=22a7e547e9857fc94fe5bc7c921d9a4b49c09f8e

'as of' meaning "this is the state of the repository when I
checked", so if you want to check my work, you should checkout
libtool.git at that commit and compare the product.

There is no single commit which does this, it was done over
many commits over many years. It's not worth trying to dig those many
commits up, IMO.


[PATCH] testsuite: Accept vmov.f64

2024-08-14 Thread Torbjörn SVENSSON

Ok for trunk and releases/gcc-14?

--

On Cortex-M55 with fpv5-d16, the vmov.f64 instruction is used.

gcc/testsuite/ChangeLog:

* armv8_1m-fp64-move-1.c: Accept vmov.f64 instruction.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c 
b/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c
index d236f0826c3..44abfcf1518 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c
@@ -2,7 +2,7 @@
 /* { dg-options "-O" } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
-/* { dg-additional-options "-mfloat-abi=hard" } *
+/* { dg-additional-options "-mfloat-abi=hard" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 /*
@@ -39,6 +39,8 @@ w_r ()
 ** |
 ** vmov.f32 s3, s1
 ** vmov.f32 s2, s0
+** |
+** vmov.f64 d1, d0
 ** )
 ** bx  lr
 */
-- 
2.25.1

Re: [PATCH] Use add_name_and_src_coords_attributes in modified_type_die

2024-08-14 Thread Tom Tromey

> "Tom" == Tom Tromey  writes:

Tom> While working on a patch to the Ada compiler, I found a spot in
Tom> dwarf2out.cc that calls add_name_attribute where a call to
Tom> add_name_and_src_coords_attributes would be better, because the latter
Tom> respects DECL_NAMELESS.

Tom> gcc

Tom>* dwarf2out.cc (modified_type_die): Call
Tom>add_name_and_src_coords_attributes for type decls.

Ping.

Tom

[PATCH v9 0/3] c: Add elementsof operator

2024-08-14 Thread Alejandro Colomar

Hi!

v9:

-  Rename s/lengthof/elementsof/

   There are existing lengthof() functions in the wild, which are
   incompatible with (completely unrelated to) this operator.

   In the case of elementsof(), we only found macros that expand to the
   usual sizeof division (plus safety checks in some cases), so this one
   would be compatible.  [Aaron]

   ISO C uses "number of elements" and "length" interchangeably when
   referring to the number of elements of an array, so this name should
   also be obvious.

   Also, lengthof() might introduce ambiguity in contexts where string
   lengths are used, because the term "length" is overloaded.
   elementsof() is free of that ambiguity.

   I guess elementsof() has a slightly better chance of later being
   accepted into ISO C than lengthof(), which would conflict with those
   existing functions named lengthof().

-  Cc: += Daniel, A., Eugene, Aaron, Paul

I'll send as a reply the updated draft for a WG14 C2y proposal
incorporating these changes.
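
For reference, the "usual sizeof division" mentioned above looks roughly like
the sketch below.  NELEMS is a hypothetical macro name chosen for
illustration, not one taken from any of the projects surveyed, and the
count_* functions exist only to exercise it.  The operator added by this
series is deliberately not used here, since it only exists with the patch
applied.

```c
#include <stddef.h>

/* Hypothetical macro: the classic sizeof division.  It counts the
   elements of the outermost dimension of an array.  */
#define NELEMS(a)  (sizeof (a) / sizeof ((a)[0]))

/* Number of elements of a one-dimensional array.  */
size_t
count_ints (void)
{
  int a[7];
  return NELEMS (a);
}

/* Number of rows of a two-dimensional array (outer dimension).  */
size_t
count_rows (void)
{
  double m[3][5];
  return NELEMS (m);
}

/* Number of columns of a two-dimensional array (inner dimension).  */
size_t
count_cols (void)
{
  double m[3][5];
  return NELEMS (m[0]);
}
```

A macro like this silently yields a wrong value when handed a pointer instead
of an array, whereas the operator as described in patch 3/3 can only be
applied to an array.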

As usual, below is a range diff against v8.

Have a lovely day!
Alex


Alejandro Colomar (3):
  gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
  Merge definitions of array_type_nelts_top()
  c: Add __elementsof__ operator

 gcc/c-family/c-common.cc  |  26 
 gcc/c-family/c-common.def |   3 +
 gcc/c-family/c-common.h   |   2 +
 gcc/c/c-decl.cc   |  31 +++--
 gcc/c/c-fold.cc   |   7 +-
 gcc/c/c-parser.cc |  62 +++--
 gcc/c/c-tree.h|   4 +
 gcc/c/c-typeck.cc | 118 -
 gcc/config/aarch64/aarch64.cc |   2 +-
 gcc/config/i386/i386.cc   |   2 +-
 gcc/cp/cp-tree.h  |   1 -
 gcc/cp/decl.cc|   2 +-
 gcc/cp/init.cc|   8 +-
 gcc/cp/lambda.cc  |   3 +-
 gcc/cp/operators.def  |   1 +
 gcc/cp/tree.cc|  13 --
 gcc/doc/extend.texi   |  30 +
 gcc/expr.cc   |   8 +-
 gcc/fortran/trans-array.cc|   2 +-
 gcc/fortran/trans-openmp.cc   |   4 +-
 gcc/rust/backend/rust-tree.cc |  13 --
 gcc/rust/backend/rust-tree.h  |   2 -
 gcc/target.h  |   3 +
 gcc/testsuite/gcc.dg/elementsof-compile.c | 115 +
 gcc/testsuite/gcc.dg/elementsof-vla.c |  46 +++
 gcc/testsuite/gcc.dg/elementsof.c | 150 ++
 gcc/tree.cc   |  17 ++-
 gcc/tree.h|   3 +-
 28 files changed, 599 insertions(+), 79 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/elementsof-compile.c
 create mode 100644 gcc/testsuite/gcc.dg/elementsof-vla.c
 create mode 100644 gcc/testsuite/gcc.dg/elementsof.c

Range-diff against v8:
1:  a6aa38c9013 = 1:  a6aa38c9013 gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
2:  43300a17e4a ! 2:  4ce16ee4dfe Merge definitions of array_type_nelts_top()
@@ Commit message
 Merge definitions of array_type_nelts_top()
 
 There were two identical definitions, and none of them are available
-where they are needed for implementing __lengthof__.  Merge them, and
+where they are needed for implementing __elementsof__.  Merge them, and
 provide the single definition in gcc/tree.{h,cc}, where it's available
-for __lengthof__, which will be added in the following commit.
+for __elementsof__, which will be added in the following commit.
 
 gcc/ChangeLog:
 
3:  e6af87d54af ! 3:  caae5dbecb3 c: Add __lengthof__ operator
@@ Metadata
 Author: Alejandro Colomar 
 
  ## Commit message ##
-c: Add __lengthof__ operator
+c: Add __elementsof__ operator
 
 This operator is similar to sizeof but can only be applied to an array,
-and returns its length (number of elements).
+and returns its number of elements.
 
 FUTURE DIRECTIONS:
 
 -  We should make it work with array parameters to functions,
-   and somehow magically return the length designator of the array,
+   and somehow magically return the number of elements of the array,
regardless of it being really a pointer.
 
 -  Fix support for [0].
@@ Commit message
 Cc: Florian Weimer 
 Cc: Andreas Schwab 
 Cc: Timm Baeder 
+Cc: Daniel Plakosh 
+Cc: "A. Jiang" 
+Cc: Eugene Zelenko 
+Cc: Aaron Ballman 
+Cc: Paul Koning 
 
 gcc/ChangeLog:
 
-* doc/extend.texi: Document __lengthof__ operator.
-* target.h (enum type_context_kind): Add __lengthof__ operator.
+* doc/ex

[PATCH v9 1/3] gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()

2024-08-14 Thread Alejandro Colomar

The old name was misleading.

While at it, also rename some temporary variables that are used with
this function, for consistency.

Link: https://inbox.sourceware.org/gcc-patches/9fffd80-dca-2c7e-14b-6c9b509a7...@redhat.com/T/#m2f661c67c8f7b2c405c8c7fc3152dd85dc729120
Cc: Gabriel Ravier 
Cc: Martin Uecker 
Cc: Joseph Myers 
Cc: Xavier Del Campo Romero 
Cc: Jakub Jelinek 

gcc/ChangeLog:

* tree.cc (array_type_nelts, array_type_nelts_minus_one):
* tree.h (array_type_nelts, array_type_nelts_minus_one):
* expr.cc (count_type_elements):
* config/aarch64/aarch64.cc
(pure_scalable_type_info::analyze_array):
* config/i386/i386.cc (ix86_canonical_va_list_type):
Rename array_type_nelts() => array_type_nelts_minus_one()
The old name was misleading.

gcc/c/ChangeLog:

* c-decl.cc (one_element_array_type_p, get_parm_array_spec):
* c-fold.cc (c_fold_array_ref):
Rename array_type_nelts() => array_type_nelts_minus_one()

gcc/cp/ChangeLog:

* decl.cc (reshape_init_array):
* init.cc
(build_zero_init_1):
(build_value_init_noctor):
(build_vec_init):
(build_delete):
* lambda.cc (add_capture):
* tree.cc (array_type_nelts_top):
Rename array_type_nelts() => array_type_nelts_minus_one()

gcc/fortran/ChangeLog:

* trans-array.cc (structure_alloc_comps):
* trans-openmp.cc
(gfc_walk_alloc_comps):
(gfc_omp_clause_linear_ctor):
Rename array_type_nelts() => array_type_nelts_minus_one()

gcc/rust/ChangeLog:

* backend/rust-tree.cc (array_type_nelts_top):
Rename array_type_nelts() => array_type_nelts_minus_one()

Suggested-by: Richard Biener 
Signed-off-by: Alejandro Colomar 
---
 gcc/c/c-decl.cc   | 10 +-
 gcc/c/c-fold.cc   |  7 ---
 gcc/config/aarch64/aarch64.cc |  2 +-
 gcc/config/i386/i386.cc   |  2 +-
 gcc/cp/decl.cc|  2 +-
 gcc/cp/init.cc|  8 
 gcc/cp/lambda.cc  |  3 ++-
 gcc/cp/tree.cc|  2 +-
 gcc/expr.cc   |  8 
 gcc/fortran/trans-array.cc|  2 +-
 gcc/fortran/trans-openmp.cc   |  4 ++--
 gcc/rust/backend/rust-tree.cc |  2 +-
 gcc/tree.cc   |  4 ++--
 gcc/tree.h|  2 +-
 14 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 8cef8f2c289..e7c2783e724 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5309,7 +5309,7 @@ one_element_array_type_p (const_tree type)
 {
   if (TREE_CODE (type) != ARRAY_TYPE)
 return false;
-  return integer_zerop (array_type_nelts (type));
+  return integer_zerop (array_type_nelts_minus_one (type));
 }
 
 /* Determine whether TYPE is a zero-length array type "[0]".  */
@@ -6257,15 +6257,15 @@ get_parm_array_spec (const struct c_parm *parm, tree attrs)
  for (tree type = parm->specs->type; TREE_CODE (type) == ARRAY_TYPE;
   type = TREE_TYPE (type))
{
- tree nelts = array_type_nelts (type);
- if (error_operand_p (nelts))
+ tree nelts_minus_one = array_type_nelts_minus_one (type);
+ if (error_operand_p (nelts_minus_one))
return attrs;
- if (TREE_CODE (nelts) != INTEGER_CST)
+ if (TREE_CODE (nelts_minus_one) != INTEGER_CST)
{
  /* Each variable VLA bound is represented by the dollar
 sign.  */
  spec += "$";
- tpbnds = tree_cons (NULL_TREE, nelts, tpbnds);
+ tpbnds = tree_cons (NULL_TREE, nelts_minus_one, tpbnds);
}
}
  tpbnds = nreverse (tpbnds);
diff --git a/gcc/c/c-fold.cc b/gcc/c/c-fold.cc
index 57b67c74bd8..9ea174f79c4 100644
--- a/gcc/c/c-fold.cc
+++ b/gcc/c/c-fold.cc
@@ -73,11 +73,12 @@ c_fold_array_ref (tree type, tree ary, tree index)
   unsigned elem_nchars = (TYPE_PRECISION (elem_type)
  / TYPE_PRECISION (char_type_node));
   unsigned len = (unsigned) TREE_STRING_LENGTH (ary) / elem_nchars;
-  tree nelts = array_type_nelts (TREE_TYPE (ary));
+  tree nelts_minus_one = array_type_nelts_minus_one (TREE_TYPE (ary));
   bool dummy1 = true, dummy2 = true;
-  nelts = c_fully_fold_internal (nelts, true, &dummy1, &dummy2, false, false);
+  nelts_minus_one = c_fully_fold_internal (nelts_minus_one, true, &dummy1,
+  &dummy2, false, false);
   unsigned HOST_WIDE_INT i = tree_to_uhwi (index);
-  if (!tree_int_cst_le (index, nelts)
+  if (!tree_int_cst_le (index, nelts_minus_one)
   || i >= len
   || i + elem_nchars > len)
 return NULL_TREE;
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2ac5a22c848..a757796afcf 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1083,7 +1083,7 @@ pu

[PATCH v9 2/3] Merge definitions of array_type_nelts_top()

2024-08-14 Thread Alejandro Colomar

There were two identical definitions, and neither of them is available
where they are needed for implementing __elementsof__.  Merge them, and
provide the single definition in gcc/tree.{h,cc}, where it's available
for __elementsof__, which will be added in the following commit.

gcc/ChangeLog:

* tree.h (array_type_nelts_top):
* tree.cc (array_type_nelts_top):
Define function (moved from gcc/cp/).

gcc/cp/ChangeLog:

* cp-tree.h (array_type_nelts_top):
* tree.cc (array_type_nelts_top):
Remove function (move to gcc/).

gcc/rust/ChangeLog:

* backend/rust-tree.h (array_type_nelts_top):
* backend/rust-tree.cc (array_type_nelts_top):
Remove function.

Signed-off-by: Alejandro Colomar 
---
 gcc/cp/cp-tree.h  |  1 -
 gcc/cp/tree.cc| 13 -
 gcc/rust/backend/rust-tree.cc | 13 -
 gcc/rust/backend/rust-tree.h  |  2 --
 gcc/tree.cc   | 13 +
 gcc/tree.h|  1 +
 6 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b1693051231..76d7bc34577 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8100,7 +8100,6 @@ extern tree build_exception_variant   (tree, tree);
 extern void fixup_deferred_exception_variants   (tree, tree);
 extern tree bind_template_template_parm(tree, tree);
 extern tree array_type_nelts_total (tree);
-extern tree array_type_nelts_top   (tree);
 extern bool array_of_unknown_bound_p   (const_tree);
 extern tree break_out_target_exprs (tree, bool = false);
 extern tree build_ctor_subob_ref   (tree, tree, tree);
diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 040136c70ab..7d179491476 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -3079,19 +3079,6 @@ cxx_print_statistics (void)
 depth_reached);
 }
 
-/* Return, as an INTEGER_CST node, the number of elements for TYPE
-   (which is an ARRAY_TYPE).  This counts only elements of the top
-   array.  */
-
-tree
-array_type_nelts_top (tree type)
-{
-  return fold_build2_loc (input_location,
- PLUS_EXPR, sizetype,
- array_type_nelts_minus_one (type),
- size_one_node);
-}
-
 /* Return, as an INTEGER_CST node, the number of elements for TYPE
(which is an ARRAY_TYPE).  This one is a recursive count of all
ARRAY_TYPEs that are clumped together.  */
diff --git a/gcc/rust/backend/rust-tree.cc b/gcc/rust/backend/rust-tree.cc
index 8d32e5203ae..3dc6b076711 100644
--- a/gcc/rust/backend/rust-tree.cc
+++ b/gcc/rust/backend/rust-tree.cc
@@ -859,19 +859,6 @@ is_empty_class (tree type)
   return CLASSTYPE_EMPTY_P (type);
 }
 
-// forked from gcc/cp/tree.cc array_type_nelts_top
-
-/* Return, as an INTEGER_CST node, the number of elements for TYPE
-   (which is an ARRAY_TYPE).  This counts only elements of the top
-   array.  */
-
-tree
-array_type_nelts_top (tree type)
-{
-  return fold_build2_loc (input_location, PLUS_EXPR, sizetype,
- array_type_nelts_minus_one (type), size_one_node);
-}
-
 // forked from gcc/cp/tree.cc builtin_valid_in_constant_expr_p
 
 /* Test whether DECL is a builtin that may appear in a
diff --git a/gcc/rust/backend/rust-tree.h b/gcc/rust/backend/rust-tree.h
index 26c8b653ac6..e597c3ab81d 100644
--- a/gcc/rust/backend/rust-tree.h
+++ b/gcc/rust/backend/rust-tree.h
@@ -2993,8 +2993,6 @@ extern location_t rs_expr_location (const_tree);
 extern int
 is_empty_class (tree type);
 
-extern tree array_type_nelts_top (tree);
-
 extern bool
 is_really_empty_class (tree, bool);
 
diff --git a/gcc/tree.cc b/gcc/tree.cc
index ed0a766016a..cedf95cc222 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -3729,6 +3729,19 @@ array_type_nelts_minus_one (const_tree type)
  ? max
  : fold_build2 (MINUS_EXPR, TREE_TYPE (max), max, min));
 }
+
+/* Return, as an INTEGER_CST node, the number of elements for TYPE
+   (which is an ARRAY_TYPE).  This counts only elements of the top
+   array.  */
+
+tree
+array_type_nelts_top (tree type)
+{
+  return fold_build2_loc (input_location,
+ PLUS_EXPR, sizetype,
+ array_type_nelts_minus_one (type),
+ size_one_node);
+}
 
 /* If arg is static -- a reference to an object in static storage -- then
return the object.  This is not the same as the C meaning of `static'.
diff --git a/gcc/tree.h b/gcc/tree.h
index 69d40bb4f04..9061dafd027 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4922,6 +4922,7 @@ extern tree build_method_type (tree, tree);
 extern tree build_offset_type (tree, tree);
 extern tree build_complex_type (tree, bool named = false);
 extern tree array_type_nelts_minus_one (const_tree);
+extern tree array_type_nelts_top (tree);
 
 extern tree value_member (tree, tree);
 extern tree purpose_member (const_tree, tree);
-- 
2.45.2
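
The relationship between the two helpers touched by patches 1/3 and 2/3 can
be shown with a toy model.  This is only a sketch: struct toy_domain and the
toy_* functions are hypothetical stand-ins that use plain integers for an
array's index bounds, whereas the real functions operate on tree nodes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an array type's index domain [min, max].  */
struct toy_domain
{
  long min;
  long max;
};

/* Models array_type_nelts_minus_one (): max - min, which is the
   number of elements minus one.  */
long
toy_nelts_minus_one (struct toy_domain d)
{
  return d.max - d.min;
}

/* Models array_type_nelts_top (): the value above plus one, i.e.
   the element count of the top-level array.  */
long
toy_nelts_top (struct toy_domain d)
{
  return toy_nelts_minus_one (d) + 1;
}

/* Models one_element_array_type_p (): a one-element array is one
   whose "minus one" count is zero.  */
bool
toy_one_element_p (struct toy_domain d)
{
  return toy_nelts_minus_one (d) == 0;
}
```

For an array declared as int x[10], the domain is [0, 9], so the "minus one"
helper yields 9 and the "top" helper yields 10.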


