Re: [patch,avr,testsuite,applied] gcc.c-torture/execute/memcpy-a*.c

2024-12-01 Thread Georg-Johann Lay

On 01.12.24 at 05:45, Maciej W. Rozycki wrote:

On Sat, 30 Nov 2024, Georg-Johann Lay wrote:


The gcc.c-torture/execute/memcpy-a[1248].c tests consumed more time
than the whole rest of the test suite, just to come up with
a "memory full" even at -Os.  Skipped thusly.


As a matter of interest, is the timeout/memory exhaustion observed with
host compilation or target execution?


It happens during link, when the linker observes that the memory regions
won't fit:

.../avr/bin/ld: memcpy-a8.elf section `.text' will not fit in region `text'
.../avr/bin/ld: address 0x82c174 of memcpy-a8.elf section `.data' is not within region `data'
.../avr/bin/ld: address 0x82c17c of memcpy-a8.elf section `.bss' is not within region `data'
.../avr/bin/ld: region `text' overflowed by 245074 bytes
collect2: error: ld returned 1 exit status


From my observation the optimisation level does not matter much for
compilation times unless you use -O0, reducing the number of passes and
code transformations applied.  So whether it's at -O2 or -Os the time
consumed does not change much.

The resulting executables themselves take ~560KiB with a VAX target, the
ultimate CISC machine which also has memory move and memory set hardware
instructions contributing to size reduction, and ~1200KiB with a POWER9
target, a 64-bit RISC machine.  I suppose the sizes will lie somewhere in
between for the majority of our targets, and with smaller embedded ones
the executables may not fit in the address space available.

   Maciej


The AVR device used has a program memory of 128 KiB.  Other AVR devices
have less program memory.  The simulator assigns 32 KiB of RAM or less.
So both memories are exhausted by quite a margin.

Johann
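As a rough cross-check of the numbers above, the linker's overflow figure can be turned into the amount of code and initialized data that memcpy-a8 tried to place in the `text' region. This small C sketch assumes the `text' region is exactly the 128 KiB of program memory Johann mentions; the real region length comes from the device's linker script:

```c
#include <assert.h>

/* Assumption: the `text' region is exactly 128 KiB; the overflow value
   is copied from the ld message for memcpy-a8.elf.  */
enum
{
  TEXT_REGION_BYTES = 128 * 1024,
  TEXT_OVERFLOW_BYTES = 245074
};

/* Bytes the linker tried to place in the region: the region length
   plus the amount by which it overflowed.  */
static long
implied_text_region_bytes (long region_bytes, long overflow_bytes)
{
  assert (region_bytes > 0 && overflow_bytes >= 0);
  return region_bytes + overflow_bytes;
}
```

Under that assumption the result is 376146 bytes (about 367 KiB), almost three times the device's program memory, which fits Johann's remark that the memories are exhausted by quite a margin.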


Stage 1 patch ping (D, C++, libstdc++, toplevel, driver)

2024-12-01 Thread Arsen Arsenović
Hi,

I'd like to ping the following patches sent in stage1:

https://inbox.sourceware.org/20240414001113.1698685-1-ar...@aarsen.me
- Area: Toplevel
- Subject: Recover in-tree libiconv build support

https://inbox.sourceware.org/20240918210202.192478-1-ar...@aarsen.me
- Area: C++, libstdc++
- Subject: Support for coroutine frames with new-extended alignment

https://inbox.sourceware.org/20240903144108.417053-2-ar...@aarsen.me
- Area: D, driver
- Subject: d,ada/spec: only sub nostd{inc,lib} rather than
  nostd{inc,lib}* 
- Status: Ada part was OK'd, needs D approval.

https://inbox.sourceware.org/20240821180101.3976132-1-ar...@aarsen.me
- Area: C++
- Subject: c++: improve location of parsed RETURN_EXPRs
- Version 3
- Waiting on:
  https://inbox.sourceware.org/49471de8-ca78-493c-810d-8c2be99f3...@redhat.com/

https://inbox.sourceware.org/gcc-patches/86y14ptvdi@aarsen.me/
- Area: C++
- Subject: warn-access: ignore template parameters when matching
  operator new/delete [PR109224]
- Original version of the patch is in
  https://inbox.sourceware.org/20240802211503.3992610-2-ar...@aarsen.me
  the link above (86y14ptvdi@aarsen.me) is for the revised version
  of the patch

Thanks in advance!

Have a lovely day.
-- 
Arsen Arsenović




Re: [PATCH] x86: Add pcmpeq splitters

2024-12-01 Thread Uros Bizjak
On Sat, Nov 30, 2024 at 11:00 PM H.J. Lu  wrote:
>
> Add pcmpeq splitters to split
>
> (insn 5 3 7 2 (set (reg:V4SI 100)
> (eq:V4SI (reg:V4SI 98)
> (reg:V4SI 98))) 7910 {*sse2_eqv4si3}
>  (expr_list:REG_DEAD (reg:V4SI 98)
> (expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [
> (const_int -1 [0x]) repeated x4
> ])
> (const_vector:V4SI [
> (const_int -1 [0x]) repeated x4
> ]))
> (nil))))
>
> to
>
> (insn 8 3 7 2 (set (reg:V4SI 100)
> (const_vector:V4SI [
> (const_int -1 [0x]) repeated x4
> ])) -1
>  (nil))

IMO, middle-end should handle these cases, and I'm surprised that it
doesn't. These RTXes are not unspecs.

OTOH, splitters should handle only nonmemory operands. Memory operands
can be volatile and we shouldn't remove these at will.

Uros.
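The splitters only fire when both operands of `eq' are the same rtx, where every integer lane compares equal and the result is all-ones. The same identity can be observed from C with GCC's vector extensions; this is an illustrative sketch, not part of the patch. The float case shows that the fold is only valid for integer lanes, which matches the integer mode iterators (VI_256, V2DI, VI124_128) the splitters use:

```c
typedef int v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* For integer vectors, x == x is true in every lane, and GCC's vector
   extension encodes "true" as -1 (all bits set).  */
static v4si
int_self_eq (v4si x)
{
  return x == x;
}

/* For float vectors the same fold would be wrong: a NaN lane compares
   unequal to itself and yields 0.  The comparison result is an integer
   vector of the same width.  */
static v4si
float_self_eq (v4sf x)
{
  return x == x;
}
```

Uros's objection about memory operands is a separate issue: if an operand were a volatile memory reference, both reads would still have to be performed even though the comparison result is known, so a splitter should not silently drop them.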

> gcc/
>
> PR target/117863
> * config/i386/sse.md: Add pcmpeq splitters.
>
> gcc/testsuite/
>
> PR target/117863
> * gcc.dg/rtl/i386/vector_eq-2.c: New test.
>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/config/i386/sse.md  | 33 ++
>  gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c | 71 +
>  2 files changed, 104 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 498a42d6e1e..e2ce0781cb4 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17943,6 +17943,17 @@ (define_insn "*avx2_eq3"
> (set_attr "prefix" "vex")
> (set_attr "mode" "OI")])
>
> +(define_split
> +  [(set (match_operand:VI_256 0 "register_operand")
> +   (eq:VI_256
> + (match_operand:VI_256 1 "nonimmediate_operand")
> + (match_operand:VI_256 2 "nonimmediate_operand")))]
> +  "TARGET_AVX2 && rtx_equal_p (operands[1], operands[2])"
> +  [(set (match_dup 0) (match_dup 1))]
> +{
> +  operands[1] = CONSTM1_RTX (<MODE>mode);
> +})
> +
>  (define_insn_and_split "*avx2_pcmp3_1"
>   [(set (match_operand:VI_128_256  0 "register_operand")
> (vec_merge:VI_128_256
> @@ -18227,6 +18238,17 @@ (define_insn "*sse4_1_eqv2di3"
> (set_attr "prefix" "orig,orig,vex")
> (set_attr "mode" "TI")])
>
> +(define_split
> +  [(set (match_operand:V2DI 0 "register_operand")
> +   (eq:V2DI
> + (match_operand:V2DI 1 "vector_operand")
> + (match_operand:V2DI 2 "vector_operand")))]
> +  "TARGET_SSE4_1 && rtx_equal_p (operands[1], operands[2])"
> +  [(set (match_dup 0) (match_dup 1))]
> +{
> +  operands[1] = CONSTM1_RTX (V2DImode);
> +})
> +
>  (define_insn "*sse2_eq3"
>[(set (match_operand:VI124_128 0 "register_operand" "=x,x")
> (eq:VI124_128
> @@ -18243,6 +18265,17 @@ (define_insn "*sse2_eq3"
> (set_attr "prefix" "orig,vex")
> (set_attr "mode" "TI")])
>
> +(define_split
> +  [(set (match_operand:VI124_128 0 "register_operand")
> +   (eq:VI124_128
> + (match_operand:VI124_128 1 "vector_operand")
> + (match_operand:VI124_128 2 "vector_operand")))]
> +  "TARGET_SSE2 && rtx_equal_p (operands[1], operands[2])"
> +  [(set (match_dup 0) (match_dup 1))]
> +{
> +  operands[1] = CONSTM1_RTX (<MODE>mode);
> +})
> +
>  (define_insn "sse4_2_gtv2di3"
>[(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
> (gt:V2DI
> diff --git a/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
> new file mode 100644
> index 000..871d489b730
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
> @@ -0,0 +1,71 @@
> +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> +/* { dg-additional-options "-O2 -march=x86-64-v3" } */
> +
> +typedef int v4si __attribute__((vector_size(16)));
> +typedef int v8si __attribute__((vector_size(32)));
> +typedef int v2di __attribute__((vector_size(16)));
> +
> +v4si __RTL (startwith ("vregs1")) foo1 (void)
> +{
> +(function "foo1"
> +  (insn-chain
> +(block 2
> +  (edge-from entry (flags "FALLTHRU"))
> +  (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +  (cnote 2 NOTE_INSN_FUNCTION_BEG)
> +  (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1)])))
> +  (cinsn 4 (set (reg:V4SI <1>) (const_vector:V4SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1)])))
> +  (cinsn 5 (set (reg:V4SI <2>)
> +   (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>))))
> +  (cinsn 6 (set (reg:V4SI <3>) (reg:V4SI <2>)))
> +  (cinsn 7 (set (reg:V4SI xmm0) (reg:V4SI <3>)))
> +  (edge-to exit (flags "FALLTHRU"))
> +)
> +  )
> + (crtl (return_rtx (reg/i:V4SI xmm0)))
> +)
> +}
> +
> +v8si __RTL (startwith ("vregs1")) foo2 (void)
> +{
> +(function "foo2"
> +  (insn-chain
> +(block 2
> +  (edge-from entry (flags "FALLTHRU"))
> +  (cnote 1 [bb 2

Re: PING: [PATCH v4 1/7] Honor TARGET_PROMOTE_PROTOTYPES during RTL expand

2024-12-01 Thread H.J. Lu
On Mon, Dec 2, 2024 at 6:15 AM Jeff Law  wrote:
>
>
>
> On 11/27/24 3:34 PM, H.J. Lu wrote:
> > On Thu, Nov 21, 2024, 2:02 PM H.J. Lu wrote:
> >
> > Promote integer arguments smaller than int if TARGET_PROMOTE_PROTOTYPES
> > returns true.
> >
> >  PR middle-end/14907
> >  * calls.c (initialize_argument_information): Promote small integer
> >  arguments if TARGET_PROMOTE_PROTOTYPES returns true.
> This doesn't look right.  Promotions are primarily driven by the target
> files, in particular TARGET_PROMOTE_FUNCTION_MODE.
>
> PROMOTE_PROTOTYPES is more of a language front-end hook and it doesn't
> seem appropriate to be testing it in calls.cc.

TARGET_PROMOTE_PROTOTYPES isn't used by all frontends since
it isn't required by the ABI.  It is an option to extend the upper bits in
the 32-bit outgoing integer slots when the argument type is smaller
than int.   Since it is for outgoing arguments, it is appropriate to do it
when expanding the call.

>
>
> Jeff

-- 
H.J.
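To make concrete what "extending the upper bits" of a 32-bit slot means: a signed narrow argument is sign-extended and an unsigned one is zero-extended. The helpers below are hypothetical and only show the resulting bit patterns in C; they are not how calls.cc performs the promotion:

```c
#include <stdint.h>

/* Hypothetical helpers (not GCC code) showing the 32-bit slot value a
   caller would pass when it widens a narrow argument.  */

/* Signed types are sign-extended: the upper bits copy the sign bit.  */
static uint32_t
slot_from_signed_char (signed char c)
{
  return (uint32_t) (int32_t) c;
}

static uint32_t
slot_from_short (short s)
{
  return (uint32_t) (int32_t) s;
}

/* Unsigned types are zero-extended: the upper bits are cleared.  */
static uint32_t
slot_from_unsigned_char (unsigned char c)
{
  return (uint32_t) c;
}
```

If neither side is obliged to perform this extension, a callee cannot assume anything about the upper bits of the slot and has to extend the value itself.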


Re: [PATCH] Add new hardreg PRE pass

2024-12-01 Thread Andrew Pinski
On Sun, Dec 1, 2024 at 2:36 PM Jeff Law  wrote:
>
>
>
> On 11/12/24 3:42 PM, Richard Sandiford wrote:
>
> >> +
> >> +bool
> >> +pass_hardreg_pre::gate (function *fun)
> >> +{
> >> +#ifdef HARDREG_PRE_REGNOS
> >> +  return optimize > 0
> >> +&& !fun->calls_setjmp;
> >
> > Huh.  It looks like these setjmp exclusions go back to 1998.  I wouldn't
> > have expected them to be needed now, since the modern cfg framework
> > should represent setjmp correctly.  Jeff, do you agree?  I'll try
> > removing them and see what breaks...
> So back in '98 our CFG wasn't accurate (IIRC this code was a lot of what
> motivated making the CFG available before flow).  In addition to not
> being accurate, I don't think we had all of rth's bits to kill
> expressions on abnormal edges which saves us from trying to split an
> abnormal critical edge.
>
> I'd think that if our CFG is accurately representing that abnormal edge
> that we'd be OK these days.  But it's been a long time and there may
> always be something lurking.

I think for RTL CFG we still have issues with setjmp;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57067  .

Thanks,
Andrew

>
> Jeff
>
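For context on the `calls_setjmp' gate: the second return from setjmp is an abnormal control-flow edge, and ISO C (C11 7.13.2.1) only guarantees the value of a local variable modified between setjmp and longjmp if it is volatile-qualified. A minimal sketch of that pattern, independent of GCC internals:

```c
#include <setjmp.h>

static jmp_buf env;

static void
bail_out (void)
{
  longjmp (env, 1);
}

/* Returns how many times the body ran before longjmp came back.
   `count' is volatile because it is modified between setjmp and
   longjmp; without the qualifier its value after the second return
   from setjmp would be indeterminate.  */
static int
run_once_then_jump (void)
{
  volatile int count = 0;

  if (setjmp (env) == 0)
    {
      /* First return from setjmp.  */
      count++;
      bail_out ();
    }
  /* Second return from setjmp, reached through the abnormal edge.  */
  return count;
}
```

A pass that hoists or reuses values must model the edge from the longjmp back into the setjmp call correctly, which is what an accurate RTL CFG is expected to provide.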


Re: [PATCH] Exclude the last named argument for non-variadic function

2024-12-01 Thread H.J. Lu
On Mon, Dec 2, 2024 at 6:20 AM Jeff Law  wrote:
>
>
>
> On 11/1/24 12:53 AM, H.J. Lu wrote:
> > expand_call has
> >
> >   /* Now possibly adjust the number of named args.
> >   Normally, don't include the last named arg if anonymous args follow.
> >   We do include the last named arg if
> >   targetm.calls.strict_argument_naming() returns nonzero.
> >   (If no anonymous args follow, the result of list_length is actually
> >   one too large.  This is harmless.)
> >
> >   If targetm.calls.pretend_outgoing_varargs_named() returns
> >   nonzero, and targetm.calls.strict_argument_naming() returns zero,
> >   this machine will be able to place unnamed args that were passed
> >   in registers into the stack.  So treat all args as named.  This
> >   allows the insns emitting for a specific argument list to be
> >   independent of the function declaration.
> >
> >   If targetm.calls.pretend_outgoing_varargs_named() returns zero,
> >   we do not have any reliable way to pass unnamed args in
> >   registers, so we must force them into memory.  */
> >
> >if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> >&& targetm.calls.strict_argument_naming (args_so_far))
> >  ;
> >
> > For non-variadic function, the number of named args is one too large.
> > Don't include the last named argument for non-variadic function so that
> > the accurate number of named args can be used.
> >
> > PR middle-end/117387
> > * calls.cc (expand_call): Don't include the last named argument
> > for non-variadic function.
> I'm not comfortable changing this in stage3.  I realize the patch was
> submitted while we were still in stage1, but just barely.  I'd like to
> see this resubmitted early in gcc-16 stage1 and with testing beyond just
> x86 since there's potentially ABI implications here.
>
> jeff

I am dropping this patch since it is no longer needed.

Thanks.

-- 
H.J.
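In the terminology of the expand_call comment, the "last named arg" of a variadic function is the final parameter before the `...'; the arguments after it are the anonymous ones. A small stdarg sketch (the function is purely illustrative):

```c
#include <stdarg.h>

/* `count' is the last named argument; everything after it is
   anonymous and has to be fetched with va_arg.  */
static int
sum_ints (int count, ...)
{
  va_list ap;
  int total = 0;

  va_start (ap, count);   /* va_start names the last named argument.  */
  for (int i = 0; i < count; i++)
    total += va_arg (ap, int);
  va_end (ap);
  return total;
}
```

For a call such as `sum_ints (3, 1, 2, 3)` there is one named argument and three anonymous ones.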


Re: [PATCH] Add new hardreg PRE pass

2024-12-01 Thread Jeff Law




On 11/13/24 12:03 PM, Richard Sandiford wrote:

Andrew Carlotti  writes:




I think this is mostly my ignorance of the code, and would be obvious
if I tried it out locally, but: why do we need to do this after
computing the kills bitmap?  For mode-switching, the kills bitmap
is the inverse of the transparency bitmap, but it sounds like here
you want the kills bitmap to be more selective.


I had to work through the entire LCM algorithm before I understood how these
bitmaps were being used (and I intend to update the documentation to make this
more obvious).  In summary, the kills and avail bitmaps indicate whether the
result of an earlier expression is still available and up-to-date, whereas the
transparent and anticipatable bitmaps indicate whether a later assignment can
be moved earlier.


Right.  That part is pretty standard.


For the existing hoist/PRE passes these are the same - this is because new
pseudoregs are used to hold the result of relocated computations, so the only
obstruction is if the values of the inputs to the expression are changed.

For the new hardreg PRE pass the bitmaps are different in one case - if the
content of the hardreg is used, then the result of the expression remains
available after the use, but it isn't possible to anticipate a future
assignment by moving that assignment before the earlier use.


But what I meant was: doesn't an assignment to the hard register block
movement/reuse in both directions?  We can't move R:=X up through a block B
that requires R==Y (so X is not transparent in B).  We also can't
reuse R:=X after a block that requires R==Y (because B kills X).

That's why I was expecting the kill set to be updated too, not just the
transparency set.

In general, yes, I would expect transparency and kill to be inverses of
each other.


I suspect (but would have to do a fair amount of archaeology to be sure) 
that we probably had kills computed for some other problem (classic gcse 
 or const/copy propagation perhaps) and we just inverted it to work 
with the LCM algorithm which wants to query transparency.  Flipping 
kills once into transparency seems better than using kills and having to 
flip it every time we visit a block during the global propagation step.


jeff
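Jeff's point about deriving transparency from kills once, rather than on every visit during the global propagation step, amounts to a single pass over the per-block bitmaps. A minimal sketch, assuming at most 64 expressions so each set fits in a uint64_t (GCC's LCM code uses sbitmaps; the struct and function names here are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block sets, one bit per expression.  */
struct block_sets
{
  uint64_t kill;   /* expressions whose inputs change in the block */
  uint64_t transp; /* expressions that can move through the block */
};

/* Derive transparency from kills once, up front.  UNIVERSE masks off
   bits beyond the number of expressions being tracked.  */
static void
compute_transp_from_kill (struct block_sets *blocks, size_t n_blocks,
                          uint64_t universe)
{
  for (size_t b = 0; b < n_blocks; b++)
    blocks[b].transp = ~blocks[b].kill & universe;
}
```

For the hardreg PRE case Andrew describes, a block that merely uses the hard register would additionally have its bit cleared in `transp' without being set in `kill', so the two sets would no longer be exact inverses.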


Re: PING: [PATCH v4 1/7] Honor TARGET_PROMOTE_PROTOTYPES during RTL expand

2024-12-01 Thread H.J. Lu
On Mon, Dec 2, 2024 at 6:39 AM H.J. Lu  wrote:
>
> On Mon, Dec 2, 2024 at 6:15 AM Jeff Law  wrote:
> >
> >
> >
> > On 11/27/24 3:34 PM, H.J. Lu wrote:
> > > On Thu, Nov 21, 2024, 2:02 PM H.J. Lu wrote:
> > >
> > > Promote integer arguments smaller than int if TARGET_PROMOTE_PROTOTYPES
> > > returns true.
> > >
> > >  PR middle-end/14907
> > >  * calls.c (initialize_argument_information): Promote small integer
> > >  arguments if TARGET_PROMOTE_PROTOTYPES returns true.
> > This doesn't look right.  Promotions are primarily driven by the target
> > files, in particular TARGET_PROMOTE_FUNCTION_MODE.
> >
> > PROMOTE_PROTOTYPES is more of a language front-end hook and it doesn't
> > seem appropriate to be testing it in calls.cc.
>
> TARGET_PROMOTE_PROTOTYPES isn't used by all frontends since
> it isn't required by the ABI.  It is an option to extend the upper bits in
> the 32-bit outgoing integer slots when the argument type is smaller
> than int.   Since it is for outgoing arguments, it is appropriate to do it
> when expanding the call.

TARGET_PROMOTE_FUNCTION_MODE is done after
TARGET_PROMOTE_PROTOTYPES in calls.cc:

  unsignedp = TYPE_UNSIGNED (type);
  arg.type = type;
  arg.mode
    = promote_function_mode (type, TYPE_MODE (type), &unsignedp,
                             fndecl ? TREE_TYPE (fndecl) : fntype, 0);

where type is promoted by TARGET_PROMOTE_PROTOTYPES if
it returns true.

>
> >
> >
> > Jeff

>
> --
> H.J.



--
H.J.


[PATCH v3 1/7] middle-end: Handle resized PHI nodes in loop_version()

2024-12-01 Thread Lewis Hyatt
This patch is my new way to handle what was previously done in v2 patch
04/14, discussed at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669527.html

The only places I have found running into trouble with reallocated PHI nodes
(at least, the only places revealed by changing the size of location_t) are
those which remember a gphi object before a call to loop_version() and then
try to use it after. I fixed three affected call sites in this new patch.

-- >8 --

While testing upcoming support for 64-bit location_t, I came across some
test failures on sparc (32-bit) that trigger when location_t is changed to
be 64-bit. The reason is that several call sites that make use of
loop_version() for performing loop optimizations assume that a gphi*
obtained prior to calling loop_version() will remain valid afterwards, but
this is not the case for a PHI that needs to be resized. This doesn't
usually happen, because PHI nodes normally have room for at least 4
arguments, which is typically more than needed, but it is not guaranteed.

Fix the affected callers by avoiding the assumption that a PHI node pointer
remains valid. For most cases, this is done by remembering instead the
gphi->result pointer, which contains a pointer back to the PHI node that is
kept up to date when the PHI is moved to a new address.
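The fix relies on a back-pointer that is updated whenever a PHI moves: the SSA name's defining statement. The following C sketch models that idea with hypothetical structures (they are not GCC's gphi or SSA_NAME types): growing the node may move it, so callers re-derive it from the name instead of keeping the old pointer:

```c
#include <stdlib.h>
#include <string.h>

struct phi;

/* Hypothetical stand-in for an SSA name: records its defining PHI,
   similar in spirit to SSA_NAME_DEF_STMT.  */
struct ssa_name
{
  struct phi *def_stmt;
};

/* Hypothetical PHI node with a variable number of argument slots.  */
struct phi
{
  struct ssa_name *result;
  unsigned capacity;   /* argument slots allocated */
  unsigned nargs;      /* argument slots in use */
  int args[];
};

static struct phi *
phi_create (struct ssa_name *result, unsigned capacity)
{
  struct phi *p = malloc (sizeof *p + capacity * sizeof p->args[0]);
  if (!p)
    abort ();
  p->result = result;
  p->capacity = capacity;
  p->nargs = 0;
  result->def_stmt = p;   /* keep the back-pointer current */
  return p;
}

/* Append an argument, moving the PHI to a bigger allocation when full.
   Any pointer to the old node dangles afterwards; only the SSA name's
   back-pointer (and the return value) are updated.  */
static struct phi *
phi_add_arg (struct phi *p, int arg)
{
  if (p->nargs == p->capacity)
    {
      unsigned new_cap = p->capacity ? p->capacity * 2 : 4;
      struct phi *bigger = phi_create (p->result, new_cap);
      memcpy (bigger->args, p->args, p->nargs * sizeof p->args[0]);
      bigger->nargs = p->nargs;
      free (p);
      p = bigger;
    }
  p->args[p->nargs++] = arg;
  return p;
}
```

Code that must survive a resize keeps the `struct ssa_name *' and reads `def_stmt' when it needs the node, which mirrors the patch storing the PHI result rather than the PHI itself.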

gcc/ChangeLog:

* tree-parloops.cc (struct reduction_info): Store the result of the
reduction PHI rather than the PHI itself.
(reduction_info::reduc_phi): New member function.
(reduction_hasher::equal): Adapt to the change in struct reduction_info.
(reduction_phi): Likewise.
(initialize_reductions): Likewise.
(create_call_for_reduction_1): Likewise.
(transform_to_exit_first_loop_alt): Likewise.
(transform_to_exit_first_loop): Likewise.
(build_new_reduction): Likewise.
(set_reduc_phi_uids): Likewise.
(try_create_reduction_list): Likewise.
* tree-ssa-loop-split.cc (split_loop): Remember the PHI result
variable so that the PHI can be found in case it is resized and moved
to a new address.
* tree-vect-loop-manip.cc (vect_loop_versioning): After calling
loop_version(), fix up stored PHI pointers in case they have
changed.
* tree-vectorizer.cc (vec_info::resync_stmt_addr): New function.
* tree-vectorizer.h (vec_info::resync_stmt_addr): Declare.
---
 gcc/tree-parloops.cc| 40 +++--
 gcc/tree-ssa-loop-split.cc  |  7 +++
 gcc/tree-vect-loop-manip.cc |  8 
 gcc/tree-vectorizer.cc  | 20 +++
 gcc/tree-vectorizer.h   |  1 +
 5 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index 13d8e84bc8f..8427c287a6a 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -895,7 +895,7 @@ parloops_force_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info
 struct reduction_info
 {
   gimple *reduc_stmt;  /* reduction statement.  */
-  gimple *reduc_phi;   /* The phi node defining the reduction.  */
+  tree reduc_phi_name; /* The result of the phi node defining the reduction.  */
   enum tree_code reduction_code;/* code for the reduction operation.  */
   unsigned reduc_version;  /* SSA_NAME_VERSION of original reduc_phi
   result.  */
@@ -910,6 +910,12 @@ struct reduction_info
   will be passed to the atomic operation.  Represents
   the local result each thread computed for the reduction
   operation.  */
+
+  gphi *
+  reduc_phi () const
+  {
+    return as_a <gphi *> (SSA_NAME_DEF_STMT (reduc_phi_name));
+  }
 };
 
 /* Reduction info hashtable helpers.  */
@@ -925,7 +931,7 @@ struct reduction_hasher : free_ptr_hash 
 inline bool
 reduction_hasher::equal (const reduction_info *a, const reduction_info *b)
 {
-  return (a->reduc_phi == b->reduc_phi);
+  return (a->reduc_phi_name == b->reduc_phi_name);
 }
 
 inline hashval_t
@@ -949,10 +955,10 @@ reduction_phi (reduction_info_table_type *reduction_list, gimple *phi)
   || gimple_uid (phi) == 0)
 return NULL;
 
-  tmpred.reduc_phi = phi;
+  tmpred.reduc_phi_name = gimple_phi_result (phi);
   tmpred.reduc_version = gimple_uid (phi);
   red = reduction_list->find (&tmpred);
-  gcc_assert (red == NULL || red->reduc_phi == phi);
+  gcc_assert (red == NULL || red->reduc_phi () == phi);
 
   return red;
 }
@@ -1294,7 +1300,7 @@ initialize_reductions (reduction_info **slot, class loop *loop)
  from the preheader with the reduction initialization value.  */
 
   /* Initialize the reduction.  */
-  type = TREE_TYPE (PHI_RESULT (reduc->reduc_phi));
+  type = TREE_TYPE (reduc->reduc_phi_name);
   init = omp_reduction_init_op (gimple_location (reduc->reduc_stmt),
reduc->reduction_code, type

Re: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2024-12-01 Thread Hongtao Liu
On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu  wrote:
>
> For all different modes of all 0s/1s vectors, we can use the single widest
> all 0s/1s vector register for all 0s/1s vector uses in the whole function.
> Add a pass to generate a single widest all 0s/1s vector set instruction at
> entry of the nearest common dominator for basic blocks with all 0s/1s
> vector uses.  On Linux/x86-64, in cc1plus, this patch reduces the number
> of vector xor instructions from 4803 to 4714 and pcmpeq instructions from
> 144 to 142.
I'm worried that it will affect RA rematerialisation and thus increase
register pressure.  Can we defer it to GCC 16?
>
> This change causes a regression:
>
> FAIL: gcc.dg/rtl/x86_64/vector_eq.c
>
> without the fix for
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117863
The fix for this PR looks lower-risk and suitable for GCC 15.
>
> NB: PR target/92080 and PR target/117839 aren't same.  PR target/117839
> is for vectors of all 0s and all 1s with different sizes and different
> components.  PR target/92080 is for broadcast of the same component to
> different vector sizes.  This patch covers only all 0s and all 1s cases
> of PR target/92080.
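H.J.'s pass places the single vector set at the nearest common dominator of all blocks that use the constant (nearest_common_dominator_for_set in the patch). As a standalone illustration, here is a simple array-based way to compute that block, assuming the immediate dominators and dominator-tree depths are already known; GCC's own dominance code is implemented differently, and the patch additionally moves the set out of any enclosing loop:

```c
#include <stddef.h>

/* Nearest common dominator of blocks A and B.  idom[b] is the
   immediate dominator of b (the entry block is its own idom) and
   depth[b] is b's depth in the dominator tree.  */
static int
nca2 (const int *idom, const int *depth, int a, int b)
{
  /* Lift the deeper block until both are at the same depth.  */
  while (depth[a] > depth[b])
    a = idom[a];
  while (depth[b] > depth[a])
    b = idom[b];
  /* Then walk both up in lockstep until they meet.  */
  while (a != b)
    {
      a = idom[a];
      b = idom[b];
    }
  return a;
}

/* Nearest common dominator of all blocks in BBS[0..N-1], N >= 1.  */
static int
nearest_common_dominator_of (const int *idom, const int *depth,
                             const int *bbs, size_t n)
{
  int result = bbs[0];
  for (size_t i = 1; i < n; i++)
    result = nca2 (idom, depth, result, bbs[i]);
  return result;
}
```

Folding the set pairwise works because the common dominator of a set equals the common dominator of any one member and the common dominator of the rest.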
>
> gcc/
>
> PR target/92080
> PR target/117839
> * config/i386/i386-features.cc (ix86_rrvl_gate): New.
> (ix86_place_single_vector_set): Likewise.
> (ix86_get_vector_load_mode): Likewise.
> (remove_redundant_vector_load): Likewise.
> (pass_data_remove_redundant_vector_load): Likewise.
> (pass_remove_redundant_vector_load): Likewise.
> (make_pass_remove_redundant_vector_load): Likewise.
> * config/i386/i386-passes.def: Add
> pass_remove_redundant_vector_load after
> pass_remove_partial_avx_dependency.
> * config/i386/i386-protos.h
> (make_pass_remove_redundant_vector_load): New.
>
> gcc/testsuite/
>
> PR target/92080
> PR target/117839
> * gcc.target/i386/pr117839-1a.c: New test.
> * gcc.target/i386/pr117839-1b.c: Likewise.
> * gcc.target/i386/pr117839-2.c: Likewise.
> * gcc.target/i386/pr92080-1.c: Likewise.
> * gcc.target/i386/pr92080-2.c: Likewise.
> * gcc.target/i386/pr92080-3.c: Likewise.
>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/config/i386/i386-features.cc| 308 
>  gcc/config/i386/i386-passes.def |   1 +
>  gcc/config/i386/i386-protos.h   |   2 +
>  gcc/testsuite/gcc.target/i386/pr117839-1a.c |  35 +++
>  gcc/testsuite/gcc.target/i386/pr117839-1b.c |   5 +
>  gcc/testsuite/gcc.target/i386/pr117839-2.c  |  40 +++
>  gcc/testsuite/gcc.target/i386/pr92080-1.c   |  54 
>  gcc/testsuite/gcc.target/i386/pr92080-2.c   |  59 
>  gcc/testsuite/gcc.target/i386/pr92080-3.c   |  48 +++
>  9 files changed, 552 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr117839-1a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr117839-1b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr117839-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-3.c
>
> diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
> index 003b003e09c..7d8d260750d 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -3288,6 +3288,314 @@ make_pass_remove_partial_avx_dependency (gcc::context *ctxt)
>return new pass_remove_partial_avx_dependency (ctxt);
>  }
>
> +static bool
> +ix86_rrvl_gate ()
> +{
> +  return (TARGET_SSE2
> + && optimize
> + && optimize_function_for_speed_p (cfun));
> +}
> +
> +/* Generate a vector set, DEST = SRC, at entry of the nearest dominator
> +   for basic block map BBS, which is in the fake loop that contains the
> +   whole function, so that there is only a single vector set in the
> +   whole function.   */
> +
> +static void
> +ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs)
> +{
> +  basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
> +  while (bb->loop_father->latch
> +!= EXIT_BLOCK_PTR_FOR_FN (cfun))
> +bb = get_immediate_dominator (CDI_DOMINATORS,
> + bb->loop_father->header);
> +
> +  rtx set = gen_rtx_SET (dest, src);
> +
> +  rtx_insn *insn = BB_HEAD (bb);
> +  while (insn && !NONDEBUG_INSN_P (insn))
> +{
> +  if (insn == BB_END (bb))
> +   {
> + insn = NULL;
> + break;
> +   }
> +  insn = NEXT_INSN (insn);
> +}
> +
> +  rtx_insn *set_insn;
> +  if (insn == BB_HEAD (bb))
> +set_insn = emit_insn_before (set, insn);
> +  else
> +set_insn = emit_insn_after (set,
> +   insn ? PREV_INSN (insn) : BB_END (bb));
> +  df_insn_rescan (set_insn);
> +}
> +
> +/* Return a machine mode suitable for vector SIZE.  */
> +

Re: [patch,avr,testsuite,applied] gcc.c-torture/execute/memcpy-a*.c

2024-12-01 Thread Dimitar Dimitrov
On Sun, Dec 01, 2024 at 12:32:55PM +0100, Georg-Johann Lay wrote:
> On 01.12.24 at 05:45, Maciej W. Rozycki wrote:
> > On Sat, 30 Nov 2024, Georg-Johann Lay wrote:
> > 
> > > The gcc.c-torture/execute/memcpy-a[1248].c tests consumed more time
> > > than the whole rest of the test suite, just to come up with
> > > a "memory full" even at -Os.  Skipped thusly.
> > 
> > As a matter of interest, is the timeout/memory exhaustion observed with
> > host compilation or target execution?
> 
> It happens during link, when the linker observes that the memory regions
> won't fit:
> 
> .../avr/bin/ld: memcpy-a8.elf section `.text' will not fit in region `text'
> .../avr/bin/ld: address 0x82c174 of memcpy-a8.elf section `.data' is not
> within region `data'
> .../avr/bin/ld: address 0x82c17c of memcpy-a8.elf section `.bss' is not
> within region `data'
> .../avr/bin/ld: region `text' overflowed by 245074 bytes
> collect2: error: ld returned 1 exit status

The memory overflow should be caught by ${tool}_check_unsupported_p.
Even without this patch, the testsuite should mark the tests as
UNSUPPORTED and not FAIL for avr.

Compilation takes much host time for other targets too.
On native x86_64-pc-linux-gnu:
  $ time make check-gcc-c RUNTESTFLAGS="execute.exp=memcpy-a*.c"
  # of expected passes  56

  real  8m37,778s
  user  8m29,895s
  sys   0m5,805s

Should these tests instead be gated by "run_expensive_tests"?

Regards
Dimitar


Re: [PATCH v2] phi-opt: Add missed optimization for "(cond | (a != b)) ? b : a"

2024-12-01 Thread Jeff Law



[ Thanks for your patience.  It's been a long month ;-) ]


On 11/4/24 6:23 AM, Jovan Vukic wrote:

On 11/02/24, Jeff Law wrote:

This is well understood.  The key in my mind is that for AND we always
select the FALSE arm.  For IOR we always select the TRUE arm.


Yes, I agree.


   e = (code == NE_EXPR ? true_edge : false_edge);

If I understand everything correctly your assertion is that we'll only
get here for AND/EQ_EXPR and IOR/NE_EXPR.  There's no way to get here
for AND/NE_EXPR or IOR/EQ_EXPR?


If we examine the patch step by step, we can see that the function
rhs_is_fed_for_value_replacement enters the if block exclusively for
the combinations BIT_AND_EXPR/EQ_EXPR and BIT_IOR_EXPR/NE_EXPR. It is
only at this point that it returns true and sets the value of *code. This is
evident in the code:

Thanks.  I was just looking for a yes/no to verify that my understanding
was correct.  It's been ~20 years since I looked at this code in any 
significant way.


Your comments and re-reviewing the patch have addressed my concerns.





Also, Mr. Pinski left a comment
(https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667258.html)
and offered some suggestions about the patch.

He also mentioned that he is working on integrating the affected code into
match-and-simplify, so the rewrite is on the way. We can either move forward
with this patch or stop it. Either way, I’m fine with any decision made.

I'd prefer to move forward.  We're definitely moving a lot of
transformations into the match.pd framework and this one probably fits 
in there reasonably well.


It'll make a bit more work for Andrew as part of his desire to move this 
stuff into match.pd, but I don't see his work landing in this cycle and 
he's indicated he doesn't have a problem with your patch going forward 
at this time.



I'll push it into the tree after some additional light testing, given
it's been a month since the last discussion on this patch.


jeff
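Jeff's rule can be confirmed by case analysis: with IOR, if a != b the condition is true and the result is b; if a == b, both arms are equal, so the result is b anyway. AND with EQ_EXPR is the mirror image and always yields the false arm. A brute-force C check over a small range (the AND variant shown is an illustration of the BIT_AND_EXPR/EQ_EXPR case, not necessarily the exact pattern the patch matches):

```c
/* Claimed to equal b for all inputs: IOR always selects the TRUE arm.  */
static int
ior_form (int cond, int a, int b)
{
  return (cond | (a != b)) ? b : a;
}

/* Claimed to equal b for all inputs: AND always selects the FALSE arm.  */
static int
and_form (int cond, int a, int b)
{
  return (cond & (a == b)) ? a : b;
}

/* Returns 1 if both identities hold for every cond in {0, 1} and
   every a, b in [-RANGE, RANGE].  */
static int
identities_hold (int range)
{
  for (int cond = 0; cond <= 1; cond++)
    for (int a = -range; a <= range; a++)
      for (int b = -range; b <= range; b++)
        if (ior_form (cond, a, b) != b || and_form (cond, a, b) != b)
          return 0;
  return 1;
}
```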


[committed] testsuite: Silence gcc.dg/pr117806.c for default_packed

2024-12-01 Thread Dimitar Dimitrov
On default_packed targets like PRU, spurious warnings are emitted:
  ...workspace/gcc/gcc/testsuite/gcc.dg/pr117806.c:5:3: warning: 'packed' attribute ignored for field of type 'double' [-Wattributes]

Fix by annotating the excess warnings for default_packed targets.

Pushed to trunk as obvious.

gcc/testsuite/ChangeLog:

* gcc.dg/pr117806.c: Test can spill excess
errors for default_packed targets.

Cc: Martin Uecker 
Signed-off-by: Dimitar Dimitrov 
---
 gcc/testsuite/gcc.dg/pr117806.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/pr117806.c b/gcc/testsuite/gcc.dg/pr117806.c
index bc2c8c665e7..a01278cdc15 100644
--- a/gcc/testsuite/gcc.dg/pr117806.c
+++ b/gcc/testsuite/gcc.dg/pr117806.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-excess-errors "warnings about ignored 'packed' attribute" { target default_packed } } */
 /* { dg-options "-std=c23" } */
 
 struct Test {
-- 
2.47.1



[committed] Patches 8-12 of Mariam & Matevos's CRC optimization work

2024-12-01 Thread Jeff Law

[ Resent with compressed patches to get under the 400k limit. ]

This is the bulk of Mariam's & Matevos's work on CRC optimization.  It 
covers detection, validation and final transformation of a bitwise CRC 
loop into an IFN.  Once in IFN form the expansion code from earlier this 
week can expand it into a table lookup, carryless multiply sequence or a 
CRC instruction.
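For reference, this is the kind of loop the series targets: a bitwise CRC that processes one bit per inner iteration, next to a table-driven version that computes the same result with one lookup per byte. The sketch uses the common reflected CRC-32 polynomial 0xEDB88320; whether a particular loop shape is recognized depends on the checks in the new pass:

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0xEDB88320u  /* reflected CRC-32 polynomial */

/* Bitwise CRC-32: shift/xor one bit at a time.  */
static uint32_t
crc32_bitwise (const unsigned char *data, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; bit++)
        if (crc & 1u)
          crc = (crc >> 1) ^ CRC32_POLY;
        else
          crc >>= 1;
    }
  return ~crc;
}

/* Table-driven equivalent: one lookup per byte.  */
static uint32_t crc_table[256];

static void
crc32_init_table (void)
{
  for (uint32_t n = 0; n < 256; n++)
    {
      uint32_t c = n;
      for (int bit = 0; bit < 8; bit++)
        c = (c & 1u) ? (c >> 1) ^ CRC32_POLY : c >> 1;
      crc_table[n] = c;
    }
}

static uint32_t
crc32_table (const unsigned char *data, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    crc = (crc >> 8) ^ crc_table[(crc ^ data[i]) & 0xFFu];
  return ~crc;
}
```

Both functions return 0xCBF43926 for the ASCII string "123456789", the standard CRC-32 check value; crc32_init_table must run before crc32_table is used.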


I've spot-checked each of the patches in this push to verify that the tree
remains bisectable at each point, and made one trivial adjustment as a result.


The first patch identifies loops that may compute a CRC; it does a 
series of relatively inexpensive tests to identify candidate loops based 
on the operations inside the loop.


The second patch introduces a symbolic execution engine that can do 
bitwise tracking of state.  We use this capability to verify that a 
candidate loop is actually a CRC computation.


The third patch adds code to use the symbolic execution engine to 
validate that a candidate loop is a CRC computation suitable for 
optimization into a table lookup, carryless multiply sequence or CRC 
instruction.


The fourth patch actually replaces a CRC loop with an IFN.

The fifth patch is the generic bits of the testsuite.  This includes a 
combination of tests which are and are not CRC computations.  It 
includes some dump scanning tests as well as execution tests.  With the 
exception of two dump scanning tests (which fail due to early IL 
differences on some embedded targets), the tests should be working 
consistently on all our platforms.


As I mentioned, I've tested these patches one at a time to ensure trunk 
bisectability in my tester.   They've also been bootstrapped and 
regression tested on x86_64.


There are still some RISC-V tests to add now that the main bits are in.
And there are aarch64 & x86 bits to take care of as well.  But the major
work is committed.  I'll let any dust settle before I take care of these 
final items.


Thanks Mariam & Matevos!

Jeff

P1.gz
Description: application/gzip


P2.gz
Description: application/gzip


P3.gz
Description: application/gzip


P5.gz
Description: application/gzip


P4.gz
Description: application/gzip


Re: [PING] [contrib] validate_failures.py: fix python 3.12 escape sequence warnings

2024-12-01 Thread Jeff Law




On 11/22/24 11:19 AM, Sam James wrote:

Jeff Law  writes:


On 6/9/24 5:45 AM, Gabi Falk wrote:

Hi,
On Sat, Jun 08, 2024 at 03:34:02PM -0600, Jeff Law wrote:

On 5/14/24 8:12 AM, Gabi Falk wrote:

Hi,

This one still needs review:

https://inbox.sourceware.org/gcc-patches/20240415233833.104460-1-gabif...@gmx.com/

I think I just ACK'd an equivalent patch from someone else this week.

Looks like it hasn't been merged yet, and I couldn't find it in the
mailing list archive.
Anyway, I hope either one gets merged soon. :)

I'm sure it will.  The variant I asked is from someone with commit
privs, so they'll push it to the tree when convenient for them.


I still don't see that change having landed.
It landed, but fixed the same class of problem elsewhere in the contrib 
scripts (check_GNU_style_lib).


Gabi's patch is fine for the trunk.  I'll go ahead and install it.

jeff


Re: [patch,avr,testsuite,applied] gcc.c-torture/execute/memcpy-a*.c

2024-12-01 Thread Georg-Johann Lay

On 01.12.24 at 19:15, Dimitar Dimitrov wrote:

On Sun, Dec 01, 2024 at 12:32:55PM +0100, Georg-Johann Lay wrote:

On 01.12.24 at 05:45, Maciej W. Rozycki wrote:

On Sat, 30 Nov 2024, Georg-Johann Lay wrote:

The gcc.c-torture/execute/memcpy-a[1248].c tests consumed more time
than the whole rest of the test suite, just to come up with
a "memory full" even at -Os.  Skipped thusly.

As a matter of interest, is the timeout/memory exhaustion observed with
host compilation or target execution?

It happens during link, when the linker observes that the memory regions
won't fit:

.../avr/bin/ld: memcpy-a8.elf section `.text' will not fit in region `text'
.../avr/bin/ld: address 0x82c174 of memcpy-a8.elf section `.data' is not within region `data'
.../avr/bin/ld: address 0x82c17c of memcpy-a8.elf section `.bss' is not within region `data'
.../avr/bin/ld: region `text' overflowed by 245074 bytes
collect2: error: ld returned 1 exit status


The memory overflow should be caught by ${tool}_check_unsupported_p.
Even without this patch, the testsuite should mark the tests as
UNSUPPORTED and not FAIL for avr.


They ARE being reported as UNSUPPORTED.  But it takes ~40m to arrive at
these conclusions for all 5 tests.  A whole testsuite run takes
60m...70m, so adding 40m for a single test just to see one UNSUPPORTED
rushing by each minute is no fun.  It's known in advance that these
tests are pointless on AVR.


Compilation takes much host time for other targets too.


Yes, but with the difference that the tests ARE being conducted there.


On native x86_64-pc-linux-gnu:
   $ time make check-gcc-c RUNTESTFLAGS="execute.exp=memcpy-a*.c"
   # of expected passes 56

   real 8m37,778s
   user 8m29,895s
   sys  0m5,805s

Should these tests instead be gated by "run_expensive_tests"?


"in addition" instead of "instead" would be fine for me.

Though I don't know anything about when a test on current hardware is
deemed "expensive".  For AVR, they are pointless *and* are consuming
an offensive amount of time (otherwise I wouldn't care; there are many
other tests that are beyond the memory constraints of AVRs).

Johann


Regards
Dimitar


Re: [PATCH] defer test for limits.h existence to runtime [PR80677]

2024-12-01 Thread Jeff Law




On 4/30/24 12:45 PM, Helmut Grohne wrote:

The definition of LIMITS_H_TEST evaluates its existence in
BUILD_SYSTEM_HEADER_DIR, but we'd actually need it to check a target
version. Hence this check occasionally produces misdetections when build
and target differ. In some cases such as cygming, the header is only
installed after performing the build. Instead of resolving these
situations by guessing, defer the test to the time of use and check for
the header using __has_include_next which will use the correct include
search path.

2024-04-30  Helmut Grohne  

PR bootstrap/80677
 * gcc/limitx.h: Only #include syslimits.h when another 
  exists.
 * gcc/limity.h: Only #include limits.h when another 
  exists.
 * gcc/Makefile.in: Delete LIMITS_H_TEST default and always wrap
  limits.h with limitx.h and limity.h.
* Makefile.tpl: Drop forwarding of LIMITS_H_TEST
* Makefile.in: Regenerate.
* gcc/config/i386/t-cygming: Delete unused LIMITS_H_TEST.
* gcc/config/t-rtems: Likewise.
* gcc/config/t-vxworks: Likewise.
* gcc/config/vms/t-vms: Likewise.
As I noted in the related BZ, I think there's something more 
fundamental going on here.  The claim that this stuff doesn't work for 
host != target isn't as general as one might think, as we do those kinds 
of builds every day.


I would start by first getting the Debian multiarch patches upstreamed, 
as I get a sense this is related to multiarch.


Jeff



Re: PING: [PATCH v4 1/7] Honor TARGET_PROMOTE_PROTOTYPES during RTL expand

2024-12-01 Thread Jeff Law




On 11/27/24 3:34 PM, H.J. Lu wrote:
On Thu, Nov 21, 2024, 2:02 PM H.J. Lu > wrote:


Promote integer arguments smaller than int if TARGET_PROMOTE_PROTOTYPES
returns true.

         PR middle-end/14907
         * calls.c (initialize_argument_information): Promote small integer
         arguments if TARGET_PROMOTE_PROTOTYPES returns true.
This doesn't look right.  Promotions are primarily driven by the target 
files, in particular TARGET_PROMOTE_FUNCTION_MODE.


PROMOTE_PROTOTYPES is more of a language front-end hook and it doesn't 
seem appropriate to be testing it in calls.cc.



Jeff


Re: [PATCH] Exclude the last named argument for non-variadic function

2024-12-01 Thread Jeff Law




On 11/1/24 12:53 AM, H.J. Lu wrote:

expand_call has

  /* Now possibly adjust the number of named args.
  Normally, don't include the last named arg if anonymous args follow.
  We do include the last named arg if
  targetm.calls.strict_argument_naming() returns nonzero.
  (If no anonymous args follow, the result of list_length is actually
  one too large.  This is harmless.)

  If targetm.calls.pretend_outgoing_varargs_named() returns
  nonzero, and targetm.calls.strict_argument_naming() returns zero,
  this machine will be able to place unnamed args that were passed
  in registers into the stack.  So treat all args as named.  This
  allows the insns emitting for a specific argument list to be
  independent of the function declaration.

  If targetm.calls.pretend_outgoing_varargs_named() returns zero,
  we do not have any reliable way to pass unnamed args in
  registers, so we must force them into memory.  */

   if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
   && targetm.calls.strict_argument_naming (args_so_far))
 ;

For a non-variadic function, the number of named args is one too large.
Don't include the last named argument for a non-variadic function so that
the accurate number of named args can be used.

PR middle-end/117387
* calls.cc (expand_call): Don't include the last named argument
for non-variadic function.
I'm not comfortable changing this in stage3.  I realize the patch was 
submitted while we were still in stage1, but just barely.  I'd like to 
see this resubmitted early in gcc-16 stage1 and with testing beyond just 
x86 since there's potentially ABI implications here.


jeff


Re: [PATCH] Add new hardreg PRE pass

2024-12-01 Thread Jeff Law




On 11/12/24 3:42 PM, Richard Sandiford wrote:

Sorry for the slow review.  I think Jeff's much better placed to comment
on this than I am, but here's a stab.  Mostly it looks really good to me
FWIW.

Digging out.  I'll try to get a good looksie this afternoon/evening.

jeff



[PATCH v2] x86: Add pcmpeq splitters

2024-12-01 Thread H.J. Lu
Add pcmpeq splitters to split

(insn 5 3 7 2 (set (reg:V4SI 100)
(eq:V4SI (reg:V4SI 98)
(reg:V4SI 98))) 7910 {*sse2_eqv4si3}
 (expr_list:REG_DEAD (reg:V4SI 98)
(expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [
(const_int -1 [0x]) repeated x4
])
(const_vector:V4SI [
(const_int -1 [0x]) repeated x4
]))
(nil

to

(insn 8 3 7 2 (set (reg:V4SI 100)
(const_vector:V4SI [
(const_int -1 [0x]) repeated x4
])) -1
 (nil))

gcc/

PR target/117863
* config/i386/sse.md: Add pcmpeq splitters.

gcc/testsuite/

PR target/117863
* gcc.dg/rtl/i386/vector_eq-2.c: New test.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/sse.md  | 36 +++
 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c | 71 +
 2 files changed, 107 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 498a42d6e1e..4b19bc22a83 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17943,6 +17943,18 @@ (define_insn "*avx2_eq3"
(set_attr "prefix" "vex")
(set_attr "mode" "OI")])
 
+;; Don't remove memory operand to keep volatile memory.
+(define_split
+  [(set (match_operand:VI_256 0 "register_operand")
+   (eq:VI_256
+ (match_operand:VI_256 1 "register_operand")
+ (match_operand:VI_256 2 "register_operand")))]
+  "TARGET_AVX2 && rtx_equal_p (operands[1], operands[2])"
+  [(set (match_dup 0) (match_dup 1))]
+{
+  operands[1] = CONSTM1_RTX (mode);
+})
+
 (define_insn_and_split "*avx2_pcmp3_1"
  [(set (match_operand:VI_128_256  0 "register_operand")
(vec_merge:VI_128_256
@@ -18227,6 +18239,18 @@ (define_insn "*sse4_1_eqv2di3"
(set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "TI")])
 
+;; Don't remove memory operand to keep volatile memory.
+(define_split
+  [(set (match_operand:V2DI 0 "register_operand")
+   (eq:V2DI
+ (match_operand:V2DI 1 "register_operand")
+ (match_operand:V2DI 2 "register_operand")))]
+  "TARGET_SSE4_1 && rtx_equal_p (operands[1], operands[2])"
+  [(set (match_dup 0) (match_dup 1))]
+{
+  operands[1] = CONSTM1_RTX (V2DImode);
+})
+
 (define_insn "*sse2_eq3"
   [(set (match_operand:VI124_128 0 "register_operand" "=x,x")
(eq:VI124_128
@@ -18243,6 +18267,18 @@ (define_insn "*sse2_eq3"
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
+;; Don't remove memory operand to keep volatile memory.
+(define_split
+  [(set (match_operand:VI124_128 0 "register_operand")
+   (eq:VI124_128
+ (match_operand:VI124_128 1 "register_operand")
+ (match_operand:VI124_128 2 "register_operand")))]
+  "TARGET_SSE2 && rtx_equal_p (operands[1], operands[2])"
+  [(set (match_dup 0) (match_dup 1))]
+{
+  operands[1] = CONSTM1_RTX (mode);
+})
+
 (define_insn "sse4_2_gtv2di3"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
(gt:V2DI
diff --git a/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
new file mode 100644
index 000..871d489b730
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
@@ -0,0 +1,71 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-O2 -march=x86-64-v3" } */
+
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef int v2di __attribute__((vector_size(16)));
+
+v4si __RTL (startwith ("vregs1")) foo1 (void)
+{
+(function "foo1"
+  (insn-chain
+(block 2
+  (edge-from entry (flags "FALLTHRU"))
+  (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
+  (cnote 2 NOTE_INSN_FUNCTION_BEG)
+  (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1)])))
+  (cinsn 4 (set (reg:V4SI <1>) (const_vector:V4SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1)])))
+  (cinsn 5 (set (reg:V4SI <2>)
+   (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>)))
+  (cinsn 6 (set (reg:V4SI <3>) (reg:V4SI <2>)))
+  (cinsn 7 (set (reg:V4SI xmm0) (reg:V4SI <3>)))
+  (edge-to exit (flags "FALLTHRU"))
+)
+  )
+ (crtl (return_rtx (reg/i:V4SI xmm0)))
+)
+}
+
+v8si __RTL (startwith ("vregs1")) foo2 (void)
+{
+(function "foo2"
+  (insn-chain
+(block 2
+  (edge-from entry (flags "FALLTHRU"))
+  (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
+  (cnote 2 NOTE_INSN_FUNCTION_BEG)
+  (cinsn 3 (set (reg:V8SI <0>) (const_vector:V8SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1)])))
+  (cinsn 4 (set (reg:V8SI <1>) (const_vector:V8SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_i

Re: [PATCH] x86: Add pcmpeq splitters

2024-12-01 Thread H.J. Lu
On Sun, Dec 1, 2024 at 8:01 PM Uros Bizjak  wrote:
>
> On Sat, Nov 30, 2024 at 11:00 PM H.J. Lu  wrote:
> >
> > Add pcmpeq splitters to split
> >
> > (insn 5 3 7 2 (set (reg:V4SI 100)
> > (eq:V4SI (reg:V4SI 98)
> > (reg:V4SI 98))) 7910 {*sse2_eqv4si3}
> >  (expr_list:REG_DEAD (reg:V4SI 98)
> > (expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [
> > (const_int -1 [0x]) repeated x4
> > ])
> > (const_vector:V4SI [
> > (const_int -1 [0x]) repeated x4
> > ]))
> > (nil
> >
> > to
> >
> > (insn 8 3 7 2 (set (reg:V4SI 100)
> > (const_vector:V4SI [
> > (const_int -1 [0x]) repeated x4
> > ])) -1
> >  (nil))
>
> IMO, middle-end should handle these cases, and I'm surprised that it
> doesn't. These RTXes are not unspecs.
>
> OTOH, splitters should handle only nonmemory operands. Memory operands
> can be volatile and we shouldn't remove these at will.

Fixed in the v2 patch by using register_operand to keep the memory operand.

Thanks.

> Uros.
>
> > gcc/
> >
> > PR target/117863
> > * config/i386/sse.md: Add pcmpeq splitters.
> >
> > gcc/testsuite/
> >
> > PR target/117863
> > * gcc.dg/rtl/i386/vector_eq-2.c: New test.
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  gcc/config/i386/sse.md  | 33 ++
> >  gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c | 71 +
> >  2 files changed, 104 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 498a42d6e1e..e2ce0781cb4 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -17943,6 +17943,17 @@ (define_insn "*avx2_eq3"
> > (set_attr "prefix" "vex")
> > (set_attr "mode" "OI")])
> >
> > +(define_split
> > +  [(set (match_operand:VI_256 0 "register_operand")
> > +   (eq:VI_256
> > + (match_operand:VI_256 1 "nonimmediate_operand")
> > + (match_operand:VI_256 2 "nonimmediate_operand")))]
> > +  "TARGET_AVX2 && rtx_equal_p (operands[1], operands[2])"
> > +  [(set (match_dup 0) (match_dup 1))]
> > +{
> > +  operands[1] = CONSTM1_RTX (mode);
> > +})
> > +
> >  (define_insn_and_split "*avx2_pcmp3_1"
> >   [(set (match_operand:VI_128_256  0 "register_operand")
> > (vec_merge:VI_128_256
> > @@ -18227,6 +18238,17 @@ (define_insn "*sse4_1_eqv2di3"
> > (set_attr "prefix" "orig,orig,vex")
> > (set_attr "mode" "TI")])
> >
> > +(define_split
> > +  [(set (match_operand:V2DI 0 "register_operand")
> > +   (eq:V2DI
> > + (match_operand:V2DI 1 "vector_operand")
> > + (match_operand:V2DI 2 "vector_operand")))]
> > +  "TARGET_SSE4_1 && rtx_equal_p (operands[1], operands[2])"
> > +  [(set (match_dup 0) (match_dup 1))]
> > +{
> > +  operands[1] = CONSTM1_RTX (V2DImode);
> > +})
> > +
> >  (define_insn "*sse2_eq3"
> >[(set (match_operand:VI124_128 0 "register_operand" "=x,x")
> > (eq:VI124_128
> > @@ -18243,6 +18265,17 @@ (define_insn "*sse2_eq3"
> > (set_attr "prefix" "orig,vex")
> > (set_attr "mode" "TI")])
> >
> > +(define_split
> > +  [(set (match_operand:VI124_128 0 "register_operand")
> > +   (eq:VI124_128
> > + (match_operand:VI124_128 1 "vector_operand")
> > + (match_operand:VI124_128 2 "vector_operand")))]
> > +  "TARGET_SSE2 && rtx_equal_p (operands[1], operands[2])"
> > +  [(set (match_dup 0) (match_dup 1))]
> > +{
> > +  operands[1] = CONSTM1_RTX (mode);
> > +})
> > +
> >  (define_insn "sse4_2_gtv2di3"
> >[(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
> > (gt:V2DI
> > diff --git a/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
> > new file mode 100644
> > index 000..871d489b730
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
> > @@ -0,0 +1,71 @@
> > +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> > +/* { dg-additional-options "-O2 -march=x86-64-v3" } */
> > +
> > +typedef int v4si __attribute__((vector_size(16)));
> > +typedef int v8si __attribute__((vector_size(32)));
> > +typedef int v2di __attribute__((vector_size(16)));
> > +
> > +v4si __RTL (startwith ("vregs1")) foo1 (void)
> > +{
> > +(function "foo1"
> > +  (insn-chain
> > +(block 2
> > +  (edge-from entry (flags "FALLTHRU"))
> > +  (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
> > +  (cnote 2 NOTE_INSN_FUNCTION_BEG)
> > +  (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1)])))
> > +  (cinsn 4 (set (reg:V4SI <1>) (const_vector:V4SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1)])))
> > +  (cinsn 5 (set (reg:V4SI <2>)
> > +   (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>)))

[PATCH] Fix non-aligned CodeView symbols

2024-12-01 Thread Mark Harmstone
CodeView symbols in PDB files are aligned to four-byte boundaries. It's
not really clear what logic MSVC uses to enforce this; sometimes the
symbols are padded in the object file, sometimes the linker seems to do
the work.

It makes more sense to do this in the compiler, so fix the two instances
where we can write symbols with a non-aligned length. S_FRAMEPROC is
unusually not a multiple of 4, so it will always have 2 bytes of padding.
S_INLINESITE is followed by variable-length "binary annotations", so it
will also usually have padding.

gcc/
* dwarf2codeview.cc (write_s_frameproc): Align output.
(write_s_inlinesite): Align output.
---
 gcc/dwarf2codeview.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 19ec58d096e..a50fcdf9f7b 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -3208,6 +3208,8 @@ write_s_frameproc (void)
   fprint_whex (asm_out_file, 0);
   putc ('\n', asm_out_file);
 
+  ASM_OUTPUT_ALIGN (asm_out_file, 2);
+
   targetm.asm_out.internal_label (asm_out_file, SYMBOL_END_LABEL, label_num);
 }
 
@@ -3576,7 +3578,10 @@ write_s_inlinesite (dw_die_ref parent_func, dw_die_ref die)
   line_func = find_line_function (parent_func, die);
 
   if (line_func)
-write_binary_annotations (line_func, func_id);
+{
+  write_binary_annotations (line_func, func_id);
+  ASM_OUTPUT_ALIGN (asm_out_file, 2);
+}
 #else
   (void) line_func;
 #endif
-- 
2.45.2



Re: [PATCH] Add new hardreg PRE pass

2024-12-01 Thread Jeff Law




On 11/12/24 3:42 PM, Richard Sandiford wrote:


+
+bool
+pass_hardreg_pre::gate (function *fun)
+{
+#ifdef HARDREG_PRE_REGNOS
+  return optimize > 0
+&& !fun->calls_setjmp;


Huh.  It looks like these setjmp exclusions go back to 1998.  I wouldn't
have expected them to be needed now, since the modern cfg framework
should represent setjmp correctly.  Jeff, do you agree?  I'll try
removing them and see what breaks...
So back in '98 our CFG wasn't accurate (IIRC this code was a lot of what 
motivated making the CFG available before flow).  In addition to not 
being accurate, I don't think we had all of rth's bits to kill 
expressions on abnormal edges which saves us from trying to split an 
abnormal critical edge.


I'd think that if our CFG is accurately representing that abnormal edge 
that we'd be OK these days.  But it's been a long time and there may 
always be something lurking.


Jeff



[PATCH v3 5/7] Support for 64-bit location_t: Activate 64-bit location_t

2024-12-01 Thread Lewis Hyatt
With the codebase having already been prepared to handle it, change
location_t to be a 64-bit integer instead of a 32-bit integer.

libcpp/ChangeLog:

* include/cpplib.h (struct cpp_token): Adjust comment about the
struct size.
* include/line-map.h (location_t): Change typedef from 32-bit to 64-bit
integer.
(LINE_MAP_MAX_COLUMN_NUMBER): Increase size to be appropriate for
64-bit location_t.
(LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES): Likewise.
(LINE_MAP_MAX_LOCATION_WITH_COLS): Likewise.
(LINE_MAP_MAX_LOCATION): Likewise.
(MAX_LOCATION_T): Likewise.
(line_map_suggested_range_bits): Likewise.
(struct line_map): Adjust comment about the struct size.
(struct line_map_macro): Likewise.
(struct line_map_ordinary): Likewise. Rearrange fields to optimize
padding.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/pr77949.C: Adapt the test for 64-bit location_t,
when the previously expected failure doesn't actually happen.
* g++.dg/modules/loc-prune-4.C: Adjust the expected output for the
64-bit location_t case.
* gcc.dg/plugin/expensive_selftests_plugin.c: Don't try to test
the maximum supported column number in 64-bit location_t mode.
* gcc.dg/plugin/location_overflow_plugin.c: Adjust the base_location
so it can effectively test 64-bit location_t.
---
 libcpp/include/cpplib.h   |  6 ++-
 libcpp/include/line-map.h | 48 +++
 gcc/testsuite/g++.dg/diagnostic/pr77949.C |  5 +-
 gcc/testsuite/g++.dg/modules/loc-prune-4.C|  6 +--
 .../plugin/expensive_selftests_plugin.cc  | 10 ++--
 .../gcc.dg/plugin/location_overflow_plugin.cc |  5 ++
 6 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index e73f77e67d8..73dd97df747 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -255,8 +255,10 @@ struct GTY(()) cpp_identifier {
spelling;
 };
 
-/* A preprocessing token.  This has been carefully packed and should
-   occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts.  */
+/* A preprocessing token.  This occupies 32 bytes on a 64-bit host.  On a
+   32-bit host it occupies 20 or 24 bytes, depending whether a uint64_t
+   requires 4- or 8-byte alignment.  */
+
 struct GTY(()) cpp_token {
 
   /* Location of first char of token, together with range of full token.  */
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 96fdf60644f..19fd64b9363 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -291,7 +291,10 @@ enum lc_reason
 
To further see how location_t works in practice, see the
worked example in libcpp/location-example.txt.  */
-typedef unsigned int location_t;
+
+/* A 64-bit type to represent a location.  We only use 63 of the 64 bits, so
+   that two location_t can be safely subtracted and stored in an int64_t.  */
+typedef uint64_t location_t;
 typedef int64_t location_diff_t;
 
 /* Sometimes we need a type that has the same size as location_t but that does
@@ -302,24 +305,31 @@ typedef location_t line_map_uint_t;
 /* Do not track column numbers higher than this one.  As a result, the
range of column_bits is [12, 18] (or 0 if column numbers are
disabled).  */
-const unsigned int LINE_MAP_MAX_COLUMN_NUMBER = (1U << 12);
+const unsigned int LINE_MAP_MAX_COLUMN_NUMBER = (1U << 31) - 1;
 
 /* Do not pack ranges if locations get higher than this.
If you change this, update:
  gcc.dg/plugin/location-overflow-test-*.c.  */
-const location_t LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES = 0x5000;
+const location_t LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES
+  = location_t (0x5000) << 31;
 
 /* Do not track column numbers if locations get higher than this.
If you change this, update:
  gcc.dg/plugin/location-overflow-test-*.c.  */
-const location_t LINE_MAP_MAX_LOCATION_WITH_COLS = 0x6000;
+const location_t LINE_MAP_MAX_LOCATION_WITH_COLS
+  = location_t (0x6000) << 31;
+
+/* Highest possible source location encoded within an ordinary map.  Higher
+   values up to MAX_LOCATION_T represent macro virtual locations.  */
+const location_t LINE_MAP_MAX_LOCATION = location_t (0x7000) << 31;
 
-/* Highest possible source location encoded within an ordinary map.  */
-const location_t LINE_MAP_MAX_LOCATION = 0x7000;
+/* This is the highest possible source location encoded within an
+   ordinary or macro map.  */
+const location_t MAX_LOCATION_T = location_t (-1) >> 2;
 
 /* This is the number of range bits suggested to enable, if range tracking is
desired.  */
-const int line_map_suggested_range_bits = 5;
+const int line_map_suggested_range_bits = 7;
 
 /* A range of source locations.
 
@@ -397,7 +407,7 @@ typedef size_t (*line_map_round_alloc_size_func) (size_t);
 struct GTY((tag ("0"), desc ("MAP_ORDINAR

[PATCH v3 2/7] final: Fix call to INSN_LOCATION on a NOTE rtl

2024-12-01 Thread Lewis Hyatt
This patch was previously discussed at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/670354.html

I have attempted to fix it per the feedback in the above thread. Note that
this version is a change in behavior. In my v2 patch, I changed it so as to
preserve the exact existing behavior -- if a NOTE_INSN is encountered with a
null this_block, then it ends up calling change_scope(). In v3, following
the suggestions, it now just continues on instead. Please let me know if I
misunderstood anything there.

-- >8 --

This function has a code path that calls INSN_LOCATION on an rtl note. For a
note, this returns the note type enum rather than a location, but it runs
without complaint even with --enable-checking=rtl because both are stored in
the rt_int member of the rtunion. A subsequent commit will add a new rtl
format code specifically for locations, in which case attempting to call
INSN_LOCATION on a note will trigger an error. Fix it up by handling the
case of a note missing a location separately.

gcc/ChangeLog:

* final.cc (reemit_insn_block_notes): Don't try to call
INSN_LOCATION on a NOTE rtl object. Don't call change_scope () for a
NOTE missing a location.
---
 gcc/final.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/final.cc b/gcc/final.cc
index ea08c4956fb..1fe34c55853 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -1513,6 +1513,8 @@ reemit_insn_block_notes (void)
  case NOTE_INSN_BEGIN_STMT:
  case NOTE_INSN_INLINE_ENTRY:
this_block = LOCATION_BLOCK (NOTE_MARKER_LOCATION (insn));
+   if (!this_block)
+ continue;
goto set_cur_block_to_this_block;
 
  default:
@@ -1538,7 +1540,6 @@ reemit_insn_block_notes (void)
this_block = choose_inner_scope (this_block,
 insn_scope (body->insn (i)));
}
-set_cur_block_to_this_block:
   if (! this_block)
{
  if (INSN_LOCATION (insn) == UNKNOWN_LOCATION)
@@ -1547,6 +1548,7 @@ reemit_insn_block_notes (void)
this_block = DECL_INITIAL (cfun->decl);
}
 
+set_cur_block_to_this_block:
   if (this_block != cur_block)
{
  change_scope (insn, cur_block, this_block);


[PATCH v3 0/7] Support for 64-bit location_t

2024-12-01 Thread Lewis Hyatt
Hello-

Here is v3 of the 64-bit location_t series. Many of the v2 patches have
already been approved and pushed (those that are preparatory and don't
change any functionality). In this series, patches 3 and 7 have already been
acked and should not need another review. Patches 1, 2, and 6 are revised
portions of v2 patches that hopefully have correctly addressed the feedback
received. Patch 4 is new to v3 as libgdiagnostics has been merged since v2
was prepared. Finally, patch 5 actually makes the change to 64-bit and still
needs to be reviewed.

Thanks for the help so far!

-Lewis

zero *: was approved in v2
one * : was reviewed in v2, revisions in v3 need review
two * : not reviewed yet

*  1/7: middle-end: Handle resized PHI nodes in loop_version()
*  2/7: final: Fix call to INSN_LOCATION on a NOTE rtl
   3/7: Support for 64-bit location_t: RTL parts
** 4/7: Support for 64-bit location_t: libgdiagnostics parts
** 5/7: Support for 64-bit location_t: Activate 64-bit location_t
*  6/7: Support for 64-bit location_t: gimple parts
   7/7: Support for 64-bit location_t: Remove -flarge-source-files


[PATCH v3 4/7] Support for 64-bit location_t: libgdiagnostics parts

2024-12-01 Thread Lewis Hyatt
This patch is new in v3 and is a small change to libgdiagnostics similar to
other changes required by 64-bit location_t.

-- >8 --

Tweak libgdiagnostics.cc, which is necessarily sensitive to line-map
internals, to support 64-bit location_t as well.

gcc/ChangeLog:

* libgdiagnostics.cc (struct diagnostic_manager): Use location_t(-1)
instead of UINT_MAX to support 64-bit location_t as well.
(diagnostic_manager::diagnostic_manager): Change hard-coded "5" to
line_map_suggested_range_bits.
---
 gcc/libgdiagnostics.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/libgdiagnostics.cc b/gcc/libgdiagnostics.cc
index e5cee0958f9..53a8423f904 100644
--- a/gcc/libgdiagnostics.cc
+++ b/gcc/libgdiagnostics.cc
@@ -320,7 +320,7 @@ public:
 linemap_init (&m_line_table, BUILTINS_LOCATION);
 m_line_table.m_reallocator = xrealloc;
 m_line_table.m_round_alloc_size = round_alloc_size;
-m_line_table.default_range_bits = 5;
+m_line_table.default_range_bits = line_map_suggested_range_bits;
   }
   ~diagnostic_manager ()
   {
@@ -500,7 +500,7 @@ private:
   impl_client_version_info m_client_version_info;
   std::vector> m_sinks;
   hash_map m_str_to_file_map;
-  hash_map,
+  hash_map,
   diagnostic_physical_location *> m_location_t_map;
   std::vector> m_logical_locs;
   const diagnostic *m_current_diag;


[PATCH v3 7/7] Support for 64-bit location_t: Remove -flarge-source-files

2024-12-01 Thread Lewis Hyatt
This patch was already approved in v2. I have included it here because

a) It should not be applied until after the rest of the series, since
   the option is useful as long as location_t is 32-bit.

b) The previous version neglected to regenerate common.opt.urls, which I
   have corrected here.

-- >8 --

The option -flarge-source-files became unnecessary with 64-bit location_t
and harms performance compared to the new default setting, so silently
ignore it.

gcc/ChangeLog:

* common.opt: Mark -flarge-source-files as Ignored.
* common.opt.urls: Regenerate.
* doc/invoke.texi: Remove -flarge-source-files.
* toplev.cc (process_options): Remove support for
-flarge-source-files.
---
 gcc/common.opt  |  5 ++---
 gcc/common.opt.urls |  3 ---
 gcc/doc/invoke.texi | 17 +
 gcc/toplev.cc   |  3 ---
 4 files changed, 3 insertions(+), 25 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index bb226ac61e6..79f51be1d60 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1808,9 +1808,8 @@ Common Undocumented Var(flag_keep_gc_roots_live) Optimization
 ; Always keep a pointer to a live memory block
 
 flarge-source-files
-Common Var(flag_large_source_files) Init(0)
-Improve GCC's ability to track column numbers in large source files,
-at the expense of slower compilation.
+Common Ignore
+Does nothing.  Preserved for backward compatibility.
 
 flate-combine-instructions
 Common Var(flag_late_combine_instructions) Optimization Init(0)
diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index e9e818d86de..43255a7d12a 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -721,9 +721,6 @@ UrlSuffix(gcc/Optimize-Options.html#index-fgraphite-identity)
 fhoist-adjacent-loads
 UrlSuffix(gcc/Optimize-Options.html#index-fhoist-adjacent-loads)
 
-flarge-source-files
-UrlSuffix(gcc/Preprocessor-Options.html#index-flarge-source-files)
-
 flate-combine-instructions
 UrlSuffix(gcc/Optimize-Options.html#index-flate-combine-instructions)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fa2532f437b..7023b30ec15 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -693,7 +693,7 @@ Objective-C and Objective-C++ Dialects}.
 -dD  -dI  -dM  -dN  -dU
 -fdebug-cpp  -fdirectives-only  -fdollars-in-identifiers
 -fexec-charset=@var{charset}  -fextended-identifiers
--finput-charset=@var{charset}  -flarge-source-files
+-finput-charset=@var{charset}
 -fmacro-prefix-map=@var{old}=@var{new} -fmax-include-depth=@var{depth}
 -fno-canonical-system-headers  -fpch-deps  -fpch-preprocess
 -fpreprocessed  -ftabstop=@var{width}  -ftrack-macro-expansion
@@ -18748,21 +18748,6 @@ This option may be useful in conjunction with the @option{-B} or
 perform additional processing of the program source between
 normal preprocessing and compilation.
 
-@opindex flarge-source-files
-@item -flarge-source-files
-Adjust GCC to expect large source files, at the expense of slower
-compilation and higher memory usage.
-
-Specifically, GCC normally tracks both column numbers and line numbers
-within source files and it normally prints both of these numbers in
-diagnostics.  However, once it has processed a certain number of source
-lines, it stops tracking column numbers and only tracks line numbers.
-This means that diagnostics for later lines do not include column numbers.
-It also means that options like @option{-Wmisleading-indentation} cease to work
-at that point, although the compiler prints a note if this happens.
-Passing @option{-flarge-source-files} significantly increases the number
-of source lines that GCC can process before it stops tracking columns.
-
 @end table
 
 @node Assembler Options
diff --git a/gcc/toplev.cc b/gcc/toplev.cc
index d4a4add29f9..370d7f39f21 100644
--- a/gcc/toplev.cc
+++ b/gcc/toplev.cc
@@ -1765,9 +1765,6 @@ process_options ()
 hash_table_sanitize_eq_limit
   = param_hash_table_verification_limit;
 
-  if (flag_large_source_files)
-line_table->default_range_bits = 0;
-
   diagnose_options (&global_options, &global_options_set, UNKNOWN_LOCATION);
 
   /* Please don't change global_options after this point, those changes won't


[PATCH 2/2] RISC-V: Add intrinsics testcases for SiFive Xsfvfnrclipxfqf extensions.

2024-12-01 Thread shiyulong
From: yulong 

This commit adds testcases for Xsfvfnrclipxfqf.

Co-Authored by: Jiawei Chen 
Co-Authored by: Shihua Liao 
Co-Authored by: Yixuan Chen 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_xu_f_qf.c: New test.
---
 .../riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c  | 606 ++
 .../riscv/rvv/xsfvector/sf_vfnrclip_xu_f_qf.c | 605 +
 2 files changed, 1211 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_xu_f_qf.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c
new file mode 100644
index 000..813f7860f64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c
@@ -0,0 +1,606 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_xsfvfnrclipxfqf -mabi=lp64d -O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "riscv_vector.h"
+
+/*
+** test_sf_vfnrclip_x_f_qf_i8mf8_vint8mf8_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+vint8mf8_t test_sf_vfnrclip_x_f_qf_i8mf8_vint8mf8_t(vfloat32mf2_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf_i8mf8(vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_i8mf4_vint8mf4_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+vint8mf4_t test_sf_vfnrclip_x_f_qf_i8mf4_vint8mf4_t(vfloat32m1_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf_i8mf4(vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_i8mf2_vint8mf2_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+vint8mf2_t test_sf_vfnrclip_x_f_qf_i8mf2_vint8mf2_t(vfloat32m2_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf_i8mf2(vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_i8m1_vint8m1_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+vint8m1_t test_sf_vfnrclip_x_f_qf_i8m1_vint8m1_t(vfloat32m4_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf_i8m1(vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_i8m2_vint8m2_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+vint8m2_t test_sf_vfnrclip_x_f_qf_i8m2_vint8m2_t(vfloat32m8_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf_i8m2(vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_i8mf8_m_vint8mf8_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+,v0.t
+** ...
+*/
+vint8mf8_t test_sf_vfnrclip_x_f_qf_i8mf8_m_vint8mf8_t(vbool64_t mask, vfloat32mf2_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf_i8mf8_m(mask, vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_i8mf4_m_vint8mf4_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+,v0.t
+** ...
+*/
+vint8mf4_t test_sf_vfnrclip_x_f_qf_i8mf4_m_vint8mf4_t(vbool32_t mask, vfloat32m1_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf_i8mf4_m(mask, vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_i8mf2_m_vint8mf2_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+,v0.t
+** ...
+*/
+vint8mf2_t test_sf_vfnrclip_x_f_qf_i8mf2_m_vint8mf2_t(vbool16_t mask, vfloat32m2_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf_i8mf2_m(mask, vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_i8m1_m_vint8m1_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+,v0.t
+** ...
+*/
+vint8m1_t test_sf_vfnrclip_x_f_qf_i8m1_m_vint8m1_t(vbool8_t mask, vfloat32m4_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf_i8m1_m(mask, vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_i8m2_m_vint8m2_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+,v0.t
+** ...
+*/
+vint8m2_t test_sf_vfnrclip_x_f_qf_i8m2_m_vint8m2_t(vbool4_t mask, vfloat32m8_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf_i8m2_m(mask, vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_vint8mf8_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+vint8mf8_t test_sf_vfnrclip_x_f_qf_vint8mf8_t(vfloat32mf2_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf(vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_vint8mf4_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+vint8mf4_t test_sf_vfnrclip_x_f_qf_vint8mf4_t(vfloat32m1_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf(vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_vint8mf2_t:
+** ...
+** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+
+** ...
+*/
+vint8mf2_t test_sf_vfnrclip_x_f_qf_vint8mf2_t(vfloat32m2_t vs2, float rs1, size_t vl) {
+  return __riscv_sf_vfnrclip_x_f_qf(vs2, rs1, vl);
+}
+
+/*
+** test_sf_vfnrclip_x_f_qf_vint8m1_t:
+** ..

[PATCH 0/2] RISC-V: Add intrinsics support and testcases for SiFive Xsfvfnrclipxfqf extension.

2024-12-01 Thread shiyulong
From: yulong 

This patch adds GCC support for the SiFive vendor extension
Xsfvfnrclipxfqf[1], which provides the FP32-to-int8 Ranged Clip
instructions.

[1] https://www.sifive.com/document-file/fp32-to-int8-ranged-clip-instructions
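The intrinsics exercised by the testcases in PATCH 2/2 pair each FP32 source type with an int8 result type that has the same SEW/LMUL ratio (vfloat32mf2_t with vint8mf8_t, vfloat32m1_t with vint8mf4_t, and so on up to vfloat32m8_t with vint8m2_t). As a rough sketch of that rule, the hypothetical helper below derives the result type name from SEW, log2(LMUL) and the target element width. It is not the generator's code: same_ratio_eew_type in genrvv-type-indexer.cc presumably computes the equivalent for the new SIGNED_EEW8_INDEX column, but that is an assumption here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not GCC code): write into BUF the name of the signed
   RVV integer type whose element width is EEW and whose SEW/LMUL ratio
   equals that of a source type with element width SEW and
   LMUL = 2^LMUL_LOG2 (a negative LMUL_LOG2 means a fractional LMUL).
   Returns 1 on success, 0 if the result would need LMUL outside [1/8, 8].  */
int same_ratio_int_type (int sew, int lmul_log2, int eew,
                         char *buf, size_t size)
{
  assert (eew <= sew);  /* narrowing only; powers of two assumed */

  /* Keeping SEW/LMUL fixed means LMUL shrinks by the factor SEW/EEW.  */
  int new_log2 = lmul_log2;
  for (int s = sew; s > eew; s /= 2)
    new_log2--;

  if (new_log2 < -3 || new_log2 > 3)
    return 0;
  if (new_log2 >= 0)
    snprintf (buf, size, "vint%dm%d_t", eew, 1 << new_log2);
  else
    snprintf (buf, size, "vint%dmf%d_t", eew, 1 << -new_log2);
  return 1;
}

/* Nonzero if the derived type name equals EXPECTED.  */
int type_name_is (int sew, int lmul_log2, int eew, const char *expected)
{
  char buf[32];
  return same_ratio_int_type (sew, lmul_log2, eew, buf, sizeof buf)
         && strcmp (buf, expected) == 0;
}
```

For example, type_name_is (32, -1, 8, "vint8mf8_t") holds, matching the vfloat32mf2_t/vint8mf8_t pair in the first testcase.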

Co-Authored by: Jiawei Chen 
Co-Authored by: Shihua Liao 
Co-Authored by: Yixuan Chen 

yulong (2):
  RISC-V: Add intrinsics support for SiFive Xsfvfnrclipxfqf extensions.
  RISC-V: Add intrinsics testcases for SiFive Xsfvfnrclipxfqf extensions.

 gcc/config/riscv/generic-vector-ooo.md|   2 +-
 gcc/config/riscv/genrvv-type-indexer.cc   |  10 +
 .../riscv/riscv-vector-builtins-bases.cc  |   6 -
 .../riscv/riscv-vector-builtins-bases.h   |   6 +
 .../riscv/riscv-vector-builtins-shapes.cc |  28 +
 .../riscv/riscv-vector-builtins-shapes.h  |   1 +
 gcc/config/riscv/riscv-vector-builtins.cc |  51 +-
 gcc/config/riscv/riscv-vector-builtins.def|  31 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   7 +
 gcc/config/riscv/riscv.md |   3 +-
 .../riscv/sifive-vector-builtins-bases.cc |  52 ++
 .../riscv/sifive-vector-builtins-bases.h  |   2 +
 .../sifive-vector-builtins-functions.def  |   4 +
 gcc/config/riscv/sifive-vector.md |  20 +
 gcc/config/riscv/vector-iterators.md  |  30 +-
 .../riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c  | 606 ++
 .../riscv/rvv/xsfvector/sf_vfnrclip_xu_f_qf.c | 605 +
 17 files changed, 1425 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_xu_f_qf.c

-- 
2.34.1



[PATCH 1/2] RISC-V: Add intrinsics support for SiFive Xsfvfnrclipxfqf extensions.

2024-12-01 Thread shiyulong
From: yulong 

This commit adds intrinsics support for Xsfvfnrclipxfqf.  It also moves
the definition of enum frm_op_type into riscv-vector-builtins-bases.h,
because sifive-vector-builtins-bases.cc needs to use it.

Co-Authored by: Jiawei Chen 
Co-Authored by: Shihua Liao 
Co-Authored by: Yixuan Chen 

gcc/ChangeLog:

* config/riscv/generic-vector-ooo.md: New reservation.
* config/riscv/genrvv-type-indexer.cc (main): New type.
* config/riscv/riscv-vector-builtins-bases.cc (enum frm_op_type): Delete it.
* config/riscv/riscv-vector-builtins-bases.h (enum frm_op_type): Move here.
* config/riscv/riscv-vector-builtins-shapes.cc (struct sf_vfnrclip_def): New function.
(SHAPE): Ditto.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TYPE_INDEX): New builtins def.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE_INDEX): New base def.
(signed_eew8_index): Ditto.
* config/riscv/riscv-vector-builtins.h (enum required_ext): New extension.
(required_ext_to_isa_name): Ditto.
(required_extensions_specified): Ditto.
(struct function_group_info): Ditto.
* config/riscv/riscv.md: New attr.
* config/riscv/sifive-vector-builtins-bases.cc (class sf_vfnrclip_x_f_qf): New function.
(class sf_vfnrclip_xu_f_qf): Ditto.
(BASE): New base_name.
* config/riscv/sifive-vector-builtins-bases.h: New function_base.
* config/riscv/sifive-vector-builtins-functions.def (REQUIRED_EXTENSIONS): New intrinsics def.
(sf_vfnrclip_x_f_qf): Ditto.
(sf_vfnrclip_xu_f_qf): Ditto.
* config/riscv/sifive-vector.md (@pred_sf_vfnrclip_x_f_qf): New pattern.
* config/riscv/vector-iterators.md: New iterator.

---
 gcc/config/riscv/generic-vector-ooo.md|  2 +-
 gcc/config/riscv/genrvv-type-indexer.cc   | 10 
 .../riscv/riscv-vector-builtins-bases.cc  |  6 ---
 .../riscv/riscv-vector-builtins-bases.h   |  6 +++
 .../riscv/riscv-vector-builtins-shapes.cc | 28 ++
 .../riscv/riscv-vector-builtins-shapes.h  |  1 +
 gcc/config/riscv/riscv-vector-builtins.cc | 51 --
 gcc/config/riscv/riscv-vector-builtins.def| 31 +--
 gcc/config/riscv/riscv-vector-builtins.h  |  7 +++
 gcc/config/riscv/riscv.md |  3 +-
 .../riscv/sifive-vector-builtins-bases.cc | 52 +++
 .../riscv/sifive-vector-builtins-bases.h  |  2 +
 .../sifive-vector-builtins-functions.def  |  4 ++
 gcc/config/riscv/sifive-vector.md | 20 +++
 gcc/config/riscv/vector-iterators.md  | 30 ++-
 15 files changed, 214 insertions(+), 39 deletions(-)

diff --git a/gcc/config/riscv/generic-vector-ooo.md b/gcc/config/riscv/generic-vector-ooo.md
index 132ab039822..bcad36c1a36 100644
--- a/gcc/config/riscv/generic-vector-ooo.md
+++ b/gcc/config/riscv/generic-vector-ooo.md
@@ -69,7 +69,7 @@
 
 ;; Vector float multiplication and FMA.
 (define_insn_reservation "vec_fmul" 6
-  (eq_attr "type" "vfmul,vfwmul,vfmuladd,vfwmuladd,vfwmaccbf16,sf_vqmacc")
+  (eq_attr "type" "vfmul,vfwmul,vfmuladd,vfwmuladd,vfwmaccbf16,sf_vqmacc,sf_vfnrclip")
   "vxu_ooo_issue,vxu_ooo_alu")
 
 ;; Vector crypto, assumed to be a generic operation for now.
diff --git a/gcc/config/riscv/genrvv-type-indexer.cc b/gcc/config/riscv/genrvv-type-indexer.cc
index 8822e101c53..e1eee34237a 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ b/gcc/config/riscv/genrvv-type-indexer.cc
@@ -250,6 +250,7 @@ main (int argc, const char **argv)
   fprintf (fp, "  /*MASK*/ %s,\n", mode.str ().c_str ());
   fprintf (fp, "  /*SIGNED*/ INVALID,\n");
   fprintf (fp, "  /*UNSIGNED*/ INVALID,\n");
+  fprintf (fp, "  /*SIGNED_EEW8_INDEX*/ INVALID,\n");
   for (unsigned eew : {8, 16, 32, 64})
fprintf (fp, "  /*EEW%d_INDEX*/ INVALID,\n", eew);
   fprintf (fp, "  /*SHIFT*/ INVALID,\n");
@@ -316,6 +317,10 @@ main (int argc, const char **argv)
 inttype (sew, lmul_log2, /*unsigned_p*/ false).c_str ());
fprintf (fp, "  /*UNSIGNED*/ %s,\n",
 inttype (sew, lmul_log2, /*unsigned_p*/ true).c_str ());
+   fprintf (fp, "  /*SIGNED_EEW8_INDEX*/ %s,\n",
+same_ratio_eew_type (sew, lmul_log2, 8,
+ /*unsigned_p*/ false, false)
+  .c_str ());
for (unsigned eew : {8, 16, 32, 64})
  fprintf (fp, "  /*EEW%d_INDEX*/ %s,\n", eew,
   same_ratio_eew_type (sew, lmul_log2, eew,
@@ -432,6 +437,7 @@ main (int argc, const char **argv)
 inttype (16, lmul_log2, /*unsigned_p*/ false).c_str ());
fprintf (fp, "  /*UNSIGNED*/ %s,\n",
 inttype (16, lmul_log2, /*unsigned_p*/ true).c_str ());
+   fprintf (fp, "  /*SIGNED_EEW8_INDEX*/ INVALID,

[PATCH v3 6/7] Support for 64-bit location_t: gimple parts

2024-12-01 Thread Lewis Hyatt
This patch was previously discussed here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669437.html

This version addresses that feedback, and I have also moved it in the patch
ordering to be after the change to 64-bit location_t, since it would be
inaccurate prior to that.

-- >8 --

The size of struct gimple increased by 8 bytes with the change in size of
location_t from 32 to 64 bits; adjust the WORD markings in the comments
accordingly.  Most of the WORD markings were apparently already off by one,
probably because they were not updated after an earlier reduction in the
size of a gimple.  They have therefore become correct again retroactively,
and only a couple actually needed adjustment.

Also add an explicit 32-bit padding member to struct gimple for clarity.
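To make the new word numbering concrete, here is a simplified C stand-in for the leading members of struct gimple. It is a sketch, not the real declaration: the individual flag bits of word 1 are lumped into one hypothetical FLAGS field, location_t is assumed to be a 64-bit integer, and the host is assumed to be LP64. WORD_OF is a hypothetical helper that turns a member's byte offset into the 1-based word numbers used in the comments.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t location_t;  /* assumption: location_t is now 64-bit */

/* Simplified stand-in for the leading members of struct gimple.  The
   individual flag bits of word 1 are lumped into one FLAGS field.  */
struct gimple_sketch
{
  /* [ WORD 1 ] */
  unsigned int code : 8;
  unsigned int flags : 8;
  unsigned int subcode : 16;
  unsigned int pad2 : 32;       /* explicit padding, as in the patch */

  /* [ WORD 2 ] */
  unsigned int uid;
  unsigned int num_ops;

  /* [ WORD 3 ] */
  location_t location;

  /* [ WORD 4 ] */
  void *bb;                     /* basic_block in GCC */

  /* [ WORD 5-6 ] */
  struct gimple_sketch *next;
  struct gimple_sketch *prev;
};

/* 1-based 64-bit word in which MEMBER starts, matching the
   "[ WORD n ]" comments (meaningful on an LP64 host).  */
#define WORD_OF(member) (offsetof (struct gimple_sketch, member) / 8 + 1)
```

Without pad2, the compiler would place uid at byte offset 4, inside word 1, and insert implicit padding before location instead; the explicit member makes that hole visible.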

gcc/ChangeLog:

* gimple.h (struct gphi): Update word marking comments to reflect
the new size of location_t.
(struct gimple): Likewise. Add gimple::pad2 explicit padding member.
---
 gcc/gimple.h | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/gimple.h b/gcc/gimple.h
index 039ed66eab5..1eb880ab60a 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -259,23 +259,27 @@ struct GTY((desc ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
  in there.  */
   unsigned int subcode : 16;
 
-  /* UID of this statement.  This is used by passes that want to
+  /* Unused padding.  */
+  unsigned int pad2 : 32;
+
+  /* [ WORD 2 ]
+ UID of this statement.  This is used by passes that want to
  assign IDs to statements.  It must be assigned and used by each
  pass.  By default it should be assumed to contain garbage.  */
   unsigned uid;
 
-  /* [ WORD 2 ]
- Locus information for debug info.  */
-  location_t location;
-
   /* Number of operands in this tuple.  */
   unsigned num_ops;
 
   /* [ WORD 3 ]
+ Locus information for debug info.  */
+  location_t location;
+
+  /* [ WORD 4 ]
  Basic block holding this statement.  */
   basic_block bb;
 
-  /* [ WORD 4-5 ]
+  /* [ WORD 5-6 ]
  Linked lists of gimple statements.  The next pointers form
  a NULL terminated list, the prev pointers are a cyclic list.
  A gimple statement is hence also a double-ended list of
@@ -479,7 +483,7 @@ struct GTY((tag("GSS_PHI")))
   /* [ WORD 8 ]  */
   tree result;
 
-  /* [ WORD 9 ]  */
+  /* [ WORD 9-14 ]  */
   struct phi_arg_d GTY ((length ("%h.nargs"))) args[1];
 };
 


[PATCH v3 3/7] Support for 64-bit location_t: RTL parts

2024-12-01 Thread Lewis Hyatt
This patch was previously approved in v2. I have included it unchanged here
because it cannot be applied until the issue with reemit_insn_block_notes()
addressed by patch v3 2/7 is resolved first.

-- >8 --

Some RTL objects need to store a location_t. Currently, they store it in the
rt_int field of union rtunion, but in a world where location_t could be
64-bit, they need to store it in a larger variable. Unfortunately, rtunion
does not currently have a 64-bit int type for that purpose, so add one. In
order to avoid increasing any overhead when 64-bit locations are not in use,
the new field is dedicated for location_t storage only and has type
"location_t" so it will only be 64-bit if necessary. This necessitates
adding a new RTX format code 'L' for locations. There are very many switch
statements in the codebase that inspect the RTX format code. I took the
approach of finding all of them that handle code 'i' or 'n' and making sure
they handle 'L' too. I am sure that some of these call sites can never see
an 'L' code, but I thought it would be safer and more future-proof to handle
as many as possible, given it's just a line or two to add in most cases.
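A heavily simplified sketch of why every format-code switch needs a new case: a hypothetical union rtunion_sketch gains a location_t member used only for 'L', and a hypothetical hash walks a format string the way GCC's RTL walkers switch on each character of an rtx format. None of these names are GCC's, and location_t is assumed to be 64-bit.

```c
#include <stdint.h>

typedef uint64_t location_t;  /* assumption: a 64-bit location_t build */

/* Hypothetical, heavily simplified stand-in for GCC's union rtunion.  */
union rtunion_sketch
{
  int rt_int;
  unsigned int rt_uint;
  location_t rt_loc;   /* used only by the new 'L' format code */
};

/* Hypothetical hash over an operand vector described by FMT, in the style
   of the RTL walkers that switch on each format character.  A code with
   no case here is silently ignored, which is why every switch that
   handles 'i' or 'n' also needs an 'L' case.  */
uint64_t hash_operands (const char *fmt, const union rtunion_sketch *ops)
{
  uint64_t h = 0;
  for (int i = 0; fmt[i] != '\0'; i++)
    switch (fmt[i])
      {
      case 'i':
      case 'n':
        h = h * 31 + (unsigned int) ops[i].rt_int;
        break;
      case 'L':
        /* All 64 bits take part; reading rt_int would drop half of them.  */
        h = h * 31 + ops[i].rt_loc;
        break;
      default:
        break;
      }
  return h;
}
```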

gcc/ChangeLog:

* rtl.def (DEBUG_INSN): Use new format code 'L' for location_t fields.
(INSN): Likewise.
(JUMP_INSN): Likewise.
(CALL_INSN): Likewise.
(ASM_INPUT): Likewise.
(ASM_OPERANDS): Likewise.
* rtl.h (union rtunion): Add new location_t RT_LOC member for use by
the 'L' format.
(struct rtx_debug_insn): Adjust comment.
(struct rtx_nonjump_insn): Adjust comment.
(struct rtx_call_insn): Adjust comment.
(XLOC): New accessor macro for rtunion::rt_loc.
(X0LOC): Likewise.
(XCLOC): Likewise.
(INSN_LOCATION): Use XLOC instead of XUINT to retrieve a location_t.
(NOTE_MARKER_LOCATION): Likewise for XCUINT -> XCLOC.
(ASM_OPERANDS_SOURCE_LOCATION): Likewise.
(ASM_INPUT_SOURCE_LOCATION): Likewise.
(gen_rtx_ASM_INPUT): Adjust to use sL format instead of si.
(gen_rtx_INSN): Adjust prototype to use location_t rather than int
for the location.
* cfgrtl.cc (force_nonfallthru_and_redirect): Change type of LOC
local variable from int to location_t.
* rtlhash.cc (add_rtx): Support 'L' format in the switch statement.
* var-tracking.cc (loc_cmp): Likewise.
* alias.cc (rtx_equal_for_memref_p): Likewise.
* config/alpha/alpha.cc (summarize_insn): Likewise.
* config/ia64/ia64.cc (rtx_needs_barrier): Likewise.
* config/rs6000/rs6000.cc (rs6000_hash_constant): Likewise.
* cse.cc (hash_rtx): Likewise.
(exp_equiv_p): Likewise.
* cselib.cc (rtx_equal_for_cselib_1): Likewise.
(cselib_hash_rtx): Likewise.
(cselib_expand_value_rtx_1): Likewise.
* emit-rtl.cc (copy_insn_1): Likewise.
(gen_rtx_INSN): Change the location argument from int to location_t,
and call the corresponding gen_rtx_fmt_* function.
* final.cc (leaf_renumber_regs_insn): Support 'L' format in the
switch statement.
* genattrtab.cc (attr_rtx_1): Likewise.
* genemit.cc (gen_exp): Likewise.
* gengenrtl.cc (type_from_format): Likewise.
(accessor_from_format): Likewise.
* gengtype.cc (adjust_field_rtx_def): Likewise.
* genpeep.cc (match_rtx): Likewise; just mark gcc_unreachable() for
now.
* genrecog.cc (find_operand): Support 'L' format in the switch 
statement.
(find_matching_operand): Likewise.
(validate_pattern): Likewise.
* gensupport.cc (subst_pattern_match): Likewise.
(get_alternatives_number): Likewise.
(collect_insn_data): Likewise.
(alter_predicate_for_insn): Likewise.
(alter_constraints): Likewise.
(subst_dup): Likewise.
* jump.cc (rtx_renumbered_equal_p): Likewise.
* loop-invariant.cc (hash_invariant_expr_1): Likewise.
* lra-constraints.cc (operands_match_p): Likewise.
* lra.cc (lra_rtx_hash): Likewise.
* print-rtl.cc (rtx_writer::print_rtx_operand_code_i): Refactor
location_t-relevant code to...
(rtx_writer::print_rtx_operand_code_L): ...new function here.
(rtx_writer::print_rtx_operand): Support 'L' format in the switch 
statement.
* print-rtl.h (rtx_writer::print_rtx_operand_code_L): Add prototype
for new function.
* read-rtl-function.cc (function_reader::read_rtx_operand): Support
'L' format in the switch statement.
(function_reader::read_rtx_operand_i_or_n): Rename to...
(function_reader::read_rtx_operand_inL): ...this, and support 'L' as
well.
* read-rtl.cc (apply_int_iterator): Support 'L' format in the switch
statement.
(rtx_reader::read_rtx_operand): Likewise.
* reload.cc (operands_match_p): Likewise.
* rtl.cc (rtx_for

Re: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2024-12-01 Thread H.J. Lu
On Mon, Dec 2, 2024, 11:16 AM Hongtao Liu  wrote:

> On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu  wrote:
> >
> > For all different modes of all 0s/1s vectors, we can use the single widest
> > all 0s/1s vector register for all 0s/1s vector uses in the whole function.
> > Add a pass to generate a single widest all 0s/1s vector set instruction at
> > entry of the nearest common dominator for basic blocks with all 0s/1s
> > vector uses.  On Linux/x86-64, in cc1plus, this patch reduces the number
> > of vector xor instructions from 4803 to 4714 and pcmpeq instructions from
> > 144 to 142.
> I'm worried that it will affect the rematerialisation of RA and thus
> increase register pressure, can we push to GCC16?
>

Sure.

>
> > This change causes a regression:
> >
> > FAIL: gcc.dg/rtl/x86_64/vector_eq.c
> >
> > without the fix for
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117863
> The fix for this PR looks like the risk is lower and suitable for GCC15.
> >
> > NB: PR target/92080 and PR target/117839 aren't same.  PR target/117839
> > is for vectors of all 0s and all 1s with different sizes and different
> > components.  PR target/92080 is for broadcast of the same component to
> > different vector sizes.  This patch covers only all 0s and all 1s cases
> > of PR target/92080.
> >
> > gcc/
> >
> > PR target/92080
> > PR target/117839
> > * config/i386/i386-features.cc (ix86_rrvl_gate): New.
> > (ix86_place_single_vector_set): Likewise.
> > (ix86_get_vector_load_mode): Likewise.
> > (remove_redundant_vector_load): Likewise.
> > (pass_data_remove_redundant_vector_load): Likewise.
> > (pass_remove_redundant_vector_load): Likewise.
> > (make_pass_remove_redundant_vector_load): Likewise.
> > * config/i386/i386-passes.def: Add
> > pass_remove_redundant_vector_load after
> > pass_remove_partial_avx_dependency.
> > * config/i386/i386-protos.h
> > (make_pass_remove_redundant_vector_load): New.
> >
> > gcc/testsuite/
> >
> > PR target/92080
> > PR target/117839
> > * gcc.target/i386/pr117839-1a.c: New test.
> > * gcc.target/i386/pr117839-1b.c: Likewise.
> > * gcc.target/i386/pr117839-2.c: Likewise.
> > * gcc.target/i386/pr92080-1.c: Likewise.
> > * gcc.target/i386/pr92080-2.c: Likewise.
> > * gcc.target/i386/pr92080-3.c: Likewise.
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  gcc/config/i386/i386-features.cc| 308 
> >  gcc/config/i386/i386-passes.def |   1 +
> >  gcc/config/i386/i386-protos.h   |   2 +
> >  gcc/testsuite/gcc.target/i386/pr117839-1a.c |  35 +++
> >  gcc/testsuite/gcc.target/i386/pr117839-1b.c |   5 +
> >  gcc/testsuite/gcc.target/i386/pr117839-2.c  |  40 +++
> >  gcc/testsuite/gcc.target/i386/pr92080-1.c   |  54 
> >  gcc/testsuite/gcc.target/i386/pr92080-2.c   |  59 
> >  gcc/testsuite/gcc.target/i386/pr92080-3.c   |  48 +++
> >  9 files changed, 552 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr117839-1a.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr117839-1b.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr117839-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-3.c
> >
> > diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
> > index 003b003e09c..7d8d260750d 100644
> > --- a/gcc/config/i386/i386-features.cc
> > +++ b/gcc/config/i386/i386-features.cc
> > @@ -3288,6 +3288,314 @@ make_pass_remove_partial_avx_dependency (gcc::context *ctxt)
> >return new pass_remove_partial_avx_dependency (ctxt);
> >  }
> >
> > +static bool
> > +ix86_rrvl_gate ()
> > +{
> > +  return (TARGET_SSE2
> > + && optimize
> > + && optimize_function_for_speed_p (cfun));
> > +}
> > +
> > +/* Generate a vector set, DEST = SRC, at entry of the nearest dominator
> > +   for basic block map BBS, which is in the fake loop that contains the
> > +   whole function, so that there is only a single vector set in the
> > +   whole function.   */
> > +
> > +static void
> > +ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs)
> > +{
> > +  basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
> > +  while (bb->loop_father->latch
> > +!= EXIT_BLOCK_PTR_FOR_FN (cfun))
> > +bb = get_immediate_dominator (CDI_DOMINATORS,
> > + bb->loop_father->header);
> > +
> > +  rtx set = gen_rtx_SET (dest, src);
> > +
> > +  rtx_insn *insn = BB_HEAD (bb);
> > +  while (insn && !NONDEBUG_INSN_P (insn))
> > +{
> > +  if (insn == BB_END (bb))
> > +   {
> > + insn = NULL;
> > + break;
> > +   }
> > +  insn = NEXT_INSN (insn);
> > +}
> > +
> > +  rtx_insn *set_insn;
> > +

[PATCH] x86: Correct comments for pass_apx_nf_convert

2024-12-01 Thread H.J. Lu
Change pass_rpad to pass_apx_nf_convert in pass_apx_nf_convert comments.

* config/i386/i386-features.cc (pass_apx_nf_convert): Change
pass_rpad to pass_apx_nf_convert in comments.

-- 
H.J.
From e4dd0075aa998d522edd0da552d60a942eaae78a Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 2 Dec 2024 13:10:46 +0800
Subject: [PATCH] x86: Correct comments for pass_apx_nf_convert

Change pass_rpad to pass_apx_nf_convert in pass_apx_nf_convert comments.

	* config/i386/i386-features.cc (pass_apx_nf_convert): Change
	pass_rpad to pass_apx_nf_convert in comments.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386-features.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 003b003e09c..814c233e95d 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -3440,7 +3440,7 @@ public:
 {
   return ix86_apx_nf_convert ();
 }
-}; // class pass_rpad
+}; // class pass_apx_nf_convert
 
 } // anon namespace
 
-- 
2.47.1



[PATCH] RISC-V: Introduce vector lowering of VEC_PERM_EXPR for large vector types

2024-12-01 Thread Dusan Stojkovic
This patch introduces partial vectorization support for VEC_PERM_EXPR
when the specified vector type is too large to be supported natively on
the target architecture.

Take, for instance, this vector type:
typedef int32_t vnx32si __attribute__ ((vector_size (128)));

For -march=rv64gcv_zvl256b GCC doesn't vectorize the following code:
__attribute__ ((noipa)) void permute_vnx32si (vnx32si values1,
                                              vnx32si values2,
                                              vnx32si *out) {
  vnx32si v =
    __builtin_shufflevector (values1, values2, 1, 1, 1, 1, 1, 1, 1, 1,
                             1, 1, 1, 1, 1, 1, 1, 1,
                             1, 1, 1, 1, 1, 1, 1, 1,
                             1, 1, 1, 1, 1, 1, 1, 1);
  *(vnx32si *) out = v;
}

GCC produces one lw and 32 sw instructions: it loads values1[1] once
and stores it to every element with a scalar store.

With this patch applied the resulting assembly would look like:
permute_vnx32si:
    vsetivli    zero,8,e32,m1,ta,ma
    vle32.v     v2,0(a0)
    addi        sp,sp,-128
    addi        a3,a2,32
    addi        a4,a2,64
    addi        a5,a2,96
    vrgather.vi v1,v2,1
    vse32.v     v1,0(a2)
    vse32.v     v1,0(a3)
    vse32.v     v1,0(a4)
    vse32.v     v1,0(a5)
    addi        sp,sp,128
    jr          ra

With this patch, VEC_PERM_EXPR can be vectorized for indexes that are
bounded by split_type, the largest vector type that is supported
according to can_vec_perm_var_p.  In this case it is
8 elements * 32 bits = 256 bits.

The optimization iterates through the selector and sorts the element
values into groups.  Values in the [0, split_elements) range can be
vectorized by reading from the start of the first vector.  Since there
are two input vectors to choose from, the valid values for the second
vector are [elements, elements + split_elements).  Values outside their
respective ranges are handled with scalar instructions.

Out-of-range values are set to 0 in the vector result and are therefore
written twice: first when the vector store writes the initial 0, and
second when the scalar store corrects it.
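The index classification can be sketched in C as follows. This is a minimal illustration under one reading of the description, namely that selector values for the second operand are usable when they fall in [elements, elements + split_elements); classify_index and count_scalar_lanes are hypothetical names, and the patch itself works on GIMPLE in tree-vect-generic.cc, not on plain arrays.

```c
#include <stddef.h>

/* Where a result lane can come from (hypothetical names).  */
enum perm_source { FROM_FIRST_CHUNK, FROM_SECOND_CHUNK, SCALAR };

/* IDX is a VEC_PERM_EXPR selector value, ELEMENTS the lane count of the
   unsupported vector type, SPLIT the lane count of split_type.  */
enum perm_source classify_index (unsigned idx, unsigned elements,
                                 unsigned split)
{
  if (idx < split)
    return FROM_FIRST_CHUNK;                /* first SPLIT lanes of op 1 */
  if (idx >= elements && idx < elements + split)
    return FROM_SECOND_CHUNK;               /* first SPLIT lanes of op 2 */
  return SCALAR;                            /* patched by a scalar store */
}

/* Number of result lanes that need a scalar fix-up.  SEL has ELEMENTS
   entries.  */
size_t count_scalar_lanes (const unsigned *sel, unsigned elements,
                           unsigned split)
{
  size_t n = 0;
  for (unsigned i = 0; i < elements; i++)
    if (classify_index (sel[i], elements, split) == SCALAR)
      n++;
  return n;
}
```

For the all-ones selector of permute_vnx32si, every lane is FROM_FIRST_CHUNK, so no scalar fix-up is needed.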

There is, however, a case where this approach produces poor code.  The
worst case is when all indexes are out of range, which would produce
something like:
    vsetivli    zero,8,e32,m1,ta,ma  # configuration
    vle32.v     v2,0(a0)             # one load for the start of the vector
    addi        a3,a2,off            # off = i*split_elements,
                                     # i = [0, elements/split_elements - 1),
                                     # totaling div - 1 addi for vse32
    vrgather.vi v1,v2,1              # a single gather
    vse32.v     v1,0(a[i])           # one store for each
                                     # i = [0, elements/split_elements),
                                     # totaling div vse32

plus scalar loads and stores for the out-of-range elements.

Counting up the inserted vector code gives 2 * div + 1 instructions if
every insn costs 1.  This is the reasoning behind the arbitrary
constraint
s_assignments.length () / 2 > elements - (2 * div + 1):
if there are more scalar assignments than that, the vector instructions
would be redundant.
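The constraint can be restated as a small C predicate. This is a sketch: split_lowering_rejected is a hypothetical name, S_LEN stands for s_assignments.length (), and the factor of 2 is taken as-is from the description above. With elements = 32 and split_elements = 8, div is 4 and the vector code costs 9 instructions, so the lowering is rejected once s_assignments.length () / 2 exceeds 23.

```c
#include <stdbool.h>

/* Hypothetical restatement of the profitability constraint.
   S_LEN stands for s_assignments.length (); div = ELEMENTS / SPLIT_ELEMENTS.
   Returns true when the split lowering should be rejected.  Signed ints
   avoid wrap-around when 2 * div + 1 exceeds ELEMENTS.  */
bool split_lowering_rejected (int s_len, int elements, int split_elements)
{
  int div = elements / split_elements;
  int vector_insns = 2 * div + 1;   /* every insn assumed to cost 1 */
  return s_len / 2 > elements - vector_insns;
}
```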

Tested on 64-bit and 32-bit RISC-V with no regressions.

PS. In the vector instruction example there are two add instructions
working on the stack pointer register. I'm not quite sure about the
purpose of these instructions.

2024-31-11  Dusan Stojkovic  

  PR target/116425

gcc/ChangeLog:

  * tree-vect-generic.cc (split_lower_vec_perm): New function.
  (lower_vec_perm): New if condition.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/pr116425-run.c: New test.
  * gcc.target/riscv/rvv/autovec/pr116425.c: New test.


---
 .../riscv/rvv/autovec/pr116425-run.c  |  53 +
 .../gcc.target/riscv/rvv/autovec/pr116425.c   |  54 +
 gcc/tree-vect-generic.cc  | 196 ++
 3 files changed, 303 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116425-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116425.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116425-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116425-run.c
new file mode 100644
index 000..05a7f5a3824
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116425-run

Re: [PATCH] gcc: configure: Fix the optimization flags cleanup

2024-12-01 Thread Jeff Law




On 2/2/24 9:02 AM, Slava Barinov wrote:

Currently the sed command in the flag cleanup removes all -O[0-9] flags,
ignoring their context.  This causes problems when an optimization flag
is passed to the linker:

CFLAGS="-Os -Wl,-O1 -Wl,--hash-style=gnu"
is converted into
CFLAGS="-Os -Wl,-Wl,--hash-style=gnu"

This makes configure fail with ld: unrecognized option '-Wl,-Wl'.
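The old and new filtering rules can be modelled in C as follows. This is an illustration, not the sed expression from configure.ac: strip_opt_flags and strips_to are hypothetical helpers, and they assume (as the example output above suggests) that each removed flag is dropped together with the blank that follows it. With honor_comma set to 0 the function reproduces the broken output; with honor_comma set to 1 it keeps flags preceded by a comma, such as -Wl,-O1.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the configure cleanup: copy IN to OUT, dropping
   every "-O<digits>" flag together with the single blank that follows it.
   With HONOR_COMMA nonzero (the fixed behaviour), a flag preceded by a
   comma, as in "-Wl,-O1", is kept.  A flag at the very end of IN with no
   trailing blank is kept as well.  OUT needs strlen (IN) + 1 bytes.  */
void strip_opt_flags (const char *in, char *out, int honor_comma)
{
  size_t i = 0, o = 0;
  while (in[i] != '\0')
    {
      if (in[i] == '-' && in[i + 1] == 'O'
          && (i == 0 || !honor_comma || in[i - 1] != ','))
        {
          size_t j = i + 2;
          while (isdigit ((unsigned char) in[j]))
            j++;
          if (j > i + 2 && (in[j] == ' ' || in[j] == '\t'))
            {
              i = j + 1;   /* skip the flag and its trailing blank */
              continue;
            }
        }
      out[o++] = in[i++];
    }
  out[o] = '\0';
}

/* Nonzero if IN filters to EXPECTED.  */
int strips_to (const char *in, int honor_comma, const char *expected)
{
  char buf[256];
  if (strlen (in) >= sizeof buf)
    return 0;
  strip_opt_flags (in, buf, honor_comma);
  return strcmp (buf, expected) == 0;
}
```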

gcc/
* configure.ac: Only remove -O[0-9] if not preceded by a comma.
* configure: Regenerated.

Thanks.  I've bootstrapped and regression tested this on x86 and pushed
it to the trunk.  Sorry for the insanely long delay.


Jeff



Re: [PATCH] defer test for limits.h existence to runtime [PR80677]

2024-12-01 Thread Helmut Grohne
Hi Jeff,

Thanks for looking at my patch.

On Sun, Dec 01, 2024 at 02:40:47PM -0700, Jeff Law wrote:
> On 4/30/24 12:45 PM, Helmut Grohne wrote:
> > The definition of LIMITS_H_TEST evaluates its existence in
> > BUILD_SYSTEM_HEADER_DIR, but we'd actually need it to check a target
> > version. Hence this check occasionally produces misdetections when build
> > and target differ. In some cases such as cygming, the header is only
> > installed after performing the build. Instead of resolving these
> > situations by guessing, defer the test to the time of use and check for
> > the header using __has_include_next which will use the correct include
> > search path.
> > 
> > 2024-04-30  Helmut Grohne  
> > 
> > PR bootstrap/80677
> >  * gcc/limitx.h: Only #include syslimits.h when another 
> >   exists.
> >  * gcc/limity.h: Only #include limits.h when another 
> >   exists.
> >  * gcc/Makefile.in: Delete LIMITS_H_TEST default and always wrap
> >   limits.h with limitx.h and limity.h.
> > * Makefile.tpl: Drop forwarding of LIMITS_H_TEST
> > * Makefile.in: Regenerate.
> > * gcc/config/i386/t-cygming: Delete unused LIMITS_H_TEST.
> > * gcc/config/t-rtems: Likewise.
> > * gcc/config/t-vxworks: Likewise.
> > * gcc/config/vms/t-vms: Likewise.
> As I noted in the related BZ.  I think there's something more fundamental
> going on here.  The claim that this stuff doesn't work for host != target
> isn't as general as one might think as we do those kind of builds every day.

I agree that the failure is uncommon, but that doesn't make it any less
wrong. There are two ways in which this test can fail.

1. It detects presence of limits.h when it really is absent.
2. It detects absence of limits.h when it really is present.

We see that the test is overridden for rtems, vxworks and vms, hinting
that such misdetections (both forms) actually occur in practice.

To understand what happens when, we need to consider a few different
configurations. When cross building a compiler, BUILD_SYSTEM_HEADER_DIR
becomes CROSS_SYSTEM_HEADER_DIR and things tend to work. In the case of a
non-Canadian cross toolchain build, it depends on whether a sysroot is
provided. When it is, CROSS_SYSTEM_HEADER_DIR is used and again things
tend to work. The interesting case is building a non-Canadian cross
toolchain without a sysroot.

Essentially, in order to build such a toolchain, you must separate the
headers for different architectures into different,
architecture-dependent prefixes. That's precisely what Debian's
multiarch is. I'm not sure who other than Debian builds cross toolchains
without a sysroot.

> I would start by first getting the Debian multiarch patches upstreamed, as I
> get a sense this is related to multiarch.

So your hint at multiarch is indeed relevant here. Still, what
happens in that case is that the cross toolchain build looks into
$(BUILD_SYSTEM_HEADER_DIR)/limits.h (which happens to be
$(NATIVE_SYSTEM_HEADER_DIR)/limits.h) to figure out whether the target
architecture has a limits.h. However you twist this around, it
quite simply is and remains wrong. It just happens that everyone else
either has a limits.h where their cross toolchain expects one or
uses a sysroot, in which case the lookup is redirected to
CROSS_SYSTEM_HEADER_DIR and does something reasonable.

There is a secondary failure here: the expectation that
your libc installs limits.h directly into *_SYSTEM_HEADER_DIR. While
that is the case in almost all situations (even on Debian presently), I
am in the process of changing this and intend to move limits.h to a
multiarch location, because, surprisingly, glibc's limits.h differs
from musl's limits.h, and as a result we cannot install both into the
same place, /usr/include/limits.h. So your libc may provide limits.h on
the compiler's default search path in a directory that happens not to be
*_SYSTEM_HEADER_DIR. When that happens, LIMITS_H_TEST reports the header
as absent, which is a false negative.

As a result, fixing LIMITS_H_TEST becomes a prerequisite for upstreaming
the multiarch patches.

I also think that upstreaming multiarch into GCC would be good, as I
have had to rebase those patches more than once. Those patches
predate my involvement and I do not understand why they were
rejected earlier. Given how difficult it is to get even the simplest
patches (e.g. fixing an ICE in __has_include_next) into GCC, I do not
dare to drive this process without a mentor. If someone else wants to,
I offer my support on the Debian side. But then, we very much need
this patch first, so that's kind of a chicken-and-egg problem now.

Helmut