Re: [PATCH 1/2] bpf: Implementation of BPF CO-RE builtins

2023-08-03 Thread Jose E. Marchesi via Gcc-patches


> This patch updates the support for the BPF CO-RE builtins
> __builtin_preserve_access_index and __builtin_preserve_field_info,
> and adds support for the CO-RE builtins __builtin_btf_type_id,
> __builtin_preserve_type_info and __builtin_preserve_enum_value.
>
> These CO-RE relocations are now converted to __builtin_core_reloc which
> abstracts all of the original builtins in a polymorphic relocation
> specific builtin.
>
> The builtin processing is now split in 2 stages, the first (pack) is
> executed right after the front-end and the second (process) right before
> the asm output.
>
> In expand pass the __builtin_core_reloc is converted to a
> unspec:UNSPEC_CORE_RELOC rtx entry.
>
> The data required to process the builtin is now collected in the packing
> stage (after front-end), not allowing the compiler to optimize any of
> the relevant information required to compose the relocation when
> necessary.
> At expansion, that information is recovered and CTF/BTF is queried to
> construct the information that will be used in the relocation.
> At this point the relocation is added to specific section and the
> builtin is expanded to the expected default value for the builtin.
>
> In order to process __builtin_preserve_enum_value, it was necessary to
> hook the front-end to collect the original enum value reference.
> This is needed since the parser folds all the enum values to their
> integer_cst representation.
>
> More details can be found within the core-builtins.cc.
>
> Regtested in host x86_64-linux-gnu and target bpf-unknown-none.
> ---
>  gcc/config.gcc                                |    4 +-
>  gcc/config/bpf/bpf-passes.def                 |   20 -
>  gcc/config/bpf/bpf-protos.h                   |    4 +-
>  gcc/config/bpf/bpf.cc                         |  817 +-
>  gcc/config/bpf/bpf.md                         |   17 +
>  gcc/config/bpf/core-builtins.cc               | 1397 +
>  gcc/config/bpf/core-builtins.h                |   36 +
>  gcc/config/bpf/coreout.cc                     |   50 +-
>  gcc/config/bpf/coreout.h                      |   13 +-
>  gcc/config/bpf/t-bpf                          |    6 +-
>  gcc/doc/extend.texi                           |   51 +
>  ...core-builtin-fieldinfo-const-elimination.c |   29 +
>  12 files changed, 1639 insertions(+), 805 deletions(-)
>  delete mode 100644 gcc/config/bpf/bpf-passes.def
>  create mode 100644 gcc/config/bpf/core-builtins.cc
>  create mode 100644 gcc/config/bpf/core-builtins.h
>  create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-fieldinfo-const-elimination.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index eba69a463be0..c521669e78b1 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1597,8 +1597,8 @@ bpf-*-*)
>  use_collect2=no
>  extra_headers="bpf-helpers.h"
>  use_gcc_stdint=provide
> -extra_objs="coreout.o"
> -target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc"
> +extra_objs="coreout.o core-builtins.o"
> +target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc \$(srcdir)/config/bpf/core-builtins.cc"
>  ;;
>  cris-*-elf | cris-*-none)
>   tm_file="elfos.h newlib-stdint.h ${tm_file}"
> diff --git a/gcc/config/bpf/bpf-passes.def b/gcc/config/bpf/bpf-passes.def
> deleted file mode 100644
> index deeaee988a01..
> --- a/gcc/config/bpf/bpf-passes.def
> +++ /dev/null
> @@ -1,20 +0,0 @@
> -/* Declaration of target-specific passes for eBPF.
> -   Copyright (C) 2021-2023 Free Software Foundation, Inc.
> -
> -   This file is part of GCC.
> -
> -   GCC is free software; you can redistribute it and/or modify it
> -   under the terms of the GNU General Public License as published by
> -   the Free Software Foundation; either version 3, or (at your option)
> -   any later version.
> -
> -   GCC is distributed in the hope that it will be useful, but
> -   WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   General Public License for more details.
> -
> -   You should have received a copy of the GNU General Public License
> -   along with GCC; see the file COPYING3.  If not see
> -   <http://www.gnu.org/licenses/>.  */
> -
> -INSERT_PASS_AFTER (pass_df_initialize_opt, 1, pass_bpf_core_attr);
> diff --git a/gcc/config/bpf/bpf-protos.h b/gcc/config/bpf/bpf-protos.h
> index b484310e8cbf..fbcf5111eb21 100644
> --- a/gcc/config/bpf/bpf-protos.h
> +++ b/gcc/config/bpf/bpf-protos.h
> @@ -30,7 +30,7 @@ extern void bpf_print_operand_address (FILE *, rtx);
>  extern void bpf_expand_prologue (void);
>  extern void bpf_expand_epilogue (void);
>  extern void bpf_expand_cbranch (machine_mode, rtx *);
> -
> -rtl_opt_pass * make_pass_bpf_core_attr (gcc::context *);
> +const char *bpf_add_core_reloc (rtx *operands, const char *templ);
> +void bpf_process_move_operands (rtx *operands);
>  
>  #endif /* ! GCC_BPF_PROTOS_H */
> diff 

Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.

2023-08-03 Thread Richard Biener via Gcc-patches
On Wed, Aug 2, 2023 at 4:08 PM Manolis Tsamis  wrote:
>
> Hi all,
>
> I'm pinging to discuss again if we want to move this forward for GCC14.
>
> I did some testing again and I haven't been able to find obvious
> regressions, including testing the code from PR86270 and PR70359 that
> Richard mentioned.
> I still believe that zero can be considered a special case even for
> hardware that doesn't directly benefit in the comparison.
> For example it happens that the testcase from the commit compiles to
> one instruction less in x86:
>
> .LFB0:
>         movl    (%rdi), %eax
>         leal    1(%rax), %edx
>         movl    %edx, (%rdi)
>         testl   %eax, %eax
>         je      .L4
>         ret
> .L4:
>         jmp     g
>
> vs
>
> .LFB0:
>         movl    (%rdi), %eax
>         addl    $1, %eax
>         movl    %eax, (%rdi)
>         cmpl    $1, %eax
>         je      .L4
>         ret
> .L4:
>         xorl    %eax, %eax
>         jmp     g
>
> (The xorl is not emitted when testl is used.  LLVM uses testl but also
> does xor eax, eax :) )
> Although this is accidental, I believe it also showcases that zero is
> a preferential value in various ways.
>
> I'm running benchmarks comparing the effects of this change and I'm
> also still looking for testcases that result in problematic
> regressions.
> Any feedback or other concerns about this are appreciated!

My comment from Apr 24th still holds: IMO this is something for
instruction selection (aka the ISEL pass) or the out-of-SSA tweaks
we do during RTL expansion (see insert_backedge_copies).

Richard.

> Thanks,
> Manolis
>
> On Wed, Apr 26, 2023 at 9:43 AM Richard Biener
>  wrote:
> >
> > On Wed, Apr 26, 2023 at 4:30 AM Jeff Law  wrote:
> > >
> > >
> > >
> > > On 4/25/23 01:21, Richard Biener wrote:
> > > > On Tue, Apr 25, 2023 at 1:05 AM Jeff Law  wrote
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On 4/24/23 02:06, Richard Biener via Gcc-patches wrote:
> > > >>> On Fri, Apr 21, 2023 at 11:01 PM Philipp Tomsich
> > > >>>  wrote:
> > > 
> > >  Any guidance on the next steps for this patch?
> > > >>>
> > > >>> I think we want to perform this transform later, in particular when
> > > >>> the test is a loop exit test we do not want to do it as it prevents
> > > >>> coalescing of the IV on the backedge at out-of-SSA time.
> > > >>>
> > > >>> That means doing the transform in folding and/or before inlining
> > > >>> (the test could become a loop exit test) would be a no-go.  In fact
> > > >>> for SSA coalescing we'd want the reverse transform in some cases, see
> > > >>> PRs 86270 and 70359.
> > > >>>
> > > >>> If we can reliably undo for the loop case I suppose we can do the
> > > >>> canonicalization to compare against zero.  In any case please split
> > > >>> up the patch (note
> > > >> I've also
> > > >>> hoped we could eventually get rid of that part of
> > > >>> tree-ssa-forwprop.cc
> > > >> in favor
> > > >>> of match.pd patterns since it uses GENERIC folding :/).
> > > >>>
> > > >> Do we have enough information to do this at expansion time?  That would
> > > >> avoid introducing the target dependencies to drive this in gimple.
> > > >
> > > > I think so, but there isn't any convenient place to do this I think.  I 
> > > > suppose
> > > > there's no hope to catch it during RTL opts?
> > > Combine would be the most natural place in the RTL pipeline, but it'd be
> > > a 2->2 combination which would be rejected.
> > >
> > > We could possibly do it as a define_insn_and_split, but the gimple->RTL
> > > interface seems like a better fit to me.  If TER has done its job, we
> > > should see a complex enough tree node to do the right thing.
> >
> > Of course we'd want to get rid of TER in favor of ISEL
> >
> > Richard.
> >
> > > jeff


Re: [PATCH] gcc-13/changes.html: Add and fix URL to -fstrict-flex-array option.

2023-08-03 Thread Richard Biener via Gcc-patches
On Wed, 2 Aug 2023, Qing Zhao wrote:

> Ping.
> 
> This is a very simple patch to correct a URL address in GCC13's changes.html.
> Currently, it's pointing to a wrong address.
> 
> Okay for committing? 

OK

> > On Jul 21, 2023, at 3:02 PM, Qing Zhao  wrote:
> > 
> > Hi,
> > 
> > In the current GCC13 release note, the URL to the option -fstrict-flex-array
> > is wrong (pointing to -Wstrict-flex-array).
> > This is the change to correct the URL and also add the URL in another place
> > where -fstrict-flex-array is mentioned.
> > 
> > I have checked the resulting HTML file, works well.
> > 
> > Okay for committing?
> > 
> > thanks.
> > 
> > Qing
> > ---
> > htdocs/gcc-13/changes.html | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > index 68e8c5cc..39b63a84 100644
> > --- a/htdocs/gcc-13/changes.html
> > +++ b/htdocs/gcc-13/changes.html
> > @@ -46,7 +46,7 @@ You may also want to check out our
> >   will no longer issue warnings for out of
> >   bounds accesses to trailing struct members of one-element array type
> >   anymore. Instead it diagnoses accesses to trailing arrays according to
> > -  <a href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Warning-Options.html#index-Wstrict-flex-arrays">-fstrict-flex-arrays</a>. 
> > +  <a href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C-Dialect-Options.html#index-fstrict-flex-arrays">-fstrict-flex-arrays</a>.
> >  
> >  <a href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Static-Analyzer-Options.html">-fanalyzer</a>
> >   is still only suitable for analyzing C code.
> >   In particular, using it on C++ is unlikely to give meaningful 
> > output.
> > @@ -213,7 +213,7 @@ You may also want to check out our
> >  flexible array member for the purpose of accessing the elements of such
> >  an array. By default, all trailing arrays in aggregates are treated as
> >  flexible array members. Use the new command-line option
> > -  <a href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Warning-Options.html#index-Wstrict-flex-arrays">-fstrict-flex-arrays</a>
> > +  <a href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C-Dialect-Options.html#index-fstrict-flex-arrays">-fstrict-flex-arrays</a>
> >  to control which array members are treated as flexible arrays.
> >  
> > 
> > -- 
> > 2.31.1
> > 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[x86 PATCH] Split SUBREGs of SSE vector registers into vec_select insns.

2023-08-03 Thread Roger Sayle

This patch is the final piece in the series to improve the ABI issues
affecting PR 88873.  The previous patches tackled inserting DFmode
values into V2DFmode registers, by introducing insvti_{low,high}part
patterns.  This patch improves the extraction of DFmode values from
V2DFmode registers via TImode intermediates.

I'd initially thought this would require new extvti_{low,high}part
patterns to be defined, but all that's required is to recognize that
the SUBREG idioms produced by combine are equivalent to (forms of)
vec_select patterns.  The target-independent middle-end can't be sure
that the appropriate vec_select instruction exists on the target,
hence doesn't canonicalize a SUBREG of a vector mode as a vec_select,
but the backend can provide a define_split stating where and when
this is useful, for example, considering whether the operand is in
memory, or whether !TARGET_SSE_MATH and the destination is i387.

For pr88873.c, gcc -O2 -march=cascadelake currently generates:

foo:    vpunpcklqdq %xmm3, %xmm2, %xmm7
        vpunpcklqdq %xmm1, %xmm0, %xmm6
        vpunpcklqdq %xmm5, %xmm4, %xmm2
        vmovdqa %xmm7, -24(%rsp)
        vmovdqa %xmm6, %xmm1
        movq    -16(%rsp), %rax
        vpinsrq $1, %rax, %xmm7, %xmm4
        vmovapd %xmm4, %xmm6
        vfmadd132pd %xmm1, %xmm2, %xmm6
        vmovapd %xmm6, -24(%rsp)
        vmovsd  -16(%rsp), %xmm1
        vmovsd  -24(%rsp), %xmm0
        ret

with this patch, we now generate:

foo:    vpunpcklqdq %xmm1, %xmm0, %xmm6
        vpunpcklqdq %xmm3, %xmm2, %xmm7
        vpunpcklqdq %xmm5, %xmm4, %xmm2
        vmovdqa %xmm6, %xmm1
        vfmadd132pd %xmm7, %xmm2, %xmm1
        vmovsd  %xmm1, %xmm1, %xmm0
        vunpckhpd   %xmm1, %xmm1, %xmm1
        ret

The improvement is even more dramatic when compared to the original
29 instructions shown in comment #8.  GCC 13, for example, required
12 transfers to/from memory.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-08-03  Roger Sayle  

gcc/ChangeLog
* config/i386/sse.md (define_split): Convert highpart:DF extract
from V2DFmode register into a sse2_storehpd instruction.
(define_split): Likewise, convert lowpart:DF extract from V2DF
register into a sse2_storelpd instruction.

gcc/testsuite/ChangeLog
* gcc.target/i386/pr88873.c: Tweak to check for improved code.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 35fd66e..bc419ff 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -13554,6 +13554,14 @@
   [(set_attr "type" "ssemov")
(set_attr "mode" "V2SF,V4SF,V2SF")])
 
> +;; Convert highpart SUBREG into sse2_storehpd or *vec_extractv2df_1_sse.
+(define_split
+  [(set (match_operand:DF 0 "register_operand")
+   (subreg:DF (match_operand:V2DF 1 "register_operand") 8))]
+  "TARGET_SSE"
+  [(set (match_dup 0)
+   (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))])
+
 ;; Avoid combining registers from different units in a single alternative,
 ;; see comment above inline_secondary_memory_needed function in i386.cc
 (define_insn "sse2_storelpd"
@@ -13599,6 +13607,14 @@
   [(set_attr "type" "ssemov")
(set_attr "mode" "V2SF,V4SF,V2SF")])
 
+;; Convert lowpart SUBREG into sse2_storelpd or *vec_extractv2df_0_sse.
+(define_split
+  [(set (match_operand:DF 0 "register_operand")
+   (subreg:DF (match_operand:V2DF 1 "register_operand") 0))]
+  "TARGET_SSE"
+  [(set (match_dup 0)
+   (vec_select:DF (match_dup 1) (parallel [(const_int 0)])))])
+
 (define_expand "sse2_loadhpd_exp"
   [(set (match_operand:V2DF 0 "nonimmediate_operand")
(vec_concat:V2DF
diff --git a/gcc/testsuite/gcc.target/i386/pr88873.c b/gcc/testsuite/gcc.target/i386/pr88873.c
index d893aac..a3a7ef2 100644
--- a/gcc/testsuite/gcc.target/i386/pr88873.c
+++ b/gcc/testsuite/gcc.target/i386/pr88873.c
@@ -9,3 +9,5 @@ s_t foo (s_t a, s_t b, s_t c)
 }
 
 /* { dg-final { scan-assembler-times "vpunpcklqdq" 3 } } */
+/* { dg-final { scan-assembler "vunpckhpd" } } */
+/* { dg-final { scan-assembler-not "rsp" } } */


Re: [PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2023-08-03 Thread Richard Biener via Gcc-patches
On Mon, Jul 10, 2023 at 9:12 PM Qing Zhao via Gcc-patches
 wrote:
>
> Hi,
>
> This is the change for the GCC14 release notes on deprecating a C
> extension about flexible array members.
>
> Okay for committing?
>
> thanks.
>
> Qing
>
> 
>
> *htdocs/gcc-14/changes.html (Caveats): Add notice about deprecating a C
> extension about flexible array members.
> ---
>  htdocs/gcc-14/changes.html | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index 3f797642..c7f2ce4d 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -30,7 +30,15 @@ a work-in-progress.
>  
>  Caveats
>  
> -  ...
> +  C:
> +  Support for the GCC extension in which a structure containing a C99
> +  flexible array member, or a union containing such a structure, is
> +  not the last field of another structure is deprecated. Refer to
> +  <a href="https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html">
> +  Zero Length Arrays</a>.
> +  Any code relying on this extension should be modified to ensure that
> +  C99 flexible array members only end up at the ends of structures.

If it's deprecated, any use should be diagnosed by default with a
warning; can you mention that and make sure we do so?  What would be
the most surprising example of code that's going to be rejected?

Richard.

> +  
>  
>
>
> --
> 2.31.1
>


Introduce -msmp to select /lib_smp/ on ppc-vx6

2023-08-03 Thread Alexandre Oliva via Gcc-patches


The .spec files used for linking on ppc-vx6, when the rtp-smp runtime
is selected, add -L flags for /lib_smp/ and /lib/.

There was a problem, though: although /lib_smp/ and /lib/ were to be
searched in this order, and the specs files do that correctly, the
compiler would search /lib/ first regardless, because
STARTFILE_PREFIX_SPEC said so, and specs files cannot override that.

With this patch, we arrange for the presence of -msmp to affect
STARTFILE_PREFIX_SPEC, so that the compiler searches /lib_smp/ rather
than /lib/ for crt files.  A separate patch for GNAT ensures that when
the rtp-smp runtime is selected, -msmp is passed to the compiler
driver for linking, along with the --specs flags.

Preapproved by Olivier Hainque.  I'm checking this in.


for  gcc/ChangeLog

* config/vxworks-smp.opt: New.  Introduce -msmp.
* config.gcc: Enable it on powerpc* vxworks prior to 7r*.
* config/rs6000/vxworks.h (STARTFILE_PREFIX_SPEC): Choose
lib_smp when -msmp is present in the command line.
* doc/invoke.texi: Document it.
---
 gcc/config.gcc              |    2 +-
 gcc/config/rs6000/vxworks.h |    2 +-
 gcc/config/vxworks-smp.opt  |   25 +
 gcc/doc/invoke.texi         |    9 +-
 4 files changed, 35 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/vxworks-smp.opt

diff --git a/gcc/config.gcc b/gcc/config.gcc
index eba69a463be0d..7438e0be4a2c0 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3046,7 +3046,7 @@ powerpc*-wrs-vxworks7r*)
 powerpc-wrs-vxworks*)
tm_file="${tm_file} elfos.h gnu-user.h freebsd-spec.h rs6000/sysv4.h"
	tmake_file="${tmake_file} rs6000/t-fprules rs6000/t-ppccomm rs6000/t-vxworks"
-   extra_options="${extra_options} rs6000/sysv4.opt"
+   extra_options="${extra_options} rs6000/sysv4.opt vxworks-smp.opt"
extra_headers="${extra_headers} ppc-asm.h"
case ${target} in
   *-vxworksmils*)
diff --git a/gcc/config/rs6000/vxworks.h b/gcc/config/rs6000/vxworks.h
index 690e92439b94f..f38b4bd1dff2f 100644
--- a/gcc/config/rs6000/vxworks.h
+++ b/gcc/config/rs6000/vxworks.h
@@ -206,7 +206,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #undef  STARTFILE_PREFIX_SPEC
 #define STARTFILE_PREFIX_SPEC  \
- "%{mrtp:%{!shared:/lib/usr/lib/ppc/PPC32/common}}"
+ "%{mrtp:%{!shared:/lib%{msmp:_smp}/usr/lib/ppc/PPC32/common}}"
 
 /* For aggregates passing, use the same, consistent ABI as Linux.  */
 #define AGGREGATE_PADDING_FIXED 0
diff --git a/gcc/config/vxworks-smp.opt b/gcc/config/vxworks-smp.opt
new file mode 100644
index 0..5ef1521634ab5
--- /dev/null
+++ b/gcc/config/vxworks-smp.opt
@@ -0,0 +1,25 @@
+; Options for VxWorks configurations where smp/!smp variants of the
+; system libraries are installed in separate locations.
+;
+; Copyright (C) 2023 Free Software Foundation, Inc.
+; Contributed by AdaCore.
+;
+; This file is part of GCC.
+;
+; GCC is free software; you can redistribute it and/or modify it under
+; the terms of the GNU General Public License as published by the Free
+; Software Foundation; either version 3, or (at your option) any later
+; version.
+;
+; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+; for more details.
+;
+; You should have received a copy of the GNU General Public License
+; along with GCC; see the file COPYING3.  If not see
+; .
+
+msmp
+Driver Target RejectNegative
+Select VxWorks SMP C runtimes for linking.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 104766f446d11..adb10a3528dae 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1403,7 +1403,7 @@ See RS/6000 and PowerPC Options.
 -mpointer-size=@var{size}}
 
 @emph{VxWorks Options}
-@gccoptlist{-mrtp  -non-static  -Bstatic  -Bdynamic
+@gccoptlist{-mrtp  -msmp  -non-static  -Bstatic  -Bdynamic
 -Xbind-lazy  -Xbind-now}
 
 @emph{x86 Options}
@@ -32442,6 +32442,13 @@ GCC can generate code for both VxWorks kernels and 
real time processes
 (RTPs).  This option switches from the former to the latter.  It also
 defines the preprocessor macro @code{__RTP__}.
 
+@opindex msmp
+@item -msmp
+Select SMP runtimes for linking.  Not available on architectures other
+than PowerPC, nor on VxWorks version 7 or later, in which the selection
+is part of the VxWorks build configuration and the library paths are the
+same for either choice.
+
 @opindex non-static
 @item -non-static
 Link an RTP executable against shared libraries rather than static


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 

Re: [PATCH] s390: Enable vect_bswap test cases

2023-08-03 Thread Andreas Krebbel via Gcc-patches
On 8/3/23 08:48, Stefan Schulze Frielinghaus wrote:
> This enables the following tests which rely on instruction vperm which
> is available since z13 with the initial vector support.
> 
> testsuite/gcc.dg/vect/vect-bswap16.c
> 42:/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_bswap || sse4_runtime } } } } */
> 
> testsuite/gcc.dg/vect/vect-bswap32.c
> 42:/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_bswap || sse4_runtime } } } } */
> 
> testsuite/gcc.dg/vect/vect-bswap64.c
> 42:/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_bswap || sse4_runtime } } } } */
> 
> Ok for mainline?

Ok. Thanks!

Andreas

> 
> gcc/testsuite/ChangeLog:
> 
>   * lib/target-supports.exp (check_effective_target_vect_bswap):
>   Add s390.
> ---
>  gcc/testsuite/lib/target-supports.exp | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 4d04df2a709..2ccc0291442 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -7087,9 +7087,11 @@ proc check_effective_target_whole_vector_shift { } {
>  
>  proc check_effective_target_vect_bswap { } {
>  return [check_cached_effective_target_indexed vect_bswap {
> -  expr { [istarget aarch64*-*-*]
> -  || [is-effective-target arm_neon]
> -  || [istarget amdgcn-*-*] }}]
> +  expr { ([istarget aarch64*-*-*]
> +   || [is-effective-target arm_neon]
> +   || [istarget amdgcn-*-*])
> +  || ([istarget s390*-*-*]
> +  && [check_effective_target_s390_vx]) }}]
>  }
>  
>  # Return 1 if the target supports comparison of bool vectors for at



Re: [PATCH] s390: Try to emit vlbr/vstbr instead of vperm et al.

2023-08-03 Thread Andreas Krebbel via Gcc-patches
On 8/3/23 08:51, Stefan Schulze Frielinghaus wrote:
> Bootstrapped and regtested on s390x.  Ok for mainline?
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.cc (expand_perm_as_a_vlbr_vstbr_candidate):
>   New function which handles bswap patterns for vec_perm_const.
>   (vectorize_vec_perm_const_1): Call new function.
>   * config/s390/vector.md (*bswap): Fix operands in output
>   template.
>   (*vstbr): New insn.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/s390.exp: Add subdirectory vxe2.
>   * gcc.target/s390/vxe2/vlbr-1.c: New test.
>   * gcc.target/s390/vxe2/vstbr-1.c: New test.
>   * gcc.target/s390/vxe2/vstbr-2.c: New test.

Ok. Thanks!

Andreas


> ---
>  gcc/config/s390/s390.cc                      | 55 
>  gcc/config/s390/vector.md                    | 16 --
>  gcc/testsuite/gcc.target/s390/s390.exp       |  3 ++
>  gcc/testsuite/gcc.target/s390/vxe2/vlbr-1.c  | 29 +++
>  gcc/testsuite/gcc.target/s390/vxe2/vstbr-1.c | 29 +++
>  gcc/testsuite/gcc.target/s390/vxe2/vstbr-2.c | 42 +++
>  6 files changed, 170 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/vxe2/vlbr-1.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/vxe2/vstbr-1.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/vxe2/vstbr-2.c
> 
> diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> index d9f10542473..91eb9232b10 100644
> --- a/gcc/config/s390/s390.cc
> +++ b/gcc/config/s390/s390.cc
> @@ -17698,6 +17698,58 @@ expand_perm_with_vstbrq (const struct expand_vec_perm_d &d)
>return false;
>  }
>  
> +/* Try to emit vlbr/vstbr.  Note, this is only a candidate insn since
> +   TARGET_VECTORIZE_VEC_PERM_CONST operates on vector registers only.  Thus,
> +   either fwprop, combine et al. "fixes" one of the input/output operands into
> +   a memory operand or a splitter has to reverse this into a general vperm
> +   operation.  */
> +
> +static bool
> +expand_perm_as_a_vlbr_vstbr_candidate (const struct expand_vec_perm_d &d)
> +{
> +  static const char perm[4][MAX_VECT_LEN]
> += { { 1,  0,  3,  2,  5,  4,  7, 6, 9,  8,  11, 10, 13, 12, 15, 14 },
> + { 3,  2,  1,  0,  7,  6,  5, 4, 11, 10, 9,  8,  15, 14, 13, 12 },
> + { 7,  6,  5,  4,  3,  2,  1, 0, 15, 14, 13, 12, 11, 10, 9,  8  },
> + { 15, 14, 13, 12, 11, 10, 9, 8, 7,  6,  5,  4,  3,  2,  1,  0  } };
> +
> +  if (!TARGET_VXE2 || d.vmode != V16QImode || d.op0 != d.op1)
> +return false;
> +
> +  if (memcmp (d.perm, perm[0], MAX_VECT_LEN) == 0)
> +{
> +  rtx target = gen_rtx_SUBREG (V8HImode, d.target, 0);
> +  rtx op0 = gen_rtx_SUBREG (V8HImode, d.op0, 0);
> +  emit_insn (gen_bswapv8hi (target, op0));
> +  return true;
> +}
> +
> +  if (memcmp (d.perm, perm[1], MAX_VECT_LEN) == 0)
> +{
> +  rtx target = gen_rtx_SUBREG (V4SImode, d.target, 0);
> +  rtx op0 = gen_rtx_SUBREG (V4SImode, d.op0, 0);
> +  emit_insn (gen_bswapv4si (target, op0));
> +  return true;
> +}
> +
> +  if (memcmp (d.perm, perm[2], MAX_VECT_LEN) == 0)
> +{
> +  rtx target = gen_rtx_SUBREG (V2DImode, d.target, 0);
> +  rtx op0 = gen_rtx_SUBREG (V2DImode, d.op0, 0);
> +  emit_insn (gen_bswapv2di (target, op0));
> +  return true;
> +}
> +
> +  if (memcmp (d.perm, perm[3], MAX_VECT_LEN) == 0)
> +{
> +  rtx target = gen_rtx_SUBREG (V1TImode, d.target, 0);
> +  rtx op0 = gen_rtx_SUBREG (V1TImode, d.op0, 0);
> +  emit_insn (gen_bswapv1ti (target, op0));
> +  return true;
> +}
> +
> +  return false;
> +}
>  
>  /* Try to find the best sequence for the vector permute operation
> described by D.  Return true if the operation could be
> @@ -17720,6 +17772,9 @@ vectorize_vec_perm_const_1 (const struct expand_vec_perm_d &d)
>if (expand_perm_with_rot (d))
>  return true;
>  
> +  if (expand_perm_as_a_vlbr_vstbr_candidate (d))
> +return true;
> +
>return false;
>  }
>  
> diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> index 21bec729efa..f0e9ed3d263 100644
> --- a/gcc/config/s390/vector.md
> +++ b/gcc/config/s390/vector.md
> @@ -47,6 +47,7 @@
>  (define_mode_iterator VI_HW [V16QI V8HI V4SI V2DI])
>  (define_mode_iterator VI_HW_QHS [V16QI V8HI V4SI])
>  (define_mode_iterator VI_HW_HSD [V8HI  V4SI V2DI])
> +(define_mode_iterator VI_HW_HSDT [V8HI V4SI V2DI V1TI TI])
>  (define_mode_iterator VI_HW_HS  [V8HI  V4SI])
>  (define_mode_iterator VI_HW_QH  [V16QI V8HI])
>  
> @@ -2876,12 +2877,12 @@
>   (use (match_dup 2))])]
>"TARGET_VX"
>  {
> -  static char p[4][16] =
> +  static const char p[4][16] =
>  { { 1,  0,  3,  2,  5,  4,  7, 6, 9,  8,  11, 10, 13, 12, 15, 14 },   /* H */
>{ 3,  2,  1,  0,  7,  6,  5, 4, 11, 10, 9,  8,  15, 14, 13, 12 },   /* S */
>{ 7,  6,  5,  4,  3,  2,  1, 0, 15, 14, 13, 12, 11, 10, 9,  8  },   /* D */
>{ 15, 14, 13, 12, 11, 10, 9, 8, 7,  6,  5,  4,  3, 

Re: [PATCH] Fix PR 110874: infinite loop in gimple_bitwise_inverted_equal_p with fre

2023-08-03 Thread Richard Biener via Gcc-patches
On Thu, Aug 3, 2023 at 4:34 AM Andrew Pinski via Gcc-patches
 wrote:
>
> So I didn't expect valueization to cause calling gimple_nop_convert
> to iterate between 2 different SSA names causing an infinite loop
> in gimple_bitwise_inverted_equal_p.
> So we should put a bound on gimple_bitwise_inverted_equal_p calling
> gimple_nop_convert and only look through one conversion rather than always.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
> PR tree-optimization/110874
> * gimple-match-head.cc (gimple_bitwise_inverted_equal_p):
> Add new argument, again with default value of true.
> Don't try gimple_nop_convert if again is false.
> Update call to gimple_bitwise_inverted_equal_p for
> new argument.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/110874
> * gcc.c-torture/compile/pr110874-a.c: New test.
> ---
>  gcc/gimple-match-head.cc| 14 +-
>  .../gcc.c-torture/compile/pr110874-a.c  | 17 +
>  2 files changed, 26 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr110874-a.c
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index b1e96304d7c..e91aaab86dd 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -273,7 +273,7 @@ gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
>  /* Helper function for bitwise_equal_p macro.  */
>
>  static inline bool
> -gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
> +gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, tree (*valueize) (tree), bool again = true)
>  {
>if (expr1 == expr2)
>  return false;
> @@ -285,12 +285,16 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, tree (*valueize) (tree)
>  return false;
>
>tree other;
> -  if (gimple_nop_convert (expr1, &other, valueize)
> -  && gimple_bitwise_inverted_equal_p (other, expr2, valueize))
> +  if (again
> +  && gimple_nop_convert (expr1, &other, valueize)
> +  && other != expr1
> +  && gimple_bitwise_inverted_equal_p (other, expr2, valueize, false))
>  return true;
>
> -  if (gimple_nop_convert (expr2, &other, valueize)
> -  && gimple_bitwise_inverted_equal_p (expr1, other, valueize))
> +  if (again
> +  && gimple_nop_convert (expr2, &other, valueize)
> +  && other != expr2
> +  && gimple_bitwise_inverted_equal_p (expr1, other, valueize, false))
>  return true;

Hmm, I don't think this tests all three relevant combinations?  I think the way
gimple_bitwise_equal_p handles this is better (not recursing).  I'd split out
the "tail" matching the BIT_NOT to another helper, I suppose that could
even be a (match ...) pattern here.

>if (TREE_CODE (expr1) != SSA_NAME
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr110874-a.c b/gcc/testsuite/gcc.c-torture/compile/pr110874-a.c
> new file mode 100644
> index 000..b314410a892
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr110874-a.c
> @@ -0,0 +1,17 @@
> +struct S1 {
> +  unsigned f0;
> +};
> +static int g_161;
> +void func_109(unsigned g_227, unsigned t) {
> +  struct S1 l_178;
> +  int l_160 = 0x1FAE99D5L;
> +  int *l_230[] = {&l_160};
> +  if (l_160) {
> +for (l_178.f0 = -7; l_178.f0;) {
> +  ++g_227;
> +  break;
> +}
> +(g_161) = g_227;
> +  }
> +  (g_161) &= t;
> +}
> --
> 2.31.1
>


Re: [COMMITTEDv3] tree-optimization: [PR100864] `(a&!b) | b` is not opimized to `a | b` for comparisons

2023-08-03 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 3 Aug 2023 at 02:54, Andrew Pinski  wrote:
>
> On Wed, Aug 2, 2023 at 10:14 AM Andrew Pinski  wrote:
> >
> > On Wed, Aug 2, 2023 at 10:13 AM Prathamesh Kulkarni via Gcc-patches
> >  wrote:
> > >
> > > On Mon, 31 Jul 2023 at 22:39, Andrew Pinski via Gcc-patches
> > >  wrote:
> > > >
> > > > This is a new version of the patch.
> > > > Instead of doing the matching of the inverted comparison directly inside
> > > > match, create a new function (bitwise_inverted_equal_p) to do it.
> > > > It is very similar to bitwise_equal_p that was added in 
> > > > r14-2751-g2a3556376c69a1fb
> > > > but instead it says `expr1 == ~expr2`. A follow-on patch will
> > > > use this function in other patterns where we try to match `@0` and 
> > > > `(bit_not @0)`.
> > > >
> > > > Changed the name bitwise_not_equal_p to bitwise_inverted_equal_p.
> > > >
> > > > Committed as approved after a bootstrap and test on x86_64-linux-gnu
> > > > with no regressions.
> > > Hi Andrew,
> > > Unfortunately, this patch (committed in
> > > 2bae476b511dc441bf61da8a49cca655575e7dd6) causes
> > > segmentation fault for pr33133.c on aarch64-linux-gnu because of
> > > infinite recursion.
> >
> > A similar issue is recorded as PR 110874 which I am debugging right now.
>
> Yes the issue is the same and is solved by the same patch.
That's great, thanks for the heads up!

Thanks,
Prathamesh
>
> Thanks,
> Andrew
>
> >
> > Thanks,
> > Andrew
> >
> > >
> > > Running the test under gdb shows:
> > > Program received signal SIGSEGV, Segmentation fault.
> > > operand_compare::operand_equal_p (this=0x29dc680
> > > , arg0=0xf7789a68, arg1=0xf7789f30,
> > > flags=16) at ../../gcc/gcc/fold-const.cc:3088
> > > 3088{
> > > (gdb) bt
> > > #0  operand_compare::operand_equal_p (this=0x29dc680
> > > , arg0=0xf7789a68, arg1=0xf7789f30,
> > > flags=16) at ../../gcc/gcc/fold-const.cc:3088
> > > #1  0x00a90394 in operand_compare::verify_hash_value
> > > (this=this@entry=0x29dc680 ,
> > > arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
> > > flags=flags@entry=0, ret=ret@entry=0xfc000157)
> > > at ../../gcc/gcc/fold-const.cc:4074
> > > #2  0x00a9351c in operand_compare::verify_hash_value
> > > (ret=0xfc000157, flags=0, arg1=0xf7789f30,
> > > arg0=0xf7789a68, this=0x29dc680 ) at
> > > ../../gcc/gcc/fold-const.cc:4072
> > > #3  operand_compare::operand_equal_p (this=this@entry=0x29dc680
> > > , arg0=arg0@entry=0xf7789a68,
> > > arg1=arg1@entry=0xf7789f30, flags=flags@entry=0) at
> > > ../../gcc/gcc/fold-const.cc:3090
> > > #4  0x00a9791c in operand_equal_p
> > > (arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
> > > flags=flags@entry=0) at ../../gcc/gcc/fold-const.cc:4105
> > > #5  0x01d38dd0 in gimple_bitwise_inverted_equal_p
> > > (expr1=0xf7789a68, expr2=0xf7789f30, valueize=
> > > 0x112d698 ) at
> > > ../../gcc/gcc/gimple-match-head.cc:284
> > > #6  0x01d38e80 in gimple_bitwise_inverted_equal_p
> > > (expr1=0xf7789a68, expr2=0xf77d0240,
> > > valueize=0x112d698 ) at
> > > ../../gcc/gcc/gimple-match-head.cc:296
> > > #7  0x01d38e80 in gimple_bitwise_inverted_equal_p
> > > (expr1=0xf7789a68, expr2=0xf7789f30,
> > > valueize=0x112d698 ) at
> > > ../../gcc/gcc/gimple-match-head.cc:296
> > > #8  0x01d38e80 in gimple_bitwise_inverted_equal_p
> > > (expr1=0xf7789a68, expr2=0xf77d0240,
> > > ...
> > >
> > > It seems to recurse cyclically with expr2=0xf7789f30 ->
> > > expr2=0xf77d0240, eventually leading to a segfault,
> > > while expr1=0xf7789a68 remains the same throughout the stack frames.
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > PR tree-optimization/100864
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * generic-match-head.cc (bitwise_inverted_equal_p): New 
> > > > function.
> > > > * gimple-match-head.cc (bitwise_inverted_equal_p): New macro.
> > > > (gimple_bitwise_inverted_equal_p): New function.
> > > > * match.pd ((~x | y) & x): Use bitwise_inverted_equal_p
> > > > instead of direct matching bit_not.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/tree-ssa/bitops-3.c: New test.
> > > > ---
> > > >  gcc/generic-match-head.cc| 42 ++
> > > >  gcc/gimple-match-head.cc | 71 
> > > >  gcc/match.pd |  5 +-
> > > >  gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c | 67 ++
> > > >  4 files changed, 183 insertions(+), 2 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> > > >
> > > > diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> > > > index a71c0727b0b..ddaf22f2179 100644
> > > > --- a/gcc/generic-match-head.cc
> > > > +++ b/gcc/generic-match-head.cc
> > > > @@ -121,3 +121,45 @@ bitwise_equal_p (tree expr1, tree expr2)

[PATCH 00/10] x86: (mainly) "prefix_extra" adjustments

2023-08-03 Thread Jan Beulich via Gcc-patches
Having noticed various bogus uses, I thought I'd go through and audit
them all. This is the result, with some other attributes also adjusted
as noticed in the process. (I think this tidying also is a good thing
to have ahead of APX further complicating insn length calculations.)

01: "prefix_extra" tidying
02: "sse4arg" adjustments
03: "ssemuladd" adjustments
04: "prefix_extra" can't really be "2"
05: replace/correct bogus "prefix_extra"
06: drop stray "prefix_extra"
07: add (adjust) XOP insn attributes
08: add missing "prefix" attribute to VF{,C}MULC
09: correct "length_immediate" in a few cases
10: drop redundant "prefix_data16" attributes

Jan


[PATCH 01/10] x86: "prefix_extra" tidying

2023-08-03 Thread Jan Beulich via Gcc-patches
Drop SSE5 leftovers from both its comment and its default calculation.
A value of 2 simply cannot occur anymore. Instead extend the comment to
mention the use of the attribute in "length_vex", clarifying why
"prefix_extra" can actually be meaningful on VEX-encoded insns despite
those not having any real prefixes except possibly segment overrides.

gcc/

* config/i386/i386.md (prefix_extra): Correct comment. Fold
cases yielding 2 into ones yielding 1.
---
I question the 3DNow! aspect here: There's no extra prefix there. It's
an immediate instead which "sub-divides" major opcode 0f0f.

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -620,13 +620,11 @@
(const_int 0)))
 
 ;; There are also additional prefixes in 3DNOW, SSSE3.
-;; ssemuladd,sse4arg default to 0f24/0f25 and DREX byte,
-;; sseiadd1,ssecvt1 to 0f7a with no DREX byte.
 ;; 3DNOW has 0f0f prefix, SSSE3 and SSE4_{1,2} 0f38/0f3a.
+;; While generally inapplicable to VEX/XOP/EVEX encodings, "length_vex" uses
+;; the attribute evaluating to zero to know that VEX2 encoding may be usable.
 (define_attr "prefix_extra" ""
-  (cond [(eq_attr "type" "ssemuladd,sse4arg")
-  (const_int 2)
-(eq_attr "type" "sseiadd1,ssecvt1")
+  (cond [(eq_attr "type" "ssemuladd,sse4arg,sseiadd1,ssecvt1")
   (const_int 1)
]
(const_int 0)))



[PATCH 02/10] x86: "sse4arg" adjustments

2023-08-03 Thread Jan Beulich via Gcc-patches
Record common properties in other attributes' default calculations:
There's always a 1-byte immediate, and they're always encoded in a VEX3-
like manner (note that "prefix_extra" already evaluates to 1 in this
case). Then drop now (or already previously) redundant explicit
attributes, adding "mode" ones where they were missing.

Furthermore use "sse4arg" consistently for all VPCOM* insns; so far
signed comparisons did use it, while unsigned ones used "ssecmp". Note
that, while (not counting the explicit or implicit immediate
operand) they really only have 3 operands, the operator is also counted
in those patterns. That's relevant for establishing the "memory"
attribute's value, and at the same time benign when there are only
register operands.

Note that despite also having 4 operands, multiply-add insns aren't
affected by this change, as they use "ssemuladd" for "type".

gcc/

* config/i386/i386.md (length_immediate): Handle "sse4arg".
(prefix): Likewise.
(*xop_pcmov_): Add "mode" attribute.
* config/i386/mmx.md (*xop_maskcmp3): Drop "prefix_data16",
"prefix_rep", "prefix_extra", and "length_immediate" attributes.
(*xop_maskcmp_uns3): Likewise. Switch "type" to "sse4arg".
(*xop_pcmov_): Add "mode" attribute.
* config/i386/sse.md (xop_pcmov_): Add "mode"
attribute.
(xop_maskcmp3): Drop "prefix_data16", "prefix_rep",
"prefix_extra", and "length_immediate" attributes.
(xop_maskcmp_uns3): Likewise. Switch "type" to "sse4arg".
(xop_maskcmp_uns23): Drop "prefix_data16", "prefix_extra",
and "length_immediate" attributes. Switch "type" to "sse4arg".
(xop_pcom_tf3): Likewise.
(xop_vpermil23): Drop "length_immediate" attribute.

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -536,6 +536,8 @@
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
  bitmanip,imulx,msklog,mskmov")
   (const_int 0)
+(eq_attr "type" "sse4arg")
+  (const_int 1)
 (eq_attr "unit" "i387,sse,mmx")
   (const_int 0)
 (eq_attr "type" "alu,alu1,negnot,imovx,ishift,ishiftx,ishift1,
@@ -635,6 +637,8 @@
(const_string "vex")
  (eq_attr "mode" "XI,V16SF,V8DF")
(const_string "evex")
+(eq_attr "type" "sse4arg")
+  (const_string "vex")
 ]
 (const_string "orig")))
 
@@ -23286,7 +23290,8 @@
  (match_operand:MODEF 3 "register_operand" "x")))]
   "TARGET_XOP"
   "vpcmov\t{%1, %3, %2, %0|%0, %2, %3, %1}"
-  [(set_attr "type" "sse4arg")])
+  [(set_attr "type" "sse4arg")
+   (set_attr "mode" "TI")])
 
 ;; These versions of the min/max patterns are intentionally ignorant of
 ;; their behavior wrt -0.0 and NaN (via the commutative operand mark).
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2909,10 +2909,6 @@
   "TARGET_XOP"
   "vpcom%Y1\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "type" "sse4arg")
-   (set_attr "prefix_data16" "0")
-   (set_attr "prefix_rep" "0")
-   (set_attr "prefix_extra" "2")
-   (set_attr "length_immediate" "1")
(set_attr "mode" "TI")])
 
 (define_insn "*xop_maskcmp3"
@@ -2923,10 +2919,6 @@
   "TARGET_XOP"
   "vpcom%Y1\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "type" "sse4arg")
-   (set_attr "prefix_data16" "0")
-   (set_attr "prefix_rep" "0")
-   (set_attr "prefix_extra" "2")
-   (set_attr "length_immediate" "1")
(set_attr "mode" "TI")])
 
 (define_insn "*xop_maskcmp_uns3"
@@ -2936,11 +2928,7 @@
  (match_operand:MMXMODEI 3 "register_operand" "x")]))]
   "TARGET_XOP"
   "vpcom%Y1u\t{%3, %2, %0|%0, %2, %3}"
-  [(set_attr "type" "ssecmp")
-   (set_attr "prefix_data16" "0")
-   (set_attr "prefix_rep" "0")
-   (set_attr "prefix_extra" "2")
-   (set_attr "length_immediate" "1")
+  [(set_attr "type" "sse4arg")
(set_attr "mode" "TI")])
 
 (define_insn "*xop_maskcmp_uns3"
@@ -2950,11 +2938,7 @@
  (match_operand:VI_16_32 3 "register_operand" "x")]))]
   "TARGET_XOP"
   "vpcom%Y1u\t{%3, %2, %0|%0, %2, %3}"
-  [(set_attr "type" "ssecmp")
-   (set_attr "prefix_data16" "0")
-   (set_attr "prefix_rep" "0")
-   (set_attr "prefix_extra" "2")
-   (set_attr "length_immediate" "1")
+  [(set_attr "type" "sse4arg")
(set_attr "mode" "TI")])
 
 (define_expand "vec_cmp"
@@ -3144,7 +3128,8 @@
   (match_operand:MMXMODE124 2 "register_operand" "x")))]
   "TARGET_XOP && TARGET_MMX_WITH_SSE"
   "vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "type" "sse4arg")])
+  [(set_attr "type" "sse4arg")
+   (set_attr "mode" "TI")])
 
 (define_insn "*xop_pcmov_"
   [(set (match_operand:VI_16_32 0 "register_operand" "=x")
@@ -3154,7 +3139,8 @@
   (match_operand:VI_16_32 2 "register_operand" "x")))]
   "TARGET_XOP"
   "vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "type" "sse4arg")])
+  [(set_attr "type" "sse4arg")
+   (set_attr "mode" "TI")])
 
 ;; XOP permute instructions
 (define_insn "mmx_pp

[PATCH 03/10] x86: "ssemuladd" adjustments

2023-08-03 Thread Jan Beulich via Gcc-patches
They're all VEX3- (also covering XOP) or EVEX-encoded. Express that in
the default calculation of "prefix". FMA4 insns also all have a 1-byte
immediate operand.

Where the default calculation is not sufficient / applicable, add
explicit "prefix" attributes. While there also add a "mode" attribute to
fma___pair.

gcc/

* config/i386/i386.md (isa): Move up.
(length_immediate): Handle "fma4".
(prefix): Handle "ssemuladd".
* config/i386/sse.md (*fma_fmadd_): Add "prefix" attribute.
(fma_fmadd_):
Likewise.
(_fmadd__mask): Likewise.
(_fmadd__mask3): Likewise.
(fma_fmsub_):
Likewise.
(_fmsub__mask): Likewise.
(_fmsub__mask3): Likewise.
(*fma_fnmadd_): Likewise.
(fma_fnmadd_):
Likewise.
(_fnmadd__mask): Likewise.
(_fnmadd__mask3): Likewise.
(fma_fnmsub_):
Likewise.
(_fnmsub__mask): Likewise.
(_fnmsub__mask3): Likewise.
(fma_fmaddsub_):
Likewise.
(_fmaddsub__mask): Likewise.
(_fmaddsub__mask3): Likewise.
(fma_fmsubadd_):
Likewise.
(_fmsubadd__mask): Likewise.
(_fmsubadd__mask3): Likewise.
(*fmai_fmadd_): Likewise.
(*fmai_fmsub_): Likewise.
(*fmai_fnmadd_): Likewise.
(*fmai_fnmsub_): Likewise.
(avx512f_vmfmadd__mask): Likewise.
(avx512f_vmfmadd__mask3): Likewise.
(avx512f_vmfmadd__maskz_1): Likewise.
(*avx512f_vmfmsub__mask): Likewise.
(avx512f_vmfmsub__mask3): Likewise.
(*avx512f_vmfmsub__maskz_1): Likewise.
(avx512f_vmfnmadd__mask): Likewise.
(avx512f_vmfnmadd__mask3): Likewise.
(avx512f_vmfnmadd__maskz_1): Likewise.
(*avx512f_vmfnmsub__mask): Likewise.
(*avx512f_vmfnmsub__mask3): Likewise.
(*avx512f_vmfnmsub__maskz_1): Likewise.
(*fma4i_vmfmadd_): Likewise.
(*fma4i_vmfmsub_): Likewise.
(*fma4i_vmfnmadd_): Likewise.
(*fma4i_vmfnmsub_): Likewise.
(fma__): Likewise.
(___mask): Likewise.

(avx512fp16_fma_sh_v8hf):
Likewise.
(avx512fp16_sh_v8hf_mask): Likewise.
(xop_p): Likewise.
(xop_pdql): Likewise.
(xop_pdqh): Likewise.
(xop_pwd): Likewise.
(xop_pwd): Likewise.
(fma___pair): Likewise. Add "mode" attribute.

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -531,12 +531,23 @@
   (const_string "unknown")]
 (const_string "integer")))
 
+;; Used to control the "enabled" attribute on a per-instruction basis.
+(define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
+   x64_avx,x64_avx512bw,x64_avx512dq,aes,
+   sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
+   avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
+   avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
+   avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16,avxifma,
+   avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl"
+  (const_string "base"))
+
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
  bitmanip,imulx,msklog,mskmov")
   (const_int 0)
-(eq_attr "type" "sse4arg")
+(ior (eq_attr "type" "sse4arg")
+ (eq_attr "isa" "fma4"))
   (const_int 1)
 (eq_attr "unit" "i387,sse,mmx")
   (const_int 0)
@@ -637,6 +648,10 @@
(const_string "vex")
  (eq_attr "mode" "XI,V16SF,V8DF")
(const_string "evex")
+(eq_attr "type" "ssemuladd")
+  (if_then_else (eq_attr "isa" "fma4")
+(const_string "vex")
+(const_string "maybe_evex"))
 (eq_attr "type" "sse4arg")
   (const_string "vex")
 ]
@@ -842,16 +857,6 @@
 ;; Define attribute to indicate unaligned ssemov insns
 (define_attr "movu" "0,1" (const_string "0"))
 
-;; Used to control the "enabled" attribute on a per-instruction basis.
-(define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
-   x64_avx,x64_avx512bw,x64_avx512dq,aes,
-   sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
-   avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
-   avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
-   avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16,avxifma,
-   avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl"
-  (const_string "base"))
-
 ;; Define instruction set of MMX instructions
 (define_attr "mmx_isa" "base,native,sse,sse_noavx,avx"
   (const_string "base"))
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5422,6 +5422,7 @@
vfmadd213\t{%3, %2

[PATCH 04/10] x86: "prefix_extra" can't really be "2"

2023-08-03 Thread Jan Beulich via Gcc-patches
In the three remaining instances, separate "prefix_0f" and "prefix_rep"
attributes are what is wanted instead.

gcc/

* config/i386/i386.md (rdbase): Add "prefix_0f" and
"prefix_rep". Drop "prefix_extra".
(wrbase): Likewise.
(ptwrite): Likewise.

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -25914,7 +25914,8 @@
   "TARGET_64BIT && TARGET_FSGSBASE"
   "rdbase\t%0"
   [(set_attr "type" "other")
-   (set_attr "prefix_extra" "2")])
+   (set_attr "prefix_0f" "1")
+   (set_attr "prefix_rep" "1")])
 
 (define_insn "wrbase"
   [(unspec_volatile [(match_operand:SWI48 0 "register_operand" "r")]
@@ -25922,7 +25923,8 @@
   "TARGET_64BIT && TARGET_FSGSBASE"
   "wrbase\t%0"
   [(set_attr "type" "other")
-   (set_attr "prefix_extra" "2")])
+   (set_attr "prefix_0f" "1")
+   (set_attr "prefix_rep" "1")])
 
 (define_insn "ptwrite"
   [(unspec_volatile [(match_operand:SWI48 0 "nonimmediate_operand" "rm")]
@@ -25930,7 +25932,8 @@
   "TARGET_PTWRITE"
   "ptwrite\t%0"
   [(set_attr "type" "other")
-   (set_attr "prefix_extra" "2")])
+   (set_attr "prefix_0f" "1")
+   (set_attr "prefix_rep" "1")])
 
 (define_insn "@rdrand"
   [(set (match_operand:SWI248 0 "register_operand" "=r")



[PATCH 07/10] x86: add (adjust) XOP insn attributes

2023-08-03 Thread Jan Beulich via Gcc-patches
Many were lacking "prefix" and "prefix_extra", some had a bogus value of
2 for "prefix_extra" (presumably inherited from their SSE5 counterparts,
which are long gone) and a meaningless "prefix_data16" one. Where
missing, "mode" attributes are also added. (Note that "sse4arg" and
"ssemuladd" ones don't need further adjustment in this regard.)

gcc/

* config/i386/sse.md (xop_phaddbw): Add "prefix",
"prefix_extra", and "mode" attributes.
(xop_phaddbd): Likewise.
(xop_phaddbq): Likewise.
(xop_phaddwd): Likewise.
(xop_phaddwq): Likewise.
(xop_phadddq): Likewise.
(xop_phsubbw): Likewise.
(xop_phsubwd): Likewise.
(xop_phsubdq): Likewise.
(xop_rotl3): Add "prefix" and "prefix_extra" attributes.
(xop_rotr3): Likewise.
(xop_frcz2): Likewise.
(*xop_vmfrcz2): Likewise.
(xop_vrotl3): Add "prefix" attribute. Change
"prefix_extra" to 1.
(xop_sha3): Likewise.
(xop_shl3): Likewise.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -24897,7 +24897,10 @@
  (const_int 13) (const_int 15)])]
   "TARGET_XOP"
   "vphaddbw\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sseiadd1")])
+  [(set_attr "type" "sseiadd1")
+   (set_attr "prefix" "vex")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
 
 (define_insn "xop_phaddbd"
   [(set (match_operand:V4SI 0 "register_operand" "=x")
@@ -24926,7 +24929,10 @@
   (const_int 11) (const_int 15)]))]
   "TARGET_XOP"
   "vphaddbd\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sseiadd1")])
+  [(set_attr "type" "sseiadd1")
+   (set_attr "prefix" "vex")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
 
 (define_insn "xop_phaddbq"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
@@ -24971,7 +24977,10 @@
 (parallel [(const_int 7) (const_int 15)])))]
   "TARGET_XOP"
   "vphaddbq\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sseiadd1")])
+  [(set_attr "type" "sseiadd1")
+   (set_attr "prefix" "vex")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
 
 (define_insn "xop_phaddwd"
   [(set (match_operand:V4SI 0 "register_operand" "=x")
@@ -24988,7 +24997,10 @@
  (const_int 5) (const_int 7)])]
   "TARGET_XOP"
   "vphaddwd\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sseiadd1")])
+  [(set_attr "type" "sseiadd1")
+   (set_attr "prefix" "vex")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
 
 (define_insn "xop_phaddwq"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
@@ -25013,7 +25025,10 @@
(parallel [(const_int 3) (const_int 7)]))]
   "TARGET_XOP"
   "vphaddwq\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sseiadd1")])
+  [(set_attr "type" "sseiadd1")
+   (set_attr "prefix" "vex")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
 
 (define_insn "xop_phadddq"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
@@ -25028,7 +25043,10 @@
   (parallel [(const_int 1) (const_int 3)])]
   "TARGET_XOP"
   "vphadddq\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sseiadd1")])
+  [(set_attr "type" "sseiadd1")
+   (set_attr "prefix" "vex")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
 
 (define_insn "xop_phsubbw"
   [(set (match_operand:V8HI 0 "register_operand" "=x")
@@ -25049,7 +25067,10 @@
  (const_int 13) (const_int 15)])]
   "TARGET_XOP"
   "vphsubbw\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sseiadd1")])
+  [(set_attr "type" "sseiadd1")
+   (set_attr "prefix" "vex")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
 
 (define_insn "xop_phsubwd"
   [(set (match_operand:V4SI 0 "register_operand" "=x")
@@ -25066,7 +25087,10 @@
  (const_int 5) (const_int 7)])]
   "TARGET_XOP"
   "vphsubwd\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sseiadd1")])
+  [(set_attr "type" "sseiadd1")
+   (set_attr "prefix" "vex")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
 
 (define_insn "xop_phsubdq"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
@@ -25081,7 +25105,10 @@
   (parallel [(const_int 1) (const_int 3)])]
   "TARGET_XOP"
   "vphsubdq\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sseiadd1")])
+  [(set_attr "type" "sseiadd1")
+   (set_attr "prefix" "vex")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
 
 ;; XOP permute instructions
 (define_insn "xop_pperm"
@@ -25209,6 +25236,8 @@
   "TARGET_XOP"
   "vprot\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseishft")
+   (set_attr "prefix" "vex")
+   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "mode" "TI")])
 
@@ -25224,6 +25253,8 @@
   return \"vprot\t{%3, %1, %0|%0, %1, %3}\";
 }
   [(set_attr "type" "sseishft")
+   (set_attr "prefix" "vex")
+   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "mode" "TI")])
 
@@ -25264,8 +25295,8 @@
   "TARGET_XOP && !(MEM_P (operands[1]) &&

[PATCH 09/10] x86: correct "length_immediate" in a few cases

2023-08-03 Thread Jan Beulich via Gcc-patches
When first added explicitly in 3ddffba914b2 ("i386.md
(sse4_1_round2): Add avx512f alternative"), "*" should not have
been used for the pre-existing alternative. The attribute was plain
missing. Subsequent changes adding more alternatives then generously
extended the bogus pattern.

Apparently something similar happened to the two mmx_pblendvb_* insns.

gcc/

* config/i386/i386.md (sse4_1_round2): Make
"length_immediate" uniformly 1.
* config/i386/mmx.md (mmx_pblendvb_v8qi): Likewise.
(mmx_pblendvb_): Likewise.

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -21594,7 +21594,7 @@
vrndscale\t{%2, %1, %d0|%d0, %1, %2}"
   [(set_attr "type" "ssecvt")
(set_attr "prefix_extra" "1,1,1,*,*")
-   (set_attr "length_immediate" "*,*,*,1,1")
+   (set_attr "length_immediate" "1")
(set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
(set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
(set_attr "avx_partial_xmm_update" "false,false,true,false,true")
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -3094,7 +3094,7 @@
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "ssemov")
(set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "*,*,1")
+   (set_attr "length_immediate" "1")
(set_attr "prefix" "orig,orig,vex")
(set_attr "btver2_decode" "vector")
(set_attr "mode" "TI")])
@@ -3114,7 +3114,7 @@
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "ssemov")
(set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "*,*,1")
+   (set_attr "length_immediate" "1")
(set_attr "prefix" "orig,orig,vex")
(set_attr "btver2_decode" "vector")
(set_attr "mode" "TI")])



[PATCH 05/10] x86: replace/correct bogus "prefix_extra"

2023-08-03 Thread Jan Beulich via Gcc-patches
In the rdrand and rdseed cases "prefix_0f" is meant instead. For
mmx_floatv2siv2sf2 1 is correct only for the first alternative. For
the integer min/max cases 1 uniformly applies to legacy and VEX
encodings (the UB and SW variants are dealt with separately anyway).
Same for {,V}MOVNTDQA.

Unlike {,V}PEXTRW, which has two encoding forms, {,V}PINSRW only has
a single form in 0f space. (In *vec_extract note that the
dropped part of the condition also referenced non-existing alternative
2.)

Of the integer compare insns, only the 64-bit element forms are encoded
in 0f38 space.

gcc/

* config/i386/i386.md (@rdrand): Add "prefix_0f". Drop
"prefix_extra".
(@rdseed): Likewise.
* config/i386/mmx.md (3 [smaxmin and umaxmin cases]):
Adjust "prefix_extra".
* config/i386/sse.md (@vec_set_0): Likewise.
(*sse4_1_3): Likewise.
(*avx2_eq3): Likewise.
(avx2_gt3): Likewise.
(_pinsr): Likewise.
(*vec_extract): Likewise.
(_movntdqa): Likewise.

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -25943,7 +25943,7 @@
   "TARGET_RDRND"
   "rdrand\t%0"
   [(set_attr "type" "other")
-   (set_attr "prefix_extra" "1")])
+   (set_attr "prefix_0f" "1")])
 
 (define_insn "@rdseed"
   [(set (match_operand:SWI248 0 "register_operand" "=r")
@@ -25953,7 +25953,7 @@
   "TARGET_RDSEED"
   "rdseed\t%0"
   [(set_attr "type" "other")
-   (set_attr "prefix_extra" "1")])
+   (set_attr "prefix_0f" "1")])
 
 (define_expand "pause"
   [(set (match_dup 0)
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2483,7 +2483,7 @@
vp\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "sseiadd")
-   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "TI")])
 
@@ -2532,7 +2532,7 @@
vpb\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "sseiadd")
-   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "TI")])
 
@@ -2561,7 +2561,7 @@
vp\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "sseiadd")
-   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "TI")])
 
@@ -2623,7 +2623,7 @@
vpw\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "sseiadd")
-   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "TI")])
 
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -11064,7 +11064,7 @@
   (const_string "1")
   (const_string "*")))
(set (attr "prefix_extra")
- (if_then_else (eq_attr "alternative" "5,6,7,8,9")
+ (if_then_else (eq_attr "alternative" "5,6,9")
   (const_string "1")
   (const_string "*")))
(set (attr "length_immediate")
@@ -16779,7 +16779,7 @@
vp\t{%2, %1, 
%0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "sseiadd")
-   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "TI")])
 
@@ -16813,7 +16813,10 @@
   "TARGET_AVX2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vpcmpeq\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "ssecmp")
-   (set_attr "prefix_extra" "1")
+   (set (attr "prefix_extra")
+ (if_then_else (eq (const_string "mode") (const_string "V4DImode"))
+  (const_string "1")
+  (const_string "*")))
(set_attr "prefix" "vex")
(set_attr "mode" "OI")])
 
@@ -17048,7 +17051,10 @@
   "TARGET_AVX2"
   "vpcmpgt\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "ssecmp")
-   (set_attr "prefix_extra" "1")
+   (set (attr "prefix_extra")
+ (if_then_else (eq (const_string "mode") (const_string "V4DImode"))
+  (const_string "1")
+  (const_string "*")))
(set_attr "prefix" "vex")
(set_attr "mode" "OI")])
 
@@ -18843,7 +18849,7 @@
(const_string "*")))
(set (attr "prefix_extra")
  (if_then_else
-   (and (not (match_test "TARGET_AVX"))
+   (ior (eq_attr "prefix" "evex")
(match_test "GET_MODE_NUNITS (mode) == 8"))
(const_string "*")
(const_string "1")))
@@ -20004,8 +20010,7 @@
(set_attr "prefix_data16" "1")
(set (attr "prefix_extra")
  (if_then_else
-   (and (eq_attr "alternative" "0,2")
-   (eq (const_string "mode") (const_string "V8HImode")))
+   (eq (const_string "mode") (const_string "V8HImode"))
(const_string "*")
(const_string "1")))
(set_attr "length_immediate" "1")
@@ -22349,7 +22354,7 @@
   "%vmovntdqa\t{%1, %0|%0, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "ssemov")
-   (set_a

[PATCH 06/10] x86: drop stray "prefix_extra"

2023-08-03 Thread Jan Beulich via Gcc-patches
While the attribute is relevant for legacy- and VEX-encoded insns, it is
of no relevance for EVEX-encoded ones.

While there, in avx512dq_broadcast_1 add
the missing "length_immediate".

gcc/

* config/i386/sse.md
(*_eq3_1): Drop
"prefix_extra".
(avx512dq_vextract64x2_1_mask): Likewise.
(*avx512dq_vextract64x2_1): Likewise.
(avx512f_vextract32x4_1_mask): Likewise.
(*avx512f_vextract32x4_1): Likewise.
(vec_extract_lo__mask [AVX512 forms]): Likewise.
(vec_extract_lo_ [AVX512 forms]): Likewise.
(vec_extract_hi__mask [AVX512 forms]): Likewise.
(vec_extract_hi_ [AVX512 forms]): Likewise.
(@vec_extract_lo_ [AVX512 forms]): Likewise.
(@vec_extract_hi_ [AVX512 forms]): Likewise.
(vec_extract_lo_v64qi): Likewise.
(vec_extract_hi_v64qi): Likewise.
(*vec_widen_umult_even_v16si): Likewise.
(*vec_widen_smult_even_v16si): Likewise.
(*avx512f_3): Likewise.
(*vec_extractv4ti): Likewise.
(avx512bw_v32qiv32hi2): Likewise.
(avx512dq_broadcast_1): Likewise.
Add "length_immediate".

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4030,7 +4030,6 @@
vpcmpeq\t{%2, %1, 
%0|%0, %1, %2}
vptestnm\t{%1, %1, 
%0|%0, %1, %1}"
   [(set_attr "type" "ssecmp")
-   (set_attr "prefix_extra" "1")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
@@ -4128,7 +4127,6 @@
vpcmpeq\t{%2, %1, 
%0|%0, %1, %2}
vptestnm\t{%1, %1, 
%0|%0, %1, %1}"
   [(set_attr "type" "ssecmp")
-   (set_attr "prefix_extra" "1")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
@@ -11487,7 +11485,6 @@
   return "vextract64x2\t{%2, %1, %0%{%5%}%N4|%0%{%5%}%N4, %1, 
%2}";
 }
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -11506,7 +11503,6 @@
   return "vextract64x2\t{%2, %1, %0|%0, %1, %2}";
 }
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -11554,7 +11550,6 @@
   return "vextract32x4\t{%2, %1, %0%{%7%}%N6|%0%{%7%}%N6, %1, 
%2}";
 }
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -11577,7 +11572,6 @@
   return "vextract32x4\t{%2, %1, %0|%0, %1, %2}";
 }
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -11671,7 +11665,6 @@
&& (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
   "vextract64x4\t{$0x0, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x0}"
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,store")
(set_attr "prefix" "evex")
@@ -11691,7 +11684,6 @@
 return "#";
 }
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,store,load")
(set_attr "prefix" "evex")
@@ -11710,7 +11702,6 @@
&& (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
   "vextract64x4\t{$0x1, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x1}"
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -11724,7 +11715,6 @@
   "TARGET_AVX512F"
   "vextract64x4\t{$0x1, %1, %0|%0, %1, 0x1}"
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -11744,7 +11734,6 @@
&& (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
   "vextract32x8\t{$0x1, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x1}"
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -11762,7 +11751,6 @@
vextract32x8\t{$0x1, %1, %0|%0, %1, 0x1}
vextracti64x4\t{$0x1, %1, %0|%0, %1, 0x1}"
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
(set_attr "isa" "avx512dq,noavx512dq")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
@@ -11850,7 +11838,6 @@
&& (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
   "vextract32x8\t{$0x0, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x0}"
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,store")
(set_attr "prefix" "evex")
@@ -11880,7 +11867,6 @@
 return "#";
 }
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,load,store")
(set_attr "prefix" "evex")
@@ -11923,7 +11909,6 @@
&& (!MEM_P (o

[PATCH 08/10] x86: add missing "prefix" attribute to VF{,C}MULC

2023-08-03 Thread Jan Beulich via Gcc-patches
gcc/

* config/i386/sse.md
(__): Add
"prefix" attribute.

(avx512fp16_sh_v8hf):
Likewise.
---
Talking of "prefix": Shouldn't at least V32HF and V32BF have it also
default to "evex"? (It won't matter right here, but it may matter
elsewhere.)

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6790,6 +6790,7 @@
   return "v\t{%2, %1, 
%0|%0, %1, %2}";
 }
   [(set_attr "type" "ssemul")
+   (set_attr "prefix" "evex")
(set_attr "mode" "")])
 
 (define_expand "avx512fp16_fmaddcsh_v8hf_maskz"
@@ -6993,6 +6994,7 @@
   return "vsh\t{%2, %1, 
%0|%0, %1, 
%2}";
 }
   [(set_attr "type" "ssemul")
+   (set_attr "prefix" "evex")
(set_attr "mode" "V8HF")])
 
 ;



[PATCH 10/10] x86: drop redundant "prefix_data16" attributes

2023-08-03 Thread Jan Beulich via Gcc-patches
The attribute defaults to 1 for TI-mode insns of type sselog, sselog1,
sseiadd, sseimul, and sseishft.

In *v8hi3 [smaxmin] and *v16qi3 [umaxmin], also drop the
similarly stray "prefix_extra" while at it. These two max/min
flavors are encoded in 0f space.

gcc/

* config/i386/mmx.md (*mmx_pinsrd): Drop "prefix_data16".
(*mmx_pinsrb): Likewise.
(*mmx_pextrb): Likewise.
(*mmx_pextrb_zext): Likewise.
(mmx_pshufbv8qi3): Likewise.
(mmx_pshufbv4qi3): Likewise.
(mmx_pswapdv2si2): Likewise.
(*pinsrb): Likewise.
(*pextrb): Likewise.
(*pextrb_zext): Likewise.
* config/i386/sse.md (*sse4_1_mulv2siv2di3): Likewise.
(*sse2_eq3): Likewise.
(*sse2_gt3): Likewise.
(_pinsr): Likewise.
(*vec_extract): Likewise.
(*vec_extract_zext): Likewise.
(*vec_extractv16qi_zext): Likewise.
(ssse3_phwv8hi3): Likewise.
(ssse3_pmaddubsw128): Likewise.
(*_pmulhrsw3): Likewise.
(_pshufb3): Likewise.
(_psign3): Likewise.
(_palignr): Likewise.
(*abs2): Likewise.
(sse4_2_pcmpestr): Likewise.
(sse4_2_pcmpestri): Likewise.
(sse4_2_pcmpestrm): Likewise.
(sse4_2_pcmpestr_cconly): Likewise.
(sse4_2_pcmpistr): Likewise.
(sse4_2_pcmpistri): Likewise.
(sse4_2_pcmpistrm): Likewise.
(sse4_2_pcmpistr_cconly): Likewise.
(vgf2p8affineinvqb_): Likewise.
(vgf2p8affineqb_): Likewise.
(vgf2p8mulb_): Likewise.
(*v8hi3 [smaxmin]): Drop "prefix_data16" and
"prefix_extra".
(*v16qi3 [umaxmin]): Likewise.

--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -3863,7 +3863,6 @@
 }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "prefix_data16" "1")
(set_attr "prefix_extra" "1")
(set_attr "type" "sselog")
(set_attr "length_immediate" "1")
@@ -3950,7 +3949,6 @@
 }
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sselog")
-   (set_attr "prefix_data16" "1")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "orig,vex")
@@ -4002,7 +4000,6 @@
%vpextrb\t{%2, %1, %k0|%k0, %1, %2}
%vpextrb\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_data16" "1")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "maybe_vex")
@@ -4017,7 +4014,6 @@
   "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
   "%vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_data16" "1")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "maybe_vex")
@@ -4035,7 +4031,6 @@
vpshufb\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sselog1")
-   (set_attr "prefix_data16" "1,*")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,maybe_evex")
(set_attr "btver2_decode" "vector")
@@ -4053,7 +4048,6 @@
vpshufb\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sselog1")
-   (set_attr "prefix_data16" "1,*")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,maybe_evex")
(set_attr "btver2_decode" "vector")
@@ -4191,7 +4185,6 @@
(set_attr "mmx_isa" "native,*")
(set_attr "type" "mmxcvt,sselog1")
(set_attr "prefix_extra" "1,*")
-   (set_attr "prefix_data16" "*,1")
(set_attr "length_immediate" "*,1")
(set_attr "mode" "DI,TI")])
 
@@ -4531,7 +4524,6 @@
 }
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sselog")
-   (set_attr "prefix_data16" "1")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "orig,vex")
@@ -4575,7 +4567,6 @@
%vpextrb\t{%2, %1, %k0|%k0, %1, %2}
%vpextrb\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_data16" "1")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "maybe_vex")
@@ -4590,7 +4581,6 @@
   "TARGET_SSE4_1"
   "%vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
   [(set_attr "type" "sselog1")
-   (set_attr "prefix_data16" "1")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "maybe_vex")
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15614,7 +15614,6 @@
vpmuldq\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "sseimul")
-   (set_attr "prefix_data16" "1,1,*")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "TI")])
@@ -16688,8 +16687,6 @@
vpw\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sseiadd")
-   (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix_extra" "*,1")
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
@@ -16772,8 +16769,6 @@
vpb\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sse

Re: [PATCH 1/2] bpf: Implementation of BPF CO-RE builtins

2023-08-03 Thread Cupertino Miranda via Gcc-patches


Jose E. Marchesi writes:

>> This patch updates the support for the BPF CO-RE builtins
>> __builtin_preserve_access_index and __builtin_preserve_field_info,
>> and adds support for the CO-RE builtins __builtin_btf_type_id,
>> __builtin_preserve_type_info and __builtin_preserve_enum_value.
>>
>> These CO-RE relocations are now converted to __builtin_core_reloc which
>> abstracts all of the original builtins in a polymorphic relocation
>> specific builtin.
>>
>> The builtin processing is now split in 2 stages, the first (pack) is
>> executed right after the front-end and the second (process) right before
>> the asm output.
>>
>> In expand pass the __builtin_core_reloc is converted to a
>> unspec:UNSPEC_CORE_RELOC rtx entry.
>>
>> The data required to process the builtin is now collected in the packing
>> stage (after front-end), not allowing the compiler to optimize any of
>> the relevant information required to compose the relocation when
>> necessary.
>> At expansion, that information is recovered and CTF/BTF is queried to
>> construct the information that will be used in the relocation.
>> At this point the relocation is added to specific section and the
>> builtin is expanded to the expected default value for the builtin.
>>
>> In order to process __builtin_preserve_enum_value, it was necessary to
>> hook the front-end to collect the original enum value reference.
>> This is needed since the parser folds all the enum values to its
>> integer_cst representation.
>>
>> More details can be found within the core-builtins.cc.
>>
>> Regtested in host x86_64-linux-gnu and target bpf-unknown-none.
>> ---
>>  gcc/config.gcc|4 +-
>>  gcc/config/bpf/bpf-passes.def |   20 -
>>  gcc/config/bpf/bpf-protos.h   |4 +-
>>  gcc/config/bpf/bpf.cc |  817 +-
>>  gcc/config/bpf/bpf.md |   17 +
>>  gcc/config/bpf/core-builtins.cc   | 1397 +
>>  gcc/config/bpf/core-builtins.h|   36 +
>>  gcc/config/bpf/coreout.cc |   50 +-
>>  gcc/config/bpf/coreout.h  |   13 +-
>>  gcc/config/bpf/t-bpf  |6 +-
>>  gcc/doc/extend.texi   |   51 +
>>  ...core-builtin-fieldinfo-const-elimination.c |   29 +
>>  12 files changed, 1639 insertions(+), 805 deletions(-)
>>  delete mode 100644 gcc/config/bpf/bpf-passes.def
>>  create mode 100644 gcc/config/bpf/core-builtins.cc
>>  create mode 100644 gcc/config/bpf/core-builtins.h
>>  create mode 100644 
>> gcc/testsuite/gcc.target/bpf/core-builtin-fieldinfo-const-elimination.c
>>
>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>> index eba69a463be0..c521669e78b1 100644
>> --- a/gcc/config.gcc
>> +++ b/gcc/config.gcc
>> @@ -1597,8 +1597,8 @@ bpf-*-*)
>>  use_collect2=no
>>  extra_headers="bpf-helpers.h"
>>  use_gcc_stdint=provide
>> -extra_objs="coreout.o"
>> -target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc"
>> +extra_objs="coreout.o core-builtins.o"
>> +target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc 
>> \$(srcdir)/config/bpf/core-builtins.cc"
>>  ;;
>>  cris-*-elf | cris-*-none)
>>  tm_file="elfos.h newlib-stdint.h ${tm_file}"
>> diff --git a/gcc/config/bpf/bpf-passes.def b/gcc/config/bpf/bpf-passes.def
>> deleted file mode 100644
>> index deeaee988a01..
>> --- a/gcc/config/bpf/bpf-passes.def
>> +++ /dev/null
>> @@ -1,20 +0,0 @@
>> -/* Declaration of target-specific passes for eBPF.
>> -   Copyright (C) 2021-2023 Free Software Foundation, Inc.
>> -
>> -   This file is part of GCC.
>> -
>> -   GCC is free software; you can redistribute it and/or modify it
>> -   under the terms of the GNU General Public License as published by
>> -   the Free Software Foundation; either version 3, or (at your option)
>> -   any later version.
>> -
>> -   GCC is distributed in the hope that it will be useful, but
>> -   WITHOUT ANY WARRANTY; without even the implied warranty of
>> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> -   General Public License for more details.
>> -
>> -   You should have received a copy of the GNU General Public License
>> -   along with GCC; see the file COPYING3.  If not see
>> -   .  */
>> -
>> -INSERT_PASS_AFTER (pass_df_initialize_opt, 1, pass_bpf_core_attr);
>> diff --git a/gcc/config/bpf/bpf-protos.h b/gcc/config/bpf/bpf-protos.h
>> index b484310e8cbf..fbcf5111eb21 100644
>> --- a/gcc/config/bpf/bpf-protos.h
>> +++ b/gcc/config/bpf/bpf-protos.h
>> @@ -30,7 +30,7 @@ extern void bpf_print_operand_address (FILE *, rtx);
>>  extern void bpf_expand_prologue (void);
>>  extern void bpf_expand_epilogue (void);
>>  extern void bpf_expand_cbranch (machine_mode, rtx *);
>> -
>> -rtl_opt_pass * make_pass_bpf_core_attr (gcc::context *);
>> +const char *bpf_add_core_reloc (r

[PATCH] Swap loop splitting and final value replacement

2023-08-03 Thread Richard Biener via Gcc-patches
The following swaps the loop splitting pass and the final value
replacement pass to avoid keeping the IV of the earlier loop
live when not necessary.  The existing gcc.target/i386/pr87007-5.c
testcase shows that we otherwise fail to elide an empty loop
later.  I don't see any good reason why loop splitting would need
final value replacement, all exit values honor the constraints
we place on loop header PHIs automatically.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, I plan
to install this if testing succeeds.

Richard.

* passes.def: Exchange loop splitting and final value
replacement passes.

* gcc.target/i386/pr87007-5.c: Make sure we split the loop
and eliminate both in the end.
---
 gcc/passes.def| 2 +-
 gcc/testsuite/gcc.target/i386/pr87007-5.c | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index f2893ae8a8b..ef5a21afe49 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -282,8 +282,8 @@ along with GCC; see the file COPYING3.  If not see
 form if possible.  */
  NEXT_PASS (pass_tree_loop_init);
  NEXT_PASS (pass_tree_unswitch);
- NEXT_PASS (pass_scev_cprop);
  NEXT_PASS (pass_loop_split);
+ NEXT_PASS (pass_scev_cprop);
  NEXT_PASS (pass_loop_versioning);
  NEXT_PASS (pass_loop_jam);
  /* All unswitching, final value replacement and splitting can expose
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c 
b/gcc/testsuite/gcc.target/i386/pr87007-5.c
index b36e81c270c..a6cdf11522e 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse 
-fno-tree-vectorize" } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize 
-fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */
 /* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to 
avoid partial dependence.  */
 
 #include
@@ -15,4 +15,6 @@ foo (int n, int k)
   d1 = sqrt (d3);
 }
 
+/* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
+/* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
 /* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
-- 
2.35.3


Re: [PATCH] Swap loop splitting and final value replacement

2023-08-03 Thread Jan Hubicka via Gcc-patches
> The following swaps the loop splitting pass and the final value
> replacement pass to avoid keeping the IV of the earlier loop
> live when not necessary.  The existing gcc.target/i386/pr87007-5.c
> testcase shows that we otherwise fail to elide an empty loop
> later.  I don't see any good reason why loop splitting would need
> final value replacement, all exit values honor the constraints
> we place on loop header PHIs automatically.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, I plan
> to install this if testing succeeds.
Thanks!  I was just looking into the same thing.  This should let us turn the
split loop into a non-loop for hmmer.

Honza


[PATCH v3][RFC] c-family: Implement __has_feature and __has_extension [PR60512]

2023-08-03 Thread Alex Coplan via Gcc-patches
Hi,

This patch implements clang's __has_feature and __has_extension in GCC.
This is a v3 which addresses feedback for the v2 patch posted here:

https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626058.html

Main changes since v2:
 - As per Jason's feedback, dropped the langhook in favour of
   a function prototyped in c-family/c-common.h and implemented in
   *-lang.cc for each frontend.
 - Also dropped the callbacks as suggested, we now compute whether
   features/extensions are available when __has_feature is first invoked,
   and only add available features to the hash table (storing a boolean
   to indicate whether a given identifier names a feature or an extension).
 - Added many comments to top-level definitions.
 - Generally polished and tidied up a bit.

As of this writing, there are still a couple of unresolved issues
around cxx_binary_literals and TLS, see:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626058.html

Bootstrapped/regtested on aarch64-linux-gnu and x86_64-apple-darwin.
How does this version look?

Thanks,
Alex

gcc/c-family/ChangeLog:

PR c++/60512
* c-common.cc (struct hf_feature_info): New.
(c_common_register_feature): New.
(init_has_feature): New.
(has_feature_p): New.
* c-common.h (c_common_has_feature): New.
(c_family_register_lang_features): New.
(c_common_register_feature): New.
(has_feature_p): New.
(c_register_features): New.
(cp_register_features): New.
* c-lex.cc (init_c_lex): Plumb through has_feature callback.
(c_common_has_builtin): Generalize and move common part ...
(c_common_lex_availability_macro): ... here.
(c_common_has_feature): New.
* c-ppoutput.cc (init_pp_output): Plumb through has_feature.

gcc/c/ChangeLog:

PR c++/60512
* c-lang.cc (c_family_register_lang_features): New.
* c-objc-common.cc (struct c_feature_info): New.
(c_register_features): New.

gcc/cp/ChangeLog:

PR c++/60512
* cp-lang.cc (c_family_register_lang_features): New.
* cp-objcp-common.cc (struct cp_feature_selector): New.
(cp_feature_selector::has_feature): New.
(struct cp_feature_info): New.
(cp_register_features): New.

gcc/ChangeLog:

PR c++/60512
* doc/cpp.texi: Document __has_{feature,extension}.

gcc/objc/ChangeLog:

PR c++/60512
* objc-act.cc (struct objc_feature_info): New.
(objc_nonfragile_abi_p): New.
(objc_common_register_features): New.
* objc-act.h (objc_common_register_features): New.
* objc-lang.cc (c_family_register_lang_features): New.

gcc/objcp/ChangeLog:

PR c++/60512
* objcp-lang.cc (c_family_register_lang_features): New.

libcpp/ChangeLog:

PR c++/60512
* include/cpplib.h (struct cpp_callbacks): Add has_feature.
(enum cpp_builtin_type): Add BT_HAS_{FEATURE,EXTENSION}.
* init.cc: Add __has_{feature,extension}.
* macro.cc (_cpp_builtin_macro_text): Handle
BT_HAS_{FEATURE,EXTENSION}.


gcc/testsuite/ChangeLog:

PR c++/60512
* c-c++-common/has-feature-common.c: New test.
* g++.dg/ext/has-feature.C: New test.
* gcc.dg/asan/has-feature-asan.c: New test.
* gcc.dg/has-feature.c: New test.
* gcc.dg/ubsan/has-feature-ubsan.c: New test.
* obj-c++.dg/has-feature.mm: New test.
* objc.dg/has-feature.m: New test.
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 9fbaeb437a1..64ca6fa3d77 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -311,6 +311,43 @@ const struct fname_var_t fname_vars[] =
   {NULL, 0, 0},
 };
 
+/* Flags to restrict availability of generic features that
+   are known to __has_{feature,extension}.  */
+
+enum
+{
+  HF_FLAG_EXT = 1, /* Available only as an extension.  */
+  HF_FLAG_SANITIZE = 2, /* Availability depends on sanitizer flags.  */
+};
+
+/* Info for generic features which can be queried through
+   __has_{feature,extension}.  */
+
+struct hf_feature_info
+{
+  const char *ident;
+  unsigned flags;
+  unsigned mask;
+};
+
+/* Table of generic features which can be queried through
+   __has_{feature,extension}.  */
+
+static const hf_feature_info has_feature_table[] =
+{
+  { "address_sanitizer",   HF_FLAG_SANITIZE, SANITIZE_ADDRESS },
+  { "thread_sanitizer",HF_FLAG_SANITIZE, SANITIZE_THREAD },
+  { "leak_sanitizer",  HF_FLAG_SANITIZE, SANITIZE_LEAK },
+  { "hwaddress_sanitizer", HF_FLAG_SANITIZE, SANITIZE_HWADDRESS },
+  { "undefined_behavior_sanitizer", HF_FLAG_SANITIZE, SANITIZE_UNDEFINED },
+  { "attribute_deprecated_with_message",  0, 0 },
+  { "attribute_unavailable_with_message", 0, 0 },
+  { "enumerator_attributes", 0, 0 },
+  { "tls", 0, 0 },
+  { "gnu_asm_goto_with_outputs", HF_FLAG_EXT, 0 },
+  { "gnu_asm_goto_with_outputs_full

Re: [PATCH] Fix PR 110874: infinite loop in gimple_bitwise_inverted_equal_p with fre

2023-08-03 Thread Andrew Pinski via Gcc-patches
On Thu, Aug 3, 2023 at 12:23 AM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, Aug 3, 2023 at 4:34 AM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > So I didn't expect valueization to cause calling gimple_nop_convert
> > to iterate between 2 different SSA names causing an infinite loop
> > in gimple_bitwise_inverted_equal_p.
> > So we should cause a bound on gimple_bitwise_inverted_equal_p calling
> > gimple_nop_convert and only look through one rather than always.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/110874
> > * gimple-match-head.cc (gimple_bitwise_inverted_equal_p):
> > Add new argument, again with default value of true.
> > Don't try gimple_nop_convert if again is false.
> > Update call to gimple_bitwise_inverted_equal_p for
> > new argument.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/110874
> > * gcc.c-torture/compile/pr110874-a.c: New test.
> > ---
> >  gcc/gimple-match-head.cc| 14 +-
> >  .../gcc.c-torture/compile/pr110874-a.c  | 17 +
> >  2 files changed, 26 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr110874-a.c
> >
> > diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> > index b1e96304d7c..e91aaab86dd 100644
> > --- a/gcc/gimple-match-head.cc
> > +++ b/gcc/gimple-match-head.cc
> > @@ -273,7 +273,7 @@ gimple_bitwise_equal_p (tree expr1, tree expr2, tree 
> > (*valueize) (tree))
> >  /* Helper function for bitwise_equal_p macro.  */
> >
> >  static inline bool
> > -gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, tree (*valueize) 
> > (tree))
> > +gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, tree (*valueize) 
> > (tree), bool again = true)
> >  {
> >if (expr1 == expr2)
> >  return false;
> > @@ -285,12 +285,16 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree 
> > expr2, tree (*valueize) (tree)
> >  return false;
> >
> >tree other;
> > -  if (gimple_nop_convert (expr1, &other, valueize)
> > -  && gimple_bitwise_inverted_equal_p (other, expr2, valueize))
> > +  if (again
> > +  && gimple_nop_convert (expr1, &other, valueize)
> > +  && other != expr1
> > +  && gimple_bitwise_inverted_equal_p (other, expr2, valueize, false))
> >  return true;
> >
> > -  if (gimple_nop_convert (expr2, &other, valueize)
> > -  && gimple_bitwise_inverted_equal_p (expr1, other, valueize))
> > +  if (again
> > +  && gimple_nop_convert (expr2, &other, valueize)
> > +  && other != expr2
> > +  && gimple_bitwise_inverted_equal_p (expr1, other, valueize, false))
> >  return true;
>
> Hmm, I don't think this tests all three relevant combinations?  I think the 
> way
> gimple_bitwise_equal_p handles this is better (not recursing).  I'd split out
> the "tail" matching the BIT_NOT to another helper, I suppose that could
> even be a (match ...) pattern here.

That sounds like a better idea.  I am testing a patch right now that
uses a (match ) pattern for
the BIT_NOT and CMP cases.  That will remove the recursion too.

Thanks,
Andrew

>
> >if (TREE_CODE (expr1) != SSA_NAME
> > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr110874-a.c 
> > b/gcc/testsuite/gcc.c-torture/compile/pr110874-a.c
> > new file mode 100644
> > index 000..b314410a892
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/compile/pr110874-a.c
> > @@ -0,0 +1,17 @@
> > +struct S1 {
> > +  unsigned f0;
> > +};
> > +static int g_161;
> > +void func_109(unsigned g_227, unsigned t) {
> > +  struct S1 l_178;
> > +  int l_160 = 0x1FAE99D5L;
> > +  int *l_230[] = {&l_160};
> > +  if (l_160) {
> > +for (l_178.f0 = -7; l_178.f0;) {
> > +  ++g_227;
> > +  break;
> > +}
> > +(g_161) = g_227;
> > +  }
> > +  (g_161) &= t;
> > +}
> > --
> > 2.31.1
> >
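
The (match ...) helper being discussed would look roughly like the following
in match.pd terms. This is a sketch of the idea (the helper name and exact
predicates are illustrative, not the patch Andrew ended up testing): match
~X directly, plus ~X wrapped in a single nop-conversion, so the caller no
longer needs to recurse through gimple_nop_convert:

```
/* Match ~X, possibly behind one nop-conversion, capturing X.  */
(match (bit_not_with_nop @0)
 (bit_not @0))
(match (bit_not_with_nop @0)
 (convert (bit_not @0))
 (if (INTEGRAL_TYPE_P (type))))
```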


[PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-03 Thread juzhe.zh...@rivai.ai
https://github.com/gcc-mirror/gcc/commit/e15d0b6680d10d7666195e9db65581364ad5e5df
 

This patch causes many failures in the regression testing:

FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-2.c   -O1   
scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-40.c   -O2   
scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*mf2,\\s*t[au],\\s*m[au]\\s+\\.L[0-9]+ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-2.c   -O2   
scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-2.c   -O2   
scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*m8,\\s*t[au],\\s*m[au]\\s+\\.L[0-9]+ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-2.c   -O2   
scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*m8,\\s*t[au],\\s*m[au]\\s+j\\s+\\.L[0-9]+
 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-79.c   -Os   scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e32,\\s*mf2,\\s*tu,\\s*mu 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-40.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*mf2,\\s*t[au],\\s*m[au]\\s+\\.L[0-9]+ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-2.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-2.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*m8,\\s*t[au],\\s*m[au]\\s+\\.L[0-9]+ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-2.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*m8,\\s*t[au],\\s*m[au]\\s+j\\s+\\.L[0-9]+
 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-40.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*mf2,\\s*t[au],\\s*m[au]\\s+\\.L[0-9]+ 1
FAIL: gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-5.c   -Os   
scan-assembler-times 
\\.L[0-9]+\\:\\s+vle64\\.v\\s+v[0-9]+,\\s*0\\s*\\([a-x0-9]+\\) 8
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-2.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-2.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*m8,\\s*t[au],\\s*m[au]\\s+\\.L[0-9]+ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-2.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*m8,\\s*t[au],\\s*m[au]\\s+j\\s+\\.L[0-9]+
 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_multiple-10.c   -O1   
scan-assembler-times 
\\.L[0-9]+\\:\\s+add\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[a-x0-9]+\\s+vle8\\.v\\s+v[0-9]+,\\s*0\\s*\\([a-x0-9]+\\)
 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-41.c   -O2   
scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*mf2,\\s*t[au],\\s*m[au] 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-41.c   -O2   
scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c   -O2   
scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*mf8,\\s*t[au],\\s*m[au] 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c   -O2   
scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-41.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*mf2,\\s*t[au],\\s*m[au] 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-41.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*mf8,\\s*t[au],\\s*m[au] 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-41.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*mf2,\\s*t[au],\\s*m[au] 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-41.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c   -Os   scan-assembler-times 
vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e8,\\s*mf8,\\s*t[au],\\s*m[au] 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-6.c   -Os   
scan-assembler-times 
\\.L[0-9]+\\:\\s+vle64\\.v\\s+v[0-9]+,\\s*0\\s*\\([a-x0-9]+\\) 4

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-08-03 Thread Hao Liu OS via Gcc-patches
Gentle ping. Is it OK for master?

I'm afraid the ICE may cause trouble and hope it can be fixed ASAP.

Thanks,
Hao


From: Hao Liu OS 
Sent: Wednesday, August 2, 2023 11:45
To: Richard Sandiford
Cc: Richard Biener; GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
multiplying count [PR110625]

Hi Richard,

Updated the patch with a simple case (see the case and comments below).  It
shows that a live stmt may not have a reduction def, which introduces the ICE.

Is it OK for trunk?


Fix the assertion failure on an empty reduction definition in
info_for_reduction.  Even if a stmt is live, it may still have an empty
reduction definition.  Check the reduction definition instead of the live
info before calling info_for_reduction.

gcc/ChangeLog:

PR target/110625
* config/aarch64/aarch64.cc (aarch64_force_single_cycle): Check
STMT_VINFO_REDUC_DEF to avoid failures in info_for_reduction.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr110625_3.c: New testcase.
---
 gcc/config/aarch64/aarch64.cc |  2 +-
 gcc/testsuite/gcc.target/aarch64/pr110625_3.c | 34 +++
 2 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_3.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d4d76025545..5b8d8fa8e2d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16776,7 +16776,7 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
stmt_vec_info stmt_info,
 static bool
 aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
 {
-  if (!STMT_VINFO_LIVE_P (stmt_info))
+  if (!STMT_VINFO_REDUC_DEF (stmt_info))
 return false;

   auto reduc_info = info_for_reduction (vinfo, stmt_info);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625_3.c 
b/gcc/testsuite/gcc.target/aarch64/pr110625_3.c
new file mode 100644
index 000..35a50290cb0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110625_3.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mcpu=neoverse-n2" } */
+
+/* Avoid ICE on empty reduction def in single_defuse_cycle.
+
+   E.g.
+  [local count: 858993456]:
+ # sum_18 = PHI 
+ sum.0_5 = (unsigned int) sum_18;
+ _6 = _4 + sum.0_5; <-- it is "live" but doesn't have reduction def
+ sum_15 = (int) _6;
+ ...
+ if (ivtmp_29 != 0)
+   goto ; [75.00%]
+ else
+   goto ; [25.00%]
+
+  [local count: 644245086]:
+ goto ; [100.00%]
+
+  [local count: 214748368]:
+ # _31 = PHI <_6(3)>
+ _8 = _31 >> 1;
+*/
+
+int
+f (unsigned int *tmp)
+{
+  int sum = 0;
+  for (int i = 0; i < 4; i++)
+sum += tmp[i];
+
+  return (unsigned int) sum >> 1;
+}
--
2.34.1


From: Hao Liu OS 
Sent: Tuesday, August 1, 2023 17:43
To: Richard Sandiford
Cc: Richard Biener; GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
multiplying count [PR110625]

Hi Richard,

This is a quick fix for the several ICEs.  It seems that even when
STMT_VINFO_LIVE_P is true, some reduction stmts still don't have a REDUC_DEF.
So I changed the check to STMT_VINFO_REDUC_DEF.

Is it OK for trunk?

---
Fix the ICEs on an empty reduction definition.  Even when STMT_VINFO_LIVE_P
is true, some reduction stmts still don't have a definition.

gcc/ChangeLog:

PR target/110625
* config/aarch64/aarch64.cc (aarch64_force_single_cycle): Check
STMT_VINFO_REDUC_DEF to avoid failures in info_for_reduction.
---
 gcc/config/aarch64/aarch64.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d4d76025545..5b8d8fa8e2d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16776,7 +16776,7 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
stmt_vec_info stmt_info,
 static bool
 aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
 {
-  if (!STMT_VINFO_LIVE_P (stmt_info))
+  if (!STMT_VINFO_REDUC_DEF (stmt_info))
 return false;

   auto reduc_info = info_for_reduction (vinfo, stmt_info);
--
2.40.0



From: Richard Sandiford 
Sent: Monday, July 31, 2023 17:11
To: Hao Liu OS
Cc: Richard Biener; GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
multiplying count [PR110625]

Hao Liu OS  writes:
>> Which test case do you see this for?  The two tests in the patch still
>> seem to report correct latencies for me if I make the change above.
>
> Not the newly added tests.  It is still the existing case causing the 
> previous ICE (i.e. assertion problem): gcc.target/aarch64/sve/cost_model_13.c.
>
> It's not that the test case itself failed, but the dump message of vect says the 
> "reduction latency" is 0:
>
> Before the change:
> cost_model_13.c:7:21: note:  Orig

Re: [x86 PATCH] Split SUBREGs of SSE vector registers into vec_select insns.

2023-08-03 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 3, 2023 at 9:10 AM Roger Sayle  wrote:
>
>
> This patch is the final piece in the series to improve the ABI issues
> affecting PR 88873.  The previous patches tackled inserting DFmode
> values into V2DFmode registers, by introducing insvti_{low,high}part
> patterns.  This patch improves the extraction of DFmode values from
> V2DFmode registers via TImode intermediates.
>
> I'd initially thought this would require new extvti_{low,high}part
> patterns to be defined, but all that's required is to recognize that
> the SUBREG idioms produced by combine are equivalent to (forms of)
> vec_select patterns.  The target-independent middle-end can't be sure
> that the appropriate vec_select instruction exists on the target,
> hence doesn't canonicalize a SUBREG of a vector mode as a vec_select,
> but the backend can provide a define_split stating where and when
> this is useful, for example, considering whether the operand is in
> memory, or whether !TARGET_SSE_MATH and the destination is i387.
>
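[Editorial note: the SUBREG idioms in question correspond, at the source level, to reading one double lane out of a 128-bit vector.  A minimal sketch (not from the patch; the function names are mine) exercising both the lowpart and highpart extracts:]

```c
#include <assert.h>

/* GNU C generic vector of two doubles, the source-level analogue of a
   V2DFmode register.  Reading v[0]/v[1] is what combine turns into the
   SUBREG-of-TImode idioms that the new define_splits rewrite into
   vec_select forms (sse2_storelpd / sse2_storehpd).  */
typedef double v2df __attribute__ ((vector_size (16)));

double
lowpart (v2df v)
{
  return v[0];  /* lowpart:DF extract */
}

double
highpart (v2df v)
{
  return v[1];  /* highpart:DF extract */
}
```

Compiling with -O2 -msse2 and inspecting the assembly shows whether the extracts stay in registers or bounce through memory.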
> For pr88873.c, gcc -O2 -march=cascadelake currently generates:
>
> foo:vpunpcklqdq %xmm3, %xmm2, %xmm7
> vpunpcklqdq %xmm1, %xmm0, %xmm6
> vpunpcklqdq %xmm5, %xmm4, %xmm2
> vmovdqa %xmm7, -24(%rsp)
> vmovdqa %xmm6, %xmm1
> movq-16(%rsp), %rax
> vpinsrq $1, %rax, %xmm7, %xmm4
> vmovapd %xmm4, %xmm6
> vfmadd132pd %xmm1, %xmm2, %xmm6
> vmovapd %xmm6, -24(%rsp)
> vmovsd  -16(%rsp), %xmm1
> vmovsd  -24(%rsp), %xmm0
> ret
>
> with this patch, we now generate:
>
> foo:vpunpcklqdq %xmm1, %xmm0, %xmm6
> vpunpcklqdq %xmm3, %xmm2, %xmm7
> vpunpcklqdq %xmm5, %xmm4, %xmm2
> vmovdqa %xmm6, %xmm1
> vfmadd132pd %xmm7, %xmm2, %xmm1
> vmovsd  %xmm1, %xmm1, %xmm0
> vunpckhpd   %xmm1, %xmm1, %xmm1
> ret
>
> The improvement is even more dramatic when compared to the original
> 29 instructions shown in comment #8.  GCC 13, for example, required
> 12 transfers to/from memory.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-08-03  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/sse.md (define_split): Convert highpart:DF extract
> from V2DFmode register into a sse2_storehpd instruction.
> (define_split): Likewise, convert lowpart:DF extract from V2DF
> register into a sse2_storelpd instruction.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/pr88873.c: Tweak to check for improved code.

OK.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>


Re: Fix profile update after vectorizer peeling

2023-08-03 Thread Jan Hubicka via Gcc-patches
> 
> Note most of the profile consistency checks FAIL when testing with -m32 on
> x86_64-unknown-linux-gnu ...
> 
> For example vect-11.c has
> 
> ;;   basic block 4, loop depth 0, count 719407024 (estimated locally,
> freq 0.6700), maybe hot
> ;;   Invalid sum of incoming counts 708669602 (estimated locally, freq
> 0.6600), should be 719407024 (estimated locally, freq 0.6700)
> ;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   3 [always (guessed)]  count:708669602 (estimated
> locally, freq 0.6600) (FALSE_VALUE,EXECUTABLE)
>   __asm__ __volatile__("cpuid
> " : "=a" a_44, "=b" b_45, "=c" c_46, "=d" d_47 : "0" 1, "2" 0);
>   _3 = d_47 & 67108864;
> 
> so it looks like it's the check_vect () function that goes wrong
> everywhere but only on i?86.
> The first dump with the Invalid sum is 095t.fixup_cfg3 already.

Sorry for that, looks like missing/undetected noreturn.  I will take a look.

Honza


Re: [PATCH 1/2] bpf: Implementation of BPF CO-RE builtins

2023-08-03 Thread Cupertino Miranda via Gcc-patches


>> +  /* FIXED: This was not Ok.
>
> Hm?  If that is fixed, do we still need that comment? :)
Touche! ;)

>
>> +emit_insn ( \
>> +  gen_mov_reloc_coredi (reg, \
>> +gen_rtx_CONST_INT (Pmode, 0), \
>> +gen_rtx_CONST_INT (Pmode, index))); \
>
> These backslahes... was that in a macro originally?
Forgot to reply to this.
Yes, but it was a macro I defined myself.  I initially assumed it needed
different emits depending on the original mode, so I made a macro and
used it in a switch case.
>
>> +return true;
>> +  }
>> +  }
>> +  return false;
>> +}


Re: [v2 PATCH 1/2] bpf: Implementation of BPF CO-RE builtins

2023-08-03 Thread Cupertino Miranda via Gcc-patches
From fda9603ded735205b6e20fc5b65a04f8d15685e6 Mon Sep 17 00:00:00 2001
From: Cupertino Miranda 
Date: Thu, 6 Apr 2023 15:22:48 +0100
Subject: [PATCH v2 1/2] bpf: Implementation of BPF CO-RE builtins

This patch updates the support for the BPF CO-RE builtins
__builtin_preserve_access_index and __builtin_preserve_field_info,
and adds support for the CO-RE builtins __builtin_btf_type_id,
__builtin_preserve_type_info and __builtin_preserve_enum_value.

These CO-RE relocations are now converted to __builtin_core_reloc, which
abstracts all of the original builtins in a polymorphic,
relocation-specific builtin.

The builtin processing is now split into two stages: the first (pack) is
executed right after the front-end and the second (process) right before
the asm output.

In the expand pass, __builtin_core_reloc is converted to an
unspec:UNSPEC_CORE_RELOC rtx entry.

The data required to process the builtin is now collected in the packing
stage (right after the front-end), preventing the compiler from
optimizing away any of the relevant information required to compose the
relocation when necessary.
At expansion, that information is recovered and CTF/BTF is queried to
construct the information that will be used in the relocation.
At this point the relocation is added to a specific section and the
builtin is expanded to its expected default value.

In order to process __builtin_preserve_enum_value, it was necessary to
hook into the front-end to collect the original enum value reference.
This is needed since the parser folds all enum values to their
integer_cst representation.

More details can be found in core-builtins.cc.

Regtested on host x86_64-linux-gnu and target bpf-unknown-none.
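[Editorial note: for context, this is how the preserved-access builtin is typically used in source.  A BPF-target-only fragment - it requires a bpf cross-compiler with -gbtf -mco-re, and the struct and field names here are invented for illustration:]

/* Invented kernel-style struct; only its BTF matters at load time.  */
struct task_struct
{
  int pid;
};

int
get_pid (struct task_struct *task)
{
  /* Instead of baking in the compile-time offset of 'pid', the builtin
     records a CO-RE relocation so the BPF loader can patch the access
     against the running kernel's BTF.  */
  return __builtin_preserve_access_index (({ task->pid; }));
}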
---
 gcc/config.gcc  |4 +-
 gcc/config/bpf/bpf-passes.def   |   20 -
 gcc/config/bpf/bpf-protos.h |4 +-
 gcc/config/bpf/bpf.cc   |  806 ++
 gcc/config/bpf/bpf.md   |   17 +
 gcc/config/bpf/core-builtins.cc | 1394 +++
 gcc/config/bpf/core-builtins.h  |   35 +
 gcc/config/bpf/coreout.cc   |   50 +-
 gcc/config/bpf/coreout.h|   13 +-
 gcc/config/bpf/t-bpf|6 +-
 gcc/doc/extend.texi |   51 ++
 11 files changed, 1595 insertions(+), 805 deletions(-)
 delete mode 100644 gcc/config/bpf/bpf-passes.def
 create mode 100644 gcc/config/bpf/core-builtins.cc
 create mode 100644 gcc/config/bpf/core-builtins.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index eba69a463be0..c521669e78b1 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1597,8 +1597,8 @@ bpf-*-*)
 use_collect2=no
 extra_headers="bpf-helpers.h"
 use_gcc_stdint=provide
-extra_objs="coreout.o"
-target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc"
+extra_objs="coreout.o core-builtins.o"
+target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc \$(srcdir)/config/bpf/core-builtins.cc"
 ;;
 cris-*-elf | cris-*-none)
 	tm_file="elfos.h newlib-stdint.h ${tm_file}"
diff --git a/gcc/config/bpf/bpf-passes.def b/gcc/config/bpf/bpf-passes.def
deleted file mode 100644
index deeaee988a01..
--- a/gcc/config/bpf/bpf-passes.def
+++ /dev/null
@@ -1,20 +0,0 @@
-/* Declaration of target-specific passes for eBPF.
-   Copyright (C) 2021-2023 Free Software Foundation, Inc.
-
-   This file is part of GCC.
-
-   GCC is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3, or (at your option)
-   any later version.
-
-   GCC is distributed in the hope that it will be useful, but
-   WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with GCC; see the file COPYING3.  If not see
-   .  */
-
-INSERT_PASS_AFTER (pass_df_initialize_opt, 1, pass_bpf_core_attr);
diff --git a/gcc/config/bpf/bpf-protos.h b/gcc/config/bpf/bpf-protos.h
index b484310e8cbf..fbe0d8a0213f 100644
--- a/gcc/config/bpf/bpf-protos.h
+++ b/gcc/config/bpf/bpf-protos.h
@@ -30,7 +30,7 @@ extern void bpf_print_operand_address (FILE *, rtx);
 extern void bpf_expand_prologue (void);
 extern void bpf_expand_epilogue (void);
 extern void bpf_expand_cbranch (machine_mode, rtx *);
-
-rtl_opt_pass * make_pass_bpf_core_attr (gcc::context *);
+const char *bpf_add_core_reloc (rtx *operands, const char *templ);
+void bpf_replace_core_move_operands (rtx *operands);
 
 #endif /* ! GCC_BPF_PROTOS_H */
diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 57817cdf2f86..4873475e73bd 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -69,10 +69,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "gimplify-me.h"
 
-#include "ctfc.h"
-#i

Re: [v2 PATCH 2/2] bpf: CO-RE builtins support tests.

2023-08-03 Thread Cupertino Miranda via Gcc-patches

Hi,

Resending this patch since I noticed I had added a testcase in the
previous patch that makes more sense here.

Thanks,
Cupertino

From 334e9ae0f428f6573f2a5e8a3067a4d181b8b9c5 Mon Sep 17 00:00:00 2001
From: Cupertino Miranda 
Date: Thu, 27 Jul 2023 18:05:22 +0100
Subject: [PATCH v2 2/2] bpf: CO-RE builtins support tests.

This patch adds tests for the following builtins:
  __builtin_preserve_enum_value
  __builtin_btf_type_id
  __builtin_preserve_type_info
---
 .../gcc.target/bpf/core-builtin-enumvalue.c   |  52 +
 .../bpf/core-builtin-enumvalue_errors.c   |  22 
 .../bpf/core-builtin-enumvalue_opt.c  |  35 ++
 ...core-builtin-fieldinfo-const-elimination.c |  29 +
 .../bpf/core-builtin-fieldinfo-errors-1.c |   2 +-
 .../bpf/core-builtin-fieldinfo-errors-2.c |   2 +-
 .../gcc.target/bpf/core-builtin-type-based.c  |  58 ++
 .../gcc.target/bpf/core-builtin-type-id.c |  40 +++
 gcc/testsuite/gcc.target/bpf/core-support.h   | 109 ++
 9 files changed, 347 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_errors.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_opt.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-fieldinfo-const-elimination.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-type-based.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-type-id.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-support.h

diff --git a/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue.c b/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue.c
new file mode 100644
index ..3e3334dc089a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -dA -gbtf -mco-re" } */
+
+#include "core-support.h"
+
+extern int *v;
+
+int foo(void *data)
+{
+ int i = 0;
+ enum named_ue64 named_unsigned64 = 0;
+ enum named_se64 named_signed64 = 0;
+ enum named_ue named_unsigned = 0;
+ enum named_se named_signed = 0;
+
+ v[i++] = bpf_core_enum_value_exists (named_unsigned64, UE64_VAL1);
+ v[i++] = bpf_core_enum_value_exists (enum named_ue64, UE64_VAL2);
+ v[i++] = bpf_core_enum_value_exists (enum named_ue64, UE64_VAL3);
+ v[i++] = bpf_core_enum_value_exists (named_signed64, SE64_VAL1);
+ v[i++] = bpf_core_enum_value_exists (enum named_se64, SE64_VAL2);
+ v[i++] = bpf_core_enum_value_exists (enum named_se64, SE64_VAL3);
+
+ v[i++] = bpf_core_enum_value (named_unsigned64, UE64_VAL1);
+ v[i++] = bpf_core_enum_value (named_unsigned64, UE64_VAL2);
+ v[i++] = bpf_core_enum_value (named_signed64, SE64_VAL1);
+ v[i++] = bpf_core_enum_value (named_signed64, SE64_VAL2);
+
+ v[i++] = bpf_core_enum_value_exists (named_unsigned, UE_VAL1);
+ v[i++] = bpf_core_enum_value_exists (enum named_ue, UE_VAL2);
+ v[i++] = bpf_core_enum_value_exists (enum named_ue, UE_VAL3);
+ v[i++] = bpf_core_enum_value_exists (named_signed, SE_VAL1);
+ v[i++] = bpf_core_enum_value_exists (enum named_se, SE_VAL2);
+ v[i++] = bpf_core_enum_value_exists (enum named_se, SE_VAL3);
+
+ v[i++] = bpf_core_enum_value (named_unsigned, UE_VAL1);
+ v[i++] = bpf_core_enum_value (named_unsigned, UE_VAL2);
+ v[i++] = bpf_core_enum_value (named_signed, SE_VAL1);
+ v[i++] = bpf_core_enum_value (named_signed, SE_VAL2);
+
+ return 0;
+}
+
+/* { dg-final { scan-assembler-times "\t.4byte\t0x8\t; bpfcr_type \\(named_ue64\\)" 5 } } */
+/* { dg-final { scan-assembler-times "\t.4byte\t0x9\t; bpfcr_type \\(named_se64\\)" 5} } */
+/* { dg-final { scan-assembler-times "\t.4byte\t0xb\t; bpfcr_type \\(named_ue\\)" 5 } } */
+/* { dg-final { scan-assembler-times "\t.4byte\t0xc\t; bpfcr_type \\(named_se\\)" 5} } */
+/* { dg-final { scan-assembler-times "\t.4byte\t0xa\t; bpfcr_kind" 12 } } BPF_ENUMVAL_EXISTS */
+/* { dg-final { scan-assembler-times "\t.4byte\t0xb\t; bpfcr_kind" 8 } } BPF_ENUMVAL_VALUE */
+
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0\"\\)" 8 } } */
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"1\"\\)" 8 } } */
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"2\"\\)" 4 } } */
diff --git a/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_errors.c b/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_errors.c
new file mode 100644
index ..138e99895160
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_errors.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -dA -gbtf -mco-re" } */
+
+#include "core-support.h"
+
+extern int *v;
+
+unsigned long foo(void *data)
+{
+  int i = 0;
+  enum named_ue64 named_unsigned = 0;
+  enum named_se64 named_signed = 0;
+  typeof(enum named_ue64) a = 0;
+
+  v[i++] = __builtin_preserve_enum_value (({ extern typeof(named_unsigned) *_type0; _type0; }),	 0,	BPF_ENUMVAL_EXISTS); /* { dg-error "invalid enum value argument fo

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-08-03 Thread Richard Sandiford via Gcc-patches
Hao Liu OS  writes:
> Hi Richard,
>
> Update the patch with a simple case (see the case and comments below).  It shows 
> that a live stmt may not have a reduction def, which introduces the ICE.
>
> Is it OK for trunk?

OK, thanks.

Richard

> 
> Fix the assertion failure on an empty reduction definition in info_for_reduction.
> Even if a stmt is live, it may still have an empty reduction definition.  Check the
> reduction definition instead of the live info before calling info_for_reduction.
>
> gcc/ChangeLog:
>
> PR target/110625
> * config/aarch64/aarch64.cc (aarch64_force_single_cycle): check
> STMT_VINFO_REDUC_DEF to avoid failures in info_for_reduction.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/pr110625_3.c: New testcase.
> ---
>  gcc/config/aarch64/aarch64.cc |  2 +-
>  gcc/testsuite/gcc.target/aarch64/pr110625_3.c | 34 +++
>  2 files changed, 35 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_3.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index d4d76025545..5b8d8fa8e2d 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16776,7 +16776,7 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
> stmt_vec_info stmt_info,
>  static bool
>  aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
>  {
> -  if (!STMT_VINFO_LIVE_P (stmt_info))
> +  if (!STMT_VINFO_REDUC_DEF (stmt_info))
>  return false;
>
>auto reduc_info = info_for_reduction (vinfo, stmt_info);
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625_3.c 
> b/gcc/testsuite/gcc.target/aarch64/pr110625_3.c
> new file mode 100644
> index 000..35a50290cb0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr110625_3.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mcpu=neoverse-n2" } */
> +
> +/* Avoid ICE on empty reduction def in single_defuse_cycle.
> +
> +   E.g.
> +  [local count: 858993456]:
> + # sum_18 = PHI 
> + sum.0_5 = (unsigned int) sum_18;
> + _6 = _4 + sum.0_5; <-- it is "live" but doesn't have reduction def
> + sum_15 = (int) _6;
> + ...
> + if (ivtmp_29 != 0)
> +   goto ; [75.00%]
> + else
> +   goto ; [25.00%]
> +
> +  [local count: 644245086]:
> + goto ; [100.00%]
> +
> +  [local count: 214748368]:
> + # _31 = PHI <_6(3)>
> + _8 = _31 >> 1;
> +*/
> +
> +int
> +f (unsigned int *tmp)
> +{
> +  int sum = 0;
> +  for (int i = 0; i < 4; i++)
> +sum += tmp[i];
> +
> +  return (unsigned int) sum >> 1;
> +}
> --
> 2.34.1
>
> 
> From: Hao Liu OS 
> Sent: Tuesday, August 1, 2023 17:43
> To: Richard Sandiford
> Cc: Richard Biener; GCC-patches@gcc.gnu.org
> Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
> multiplying count [PR110625]
>
> Hi Richard,
>
> This is a quick fix for the several ICEs.  It seems that even when STMT_VINFO_LIVE_P is 
> true, some reduction stmts still don't have a REDUC_DEF.  So I changed the check to 
> STMT_VINFO_REDUC_DEF.
>
> Is it OK for trunk?
>
> ---
> Fix the ICEs on an empty reduction definition.  Even when STMT_VINFO_LIVE_P is true,
> some reduction stmts
> still don't have a definition.
>
> gcc/ChangeLog:
>
> PR target/110625
> * config/aarch64/aarch64.cc (aarch64_force_single_cycle): check
> STMT_VINFO_REDUC_DEF to avoid failures in info_for_reduction
> ---
>  gcc/config/aarch64/aarch64.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index d4d76025545..5b8d8fa8e2d 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16776,7 +16776,7 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
> stmt_vec_info stmt_info,
>  static bool
>  aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
>  {
> -  if (!STMT_VINFO_LIVE_P (stmt_info))
> +  if (!STMT_VINFO_REDUC_DEF (stmt_info))
>  return false;
>
>auto reduc_info = info_for_reduction (vinfo, stmt_info);
> --
> 2.40.0
>
>
> 
> From: Richard Sandiford 
> Sent: Monday, July 31, 2023 17:11
> To: Hao Liu OS
> Cc: Richard Biener; GCC-patches@gcc.gnu.org
> Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
> multiplying count [PR110625]
>
> Hao Liu OS  writes:
>>> Which test case do you see this for?  The two tests in the patch still
>>> seem to report correct latencies for me if I make the change above.
>>
>> Not the newly added tests.  It is still the existing case causing the 
>> previous ICE (i.e. assertion problem): 
>> gcc.target/aarch64/sve/cost_model_13.c.
>>
>> It's not that the test case itself failed, but the dump message of vect says the 
>> "reduction latency" is 0:
>>
>> Before the change:
>> cost_model_13.c:7:21: note:  Original vector body cost = 6
>> cost_model_13.c:7:21: note:  Scalar iss

Re: Fix profile update after vectorizer peeling

2023-08-03 Thread Jan Hubicka via Gcc-patches
> > 
> > Note most of the profile consistency checks FAIL when testing with -m32 on
> > x86_64-unknown-linux-gnu ...
> > 
> > For example vect-11.c has
> > 
> > ;;   basic block 4, loop depth 0, count 719407024 (estimated locally,
> > freq 0.6700), maybe hot
> > ;;   Invalid sum of incoming counts 708669602 (estimated locally, freq
> > 0.6600), should be 719407024 (estimated locally, freq 0.6700)
> > ;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
> > ;;pred:   3 [always (guessed)]  count:708669602 (estimated
> > locally, freq 0.6600) (FALSE_VALUE,EXECUTABLE)
> >   __asm__ __volatile__("cpuid
> > " : "=a" a_44, "=b" b_45, "=c" c_46, "=d" d_47 : "0" 1, "2" 0);
> >   _3 = d_47 & 67108864;
> > 
> > so it looks like it's the check_vect () function that goes wrong
> > everywhere but only on i?86.
> > The first dump with the Invalid sum is 095t.fixup_cfg3 already.
> 
> Sorry for that, looks like missing/undetected noreturn.  I will take a look.

The mismatch at fixup_cfg3 is harmless since we now repropagate frequencies
later.  The misupdate is caused by jump threading:

vect-11.c.102t.adjust_alignment:;;   Invalid sum of incoming counts 354334800 
(estimated locally, freq 0.3300), should be 233860966 (estimated locally, freq 
0.2178)
vect-11.c.102t.adjust_alignment:;;   Invalid sum of incoming counts 354334800 
(estimated locally, freq 0.3300), should be 474808634 (estimated locally, freq 
0.4422)
vect-11.c.107t.rebuild_frequencies1
vect-11.c.116t.threadfull1:;;   Invalid sum of incoming counts 708669600 
(estimated locally, freq 0.6600), should be 719407024 (estimated locally, freq 
0.6700)

I know that there are problems left in the profile threading update.  It was
the main pass disturbing the profile until GCC 13 and now works for basic
testcases, but not always.  I already spent quite some time trying to
figure out what is wrong with profile threading (PR103680), so at least
this is a small testcase.

Jeff, any help would be appreciated here :)

I will try to debug this.  One option would be to disable branch
prediction on check_vect for the time being - it is not inlined anyway

diff --git a/gcc/testsuite/gcc.dg/vect/tree-vect.h 
b/gcc/testsuite/gcc.dg/vect/tree-vect.h
index c4b81441216..544be31be78 100644
--- a/gcc/testsuite/gcc.dg/vect/tree-vect.h
+++ b/gcc/testsuite/gcc.dg/vect/tree-vect.h
@@ -20,7 +20,7 @@ sig_ill_handler (int sig)
   exit(0);
 }
 
-static void __attribute__((noinline))
+static void __attribute__((noinline,optimize(0)))
 check_vect (void)
 {
   signal(SIGILL, sig_ill_handler);

Honza


Re: [PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting

2023-08-03 Thread Jan Hubicka via Gcc-patches
> On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor  wrote:
> >
> > Hi,
> >
> > when IPA-SRA detects whether a parameter passed by reference is
> > written to, it does not special case CLOBBERs which means it often
> > bails out unnecessarily, especially when dealing with C++ destructors.
> > Fixed by the obvious continue in the two relevant loops.
> >
> > The (slightly) more complex testcases in the PR need surprisingly more
> > effort but the simple one can be fixed now easily by this patch and I'll
> > work on the others incrementally.
> >
> > Bootstrapped and currently undergoing testsuite run on x86_64-linux.  OK
> > if it passes too?
> 
> LGTM, btw - how are the clobbers handled during transform?

Looks good to me too.  I was also wondering if we want to preserve
something about the clobber.  If SRA fully succeeds it would not be
needed, but if the original location is not fully SRAed we may
theoretically lose information.  We put an additional clobber after the
destructor call, so one would need to wrap it in a non-destructor and be
sure that ipa-modref understands the clobber in order to obtain a
testcase.

Honza
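[Editorial note: the kind of situation being discussed can be sketched as follows - hypothetical types and names, not the PR's actual testcase.  A non-trivially-destructible aggregate is passed by invisible reference, the callee only reads it, and the end-of-life clobber emitted for the destructor used to be counted as a write that blocked splitting:]

```cpp
#include <cassert>

struct point
{
  int x, y;
  ~point () {}  /* Non-trivial destructor: the object is passed by
                   invisible reference, and GCC emits a clobber when
                   its lifetime ends.  */
};

/* Only reads the referenced object; with the fix, the destructor's
   clobber no longer prevents IPA-SRA from splitting the parameter
   into scalar replacements.  */
static int __attribute__ ((noinline))
sum (point p)
{
  return p.x + p.y;
}
```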


[PATCH] c-family: Add _BitInt support for __atomic_*fetch* [PR102989]

2023-08-03 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch implements the lowering of __atomic_*fetch* functions
where first argument is a pointer to (optionally _Atomic) _BitInt which
either doesn't have size 1, 2, 4, 8 or 16 bytes or has 16 byte size but
target doesn't support TImode.
Patch on top of the _BitInt patch series.

Tested on x86_64-linux.
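[Editorial note: the lowering strategy is the classic compare-and-swap loop.  A sketch of its shape, using a plain unsigned long instead of a wide _BitInt (which would need the patched compiler), with a function name of my own choosing:]

```c
#include <assert.h>
#include <stdbool.h>

/* Shape of the CAS-loop lowering for __atomic_fetch_add on a type with
   no native atomic support: load the old value, compute old + val into
   a temporary, and retry the compare-exchange until no other thread
   has modified *p in between.  Illustrative only - the real lowering
   operates on _BitInt and is emitted as trees by
   atomic_bitint_fetch_using_cas_loop.  */
unsigned long
fetch_add_via_cas (unsigned long *p, unsigned long val)
{
  unsigned long old = __atomic_load_n (p, __ATOMIC_RELAXED);
  unsigned long desired;
  do
    desired = old + val;
  while (!__atomic_compare_exchange_n (p, &old, desired,
				       /*weak=*/true, __ATOMIC_SEQ_CST,
				       __ATOMIC_RELAXED));
  return old;  /* __atomic_fetch_add returns the previous value.  */
}
```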

2023-08-03  Jakub Jelinek  

PR c/102989
gcc/c-family/
* c-common.cc (sync_resolve_size): Add ORIG_FORMAT argument.  If
FETCH && !ORIG_FORMAT, type is BITINT_TYPE, return -1 if size isn't
one of 1, 2, 4, 8 or 16 or if it is 16 but TImode is not supported.
(atomic_bitint_fetch_using_cas_loop): New function.
(resolve_overloaded_builtin): Adjust sync_resolve_size caller.  If
-1 is returned, use atomic_bitint_fetch_using_cas_loop to lower it.
Formatting fix.
gcc/testsuite/
* gcc.dg/bitint-18.c: New test.

--- gcc/c-family/c-common.cc.jj 2023-07-11 15:28:55.119673958 +0200
+++ gcc/c-family/c-common.cc2023-08-03 12:10:50.852085519 +0200
@@ -7190,12 +7190,16 @@ speculation_safe_value_resolve_return (t
 /* A helper function for resolve_overloaded_builtin in resolving the
overloaded __sync_ builtins.  Returns a positive power of 2 if the
first operand of PARAMS is a pointer to a supported data type.
-   Returns 0 if an error is encountered.
+   Returns 0 if an error is encountered.  Return -1 for _BitInt
+   __atomic*fetch* with unsupported type which should be handled by
+   a cas loop.
FETCH is true when FUNCTION is one of the _FETCH_OP_ or _OP_FETCH_
+   built-ins.  ORIG_FORMAT is for __sync_* rather than __atomic_*
built-ins.  */
 
 static int
-sync_resolve_size (tree function, vec<tree, va_gc> *params, bool fetch)
+sync_resolve_size (tree function, vec<tree, va_gc> *params, bool fetch,
+		   bool orig_format)
 {
   /* Type of the argument.  */
   tree argtype;
@@ -7230,9 +7234,19 @@ sync_resolve_size (tree function, vec *orig_params)
+{
+  enum tree_code code = ERROR_MARK;
+  bool return_old_p = false;
+  switch (orig_code)
+{
+case BUILT_IN_ATOMIC_ADD_FETCH_N:
+  code = PLUS_EXPR;
+  break;
+case BUILT_IN_ATOMIC_SUB_FETCH_N:
+  code = MINUS_EXPR;
+  break;
+case BUILT_IN_ATOMIC_AND_FETCH_N:
+  code = BIT_AND_EXPR;
+  break;
+case BUILT_IN_ATOMIC_NAND_FETCH_N:
+  break;
+case BUILT_IN_ATOMIC_XOR_FETCH_N:
+  code = BIT_XOR_EXPR;
+  break;
+case BUILT_IN_ATOMIC_OR_FETCH_N:
+  code = BIT_IOR_EXPR;
+  break;
+case BUILT_IN_ATOMIC_FETCH_ADD_N:
+  code = PLUS_EXPR;
+  return_old_p = true;
+  break;
+case BUILT_IN_ATOMIC_FETCH_SUB_N:
+  code = MINUS_EXPR;
+  return_old_p = true;
+  break;
+case BUILT_IN_ATOMIC_FETCH_AND_N:
+  code = BIT_AND_EXPR;
+  return_old_p = true;
+  break;
+case BUILT_IN_ATOMIC_FETCH_NAND_N:
+  return_old_p = true;
+  break;
+case BUILT_IN_ATOMIC_FETCH_XOR_N:
+  code = BIT_XOR_EXPR;
+  return_old_p = true;
+  break;
+case BUILT_IN_ATOMIC_FETCH_OR_N:
+  code = BIT_IOR_EXPR;
+  return_old_p = true;
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  if (orig_params->length () != 3)
+{
+  if (orig_params->length () < 3)
+   error_at (loc, "too few arguments to function %qE", orig_function);
+  else
+   error_at (loc, "too many arguments to function %qE", orig_function);
+  return error_mark_node;
+}
+
+  tree stmts = push_stmt_list ();
+
+  tree nonatomic_lhs_type = TREE_TYPE (TREE_TYPE ((*orig_params)[0]));
+  nonatomic_lhs_type = TYPE_MAIN_VARIANT (nonatomic_lhs_type);
+  gcc_assert (TREE_CODE (nonatomic_lhs_type) == BITINT_TYPE);
+
+  tree lhs_addr = (*orig_params)[0];
+  tree val = convert (nonatomic_lhs_type, (*orig_params)[1]);
+  tree model = convert (integer_type_node, (*orig_params)[2]);
+  if (TREE_SIDE_EFFECTS (lhs_addr))
+{
+  tree var = create_tmp_var_raw (TREE_TYPE (lhs_addr));
+  lhs_addr = build4 (TARGET_EXPR, TREE_TYPE (lhs_addr), var, lhs_addr,
+NULL_TREE, NULL_TREE);
+  add_stmt (lhs_addr);
+}
+  if (TREE_SIDE_EFFECTS (val))
+{
+  tree var = create_tmp_var_raw (nonatomic_lhs_type);
+  val = build4 (TARGET_EXPR, nonatomic_lhs_type, var, val, NULL_TREE,
+   NULL_TREE);
+  add_stmt (val);
+}
+  if (TREE_SIDE_EFFECTS (model))
+{
+  tree var = create_tmp_var_raw (integer_type_node);
+  model = build4 (TARGET_EXPR, integer_type_node, var, model, NULL_TREE,
+ NULL_TREE);
+  add_stmt (model);
+}
+
+  tree old = create_tmp_var_raw (nonatomic_lhs_type);
+  tree old_addr = build_unary_op (loc, ADDR_EXPR, old, false);
+  TREE_ADDRESSABLE (old) = 1;
+  suppress_warning (old);
+
+  tree newval = create_tmp_var_raw (nonatomic_lhs_type);
+  tree newval_addr = build_unary_op (loc, ADDR_EXPR, newval, false);
+  TREE_ADDRESSABLE (newval) = 1;
+  suppress_warning (newval);
+
+  tree loop

Re: [PATCH V2] RISC-V: Support CALL conditional autovec patterns

2023-08-03 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

I would find it a bit clearer if the prepare_ternary part were a
separate patch.  As it's mostly mechanical replacements I don't mind
too much, though, so it's LGTM from my side even without that.

As to the lmul = 8 ICE, is the problem that the register allocator
would actually need 5 "registers" when doing the merge by itself
and we only have 4?

Regards
 Robin


[RFC] c++: extend cold, hot attributes to classes

2023-08-03 Thread Javier Martinez via Gcc-patches
Most code is cold.  This patch extends support for attribute ((cold)) to C++
classes, unions, and structs (RECORD_TYPE and UNION_TYPE) so they benefit from
encapsulation - reducing the verbosity of using the attribute where
deserved.  The ((hot)) attribute is also extended, given its semantic relation.
What is the sentiment on this use-case?
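[Editorial note: a sketch of the intended usage - hypothetical class, written for illustration; on a compiler without the patch the class-level attribute is merely diagnosed as ignored, so the code still compiles:]

```cpp
#include <cassert>

/* Under the proposal, marking the class cold propagates
   __attribute__ ((cold)) to every member function, so the methods need
   no individual annotation.  */
struct __attribute__ ((cold)) error_reporter
{
  int count = 0;

  void record ()       { ++count; }     /* would become cold */
  void record_twice () { count += 2; }  /* likewise */
};
```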

for gcc/c-family/ChangeLog

* c-attribs.c (attribute_spec): Remove decl_required
field for "hot" and "cold" attributes.
(handle_hot_attribute): Remove warning on RECORD_TYPE
and UNION_TYPE when c_dialect_cxx.
(handle_cold_attribute): Remove warning on RECORD_TYPE
and UNION_TYPE when c_dialect_cxx.

for gcc/cp/ChangeLog

* class.c (finish_struct): Propagate hot and cold
attributes to all FUNCTION_DECLs when the class
itself is marked hot or cold.

for  gcc/testsuite/ChangeLog

* g++.dg/ext/attr-hotness.C: New.


Signed-off-by: Javier Martinez 

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index dc9579c..815df66 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -398,10 +398,10 @@ const struct attribute_spec
c_common_attribute_table[] =
   { "alloc_size",  1, 2, false, true, true, false,
   handle_alloc_size_attribute,
   attr_alloc_exclusions },
-  { "cold",   0, 0, true,  false, false, false,
+  { "cold",   0, 0, false,  false, false, false,
   handle_cold_attribute,
   attr_cold_hot_exclusions },
-  { "hot",0, 0, true,  false, false, false,
+  { "hot",0, 0, false,  false, false, false,
   handle_hot_attribute,
   attr_cold_hot_exclusions },
   { "no_address_safety_analysis",
@@ -837,22 +837,23 @@ handle_noreturn_attribute (tree *node, tree name,
tree ARG_UNUSED (args),

 static tree
 handle_hot_attribute (tree *node, tree name, tree ARG_UNUSED (args),
-  int ARG_UNUSED (flags), bool *no_add_attrs)
+   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
-  if (TREE_CODE (*node) == FUNCTION_DECL
-  || TREE_CODE (*node) == LABEL_DECL)
+  if ( (TREE_CODE (*node) == FUNCTION_DECL ||
+TREE_CODE (*node) == LABEL_DECL)
+  || ((TREE_CODE(*node) == RECORD_TYPE ||
+   TREE_CODE(*node) == UNION_TYPE) && c_dialect_cxx()))
 {
   /* Attribute hot processing is done later with lookup_attribute.  */
 }
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
-  *no_add_attrs = true;
+  *no_add_attrs = true;
 }

   return NULL_TREE;
 }
-
 /* Handle a "cold" and attribute; arguments as in
struct attribute_spec.handler.  */

@@ -860,15 +861,17 @@ static tree
 handle_cold_attribute (tree *node, tree name, tree ARG_UNUSED (args),
int ARG_UNUSED (flags), bool *no_add_attrs)
 {
-  if (TREE_CODE (*node) == FUNCTION_DECL
-  || TREE_CODE (*node) == LABEL_DECL)
+  if ( (TREE_CODE (*node) == FUNCTION_DECL ||
+TREE_CODE (*node) == LABEL_DECL)
+  || ((TREE_CODE(*node) == RECORD_TYPE ||
+   TREE_CODE(*node) == UNION_TYPE) && c_dialect_cxx()))
 {
   /* Attribute cold processing is done later with lookup_attribute.  */
 }
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
-  *no_add_attrs = true;
+  *no_add_attrs = true;
 }

   return NULL_TREE;
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 07abe52..70f734f 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -7540,6 +7540,35 @@ finish_struct (tree t, tree attributes)
   && !LAMBDA_TYPE_P (t))
 add_stmt (build_min (TAG_DEFN, t));

+
+  /* classes marked with hotness attributes propagate the attribute to
+  all methods. We propagate these here as there is a guarantee that
+  TYPE_FIELDS is populated, as opposed from within decl_attributes. */
+
+  tree has_cold_attr = lookup_attribute("cold", TYPE_ATTRIBUTES(t));
+  tree has_hot_attr = lookup_attribute("hot", TYPE_ATTRIBUTES(t));
+
+  if ( has_cold_attr || has_hot_attr ) {
+
+/* hoisted out of the loop */
+tree attr_cold_id = get_identifier("cold");
+tree attr_hot_id = get_identifier("hot");
+
+for (tree f = TYPE_FIELDS (t); f; f = DECL_CHAIN (f))
+  {
+if (TREE_CODE (f) == FUNCTION_DECL) {
+  /* decl_attributes will take care of conflicts,
+  also prioritizing attributes explicitly marked in methods */
+
+  if (has_cold_attr) {
+decl_attributes (&f, tree_cons (attr_cold_id, NULL, NULL), 0);
+  } else if (has_hot_attr) {
+decl_attributes (&f, tree_cons (attr_hot_id, NULL, NULL), 0);
+  }
+}
+  }
+  }
+
   return t;
 }

diff --git a/gcc/testsuite/g++.dg/ext/attr-hotness.C
b/gcc/testsuite/g++.dg/ext/attr-hotness.C
new file mode 100644
index 000..075c624
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attr-hotness.C
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -Wattributes -fdump-tree-gimple" } 
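As an illustrative, GCC-independent sketch of the user-facing behavior the patch above targets (this is an editor's example under assumptions, not from the patch): a class-level hot/cold attribute would propagate to the class's member functions; on compilers without the change, the attribute on a class type is simply diagnosed with -Wattributes and ignored, and run-time semantics are identical either way.

```cpp
#include <cassert>

// Hypothetical illustration: with the finish_struct change above, the
// class-level attribute would propagate to S::f and S::g as if each
// method were marked hot individually.  On unpatched compilers the
// attribute is warned about and dropped; behavior is unchanged.
struct __attribute__((hot)) S {
  int f() const { return 21; }   // would inherit "hot" under the patch
  int g() const { return f() * 2; }
};
```

A method that explicitly carries its own `cold` attribute would keep it, since decl_attributes resolves conflicts in favor of the explicit per-method marking, as the patch comment notes.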

[PATCH v3 0/8] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-03 Thread Chenghui Pan
This is an update of:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624770.html

Changes since last version:
- Revert vabsd/xvabsd RTL templates to unspec impl, because an arithmetic RTL
  expression cannot cover the edge case of the instruction output. The v2 impl
  of the vabsd/xvabsd templates caused the failure of gcc.dg/sabd_1.c when
  running the regression test with
  RUNTESTFLAGS="--target_board=unix/-mlsx".
- Resolve warning in gcc/config/loongarch/loongarch.cc when bootstrapping with 
  BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx".
- Remove redundant definitions in lasxintrin.h.
- Refine commit info.

Lulu Cheng (8):
  LoongArch: Add Loongson SX vector directive compilation framework.
  LoongArch: Add Loongson SX base instruction support.
  LoongArch: Add Loongson SX directive builtin function support.
  LoongArch: Add Loongson ASX vector directive compilation framework.
  LoongArch: Add Loongson ASX base instruction support.
  LoongArch: Add Loongson ASX directive builtin function support.
  LoongArch: Add Loongson SX directive test cases.
  LoongArch: Add Loongson ASX directive test cases.

 gcc/config.gcc| 2 +-
 gcc/config/loongarch/constraints.md   |   131 +-
 .../loongarch/genopts/loongarch-strings   | 4 +
 gcc/config/loongarch/genopts/loongarch.opt.in |12 +-
 gcc/config/loongarch/lasx.md  |  5122 +++
 gcc/config/loongarch/lasxintrin.h |  5338 +++
 gcc/config/loongarch/loongarch-builtins.cc|  2686 +-
 gcc/config/loongarch/loongarch-c.cc   |18 +
 gcc/config/loongarch/loongarch-def.c  | 6 +
 gcc/config/loongarch/loongarch-def.h  | 9 +-
 gcc/config/loongarch/loongarch-driver.cc  |10 +
 gcc/config/loongarch/loongarch-driver.h   | 2 +
 gcc/config/loongarch/loongarch-ftypes.def |   666 +-
 gcc/config/loongarch/loongarch-modes.def  |39 +
 gcc/config/loongarch/loongarch-opts.cc|89 +-
 gcc/config/loongarch/loongarch-opts.h | 3 +
 gcc/config/loongarch/loongarch-protos.h   |35 +
 gcc/config/loongarch/loongarch-str.h  | 3 +
 gcc/config/loongarch/loongarch.cc |  4669 +-
 gcc/config/loongarch/loongarch.h  |   117 +-
 gcc/config/loongarch/loongarch.md |56 +-
 gcc/config/loongarch/loongarch.opt|12 +-
 gcc/config/loongarch/lsx.md   |  4481 ++
 gcc/config/loongarch/lsxintrin.h  |  5181 +++
 gcc/config/loongarch/predicates.md|   333 +-
 gcc/doc/md.texi   |11 +
 .../gcc.target/loongarch/strict-align.c   |13 +
 .../vector/lasx/lasx-bit-manipulate.c | 27813 +++
 .../loongarch/vector/lasx/lasx-builtin.c  |  1509 +
 .../loongarch/vector/lasx/lasx-cmp.c  |  5361 +++
 .../loongarch/vector/lasx/lasx-fp-arith.c |  6259 +++
 .../loongarch/vector/lasx/lasx-fp-cvt.c   |  7315 +++
 .../loongarch/vector/lasx/lasx-int-arith.c| 38361 
 .../loongarch/vector/lasx/lasx-mem.c  |   147 +
 .../loongarch/vector/lasx/lasx-perm.c |  7730 
 .../vector/lasx/lasx-str-manipulate.c |   712 +
 .../loongarch/vector/lasx/lasx-xvldrepl.c |13 +
 .../loongarch/vector/lasx/lasx-xvstelm.c  |12 +
 .../loongarch/vector/loongarch-vector.exp |42 +
 .../loongarch/vector/lsx/lsx-bit-manipulate.c | 15586 +++
 .../loongarch/vector/lsx/lsx-builtin.c|  1461 +
 .../gcc.target/loongarch/vector/lsx/lsx-cmp.c |  3354 ++
 .../loongarch/vector/lsx/lsx-fp-arith.c   |  3713 ++
 .../loongarch/vector/lsx/lsx-fp-cvt.c |  4114 ++
 .../loongarch/vector/lsx/lsx-int-arith.c  | 22424 +
 .../gcc.target/loongarch/vector/lsx/lsx-mem.c |   537 +
 .../loongarch/vector/lsx/lsx-perm.c   |   +++
 .../loongarch/vector/lsx/lsx-str-manipulate.c |   408 +
 .../loongarch/vector/simd_correctness_check.h |39 +
 49 files changed, 181229 insertions(+), 284 deletions(-)
 create mode 100644 gcc/config/loongarch/lasx.md
 create mode 100644 gcc/config/loongarch/lasxintrin.h
 create mode 100644 gcc/config/loongarch/lsx.md
 create mode 100644 gcc/config/loongarch/lsxintrin.h
 create mode 100644 gcc/testsuite/gcc.target/loongarch/strict-align.c
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-bit-manipulate.c
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-cmp.c
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-fp-arith.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-fp-cvt.c
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-int-arith.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-mem.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-perm.c
 create mode 1

[PATCH v3 1/8] LoongArch: Add Loongson SX vector directive compilation framework.

2023-08-03 Thread Chenghui Pan
From: Lulu Cheng 

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Add compilation framework.
* config/loongarch/genopts/loongarch.opt.in: Ditto.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins): Ditto.
* config/loongarch/loongarch-def.c: Ditto.
* config/loongarch/loongarch-def.h (N_ISA_EXT_TYPES): Ditto.
(ISA_EXT_SIMD_LSX): Ditto.
(N_SWITCH_TYPES): Ditto.
(SW_LSX): Ditto.
(struct loongarch_isa): Ditto.
* config/loongarch/loongarch-driver.cc (APPEND_SWITCH): Ditto.
(driver_get_normalized_m_opts): Ditto.
* config/loongarch/loongarch-driver.h (driver_get_normalized_m_opts): 
Ditto.
* config/loongarch/loongarch-opts.cc (loongarch_config_target): Ditto.
(isa_str): Ditto.
* config/loongarch/loongarch-opts.h (ISA_HAS_LSX): Ditto.
* config/loongarch/loongarch-str.h (OPTSTR_LSX): Ditto.
* config/loongarch/loongarch.opt: Ditto.
---
 .../loongarch/genopts/loongarch-strings   |  3 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  8 +-
 gcc/config/loongarch/loongarch-c.cc   |  7 ++
 gcc/config/loongarch/loongarch-def.c  |  4 +
 gcc/config/loongarch/loongarch-def.h  |  7 +-
 gcc/config/loongarch/loongarch-driver.cc  | 10 +++
 gcc/config/loongarch/loongarch-driver.h   |  1 +
 gcc/config/loongarch/loongarch-opts.cc| 82 ++-
 gcc/config/loongarch/loongarch-opts.h |  1 +
 gcc/config/loongarch/loongarch-str.h  |  2 +
 gcc/config/loongarch/loongarch.opt|  8 +-
 11 files changed, 128 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index a40998ead97..24a5025061f 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -40,6 +40,9 @@ OPTSTR_SOFT_FLOAT soft-float
 OPTSTR_SINGLE_FLOAT   single-float
 OPTSTR_DOUBLE_FLOAT   double-float
 
+# SIMD extensions
+OPTSTR_LSX lsx
+
 # -mabi=
 OPTSTR_ABI_BASE  abi
 STR_ABI_BASE_LP64Dlp64d
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 4b9b4ac273e..338d77a7e40 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -76,6 +76,9 @@ m@@OPTSTR_DOUBLE_FLOAT@@
 Target Driver RejectNegative Var(la_opt_switches) Mask(FORCE_F64) 
Negative(m@@OPTSTR_SOFT_FLOAT@@)
 Allow hardware floating-point instructions to cover both 32-bit and 64-bit 
operations.
 
+m@@OPTSTR_LSX@@
+Target RejectNegative Var(la_opt_switches) Mask(LSX) Negative(m@@OPTSTR_LSX@@)
+Enable LoongArch SIMD Extension (LSX).
 
 ;; Base target models (implies ISA & tune parameters)
 Enum
@@ -125,11 +128,14 @@ Target RejectNegative Joined ToLower Enum(abi_base) 
Var(la_opt_abi_base) Init(M_
 Variable
 int la_opt_abi_ext = M_OPTION_NOT_SEEN
 
-
 mbranch-cost=
 Target RejectNegative Joined UInteger Var(loongarch_branch_cost)
 -mbranch-cost=COST Set the cost of branches to roughly COST instructions.
 
+mmemvec-cost=
+Target RejectNegative Joined UInteger Var(loongarch_vector_access_cost) 
IntegerRange(1, 5)
+mmemvec-cost=COST  Set the cost of vector memory access instructions.
+
 mcheck-zero-division
 Target Mask(CHECK_ZERO_DIV)
 Trap on integer divide by zero.
diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index 67911b78f28..b065921adc3 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -99,6 +99,13 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   else
 builtin_define ("__loongarch_frlen=0");
 
+  if (ISA_HAS_LSX)
+{
+  builtin_define ("__loongarch_simd");
+  builtin_define ("__loongarch_sx");
+  builtin_define ("__loongarch_sx_width=128");
+}
+
   /* Native Data Sizes.  */
   builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE);
   builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE);
diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index 6729c857f7c..28e24c62249 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -49,10 +49,12 @@ loongarch_cpu_default_isa[N_ARCH_TYPES] = {
   [CPU_LOONGARCH64] = {
   .base = ISA_BASE_LA64V100,
   .fpu = ISA_EXT_FPU64,
+  .simd = 0,
   },
   [CPU_LA464] = {
   .base = ISA_BASE_LA64V100,
   .fpu = ISA_EXT_FPU64,
+  .simd = ISA_EXT_SIMD_LSX,
   },
 };
 
@@ -147,6 +149,7 @@ loongarch_isa_ext_strings[N_ISA_EXT_TYPES] = {
   [ISA_EXT_FPU64] = STR_ISA_EXT_FPU64,
   [ISA_EXT_FPU32] = STR_ISA_EXT_FPU32,
   [ISA_EXT_NOFPU] = STR_ISA_EXT_NOFPU,
+  [ISA_EXT_SIMD_LSX] = OPTSTR_LSX,
 };
 
 const char*
@@ -176,6 +179,7 @@ loongarch_switch_strings[] = {
   [SW_SOFT_FLOAT]= OPTSTR_SOFT_FLOAT,
   [SW_SINGLE_FLOAT]  = 
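The predefined macros added in the loongarch-c.cc hunk above can be feature-tested from user code. A hedged sketch (an editor's illustration, not part of the patch) that compiles on any target, since off LoongArch the macros are simply absent:

```cpp
#include <cassert>

// Resolve the SIMD width advertised by the compiler.  On targets
// without LSX (or with compilers predating this patch) none of the
// __loongarch_* SIMD macros are defined, so fall back to 0 (scalar).
inline int loongarch_simd_width()
{
#if defined(__loongarch_simd_width)
  return __loongarch_simd_width;
#elif defined(__loongarch_sx_width)
  return __loongarch_sx_width;
#else
  return 0;                      // scalar fallback on other targets
#endif
}
```

With -mlsx this returns 128; patch 4/8 in the series later defines `__loongarch_simd_width` as 256 when LASX is enabled.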

[PATCH v3 4/8] LoongArch: Add Loongson ASX vector directive compilation framework.

2023-08-03 Thread Chenghui Pan
From: Lulu Cheng 

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Add compilation framework.
* config/loongarch/genopts/loongarch.opt.in: Ditto.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins): Ditto.
* config/loongarch/loongarch-def.c: Ditto.
* config/loongarch/loongarch-def.h (N_ISA_EXT_TYPES): Ditto.
(ISA_EXT_SIMD_LASX): Ditto.
(N_SWITCH_TYPES): Ditto.
(SW_LASX): Ditto.
* config/loongarch/loongarch-driver.cc (driver_get_normalized_m_opts): 
Ditto.
* config/loongarch/loongarch-driver.h (driver_get_normalized_m_opts): 
Ditto.
* config/loongarch/loongarch-opts.cc (isa_str): Ditto.
* config/loongarch/loongarch-opts.h (ISA_HAS_LSX): Ditto.
(ISA_HAS_LASX): Ditto.
* config/loongarch/loongarch-str.h (OPTSTR_LASX): Ditto.
* config/loongarch/loongarch.opt: Ditto.
---
 gcc/config/loongarch/genopts/loongarch-strings |  1 +
 gcc/config/loongarch/genopts/loongarch.opt.in  |  4 
 gcc/config/loongarch/loongarch-c.cc| 11 +++
 gcc/config/loongarch/loongarch-def.c   |  4 +++-
 gcc/config/loongarch/loongarch-def.h   |  6 --
 gcc/config/loongarch/loongarch-driver.cc   |  2 +-
 gcc/config/loongarch/loongarch-driver.h|  1 +
 gcc/config/loongarch/loongarch-opts.cc |  9 -
 gcc/config/loongarch/loongarch-opts.h  |  4 +++-
 gcc/config/loongarch/loongarch-str.h   |  1 +
 gcc/config/loongarch/loongarch.opt |  4 
 11 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index 24a5025061f..35d08f5967d 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -42,6 +42,7 @@ OPTSTR_DOUBLE_FLOAT   double-float
 
 # SIMD extensions
 OPTSTR_LSX lsx
+OPTSTR_LASXlasx
 
 # -mabi=
 OPTSTR_ABI_BASE  abi
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 338d77a7e40..afde23c9661 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -80,6 +80,10 @@ m@@OPTSTR_LSX@@
 Target RejectNegative Var(la_opt_switches) Mask(LSX) Negative(m@@OPTSTR_LSX@@)
 Enable LoongArch SIMD Extension (LSX).
 
+m@@OPTSTR_LASX@@
+Target RejectNegative Var(la_opt_switches) Mask(LASX) 
Negative(m@@OPTSTR_LASX@@)
+Enable LoongArch Advanced SIMD Extension (LASX).
+
 ;; Base target models (implies ISA & tune parameters)
 Enum
 Name(cpu_type) Type(int)
diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index b065921adc3..2747fb9e472 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -104,8 +104,19 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define ("__loongarch_simd");
   builtin_define ("__loongarch_sx");
   builtin_define ("__loongarch_sx_width=128");
+
+  if (!ISA_HAS_LASX)
+   builtin_define ("__loongarch_simd_width=128");
 }
 
+  if (ISA_HAS_LASX)
+{
+  builtin_define ("__loongarch_asx");
+  builtin_define ("__loongarch_asx_width=256");
+  builtin_define ("__loongarch_simd_width=256");
+}
+
+
   /* Native Data Sizes.  */
   builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE);
   builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE);
diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index 28e24c62249..bff92c86532 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -54,7 +54,7 @@ loongarch_cpu_default_isa[N_ARCH_TYPES] = {
   [CPU_LA464] = {
   .base = ISA_BASE_LA64V100,
   .fpu = ISA_EXT_FPU64,
-  .simd = ISA_EXT_SIMD_LSX,
+  .simd = ISA_EXT_SIMD_LASX,
   },
 };
 
@@ -150,6 +150,7 @@ loongarch_isa_ext_strings[N_ISA_EXT_TYPES] = {
   [ISA_EXT_FPU32] = STR_ISA_EXT_FPU32,
   [ISA_EXT_NOFPU] = STR_ISA_EXT_NOFPU,
   [ISA_EXT_SIMD_LSX] = OPTSTR_LSX,
+  [ISA_EXT_SIMD_LASX] = OPTSTR_LASX,
 };
 
 const char*
@@ -180,6 +181,7 @@ loongarch_switch_strings[] = {
   [SW_SINGLE_FLOAT]  = OPTSTR_SINGLE_FLOAT,
   [SW_DOUBLE_FLOAT]  = OPTSTR_DOUBLE_FLOAT,
   [SW_LSX]   = OPTSTR_LSX,
+  [SW_LASX]  = OPTSTR_LASX,
 };
 
 
diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index f34cffcfb9b..0bbcdb03d22 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -64,7 +64,8 @@ extern const char* loongarch_isa_ext_strings[];
 #define ISA_EXT_FPU642
 #define N_ISA_EXT_FPU_TYPES   3
 #define ISA_EXT_SIMD_LSX  3
-#define N_ISA_EXT_TYPES  4
+#define ISA_EXT_SIMD_LASX 4
+#define N_ISA_EXT_TYPES  5
 
 /* enum abi_base */
 extern const char* loong

Re: [RFC] light expander sra for parameters and returns

2023-08-03 Thread Richard Biener via Gcc-patches
On Thu, 3 Aug 2023, Jiufu Guo wrote:

> 
> Hi Richard,
> 
> Richard Biener  writes:
> 
> > On Tue, 1 Aug 2023, Jiufu Guo wrote:
> >
> >> 
> >> Hi,
> >> 
> >> Richard Biener  writes:
> >> 
> >> > On Mon, 24 Jul 2023, Jiufu Guo wrote:
> >> >
> >> >> 
> >> >> Hi Martin,
> >> >> 
> >> >> I'm not sure about your current opinion on re-using the ipa-sra
> >> >> code in the light-expander-sra. If there is anything I could help
> >> >> with, please let me know.
> >> >> 
> >> >> And I'm thinking about the difference between the expander-sra, ipa-sra
> >> >> and tree-sra. 1. For stmts walking, expander-sra has special behavior
> >> >> for return-stmt, and also a little special on assign-stmt. And phi
> >> >> stmts are not checked by ipa-sra/tree-sra. 2. For the access structure,
> >> >> I'm also thinking if we need a tree structure; it would be useful when
> >> >> checking overlaps, it was not used now in the expander-sra.
> >> >> 
> >> >> For ipa-sra and tree-sra, I notice that there is some similar code,
> >> >> but of course there are differences. It seems the differences are
> >> >> 'intended', for example: 1. when creating and accessing,
> >> >> 'size != max_size' is acceptable in tree-sra but not in ipa-sra.
> >> >> 2. 'AGGREGATE_TYPE_P' is accepted for some cases in ipa-sra, but
> >> >> not in tree-sra.
> >> >> I'm wondering if those slight differences block re-using the code
> >> >> between ipa-sra and tree-sra.
> >> >> 
> >> >> The expander-sra may be more lightweight; for example, maybe we
> >> >> can use FOR_EACH_IMM_USE_STMT to check the usage of each
> >> >> parameter, rather than walking all the stmts.
> >> >
> >> > What I was hoping for is shared stmt-level analysis and a shared
> >> > data structure for the "access"(es) a stmt performs.  Because that
> >> > can come up handy in multiple places.  The existing SRA data
> >> > structures could easily embed that subset for example if sharing
> >> > the whole data structure of [IPA] SRA seems too unwieldly.
> >> 
> >> Understand.
> >> The stmt-level analysis and "access" data structure are similar
> >> between ipa-sra/tree-sra and the expander-sra.
> >> 
> >> I just update the patch, this version does not change the behaviors of
> >> the previous version.  It is just cleaning/merging some functions only.
> >> The patch is attached.
> >> 
> >> This version (and tree-sra/ipa-sra) is still using the similar
> >> "stmt analyze" and "access struct"".  This could be extracted as
> >> shared code.
> >> I'm thinking to update the code to use the same "base_access" and
> >> "walk function".
> >> 
> >> >
> >> > With a stmt-leve API using FOR_EACH_IMM_USE_STMT would still be
> >> > possible (though RTL expansion pre-walks all stmts anyway).
> >> 
> >> Yeap, I also notice that "FOR_EACH_IMM_USE_STMT" is not enough.
> >> For struct parameters, walking stmt is needed.
> >
> > I think I mentioned this before, RTL expansion already
> > pre-walks the whole function looking for variables it has to
> > expand to the stack in discover_nonconstant_array_refs (which is
> > now badly named), I'd appreciate if the "SRA" walk would piggy-back
> > on that existing walk.
> 
> I may misunderstand your meaning about 'stmt-level analysis' and
> 'pre-walk'.
> I understand it as 'walk to analyze each stmt in each bb'.
> In 'discover_nonconstant_array_refs', there is a 'walk_gimple_op':
> 
>  FOR_EACH_BB_FN (bb, cfun)
>   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>{
> gimple *stmt = gsi_stmt (gsi);
> if (!is_gimple_debug (stmt))
>  {
>   walk_gimple_op (stmt, discover_nonconstant_array_refs_r, &wi);
> 
> Maybe, this 'walk_gimple_op' is what you mentioned as 'stmt-level
> analysis' and the 'pre-walk'.  Is this right?
> 
> Here, 'discover_nonconstant_array_refs_r' is the callback of
> 'walk_gimple_op'.
> This callback analyses if an array is accessed through a variant index,
> if so, the array must be put into the stack.
> 
> While it seems 'walk_gimple_op' is not suitable for SRA analysis,
> because, in the callback, it is hard to tell whether an access to a
> struct object is a 'read' or a 'write'. But the 'read/write' info
> is needed for SRA.
> 
> 'walk_stmt_load_store_addr_ops' is another 'walk on stmt ops'.
> This 'walk' is not used to analyze SRA accesses; the primary reason
> would be that the load/store/addr parameter of the callback is a
> DECL, with the 'size' and 'offset' stripped.
> 
> Currently, in tree-sra/ipa-sra/expand-sra, when analyzing stmt
> for SRA access, stmt code is checked for 'return/assign/call/asm'.
> And sub-expressions of these stmt(s) are analyzed.

I realize the stmt walks cannot be shared but the per-stmt
walk of the SRA analysis should still be done there simply for
cache locality reasons (thus compile-time).

Richard.

> 
> BR,
> Jeff (Jiufu Guo)
> 
> >
> > For RTL expansion I think a critical part is to create accesses
> > based on the incoming/outgoing RTL which is specified by the ABI.
> > As I understand we are optimizing the argument se
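A GCC-independent toy of the per-statement "access" record this thread discusses (the names and layout are an editor's illustration, not GCC's actual SRA data structures):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the shared "access" structure proposed above: each
// statement yields byte ranges it reads or writes within an aggregate.
struct access {
  std::size_t offset;   // byte offset within the aggregate
  std::size_t size;     // bytes touched
  bool is_write;        // the read/write flag the DECL-level walk lacks
};

// A stmt-level analysis hook: given one "statement", append its
// accesses.  In GCC this analysis would piggy-back on the existing
// pre-walk over all stmts done during RTL expansion.
inline void record_access(std::vector<access> &accs,
                          std::size_t off, std::size_t sz, bool write)
{
  accs.push_back({off, sz, write});
}

// Overlap test between two recorded accesses -- the check a tree
// structure over accesses could speed up, as mentioned above.
inline bool overlaps(const access &a, const access &b)
{
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

A flat vector with pairwise overlap checks is quadratic in the number of accesses per base object; that is the cost a sorted or interval-tree representation would avoid.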

Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-03 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches
 wrote:
>
> On Mon, 31 Jul 2023, Jeff Law wrote:
>
> >
> >
> > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote:
> > > The following delays sinking of loads within the same innermost
> > > loop when it was unconditional before.  That's a not uncommon
> > > issue preventing vectorization when masked loads are not available.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > I have a followup patch improving sinking that without this would
> > > cause more of the problematic sinking - now that we have a second
> > > sink pass after loop opts this looks like a reasonable approach?
> > >
> > > OK?
> > >
> > > Thanks,
> > > Richard.
> > >
> > >  PR tree-optimization/92335
> > >  * tree-ssa-sink.cc (select_best_block): Before loop
> > >  optimizations avoid sinking unconditional loads/stores
> > >  in innermost loops to conditional executed places.
> > >
> > >  * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
> > >  * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
> > >  expect predictive commoning to happen instead of sinking.
> > >  * gcc.dg/vect/pr65947-3.c: Adjust.
> > I think it's reasonable -- there's probably going to be cases where it's not
> > great, but more often than not I think it's going to be a reasonable
> > heuristic.
> >
> > If there is undesirable fallout, better to find it over the coming months 
> > than
> > next spring.  So I'd suggest we go forward now to give more time to find any
> > pathological cases (if they exist).
>
> Agreed, I've pushed this now.
Hi Richard,
After this patch (committed in 399c8dd44ff44f4b496223c7cc980651c4d6f6a0),
pr65947-7.c "failed" for aarch64-linux-gnu:
FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects
scan-tree-dump-not vect "LOOP VECTORIZED"

/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target {
! vect_fold_extract_last } } } } */

With your commit, condition_reduction in pr65947-7.c gets vectorized
regardless of vect_fold_extract_last,
which gates the above test (which is an improvement, because the
function didn't get vectorized before the commit).

The attached patch thus removes the gating on vect_fold_extract_last,
and the test passes again.
OK to commit ?

Thanks,
Prathamesh
>
> Richard.
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-7.c 
b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
index 16cdcd1c6eb..7dabae81abf 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-7.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
@@ -52,5 +52,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target 
vect_fold_extract_last } } } */
-/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! 
vect_fold_extract_last } } } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */


Re: [COMMITTEDv3] tree-optimization: [PR100864] `(a&!b) | b` is not opimized to `a | b` for comparisons

2023-08-03 Thread Mikael Morin

Hello,

On 31/07/2023 at 19:07, Andrew Pinski via Gcc-patches wrote:

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index a71c0727b0b..ddaf22f2179 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -121,3 +121,45 @@ bitwise_equal_p (tree expr1, tree expr2)
  return wi::to_wide (expr1) == wi::to_wide (expr2);
return operand_equal_p (expr1, expr2, 0);
  }
+
+/* Return true if EXPR1 and EXPR2 have the bitwise opposite value,
+   but not necessarily same type.
+   The types can differ through nop conversions.  */
+
+static inline bool
+bitwise_inverted_equal_p (tree expr1, tree expr2)
+{
+  STRIP_NOPS (expr1);
+  STRIP_NOPS (expr2);
+  if (expr1 == expr2)
+return false;
+  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
+return false;
+  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
+return wi::to_wide (expr1) == ~wi::to_wide (expr2);
+  if (operand_equal_p (expr1, expr2, 0))
+return false;
+  if (TREE_CODE (expr1) == BIT_NOT_EXPR
+  && bitwise_equal_p (TREE_OPERAND (expr1, 0), expr2))
+return true;
+  if (TREE_CODE (expr2) == BIT_NOT_EXPR
+  && bitwise_equal_p (expr1, TREE_OPERAND (expr2, 0)))
+return true;
+  if (COMPARISON_CLASS_P (expr1)
+  && COMPARISON_CLASS_P (expr2))
+{
+  tree op10 = TREE_OPERAND (expr1, 0);
+  tree op20 = TREE_OPERAND (expr2, 0);
+  if (!operand_equal_p (op10, op20))
+   return false;
+  tree op11 = TREE_OPERAND (expr1, 1);
+  tree op21 = TREE_OPERAND (expr2, 1);
+  if (!operand_equal_p (op11, op21))
+   return false;
+  if (invert_tree_comparison (TREE_CODE (expr1),
+ HONOR_NANS (op10))
+ == TREE_CODE (expr2))
+   return true;


So this is trying to match a == b against a != b, or a < b against a >= 
b, or similar; correct?
Shouldn't this be completed with "crossed" checks, that is match a == b 
against b != a, or a < b against b <= a, etc?  Or is there some 
canonicalization making that redundant?


I have given up determining whether these cases were already covered by 
the test or not.


Mikael
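The "crossed" matching question can be sketched with a small standalone table (the comparison codes here are illustrative, not GCC's tree codes): inverting a comparison and swapping its operands are distinct transforms, and recognizing `a < b` against `b <= a` needs both.

```cpp
#include <cassert>

// Illustrative comparison codes (not GCC's tree codes).
enum cmp { EQ, NE, LT, LE, GT, GE };

// Logical negation: (a LT b) == !(a GE b), etc.
inline cmp invert_cmp(cmp c)
{
  switch (c) {
    case EQ: return NE;  case NE: return EQ;
    case LT: return GE;  case GE: return LT;
    case LE: return GT;  case GT: return LE;
  }
  return c;
}

// Operand swap: (a LT b) == (b GT a); EQ/NE are symmetric.
inline cmp swap_cmp(cmp c)
{
  switch (c) {
    case LT: return GT;  case GT: return LT;
    case LE: return GE;  case GE: return LE;
    default: return c;
  }
}
```

So treating `a < b` as the inverse of `b <= a` means first swapping the second comparison's operands (yielding `a >= b`) and only then comparing against the inverted code; the operand_equal_p checks quoted above compare op10/op20 and op11/op21 positionally, so this crossed case is only caught if operand canonicalization has already ordered the operands consistently.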




Re: PING^2 [PATCH] mklog: fix bugs of --append option

2023-08-03 Thread Lehua Ding
Gentle PING^2, thanks!

[COMMITTED] ada: Fix spurious error on 'Input of private type with Type_Invariant aspect

2023-08-03 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

The problem is that it is necessary to break the privacy during the
expansion of the Input attribute, which may introduce a view mismatch
with the parameter of the routine checking the invariant of the type.

gcc/ada/

* exp_util.adb (Make_Invariant_Call): Convert the expression to
the type of the formal parameter if need be.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_util.adb | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index 9f843d6d71e..a4b5ec366f3 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -9928,11 +9928,16 @@ package body Exp_Util is
-
 
function Make_Invariant_Call (Expr : Node_Id) return Node_Id is
-  Loc : constant Source_Ptr := Sloc (Expr);
-  Typ : constant Entity_Id  := Base_Type (Etype (Expr));
+  Loc  : constant Source_Ptr := Sloc (Expr);
+  Typ  : constant Entity_Id  := Base_Type (Etype (Expr));
   pragma Assert (Has_Invariants (Typ));
-  Proc_Id : constant Entity_Id := Invariant_Procedure (Typ);
+  Proc_Id  : constant Entity_Id := Invariant_Procedure (Typ);
   pragma Assert (Present (Proc_Id));
+  Inv_Typ  : constant Entity_Id
+   := Base_Type (Etype (First_Formal (Proc_Id)));
+
+  Arg : Node_Id;
+
begin
   --  The invariant procedure has a null body if assertions are disabled or
   --  Assertion_Policy Ignore is in effect. In that case, generate a null
@@ -9940,11 +9945,21 @@ package body Exp_Util is
 
   if Has_Null_Body (Proc_Id) then
  return Make_Null_Statement (Loc);
+
   else
+ --  As done elsewhere, for example in Build_Initialization_Call, we
+ --  may need to bridge the gap between views of the type.
+
+ if Inv_Typ /= Typ then
+Arg := OK_Convert_To (Inv_Typ, Expr);
+ else
+Arg := Relocate_Node (Expr);
+ end if;
+
  return
Make_Procedure_Call_Statement (Loc,
  Name   => New_Occurrence_Of (Proc_Id, Loc),
- Parameter_Associations => New_List (Relocate_Node (Expr)));
+ Parameter_Associations => New_List (Arg));
   end if;
end Make_Invariant_Call;
 
-- 
2.40.0



[COMMITTED] ada: Adjust again address arithmetics in System.Dwarf_Lines

2023-08-03 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

Using the operator of System.Storage_Elements has introduced a range check
that may be tripped on, so this removes the intermediate conversion to the
Storage_Count subtype that is responsible for it.

gcc/ada/

* libgnat/s-dwalin.adb ("-"): New subtraction operator.
(Enable_Cache): Use it to compute the offset.
(Symbolic_Address): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-dwalin.adb | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/libgnat/s-dwalin.adb b/gcc/ada/libgnat/s-dwalin.adb
index d35d03a8a2f..405b5d32e24 100644
--- a/gcc/ada/libgnat/s-dwalin.adb
+++ b/gcc/ada/libgnat/s-dwalin.adb
@@ -46,6 +46,10 @@ package body System.Dwarf_Lines is
 
subtype Offset is Object_Reader.Offset;
 
+   function "-" (Left, Right : Address) return uint32;
+   pragma Import (Intrinsic, "-");
+   --  Return the difference between two addresses as an unsigned offset
+
function Get_Load_Displacement (C : Dwarf_Context) return Storage_Offset;
--  Return the displacement between the load address present in the binary
--  and the run-time address at which it is loaded (i.e. non-zero for PIE).
@@ -1542,7 +1546,7 @@ package body System.Dwarf_Lines is
exit when Ar_Start = Null_Address and Ar_Len = 0;
 
Len   := uint32 (Ar_Len);
-   Start := uint32 (Storage_Count'(Ar_Start - C.Low));
+   Start := uint32'(Ar_Start - C.Low);
 
--  Search START in the array
 
@@ -1762,7 +1766,7 @@ package body System.Dwarf_Lines is
 
   if C.Cache /= null then
  declare
-Off : constant uint32 := uint32 (Storage_Count'(Addr - C.Low));
+Off : constant uint32 := uint32'(Addr - C.Low);
 
 First, Last, Mid : Natural;
  begin
-- 
2.40.0



[COMMITTED] ada: Add pragma Annotate for GNATcheck exemptions

2023-08-03 Thread Marc Poulhiès via Gcc-patches
From: Sheri Bernstein 

Exempt the GNATcheck rule "Improper_Returns" with the rationale
"early returns for performance".

gcc/ada/

* libgnat/s-aridou.adb: Add pragma to exempt Improper_Returns.
* libgnat/s-atopri.adb (Lock_Free_Try_Write): Likewise.
* libgnat/s-bitops.adb (Bit_Eq): Likewise.
* libgnat/s-carsi8.adb: Likewise.
* libgnat/s-carun8.adb: Likewise.
* libgnat/s-casi16.adb: Likewise.
* libgnat/s-casi32.adb: Likewise.
* libgnat/s-casi64.adb: Likewise.
* libgnat/s-caun16.adb: Likewise.
* libgnat/s-caun32.adb: Likewise.
* libgnat/s-caun64.adb: Likewise.
* libgnat/s-exponn.adb: Likewise.
* libgnat/s-expont.adb: Likewise.
* libgnat/s-valspe.adb: Likewise.
* libgnat/s-vauspe.adb: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-aridou.adb | 4 
 gcc/ada/libgnat/s-atopri.adb | 5 +
 gcc/ada/libgnat/s-bitops.adb | 5 +
 gcc/ada/libgnat/s-carsi8.adb | 4 
 gcc/ada/libgnat/s-carun8.adb | 4 
 gcc/ada/libgnat/s-casi16.adb | 4 
 gcc/ada/libgnat/s-casi32.adb | 4 
 gcc/ada/libgnat/s-casi64.adb | 4 
 gcc/ada/libgnat/s-caun16.adb | 4 
 gcc/ada/libgnat/s-caun32.adb | 4 
 gcc/ada/libgnat/s-caun64.adb | 4 
 gcc/ada/libgnat/s-exponn.adb | 5 +
 gcc/ada/libgnat/s-expont.adb | 5 +
 gcc/ada/libgnat/s-valspe.adb | 5 +
 gcc/ada/libgnat/s-vauspe.adb | 5 +
 15 files changed, 66 insertions(+)

diff --git a/gcc/ada/libgnat/s-aridou.adb b/gcc/ada/libgnat/s-aridou.adb
index 2f1fbd55453..beb56bfabe1 100644
--- a/gcc/ada/libgnat/s-aridou.adb
+++ b/gcc/ada/libgnat/s-aridou.adb
@@ -90,6 +90,9 @@ is
  (On, "non-preelaborable call not allowed in preelaborated unit");
pragma Warnings (On, "non-static constant in preelaborated unit");
 
+   pragma Annotate (Gnatcheck, Exempt_On, "Improper_Returns",
+"early returns for performance");
+
---
-- Local Subprograms --
---
@@ -3653,4 +3656,5 @@ is
   end if;
end To_Pos_Int;
 
+   pragma Annotate (Gnatcheck, Exempt_Off, "Improper_Returns");
 end System.Arith_Double;
diff --git a/gcc/ada/libgnat/s-atopri.adb b/gcc/ada/libgnat/s-atopri.adb
index 9e23fa0ac91..5fc2a123a71 100644
--- a/gcc/ada/libgnat/s-atopri.adb
+++ b/gcc/ada/libgnat/s-atopri.adb
@@ -59,6 +59,9 @@ package body System.Atomic_Primitives is
 new Atomic_Compare_Exchange (Atomic_Type);
 
begin
+  pragma Annotate (Gnatcheck, Exempt_On, "Improper_Returns",
+   "early returns for performance");
+
   if Expected /= Desired then
  if Atomic_Type'Atomic_Always_Lock_Free then
 return My_Atomic_Compare_Exchange (Ptr, Expected'Address, Desired);
@@ -68,6 +71,8 @@ package body System.Atomic_Primitives is
   end if;
 
   return True;
+
+  pragma Annotate (Gnatcheck, Exempt_Off, "Improper_Returns");
end Lock_Free_Try_Write;
 
 end System.Atomic_Primitives;
diff --git a/gcc/ada/libgnat/s-bitops.adb b/gcc/ada/libgnat/s-bitops.adb
index 30699d73175..acddd52892c 100644
--- a/gcc/ada/libgnat/s-bitops.adb
+++ b/gcc/ada/libgnat/s-bitops.adb
@@ -112,6 +112,9 @@ package body System.Bit_Ops is
   RightB : constant Bits := To_Bits (Right);
 
begin
+  pragma Annotate (Gnatcheck, Exempt_On, "Improper_Returns",
+   "early returns for performance");
+
   if Llen /= Rlen then
  return False;
 
@@ -134,6 +137,8 @@ package body System.Bit_Ops is
 end if;
  end;
   end if;
+
+  pragma Annotate (Gnatcheck, Exempt_Off, "Improper_Returns");
end Bit_Eq;
 
-
diff --git a/gcc/ada/libgnat/s-carsi8.adb b/gcc/ada/libgnat/s-carsi8.adb
index 807dceefc58..839f157a2ee 100644
--- a/gcc/ada/libgnat/s-carsi8.adb
+++ b/gcc/ada/libgnat/s-carsi8.adb
@@ -58,6 +58,9 @@ package body System.Compare_Array_Signed_8 is
function To_Big_Bytes is new
  Ada.Unchecked_Conversion (System.Address, Big_Bytes_Ptr);
 
+   pragma Annotate (Gnatcheck, Exempt_On, "Improper_Returns",
+"early returns for performance");
+
--
-- Compare_Array_S8 --
--
@@ -147,4 +150,5 @@ package body System.Compare_Array_Signed_8 is
   end if;
end Compare_Array_S8_Unaligned;
 
+   pragma Annotate (Gnatcheck, Exempt_Off, "Improper_Returns");
 end System.Compare_Array_Signed_8;
diff --git a/gcc/ada/libgnat/s-carun8.adb b/gcc/ada/libgnat/s-carun8.adb
index b0f2d94bf8a..b20e4e1b922 100644
--- a/gcc/ada/libgnat/s-carun8.adb
+++ b/gcc/ada/libgnat/s-carun8.adb
@@ -57,6 +57,9 @@ package body System.Compare_Array_Unsigned_8 is
function To_Big_Bytes is new
  Ada.Unchecked_Conversion (System.Address, Big_Bytes_Ptr);
 
+   pragma Annotate (Gnatcheck, Exempt_On, "Improper_Returns",
+"early returns for performance");
+
--
-- Compare_Arra

[COMMITTED] ada: Rewrite Set_Image_*_Unsigned routines to remove recursion.

2023-08-03 Thread Marc Poulhiès via Gcc-patches
From: Vasiliy Fofanov 

This rewriting removes algorithmic inefficiencies due to unnecessary
recursion and copying. The new version has much smaller and statically known
stack requirements and is additionally up to 2x faster.
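The shape of the rewritten algorithm (count the digits first, then write them back to front, then pad) can be modelled outside Ada as follows. This is an illustrative sketch, not the library code; the helper name and the exact padding rule are assumptions:

```python
HEX = "0123456789ABCDEF"

def image_based_unsigned(v, base, width):
    # Model of the rewritten Set_Image_Based_Unsigned: iterative, with a
    # statically known amount of state (no recursive Set_Digits helper).
    # First pass: count how many digits v needs in the given base.
    nb_digits, t = 1, v // base
    while t:
        nb_digits += 1
        t //= base

    # Second pass: emit the digits back to front into a buffer.
    digits = [""] * nb_digits
    t = v
    for j in reversed(range(nb_digits)):
        digits[j] = HEX[t % base]
        t //= base

    # Ada based-literal form, e.g. 16#FF#, left-padded to the width.
    image = "%d#%s#" % (base, "".join(digits))
    return " " * max(0, width - len(image)) + image
```

Both passes are simple loops over the value, which is where the smaller, statically known stack requirement comes from.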

gcc/ada/

* libgnat/s-imageb.adb (Set_Image_Based_Unsigned): Rewritten.
* libgnat/s-imagew.adb (Set_Image_Width_Unsigned): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-imageb.adb | 71 --
 gcc/ada/libgnat/s-imagew.adb | 84 
 2 files changed, 55 insertions(+), 100 deletions(-)

diff --git a/gcc/ada/libgnat/s-imageb.adb b/gcc/ada/libgnat/s-imageb.adb
index 6aa311a13e5..037f15b58c7 100644
--- a/gcc/ada/libgnat/s-imageb.adb
+++ b/gcc/ada/libgnat/s-imageb.adb
@@ -88,68 +88,53 @@ package body System.Image_B is
   S : out String;
   P : in out Natural)
is
-  Start : constant Natural := P;
-  F, T  : Natural;
+  Start : constant Natural := P + 1;
   BU: constant Uns := Uns (B);
   Hex   : constant array
 (Uns range 0 .. 15) of Character := "0123456789ABCDEF";
 
-  procedure Set_Digits (T : Uns);
-  --  Set digits of absolute value of T
+  Nb_Digits : Natural := 1;
+  T : Uns := V;
 
-  
-  -- Set_Digits --
-  
+   begin
 
-  procedure Set_Digits (T : Uns) is
-  begin
- if T >= BU then
-Set_Digits (T / BU);
-P := P + 1;
-S (P) := Hex (T mod BU);
- else
-P := P + 1;
-S (P) := Hex (T);
- end if;
-  end Set_Digits;
+  --  First we compute the number of characters needed for representing
+  --  the number.
+  loop
+ T := T / BU;
+ exit when T = 0;
+ Nb_Digits := Nb_Digits + 1;
+  end loop;
 
-   --  Start of processing for Set_Image_Based_Unsigned
+  P := Start;
 
-   begin
+  --  Pad S with spaces up to W reduced by Nb_Digits plus extra 3-4
+  --  characters needed for displaying the base.
+  while P < Start + W - Nb_Digits - 3 - B / 10 loop
+ S (P) := ' ';
+ P := P + 1;
+  end loop;
 
   if B >= 10 then
- P := P + 1;
  S (P) := '1';
+ P := P + 1;
   end if;
 
+  S (P) := Hex (BU mod 10);
   P := P + 1;
-  S (P) := Character'Val (Character'Pos ('0') + B mod 10);
 
-  P := P + 1;
   S (P) := '#';
-
-  Set_Digits (V);
-
   P := P + 1;
-  S (P) := '#';
-
-  --  Add leading spaces if required by width parameter
-
-  if P - Start < W then
- F := P;
- P := Start + W;
- T := P;
 
- while F > Start loop
-S (T) := S (F);
-T := T - 1;
-F := F - 1;
- end loop;
+  --  We now populate digits from the end of the value to the beginning
+  T := V;
+  for J in reverse P .. P + Nb_Digits - 1 loop
+ S (J) := Hex (T mod BU);
+ T := T / BU;
+  end loop;
 
- for J in Start + 1 .. T loop
-S (J) := ' ';
- end loop;
-  end if;
+  P := P + Nb_Digits;
+  S (P) := '#';
 
end Set_Image_Based_Unsigned;
 
diff --git a/gcc/ada/libgnat/s-imagew.adb b/gcc/ada/libgnat/s-imagew.adb
index 00b63eb87d6..28ba37ced1e 100644
--- a/gcc/ada/libgnat/s-imagew.adb
+++ b/gcc/ada/libgnat/s-imagew.adb
@@ -86,66 +86,36 @@ package body System.Image_W is
   S : out String;
   P : in out Natural)
is
-  Start : constant Natural := P;
-  F, T  : Natural;
-
-  procedure Set_Digits (T : Uns);
-  --  Set digits of absolute value of T
-
-  
-  -- Set_Digits --
-  
-
-  procedure Set_Digits (T : Uns) is
-  begin
- if T >= 10 then
-Set_Digits (T / 10);
-pragma Assert (P >= (S'First - 1) and P < S'Last and
-   P < Natural'Last);
---  No check is done since, as documented in the specification,
---  the caller guarantees that S is long enough to hold the result.
-P := P + 1;
-S (P) := Character'Val (T mod 10 + Character'Pos ('0'));
-
- else
-pragma Assert (P >= (S'First - 1) and P < S'Last and
-   P < Natural'Last);
---  No check is done since, as documented in the specification,
---  the caller guarantees that S is long enough to hold the result.
-P := P + 1;
-S (P) := Character'Val (T + Character'Pos ('0'));
- end if;
-  end Set_Digits;
-
-   --  Start of processing for Set_Image_Width_Unsigned
+  Start : constant Natural := P + 1;
+  Nb_Digits : Natural := 1;
+  T : Uns := V;
 
begin
-  Set_Digits (V);
-
-  --  Add leading spaces if required by width parameter
-
-  if P - Start < W then
- F := P;
- P := P + (W - (P 

[PATCH] tree-optimization/110702 - avoid zero-based memory references in IVOPTs

2023-08-03 Thread Richard Biener via Gcc-patches
Sometimes IVOPTs chooses a weird induction variable which leads to
issues downstream.  Most of the time we can fend those off during costing
by rejecting the candidate, but it looks like the address description that
costing synthesizes is different from what we end up generating, so the
following fixes things up at code generation time.  Specifically
we avoid the create_mem_ref_raw fallback which uses a literal zero
address base with the actual base in index2.  For the case in question
we have the address

  type = unsigned long
  offset = 0
  elements = {
[0] = &e * -3,
[1] = (sizetype) a.9_30 * 232,
[2] = ivtmp.28_44 * 4
  }

from which we code-generate the problematic

  _3 = MEM[(long int *)0B + ivtmp.36_9 + ivtmp.28_44 * 4];

which references the object at address zero.  The patch below
recognizes the fallback after the fact and transforms the
TARGET_MEM_REF memory reference into a LEA for which this form
isn't problematic:

  _24 = &MEM[(long int *)0B + ivtmp.36_34 + ivtmp.28_44 * 4];
  _3 = *_24;

thereby avoiding the correctness issue.  Without this fix, we would later
conclude that the program terminates at the null pointer dereference and
make the function pure, miscompiling the main function of the testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk
so far.

Richard.

PR tree-optimization/110702
* tree-ssa-loop-ivopts.cc (rewrite_use_address): When
we created a NULL pointer based access rewrite that to
a LEA.

* gcc.dg/torture/pr110702.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110702.c | 31 +
 gcc/tree-ssa-loop-ivopts.cc | 17 +-
 2 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110702.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110702.c 
b/gcc/testsuite/gcc.dg/torture/pr110702.c
new file mode 100644
index 000..aab9c7d923e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110702.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+
+void abort (void);
+
+int a, b, c, d;
+long e[9][7][4];
+
+void f()
+{
+  for (; a >= 0; a--)
+{
+  b = 0;
+  for (; b <= 3; b++)
+   {
+ c = 0;
+ for (; c <= 3; c++)
+   {
+ int *g = &d;
+ *g = e[0][0][b] | e[a][b][a];
+   }
+   }
+}
+}
+
+int main()
+{
+  f();
+  if (a != -1)
+abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 92fc1c7d734..934897af691 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -7630,7 +7630,22 @@ rewrite_use_address (struct ivopts_data *data,
  true, GSI_SAME_STMT);
 }
   else
-copy_ref_info (ref, *use->op_p);
+{
+  /* When we end up confused enough and have no suitable base but
+stuffed everything to index2 use a LEA for the address and
+create a plain MEM_REF to avoid basing a memory reference
+on address zero which create_mem_ref_raw does as fallback.  */
+  if (TREE_CODE (ref) == TARGET_MEM_REF
+ && TMR_INDEX2 (ref) != NULL_TREE
+ && integer_zerop (TREE_OPERAND (ref, 0)))
+   {
+ ref = fold_build1 (ADDR_EXPR, TREE_TYPE (TREE_OPERAND (ref, 0)), ref);
+ ref = force_gimple_operand_gsi (&bsi, ref, true, NULL_TREE,
+ true, GSI_SAME_STMT);
+ ref = build2 (MEM_REF, type, ref, build_zero_cst (alias_ptr_type));
+   }
+  copy_ref_info (ref, *use->op_p);
+}
 
   *use->op_p = ref;
 }
-- 
2.35.3


RE: [PATCH]AArch64 Undo vec_widen_shiftl optabs [PR106346]

2023-08-03 Thread Tamar Christina via Gcc-patches
> > +
> > +(define_constraint "D3"
> > +  "@internal
> > + A constraint that matches vector of immediates that is with 0 to
> > +(bits(mode)/2)-1."
> > + (and (match_code "const,const_vector")
> > +  (match_test "aarch64_const_vec_all_same_in_range_p (op, 0,
> > +   (GET_MODE_UNIT_BITSIZE (mode) / 2) - 1)")))
> 
> Having this mapping for D2 and D3, with D2 corresponded to prec/2, kind-of
> makes D3 a false mnemonic.  How about DL instead?  (L for "left-shift long" or
> "low-part", take your pick)
> 
> Looks good otherwise.
> 

Wasn't sure if this was an ok with changes or not, so here's the final patch 😊

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/106346
* config/aarch64/aarch64-simd.md (vec_widen_shiftl_lo_,
vec_widen_shiftl_hi_): Remove.
(aarch64_shll_internal): Renamed to...
(aarch64_shll): .. This.
(aarch64_shll2_internal): Renamed to...
(aarch64_shll2): .. This.
(aarch64_shll_n, aarch64_shll2_n): Re-use new
optabs.
* config/aarch64/constraints.md (D2, DL): New.
* config/aarch64/predicates.md (aarch64_simd_shll_imm_vec): New.

gcc/testsuite/ChangeLog:

PR target/106346
* gcc.target/aarch64/pr98772.c: Adjust assembly.
* gcc.target/aarch64/vect-widen-shift.c: New test.
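The D2/DL constraint split settled in the quoted review can be modelled as a small classifier. The function name is hypothetical and this only sketches the semantics described above (D2 for an immediate equal to half the element precision, DL for 0 to bits/2 - 1); the real checks live in the aarch64 constraint definitions:

```python
def shll_constraint(amount, wide_unit_bits):
    # Hypothetical classifier for the two alternatives: "D2" matches an
    # immediate equal to half the (widened) element precision, "DL"
    # matches 0 .. bits/2 - 1.  Anything else matches neither.
    half = wide_unit_bits // 2
    if amount == half:
        return "D2"
    if 0 <= amount < half:
        return "DL"
    return None
```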

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
d95394101470446e55f25a2397dd112239b6a54d..f67eb70577d0c2d9911d8c867d38a4d0b390337c
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -6387,105 +6387,67 @@ (define_insn "aarch64_qshl"
   [(set_attr "type" "neon_sat_shift_reg")]
 )
 
-(define_expand "vec_widen_shiftl_lo_"
-  [(set (match_operand: 0 "register_operand" "=w")
-   (unspec: [(match_operand:VQW 1 "register_operand" "w")
-(match_operand:SI 2
-  "aarch64_simd_shift_imm_bitsize_" "i")]
-VSHLL))]
-  "TARGET_SIMD"
-  {
-rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
-emit_insn (gen_aarch64_shll_internal (operands[0], operands[1],
-p, operands[2]));
-DONE;
-  }
-)
-
-(define_expand "vec_widen_shiftl_hi_"
-   [(set (match_operand: 0 "register_operand")
-   (unspec: [(match_operand:VQW 1 "register_operand" "w")
-(match_operand:SI 2
-  "immediate_operand" "i")]
- VSHLL))]
-   "TARGET_SIMD"
-   {
-rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
-emit_insn (gen_aarch64_shll2_internal (operands[0], operands[1],
- p, operands[2]));
-DONE;
-   }
-)
-
 ;; vshll_n
 
-(define_insn "aarch64_shll_internal"
-  [(set (match_operand: 0 "register_operand" "=w")
-   (unspec: [(vec_select:
-   (match_operand:VQW 1 "register_operand" "w")
-   (match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
-(match_operand:SI 3
-  "aarch64_simd_shift_imm_bitsize_" "i")]
-VSHLL))]
+(define_insn "aarch64_shll"
+  [(set (match_operand: 0 "register_operand")
+   (ashift: (ANY_EXTEND:
+   (match_operand:VD_BHSI 1 "register_operand"))
+(match_operand: 2
+  "aarch64_simd_shll_imm_vec")))]
   "TARGET_SIMD"
-  {
-if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
-  return "shll\\t%0., %1., %3";
-else
-  return "shll\\t%0., %1., %3";
+  {@ [cons: =0, 1, 2]
+ [w, w, D2] shll\t%0., %1., %I2
+ [w, w, DL] shll\t%0., %1., %I2
   }
   [(set_attr "type" "neon_shift_imm_long")]
 )
 
-(define_insn "aarch64_shll2_internal"
-  [(set (match_operand: 0 "register_operand" "=w")
-   (unspec: [(vec_select:
-   (match_operand:VQW 1 "register_operand" "w")
-   (match_operand:VQW 2 "vect_par_cnst_hi_half" ""))
-(match_operand:SI 3
-  "aarch64_simd_shift_imm_bitsize_" "i")]
+(define_expand "aarch64_shll_n"
+  [(set (match_operand: 0 "register_operand")
+   (unspec: [(match_operand:VD_BHSI 1 "register_operand")
+(match_operand:SI 2
+  "aarch64_simd_shift_imm_bitsize_")]
 VSHLL))]
   "TARGET_SIMD"
   {
-if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
-  return "shll2\\t%0., %1., %3";
-else
-  return "shll2\\t%0., %1., %3";
+rtx shft = gen_const_vec_duplicate (mode, operands[2]);
+emit_insn (gen_aarch64_shll (operands[0], operands[1], shft));
+DONE;
   }
-  [(set_attr "type" "neon_shift_imm_long")]
 )
 
-(define_insn "aarch64_shll_n"
-  [(set (match_oper

[PATCH] poly_int: Handle more can_div_trunc_p cases

2023-08-03 Thread Richard Sandiford via Gcc-patches
can_div_trunc_p (a, b, &Q, &r) tries to compute a Q and r that
satisfy the usual conditions for truncating division:

 (1) a = b * Q + r
 (2) |b * Q| <= |a|
 (3) |r| < |b|

We can compute Q using the constant component (the case when
all indeterminates are zero).  Since |r| < |b| for the constant
case, the requirements for indeterminate xi with coefficients
ai (for a) and bi (for b) are:

 (2') |bi * Q| <= |ai|
 (3') |ai - bi * Q| <= |bi|

(See the big comment for more details, restrictions, and reasoning).

However, the function works on abstract arithmetic types, and so
it has to be careful not to introduce new overflow.  The code
therefore only handled the extreme for (3'), that is:

 |ai - bi * Q| = |bi|

for the case where Q is zero.

Looking at it again, the overflow issue is a bit easier to handle than
I'd originally thought (or so I hope).  This patch therefore extends the
code to handle |ai - bi * Q| = |bi| for all Q, with Q = 0 no longer
being a separate case.

The net effect is to allow the function to succeed for things like:

 (a0 + b1 (Q+1) x) / (b0 + b1 x)

where Q = a0 / b0, with various sign conditions.  E.g. we now handle:

 (7 + 8x) / (4 + 4x)

with Q = 1 and r = 3 + 4x,

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
* poly-int.h (can_div_trunc_p): Succeed for more boundary conditions.

gcc/testsuite/
* gcc.dg/plugin/poly-int-tests.h (test_can_div_trunc_p_const)
(test_can_div_trunc_p_const): Add more tests.
---
 gcc/poly-int.h   | 45 ++-
 gcc/testsuite/gcc.dg/plugin/poly-int-tests.h | 85 +---
 2 files changed, 98 insertions(+), 32 deletions(-)

diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index 12571455081..7bff5e5ad26 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -2355,28 +2355,31 @@ can_div_trunc_p (const poly_int_pod &a,
}
   else
{
- if (q == 0)
-   {
- /* For Q == 0 we simply need: (3') |ai| <= |bi|.  */
- if (a.coeffs[i] != ICa (0))
-   {
- /* Use negative absolute to avoid overflow, i.e.
--|ai| >= -|bi|.  */
- C neg_abs_a = (a.coeffs[i] < 0 ? a.coeffs[i] : -a.coeffs[i]);
- C neg_abs_b = (b.coeffs[i] < 0 ? b.coeffs[i] : -b.coeffs[i]);
- if (neg_abs_a < neg_abs_b)
-   return false;
- rem_p = true;
-   }
-   }
+ /* The only unconditional arithmetic that we can do on ai,
+bi and Q is ai / bi and ai % bi.  (ai == minimum int and
+bi == -1 would be UB in the caller.)  Anything else runs
+the risk of overflow.  */
+ auto qi = NCa (a.coeffs[i]) / NCb (b.coeffs[i]);
+ auto ri = NCa (a.coeffs[i]) % NCb (b.coeffs[i]);
+ /* (2') and (3') are satisfied when ai /[trunc] bi == q.
+So is the stricter condition |ai - bi * Q| < |bi|.  */
+ if (qi == q)
+   rem_p |= (ri != 0);
+ /* The only other case is when:
+
+|bi * Q| + |bi| = |ai| (for (2'))
+and |ai - bi * Q|   = |bi| (for (3'))
+
+The first is equivalent to |bi|(|Q| + 1) == |ai|.
+The second requires ai == bi * (Q + 1) or ai == bi * (Q - 1).  */
+ else if (ri != 0)
+   return false;
+ else if (q <= 0 && qi < q && qi + 1 == q)
+   ;
+ else if (q >= 0 && qi > q && qi - 1 == q)
+   ;
  else
-   {
- /* Otherwise just check for the case in which ai / bi == Q.  */
- if (NCa (a.coeffs[i]) / NCb (b.coeffs[i]) != q)
-   return false;
- if (NCa (a.coeffs[i]) % NCb (b.coeffs[i]) != 0)
-   rem_p = true;
-   }
+   return false;
}
 }
 
diff --git a/gcc/testsuite/gcc.dg/plugin/poly-int-tests.h 
b/gcc/testsuite/gcc.dg/plugin/poly-int-tests.h
index 0b89acd91cd..7af98595a5e 100644
--- a/gcc/testsuite/gcc.dg/plugin/poly-int-tests.h
+++ b/gcc/testsuite/gcc.dg/plugin/poly-int-tests.h
@@ -1899,14 +1899,19 @@ test_can_div_trunc_p_const ()
ph::make (4, 8, 12),
&const_quot));
   ASSERT_EQ (const_quot, C (2));
-  ASSERT_EQ (can_div_trunc_p (ph::make (15, 25, 40),
+  ASSERT_TRUE (can_div_trunc_p (ph::make (15, 25, 40),
+   ph::make (4, 8, 10),
+   &const_quot));
+  ASSERT_EQ (const_quot, C (3));
+  const_quot = 0;
+  ASSERT_EQ (can_div_trunc_p (ph::make (15, 25, 41),
  ph::make (4, 8, 10),
  &const_quot), N <= 2);
-  ASSERT_EQ (const_quot, C (N <= 2 ? 3 : 2));
+  ASSERT_EQ (const_quot, C (N <= 2 ? 3 : 0));
   ASSERT_EQ (can_div_trunc_p (ph::make (43, 79, 80),
  ph::make (4, 8, 10),
  &const_quot), N == 1);
- 

Re: Re: [PATCH V2] RISC-V: Support CALL conditional autovec patterns

2023-08-03 Thread 钟居哲
No, prepare_ternary cannot be a separate patch.
It's a bug fix for an issue discovered during autovectorization.

Thanks for the comments. I will commit it once the middle-end part is approved by Richi.

>> As to the lmul = 8 ICE, is the problem that the register allocator
>> would actually need 5 "registers" when doing the merge by itself
>> and we only have 4?
Yes.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-03 19:03
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Support CALL conditional autovec patterns
Hi Juzhe,
 
I would find it a bit clearer if the prepare_ternary part were a
separate patch.  As it's mostly mechanical replacements I don't
mind too much, though, so it's LGTM from my side without that.
 
As to the lmul = 8 ICE, is the problem that the register allocator
would actually need 5 "registers" when doing the merge by itself
and we only have 4?
 
Regards
Robin
 


Re: [PATCH] poly_int: Handle more can_div_trunc_p cases

2023-08-03 Thread Richard Biener via Gcc-patches
On Thu, Aug 3, 2023 at 2:46 PM Richard Sandiford via Gcc-patches
 wrote:
>
> can_div_trunc_p (a, b, &Q, &r) tries to compute a Q and r that
> satisfy the usual conditions for truncating division:
>
>  (1) a = b * Q + r
>  (2) |b * Q| <= |a|
>  (3) |r| < |b|
>
> We can compute Q using the constant component (the case when
> all indeterminates are zero).  Since |r| < |b| for the constant
> case, the requirements for indeterminate xi with coefficients
> ai (for a) and bi (for b) are:
>
>  (2') |bi * Q| <= |ai|
>  (3') |ai - bi * Q| <= |bi|
>
> (See the big comment for more details, restrictions, and reasoning).
>
> However, the function works on abstract arithmetic types, and so
> it has to be careful not to introduce new overflow.  The code
> therefore only handled the extreme for (3'), that is:
>
>  |ai - bi * Q| = |bi|
>
> for the case where Q is zero.
>
> Looking at it again, the overflow issue is a bit easier to handle than
> I'd originally thought (or so I hope).  This patch therefore extends the
> code to handle |ai - bi * Q| = |bi| for all Q, with Q = 0 no longer
> being a separate case.
>
> The net effect is to allow the function to succeed for things like:
>
>  (a0 + b1 (Q+1) x) / (b0 + b1 x)
>
> where Q = a0 / b0, with various sign conditions.  E.g. we now handle:
>
>  (7 + 8x) / (4 + 4x)
>
> with Q = 1 and r = 3 + 4x,
>
> Tested on aarch64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> gcc/
> * poly-int.h (can_div_trunc_p): Succeed for more boundary conditions.
>
> gcc/testsuite/
> * gcc.dg/plugin/poly-int-tests.h (test_can_div_trunc_p_const)
> (test_can_div_trunc_p_const): Add more tests.
> ---
>  gcc/poly-int.h   | 45 ++-
>  gcc/testsuite/gcc.dg/plugin/poly-int-tests.h | 85 +---
>  2 files changed, 98 insertions(+), 32 deletions(-)
>
> diff --git a/gcc/poly-int.h b/gcc/poly-int.h
> index 12571455081..7bff5e5ad26 100644
> --- a/gcc/poly-int.h
> +++ b/gcc/poly-int.h
> @@ -2355,28 +2355,31 @@ can_div_trunc_p (const poly_int_pod &a,
> }
>else
> {
> - if (q == 0)
> -   {
> - /* For Q == 0 we simply need: (3') |ai| <= |bi|.  */
> - if (a.coeffs[i] != ICa (0))
> -   {
> - /* Use negative absolute to avoid overflow, i.e.
> --|ai| >= -|bi|.  */
> - C neg_abs_a = (a.coeffs[i] < 0 ? a.coeffs[i] : 
> -a.coeffs[i]);
> - C neg_abs_b = (b.coeffs[i] < 0 ? b.coeffs[i] : 
> -b.coeffs[i]);
> - if (neg_abs_a < neg_abs_b)
> -   return false;
> - rem_p = true;
> -   }
> -   }
> + /* The only unconditional arithmetic that we can do on ai,
> +bi and Q is ai / bi and ai % bi.  (ai == minimum int and
> +bi == -1 would be UB in the caller.)  Anything else runs
> +the risk of overflow.  */
> + auto qi = NCa (a.coeffs[i]) / NCb (b.coeffs[i]);
> + auto ri = NCa (a.coeffs[i]) % NCb (b.coeffs[i]);
> + /* (2') and (3') are satisfied when ai /[trunc] bi == q.
> +So is the stricter condition |ai - bi * Q| < |bi|.  */
> + if (qi == q)
> +   rem_p |= (ri != 0);
> + /* The only other case is when:
> +
> +|bi * Q| + |bi| = |ai| (for (2'))
> +and |ai - bi * Q|   = |bi| (for (3'))
> +
> +The first is equivalent to |bi|(|Q| + 1) == |ai|.
> +The second requires ai == bi * (Q + 1) or ai == bi * (Q - 1).  */
> + else if (ri != 0)
> +   return false;
> + else if (q <= 0 && qi < q && qi + 1 == q)
> +   ;
> + else if (q >= 0 && qi > q && qi - 1 == q)
> +   ;
>   else
> -   {
> - /* Otherwise just check for the case in which ai / bi == Q.  */
> - if (NCa (a.coeffs[i]) / NCb (b.coeffs[i]) != q)
> -   return false;
> - if (NCa (a.coeffs[i]) % NCb (b.coeffs[i]) != 0)
> -   rem_p = true;
> -   }
> +   return false;
> }
>  }
>
> diff --git a/gcc/testsuite/gcc.dg/plugin/poly-int-tests.h 
> b/gcc/testsuite/gcc.dg/plugin/poly-int-tests.h
> index 0b89acd91cd..7af98595a5e 100644
> --- a/gcc/testsuite/gcc.dg/plugin/poly-int-tests.h
> +++ b/gcc/testsuite/gcc.dg/plugin/poly-int-tests.h
> @@ -1899,14 +1899,19 @@ test_can_div_trunc_p_const ()
> ph::make (4, 8, 12),
> &const_quot));
>ASSERT_EQ (const_quot, C (2));
> -  ASSERT_EQ (can_div_trunc_p (ph::make (15, 25, 40),
> +  ASSERT_TRUE (can_div_trunc_p (ph::make (15, 25, 40),
> +   ph::make (4, 8, 10),
> +   &const_quot));
> +  ASSERT_EQ (const_quot, C (3));
> +  const_quot = 0;
> +  ASSERT_EQ (can_div_trunc_p (ph::make (15, 25, 41),
> 

Re: [PATCH]AArch64 Undo vec_widen_shiftl optabs [PR106346]

2023-08-03 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> > +
>> > +(define_constraint "D3"
>> > +  "@internal
>> > + A constraint that matches vector of immediates that is with 0 to
>> > +(bits(mode)/2)-1."
>> > + (and (match_code "const,const_vector")
>> > +  (match_test "aarch64_const_vec_all_same_in_range_p (op, 0,
>> > +  (GET_MODE_UNIT_BITSIZE (mode) / 2) - 1)")))
>> 
>> Having this mapping for D2 and D3, with D2 corresponded to prec/2, kind-of
>> makes D3 a false mnemonic.  How about DL instead?  (L for "left-shift long" 
>> or
>> "low-part", take your pick)
>> 
>> Looks good otherwise.
>> 
>
> Wasn't sure if this was an ok with changes or not, so here's the final patch 😊

I was hoping to have another look before it went in.  But...

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

...yeah, LGTM, thanks.

Richard

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/106346
>   * config/aarch64/aarch64-simd.md (vec_widen_shiftl_lo_,
>   vec_widen_shiftl_hi_): Remove.
>   (aarch64_shll_internal): Renamed to...
>   (aarch64_shll): .. This.
>   (aarch64_shll2_internal): Renamed to...
>   (aarch64_shll2): .. This.
>   (aarch64_shll_n, aarch64_shll2_n): Re-use new
>   optabs.
>   * config/aarch64/constraints.md (D2, DL): New.
>   * config/aarch64/predicates.md (aarch64_simd_shll_imm_vec): New.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/106346
>   * gcc.target/aarch64/pr98772.c: Adjust assembly.
>   * gcc.target/aarch64/vect-widen-shift.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> d95394101470446e55f25a2397dd112239b6a54d..f67eb70577d0c2d9911d8c867d38a4d0b390337c
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -6387,105 +6387,67 @@ (define_insn 
> "aarch64_qshl"
>[(set_attr "type" "neon_sat_shift_reg")]
>  )
>  
> -(define_expand "vec_widen_shiftl_lo_"
> -  [(set (match_operand: 0 "register_operand" "=w")
> - (unspec: [(match_operand:VQW 1 "register_operand" "w")
> -  (match_operand:SI 2
> -"aarch64_simd_shift_imm_bitsize_" "i")]
> -  VSHLL))]
> -  "TARGET_SIMD"
> -  {
> -rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
> -emit_insn (gen_aarch64_shll_internal (operands[0], 
> operands[1],
> -  p, operands[2]));
> -DONE;
> -  }
> -)
> -
> -(define_expand "vec_widen_shiftl_hi_"
> -   [(set (match_operand: 0 "register_operand")
> - (unspec: [(match_operand:VQW 1 "register_operand" "w")
> -  (match_operand:SI 2
> -"immediate_operand" "i")]
> -   VSHLL))]
> -   "TARGET_SIMD"
> -   {
> -rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
> -emit_insn (gen_aarch64_shll2_internal (operands[0], 
> operands[1],
> -   p, operands[2]));
> -DONE;
> -   }
> -)
> -
>  ;; vshll_n
>  
> -(define_insn "aarch64_shll_internal"
> -  [(set (match_operand: 0 "register_operand" "=w")
> - (unspec: [(vec_select:
> - (match_operand:VQW 1 "register_operand" "w")
> - (match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
> -  (match_operand:SI 3
> -"aarch64_simd_shift_imm_bitsize_" "i")]
> -  VSHLL))]
> +(define_insn "aarch64_shll"
> +  [(set (match_operand: 0 "register_operand")
> + (ashift: (ANY_EXTEND:
> + (match_operand:VD_BHSI 1 "register_operand"))
> +  (match_operand: 2
> +"aarch64_simd_shll_imm_vec")))]
>"TARGET_SIMD"
> -  {
> -if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
> -  return "shll\\t%0., %1., %3";
> -else
> -  return "shll\\t%0., %1., %3";
> +  {@ [cons: =0, 1, 2]
> + [w, w, D2] shll\t%0., %1., %I2
> + [w, w, DL] shll\t%0., %1., %I2
>}
>[(set_attr "type" "neon_shift_imm_long")]
>  )
>  
> -(define_insn "aarch64_shll2_internal"
> -  [(set (match_operand: 0 "register_operand" "=w")
> - (unspec: [(vec_select:
> - (match_operand:VQW 1 "register_operand" "w")
> - (match_operand:VQW 2 "vect_par_cnst_hi_half" ""))
> -  (match_operand:SI 3
> -"aarch64_simd_shift_imm_bitsize_" "i")]
> +(define_expand "aarch64_shll_n"
> +  [(set (match_operand: 0 "register_operand")
> + (unspec: [(match_operand:VD_BHSI 1 "register_operand")
> +  (match_operand:SI 2
> +"aarch64_simd_shift_imm_bitsize_")]
>VSHLL))]
>"TARGET_SIMD"
>{
> -if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
> -  return "shll2\\t%0., %1., %3";
> -

Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-08-03 Thread Richard Biener via Gcc-patches
On Wed, 2 Aug 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> > [...]
> >> >> in vect_determine_precisions_from_range.  Maybe we should drop
> >> >> the shift handling from there and instead rely on
> >> >> vect_determine_precisions_from_users, extending:
> >> >> 
> >> >> if (TREE_CODE (shift) != INTEGER_CST
> >> >> || !wi::ltu_p (wi::to_widest (shift), precision))
> >> >>   return;
> >> >> 
> >> >> to handle ranges where the max is known to be < precision.
> >> >> 
> >> >> There again, if masking is enough for right shifts and right rotates,
> >> >> maybe we should keep the current handling for then (with your fix)
> >> >> and skip the types_compatible_p check for those cases.
> >> >
> >> > I think it should be enough for left-shifts as well?  If we lshift
> >> > out like 0x100 << 9 so the lhs range is [0,0] the input range from
> >> > op0 will still make us use HImode.  I think we only ever get overly
> >> > conservative answers for left-shifts from this function?
> >> 
> >> But if we have:
> >> 
> >>   short x, y;
> >>   int z = (int) x << (int) y;
> >> 
> >> and at runtime, x == 1, y == 16, (short) z should be 0 (no UB),
> >> whereas x << y would invoke UB and x << (y & 15) would be 1.
> >
> > True, but we start with the range of the LHS which in this case
> > would be of type 'int' and thus 1 << 16 and not zero.  You
> > might call that a failure of vect_determine_precisions_from_range
> > of course, since it makes it not exactly a forward propagation ...
> 
> Ah, right, sorry.  I should have done more checking.
> 
> > [...]
> >> > Originally I completely disabled shift support but that regressed
> >> > the over-widen testcases a lot which at least have widened shifts
> >> > by constants a lot.
> >> >
> >> > x86 has vector rotates only for AMD XOP (which is dead) plus
> >> > some for V1TImode AFAICS, but I think we pattern-match rotates
> >> > to shifts, so maybe the precision stuff is interesting for the
> >> > case where we match the pattern rotate sequence for widenings?
> >> >
> >> > So for the types_compatible_p issue something along
> >> > the following?  We could also exempt the shift operand from
> >> > being covered by min_precision so the consumer would have
> >> > to make sure it can be represented (I think that's never going
> >> > to be an issue in practice until we get 256bit integers vectorized).
> >> > It will have to fixup the shift operands anyway.
> >> >
> >> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> >> > index e4ab8c2d65b..cdeeaf98a47 100644
> >> > --- a/gcc/tree-vect-patterns.cc
> >> > +++ b/gcc/tree-vect-patterns.cc
> >> > @@ -6378,16 +6378,26 @@ vect_determine_precisions_from_range 
> >> > (stmt_vec_info stmt_info, gassign *stmt)
> >> >   }
> >> > else if (TREE_CODE (op) == SSA_NAME)
> >> >   {
> >> > -   /* Ignore codes that don't take uniform arguments.  */
> >> > -   if (!types_compatible_p (TREE_TYPE (op), type))
> >> > +   /* Ignore codes that don't take uniform arguments.  For 
> >> > shifts
> >> > +  the shift amount is known to be in-range.  */
> >> 
> >> I guess it's more "we can assume that the amount is in range"?
> >
> > Yes.
> >
> >> > +   if (code == LSHIFT_EXPR
> >> > +   || code == RSHIFT_EXPR
> >> > +   || code == LROTATE_EXPR
> >> > +   || code == RROTATE_EXPR)
> >> > + {
> >> > +   min_value = wi::min (min_value, 0, sign);
> >> > +   max_value = wi::max (max_value, TYPE_PRECISION (type), 
> >> > sign);
> >> 
> >> LGTM for shifts right.  Because of the above lshift thing, I think we
> >> need something like:
> >> 
> >>   if (code == LSHIFT_EXPR || code == LROTATE_EXPR)
> >> {
> >>   wide_int op_min_value, op_max_value;
> >>   if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> >> return;
> >> 
> >>   /* We can ignore left shifts by negative amounts, which are UB.  */
> >>   min_value = wi::min (min_value, 0, sign);
> >> 
> >>   /* Make sure the highest non-UB shift amount doesn't become UB.  */
> >>   op_max_value = wi::umin (op_max_value, TYPE_PRECISION (type));
> >>   auto mask = wi::mask (TYPE_PRECISION (type), false,
> >>op_max_value.to_uhwi ());
> >>   max_value = wi::max (max_value, mask, sign);
> >> }
> >> 
> >> Does that look right?
> >
> > As said it looks overly conservative to me?  For example with my patch
> > for
> >
> > void foo (signed char *v, int s)
> > {
> >   if (s < 1 || s > 7)
> > return;
> >   for (int i = 0; i < 1024; ++i)
> > v[i] = v[i] << s;
> > }
> >
> > I get
> >
> > t.c:5:21: note:   _7 has range [0xc000, 0x3f80]
> > t.c:5:21: note:   can narrow to signed:15 without loss of precision: _7 = 
> > _6 << s_12(D);
> > t.c:5:21: note:   only the low 15 bits of _6 are significant
> > t.c:5:21: note:   _6 has range [0xff80, 0x7f]
> > ...
>

RE: [PATCH]AArch64 update costing for MLA by invariant

2023-08-03 Thread Tamar Christina via Gcc-patches
> >> Do you see vect_constant_defs in practice, or is this just for 
> >> completeness?
> >> I would expect any constants to appear as direct operands.  I don't
> >> mind keeping it if it's just a belt-and-braces thing though.
> >
> > In the latency case where I had allow_constants the early rejection
> > based on the operand itself wouldn't be rejected so in that case I
> > still needed to reject them but do so after the multiply check.  While
> > they do appear as direct operands as well they also have their own
> > nodes, in particular for SLP so the constants are handled as a group.
> 
> Ah, OK, thanks.
> 
> > But can also check CONSTANT_CLASS_P (rhs) if that's preferable.
> 
> No, what you did is more correct.  I just wasn't sure at first which case it 
> was
> handling.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_multiply_add_p): Update handling
of constants. 
(aarch64_adjust_stmt_cost): Use it.
(aarch64_vector_costs::count_ops): Likewise.
(aarch64_vector_costs::add_stmt_cost): Pass vinfo to
aarch64_adjust_stmt_cost.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
d4d7602554592b9042b8eaf389eff1ec80c2090e..7cc5916ce06b2635346c807da9306738b939ebc6
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16410,10 +16410,6 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info 
stmt_info,
   if (code != PLUS_EXPR && code != MINUS_EXPR)
 return false;
 
-  if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
-  || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
-return false;
-
   for (int i = 1; i < 3; ++i)
 {
   tree rhs = gimple_op (assign, i);
@@ -16441,7 +16437,8 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info 
stmt_info,
return false;
  def_stmt_info = vinfo->lookup_def (rhs);
  if (!def_stmt_info
- || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_constant_def)
return false;
}
 
@@ -16721,8 +16718,9 @@ aarch64_sve_adjust_stmt_cost (class vec_info *vinfo, 
vect_cost_for_stmt kind,
and which when vectorized would operate on vector type VECTYPE.  Add the
cost of any embedded operations.  */
 static fractional_cost
-aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, stmt_vec_info stmt_info,
- tree vectype, fractional_cost stmt_cost)
+aarch64_adjust_stmt_cost (vec_info *vinfo, vect_cost_for_stmt kind,
+ stmt_vec_info stmt_info, tree vectype,
+ unsigned vec_flags, fractional_cost stmt_cost)
 {
   if (vectype)
 {
@@ -16745,6 +16743,14 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
stmt_vec_info stmt_info,
  break;
}
 
+  gassign *assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info));
+  if (assign && !vect_is_reduction (stmt_info))
+   {
+ /* For MLA we need to reduce the cost since MLA is 1 instruction.  */
+ if (aarch64_multiply_add_p (vinfo, stmt_info, vec_flags))
+   return 0;
+   }
+
   if (kind == vector_stmt || kind == vec_to_scalar)
if (tree cmp_type = vect_embedded_comparison_type (stmt_info))
  {
@@ -16814,7 +16820,8 @@ aarch64_vector_costs::count_ops (unsigned int count, 
vect_cost_for_stmt kind,
 }
 
   /* Assume that multiply-adds will become a single operation.  */
-  if (stmt_info && aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags))
+  if (stmt_info
+  && aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags))
 return;
 
   /* Count the basic operation cost associated with KIND.  */
@@ -17060,8 +17067,8 @@ aarch64_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
 {
   /* Account for any extra "embedded" costs that apply additively
 to the base cost calculated above.  */
-  stmt_cost = aarch64_adjust_stmt_cost (kind, stmt_info, vectype,
-   stmt_cost);
+  stmt_cost = aarch64_adjust_stmt_cost (m_vinfo, kind, stmt_info,
+   vectype, m_vec_flags, stmt_cost);
 
   /* If we're recording a nonzero vector loop body cost for the
 innermost loop, also estimate the operations that would need




Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-08-03 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Tue, 25 Jul 2023 at 18:25, Richard Sandiford
>  wrote:
>>
>> Hi,
>>
>> Thanks for the rework and sorry for the slow review.
> Hi Richard,
> Thanks for the suggestions!  Please find my responses inline below.
>>
>> Prathamesh Kulkarni  writes:
>> > Hi Richard,
>> > This is reworking of patch to extend fold_vec_perm to handle VLA vectors.
>> > The attached patch unifies handling of VLS and VLA vector_csts, while
>> > using fallback code
>> > for ctors.
>> >
>> > For VLS vector, the patch ignores underlying encoding, and
>> > uses npatterns = nelts, and nelts_per_pattern = 1.
>> >
>> > For VLA patterns, if sel has a stepped sequence, then it
>> > only chooses elements from a particular pattern of a particular
>> > input vector.
>> >
> >> > To make things simpler, the patch imposes the following constraints:
>> > (a) op0_npatterns, op1_npatterns and sel_npatterns are powers of 2.
>> > (b) The step size for a stepped sequence is a power of 2, and
>> >   multiple of npatterns of chosen input vector.
>> > (c) Runtime vector length of sel is a multiple of sel_npatterns.
>> >  So, we don't handle sel.length = 2 + 2x and npatterns = 4.
>> >
>> > Eg:
>> > op0, op1: npatterns = 2, nelts_per_pattern = 3
>> > op0_len = op1_len = 16 + 16x.
>> > sel = { 0, 0, 2, 0, 4, 0, ... }
>> > npatterns = 2, nelts_per_pattern = 3.
>> >
>> > For pattern {0, 2, 4, ...}
>> > Let,
>> > a1 = 2
>> > S = step size = 2
>> >
>> > Let Esel denote number of elements per pattern in sel at runtime.
>> > Esel = (16 + 16x) / npatterns_sel
>> > = (16 + 16x) / 2
>> > = (8 + 8x)
>> >
>> > So, last element of pattern:
>> > ae = a1 + (Esel - 2) * S
>> >  = 2 + (8 + 8x - 2) * 2
>> >  = 14 + 16x
>> >
>> > a1 /trunc arg0_len = 2 / (16 + 16x) = 0
>> > ae /trunc arg0_len = (14 + 16x) / (16 + 16x) = 0
>> > Since both are equal with quotient = 0, we select elements from op0.
>> >
>> > Since step size (S) is a multiple of npatterns(op0), we select
>> > all elements from same pattern of op0.
>> >
>> > res_npatterns = max (op0_npatterns, max (op1_npatterns, sel_npatterns))
> >> >= max (2, max (2, 2))
>> >= 2
>> >
>> > res_nelts_per_pattern = max (op0_nelts_per_pattern,
>> > max (op1_nelts_per_pattern,
>> >  
>> > sel_nelts_per_pattern))
>> > = max (3, max (3, 3))
>> > = 3
>> >
>> > So res has encoding with npatterns = 2, nelts_per_pattern = 3.
>> > res: { op0[0], op0[0], op0[2], op0[0], op0[4], op0[0], ... }
>> >
>> > Unfortunately, this results in an issue for poly_int_cst index:
>> > For example,
>> > op0, op1: npatterns = 1, nelts_per_pattern = 3
>> > op0_len = op1_len = 4 + 4x
>> >
>> > sel: { 4 + 4x, 5 + 4x, 6 + 4x, ... } // should choose op1
>> >
>> > In this case,
>> > a1 = 5 + 4x
>> > S = (6 + 4x) - (5 + 4x) = 1
>> > Esel = 4 + 4x
>> >
>> > ae = a1 + (esel - 2) * S
>> >  = (5 + 4x) + (4 + 4x - 2) * 1
>> >  = 7 + 8x
>> >
>> > IIUC, 7 + 8x will always be index for last element of op1 ?
>> > if x = 0, len = 4, 7 + 8x = 7
>> > if x = 1, len = 8, 7 + 8x = 15, etc.
>> > So the stepped sequence will always choose elements
>> > from op1 regardless of vector length for above case ?
>> >
>> > However,
>> > ae /trunc op0_len
>> > = (7 + 8x) / (4 + 4x)
>> > which is not defined because 7/4 != 8/4
>> > and we return NULL_TREE, but I suppose the expected result would be:
>> > res: { op1[0], op1[1], op1[2], ... } ?
>> >
>> > The patch passes bootstrap+test on aarch64-linux-gnu with and without sve,
>> > and on x86_64-unknown-linux-gnu.
>> > I would be grateful for suggestions on how to proceed.
>> >
>> > Thanks,
>> > Prathamesh
>> >
>> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
>> > index a02ede79fed..8028b3e8e9a 100644
>> > --- a/gcc/fold-const.cc
>> > +++ b/gcc/fold-const.cc
>> > @@ -85,6 +85,10 @@ along with GCC; see the file COPYING3.  If not see
>> >  #include "vec-perm-indices.h"
>> >  #include "asan.h"
>> >  #include "gimple-range.h"
>> > +#include 
>> > +#include "tree-pretty-print.h"
>> > +#include "gimple-pretty-print.h"
>> > +#include "print-tree.h"
>> >
>> >  /* Nonzero if we are folding constants inside an initializer or a C++
>> > manifestly-constant-evaluated context; zero otherwise.
>> > @@ -10493,15 +10497,9 @@ fold_mult_zconjz (location_t loc, tree type, tree 
>> > expr)
>> >  static bool
>> >  vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts)
>> >  {
>> > -  unsigned HOST_WIDE_INT i, nunits;
>> > +  unsigned HOST_WIDE_INT i;
>> >
>> > -  if (TREE_CODE (arg) == VECTOR_CST
>> > -  && VECTOR_CST_NELTS (arg).is_constant (&nunits))
>> > -{
>> > -  for (i = 0; i < nunits; ++i)
>> > - elts[i] = VECTOR_CST_ELT (arg, i);
>> > -}
>> > -  else if (TREE_CODE (arg) == CONSTRUCTOR)
>> > +  if (TREE_CODE (arg) == CONSTRUCTOR)
>> >

Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-08-03 Thread Richard Sandiford via Gcc-patches
Richard Sandiford  writes:
> Prathamesh Kulkarni  writes:
>> On Tue, 25 Jul 2023 at 18:25, Richard Sandiford
>>  wrote:
>>>
>>> Hi,
>>>
>>> Thanks for the rework and sorry for the slow review.
>> Hi Richard,
>> Thanks for the suggestions!  Please find my responses inline below.
>>>
>>> Prathamesh Kulkarni  writes:
>>> > Hi Richard,
>>> > This is reworking of patch to extend fold_vec_perm to handle VLA vectors.
>>> > The attached patch unifies handling of VLS and VLA vector_csts, while
>>> > using fallback code
>>> > for ctors.
>>> >
>>> > For VLS vector, the patch ignores underlying encoding, and
>>> > uses npatterns = nelts, and nelts_per_pattern = 1.
>>> >
>>> > For VLA patterns, if sel has a stepped sequence, then it
>>> > only chooses elements from a particular pattern of a particular
>>> > input vector.
>>> >
> >>> > To make things simpler, the patch imposes the following constraints:
>>> > (a) op0_npatterns, op1_npatterns and sel_npatterns are powers of 2.
>>> > (b) The step size for a stepped sequence is a power of 2, and
>>> >   multiple of npatterns of chosen input vector.
>>> > (c) Runtime vector length of sel is a multiple of sel_npatterns.
>>> >  So, we don't handle sel.length = 2 + 2x and npatterns = 4.
>>> >
>>> > Eg:
>>> > op0, op1: npatterns = 2, nelts_per_pattern = 3
>>> > op0_len = op1_len = 16 + 16x.
>>> > sel = { 0, 0, 2, 0, 4, 0, ... }
>>> > npatterns = 2, nelts_per_pattern = 3.
>>> >
>>> > For pattern {0, 2, 4, ...}
>>> > Let,
>>> > a1 = 2
>>> > S = step size = 2
>>> >
>>> > Let Esel denote number of elements per pattern in sel at runtime.
>>> > Esel = (16 + 16x) / npatterns_sel
>>> > = (16 + 16x) / 2
>>> > = (8 + 8x)
>>> >
>>> > So, last element of pattern:
>>> > ae = a1 + (Esel - 2) * S
>>> >  = 2 + (8 + 8x - 2) * 2
>>> >  = 14 + 16x
>>> >
>>> > a1 /trunc arg0_len = 2 / (16 + 16x) = 0
>>> > ae /trunc arg0_len = (14 + 16x) / (16 + 16x) = 0
>>> > Since both are equal with quotient = 0, we select elements from op0.
>>> >
>>> > Since step size (S) is a multiple of npatterns(op0), we select
>>> > all elements from same pattern of op0.
>>> >
>>> > res_npatterns = max (op0_npatterns, max (op1_npatterns, sel_npatterns))
> >>> >= max (2, max (2, 2))
>>> >= 2
>>> >
>>> > res_nelts_per_pattern = max (op0_nelts_per_pattern,
>>> > max 
>>> > (op1_nelts_per_pattern,
>>> >  
>>> > sel_nelts_per_pattern))
>>> > = max (3, max (3, 3))
>>> > = 3
>>> >
>>> > So res has encoding with npatterns = 2, nelts_per_pattern = 3.
>>> > res: { op0[0], op0[0], op0[2], op0[0], op0[4], op0[0], ... }
>>> >
>>> > Unfortunately, this results in an issue for poly_int_cst index:
>>> > For example,
>>> > op0, op1: npatterns = 1, nelts_per_pattern = 3
>>> > op0_len = op1_len = 4 + 4x
>>> >
>>> > sel: { 4 + 4x, 5 + 4x, 6 + 4x, ... } // should choose op1
>>> >
>>> > In this case,
>>> > a1 = 5 + 4x
>>> > S = (6 + 4x) - (5 + 4x) = 1
>>> > Esel = 4 + 4x
>>> >
>>> > ae = a1 + (esel - 2) * S
>>> >  = (5 + 4x) + (4 + 4x - 2) * 1
>>> >  = 7 + 8x
>>> >
>>> > IIUC, 7 + 8x will always be index for last element of op1 ?
>>> > if x = 0, len = 4, 7 + 8x = 7
>>> > if x = 1, len = 8, 7 + 8x = 15, etc.
>>> > So the stepped sequence will always choose elements
>>> > from op1 regardless of vector length for above case ?
>>> >
>>> > However,
>>> > ae /trunc op0_len
>>> > = (7 + 8x) / (4 + 4x)
>>> > which is not defined because 7/4 != 8/4
>>> > and we return NULL_TREE, but I suppose the expected result would be:
>>> > res: { op1[0], op1[1], op1[2], ... } ?
>>> >
>>> > The patch passes bootstrap+test on aarch64-linux-gnu with and without sve,
>>> > and on x86_64-unknown-linux-gnu.
>>> > I would be grateful for suggestions on how to proceed.
>>> >
>>> > Thanks,
>>> > Prathamesh
>>> >
>>> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
>>> > index a02ede79fed..8028b3e8e9a 100644
>>> > --- a/gcc/fold-const.cc
>>> > +++ b/gcc/fold-const.cc
>>> > @@ -85,6 +85,10 @@ along with GCC; see the file COPYING3.  If not see
>>> >  #include "vec-perm-indices.h"
>>> >  #include "asan.h"
>>> >  #include "gimple-range.h"
>>> > +#include 
>>> > +#include "tree-pretty-print.h"
>>> > +#include "gimple-pretty-print.h"
>>> > +#include "print-tree.h"
>>> >
>>> >  /* Nonzero if we are folding constants inside an initializer or a C++
>>> > manifestly-constant-evaluated context; zero otherwise.
>>> > @@ -10493,15 +10497,9 @@ fold_mult_zconjz (location_t loc, tree type, 
>>> > tree expr)
>>> >  static bool
>>> >  vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts)
>>> >  {
>>> > -  unsigned HOST_WIDE_INT i, nunits;
>>> > +  unsigned HOST_WIDE_INT i;
>>> >
>>> > -  if (TREE_CODE (arg) == VECTOR_CST
>>> > -  && VECTOR_CST_NELTS (arg).is_constant (&nunits))
>>> > -{
>>> > -  for (i = 0; i < n

Re: [PATCH V5 1/2] Add overflow API for plus minus mult on range

2023-08-03 Thread Andrew MacLeod via Gcc-patches

This is OK.


On 8/2/23 22:18, Jiufu Guo wrote:

Hi,

I would like to have a ping on this patch.

BR,
Jeff (Jiufu Guo)


Jiufu Guo  writes:


Hi,

As discussed in previous reviews, adding overflow APIs to range-op
would be useful. Those APIs could help to check if overflow happens
when operating between two 'range's, like: plus, minus, and mult.

Previous discussions are here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624701.html

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this patch ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* range-op-mixed.h (operator_plus::overflow_free_p): New declare.
(operator_minus::overflow_free_p): New declare.
(operator_mult::overflow_free_p): New declare.
* range-op.cc (range_op_handler::overflow_free_p): New function.
(range_operator::overflow_free_p): New default function.
(operator_plus::overflow_free_p): New function.
(operator_minus::overflow_free_p): New function.
(operator_mult::overflow_free_p): New function.
* range-op.h (range_op_handler::overflow_free_p): New declare.
(range_operator::overflow_free_p): New declare.
* value-range.cc (irange::nonnegative_p): New function.
(irange::nonpositive_p): New function.
* value-range.h (irange::nonnegative_p): New declare.
(irange::nonpositive_p): New declare.

---
  gcc/range-op-mixed.h |  11 
  gcc/range-op.cc  | 124 +++
  gcc/range-op.h   |   5 ++
  gcc/value-range.cc   |  12 +
  gcc/value-range.h|   2 +
  5 files changed, 154 insertions(+)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index 6944742ecbc..42157ed9061 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -383,6 +383,10 @@ public:
  relation_kind rel) const final override;
void update_bitmask (irange &r, const irange &lh,
   const irange &rh) const final override;
+
+  virtual bool overflow_free_p (const irange &lh, const irange &rh,
+   relation_trio = TRIO_VARYING) const;
+
  private:
void wi_fold (irange &r, tree type, const wide_int &lh_lb,
const wide_int &lh_ub, const wide_int &rh_lb,
@@ -446,6 +450,10 @@ public:
relation_kind rel) const final override;
void update_bitmask (irange &r, const irange &lh,
   const irange &rh) const final override;
+
+  virtual bool overflow_free_p (const irange &lh, const irange &rh,
+   relation_trio = TRIO_VARYING) const;
+
  private:
void wi_fold (irange &r, tree type, const wide_int &lh_lb,
const wide_int &lh_ub, const wide_int &rh_lb,
@@ -525,6 +533,9 @@ public:
const REAL_VALUE_TYPE &lh_lb, const REAL_VALUE_TYPE &lh_ub,
const REAL_VALUE_TYPE &rh_lb, const REAL_VALUE_TYPE &rh_ub,
relation_kind kind) const final override;
+  virtual bool overflow_free_p (const irange &lh, const irange &rh,
+   relation_trio = TRIO_VARYING) const;
+
  };
  
  class operator_addr_expr : public range_operator

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index cb584314f4c..632b044331b 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -366,6 +366,22 @@ range_op_handler::op1_op2_relation (const vrange &lhs) 
const
  }
  }
  
+bool
+range_op_handler::overflow_free_p (const vrange &lh,
+  const vrange &rh,
+  relation_trio rel) const
+{
+  gcc_checking_assert (m_operator);
+  switch (dispatch_kind (lh, lh, rh))
+{
+  case RO_III:
+   return m_operator->overflow_free_p(as_a <irange> (lh),
+  as_a <irange> (rh),
+  rel);
+  default:
+   return false;
+}
+}
  
  // Convert irange bitmasks into a VALUE MASK pair suitable for calling CCP.
  
@@ -688,6 +704,13 @@ range_operator::op1_op2_relation_effect (irange &lhs_range ATTRIBUTE_UNUSED,

return false;
  }
  
+bool
+range_operator::overflow_free_p (const irange &, const irange &,
+relation_trio) const
+{
+  return false;
+}
+
  // Apply any known bitmask updates based on this operator.
  
  void

@@ -4311,6 +4334,107 @@ range_op_table::initialize_integral_ops ()
  
  }
  
+bool
+operator_plus::overflow_free_p (const irange &lh, const irange &rh,
+   relation_trio) const
+{
+  if (lh.undefined_p () || rh.undefined_p ())
+return false;
+
+  tree type = lh.type ();
+  if (TYPE_OVERFLOW_UNDEFINED (type))
+return true;
+
+  wi::overflow_type ovf;
+  signop sgn = TYPE_SIGN (type);
+  wide_int wmax0 = lh.upper_bound ();
+  wide_int wmax1 = rh.upper_bound ();
+  wi::add (wmax0, wmax1, sgn, &ovf);
+  if (ovf != wi:

[PATCH] [libbacktrace] fix up broken test

2023-08-03 Thread Richard Biener via Gcc-patches
zstdtest has some inline data where some testcases lack the
uncompressed length field.  Thus it computes that but still
ends up allocating memory for the uncompressed buffer based on
that (zero) length.  Oops.  Causes memory corruption if the
allocator returns non-NULL.

Tested on x86_64-unknown-linux-gnu, pushed as obvious.

libbacktrace/
* zstdtest.c (test_samples): Properly compute the allocation
size for the uncompressed data.
---
 libbacktrace/zstdtest.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libbacktrace/zstdtest.c b/libbacktrace/zstdtest.c
index 1b4158a50eb..1a27d90e29e 100644
--- a/libbacktrace/zstdtest.c
+++ b/libbacktrace/zstdtest.c
@@ -197,7 +197,11 @@ test_samples (struct backtrace_state *state)
   unsigned char *uncompressed;
   size_t uncompressed_len;
 
-  uncompressed = (unsigned char *) malloc (tests[i].uncompressed_len);
+  uncompressed_len = tests[i].uncompressed_len;
+  if (uncompressed_len == 0)
+   uncompressed_len = strlen (tests[i].uncompressed);
+
+  uncompressed = (unsigned char *) malloc (uncompressed_len);
   if (uncompressed == NULL)
{
  perror ("malloc");
@@ -206,10 +210,6 @@ test_samples (struct backtrace_state *state)
  continue;
}
 
-  uncompressed_len = tests[i].uncompressed_len;
-  if (uncompressed_len == 0)
-   uncompressed_len = strlen (tests[i].uncompressed);
-
   if (!backtrace_uncompress_zstd (state,
  ((const unsigned char *)
   tests[i].compressed),
-- 
2.35.3


[PATCH] mid-end: Use integral time intervals in timevar.cc

2023-08-03 Thread Matthew Malcomson via Gcc-patches
> 
> I think this is undesriable.  With fused you mean we use FMA?
> I think you could use -ffp-contract=off for the TU instead.
> 
> Note you can't use __attribute__((noinline)) literally since the
> host compiler might not support this.
> 
> Richard.
> 


Trying to make the timevar store integral time intervals.
Hope this is acceptable -- I had originally planned to use
`-ffp-contract` as agreed until I saw the email mentioning the old x86
bug in the same area which was not to do with floating point contraction
of operations (PR 99903) and figured it would be better to try and solve
both at the same time while making things in general a bit more robust.



On some AArch64 bootstrapped builds, we were getting a flaky test
because the floating point operations in `get_time` were being fused
with the floating point operations in `timevar_accumulate`.

This meant that the rounding behaviour of our multiplication with
`ticks_to_msec` was different when used in `timer::start` and when
performed in `timer::stop`.  These extra inaccuracies led to the
testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.

--
Avoiding the inlining which was agreed to be undesirable.  Three
alternative approaches:
1) Use `-ffp-contract=on` to avoid this particular optimisation.
2) Adjusting the code so that the "tolerance" is always of the order of
   a "tick".
3) Recording times and elapsed differences in integral values.
   - Could be in terms of a standard measurement (e.g. nanoseconds or
 microseconds).
   - Could be in terms of whatever integral value ("ticks" /
 secondsµseconds / "clock ticks") is returned from the syscall
 chosen at configure time.

While `-ffp-contract=on` removes the problem that I bumped into, there
has been a similar bug on x86 that was to do with a different floating
point problem that also happens after `get_time` and
`timevar_accumulate` both being inlined into the same function.  Hence
it seems worth choosing a different approach.

Of the two other solutions, recording measurements in integral values
seems the most robust against slightly "off" measurements being
presented to the user -- even though it could avoid the ICE that creates
a flaky test.

I considered storing time in whatever units our syscall returns and
normalising them at the time we print out rather than normalising them
to nanoseconds at the point we record our "current time".  The logic
being that normalisation could have some rounding affect (e.g. if
TICKS_PER_SECOND is 3) that would be taken into account in calculations.

I decided against it in order to give the values recorded in
`timevar_time_def` some interpretive value so it's easier to read the
code.  Compared to the small rounding that would represent a tiny amount
of time and AIUI can not trigger the same kind of ICE's as we are
attempting to fix, said interpretive value seems more valuable.

Recording time in microseconds seemed reasonable since all obvious
values for ticks and `getrusage` are at microsecond granularity or less
precise.  That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are both
variables given to use by the host system I was not sure of that enough
to make this decision.

--
timer::all_zero is ignoring rows which are inconsequential to the user
and would be printed out as all zeros.  Since upon printing rows we
convert to the same double value and print out the same precision as
before, we return true/false based on the same amount of time as before.

timer::print_row casts to a floating point measurement in units of
seconds as was printed out before.

timer::validate_phases -- I'm printing out nanoseconds here rather than
floating point seconds since this is an error message for when things
have "gone wrong" printing out the actual nanoseconds that have been
recorded seems like the best approach.
N.b. since we now print out nanoseconds instead of floating point value
the padding requirements are different.  Originally we were padding to
24 characters and printing 18 decimal places.  This looked odd with the
now visually smaller values getting printed.  I judged 13 characters
(corresponding to 2 hours) to be a reasonable point at which our
alignment could start to degrade and this provides a more compact output
for the majority of cases (checked by triggering the error case via
GDB).

--
N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC".  I believe this
would fit in an integer on all hosts that GCC supports, but am not
certain there are not strange integer sizes we support hence am pointing
it out for special attention during review.

--
No expected change in generated code.
Bootstrapped and regtested on AArch64 with no regressions.
Manually checked that flaky test is no longer flaky on the machine it
was seen before.

gcc/ChangeLog:

PR m

Re: [PATCH]AArch64 update costing for MLA by invariant

2023-08-03 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> >> Do you see vect_constant_defs in practice, or is this just for 
>> >> completeness?
>> >> I would expect any constants to appear as direct operands.  I don't
>> >> mind keeping it if it's just a belt-and-braces thing though.
>> >
>> > In the latency case where I had allow_constants the early rejection
>> > based on the operand itself wouldn't be rejected so in that case I
>> > still needed to reject them but do so after the multiply check.  While
>> > they do appear as direct operands as well they also have their own
>> > nodes, in particular for SLP so the constants are handled as a group.
>> 
>> Ah, OK, thanks.
>> 
>> > But can also check CONSTANT_CLASS_P (rhs) if that's preferable.
>> 
>> No, what you did is more correct.  I just wasn't sure at first which case it 
>> was
>> handling.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_multiply_add_p): Update handling
>   of constants. 
>   (aarch64_adjust_stmt_cost): Use it.
>   (aarch64_vector_costs::count_ops): Likewise.
>   (aarch64_vector_costs::add_stmt_cost): Pass vinfo to
>   aarch64_adjust_stmt_cost.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> d4d7602554592b9042b8eaf389eff1ec80c2090e..7cc5916ce06b2635346c807da9306738b939ebc6
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16410,10 +16410,6 @@ aarch64_multiply_add_p (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>if (code != PLUS_EXPR && code != MINUS_EXPR)
>  return false;
>  
> -  if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
> -  || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
> -return false;
> -
>for (int i = 1; i < 3; ++i)
>  {
>tree rhs = gimple_op (assign, i);
> @@ -16441,7 +16437,8 @@ aarch64_multiply_add_p (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>   return false;
> def_stmt_info = vinfo->lookup_def (rhs);
> if (!def_stmt_info
> -   || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
> +   || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def
> +   || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_constant_def)
>   return false;
>   }
>  
> @@ -16721,8 +16718,9 @@ aarch64_sve_adjust_stmt_cost (class vec_info *vinfo, 
> vect_cost_for_stmt kind,
> and which when vectorized would operate on vector type VECTYPE.  Add the
> cost of any embedded operations.  */
>  static fractional_cost
> -aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, stmt_vec_info stmt_info,
> -   tree vectype, fractional_cost stmt_cost)
> +aarch64_adjust_stmt_cost (vec_info *vinfo, vect_cost_for_stmt kind,
> +   stmt_vec_info stmt_info, tree vectype,
> +   unsigned vec_flags, fractional_cost stmt_cost)
>  {
>if (vectype)
>  {
> @@ -16745,6 +16743,14 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
> stmt_vec_info stmt_info,
> break;
>   }
>  
> +  gassign *assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info));
> +  if (assign && !vect_is_reduction (stmt_info))
> + {
> +   /* For MLA we need to reduce the cost since MLA is 1 instruction.  */
> +   if (aarch64_multiply_add_p (vinfo, stmt_info, vec_flags))
> + return 0;
> + }
> +
>if (kind == vector_stmt || kind == vec_to_scalar)
>   if (tree cmp_type = vect_embedded_comparison_type (stmt_info))
> {
> @@ -16814,7 +16820,8 @@ aarch64_vector_costs::count_ops (unsigned int count, 
> vect_cost_for_stmt kind,
>  }
>  
>/* Assume that multiply-adds will become a single operation.  */
> -  if (stmt_info && aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags))
> +  if (stmt_info
> +  && aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags))
>  return;
>  
>/* Count the basic operation cost associated with KIND.  */

There's no need for this change now that there's no extra parameter.

OK with that change, thanks.

Richard

> @@ -17060,8 +17067,8 @@ aarch64_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
>  {
>/* Account for any extra "embedded" costs that apply additively
>to the base cost calculated above.  */
> -  stmt_cost = aarch64_adjust_stmt_cost (kind, stmt_info, vectype,
> - stmt_cost);
> +  stmt_cost = aarch64_adjust_stmt_cost (m_vinfo, kind, stmt_info,
> + vectype, m_vec_flags, stmt_cost);
>  
>/* If we're recording a nonzero vector loop body cost for the
>innermost loop, also estimate the operations that would need


Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-03 Thread Jeff Law via Gcc-patches




On 8/3/23 03:27, juzhe.zh...@rivai.ai wrote:

> https://github.com/gcc-mirror/gcc/commit/e15d0b6680d10d7666195e9db65581364ad5e5df
>
> This patch causes so many fails in the regression:
Mine.  I'll take care of it.  Probably something slipping through the 
expander that shouldn't.  I've been primarily focused on the execute.exp 
part of the suite to find code correctness issues with the original patch.


jeff


[committed] analyzer: fix ICE on zero-sized arrays [PR110882]

2023-08-03 Thread David Malcolm via Gcc-patches
Successfully bootstrapped and regrtested on x86_64-pc-linux-gnu.

Pushed to trunk as r14-2955-gc62f93d1e0383d.

gcc/analyzer/ChangeLog:
PR analyzer/110882
* region.cc (int_size_in_bits): Fail on zero-sized types.

gcc/testsuite/ChangeLog:
PR analyzer/110882
* gcc.dg/analyzer/pr110882.c: New test.
---
 gcc/analyzer/region.cc   |  6 +-
 gcc/testsuite/gcc.dg/analyzer/pr110882.c | 18 ++
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr110882.c

diff --git a/gcc/analyzer/region.cc b/gcc/analyzer/region.cc
index 9524739c7a4..730dab3d707 100644
--- a/gcc/analyzer/region.cc
+++ b/gcc/analyzer/region.cc
@@ -742,7 +742,11 @@ int_size_in_bits (const_tree type, bit_size_t *out)
 }
 
   tree sz = TYPE_SIZE (type);
-  if (sz && tree_fits_uhwi_p (sz))
+  if (sz
+  && tree_fits_uhwi_p (sz)
+  /* If the size is zero, then we may have a zero-sized
+array; handle such cases by returning false.  */
+  && !integer_zerop (sz))
 {
   *out = TREE_INT_CST_LOW (sz);
   return true;
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr110882.c 
b/gcc/testsuite/gcc.dg/analyzer/pr110882.c
new file mode 100644
index 000..80027184053
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr110882.c
@@ -0,0 +1,18 @@
+/* { dg-additional-options "-Wno-analyzer-too-complex" } */
+
+struct csv_row {
+  char *columns[0];
+};
+
+void
+parse_csv_line (int n_columns, const char *columns[])
+{
+  for (int n = 0; n < n_columns; n++) {
+  columns[n] = ((void *)0);
+  }
+}
+
+void parse_csv_data (int n_columns, struct csv_row *entry)
+{
+  parse_csv_line(n_columns, (const char **)entry->columns);
+}
-- 
2.26.3



Re: Fix profile update after vectorizer peeling

2023-08-03 Thread Jeff Law




On 8/3/23 04:13, Jan Hubicka wrote:


Note most of the profile consistency checks FAIL when testing with -m32 on
x86_64-unknown-linux-gnu ...

For example vect-11.c has

;;   basic block 4, loop depth 0, count 719407024 (estimated locally,
freq 0.6700), maybe hot
;;   Invalid sum of incoming counts 708669602 (estimated locally, freq
0.6600), should be 719407024 (estimated locally, freq 0.6700)
;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;pred:   3 [always (guessed)]  count:708669602 (estimated
locally, freq 0.6600) (FALSE_VALUE,EXECUTABLE)
   __asm__ __volatile__("cpuid
 " : "=a" a_44, "=b" b_45, "=c" c_46, "=d" d_47 : "0" 1, "2" 0);
   _3 = d_47 & 67108864;

so it looks like it's the check_vect () function that goes wrong
everywhere but only on i?86.
The first dump with the Invalid sum is 095t.fixup_cfg3 already.


Sorry for that, looks like missing/undetected noreturn.  I will take a look.


The mismatch at fixup_cfg3 is harmless since we repropagate frequencies
later now.  The misupdate is caused by jump threading:

vect-11.c.102t.adjust_alignment:;;   Invalid sum of incoming counts 354334800 
(estimated locally, freq 0.3300), should be 233860966 (estimated locally, freq 
0.2178)
vect-11.c.102t.adjust_alignment:;;   Invalid sum of incoming counts 354334800 
(estimated locally, freq 0.3300), should be 474808634 (estimated locally, freq 
0.4422)
vect-11.c.107t.rebuild_frequencies1
vect-11.c.116t.threadfull1:;;   Invalid sum of incoming counts 708669600 
(estimated locally, freq 0.6600), should be 719407024 (estimated locally, freq 
0.6700)

I know that there are problems left in the profile threading update.  It was
the main pass disturbing the profile until GCC 13; it now works for basic
testcases, but not always.  I already spent quite some time trying to
figure out what is wrong with profile threading (PR103680), so at least
this is a small testcase.

Jeff, any help would be appreciated here :)

I will try to debug this.  One option would be to disable branch
prediction on vect_check for the time being - it is not inlined anyway
Not a lot of insight.  The backwards threader uses a totally different 
API for the CFG/SSA updates, and I don't think that API has made any
significant effort to keep the profile up-to-date.


Jeff


Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-03 Thread Kito Cheng via Gcc-patches
I am working on that; it seems the cost of the vsetvli instruction becomes 0
due to this change, so loop invariant motion won't hoist vsetvli any longer.

Jeff Law  於 2023年8月3日 週四 21:49 寫道:

>
>
> On 8/3/23 03:27, juzhe.zh...@rivai.ai wrote:
> >
> https://github.com/gcc-mirror/gcc/commit/e15d0b6680d10d7666195e9db65581364ad5e5df
> <
> https://github.com/gcc-mirror/gcc/commit/e15d0b6680d10d7666195e9db65581364ad5e5df
> >
> >
> > This patch causes so many fails in the regression:
> Mine.  I'll take care of it.  Probably something slipping through the
> expander that shouldn't.  I've been primarily focused on the execute.exp
> part of the suite to find code correctness issues with the original patch.
>
> jeff
>


Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-03 Thread Jeff Law via Gcc-patches




On 8/3/23 07:56, Kito Cheng wrote:
I am working on that; it seems the cost of the vsetvli instruction becomes 0
due to this change, so loop invariant motion won't hoist vsetvli any longer.
I haven't looked yet (generating baseline rvv.exp data right now).  But 
before I went to bed last night I was worried that a change snuck 
through that shouldn't have (changing the toplevel INSN/SET cost 
handling -- that wasn't supposed to be in the commit).  I was too tired 
to verify and correct without possibly mucking it up further.


That'll be the first thing to look at.  The costing change was supposed
to only affect if-then-else constructs, not sets in general.


Jeff


Re: [PATCH] mid-end: Use integral time intervals in timevar.cc

2023-08-03 Thread David Malcolm via Gcc-patches
On Thu, 2023-08-03 at 14:38 +0100, Matthew Malcomson via Gcc-patches
wrote:
> > 
> > I think this is undesriable.  With fused you mean we use FMA?
> > I think you could use -ffp-contract=off for the TU instead.
> > 
> > Note you can't use __attribute__((noinline)) literally since the
> > host compiler might not support this.
> > 
> > Richard.
> > 
> 
> 
> Trying to make the timevar store integral time intervals.
> Hope this is acceptable -- I had originally planned to use
> `-ffp-contract` as agreed, until I saw the email mentioning the old x86
> bug in the same area which was not due to floating-point contraction
> of operations (PR 99903), and figured it would be better to try and
> solve both at the same time while making things in general a bit more
> robust.
> --
> 
> 
> On some AArch64 bootstrapped builds, we were getting a flaky test
> because the floating point operations in `get_time` were being fused
> with the floating point operations in `timevar_accumulate`.
> 
> This meant that the rounding behaviour of our multiplication with
> `ticks_to_msec` was different when used in `timer::start` and when
> performed in `timer::stop`.  These extra inaccuracies led to the
> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
> 
> --
> Avoiding the inlining which was agreed to be undesirable.  Three
> alternative approaches:
> 1) Use `-ffp-contract=on` to avoid this particular optimisation.
> 2) Adjusting the code so that the "tolerance" is always of the order of
>    a "tick".
> 3) Recording times and elapsed differences in integral values.
>    - Could be in terms of a standard measurement (e.g. nanoseconds or
>  microseconds).
>    - Could be in terms of whatever integral value ("ticks" / seconds /
>  µseconds / "clock ticks") is returned from the syscall chosen at
>  configure time.
> 
> While `-ffp-contract=on` removes the problem that I bumped into, there
> has been a similar bug on x86 that was due to a different floating
> point problem that also happens after `get_time` and
> `timevar_accumulate` are both inlined into the same function.  Hence
> it seems worth choosing a different approach.
> 
> Of the two other solutions, recording measurements in integral values
> seems the most robust against slightly "off" measurements being
> presented to the user -- even though it could avoid the ICE that
> creates a flaky test.
> 
> I considered storing time in whatever units our syscall returns and
> normalising them at the time we print out, rather than normalising them
> to nanoseconds at the point we record our "current time".  The logic
> being that normalisation could have some rounding effect (e.g. if
> TICKS_PER_SECOND is 3) that would be taken into account in
> calculations.
> 
> I decided against it in order to give the values recorded in
> `timevar_time_def` some interpretive value, so it's easier to read the
> code.  Compared to the small rounding that would represent a tiny amount
> of time and AIUI cannot trigger the same kind of ICEs as we are
> attempting to fix, said interpretive value seems more valuable.
> 
> Recording time in microseconds seemed reasonable, since all obvious
> values for ticks and `getrusage` are at microsecond granularity or less
> precise.  That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are both
> variables given to us by the host system, I was not sure enough of that
> to make this decision.
> 
> --
> timer::all_zero ignores rows which are inconsequential to the user
> and would be printed out as all zeros.  Since upon printing rows we
> convert to the same double value and print out the same precision as
> before, we return true/false based on the same amount of time as
> before.
> 
> timer::print_row casts to a floating point measurement in units of
> seconds as was printed out before.
> 
> timer::validate_phases -- I'm printing out nanoseconds here rather than
> floating point seconds since this is an error message for when things
> have "gone wrong"; printing out the actual nanoseconds that have been
> recorded seems like the best approach.
> N.b. since we now print out nanoseconds instead of a floating point
> value, the padding requirements are different.  Originally we were
> padding to 24 characters and printing 18 decimal places.  This looked
> odd with the now visually smaller values getting printed.  I judged 13
> characters (corresponding to 2 hours) to be a reasonable point at which
> our alignment could start to degrade, and this provides a more compact
> output for the majority of cases (checked by triggering the error case
> via GDB).
> 
> --
> N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC".  I believe this
> would fit in an integer on all hosts that GCC supports, but am not
> certain there are not strange integer siz

[PATCH 2/3 v3] genmatch: Reduce variability of generated code

2023-08-03 Thread Andrzej Turko via Gcc-patches
So far genmatch has been using an unordered map to store information about
functions to be generated. Since corresponding locations from match.pd were
used as keys in the map, even small changes to match.pd which caused
line number changes would change the order in which the functions are
generated. This would reshuffle the functions between the generated .cc files.
This way even a minimal modification to match.pd forces recompilation of all
object files originating from match.pd on rebuild.

This commit makes sure that functions are generated in the order of their
processing (in contrast to the random order based on hashes of their
locations in match.pd). This is done by replacing the unordered map with an
ordered one. This way, small changes to match.pd do not cause function
renaming and reshuffling among generated source files.
Together with the subsequent change to logging fprintf calls, this
removes unnecessary changes to the files generated by genmatch allowing
for reuse of already built object files during rebuild. The aim is to
make editing of match.pd and subsequent testing easier.

Signed-off-by: Andrzej Turko 

gcc/ChangeLog:

* genmatch.cc: Make sinfo map ordered.
* Makefile.in: Require the ordered map header for genmatch.o.
---
 gcc/Makefile.in | 4 ++--
 gcc/genmatch.cc | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index e99628cec07..2429128cbf2 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3004,8 +3004,8 @@ build/genhooks.o : genhooks.cc $(TARGET_DEF) 
$(C_TARGET_DEF)  \
   $(COMMON_TARGET_DEF) $(D_TARGET_DEF) $(BCONFIG_H) $(SYSTEM_H) errors.h
 build/genmddump.o : genmddump.cc $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)
\
   $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H)
-build/genmatch.o : genmatch.cc $(BCONFIG_H) $(SYSTEM_H) \
-  $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \
+build/genmatch.o : genmatch.cc $(BCONFIG_H) $(SYSTEM_H) $(CORETYPES_H) \
+  errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h ordered-hash-map.h \
   tree.def builtins.def internal-fn.def case-cfn-macros.h $(CPPLIB_H)
 build/gencfn-macros.o : gencfn-macros.cc $(BCONFIG_H) $(SYSTEM_H)  \
   $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def  \
diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 2302f2a7ff0..1deca505603 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "hash-table.h"
 #include "hash-set.h"
 #include "is-a.h"
+#include "ordered-hash-map.h"
 
 
 /* Stubs for GGC referenced through instantiations triggered by hash-map.  */
@@ -1684,7 +1685,7 @@ struct sinfo_hashmap_traits : 
simple_hashmap_traits,
   template  static inline void remove (T &) {}
 };
 
-typedef hash_map
+typedef ordered_hash_map
   sinfo_map_t;
 
 /* Current simplifier ID we are processing during insertion into the
-- 
2.34.1



[PATCH 3/3 v3] genmatch: Log line numbers indirectly

2023-08-03 Thread Andrzej Turko via Gcc-patches
Currently, the fprintf calls that log to a dump file take line numbers
in the match.pd file directly as arguments.
When match.pd is edited, the line numbers of the referenced code change,
which causes changes to many fprintf calls and, thus, to many
(usually all) .cc files generated by genmatch. This forces make
to (unnecessarily) rebuild many .o files.

This change replaces those logging fprintf calls with calls to
a dedicated logging function. Because it reads the line numbers
from the lookup table, it is enough to pass a corresponding index.
Thanks to this, when match.pd changes, it is enough to rebuild
the file containing the lookup table and, of course, those
actually affected by the change.

Signed-off-by: Andrzej Turko 

gcc/ChangeLog:

* genmatch.cc: Log line numbers indirectly.
---
 gcc/genmatch.cc | 89 -
 1 file changed, 74 insertions(+), 15 deletions(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 1deca505603..63d6ba6dab0 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -217,9 +217,57 @@ fp_decl_done (FILE *f, const char *trailer)
 fprintf (header_file, "%s;", trailer);
 }
 
+/* Line numbers for use by indirect line directives.  */
+static vec dbg_line_numbers;
+
+static void
+write_header_declarations (bool gimple, FILE *f)
+{
+  fprintf (f, "\nextern void\n%s_dump_logs (const char *file1, int line1_id, "
+ "const char *file2, int line2, bool simplify);\n",
+ gimple ? "gimple" : "generic");
+}
+
+static void
+define_dump_logs (bool gimple, FILE *f)
+{
+
+  if (dbg_line_numbers.is_empty ())
+{
+  fprintf (f, "};\n\n");
+  return;
+}
+
+  fprintf (f , "void\n%s_dump_logs (const char *file1, int line1_id, "
+   "const char *file2, int line2, bool simplify)\n{\n",
+   gimple ? "gimple" : "generic");
+
+  fprintf_indent (f, 2, "static int dbg_line_numbers[%d] = {",
+ dbg_line_numbers.length ());
+
+  for (int i = 0; i < (int)dbg_line_numbers.length () - 1; i++)
+{
+  if (i % 20 == 0)
+   fprintf (f, "\n\t");
+
+  fprintf (f, "%d, ", dbg_line_numbers[i]);
+}
+  fprintf (f, "%d\n  };\n\n", dbg_line_numbers.last ());
+
+
+  fprintf_indent (f, 2, "fprintf (dump_file, \"%%s "
+ "%%s:%%d, %%s:%%d\\n\",\n");
+  fprintf_indent (f, 10, "simplify ? \"Applying pattern\" : "
+ "\"Matching expression\", file1, "
+ "dbg_line_numbers[line1_id], file2, line2);");
+
+  fprintf (f, "\n}\n\n");
+}
+
 static void
 output_line_directive (FILE *f, location_t location,
-  bool dumpfile = false, bool fnargs = false)
+ bool dumpfile = false, bool fnargs = false,
+ bool indirect_line_numbers = false)
 {
   const line_map_ordinary *map;
   linemap_resolve_location (line_table, location, LRK_SPELLING_LOCATION, &map);
@@ -239,7 +287,15 @@ output_line_directive (FILE *f, location_t location,
++file;
 
   if (fnargs)
-   fprintf (f, "\"%s\", %d", file, loc.line);
+  {
+if (indirect_line_numbers)
+  {
+   fprintf (f, "\"%s\", %d", file, dbg_line_numbers.length ());
+   dbg_line_numbers.safe_push (loc.line);
+  }
+else
+  fprintf (f, "\"%s\", %d", file, loc.line);
+  }
   else
fprintf (f, "%s:%d", file, loc.line);
 }
@@ -3375,20 +3431,19 @@ dt_operand::gen (FILE *f, int indent, bool gimple, int 
depth)
 }
 }
 
-/* Emit a fprintf to the debug file to the file F, with the INDENT from
+/* Emit a logging call to the debug file to the file F, with the INDENT from
either the RESULT location or the S's match location if RESULT is null. */
 static void
-emit_debug_printf (FILE *f, int indent, class simplify *s, operand *result)
+emit_logging_call (FILE *f, int indent, class simplify *s, operand *result,
+ bool gimple)
 {
   fprintf_indent (f, indent, "if (UNLIKELY (debug_dump)) "
-  "fprintf (dump_file, \"%s ",
-  s->kind == simplify::SIMPLIFY
-  ? "Applying pattern" : "Matching expression");
-  fprintf (f, "%%s:%%d, %%s:%%d\\n\", ");
+  "%s_dump_logs (", gimple ? "gimple" : "generic");
   output_line_directive (f,
-result ? result->location : s->match->location, true,
-true);
-  fprintf (f, ", __FILE__, __LINE__);\n");
+   result ? result->location : s->match->location,
+   true, true, true);
+  fprintf (f, ", __FILE__, __LINE__, %s);\n",
+ s->kind == simplify::SIMPLIFY ? "true" : "false");
 }
 
 /* Generate code for the '(if ...)', '(with ..)' and actual transform
@@ -3524,7 +3579,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
   if (!result)
 {
   /* If there is no result then this is a predicate implementation.  */
-  emit_debug_printf (f, indent, s, result);
+  emit_logging_call (f, indent, s, result,

[PATCH 0/3 v3] genmatch: Speed up recompilation after changes to match.pd

2023-08-03 Thread Andrzej Turko via Gcc-patches
The following reduces the number of object files that need to be rebuilt
after match.pd has been modified. Right now a change to match.pd which
adds/removes a line almost always forces recompilation of all files that
genmatch generates from it. This is because of unnecessary changes to
the generated .cc files:

1. Function names and ordering change as does the way the functions are
distributed across multiple source files.
2. Code locations from match.pd are quoted directly (including line
numbers) by logging fprintf calls.

This patch addresses those issues without changing the behaviour
of the generated code. The first one is solved by making sure that minor
changes to match.pd do not influence the order in which functions are
generated. The second one by using a lookup table with line numbers.

Now a change to a single function will trigger a rebuild of 4 object
files (one with the function and one with the lookup table, for both
gimple and generic) instead of all of them (20 by default).
For reference, this decreased the rebuild time with 48 threads from 3.5
minutes to 1.5 minutes on my machine.

V2:
* Placed the change in Makefile.in in the correct commit.
* Used a separate logging function to reduce size of the
executable.

V3:
* Fix a bug from 'genmatch: Log line numbers indirectly',
which was introduced in V2.
   

As for Richard Biener's remarks on executable size (cc1plus):

1. The first version of the change decreased (sic!) the executable size
by approximately 120 kB (the .text and .data sections grew by
14 kB and 2 kB respectively, but the .debug_info section shrank by
roughly 170 kB).
2. In the current version (V3) the binary size increases by 36 kB (.text
grows by 3 kB and .rodata by 14 kB; the rest of the increase can
be mostly attributed to debug sections).

One can choose between those variants just by taking the third commit
either from this or the first version of the patch series.


Possible optimization:

Currently, the lookup table for line numbers contains duplicate values.
If I removed them, the table would shrink by 40-50%, reducing the increase
in the .data section. Is it worth pursuing? And if so, would it be better
if I integrate this into this patch series or implement it separately?
Also, can I assume that genmatch is producing source code using a single
input file per invocation? Currently, this is the case.


Note for reviewers: I do not have write access.


Andrzej Turko (3):
  Support get_or_insert in ordered_hash_map
  genmatch: Reduce variability of generated code
  genmatch: Log line numbers indirectly

 gcc/Makefile.in   |  4 +-
 gcc/genmatch.cc   | 92 +--
 gcc/ordered-hash-map-tests.cc | 19 ++--
 gcc/ordered-hash-map.h| 26 ++
 4 files changed, 119 insertions(+), 22 deletions(-)

-- 
2.34.1



[PATCH 1/3 v3] Support get_or_insert in ordered_hash_map

2023-08-03 Thread Andrzej Turko via Gcc-patches
The get_or_insert method is already supported by the unordered hash map.
Adding it to the ordered map enables us to replace the unordered map
with the ordered one in cases where ordering may be useful.

Signed-off-by: Andrzej Turko 

gcc/ChangeLog:

* ordered-hash-map.h: Add get_or_insert.
* ordered-hash-map-tests.cc: Use get_or_insert in tests.
---
 gcc/ordered-hash-map-tests.cc | 19 +++
 gcc/ordered-hash-map.h| 26 ++
 2 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/gcc/ordered-hash-map-tests.cc b/gcc/ordered-hash-map-tests.cc
index 1c26bbfa979..55894c25fa0 100644
--- a/gcc/ordered-hash-map-tests.cc
+++ b/gcc/ordered-hash-map-tests.cc
@@ -58,6 +58,7 @@ static void
 test_map_of_strings_to_int ()
 {
   ordered_hash_map  m;
+  bool existed;
 
   const char *ostrich = "ostrich";
   const char *elephant = "elephant";
@@ -74,17 +75,23 @@ test_map_of_strings_to_int ()
   ASSERT_EQ (false, m.put (ostrich, 2));
   ASSERT_EQ (false, m.put (elephant, 4));
   ASSERT_EQ (false, m.put (ant, 6));
-  ASSERT_EQ (false, m.put (spider, 8));
+  existed = true;
+  int &value = m.get_or_insert (spider, &existed);
+  value = 8;
+  ASSERT_EQ (false, existed);
   ASSERT_EQ (false, m.put (millipede, 750));
   ASSERT_EQ (false, m.put (eric, 3));
 
+
   /* Verify that we can recover the stored values.  */
   ASSERT_EQ (6, m.elements ());
   ASSERT_EQ (2, *m.get (ostrich));
   ASSERT_EQ (4, *m.get (elephant));
   ASSERT_EQ (6, *m.get (ant));
   ASSERT_EQ (8, *m.get (spider));
-  ASSERT_EQ (750, *m.get (millipede));
+  existed = false;
+  ASSERT_EQ (750, m.get_or_insert (millipede, &existed));
+  ASSERT_EQ (true, existed);
   ASSERT_EQ (3, *m.get (eric));
 
   /* Verify that the order of insertion is preserved.  */
@@ -113,6 +120,7 @@ test_map_of_int_to_strings ()
 {
   const int EMPTY = -1;
   const int DELETED = -2;
+  bool existed;
   typedef int_hash  int_hash_t;
   ordered_hash_map  m;
 
@@ -131,7 +139,9 @@ test_map_of_int_to_strings ()
   ASSERT_EQ (false, m.put (2, ostrich));
   ASSERT_EQ (false, m.put (4, elephant));
   ASSERT_EQ (false, m.put (6, ant));
-  ASSERT_EQ (false, m.put (8, spider));
+  const char* &value = m.get_or_insert (8, &existed);
+  value = spider;
+  ASSERT_EQ (false, existed);
   ASSERT_EQ (false, m.put (750, millipede));
   ASSERT_EQ (false, m.put (3, eric));
 
@@ -141,7 +151,8 @@ test_map_of_int_to_strings ()
   ASSERT_EQ (*m.get (4), elephant);
   ASSERT_EQ (*m.get (6), ant);
   ASSERT_EQ (*m.get (8), spider);
-  ASSERT_EQ (*m.get (750), millipede);
+  ASSERT_EQ (m.get_or_insert (750, &existed), millipede);
+  ASSERT_EQ (existed, TRUE);
   ASSERT_EQ (*m.get (3), eric);
 
   /* Verify that the order of insertion is preserved.  */
diff --git a/gcc/ordered-hash-map.h b/gcc/ordered-hash-map.h
index 6b68cc96305..9fc875182e1 100644
--- a/gcc/ordered-hash-map.h
+++ b/gcc/ordered-hash-map.h
@@ -76,6 +76,32 @@ public:
 return m_map.get (k);
   }
 
+  /* Return a reference to the value for the passed in key, creating the entry
+if it doesn't already exist.  If existed is not NULL then it is set to
+false if the key was not previously in the map, and true otherwise.  */
+
+  Value &get_or_insert (const Key &k, bool *existed = NULL)
+  {
+bool _existed;
+Value &ret = m_map.get_or_insert (k, &_existed);
+
+if (!_existed)
+  {
+   bool key_present;
+   int &slot = m_key_index.get_or_insert (k, &key_present);
+   if (!key_present)
+ {
+   slot = m_keys.length ();
+   m_keys.safe_push (k);
+ }
+  }
+
+if (existed)
+  *existed = _existed;
+
+return ret;
+  }
+
   /* Removing a key removes it from the map, but retains the insertion
  order.  */
 
-- 
2.34.1



Re: Fix profile update after vectorizer peeling

2023-08-03 Thread Jan Hubicka via Gcc-patches
> > Jeff, any help would be appreciated here :)
> > 
> > I will try to debug this.  One option would be to disable branch
> > prediction on vect_check for the time being - it is not inlined anyway
> Not a lot of insight.  The backwards threader uses a totally different API
> for the CFG/SSA updates, and I don't think that API has made any significant
> effort to keep the profile up-to-date.

OK, at least some hints about where the missing profile update should be
would be good. There is update_profile in tree-ssa-threadupdate and
understanding what is missing would be nice.
In general it would be nice to mind the profile when updating the CFG :)

Honza
> 
> Jeff


Re: Fix profile update after vectorizer peeling

2023-08-03 Thread Jeff Law




On 8/3/23 08:23, Jan Hubicka wrote:

Jeff, any help would be appreciated here :)

I will try to debug this.  One option would be to disable branch
prediction on vect_check for the time being - it is not inlined anyway

Not a lot of insight.  The backwards threader uses a totally different API
for the CFG/SSA updates, and I don't think that API has made any significant
effort to keep the profile up-to-date.


OK, at least some hints about where the missing profile update should be
would be good. There is update_profile in tree-ssa-threadupdate and
understanding what is missing would be nice.
In general it would be nice to mind the profile when updating the CFG :)
The backwards threader doesn't use much of the code in 
tree-ssa-threadupdate IIRC.  The bulk of the work for the backwards 
threader is done by copy_bbs.  I've actually suggested those two 
implementations be totally separated from each other to avoid confusion. 
 I just haven't had the time to do it (or much of anything with 
threading) myself.


When I last looked at this, the biggest problem was the class of cases 
that are handled by compute_path_counts which was used exclusively by 
the forward threader CFG updating code.  None of those cases in that big 
comment before that function are handled by the copy_bbs paths IIRC.


jeff


Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-03 Thread Kito Cheng via Gcc-patches
> > I am working on that; it seems the cost of the vsetvli instruction becomes 0
> > due to this change, so loop invariant motion won't hoist vsetvli any longer.
> I haven't looked yet (generating baseline rvv.exp data right now).  But
> before I went to bed last night I was worried that a change snuck
> through that shouldn't have (changing the toplevel INSN/SET cost
> handling -- that wasn't supposed to be in the commit).  I was too tired
> to verify and correct without possibly mucking it up further.
>
> That'll be the first thing to look at.  The costing change was supposed
> to only affect if-then-else constructs, not sets in general.


If so, I think the simplest fix is adding more checks on the set
cost - only apply the special cost when the SET_SRC is an if-then-else?

Let me run the regression to see if that works - although the current
vsetvli cost is too high (4x~5x), I think it should be fixed later
with a more complete experimental evaluation.


Re: [PATCH 3/3] genmatch: Log line numbers indirectly

2023-08-03 Thread Andrzej Turko via Gcc-patches
Thank you for the review.

Yes, this increases the binary size.
I have implemented this in the third version of this patch series; is that
what you had in mind?

Originally, this change increased binary sizes by 8-12 kB, but after
updating to the current master branch, it would actually decrease the
executable size. I described it in more detail in the new cover letter.

Andrzej

On Mon, Jul 31, 2023, 15:49 Richard Biener 
wrote:

> On Mon, Jul 31, 2023 at 1:07 PM Andrzej Turko via Gcc-patches
>  wrote:
> >
> > Currently fprintf calls logging to a dump file take line numbers
> > in the match.pd file directly as arguments.
> > When match.pd is edited, referenced code changes line numbers,
> > which causes changes to many fprintf calls and, thus, to many
> > (usually all) .cc files generated by genmatch. This forces make
> > to (unnecessarily) rebuild many .o files.
> >
> > With this change those logging fprintf calls reference an array
> > of line numbers, which is defined in one of the produced files.
> > Thanks to this, when match.pd changes, it is enough to rebuild
> > that single file and, of course, those actually affected by the
> > change.
> >
> > Signed-off-by: Andrzej Turko 
>
> How does this affect the size of the executable?  We are replacing
> pushing a small immediate to the stack with an indexed load plus push.
>
> Maybe further indirecting the whole dumping, passing an index of the
> match and __FILE__/__LINE__ would help here, so instead of
>
>   if (UNLIKELY (debug_dump)) fprintf
> (dump_file, "Matching expression %s:%d, %s:%d\n", "match.pd", 2522,
> __FILE__, __LINE__);
>
> we emit sth like
>
>   if (UNLIKELY (debug_dump)) dump_match (2522,
> __FILE__, __LINE__);
>
> with 2522 replaced by the ID?  That would also get rid of the inline
> varargs invocation which might help code size as well (on some targets).
>
> Richard.
>
> > gcc/ChangeLog:
> >
> > * genmatch.cc: Keep line numbers from match.pd in an array.
> >
> > Signed-off-by: Andrzej Turko 
> > ---
> >  gcc/genmatch.cc | 73 +++--
> >  1 file changed, 65 insertions(+), 8 deletions(-)
> >
> > diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> > index 1deca505603..0a480a140c9 100644
> > --- a/gcc/genmatch.cc
> > +++ b/gcc/genmatch.cc
> > @@ -217,9 +217,48 @@ fp_decl_done (FILE *f, const char *trailer)
> >  fprintf (header_file, "%s;", trailer);
> >  }
> >
> > +/* Line numbers for use by indirect line directives.  */
> > +static vec dbg_line_numbers;
> > +
> > +static void
> > +write_header_declarations (bool gimple, FILE *f)
> > +{
> > +  if (gimple)
> > +fprintf (f, "\nextern int __gimple_dbg_line_numbers[];\n");
> > +  else
> > +fprintf (f, "\nextern int __generic_dbg_line_numbers[];\n");
> > +}
> > +
> > +static void
> > +define_dbg_line_numbers (bool gimple, FILE *f)
> > +{
> > +  if (gimple)
> > +fprintf (f, "\nint __gimple_dbg_line_numbers[%d] = {",
> > +   dbg_line_numbers.length ());
> > +  else
> > +fprintf (f, "\nint __generic_dbg_line_numbers[%d] = {",
> > +   dbg_line_numbers.length ());
> > +
> > +   if (dbg_line_numbers.is_empty ())
> > +{
> > +  fprintf (f, "};\n\n");
> > +  return;
> > +}
> > +
> > +  for (int i = 0; i < (int)dbg_line_numbers.length () - 1; i++)
> > +{
> > +  if (i % 20 == 0)
> > +   fprintf (f, "\n\t");
> > +
> > +  fprintf (f, "%d, ", dbg_line_numbers[i]);
> > +}
> > +  fprintf (f, "%d\n};\n\n", dbg_line_numbers.last ());
> > +}
> > +
> >  static void
> >  output_line_directive (FILE *f, location_t location,
> > -  bool dumpfile = false, bool fnargs = false)
> > + bool dumpfile = false, bool fnargs = false,
> > + bool indirect_line_numbers = false, bool gimple =
> false)
> >  {
> >const line_map_ordinary *map;
> >linemap_resolve_location (line_table, location,
> LRK_SPELLING_LOCATION, &map);
> > @@ -239,7 +278,20 @@ output_line_directive (FILE *f, location_t location,
> > ++file;
> >
> >if (fnargs)
> > -   fprintf (f, "\"%s\", %d", file, loc.line);
> > +  {
> > +  if (indirect_line_numbers)
> > +{
> > +  if (gimple)
> > +  fprintf (f, "\"%s\", __gimple_dbg_line_numbers[%d]",
> > + file, dbg_line_numbers.length ());
> > +  else
> > +  fprintf (f, "\"%s\", __generic_dbg_line_numbers[%d]",
> > + file, dbg_line_numbers.length ());
> > +  dbg_line_numbers.safe_push (loc.line);
> > +}
> > +  else
> > +fprintf (f, "\"%s\", %d", file, loc.line);
> > +  }
> >else
> > fprintf (f, "%s:%d", file, loc.line);
> >  }
> > @@ -3378,7 +3430,8 @@ dt_operand::gen (FILE *f, int indent, bool gimple,
> int depth)
> >  /* Emit a fprintf to the debug file to the file F, with the INDENT from
> > either the RESULT location or the S's match location if RESULT is
> null. */
> >  static void
> > -emit_debug_printf (

[PATCH v1] RISC-V: Fix one comment for binop_frm insn

2023-08-03 Thread Pan Li via Gcc-patches
From: Pan Li 

The previous patch missed the vfsub comment for binop_frm; this
patch fixes that.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Add vfsub.
---
 gcc/config/riscv/riscv-vector-builtins-bases.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 3adc11138a3..36c9aadd19c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
 
 /* Implements below instructions for now.
- vfadd
+   - vfsub
- vfmul
 */
 template
-- 
2.34.1



Re: [PATCH v1] RISC-V: Fix one comment for binop_frm insn

2023-08-03 Thread Kito Cheng via Gcc-patches
lgtm

On Thu, Aug 3, 2023 at 10:32 PM  wrote:
>
> From: Pan Li 
>
> The previous patch missed the vfsub comment for binop_frm, this
> patch would like to fix this.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: Add vfsub.
> ---
>  gcc/config/riscv/riscv-vector-builtins-bases.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index 3adc11138a3..36c9aadd19c 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -277,6 +277,7 @@ public:
>
>  /* Implements below instructions for now.
> - vfadd
> +   - vfsub
> - vfmul
>  */
>  template
> --
> 2.34.1
>


Re: [PATCH 1/2] bpf: Implementation of BPF CO-RE builtins

2023-08-03 Thread Jose E. Marchesi via Gcc-patches


> Jose E. Marchesi writes:
>
>>> This patch updates the support for the BPF CO-RE builtins
>>> __builtin_preserve_access_index and __builtin_preserve_field_info,
>>> and adds support for the CO-RE builtins __builtin_btf_type_id,
>>> __builtin_preserve_type_info and __builtin_preserve_enum_value.
>>>
>>> These CO-RE relocations are now converted to __builtin_core_reloc which
>>> abstracts all of the original builtins in a polymorphic relocation
>>> specific builtin.
>>>
>>> The builtin processing is now split in 2 stages, the first (pack) is
>>> executed right after the front-end and the second (process) right before
>>> the asm output.
>>>
>>> In expand pass the __builtin_core_reloc is converted to a
>>> unspec:UNSPEC_CORE_RELOC rtx entry.
>>>
>>> The data required to process the builtin is now collected in the packing
>>> stage (after front-end), not allowing the compiler to optimize any of
>>> the relevant information required to compose the relocation when
>>> necessary.
>>> At expansion, that information is recovered and CTF/BTF is queried to
>>> construct the information that will be used in the relocation.
>>> At this point the relocation is added to a specific section and the
>>> builtin is expanded to the expected default value for the builtin.
>>>
>>> In order to process __builtin_preserve_enum_value, it was necessary to
>>> hook the front-end to collect the original enum value reference.
>>> This is needed since the parser folds all the enum values to their
>>> integer_cst representation.
>>>
>>> More details can be found within core-builtins.cc.
>>>
>>> Regtested in host x86_64-linux-gnu and target bpf-unknown-none.
>>> ---
>>>  gcc/config.gcc|4 +-
>>>  gcc/config/bpf/bpf-passes.def |   20 -
>>>  gcc/config/bpf/bpf-protos.h   |4 +-
>>>  gcc/config/bpf/bpf.cc |  817 +-
>>>  gcc/config/bpf/bpf.md |   17 +
>>>  gcc/config/bpf/core-builtins.cc   | 1397 +
>>>  gcc/config/bpf/core-builtins.h|   36 +
>>>  gcc/config/bpf/coreout.cc |   50 +-
>>>  gcc/config/bpf/coreout.h  |   13 +-
>>>  gcc/config/bpf/t-bpf  |6 +-
>>>  gcc/doc/extend.texi   |   51 +
>>>  ...core-builtin-fieldinfo-const-elimination.c |   29 +
>>>  12 files changed, 1639 insertions(+), 805 deletions(-)
>>>  delete mode 100644 gcc/config/bpf/bpf-passes.def
>>>  create mode 100644 gcc/config/bpf/core-builtins.cc
>>>  create mode 100644 gcc/config/bpf/core-builtins.h
>>>  create mode 100644 
>>> gcc/testsuite/gcc.target/bpf/core-builtin-fieldinfo-const-elimination.c
>>>
>>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>>> index eba69a463be0..c521669e78b1 100644
>>> --- a/gcc/config.gcc
>>> +++ b/gcc/config.gcc
>>> @@ -1597,8 +1597,8 @@ bpf-*-*)
>>>  use_collect2=no
>>>  extra_headers="bpf-helpers.h"
>>>  use_gcc_stdint=provide
>>> -extra_objs="coreout.o"
>>> -target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc"
>>> +extra_objs="coreout.o core-builtins.o"
>>> +target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc 
>>> \$(srcdir)/config/bpf/core-builtins.cc"
>>>  ;;
>>>  cris-*-elf | cris-*-none)
>>> tm_file="elfos.h newlib-stdint.h ${tm_file}"
>>> diff --git a/gcc/config/bpf/bpf-passes.def b/gcc/config/bpf/bpf-passes.def
>>> deleted file mode 100644
>>> index deeaee988a01..
>>> --- a/gcc/config/bpf/bpf-passes.def
>>> +++ /dev/null
>>> @@ -1,20 +0,0 @@
>>> -/* Declaration of target-specific passes for eBPF.
>>> -   Copyright (C) 2021-2023 Free Software Foundation, Inc.
>>> -
>>> -   This file is part of GCC.
>>> -
>>> -   GCC is free software; you can redistribute it and/or modify it
>>> -   under the terms of the GNU General Public License as published by
>>> -   the Free Software Foundation; either version 3, or (at your option)
>>> -   any later version.
>>> -
>>> -   GCC is distributed in the hope that it will be useful, but
>>> -   WITHOUT ANY WARRANTY; without even the implied warranty of
>>> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> -   General Public License for more details.
>>> -
>>> -   You should have received a copy of the GNU General Public License
>>> -   along with GCC; see the file COPYING3.  If not see
>>> -   .  */
>>> -
>>> -INSERT_PASS_AFTER (pass_df_initialize_opt, 1, pass_bpf_core_attr);
>>> diff --git a/gcc/config/bpf/bpf-protos.h b/gcc/config/bpf/bpf-protos.h
>>> index b484310e8cbf..fbcf5111eb21 100644
>>> --- a/gcc/config/bpf/bpf-protos.h
>>> +++ b/gcc/config/bpf/bpf-protos.h
>>> @@ -30,7 +30,7 @@ extern void bpf_print_operand_address (FILE *, rtx);
>>>  extern void bpf_expand_prologue (void);
>>>  extern void bpf_expand_epilogue (void);
>>>  extern void bpf_expand_cbranch (machine_mode, rtx *)

[PATCH v1] RISC-V: Support RVV VFMACC rounding mode intrinsic API

2023-08-03 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch adds support for the rounding mode API for
VFMACC, covering the samples below.

* __riscv_vfmacc_vv_f32m1_rm
* __riscv_vfmacc_vv_f32m1_rm_m
* __riscv_vfmacc_vf_f32m1_rm
* __riscv_vfmacc_vf_f32m1_rm_m

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfmacc_frm): New class for vfmacc frm.
(vfmacc_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfmacc_frm): New function definition.
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_ternop_insn): Add frm operand support.
* config/riscv/vector.md: Add vfmuladd to frm_mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-multiply-add.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 25 ++
 .../riscv/riscv-vector-builtins-bases.h   |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  2 +
 gcc/config/riscv/riscv-vector-builtins.cc | 22 +++--
 gcc/config/riscv/vector.md|  2 +-
 .../base/float-point-single-multiply-add.c| 47 +++
 6 files changed, 93 insertions(+), 6 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-multiply-add.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 8d689f0c935..204ea77c372 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -355,6 +355,29 @@ public:
   }
 };
 
+/* Implements below instructions for frm
+   - vfmacc
+*/
+class vfmacc_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (true,
+   code_for_pred_mul_scalar (PLUS,
+ e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (true,
+   code_for_pred_mul (PLUS, e.vector_mode ()));
+gcc_unreachable ();
+  }
+};
+
 /* Implements vrsub.  */
 class vrsub : public function_base
 {
@@ -2115,6 +2138,7 @@ static CONSTEXPR const reverse_binop_frm 
vfrdiv_frm_obj;
 static CONSTEXPR const widen_binop vfwmul_obj;
 static CONSTEXPR const widen_binop_frm vfwmul_frm_obj;
 static CONSTEXPR const vfmacc vfmacc_obj;
+static CONSTEXPR const vfmacc_frm vfmacc_frm_obj;
 static CONSTEXPR const vfnmsac vfnmsac_obj;
 static CONSTEXPR const vfmadd vfmadd_obj;
 static CONSTEXPR const vfnmsub vfnmsub_obj;
@@ -2350,6 +2374,7 @@ BASE (vfrdiv_frm)
 BASE (vfwmul)
 BASE (vfwmul_frm)
 BASE (vfmacc)
+BASE (vfmacc_frm)
 BASE (vfnmsac)
 BASE (vfmadd)
 BASE (vfnmsub)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 2d2b52a312c..67d18412b4c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -160,6 +160,7 @@ extern const function_base *const vfrdiv_frm;
 extern const function_base *const vfwmul;
 extern const function_base *const vfwmul_frm;
 extern const function_base *const vfmacc;
+extern const function_base *const vfmacc_frm;
 extern const function_base *const vfnmsac;
 extern const function_base *const vfmadd;
 extern const function_base *const vfnmsub;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index d43b33ded17..3906f2e6248 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -348,6 +348,8 @@ DEF_RVV_FUNCTION (vfnmadd, alu, full_preds, f__ops)
 DEF_RVV_FUNCTION (vfnmadd, alu, full_preds, f_vvfv_ops)
 DEF_RVV_FUNCTION (vfmsub, alu, full_preds, f__ops)
 DEF_RVV_FUNCTION (vfmsub, alu, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfmacc_frm, alu_frm, full_preds, f__ops)
+DEF_RVV_FUNCTION (vfmacc_frm, alu_frm, full_preds, f_vvfv_ops)
 
 // 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
 DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 528dca7ae85..abab06c00ed 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -3730,17 +3730,29 @@ function_expander::use_ternop_insn (bool vd_accum_p, 
insn_code icode)
 }
 
   for (int argno = arg_offset; argno < call_expr_nargs (exp); argno++)
-add_input_operand (argno);
+{
+  if (base->has_rounding_mode_operand_p ()
+ && argno == call_expr_n

Re: [v2 PATCH 1/2] bpf: Implementation of BPF CO-RE builtins

2023-08-03 Thread Jose E. Marchesi via Gcc-patches


Ok.
Thanks!

> From fda9603ded735205b6e20fc5b65a04f8d15685e6 Mon Sep 17 00:00:00 2001
> From: Cupertino Miranda 
> Date: Thu, 6 Apr 2023 15:22:48 +0100
> Subject: [PATCH v2 1/2] bpf: Implementation of BPF CO-RE builtins
>
> This patch updates the support for the BPF CO-RE builtins
> __builtin_preserve_access_index and __builtin_preserve_field_info,
> and adds support for the CO-RE builtins __builtin_btf_type_id,
> __builtin_preserve_type_info and __builtin_preserve_enum_value.
>
> These CO-RE relocations are now converted to __builtin_core_reloc which
> abstracts all of the original builtins in a polymorphic relocation
> specific builtin.
>
> The builtin processing is now split in 2 stages, the first (pack) is
> executed right after the front-end and the second (process) right before
> the asm output.
>
> In expand pass the __builtin_core_reloc is converted to a
> unspec:UNSPEC_CORE_RELOC rtx entry.
>
> The data required to process the builtin is now collected in the packing
> stage (after front-end), not allowing the compiler to optimize any of
> the relevant information required to compose the relocation when
> necessary.
> At expansion, that information is recovered and CTF/BTF is queried to
> construct the information that will be used in the relocation.
> At this point the relocation is added to a specific section and the
> builtin is expanded to the expected default value for the builtin.
>
> In order to process __builtin_preserve_enum_value, it was necessary to
> hook the front-end to collect the original enum value reference.
> This is needed since the parser folds all the enum values to their
> integer_cst representation.
>
> More details can be found within core-builtins.cc.
>
> Regtested in host x86_64-linux-gnu and target bpf-unknown-none.
> ---
>  gcc/config.gcc  |4 +-
>  gcc/config/bpf/bpf-passes.def   |   20 -
>  gcc/config/bpf/bpf-protos.h |4 +-
>  gcc/config/bpf/bpf.cc   |  806 ++
>  gcc/config/bpf/bpf.md   |   17 +
>  gcc/config/bpf/core-builtins.cc | 1394 +++
>  gcc/config/bpf/core-builtins.h  |   35 +
>  gcc/config/bpf/coreout.cc   |   50 +-
>  gcc/config/bpf/coreout.h|   13 +-
>  gcc/config/bpf/t-bpf|6 +-
>  gcc/doc/extend.texi |   51 ++
>  11 files changed, 1595 insertions(+), 805 deletions(-)
>  delete mode 100644 gcc/config/bpf/bpf-passes.def
>  create mode 100644 gcc/config/bpf/core-builtins.cc
>  create mode 100644 gcc/config/bpf/core-builtins.h
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index eba69a463be0..c521669e78b1 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1597,8 +1597,8 @@ bpf-*-*)
>  use_collect2=no
>  extra_headers="bpf-helpers.h"
>  use_gcc_stdint=provide
> -extra_objs="coreout.o"
> -target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc"
> +extra_objs="coreout.o core-builtins.o"
> +target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc 
> \$(srcdir)/config/bpf/core-builtins.cc"
>  ;;
>  cris-*-elf | cris-*-none)
>   tm_file="elfos.h newlib-stdint.h ${tm_file}"
> diff --git a/gcc/config/bpf/bpf-passes.def b/gcc/config/bpf/bpf-passes.def
> deleted file mode 100644
> index deeaee988a01..
> --- a/gcc/config/bpf/bpf-passes.def
> +++ /dev/null
> @@ -1,20 +0,0 @@
> -/* Declaration of target-specific passes for eBPF.
> -   Copyright (C) 2021-2023 Free Software Foundation, Inc.
> -
> -   This file is part of GCC.
> -
> -   GCC is free software; you can redistribute it and/or modify it
> -   under the terms of the GNU General Public License as published by
> -   the Free Software Foundation; either version 3, or (at your option)
> -   any later version.
> -
> -   GCC is distributed in the hope that it will be useful, but
> -   WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   General Public License for more details.
> -
> -   You should have received a copy of the GNU General Public License
> -   along with GCC; see the file COPYING3.  If not see
> -   .  */
> -
> -INSERT_PASS_AFTER (pass_df_initialize_opt, 1, pass_bpf_core_attr);
> diff --git a/gcc/config/bpf/bpf-protos.h b/gcc/config/bpf/bpf-protos.h
> index b484310e8cbf..fbe0d8a0213f 100644
> --- a/gcc/config/bpf/bpf-protos.h
> +++ b/gcc/config/bpf/bpf-protos.h
> @@ -30,7 +30,7 @@ extern void bpf_print_operand_address (FILE *, rtx);
>  extern void bpf_expand_prologue (void);
>  extern void bpf_expand_epilogue (void);
>  extern void bpf_expand_cbranch (machine_mode, rtx *);
> -
> -rtl_opt_pass * make_pass_bpf_core_attr (gcc::context *);
> +const char *bpf_add_core_reloc (rtx *operands, const char *templ);
> +void bpf_replace_core_move_operands (rtx *operands);
>  
>  #endif /* ! GCC_BPF_PROTOS_H */
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> ind

Re: [v2 PATCH 2/2] bpf: CO-RE builtins support tests.

2023-08-03 Thread Jose E. Marchesi via Gcc-patches


OK.
Thanks.

> Hi,
>
> Resending this patch since I noticed a testcase had been added in the
> previous patch; it makes more sense here.
>
> Thanks,
> Cupertino
>
> From 334e9ae0f428f6573f2a5e8a3067a4d181b8b9c5 Mon Sep 17 00:00:00 2001
> From: Cupertino Miranda 
> Date: Thu, 27 Jul 2023 18:05:22 +0100
> Subject: [PATCH v2 2/2] bpf: CO-RE builtins support tests.
>
> This patch adds tests for the following builtins:
>   __builtin_preserve_enum_value
>   __builtin_btf_type_id
>   __builtin_preserve_type_info
> ---
>  .../gcc.target/bpf/core-builtin-enumvalue.c   |  52 +
>  .../bpf/core-builtin-enumvalue_errors.c   |  22 
>  .../bpf/core-builtin-enumvalue_opt.c  |  35 ++
>  ...core-builtin-fieldinfo-const-elimination.c |  29 +
>  .../bpf/core-builtin-fieldinfo-errors-1.c |   2 +-
>  .../bpf/core-builtin-fieldinfo-errors-2.c |   2 +-
>  .../gcc.target/bpf/core-builtin-type-based.c  |  58 ++
>  .../gcc.target/bpf/core-builtin-type-id.c |  40 +++
>  gcc/testsuite/gcc.target/bpf/core-support.h   | 109 ++
>  9 files changed, 347 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue.c
>  create mode 100644 
> gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_errors.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_opt.c
>  create mode 100644 
> gcc/testsuite/gcc.target/bpf/core-builtin-fieldinfo-const-elimination.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-type-based.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-type-id.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/core-support.h
>
> diff --git a/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue.c 
> b/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue.c
> new file mode 100644
> index ..3e3334dc089a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue.c
> @@ -0,0 +1,52 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -dA -gbtf -mco-re" } */
> +
> +#include "core-support.h"
> +
> +extern int *v;
> +
> +int foo(void *data)
> +{
> + int i = 0;
> + enum named_ue64 named_unsigned64 = 0;
> + enum named_se64 named_signed64 = 0;
> + enum named_ue named_unsigned = 0;
> + enum named_se named_signed = 0;
> +
> + v[i++] = bpf_core_enum_value_exists (named_unsigned64, UE64_VAL1);
> + v[i++] = bpf_core_enum_value_exists (enum named_ue64, UE64_VAL2);
> + v[i++] = bpf_core_enum_value_exists (enum named_ue64, UE64_VAL3);
> + v[i++] = bpf_core_enum_value_exists (named_signed64, SE64_VAL1);
> + v[i++] = bpf_core_enum_value_exists (enum named_se64, SE64_VAL2);
> + v[i++] = bpf_core_enum_value_exists (enum named_se64, SE64_VAL3);
> +
> + v[i++] = bpf_core_enum_value (named_unsigned64, UE64_VAL1);
> + v[i++] = bpf_core_enum_value (named_unsigned64, UE64_VAL2);
> + v[i++] = bpf_core_enum_value (named_signed64, SE64_VAL1);
> + v[i++] = bpf_core_enum_value (named_signed64, SE64_VAL2);
> +
> + v[i++] = bpf_core_enum_value_exists (named_unsigned, UE_VAL1);
> + v[i++] = bpf_core_enum_value_exists (enum named_ue, UE_VAL2);
> + v[i++] = bpf_core_enum_value_exists (enum named_ue, UE_VAL3);
> + v[i++] = bpf_core_enum_value_exists (named_signed, SE_VAL1);
> + v[i++] = bpf_core_enum_value_exists (enum named_se, SE_VAL2);
> + v[i++] = bpf_core_enum_value_exists (enum named_se, SE_VAL3);
> +
> + v[i++] = bpf_core_enum_value (named_unsigned, UE_VAL1);
> + v[i++] = bpf_core_enum_value (named_unsigned, UE_VAL2);
> + v[i++] = bpf_core_enum_value (named_signed, SE_VAL1);
> + v[i++] = bpf_core_enum_value (named_signed, SE_VAL2);
> +
> + return 0;
> +}
> +
> +/* { dg-final { scan-assembler-times "\t.4byte\t0x8\t; bpfcr_type 
> \\(named_ue64\\)" 5 } } */
> +/* { dg-final { scan-assembler-times "\t.4byte\t0x9\t; bpfcr_type 
> \\(named_se64\\)" 5} } */
> +/* { dg-final { scan-assembler-times "\t.4byte\t0xb\t; bpfcr_type 
> \\(named_ue\\)" 5 } } */
> +/* { dg-final { scan-assembler-times "\t.4byte\t0xc\t; bpfcr_type 
> \\(named_se\\)" 5} } */
> +/* { dg-final { scan-assembler-times "\t.4byte\t0xa\t; bpfcr_kind" 12 } } 
> BPF_ENUMVAL_EXISTS */
> +/* { dg-final { scan-assembler-times "\t.4byte\t0xb\t; bpfcr_kind" 8 } } 
> BPF_ENUMVAL_VALUE */
> +
> +/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0\"\\)" 8 } } */
> +/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"1\"\\)" 8 } } */
> +/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"2\"\\)" 4 } } */
> diff --git a/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_errors.c 
> b/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_errors.c
> new file mode 100644
> index ..138e99895160
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_errors.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -dA -gbtf -mco-re" } */
> +
> +#include "core-support.h"
> +
> +extern int *v;
> +
> +unsigned long foo(void *data)
> +{
> +  int i = 0;
> +  enum named_ue64 

Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-03 Thread Jeff Law via Gcc-patches




On 8/3/23 08:31, Kito Cheng wrote:

I am working on that; it seems the cost of the vsetvli instruction becomes 0
due to this change, so loop invariant motion no longer hoists vsetvli.

I haven't looked yet (generating baseline rvv.exp data right now).  But
before I went to bed last night I was worried that a change snuck
through that shouldn't have (changing the toplevel INSN/SET cost
handling -- that wasn't supposed to be in the commit).  I was too tired
to verify and correct without possibly mucking it up further.

That'll be the first thing to look at.  The costing change was supposed
to only affect if-then-else constructs, not sets in general.



If so, I think the simplest fix is adding more checks on the set
cost - only check whether the SET_SRC is an if-then-else?
No, the simple fix is to just remove the errant part of the commit :-0 
My tests aren't done, but that does seem to dramatically help.  Given it 
wasn't supposed to go in as-is and it's causing major problems, I'll 
probably just rip it out even though my testing isn't done.





Let me run the regression to see if that works - although the current
vsetvli cost is too high (4x~5x), I think it should be fixed later
with a more complete experiment.
Exactly.  I think we need to do a full audit of the costing paths.  I've 
been slowly devising a way to do that and I'll probably give it to 
Raphael or Jivan once I've fleshed it out a bit more in my head.


The goal is to make sure the costs are sensible and consistent across 
the different interfaces.  A cost failure is actually a bit hard to find 
because all that happens is you get the wrong set of transformations -- 
but the code still works correctly, it's just not as efficient as it 
should be.  It doesn't have to be perfect, but we've clearly got a problem.


WRT vsetvli costing.  That may ultimately be something that's uarch 
dependent.  We're working on the assumption that vsetvlis are common in 
the code stream and they need to be very efficient from the hardware 
standpoint (think as cheap or cheaper than any simple ALU instruction). 
I probably can't say what we're doing, but I bet it wouldn't be a 
surprise to others doing a high performance V implementation.


jeff


RE: [PATCH v1] RISC-V: Fix one comment for binop_frm insn

2023-08-03 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, August 3, 2023 10:35 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Fix one comment for binop_frm insn

lgtm

On Thu, Aug 3, 2023 at 10:32 PM  wrote:
>
> From: Pan Li 
>
> The previous patch missed the vfsub comment for binop_frm, this
> patch would like to fix this.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: Add vfsub.
> ---
>  gcc/config/riscv/riscv-vector-builtins-bases.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index 3adc11138a3..36c9aadd19c 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -277,6 +277,7 @@ public:
>
>  /* Implements below instructions for now.
> - vfadd
> +   - vfsub
> - vfmul
>  */
>  template
> --
> 2.34.1
>


[committed] testsuite, analyzer: add test case [PR108171]

2023-08-03 Thread David Malcolm via Gcc-patches
The ICE in PR analyzer/108171 appears to be a dup of the recently fixed
PR analyzer/110882 and is likewise fixed by it; adding this test case.

Successfully regtested on x86_64-pc-linux-gnu.

Pushed to trunk as r14-2957-gf80efa49b7a163.

gcc/testsuite/ChangeLog:
PR analyzer/108171
* gcc.dg/analyzer/pr108171.c: New test.
---
 gcc/testsuite/gcc.dg/analyzer/pr108171.c | 31 
 1 file changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr108171.c

diff --git a/gcc/testsuite/gcc.dg/analyzer/pr108171.c 
b/gcc/testsuite/gcc.dg/analyzer/pr108171.c
new file mode 100644
index 000..5f7b9fd7875
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr108171.c
@@ -0,0 +1,31 @@
+struct nl_context {
+  void *cmd_private;
+};
+
+struct sfeatures_context {
+  int a;
+  int req_mask[0];
+};
+
+int set_sf_req_mask_idx;
+
+extern void fill_legacy_flag();
+
+void
+fill_sfeatures_bitmap(struct nl_context *nlctx) {
+  while (nlctx) {
+fill_legacy_flag();
+struct nl_context __trans_tmp_1 = *nlctx;
+struct sfeatures_context *sfctx = __trans_tmp_1.cmd_private;
+sfctx->req_mask[set_sf_req_mask_idx] |= 1;
+  }
+}
+
+void
+nl_sfeatures() {
+  struct nl_context nlctx;
+  struct sfeatures_context *sfctx;
+  nlctx.cmd_private = &sfctx;
+  sfctx = 0;
+  fill_sfeatures_bitmap(&nlctx);
+}
-- 
2.26.3



Re: [PATCH] mid-end: Use integral time intervals in timevar.cc

2023-08-03 Thread Matthew Malcomson via Gcc-patches

On 8/3/23 15:09, David Malcolm wrote:


Hi Matthew.  I recently touched the timevar code (in r14-2881-
g75d623946d4b6e) to add support for serializing the timevar data in
JSON form as part of the SARIF output (PR analyzer/109361).

Looking at your patch, it looks like the baseline for the patch seems
to predate r14-2881-g75d623946d4b6e.

I don't have a strong opinion on the implementation choices in your
patch, but please can you rebase to beyond my recent change and make
sure that the SARIF serialization still works with your patch.

Specifically, please try compiling with
   -ftime-report -fdiagnostics-format=sarif-file
and have a look at the generated .sarif file, e.g. via
   python -m json.tool foo.c.sarif
which will pretty-print the JSON to stdout.

Currently I'm writing out the values as floating-point seconds, and
AFAIK my analyzer integration testsuite [1] is the only consumer of
this data.


Hi David,

Thanks for the heads-up.  Will update the patch.

I read your last paragraph as suggesting that you'd be open to changing 
the format.  Is that correct?


I would initially assume that writing out the time as floating-point
seconds would still be most convenient for your use, since it looks like
something to be presented to a person.


However, since I don't know much about the intended uses of SARIF in 
general I figured I should double-check -- does that choice to remain 
printing out floating-point seconds seem best to you?




[...snip...]

Thanks
Dave
[1]
https://github.com/davidmalcolm/gcc-analyzer-integration-tests/issues/5





Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-03 Thread Kito Cheng via Gcc-patches
> >> That'll be the first thing to look at.  The costing change was supposed
> >> to only affect if-then-else constructs, not sets in general.
> >
> >
> > If so, I think the most simple fix is adding more checks on the set
> > cost - only check the SET_SRC is if-then-else?
> No, the simple fix is to just remove the errant part of the commit :-0
> My tests aren't done, but that does seem to dramatically help.  Given it
> wasn't supposed to go in as-is and it's causing major problems, I'll
> probably just rip it out even though my testing isn't done.

OK, so I'm going to retreat from there, I've another lld issue that
needs to be fixed before the LLVM 17 release :)

>
> >
> > Let me run the regression to see if that works - although the current
> > vsetvli cost is too high (4x~5x), I think it should be fixed later
> > with a more complete experiment.
> Exactly.  I think we need to do a full audit of the costing paths.  I've
> been slowly devising a way to do that and I'll probably give it to
> Raphael or Jivan once I've fleshed it out a bit more in my head.
>
> The goal is to make sure the costs are sensible and consistent across
> the different interfaces.  A cost failure is actually a bit hard to find
> because all that happens is you get the wrong set of transformations --
> but the code still works correctly, it's just not as efficient as it
> should be.  It doesn't have to be perfect, but we've clearly got a problem.
>
> WRT vsetvli costing.  That may ultimately be something that's uarch
> dependent.  We're working on the assumption that vsetvlis are common in
> the code stream and they need to be very efficient from the hardware
> standpoint (think as cheap or cheaper than any simple ALU instruction).
> I probably can't say what we're doing, but I bet it wouldn't be a
> surprise to others doing a high performance V implementation.

Yeah, it should be cheap, but it might be expensive on some HW implementations;
anyway, our cost model really needs to be tidied up at some point... :P

> jeff


Re: [PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2023-08-03 Thread Qing Zhao via Gcc-patches


> On Aug 3, 2023, at 3:10 AM, Richard Biener  wrote:
> 
> On Mon, Jul 10, 2023 at 9:12 PM Qing Zhao via Gcc-patches
>  wrote:
>> 
>> Hi,
>> 
>> This is the change for the GCC14 releaes Notes on the deprecating of a C
>> extension about flexible array members.
>> 
>> Okay for committing?
>> 
>> thanks.
>> 
>> Qing
>> 
>> 
>> 
>> *htdocs/gcc-14/changes.html (Caveats): Add notice about deprecating a C
>> extension about flexible array members.
>> ---
>> htdocs/gcc-14/changes.html | 10 +-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>> 
>> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
>> index 3f797642..c7f2ce4d 100644
>> --- a/htdocs/gcc-14/changes.html
>> +++ b/htdocs/gcc-14/changes.html
>> @@ -30,7 +30,15 @@ a work-in-progress.
>> 
>> Caveats
>> 
>> -  ...
>> +  C:
>> +  Support for the GCC extension, a structure containing a C99 flexible 
>> array
>> +  member, or a union containing such a structure, is not the last field 
>> of
>> +  another structure, is deprecated. Refer to
>> +  https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html";>
>> +  Zero Length Arrays.
>> +  Any code relying on this extension should be modifed to ensure that
>> +  C99 flexible array members only end up at the ends of structures.
> 
> If it's deprecated any use should be diagnosed by default with a
> warning, can you
> mention that and make sure we do so?

I have added such a warning option, -Wflex-array-member-not-at-end,
in commit https://gcc.gnu.org/pipermail/gcc-cvs/2023-June/385730.html

I should add this warning option to the release notes. I will do that.


Another thing: I just realized that the documentation of this new option is
missing from invoke.texi.

I will send a new patch for the documentation of this new option first.

>  What would be the most surprising
> example of code that's going to be rejected?

  struct flex { int length; char data[]; };

  struct mid_flex { int m; struct flex flex_data; int n; };

In the above example, the second structure, mid_flex, will be warned about.

Qing
 
> 
> Richard.
> 
>> +  
>> 
>> 
>> 
>> --
>> 2.31.1



[committed][RISC-V] Remove errant hunk of code

2023-08-03 Thread Jeff Law via Gcc-patches


I'm using this hunk locally to more thoroughly exercise the zicond paths 
due to inaccuracies elsewhere in the costing model.  It was never 
supposed to be part of the costing commit though.  And as we've seen 
it's causing problems with the vector bits.


While my testing isn't complete, this hunk was never supposed to be 
pushed and it's causing problems.  So I'm just ripping it out.


There's a bigger TODO in this space WRT a top-to-bottom evaluation of 
the costing on RISC-V.  I'm still formulating what that evaluation is 
going to look like, so don't hold your breath waiting on it.


Pushed to the trunk.
commit d61efa3cd3378be38738bfb5139925d1505c1325
Author: Jeff Law 
Date:   Thu Aug 3 10:57:23 2023 -0400

[committed][RISC-V] Remove errant hunk of code

I'm using this hunk locally to more thoroughly exercise the zicond paths
due to inaccuracies elsewhere in the costing model.  It was never
supposed to be part of the costing commit though.  And as we've seen
it's causing problems with the vector bits.

While my testing isn't complete, this hunk was never supposed to be
pushed and it's causing problems.  So I'm just ripping it out.

There's a bigger TODO in this space WRT a top-to-bottom evaluation of
the costing on RISC-V.  I'm still formulating what that evaluation is
going to look like, so don't hold your breath waiting on it.

Pushed to the trunk.

gcc/

* config/riscv/riscv.cc (riscv_rtx_costs): Remove errant hunk from
recent commit.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9e75450aa97..d8fab68dbb4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2913,16 +2913,6 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
}
   return false;
 
-case SET:
-  /* A simple SET with a register destination takes its cost solely from
-the SET_SRC operand.  */
-  if (outer_code == INSN && REG_P (SET_DEST (x)))
-   {
- *total = riscv_rtx_costs (SET_SRC (x), mode, SET, opno, total, speed);
- return true;
-   }
-  return false;
-
 default:
   return false;
 }


Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-03 Thread Jeff Law via Gcc-patches




On 8/3/23 08:56, Kito Cheng wrote:

That'll be the first thing to look at.  The costing change was supposed
to only affect if-then-else constructs, not sets in general.



If so, I think the most simple fix is adding more checks on the set
cost - only check the SET_SRC is if-then-else?

No, the simple fix is to just remove the errant part of the commit :-0
My tests aren't done, but that does seem to dramatically help.  Given it
wasn't supposed to go in as-is and it's causing major problems, I'll
probably just rip it out even though my testing isn't done.


OK, so I'm going to retreat from there, I've another lld issue that
needs to be fixed before the LLVM 17 release :)

Reversion of errant hunk has been pushed.  Sorry for the problems folks.

Had I known it was going to have this kind of fallout, I would have 
slammed a coke and fixed it last night before going to sleep :-0



And yes, focusing on the lld issue seems wise given what I'm hearing in 
the LLVM meeting.


Jeff


Re: Fix profile upate after vectorizer peeling

2023-08-03 Thread Aldy Hernandez via Gcc-patches




On 8/3/23 16:29, Jeff Law wrote:



On 8/3/23 08:23, Jan Hubicka wrote:

Jeff, any help would be appreciated here :)

I will try to debug this.  One option would be to disable branch
prediction on vect_check for the time being - it is not inlined anyway.
Not a lot of insight.  The backwards threader uses a totally different
API for the CFG/SSA updates, and I don't think that API has made any
significant effort to keep the profile up-to-date.


OK, but at least some hints about where the missing profile update
should be would be good.  There is update_profile in
tree-ssa-threadupdate, and understanding what is missing would be nice.
In general it would be nice to mind the profile when updating the CFG :)
The backwards threader doesn't use much of the code in
tree-ssa-threadupdate IIRC.  The bulk of the work for the backwards
threader is done by copy_bbs.  I've actually suggested those two
implementations be totally separated from each other to avoid confusion.
I just haven't had the time to do it (or much of anything with
threading) myself.


A couple cycles ago I separated most of code to distinguish between the 
back and forward threaders.  There is class jt_path_registry that is 
common to both, and {fwd,back}_jt_path_registry for the forward and 
backward threaders respectively.  It's not perfect, but it's a start.


Aldy



Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.

2023-08-03 Thread Jeff Law via Gcc-patches




On 8/3/23 01:04, Richard Biener wrote:

On Wed, Aug 2, 2023 at 4:08 PM Manolis Tsamis  wrote:


Hi all,

I'm pinging to discuss again if we want to move this forward for GCC14.

I did some testing again and I haven't been able to find obvious
regressions, including testing the code from PR86270 and PR70359 that
Richard mentioned.
I still believe that zero can be considered a special case even for
hardware that doesn't directly benefit in the comparison.
For example it happens that the testcase from the commit compiles to
one instruction less in x86:

.LFB0:
        movl    (%rdi), %eax
        leal    1(%rax), %edx
        movl    %edx, (%rdi)
        testl   %eax, %eax
        je      .L4
        ret
.L4:
        jmp     g

vs

.LFB0:
        movl    (%rdi), %eax
        addl    $1, %eax
        movl    %eax, (%rdi)
        cmpl    $1, %eax
        je      .L4
        ret
.L4:
        xorl    %eax, %eax
        jmp     g

(The xorl is not emitted when testl is used.  LLVM uses testl but also
does xor eax, eax :) )
Although this is accidental, I believe it also showcases that zero is
a preferential value in various ways.

I'm running benchmarks comparing the effects of this change and I'm
also still looking for testcases that result in problematic
regressions.
Any feedback or other concerns about this are appreciated!


My comment from Apr 24th still holds, IMO this is something for
instruction selection (aka the ISEL pass) or the out-of-SSA tweaks
we do during RTL expansion (see insert_backedge_copies)
I'm still generally supportive of biasing to zero, but as Richi has 
noted the current implementation needs to be pushed further back into 
the pipeline, preferably all the way to isel or gimple->rtl expansion.


Jeff


Re: [PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-03 Thread Eric Feng via Gcc-patches
On Wed, Aug 2, 2023 at 5:09 PM David Malcolm  wrote:
>
> On Wed, 2023-08-02 at 14:46 -0400, Eric Feng wrote:
> > On Wed, Aug 2, 2023 at 1:20 PM Marek Polacek 
> > wrote:
> > >
> > > On Wed, Aug 02, 2023 at 12:59:28PM -0400, David Malcolm wrote:
> > > > On Wed, 2023-08-02 at 12:20 -0400, Eric Feng wrote:
> > > >
>
> [Dropping Joseph and Marek from the CC]
>
> [...snip...]
>
> >
> >
> > Thank you, everyone. I've submitted a new patch with the described
> > changes.
>
> Thanks.
>
> > As I do not yet have write access, could someone please help
> > me commit it?
>
> I've pushed the v3 patch to trunk, as r14-2933-gfafe2d18f791c6; you can
> see it at [1], so you're now officially a GCC contributor,
> congratulations!
>
> FWIW I had to do a little whitespace fixing on the ChangeLog entries
> before the server-side hooks.commit-extra-checker would pass, as they
> were indented with spaces, rather than tabs, so it complained thusly:
>
> remote: *** The following commit was rejected by your 
> hooks.commit-extra-checker script (status: 1)
> remote: *** commit: 0a4a2dc7dad1dfe22be0b48fe0d8c50d216c8349
> remote: *** ChangeLog format failed:
> remote: *** ERR: line should start with a tab: "PR analyzer/107646"
> remote: *** ERR: line should start with a tab: "* 
> analyzer-language.cc (run_callbacks): New function."
> remote: *** ERR: line should start with a tab: "
> (on_finish_translation_unit): New function."
> remote: *** ERR: line should start with a tab: "* analyzer-language.h 
> (GCC_ANALYZER_LANGUAGE_H): New include."
> remote: *** ERR: line should start with a tab: "(class 
> translation_unit): New vfuncs."
> remote: *** ERR: line should start with a tab: "PR analyzer/107646"
> remote: *** ERR: line should start with a tab: "* c-parser.cc: New 
> functions on stashing values for the"
> remote: *** ERR: line should start with a tab: "  analyzer."
> remote: *** ERR: line should start with a tab: "PR analyzer/107646"
> remote: *** ERR: line should start with a tab: "* 
> gcc.dg/plugin/plugin.exp: Add new plugin and test."
> remote: *** ERR: line should start with a tab: "* 
> gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin."
> remote: *** ERR: line should start with a tab: "* 
> gcc.dg/plugin/cpython-plugin-test-1.c: New test."
> remote: *** ERR: PR 107646 in subject but not in changelog: "analyzer: stash 
> values for CPython plugin [PR107646]"
> remote: ***
> remote: *** Please see: https://gcc.gnu.org/codingconventions.html#ChangeLogs
> remote: ***
> remote: error: hook declined to update refs/heads/master
> To git+ssh://gcc.gnu.org/git/gcc.git
>  ! [remote rejected] master -> master (hook declined)
> error: failed to push some refs to 
> 'git+ssh://dmalc...@gcc.gnu.org/git/gcc.git'
>
> ...but this was a trivial fix.  You can test that patches are properly
> formatted by running:
>
>   ./contrib/gcc-changelog/git_check_commit.py HEAD
>
> locally.
Sorry about that — will do. Thanks!
>
>
> >  Otherwise, please let me know if I should request write
> > access first (the GettingStarted page suggested requesting someone
> > commit the patch for the first few patches before requesting write
> > access).
>
> Please go ahead and request write access now; we should have done this
> in the "community bonding" phase of GSoC; sorry for not catching this.
Sounds good.
>
> Thanks again for the patch.  How's the followup work?  Are you close to
> being able to post one or more of the simpler known_function
> subclasses?
Yes, I will submit another patch for review very soon. Thank you for
helping me push this one!

Best,
Eric
>
> Dave
>
> [1]
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=fafe2d18f791c6b97b49af7c84b1b5703681c3af
>


Re: [COMMITTEDv3] tree-optimization: [PR100864] `(a&!b) | b` is not opimized to `a | b` for comparisons

2023-08-03 Thread Andrew Pinski via Gcc-patches
On Thu, Aug 3, 2023 at 4:58 AM Mikael Morin  wrote:
>
> Hello,
>
> Le 31/07/2023 à 19:07, Andrew Pinski via Gcc-patches a écrit :
> > diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> > index a71c0727b0b..ddaf22f2179 100644
> > --- a/gcc/generic-match-head.cc
> > +++ b/gcc/generic-match-head.cc
> > @@ -121,3 +121,45 @@ bitwise_equal_p (tree expr1, tree expr2)
> >   return wi::to_wide (expr1) == wi::to_wide (expr2);
> > return operand_equal_p (expr1, expr2, 0);
> >   }
> > +
> > +/* Return true if EXPR1 and EXPR2 have the bitwise opposite value,
> > +   but not necessarily same type.
> > +   The types can differ through nop conversions.  */
> > +
> > +static inline bool
> > +bitwise_inverted_equal_p (tree expr1, tree expr2)
> > +{
> > +  STRIP_NOPS (expr1);
> > +  STRIP_NOPS (expr2);
> > +  if (expr1 == expr2)
> > +return false;
> > +  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
> > +return false;
> > +  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
> > +return wi::to_wide (expr1) == ~wi::to_wide (expr2);
> > +  if (operand_equal_p (expr1, expr2, 0))
> > +return false;
> > +  if (TREE_CODE (expr1) == BIT_NOT_EXPR
> > +  && bitwise_equal_p (TREE_OPERAND (expr1, 0), expr2))
> > +return true;
> > +  if (TREE_CODE (expr2) == BIT_NOT_EXPR
> > +  && bitwise_equal_p (expr1, TREE_OPERAND (expr2, 0)))
> > +return true;
> > +  if (COMPARISON_CLASS_P (expr1)
> > +  && COMPARISON_CLASS_P (expr2))
> > +{
> > +  tree op10 = TREE_OPERAND (expr1, 0);
> > +  tree op20 = TREE_OPERAND (expr2, 0);
> > +  if (!operand_equal_p (op10, op20))
> > + return false;
> > +  tree op11 = TREE_OPERAND (expr1, 1);
> > +  tree op21 = TREE_OPERAND (expr2, 1);
> > +  if (!operand_equal_p (op11, op21))
> > + return false;
> > +  if (invert_tree_comparison (TREE_CODE (expr1),
> > +   HONOR_NANS (op10))
> > +   == TREE_CODE (expr2))
> > + return true;
>
> So this is trying to match a == b against a != b, or a < b against a >=
> b, or similar; correct?
> Shouldn't this be completed with "crossed" checks, that is match a == b
> against b != a, or a < b against b <= a, etc?  Or is there some
> canonicalization making that redundant?

There is some canonicalization that happens, so you don't need to
do the cross checking.
tree_swap_operands_p defines that order: lower SSA names are always
first operands, and constants are always last.

Thanks,
Andrew


>
> I have given up determining whether these cases were already covered by
> the test or not.
>
> Mikael
>
>


Re: [PATCH] mid-end: Use integral time intervals in timevar.cc

2023-08-03 Thread David Malcolm via Gcc-patches
On Thu, 2023-08-03 at 15:54 +0100, Matthew Malcomson wrote:
> On 8/3/23 15:09, David Malcolm wrote:
> > 
> > Hi Matthew.  I recently touched the timevar code (in r14-2881-
> > g75d623946d4b6e) to add support for serializing the timevar data in
> > JSON form as part of the SARIF output (PR analyzer/109361).
> > 
> > Looking at your patch, it looks like the baseline for the patch
> > seems
> > to predate r14-2881-g75d623946d4b6e.
> > 
> > I don't have a strong opinion on the implementation choices in your
> > patch, but please can you rebase to beyond my recent change and
> > make
> > sure that the SARIF serialization still works with your patch.
> > 
> > Specifically, please try compiling with
> >    -ftime-report -fdiagnostics-format=sarif-file
> > and have a look at the generated .sarif file, e.g. via
> >    python -m json.tool foo.c.sarif
> > which will pretty-print the JSON to stdout.
> > 
> > Currently I'm writing out the values as floating-point seconds, and
> > AFAIK my analyzer integration testsuite [1] is the only consumer of
> > this data.
> 
> Hi David,
> 
> Thanks for the heads-up.  Will update the patch.
> 
> I read your last paragraph as suggesting that you'd be open to
> changing the format.  Is that correct?

I suppose, but I'd prefer to keep the existing format.

> 
> I would initially assume that writing out the time as floating-point
> seconds would still be most convenient for your use, since it looks to
> be something to be presented to a person.

Yes.  I may be biased in that with -fanalyzer the times tend to be
measured in seconds rather than fractions of seconds, alas.

> 
> However, since I don't know much about the intended uses of SARIF in
> general, I figured I should double-check -- does the choice to keep
> printing floating-point seconds seem best to you?

I'd prefer to keep the JSON output as floating-point seconds, if that's
not too much of a pain.

Dave



> 
> > 
> > [...snip...]
> > 
> > Thanks
> > Dave
> > [1]
> > https://github.com/davidmalcolm/gcc-analyzer-integration-tests/issues/5
> > 
> 



Re: [PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-03 Thread David Malcolm via Gcc-patches
On Thu, 2023-08-03 at 11:28 -0400, Eric Feng wrote:
> On Wed, Aug 2, 2023 at 5:09 PM David Malcolm 
> wrote:
> > 
> > On Wed, 2023-08-02 at 14:46 -0400, Eric Feng wrote:
> > 

[...snip...]

> > 
> > >  Otherwise, please let me know if I should request write
> > > access first (the GettingStarted page suggested requesting
> > > someone
> > > commit the patch for the first few patches before requesting
> > > write
> > > access).
> > 
> > Please go ahead and request write access now; we should have done
> > this
> > in the "community bonding" phase of GSoC; sorry for not catching
> > this.
> Sounds good.

FWIW once you have an @gcc.gnu.org account, I'd like to set you as the
"assignee" of PR107646 in bugzilla.

[...snip...]

Dave



Re: One question on the source code of tree-object-size.cc

2023-08-03 Thread Siddhesh Poyarekar

On 2023-08-02 10:02, Qing Zhao wrote:

   /* When checking the observed access q->array, we only have info on the
  observed access, i.e., the TYPE_SIZE info from the access.  We don't
  have info on the whole object.  */
   expect(__builtin_dynamic_object_size(q->array, 1), q->foo * sizeof(int));
   expect(__builtin_dynamic_object_size(q->array, 0), -1);
   expect(__builtin_dynamic_object_size(q->array, 3), q->foo * sizeof(int));
   expect(__builtin_dynamic_object_size(q->array, 2), 0);
   /* When checking the pointer q, we have no observed allocation nor any
  observed access; therefore, we cannot determine the size info here.  */
   expect(__builtin_dynamic_object_size(q, 1), -1);
   expect(__builtin_dynamic_object_size(q, 0), -1);
   expect(__builtin_dynamic_object_size(q, 3), 0);
   expect(__builtin_dynamic_object_size(q, 2), 0);


I'm wondering if we could return sizeof (*q) + q->foo for __bdos(q, 0),
but I suppose it could mean generating code that potentially
dereferences an invalid pointer.  Surely we could emit that for
__bdos(q->array, 0) though, couldn't we?


Thanks,
Sid


[PATCH] Add documentation for -Wflex-array-member-not-at-end.

2023-08-03 Thread Qing Zhao via Gcc-patches
When adding the option -Wflex-array-member-not-at-end in the commit
https://gcc.gnu.org/pipermail/gcc-cvs/2023-June/385730.html

the documentation for this new option was missing.

This patch is to add the documentation for this warning option.

bootstrapped and also checked the documentation, no issue.

Okay for committing?

thanks.

Qing

==


'-Wflex-array-member-not-at-end'
 Warn when a structure containing a C99 flexible array member as the
 last field is not at the end of another structure.  This warning
 warns e.g.  about

  struct flex  { int length; char data[]; };
  struct mid_flex { int m; struct flex flex_data; int n; };

gcc/ChangeLog:

* doc/invoke.texi (-Wflex-array-member-not-at-end): Document
new option.
---
 gcc/doc/invoke.texi | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index adb10a3528da..0e7d827d355f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -357,6 +357,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wempty-body  -Wno-endif-labels  -Wenum-compare  -Wenum-conversion
 -Wenum-int-mismatch
 -Werror  -Werror=*  -Wexpansion-to-defined  -Wfatal-errors
+-Wflex-array-member-not-at-end
 -Wfloat-conversion  -Wfloat-equal  -Wformat  -Wformat=2
 -Wno-format-contains-nul  -Wno-format-extra-args
 -Wformat-nonliteral  -Wformat-overflow=@var{n}
@@ -9312,6 +9313,18 @@ value, like assigning a signed integer expression to an 
unsigned
 integer variable. An explicit cast silences the warning. In C, this
 option is enabled also by @option{-Wconversion}.
 
+@opindex Wflex-array-member-not-at-end
+@opindex Wno-flex-array-member-not-at-end
+@item -Wflex-array-member-not-at-end
+Warn when a structure containing a C99 flexible array member as the last
+field is not at the end of another structure.
+This warning warns e.g. about
+
+@smallexample
+struct flex  @{ int length; char data[]; @};
+struct mid_flex @{ int m; struct flex flex_data; int n; @};
+@end smallexample
+
 @opindex Wfloat-conversion
 @opindex Wno-float-conversion
 @item -Wfloat-conversion
-- 
2.31.1


