Re: [PATCH] Control flow redundancy hardening

2022-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 7, 2022 at 10:04 PM Alexandre Oliva via Gcc-patches
 wrote:
>
>
> This patch introduces an optional hardening pass to catch unexpected
> execution flows.  Functions are transformed so that basic blocks set a
> bit in an automatic array, and (non-exceptional) function exit edges
> check that the bits in the array represent an expected execution path
> in the CFG.
>
> Functions with multiple exit edges, or with too many blocks, call an
> out-of-line checker builtin implemented in libgcc.  For simpler
> functions, the verification is performed in-line.
>
> -fharden-control-flow-redundancy enables the pass for eligible
> functions, --param hardcfr-max-blocks sets a block count limit for
> functions to be eligible, and --param hardcfr-max-inline-blocks
> tunes the "too many blocks" limit for in-line verification.
>
> Regstrapped on x86_64-linux-gnu.  Also bootstrapped with a patchlet that
> enables it by default, with --param hardcfr-max-blocks=32.  Ok to
> install?

I'm possibly missing the importance of 'redundancy' in -fharden-control-flow
but how can you, from a set of visited blocks local to a function, determine
whether the control flow through the function is "expected" (maybe I'm
missing the fine detail on "expected" as well ;))?  Can you elaborate on
what kind of "derailed" control flow this catches (example?) and what
cases it does not?  I'm also curious as of how this compares to hardware
mitigations like x86 indirect branch tracking and shadow stack and how
this relates to the LLVM control flow hardening (ISTR such thing exists).

Richard.

> for  gcc/ChangeLog
>
> * Makefile.in (OBJS): Add gimple-harden-control-flow.o.
> * builtins.def (BUILT_IN___HARDCFR_CHECK): New.
> * common.opt (fharden-control-flow-redundancy): New.
> * doc/invoke.texi (fharden-control-flow-redundancy): New.
> (hardcfr-max-blocks, hardcfr-max-inline-blocks): New params.
> * gimple-harden-control-flow.cc: New.
> * params.opt (-param=hardcfr-max-blocks=): New.
> (-param=hradcfr-max-inline-blocks=): New.
> * passes.def (pass_harden_control_flow_redundancy): Add.
> * tree-pass.h (make_pass_harden_control_flow_redundancy):
> Declare.
>
> for  gcc/testsuite/ChangeLog
>
> * c-c++-common/torture/harden-cfr.c: New.
> * c-c++-common/torture/harden-abrt.c: New.
> * c-c++-common/torture/harden-bref.c: New.
> * c-c++-common/torture/harden-tail.c: New.
> * gnat.dg/hardcfr.adb: New.
>
> for  libgcc/ChangeLog
>
> * Makefile.in (LIB2ADD): Add hardcfr.c.
> * hardcfr.c: New.
> ---
>  gcc/Makefile.in|1
>  gcc/builtins.def   |3
>  gcc/common.opt |4
>  gcc/doc/invoke.texi|   19 +
>  gcc/gimple-harden-control-flow.cc  |  713 
> 
>  gcc/params.opt |8
>  gcc/passes.def |1
>  .../c-c++-common/torture/harden-cfr-abrt.c |   11
>  .../c-c++-common/torture/harden-cfr-bret.c |   11
>  .../c-c++-common/torture/harden-cfr-tail.c |   17
>  gcc/testsuite/c-c++-common/torture/harden-cfr.c|   81 ++
>  gcc/testsuite/gnat.dg/hardcfr.adb  |   76 ++
>  gcc/tree-pass.h|2
>  libgcc/Makefile.in |3
>  libgcc/hardcfr.c   |  176 +
>  15 files changed, 1126 insertions(+)
>  create mode 100644 gcc/gimple-harden-control-flow.cc
>  create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cfr-abrt.c
>  create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cfr-bret.c
>  create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cfr-tail.c
>  create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cfr.c
>  create mode 100644 gcc/testsuite/gnat.dg/hardcfr.adb
>  create mode 100644 libgcc/hardcfr.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 3ae237024265c..2a15e6ecf0802 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1403,6 +1403,7 @@ OBJS = \
> gimple-iterator.o \
> gimple-fold.o \
> gimple-harden-conditionals.o \
> +   gimple-harden-control-flow.o \
> gimple-laddress.o \
> gimple-loop-interchange.o \
> gimple-loop-jam.o \
> diff --git a/gcc/builtins.def b/gcc/builtins.def
> index 005976f34e913..b987f9af425fd 100644
> --- a/gcc/builtins.def
> +++ b/gcc/builtins.def
> @@ -1055,6 +1055,9 @@ DEF_GCC_BUILTIN (BUILT_IN_FILE, "FILE", 
> BT_FN_CONST_STRING, ATTR_NOTHROW_LEAF_LI
>  DEF_GCC_BUILTIN (BUILT_IN_FUNCTION, "FUNCTION", BT_FN_CONST_STRING, 
> ATTR_NOTHROW_LEAF_LIST)
>  DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, ATTR_NOTHROW_LEAF_LIST)
>
> +/* Control Flow Redundancy hardening out-of-line checker.  */
> +DEF_BU

Re: [RFA] Improve initialization of objects when the initializer has trailing zeros.

2022-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 7, 2022 at 4:46 PM Jeff Law via Gcc-patches
 wrote:
>
> This is an update to a patch originally posted by Takayuki Suwa a few
> months ago.
>
> When we initialize an array from a STRING_CST we perform the
> initialization in two steps.  The first step copies the STRING_CST to
> the destination.  The second step uses clear_storage to initialize
> storage in the array beyond TREE_STRING_LENGTH of the initializer.
>
> Takayuki's patch added a special case when the STRING_CST itself was all
> zeros which would avoid the copy from the STRING_CST and instead do all
> the initialization via clear_storage which is clearly more runtime
> efficient.
>
> Richie had the suggestion that instead of special casing when the entire
> STRING_CST was NULs  to instead identify when the tail of the STRING_CST
> was NULs.   That's more general and handles Takayuki's case as well.
>
> Bootstrapped and regression tested on x86_64-linux-gnu.  Given I rewrote
> Takayuki's patch I think it needs someone else to review rather than
> self-approving.
>
> OK for the trunk?

OK.

Thanks,
Richard.

> Jeff
>


[x86 PATCH] Fun with flags: Adding stc/clc instructions to i386.md.

2022-07-08 Thread Roger Sayle

This patch adds support for x86's single-byte encoded stc (set carry flag)
and clc (clear carry flag) instructions to i386.md.

The motivating example is the simple code snippet:

unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
{
  return __builtin_ia32_addcarryx_u32 (1, a, b, c);
}

which uses the target built-in to generate an adc instruction, adding
together A and B with the incoming carry flag already set.  Currently
for this mainline GCC generates (with -O2):

movl$1, %eax
addb$-1, %al
adcl%esi, %edi
setc%al
movl%edi, (%rdx)
movzbl  %al, %eax
ret

where the first two instructions (to load 1 into a byte register and
then add 255 to it) are the idiom used to set the carry flag.  This
is a little inefficient as x86 has a "stc" instruction for precisely
this purpose.  With the attached patch we now generate:

stc
adcl%esi, %edi
setc%al
movl%edi, (%rdx)
movzbl  %al, %eax
ret

The central part of the patch is the addition of x86_stc and x86_clc
define_insns, represented as "(set (reg:CCC FLAGS_REG) (const_int 1))"
and "(set (reg:CCC FLAGS_REG) (const_int 0))" respectively, then using
x86_stc appropriately in the ix86_expand_builtin.

Alas this change exposes two latent bugs/issues in the compiler.
The first is that there are several peephole2s in i386.md that propagate
the flags register, but take its mode from the SET_SRC rather than
preserve the mode of the original SET_DEST.  The other, which is
being discussed with Segher, is that the middle-end's simplify-rtx
inappropriately tries to interpret/optimize MODE_CC comparisons,
converting the above adc into an add, as it mistakenly believes
(ltu:SI (const_int 1) (const_int 0))" is always const0_rtx even when
the mode of the comparison is MODE_CCC.

I believe Segher will review (and hopefully approve) the middle-end
chunk of this patch independently, but hopefully this backend patch
provides the necessary context to explain why that change is needed.


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32} with
no new failures.  Ok for mainline?


2022-07-08  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_builtin) :
Use new x86_stc or negqi_ccc_1 instructions to set the carry flag.
* config/i386/i386.md (x86_clc): New define_insn.
(x86_stc): Likewise, new define_insn to set the carry flag.
(*setcc_qi_negqi_ccc_1_): New define_insn_and_split to
recognize (and eliminate) the carry flag being copied to itself.
(neg_ccc_1): Renamed from *neg_ccc_1 for gen function.
(define_peephole2): Use match_operand of flags_reg_operand to
capture and preserve the mode of FLAGS_REG.
(define_peephole2): Likewise.

* simplify-rtx.cc (simplify_const_relational_operation): Handle
case where both operands of a MODE_CC comparison have been
simplified to constant integers.

gcc/testsuite/ChangeLog
* gcc.target/i386/stc-1.c: New test case.


Thanks in advance (both Uros and Segher),
Roger
--

> -Original Message-
> From: Segher Boessenkool 
> Sent: 07 July 2022 23:39
> To: Roger Sayle 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Be careful with MODE_CC in
> simplify_const_relational_operation.
> 
> Hi!
> 
> On Thu, Jul 07, 2022 at 10:08:04PM +0100, Roger Sayle wrote:
> > I think it's fair to describe RTL's representation of condition flags
> > using MODE_CC as a little counter-intuitive.
> 
> "A little challenging", and you should see that as a good thing, as a
puzzle to
> crack :-)
> 
> > For example, the i386
> > backend represents the carry flag (in adc instructions) using RTL of
> > the form "(ltu:SI (reg:CCC) (const_int 0))", where great care needs to
> > be taken not to treat this like a normal RTX expression, after all LTU
> > (less-than-unsigned) against const0_rtx would normally always be
> > false.
> 
> A comparison of a MODE_CC thing against 0 means the result of a
> *previous* comparison (or other cc setter) is looked at.  Usually it
simply looks
> at some condition bits in a flags register.  It does not do any actual
comparison:
> that has been done before (if at all even).
> 
> > Hence, MODE_CC comparisons need to be treated with caution, and
> > simplify_const_relational_operation returns early (to avoid
> > problems) when GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC.
> 
> Not just to avoid problems: there simply isn't enough information to do a
> correct job.
> 
> > However, consider the (currently) hypothetical situation, where the
> > RTL optimizers determine that a previous instruction unconditionally
> > sets or clears the carry flag, and this gets propagated by combine
> > into the above expression, we'd end up with something that looks like
> > (ltu:SI (const_int 1) (const_int 0))

Re: [PATCH] Fix tree-opt/PR106087: ICE with inline-asm with multiple output and assigned only static vars

2022-07-08 Thread Richard Biener via Gcc-patches
On Fri, Jul 8, 2022 at 6:33 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> The problem here is that when we mark the ssa name that was referenced in the 
> now removed
> dead store (to a write only static variable), the inline-asm would also be 
> removed
> even though it was defining another ssa name. This fixes the problem by 
> checking
> to make sure that the statement was only defining one ssa name.
>
> OK? Bootstrapped and tested on x86_64 with no regressions.
>
> PR tree-optimization/106087
>
> gcc/ChangeLog:
>
> * tree-ssa-dce.cc (simple_dce_from_worklist): Check
> to make sure the statement is only defining one operand.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/compile/inline-asm-1.c: New test.
> ---
>  gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c | 14 ++
>  gcc/tree-ssa-dce.cc|  5 +
>  2 files changed, 19 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c
>
> diff --git a/gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c 
> b/gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c
> new file mode 100644
> index 000..0044cb761b6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c
> @@ -0,0 +1,14 @@
> +/* PR tree-opt/106087,
> +   simple_dce_from_worklist would delete the
> +   inline-asm when it was still being referenced
> +   by the other ssa name. */
> +
> +static int t;
> +
> +int f(void)
> +{
> +  int tt, tt1;
> +  asm("":"=r"(tt), "=r"(tt1));
> +  t = tt1;
> +  return tt;
> +}
> diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
> index bc533582673..602cdb30ceb 100644
> --- a/gcc/tree-ssa-dce.cc
> +++ b/gcc/tree-ssa-dce.cc
> @@ -2061,6 +2061,11 @@ simple_dce_from_worklist (bitmap worklist)
>if (gimple_has_side_effects (t))
> continue;
>
> +  /* The defining statement needs to be defining one this name. */

only this name?

> +  if (!is_a(t)

I suppose we could turn this into is_a(t) since that's
the only stmt kind with multiple (non-virtual) definitions?

OK with those changes.

> + && !single_ssa_def_operand (t, SSA_OP_DEF))
> +   continue;
> +
>/* Don't remove statements that are needed for non-call
>  eh to work.  */
>if (stmt_unremovable_because_of_non_call_eh_p (cfun, t))
> --
> 2.17.1
>


RE: [PATCH]middle-end: don't lower past veclower [PR106063]

2022-07-08 Thread Richard Biener via Gcc-patches
On Fri, 8 Jul 2022, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, July 7, 2022 8:47 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: RE: [PATCH]middle-end: don't lower past veclower [PR106063]
> > 
> > On Thu, 7 Jul 2022, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Thursday, July 7, 2022 8:19 AM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd 
> > > > Subject: Re: [PATCH]middle-end: don't lower past veclower [PR106063]
> > > >
> > > > On Tue, 5 Jul 2022, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > My previous patch can cause a problem if the pattern matches after
> > > > > veclower as it may replace the construct with a vector sequence
> > > > > which the target may not directly support.
> > > > >
> > > > > As such don't perform the rewriting if after veclower.
> > > >
> > > > Note that when doing the rewriting before veclower to a variant not
> > > > supported by the target can cause veclower to generate absymal code.
> > > > In some cases we are very careful and try to at least preserve code
> > > > supported by the target over transforming that into a variant not
> > supported.
> > > >
> > > > That said, a better fix would be to check whether the target can
> > > > perform the new comparison.  Before veclower it would be OK to do
> > > > the transform nevertheless in case it cannot do the original transform.
> > >
> > > This last statement is somewhat confusing. Did you want me to change
> > > it such that before veclower the rewrite is always done and after
> > > veclowering only if the target supports it?
> > >
> > > Or did you want me to never do the rewrite if the target doesn't support 
> > > it?
> > 
> > I meant before veclower you can do the rewrite if either the rewriting 
> > result
> > is supported by the target OR if the original expression is _not_ supported 
> > by
> > the target.  The latter case might be not too important to worry doing (it
> > would still canonicalize for those targets then).  After veclower you can 
> > only
> > rewrite under the former condition.
> > 
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> 
> Ok for master? and backport to GCC 12?

OK for master, backport to GCC 12 after a few days of soaking.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> 
> gcc/ChangeLog:
> 
> PR tree-optimization/106063
>   * match.pd: Only rewrite if target support it.
> 
> gcc/testsuite/ChangeLog:
> 
> PR tree-optimization/106063
>   * gcc.dg/pr106063.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 40c09bedadb89dabb6622559a8f69df5384e61fd..5800a105c3cdada9d5e1d8019176ebbe5969ccb0
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6041,10 +6041,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
>  (with { tree csts = bitmask_inv_cst_vector_p (@1); }
>   (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> -  (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
> -   (icmp @0 { csts; })
> -   (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
> -  (icmp (view_convert:utype @0) { csts; }
> +  (with { auto optab = VECTOR_TYPE_P (TREE_TYPE (@1))
> +  ? optab_vector : optab_default;
> +   tree utype = unsigned_type_for (TREE_TYPE (@1)); }
> +   (if (target_supports_op_p (utype, icmp, optab)
> + || (optimize_vectors_before_lowering_p ()
> + && (!target_supports_op_p (type, cmp, optab)
> + || !target_supports_op_p (type, BIT_AND_EXPR, optab
> + (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
> +  (icmp @0 { csts; })
> +  (icmp (view_convert:utype @0) { csts; })
>  
>  /* When one argument is a constant, overflow detection can be simplified.
> Currently restricted to single use so as not to interfere too much with
> diff --git a/gcc/testsuite/gcc.dg/pr106063.c b/gcc/testsuite/gcc.dg/pr106063.c
> new file mode 100644
> index 
> ..b23596724f6bb98c53af2dce77d31509bab10378
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr106063.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-tree-forwprop --disable-tree-evrp" } */
> +typedef __int128 __attribute__((__vector_size__ (16))) V;
> +
> +V
> +foo (V v)
> +{
> +  return (v & (V){15}) == v;
> +}
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstra


Re: [PATCH/RFC] combine_completed global variable.

2022-07-08 Thread Kewen.Lin via Gcc-patches
Hi Roger,

on 2022/7/8 03:40, Roger Sayle wrote:
> 
> Hi Kewen (and Segher),
> Many thanks for stress testing my patch to improve multiplication
> by integer constants on rs6000 by using the rldmi instruction.
> Although I've not been able to reproduce your ICE (using gcc135
> on the compile farm), I completely agree with Segher's analysis
> that the Achilles heel with my approach/patch is that there's
> currently no way for the backend/recog to know that we're in a
> pass after combine.
> 

It's weird that it can't be reproduced on your side, did you try
with -m32 explicitly?  Sorry that I didn't say the used options
clearly in the previous reply, they are "-O2 -m32".

> Rather than give up on this optimization (and a similar one for
> I386.md where test;sete can be replaced by xor $1 when combine
> knows that nonzero_bits is 1, but loses that information afterwards),
> I thought I'd post this "strawman" proposal to add a combine_completed
> global variable, matching the reload_completed and regstack_completed
> global variables already used (to track progress) by the middle-end.
> 
> I was wondering if I could ask you could test the attached patch
> in combination with my previous rs6000.md patch (with the obvious
> change of reload_completed to combine_completed) to confirm
> that it fixes the problems you were seeing.

I just had a try, it still failed.  I checked the unrecognizable
pattern and the original patch, I guessed it needs a tiny adjustment
like below:

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index dde123e87b8..0a089f12510 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4216,7 +4216,7 @@ (define_insn_and_split "*rotl3_insert_3b_"
  (match_operand:SI 2 "const_int_operand" "n"))
  (match_operand:GPR 3 "gpc_reg_operand" "0")))]
   "INTVAL (operands[2]) > 0
-   && INTVAL (operands[2]) < 64
+   && INTVAL (operands[2]) < GET_MODE_PRECISION (mode)
&& ((nonzero_bits (operands[3], mode)
< HOST_WIDE_INT_1U << INTVAL (operands[2]))
|| combine_completed)"

the hardcoded value 64 is too big for SImode in the failure, it seems
we should use the mode precision instead?  I confirmed the failures are
gone with this proposal and the tiny change.

BR,
Kewen


Re: [x86 PATCH] Fun with flags: Adding stc/clc instructions to i386.md.

2022-07-08 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 8, 2022 at 9:15 AM Roger Sayle  wrote:
>
>
> This patch adds support for x86's single-byte encoded stc (set carry flag)
> and clc (clear carry flag) instructions to i386.md.
>
> The motivating example is the simple code snippet:
>
> unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
> {
>   return __builtin_ia32_addcarryx_u32 (1, a, b, c);
> }
>
> which uses the target built-in to generate an adc instruction, adding
> together A and B with the incoming carry flag already set.  Currently
> for this mainline GCC generates (with -O2):
>
> movl$1, %eax
> addb$-1, %al
> adcl%esi, %edi
> setc%al
> movl%edi, (%rdx)
> movzbl  %al, %eax
> ret
>
> where the first two instructions (to load 1 into a byte register and
> then add 255 to it) are the idiom used to set the carry flag.  This
> is a little inefficient as x86 has a "stc" instruction for precisely
> this purpose.  With the attached patch we now generate:
>
> stc
> adcl%esi, %edi
> setc%al
> movl%edi, (%rdx)
> movzbl  %al, %eax
> ret
>
> The central part of the patch is the addition of x86_stc and x86_clc
> define_insns, represented as "(set (reg:CCC FLAGS_REG) (const_int 1))"
> and "(set (reg:CCC FLAGS_REG) (const_int 0))" respectively, then using
> x86_stc appropriately in the ix86_expand_builtin.
>
> Alas this change exposes two latent bugs/issues in the compiler.
> The first is that there are several peephole2s in i386.md that propagate
> the flags register, but take its mode from the SET_SRC rather than
> preserve the mode of the original SET_DEST.  The other, which is
> being discussed with Segher, is that the middle-end's simplify-rtx
> inappropriately tries to interpret/optimize MODE_CC comparisons,
> converting the above adc into an add, as it mistakenly believes
> (ltu:SI (const_int 1) (const_int 0))" is always const0_rtx even when
> the mode of the comparison is MODE_CCC.
>
> I believe Segher will review (and hopefully approve) the middle-end
> chunk of this patch independently, but hopefully this backend patch
> provides the necessary context to explain why that change is needed.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32} with
> no new failures.  Ok for mainline?
>
>
> 2022-07-08  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_builtin) :
> Use new x86_stc or negqi_ccc_1 instructions to set the carry flag.
> * config/i386/i386.md (x86_clc): New define_insn.
> (x86_stc): Likewise, new define_insn to set the carry flag.
> (*setcc_qi_negqi_ccc_1_): New define_insn_and_split to
> recognize (and eliminate) the carry flag being copied to itself.
> (neg_ccc_1): Renamed from *neg_ccc_1 for gen function.
> (define_peephole2): Use match_operand of flags_reg_operand to
> capture and preserve the mode of FLAGS_REG.
> (define_peephole2): Likewise.
>
> * simplify-rtx.cc (simplify_const_relational_operation): Handle
> case where both operands of a MODE_CC comparison have been
> simplified to constant integers.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/stc-1.c: New test case.

Please split out the peephole2 part of the patch. This part is
pre-approved and should be committed independently of the main part of
the patch.

Thanks,
Uros.

>
>
> Thanks in advance (both Uros and Segher),
> Roger
> --
>
> > -Original Message-
> > From: Segher Boessenkool 
> > Sent: 07 July 2022 23:39
> > To: Roger Sayle 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Be careful with MODE_CC in
> > simplify_const_relational_operation.
> >
> > Hi!
> >
> > On Thu, Jul 07, 2022 at 10:08:04PM +0100, Roger Sayle wrote:
> > > I think it's fair to describe RTL's representation of condition flags
> > > using MODE_CC as a little counter-intuitive.
> >
> > "A little challenging", and you should see that as a good thing, as a
> puzzle to
> > crack :-)
> >
> > > For example, the i386
> > > backend represents the carry flag (in adc instructions) using RTL of
> > > the form "(ltu:SI (reg:CCC) (const_int 0))", where great care needs to
> > > be taken not to treat this like a normal RTX expression, after all LTU
> > > (less-than-unsigned) against const0_rtx would normally always be
> > > false.
> >
> > A comparison of a MODE_CC thing against 0 means the result of a
> > *previous* comparison (or other cc setter) is looked at.  Usually it
> simply looks
> > at some condition bits in a flags register.  It does not do any actual
> comparison:
> > that has been done before (if at all even).
> >
> > > Hence, MODE_CC comparisons need to be treated with caution, and
> > > simplify_const_relational_operation returns early (to avoid
> > > problems) when GET_MODE_CLASS (GET_MODE (op0)) == MODE_

Re: [x86 PATCH] Fun with flags: Adding stc/clc instructions to i386.md.

2022-07-08 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 8, 2022 at 9:15 AM Roger Sayle  wrote:
>
>
> This patch adds support for x86's single-byte encoded stc (set carry flag)
> and clc (clear carry flag) instructions to i386.md.
>
> The motivating example is the simple code snippet:
>
> unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
> {
>   return __builtin_ia32_addcarryx_u32 (1, a, b, c);
> }
>
> which uses the target built-in to generate an adc instruction, adding
> together A and B with the incoming carry flag already set.  Currently
> for this mainline GCC generates (with -O2):
>
> movl$1, %eax
> addb$-1, %al
> adcl%esi, %edi
> setc%al
> movl%edi, (%rdx)
> movzbl  %al, %eax
> ret
>
> where the first two instructions (to load 1 into a byte register and
> then add 255 to it) are the idiom used to set the carry flag.  This
> is a little inefficient as x86 has a "stc" instruction for precisely
> this purpose.  With the attached patch we now generate:
>
> stc
> adcl%esi, %edi
> setc%al
> movl%edi, (%rdx)
> movzbl  %al, %eax
> ret

Please note that STC/CLC is quite unoptimal on some older
architectures. For example, Pentium4 has a latency of 10 due to false
dependency of flags [1].

[1] https://agner.org/optimize/instruction_tables.pdf


Uros.


Re: [PATCH 3/3] lto-plugin: implement LDPT_GET_API_VERSION

2022-07-08 Thread Martin Liška
Hi.

All right, there's updated version of the patch that reflects the following 
suggestions:

1) strings are used for version identification
2) thread-safe API version (1) is not used if target does not support locking 
via pthreads

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

On 7/7/22 04:19, Rui Ueyama wrote:
> On Mon, Jul 4, 2022 at 10:17 PM Martin Liška  > wrote:
> 
> On 7/1/22 08:36, Richard Biener wrote:
> > On Thu, Jun 30, 2022 at 10:42 AM Martin Liška  > wrote:
> >>
> >> On 6/30/22 08:43, Rui Ueyama wrote:
> >>> Thanks Martin for creating this patch.
> >>
> >> You're welcome.
> >>
> >>>
> >>> Here is a preliminary change for the mold side: 
> https://github.com/rui314/mold/commit/9ad49d1c556bc963d06cca8233535183490de605
>  
> 
>  
>   
> >
> >>>
> >>> Overall the API is looking fine,
> >>
> >> Good then!
> >>
> >>> though it is not clear what kind of value is expected as a linker 
> version. A linker version is not a single unsigned integer but something like 
> "1.3.0". Something like "1.3.0-rc2" can also be a linker version. So I don't 
> think we can represent a linker version as a single integer.
> >>
> >> Well, you can use the same what we use GCC_VERSION (plugin_version):
> >>
> >> 1000 * MAJOR + MINOR
> >>
> >> Let me adjust the documentation of the API.
> >
> > Hmm, but then why not go back to the original suggestion merging
> > linker_identifier and linker_version into
> > a single string.  That of course puts the burden of parsing to the
> > consumer - still that's probably better
> > than imposing the constraint of encoding the version in an unsigned
> > integer.  Alternatively easing
> > parsing by separating out the version in a string would be possible as
> > well (but then you'd have
> > to care for 1.3.0-rc2+gitab4316174 or so, not sure what the advantage
> > over putting everything in
> > the identifier would be).
> 
> I'm fine with the suggested 2 strings (linker_identifier and 
> linker_version).
> 
> Does it work for you Rui?
> 
> 
> Yes.
>  
> 
> Cheers,
> Martin
> 
> >
> > You usually cannot rely on a version anyway since distributors usually
> > apply patches.
> >
> >> Richi: May I install the patch?
> >
> > Let's sort out the version thing and consider simplifying the API.
> >
> > Richard.
> >
> >> Thanks,
> >> Martin
> >>
> >>>
> >>> On Mon, Jun 20, 2022 at 9:01 PM Martin Liška   >> 
> wrote:
> >>>
> >>>     On 6/20/22 11:35, Richard Biener wrote:
> >>>     > I think this is OK.  Can we get buy-in from mold people?
> >>>
> >>>     Sure, I've just pinged Rui:
> >>>     https://github.com/rui314/mold/issues/454#issuecomment-1160419030 
>  
>  >
> >>>
> >>>     Martin
> >>>
> 
From c105ee05439929f1c1fd22d15f56cf398b5a8a0d Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 16 May 2022 14:01:52 +0200
Subject: [PATCH] lto-plugin: implement LDPT_GET_API_VERSION

include/ChangeLog:

	* plugin-api.h (enum linker_api_version): New enum.
	(ld_plugin_get_api_version): New.
	(enum ld_plugin_tag): Add LDPT_GET_API_VERSION.
	(struct ld_plugin_tv): Add tv_get_api_version.

lto-plugin/ChangeLog:

	* lto-plugin.c (negotiate_api_version): New.
	(onload): Negotiate API version.
	* Makefile.am: Add -DBASE_VERSION.
	* Makefile.in: Regenerate.
---
 include/plugin-api.h| 32 +++
 lto-plugin/Makefile.am  |  2 +-
 lto-plugin/Makefile.in  |  2 +-
 lto-plugin/lto-plugin.c | 42 +
 4 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/include/plugin-api.h b/include/plugin-api.h
index 8aebe2ff267..1deda680081 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -483,6 +483,36 @@ enum ld_plugin_level
   LDPL_FATAL
 };
 
+/* Contract between a plug-in and a linker.  */
+
+enum linker_api_version
+{
+   /* The linker/plugin do not implement any of the API levels below, the API
+   is determined solely via the transfer vector.  */
+   LAPI_UNSPECIFIED = 0,
+
+   /* API level v1.  The linker provides get_symbols_v3, add_symbols_v2,
+  the plugin will use that and not any lower versions.
+  claim_file is thread-safe on the plugin side and
+

Re: [PATCH 08/17] openmp: -foffload-memory=pinned

2022-07-08 Thread Tobias Burnus

On 08.07.22 00:18, Andrew Stubbs wrote:

Likewise, the 'requires' mechanism could then also be used in '[PATCH
16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'.


No, I don't think so; that environment variable needs to be set before
the libraries are loaded or it's too late.  There are other ways to
achieve the same thing, by leaving messages for the libgomp plugin to
pick up, perhaps, but it's all extra complexity for no real gain.


I think we talk about two different things:


(a) where (and when) to check/set the environment variable. I think this
part is fine. You could consider moving the generated code for
'configure_xnack' code into the existing 'init' constructor function,
but it does not really matter. (Nor does the order in which the
constructor function runs.)

(I also do not see any benefit of moving it to libgomp. The message
could then be suppressed if no device available or similar tricky, but I
do not see any real advantage of moving it.)

Longer side note: I think the message "error: HSA_XNACK=%%s is
incompatible; please unset" could be clearer. Both in terms who issues
it and that it talks about an environment variable. Maybe:

"|libgomp: fatal error: Environment variable HSA_XNACK=%s is
incompatible with GCN offloading; please unset"|

|or something like that. (I did misuse 'libgomp:' for this; I am not
sure that makes sense or is even more misleading.) – I am also not sure
GCN fits that well, given that CDNA is not GCN. But that is a general
problem. But in any case, adding "fatal", "environment variable" and ...
offloading makes surely sense, IMHO.
|


(b) How the value is made available inside both gcc/config/gcn/gcn.cc
and in mkoffload.cc.

I was talking about (b). Namely:

omp_requires_mask is already available in gcc/config/gcn/gcn.cc and
mkoffload.cc. Thus, there is no reason to reinvent the wheel and coming
up with another means to pass the same kind of data to the very same files.

(You still might want to add another flag to it (assuming 'omp requires
unified_shared_memory' alias OMP_REQUIRES_UNIFIED_SHARED_MEMORY is
insufficient.)

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH 08/17] openmp: -foffload-memory=pinned

2022-07-08 Thread Andrew Stubbs

On 08/07/2022 10:00, Tobias Burnus wrote:

On 08.07.22 00:18, Andrew Stubbs wrote:
Likewise, the 'requires' mechanism could then also be used in '[PATCH 
16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'.


No, I don't think so; that environment variable needs to be set before 
the libraries are loaded or it's too late.  There are other ways to 
achieve the same thing, by leaving messages for the libgomp plugin to 
pick up, perhaps, but it's all extra complexity for no real gain. 


I think we talk about two different things:


(a) where (and when) to check/set the environment variable. I think this 
part is fine. You could consider moving the generated code for 
'configure_xnack' code into the existing 'init' constructor function, 
but it does not really matter. (Nor does the order in which the 
constructor function runs.)


(I also do not see any benefit of moving it to libgomp. The message 
could then be suppressed if no device available or similar tricky, but I 
do not see any real advantage of moving it.)


Longer side note: I think the message "error: HSA_XNACK=%%s is 
incompatible; please unset" could be clearer. Both in terms who issues 
it and that it talks about an environment variable. Maybe:


"|libgomp: fatal error: Environment variable HSA_XNACK=%s is 
incompatible with GCN offloading; please unset"|


|or something like that. (I did misuse 'libgomp:' for this; I am not 
sure that makes sense or is even more misleading.) – I am also not sure 
GCN fits that well, given that CDNA is not GCN. But that is a general
problem. But in any case, adding "fatal", "environment variable" and ... 
offloading makes surely sense, IMHO.


It's not incompatible with GCN offloading, only with the XNACK mode in 
which the binary was compiled (i.e. USM on or off).


The message could be less terse, indeed. I went through a variety of 
messages for this and couldn't find one that I liked. How about...


  fatal error: HSA_XNACK=%s is set but this program was compiled for 
HSA_XNACK=%s; please unset your environment variable.


(b) How the value is made available inside both gcc/config/gcn/gcn.cc 
and in mkoffload.cc.


I was talking about (b). Namely:

omp_requires_mask is already available in gcc/config/gcn/gcn.cc and 
mkoffload.cc. Thus, there is no reason to reinvent the wheel and coming 
up with another means to pass the same kind of data to the very same files.


(You still might want to add another flag to it (assuming 'omp requires 
unified_shared_memory' alias OMP_REQUIRES_UNIFIED_SHARED_MEMORY is 
insufficient.)


OK, this is a new feature that I probably should investigate.

Thanks

Andrew



Re: [PATCH 08/17] openmp: -foffload-memory=pinned

2022-07-08 Thread Tobias Burnus

On 08.07.22 11:55, Andrew Stubbs wrote:

It's not incompatible with GCN offloading, only with the XNACK mode in
which the binary was compiled (i.e. USM on or off).

The message could be less terse, indeed. I went through a variety of
messages for this and couldn't find one that I liked. How about...

  fatal error: HSA_XNACK=%s is set but this program was compiled for
HSA_XNACK=%s; please unset your environment variable.


For what it is worth: I like this message.

Thanks,

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH] tree-optimization/106226 - move vectorizer virtual SSA update

2022-07-08 Thread Richard Biener via Gcc-patches
When we knowingly have broken virtual SSA form we need to update
it before we eventually perform slpeel manual updating which will
call delete_update_ssa.  Currently that's done on-demand but
communicating whether it's a known unavoidable case is broken
there.  The following makes that a synchronous operation but
instead of actually performing the update we instead recod the
need, clear the update SSA sub-state and force virtual renaming
at the very end of the vectorization pass.

This reorg exposes an issue like with LOAD_LANES in simd-clone
handling which I patch up in a similar fashion.  So it might
uncover new latent issues elsewhere.

Bootstrap / regtest re-running on x86_64-unknown-linux-gnu
after fixing the simdclode issue.

Richard.

PR tree-optimization/106226
* tree-vect-loop-manip.cc (vect_do_peeling): Assert that
no SSA update is needed.  Move virtual SSA update ...
* tree-vectorizer.cc (pass_vectorize::execute): ... here,
via forced virtual renaming when TODO_update_ssa_only_virtuals
is queued.
(vect_transform_loops): Return TODO_update_ssa_only_virtuals
when virtual SSA update is required.
(try_vectorize_loop_1): Adjust.
* tree-vect-stmts.c (vectorizable_simd_clone_call): Allow
virtual renaming if the ABI forces an aggregate return
but the original call did not have a virtual definition.

* gfortran.dg/pr106226.f: New testcase.
---
 gcc/testsuite/gfortran.dg/pr106226.f | 37 
 gcc/tree-vect-loop-manip.cc  | 11 -
 gcc/tree-vect-stmts.cc   |  8 ++
 gcc/tree-vectorizer.cc   | 29 +++---
 4 files changed, 76 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr106226.f

diff --git a/gcc/testsuite/gfortran.dg/pr106226.f 
b/gcc/testsuite/gfortran.dg/pr106226.f
new file mode 100644
index 000..19237bc5a71
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr106226.f
@@ -0,0 +1,37 @@
+! { dg-do compile }
+! { dg-options "-O3 -std=legacy" }
+
+  SUBROUTINE EFTORD(DM,CHDINT,L4)
+  IMPLICIT DOUBLE PRECISION (A-H,O-Z)
+  PARAMETER (MXPT=100, MXFRG=50, MXFGPT=MXPT*MXFRG)
+  DIMENSION DM(*),CHDINT(L4)
+  COMMON /FGRAD / DEF0,DEFT0,TORQ0
+ *,ATORQ(3,MXFRG)
+  COMMON /CSSTV / CX,CY,CZ
+ *EFBTRM(MXFGPT),EFATRM2(MXFGPT),EFBTRM2(MXFGPT),
+ *EFDIP(3,MXFGPT),EFQAD(6,MXFGPT),
+ *EFOCT(10,MXFGPT),FRGNME(MXFGPT)
+  IF(NROOTS.EQ.5) CALL ROOT5
+  IF(NROOTS.EQ.6) CALL ROOT6
+  IF(NROOTS.GE.7) THEN
+ CALL ABRT
+  END IF
+  DO 403 I = 1,IJ
+  CHDINT(ICC)=CHDINT(ICC)-DUM*DUMY
+  ICC=ICC+1
+ 403  CONTINUE
+  CHDINT(ICC)=CHDINT(ICC)-DUM*DUMY
+  DO 550 J=MINJ,MAX
+  LJ=LOCJ+J
+  IF (LI-LJ) 920,940,940
+  920 ID = LJ
+  GO TO 960
+  940 ID = LI
+  960 NN = (ID*(ID-1))/2+JD
+  DUM = DM(NN)
+  ATORQ(1,INF)=ATORQ(1,INF)-DUM*(CHDINT(ICC+1)*EFDIP(3,IC)
+ $   -CHDINT(ICC+2)*EFDIP(2,IC))
+  ICC=ICC+1
+  ICC=ICC+1
+  550 CONTINUE
+  END
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index d7410d7b4bd..2c2b4f7bd53 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2696,12 +2696,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
   class loop *first_loop = loop;
   bool irred_flag = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP;
 
-  /* We should not have to update virtual SSA form here but some
- transforms involve creating new virtual definitions which makes
- updating difficult.  */
-  gcc_assert (!need_ssa_update_p (cfun)
- || loop_vinfo->any_known_not_updated_vssa);
-  update_ssa (TODO_update_ssa_only_virtuals);
+  /* SSA form needs to be up-to-date since we are going to manually
+ update SSA form in slpeel_tree_duplicate_loop_to_edge_cfg and delete all
+ update SSA state after that, so we have to make sure to not lose any
+ pending update needs.  */
+  gcc_assert (!need_ssa_update_p (cfun));
 
   create_lcssa_for_virtual_phi (loop);
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 3db6620dd42..01d982eea98 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4247,6 +4247,14 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   if (!vec_stmt) /* transformation not required.  */
 {
+  /* When the original call is pure or const but the SIMD ABI dictates
+an aggregate return we will have to use a virtual definition and
+in a loop eventually even need to add a virtual PHI.  That's
+not straight-forward so allow to fix this up via renaming.  */
+  if (gimple_call_lhs (stmt)
+ && !gimple_vdef (stmt)
+ && TREE_CODE (TREE_TYPE (TREE_TYPE (bestn->decl))) == ARRAY_TYPE)
+   vinfo->any_known_not_updated_

Re: [PATCH] lto-dump: Do not print output file

2022-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 7, 2022 at 3:05 PM Martin Liška  wrote:
>
> Right now the following is printed:
>
> lto-dump
> .file   ""
> .ident  "GCC: (GNU) 13.0.0 20220707 (experimental)"
> .section.note.GNU-stack,"",@progbits
>
> After the patch we print -help and do not emit any assembly output:
>
> lto-dump
> Usage: lto-dump [OPTION]... SUB_COMMAND [OPTION]...
>
> LTO dump tool command line options.
>
>   -list [options]   Dump the symbol list.
> -demangle   Dump the demangled output.
> -defined-only   Dump only the defined symbols.
> ...
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK

> Thanks,
> Martin
>
> gcc/lto/ChangeLog:
>
> * lto-dump.cc (lto_main): Exit in the function
> as we don't want any LTO bytecode processing.
>
> gcc/ChangeLog:
>
> * toplev.cc (init_asm_output): Do not init asm_out_file.
> ---
>  gcc/lto/lto-dump.cc | 16 ++--
>  gcc/toplev.cc   |  2 +-
>  2 files changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/lto/lto-dump.cc b/gcc/lto/lto-dump.cc
> index f88486b5143..f3d852df51f 100644
> --- a/gcc/lto/lto-dump.cc
> +++ b/gcc/lto/lto-dump.cc
> @@ -316,7 +316,10 @@ lto_main (void)
>  {
>quiet_flag = true;
>if (flag_lto_dump_tool_help)
> -dump_tool_help ();
> +{
> +  dump_tool_help ();
> +  exit (SUCCESS_EXIT_CODE);
> +}
>
>/* LTO is called as a front end, even though it is not a front end.
>   Because it is called as a front end, TV_PHASE_PARSING and
> @@ -369,11 +372,12 @@ lto_main (void)
>  {
>/* Dump specific gimple body of specified function.  */
>dump_body ();
> -  return;
>  }
>else if (flag_dump_callgraph)
> -{
> -  dump_symtab_graphviz ();
> -  return;
> -}
> +dump_symtab_graphviz ();
> +  else
> +dump_tool_help ();
> +
> +  /* Exit right now.  */
> +  exit (SUCCESS_EXIT_CODE);
>  }
> diff --git a/gcc/toplev.cc b/gcc/toplev.cc
> index a24ad5db438..61d234a9ef4 100644
> --- a/gcc/toplev.cc
> +++ b/gcc/toplev.cc
> @@ -721,7 +721,7 @@ init_asm_output (const char *name)
>  "cannot open %qs for writing: %m", asm_file_name);
>  }
>
> -  if (!flag_syntax_only)
> +  if (!flag_syntax_only && !(global_dc->lang_mask & CL_LTODump))
>  {
>targetm.asm_out.file_start ();
>
> --
> 2.36.1
>


RE: [PATCH]middle-end simplify complex if expressions where comparisons are inverse of one another.

2022-07-08 Thread Richard Biener via Gcc-patches
On Thu, 7 Jul 2022, Tamar Christina wrote:

> > -Original Message-
> > From: Andrew Pinski 
> > Sent: Wednesday, July 6, 2022 8:37 PM
> > To: Tamar Christina 
> > Cc: Richard Biener ; nd ; gcc-
> > patc...@gcc.gnu.org
> > Subject: Re: [PATCH]middle-end simplify complex if expressions where
> > comparisons are inverse of one another.
> > 
> > On Wed, Jul 6, 2022 at 9:06 AM Tamar Christina 
> > wrote:
> > >
> > > > -Original Message-
> > > > From: Andrew Pinski 
> > > > Sent: Wednesday, July 6, 2022 3:10 AM
> > > > To: Tamar Christina 
> > > > Cc: Richard Biener ; nd ; gcc-
> > > > patc...@gcc.gnu.org
> > > > Subject: Re: [PATCH]middle-end simplify complex if expressions where
> > > > comparisons are inverse of one another.
> > > >
> > > > On Tue, Jul 5, 2022 at 8:16 AM Tamar Christina via Gcc-patches  > > > patc...@gcc.gnu.org> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > -Original Message-
> > > > > > From: Richard Biener 
> > > > > > Sent: Monday, June 20, 2022 9:57 AM
> > > > > > To: Tamar Christina 
> > > > > > Cc: gcc-patches@gcc.gnu.org; nd 
> > > > > > Subject: Re: [PATCH]middle-end simplify complex if expressions
> > > > > > where comparisons are inverse of one another.
> > > > > >
> > > > > > On Thu, 16 Jun 2022, Tamar Christina wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > This optimizes the following sequence
> > > > > > >
> > > > > > >   ((a < b) & c) | ((a >= b) & d)
> > > > > > >
> > > > > > > into
> > > > > > >
> > > > > > >   (a < b ? c : d) & 1
> > > > > > >
> > > > > > > for scalar. On vector we can omit the & 1.
> > > > > > >
> > > > > > > This changes the code generation from
> > > > > > >
> > > > > > > zoo2:
> > > > > > > cmp w0, w1
> > > > > > > csetw0, lt
> > > > > > > csetw1, ge
> > > > > > > and w0, w0, w2
> > > > > > > and w1, w1, w3
> > > > > > > orr w0, w0, w1
> > > > > > > ret
> > > > > > >
> > > > > > > into
> > > > > > >
> > > > > > > cmp w0, w1
> > > > > > > cselw0, w2, w3, lt
> > > > > > > and w0, w0, 1
> > > > > > > ret
> > > > > > >
> > > > > > > and significantly reduces the number of selects we have to do
> > > > > > > in the vector code.
> > > > > > >
> > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > > > x86_64-pc-linux-gnu and no issues.
> > > > > > >
> > > > > > > Ok for master?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > * fold-const.cc (inverse_conditions_p): Traverse if SSA_NAME.
> > > > > > > * match.pd: Add new rule.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > * gcc.target/aarch64/if-compare_1.c: New test.
> > > > > > > * gcc.target/aarch64/if-compare_2.c: New test.
> > > > > > >
> > > > > > > --- inline copy of patch --
> > > > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index
> > > > > > >
> > > > > >
> > > >
> > 39a5a52958d87497f301826e706886b290771a2d..f180599b90150acd3ed895a64
> > > > > > 280
> > > > > > > aa3255061256 100644
> > > > > > > --- a/gcc/fold-const.cc
> > > > > > > +++ b/gcc/fold-const.cc
> > > > > > > @@ -2833,15 +2833,38 @@ compcode_to_comparison (enum
> > > > > > comparison_code
> > > > > > > code)  bool  inverse_conditions_p (const_tree cond1,
> > > > > > > const_tree
> > > > > > > cond2) {
> > > > > > > -  return (COMPARISON_CLASS_P (cond1)
> > > > > > > - && COMPARISON_CLASS_P (cond2)
> > > > > > > - && (invert_tree_comparison
> > > > > > > - (TREE_CODE (cond1),
> > > > > > > -  HONOR_NANS (TREE_OPERAND (cond1, 0))) == TREE_CODE
> > > > > > (cond2))
> > > > > > > - && operand_equal_p (TREE_OPERAND (cond1, 0),
> > > > > > > - TREE_OPERAND (cond2, 0), 0)
> > > > > > > - && operand_equal_p (TREE_OPERAND (cond1, 1),
> > > > > > > - TREE_OPERAND (cond2, 1), 0));
> > > > > > > +  if (COMPARISON_CLASS_P (cond1)
> > > > > > > +  && COMPARISON_CLASS_P (cond2)
> > > > > > > +  && (invert_tree_comparison
> > > > > > > +  (TREE_CODE (cond1),
> > > > > > > +   HONOR_NANS (TREE_OPERAND (cond1, 0))) == TREE_CODE
> > > > > > (cond2))
> > > > > > > +  && operand_equal_p (TREE_OPERAND (cond1, 0),
> > > > > > > + TREE_OPERAND (cond2, 0), 0)
> > > > > > > +  && operand_equal_p (TREE_OPERAND (cond1, 1),
> > > > > > > + TREE_OPERAND (cond2, 1), 0))
> > > > > > > +return true;
> > > > > > > +
> > > > > > > +  if (TREE_CODE (cond1) == SSA_NAME
> > > > > > > +  && TREE_CODE (cond2) == SSA_NAME)
> > > > > > > +{
> > > > > > > +  gimple *gcond1 = SSA_NAME_DEF_STMT (cond1);
> > > > > > > +  gimple *gcond2 = SSA_NAME_DEF_STMT (cond2);
> > > > > > > +  if (!is_gimple_assign (gcond1) || !is_gimple_assign 
> > > > > > > (gcond2))
> > > > > > > +   return false;
> > > > > > > +
> > > > > > > +  tree_code c

Re: [PATCH v2] Support --disable-fixincludes.

2022-07-08 Thread Martin Liška
On 5/25/22 07:37, Alexandre Oliva wrote:
> On May 24, 2022, Martin Liška  wrote:
> 
>> Allways install limits.h and syslimits.h header files
>> to include folder.
> 
> typo: s/Allways/Always/

Hello.

Fixed.

> 
> I'm a little worried about this bit, too.  limitx.h includes
> "syslimits.h", mentioning it should be in the same directory.  Perhaps
> it could be left in include-fixed?

Well, I would like to go w/o include-fixed if the option --disable-fixincludes 
is used.

> 
> The patch also changes syslimits.h from either the fixincluded file or
> gsyslimits.h to use gsyslimits.h unconditionally, which seemed wrong at
> first.
> 
> Now I see how these two hunks work together: syslimits.h will now always
> #include_next , which will find it in include-fixed if it's
> there, and system header files otherwise.  Nice!, but IMHO the commit
> message could be a little more verbose on the extent of the change and
> why that (is supposed to) work.

Oh, to be honest I'm not fully familiar with how these 2 files work together.
Can you explain it to me so that I can adjust the changelog entry 
correspondingly?

> 
> 
> It also looks like install-mkheaders installs limits-related headers for
> when fixincludes runs; we could probably skip the whole thing if
> fixincludes is disabled, but I'm also worried about how the changes
> above might impact post-install fixincludes: if that installs
> gsyslimits.h et al in include-fixed while your changes moves it to
> include, headers might end up in a confusing state.  I haven't worked
> out whether that's safe, but there appears to be room for cleanups
> there.

I've check that 'make install-mkheaders' work fine w/ and w/o 
--disable-fixincludes
after the patch.

> 
> gcc/config/mips/t-sdemtk also places syslimits.h explicitly in include/
> rather than include-fixed/, as part of disabling fixincludes, which is
> good, but it could be cleaned up as well.

Can we do that as a follow-up patch?

> 
> I don't see other config fragments that might require adjustments, so I
> think the patch looks good; hopefully my worries are unjustified, and
> the cleanups it enables could be

Good.

> 
> 
> We still create the README file in there and install it, even with
> fixincludes disabled and thus unavailable, don't we?  That README then
> becomes misleading; we might be better off not installing it.

Sure, fixed in v2 of the patch.

> 
> 
>> When --disable-fixincludes is used, then no systen header files
>> are fixed by the tools in fixincludes. Moreover, the fixincludes
>> tools are not built any longer.
> 
> typo: s/systen/system/

Fixed.

> 
> 
> Could you please check that a post-install mkheaders still has a
> functional limits.h with these changes?

How do I check that, please?

> The patch is ok (with the typo
> fixes) if so.  The cleanups it enables would be welcome as separate
> patches ;-)

Can I install the v2?

Martin

> 
> Thanks!
> 
From d06f931329c456821acac45aa3d89922183161bc Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 24 May 2022 13:06:07 +0200
Subject: [PATCH] Support --disable-fixincludes.

Always install limits.h and syslimits.h header files
to include folder.

When --disable-fixincludes is used, then no system header files
are fixed by the tools in fixincludes. Moreover, the fixincludes
tools are not built any longer.

gcc/ChangeLog:

	* Makefile.in: Always install limits.h and syslimits.h to
	include folder.
	* configure.ac: Assign STMP_FIXINC blank if
	--disable-fixincludes is used.
	* configure: Regenerate.
---
 gcc/Makefile.in  | 30 +-
 gcc/configure| 10 --
 gcc/configure.ac |  6 ++
 3 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 3ae23702426..b7883254324 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3164,24 +3164,27 @@ stmp-int-hdrs: $(STMP_FIXINC) $(T_GLIMITS_H) $(T_STDINT_GCC_H) $(USER_H) fixinc_
 	set -e; for ml in `cat fixinc_list`; do \
 	  sysroot_headers_suffix=`echo $${ml} | sed -e 's/;.*$$//'`; \
 	  multi_dir=`echo $${ml} | sed -e 's/^[^;]*;//'`; \
-	  fix_dir=include-fixed$${multi_dir}; \
+	  include_dir=include$${multi_dir}; \
 	  if $(LIMITS_H_TEST) ; then \
 	cat $(srcdir)/limitx.h $(T_GLIMITS_H) $(srcdir)/limity.h > tmp-xlimits.h; \
 	  else \
 	cat $(T_GLIMITS_H) > tmp-xlimits.h; \
 	  fi; \
-	  $(mkinstalldirs) $${fix_dir}; \
-	  chmod a+rx $${fix_dir} || true; \
+	  $(mkinstalldirs) $${include_dir}; \
+	  chmod a+rx $${include_dir} || true; \
 	  $(SHELL) $(srcdir)/../move-if-change \
 	tmp-xlimits.h  tmp-limits.h; \
-	  rm -f $${fix_dir}/limits.h; \
-	  cp -p tmp-limits.h $${fix_dir}/limits.h; \
-	  chmod a+r $${fix_dir}/limits.h; \
+	  rm -f $${include_dir}/limits.h; \
+	  cp -p tmp-limits.h $${include_dir}/limits.h; \
+	  chmod a+r $${include_dir}/limits.h; \
+	  cp $(srcdir)/gsyslimits.h $${include_dir}/syslimits.h; \
 	done
 # Install the README
-	rm -f include-fixed/README
-	cp $(srcdir)/../fixincludes/READ

Re: [PATCH] inline: Rebuild target option node for caller [PR105459]

2022-07-08 Thread Martin Liška
On 6/6/22 08:20, Kewen.Lin wrote:
> |Hi, PR105459 exposes one issue in inline_call handling that when it decides 
> to copy FP flags from callee to caller and rebuild the optimization node for 
> caller fndecl, it's possible that the target option node is also necessary to 
> be rebuilt. Without updating target option node early, it can make nodes 
> share the same target option node wrongly, later when we want to unshare it 
> somewhere (like in target hook) it can get unexpected results, like ICE on 
> uninitialized secondary member of target globals exposed in this PR.|


Hello.

I think your patch seems reasonable. As you mentioned we need to keep
pair of target and optimization nodes together and the only correct way
is by using build_target_option_node for the DECL_FUNCTION_SPECIFIC_TARGET.

Thanks for the fix.
Martin


[PATCH][s390]: Fix the usage of store_bit_field in the backend.

2022-07-08 Thread Tamar Christina via Gcc-patches
Hi All,

I seem to have broken the s390 bootstrap because I added a new parameter to the
store_bit_field function to indicate whether the value the field of is being set
is currently undefined.

If it's undefined we use a subreg instead.  In this case the value of false
restores the old behavior.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/s390/s390.cc (s390_expand_atomic): Pass false to 
store_bit_field to
indicate that the value is not undefined.

--- inline copy of patch -- 
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 
444b1ec20d768d829ab19a41f114a91119335e00..5aaf76a94908c4d8f09aca5ac64ef3a418615b9e
 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -7468,7 +7468,7 @@ s390_expand_atomic (machine_mode mode, enum rtx_code code,
 case SET:
   if (ac.aligned && MEM_P (val))
store_bit_field (new_rtx, GET_MODE_BITSIZE (mode), 0,
-0, 0, SImode, val, false);
+0, 0, SImode, val, false, false);
   else
{
  new_rtx = expand_simple_binop (SImode, AND, new_rtx, ac.modemaski,




-- 
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 
444b1ec20d768d829ab19a41f114a91119335e00..5aaf76a94908c4d8f09aca5ac64ef3a418615b9e
 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -7468,7 +7468,7 @@ s390_expand_atomic (machine_mode mode, enum rtx_code code,
 case SET:
   if (ac.aligned && MEM_P (val))
store_bit_field (new_rtx, GET_MODE_BITSIZE (mode), 0,
-0, 0, SImode, val, false);
+0, 0, SImode, val, false, false);
   else
{
  new_rtx = expand_simple_binop (SImode, AND, new_rtx, ac.modemaski,





Re: [PATCH v3] Enable __memcmpeq after seeing __memcmpeq prototype

2022-07-08 Thread Richard Biener via Gcc-patches
On Tue, Jul 5, 2022 at 6:27 PM H.J. Lu  wrote:
>
> extern int __memcmpeq (const void *, const void *, size_t);
>
> was was added to GLIBC 2.35.  Expand BUILT_IN_MEMCMP_EQ to __memcmpeq
> after seeing __memcmpeq prototype
>
> gcc/
>
> * builtins.cc (expand_builtin): Issue an error for
> BUILT_IN___MEMCMPEQ if there is no __memcmpeq prototype.  Expand
> BUILT_IN_MEMCMP_EQ to BUILT_IN___MEMCMP_EQ if there is __memcmpeq
> prototype.
> * builtins.def (BUILT_IN___MEMCMPEQ): New.
>
> gcc/testsuite/
>
> * c-c++-common/memcmpeq-1.c: New test.
> * c-c++-common/memcmpeq-2.c: Likewise.
> * c-c++-common/memcmpeq-3.c: Likewise.
> * c-c++-common/memcmpeq-4.c: Likewise.
> * c-c++-common/memcmpeq-5.c: Likewise.
> * c-c++-common/memcmpeq-6.c: Likewise.
> * c-c++-common/memcmpeq.h: Likewise.
> ---
>  gcc/builtins.cc | 14 +-
>  gcc/builtins.def|  3 +++
>  gcc/testsuite/c-c++-common/memcmpeq-1.c | 11 +++
>  gcc/testsuite/c-c++-common/memcmpeq-2.c | 11 +++
>  gcc/testsuite/c-c++-common/memcmpeq-3.c | 11 +++
>  gcc/testsuite/c-c++-common/memcmpeq-4.c | 11 +++
>  gcc/testsuite/c-c++-common/memcmpeq-5.c | 11 +++
>  gcc/testsuite/c-c++-common/memcmpeq-6.c | 10 ++
>  gcc/testsuite/c-c++-common/memcmpeq.h   | 11 +++
>  9 files changed, 92 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-1.c
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-2.c
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-3.c
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-4.c
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-5.c
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-6.c
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq.h
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index e6816d5c865..2254a597bec 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -7395,6 +7395,15 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
> machine_mode mode,
> return target;
>break;
>
> +case BUILT_IN___MEMCMPEQ:
> +  if (!builtin_decl_declared_p (BUILT_IN___MEMCMPEQ))
> +   {
> + error ("use of %<__builtin___memcmpeq ()%> without "
> +"%<__memcmpeq%> prototype");
> + return const0_rtx;
> +   }
> +  break;
> +

I think we don't need this - if the user manually calls
__builtin__memcmpeq () he
should know what he does.

>  /* Expand it as BUILT_IN_MEMCMP_EQ first. If not successful, change it
> back to a BUILT_IN_STRCMP. Remember to delete the 3rd parameter
> when changing it to a strcmp call.  */
> @@ -7448,7 +7457,10 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
> machine_mode mode,
> return target;
>if (fcode == BUILT_IN_MEMCMP_EQ)
> {
> - tree newdecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
> + tree newdecl = builtin_decl_explicit
> +   (builtin_decl_declared_p (BUILT_IN___MEMCMPEQ)
> +? BUILT_IN___MEMCMPEQ
> +: BUILT_IN_MEMCMP);
>   TREE_OPERAND (exp, 1) = build_fold_addr_expr (newdecl);
> }
>break;
> diff --git a/gcc/builtins.def b/gcc/builtins.def
> index 005976f34e9..95642c6acdf 100644
> --- a/gcc/builtins.def
> +++ b/gcc/builtins.def
> @@ -965,6 +965,9 @@ DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX, 
> "__builtin_alloca_with_ali
> equality with zero.  */
>  DEF_BUILTIN_STUB (BUILT_IN_MEMCMP_EQ, "__builtin_memcmp_eq")
>
> +/* Similar to BUILT_IN_MEMCMP_EQ, but is mapped to __memcmpeq.  */
> +DEF_EXT_LIB_BUILTIN (BUILT_IN___MEMCMPEQ, "__memcmpeq", 
> BT_FN_INT_CONST_PTR_CONST_PTR_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF)
> +

it would be nice to not make __builtin___memcmpeq or __memcmpeq available to the
user without a __memcmpeq declaration.  I'm not sure this can be arranged for.
In principle we're just assigning an optional libfunc name to BUILT_IN_MEMCMP_EQ
when __memcmpeq is available.

Joseph, is there any convenient mechanism available I am missing?

Otherwise the builtin should probably be documented (as well as the
behavior wrt a __memcmpeq
declaration).

Thanks,
Richard.

>  /* An internal version of strcmp/strncmp, used when the result is only
> tested for equality with zero.  */
>  DEF_BUILTIN_STUB (BUILT_IN_STRCMP_EQ, "__builtin_strcmp_eq")
> diff --git a/gcc/testsuite/c-c++-common/memcmpeq-1.c 
> b/gcc/testsuite/c-c++-common/memcmpeq-1.c
> new file mode 100644
> index 000..14622f0d765
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/memcmpeq-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler "__memcmpeq" } } */
> +
> +#include "memcmpeq.h"
> +
> +int
> +foo (const char *s1, const char *s2, size_t len)
> +{
> +  return __builtin_memcmp (s1, s2, len) != 0;
> +}
>

[PATCH] testsuite: Fix tree-ssa/alias-access-path-13.c on 32bit platforms (PR 106216)

2022-07-08 Thread Martin Jambor
Hi,

for gcc.dg/tree-ssa/alias-access-path-13.c to work, SRA must think of
accesses to foo.inn.val and to foo itself as different ones, i.e. they
need to have different offset and size, which on 32bit platforms they
do not.  Fixed by replacing a dummy long int field of the union with a
struct of two integers.

Tested by:
  make -k check-gcc RUNTESTFLAGS="tree-ssa.exp=alias-access-path-13.c" and
  make -k check-gcc RUNTESTFLAGS="--target_board=unix'{-m32}' 
tree-ssa.exp=alias-access-path-13.c"
on an x86_64-linux, also with patched SRA to verify it still tests the
original intent.

I hope the fix is obvious enough (and it is only a testsuite change)
that I can commit it myself.

Thanks,

Martin


gcc/testsuite/ChangeLog:

2022-07-08  Martin Jambor  

PR testsuite/106216
* gcc.dg/tree-ssa/alias-access-path-13.c (union foo): Replace a long
int field with a struct that is larger than an int also on 32bit
platforms.
---
 gcc/testsuite/gcc.dg/tree-ssa/alias-access-path-13.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-access-path-13.c 
b/gcc/testsuite/gcc.dg/tree-ssa/alias-access-path-13.c
index e502a97bc75..87a94f5bf31 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/alias-access-path-13.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-access-path-13.c
@@ -6,10 +6,15 @@ struct inn
   int val;
 };
 
+struct biggerstruct
+{
+  int a, b;
+};
+
 union foo
 {
   struct inn inn;
-  long int baz;
+  struct biggerstruct baz;
 } *fooptr;
 
 struct bar
-- 
2.36.1



Re: [PATCH 3/3] lto-plugin: implement LDPT_GET_API_VERSION

2022-07-08 Thread Alexander Monakov via Gcc-patches


On Fri, 8 Jul 2022, Martin Liška wrote:

> Hi.
> 
> All right, there's updated version of the patch that reflects the following 
> suggestions:
> 
> 1) strings are used for version identification
> 2) thread-safe API version (1) is not used if target does not support locking 
> via pthreads
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?

Note that mold side will need to be adjusted, because it did not really
implement the proposed contract. Maybe we should be more clear how the
linker is supposed to implement this? Preliminary mold patch does this:

static PluginLinkerAPIVersion
get_api_version(const char *plugin_identifier,
unsigned plugin_version,
PluginLinkerAPIVersion minimal_api_supported,
PluginLinkerAPIVersion maximal_api_supported,
const char **linker_identifier,
unsigned *linker_version) {
  assert(maximal_api_supported >= LAPI_V1);
  *linker_identifier = "mold";
  *linker_version = get_mold_version();
  is_gcc_linker_api_v1 = true;
  return LAPI_V1;
}

but ignoring min_api_supported is wrong, and assuming max_api_supported > 0
is also wrong. It really should check how given [min; max] range intersects
with its own range of supported versions.

Alexande


Re: [PATCH v2] Simplify memchr with small constant strings

2022-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 7, 2022 at 6:45 PM H.J. Lu  wrote:
>
> When memchr is applied on a constant string of no more than the bytes of
> a word, simplify memchr by checking each byte in the constant string.
>
> int f (int a)
> {
>return  __builtin_memchr ("AE", a, 2) != 0;
> }
>
> is simplified to
>
> int f (int a)
> {
>   return ((char) a == 'A' || (char) a == 'E') != 0;
> }
>
> gcc/
>
> PR tree-optimization/103798
> * tree-ssa-forwprop.cc: Include "tree-ssa-strlen.h".
> (simplify_builtin_call): Inline memchr with constant strings of
> no more than the bytes of a word.
> * tree-ssa-strlen.cc (use_in_zero_equality): Make it global.
> * tree-ssa-strlen.h (use_in_zero_equality): New.
>
> gcc/testsuite/
>
> PR tree-optimization/103798
> * c-c++-common/pr103798-1.c: New test.
> * c-c++-common/pr103798-2.c: Likewise.
> * c-c++-common/pr103798-3.c: Likewise.
> * c-c++-common/pr103798-4.c: Likewise.
> * c-c++-common/pr103798-5.c: Likewise.
> * c-c++-common/pr103798-6.c: Likewise.
> * c-c++-common/pr103798-7.c: Likewise.
> * c-c++-common/pr103798-8.c: Likewise.
> ---
>  gcc/testsuite/c-c++-common/pr103798-1.c | 28 +++
>  gcc/testsuite/c-c++-common/pr103798-2.c | 30 
>  gcc/testsuite/c-c++-common/pr103798-3.c | 28 +++
>  gcc/testsuite/c-c++-common/pr103798-4.c | 28 +++
>  gcc/testsuite/c-c++-common/pr103798-5.c | 26 ++
>  gcc/testsuite/c-c++-common/pr103798-6.c | 27 +++
>  gcc/testsuite/c-c++-common/pr103798-7.c | 27 +++
>  gcc/testsuite/c-c++-common/pr103798-8.c | 27 +++
>  gcc/tree-ssa-forwprop.cc| 64 +
>  gcc/tree-ssa-strlen.cc  |  4 +-
>  gcc/tree-ssa-strlen.h   |  2 +
>  11 files changed, 289 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-1.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-2.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-3.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-4.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-5.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-6.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-7.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-8.c
>
> diff --git a/gcc/testsuite/c-c++-common/pr103798-1.c 
> b/gcc/testsuite/c-c++-common/pr103798-1.c
> new file mode 100644
> index 000..cd3edf569fc
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pr103798-1.c
> @@ -0,0 +1,28 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */
> +
> +__attribute__ ((weak))
> +int
> +f (char a)
> +{
> +   return  __builtin_memchr ("a", a, 1) == 0;
> +}
> +
> +__attribute__ ((weak))
> +int
> +g (char a)
> +{
> +  return a != 'a';
> +}
> +
> +int
> +main ()
> +{
> + for (int i = 0; i < 255; i++)
> +   if (f (i) != g (i))
> + __builtin_abort ();
> +
> + return 0;
> +}
> +
> +/* { dg-final { scan-assembler-not "memchr" } } */
> diff --git a/gcc/testsuite/c-c++-common/pr103798-2.c 
> b/gcc/testsuite/c-c++-common/pr103798-2.c
> new file mode 100644
> index 000..e7e99c3679e
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pr103798-2.c
> @@ -0,0 +1,30 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */
> +
> +#include 
> +
> +__attribute__ ((weak))
> +int
> +f (int a)
> +{
> +   return memchr ("aE", a, 2) != NULL;
> +}
> +
> +__attribute__ ((weak))
> +int
> +g (char a)
> +{
> +  return a == 'a' || a == 'E';
> +}
> +
> +int
> +main ()
> +{
> + for (int i = 0; i < 255; i++)
> +   if (f (i + 256) != g (i + 256))
> + __builtin_abort ();
> +
> + return 0;
> +}
> +
> +/* { dg-final { scan-assembler-not "memchr" } } */
> diff --git a/gcc/testsuite/c-c++-common/pr103798-3.c 
> b/gcc/testsuite/c-c++-common/pr103798-3.c
> new file mode 100644
> index 000..ddcedc7e238
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pr103798-3.c
> @@ -0,0 +1,28 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */
> +
> +__attribute__ ((weak))
> +int
> +f (char a)
> +{
> +   return  __builtin_memchr ("aEgZ", a, 3) == 0;
> +}
> +
> +__attribute__ ((weak))
> +int
> +g (char a)
> +{
> +  return a != 'a' && a != 'E' && a != 'g';
> +}
> +
> +int
> +main ()
> +{
> + for (int i = 0; i < 255; i++)
> +   if (f (i) != g (i))
> + __builtin_abort ();
> +
> + return 0;
> +}
> +
> +/* { dg-final { scan-assembler-not "memchr" } } */
> diff --git a/gcc/testsuite/c-c++-common/pr103798-4.c 
> b/gcc/testsuite/c-c++-common/pr103798-4.c
> new file mode 100644
> index 000..00e8302a833
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pr103798-4.c
> @@ -0,0 +1,28 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */
> +
> +__attribute__ ((weak))
> +int
> +f (char a)
> +{
> +   return  _

[PATCH] d: Move DSO registry support code from compiler to drtstuff in library (PR100062)

2022-07-08 Thread Iain Buclaw via Gcc-patches
Hi,

Currently the DSO support for D runtime is generated by the compiler in
every object, when really it is only required once per shared object.

This patch moves that support logic from the compiler itself to the
library as part of the drtstuff code.  The object files drtbegin.o and
drtend.o are now always linked in.

Bootstrapped and tested on x86_64-linux-gnu/-m32/-mx32, with no
observable regressions.

@Rainer, as you provided the original, would be good to validate this is
fine for Solaris too.

Regards
Iain.

---
PR d/100062

gcc/ChangeLog:

* config/darwin-d.cc (TARGET_D_MINFO_START_NAME): Remove.
(TARGET_D_MINFO_END_NAME): Remove.
* config/elfos.h (TARGET_D_MINFO_START_NAME): Remove.
(TARGET_D_MINFO_END_NAME): Remove.
* config/i386/winnt-d.cc (TARGET_D_MINFO_START_NAME): Remove.
(TARGET_D_MINFO_END_NAME): Remove.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Remove hooks for TARGET_D_MINFO_START_NAME and
TARGET_D_MINFO_END_NAME

gcc/d/ChangeLog:

* d-target.def (d_minfo_start_name): Remove hook.
(d_minfo_end_name): Remove hook.
* modules.cc (compiler_dso_type): Remove.
(dso_registry_fn): Remove.
(dso_slot_node): Remove.
(dso_initialized_node): Remove.
(start_minfo_node): Remove.
(stop_minfo_node): Remove.
(get_compiler_dso_type): Remove.
(get_dso_registry_fn): Remove.
(build_dso_cdtor_fn): Remove.
(build_dso_registry_var): Remove.
(register_moduleinfo): Don't generate and emit DSO registry code.

gcc/testsuite/ChangeLog:

* lib/gdc-utils.exp (gdc-convert-args): Handle -nophoboslib.

libphobos/ChangeLog:

* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Remove substitution of DRTSTUFF_SPEC.
(DRUNTIME_OS_MINFO_BRACKETING): Rename to...
(DRUNTIME_OS_NAMED_SECTIONS): ...this.
* libdruntime/Makefile.am: Always compile $(DRTSTUFF).
(gcc/drtbegin.o): Compile with gdc.
(gcc/drtend.o): Likewise.
* libdruntime/Makefile.in: Regenerate.
* libdruntime/gcc/config.d.in (OS_Have_Named_Sections): Define.
* libdruntime/gcc/sections/common.d (CompilerDSOData): Define.
* libdruntime/gcc/sections/elf.d (CompilerDSOData): Remove.
* libdruntime/gcc/sections/macho.d (CompilerDSOData): Remove.
* libdruntime/gcc/sections/pecoff.d (CompilerDSOData): Remove.
* m4/druntime/os.m4 (DRUNTIME_OS_MINFO_BRACKETING): Remove.
(DRUNTIME_OS_NAMED_SECTIONS): Define.
* src/Makefile.in: Regenerate.
* src/libgphobos.spec.in: Always add drtbegin and drtend to startfile
and endfile spec.
* testsuite/Makefile.in: Regenerate.
* libdruntime/gcc/drtstuff.c: Rename to file to...
* libdruntime/gcc/drtstuff.d: ...this. Convert source to D, add DSO
registry code originally generated by the compiler.
---
 gcc/config/darwin-d.cc  |   6 -
 gcc/config/elfos.h  |   2 -
 gcc/config/i386/winnt-d.cc  |   6 -
 gcc/d/d-target.def  |  17 +-
 gcc/d/modules.cc| 202 +---
 gcc/doc/tm.texi |  12 --
 gcc/doc/tm.texi.in  |   4 -
 gcc/testsuite/lib/gdc-utils.exp |   3 +
 libphobos/Makefile.in   |   2 +-
 libphobos/configure | 119 +++-
 libphobos/configure.ac  |  10 +-
 libphobos/libdruntime/Makefile.am   |  16 +-
 libphobos/libdruntime/Makefile.in   |  16 +-
 libphobos/libdruntime/gcc/config.d.in   |   3 +
 libphobos/libdruntime/gcc/drtstuff.c|  39 
 libphobos/libdruntime/gcc/drtstuff.d| 105 ++
 libphobos/libdruntime/gcc/sections/common.d |  11 ++
 libphobos/libdruntime/gcc/sections/elf.d|  11 --
 libphobos/libdruntime/gcc/sections/macho.d  |  11 --
 libphobos/libdruntime/gcc/sections/pecoff.d |  11 --
 libphobos/m4/druntime/os.m4 |  40 +---
 libphobos/src/Makefile.in   |   2 +-
 libphobos/src/libgphobos.spec.in|   6 +-
 libphobos/testsuite/Makefile.in |   2 +-
 24 files changed, 183 insertions(+), 473 deletions(-)
 delete mode 100644 libphobos/libdruntime/gcc/drtstuff.c
 create mode 100644 libphobos/libdruntime/gcc/drtstuff.d

diff --git a/gcc/config/darwin-d.cc b/gcc/config/darwin-d.cc
index e983883dba6..358a049212b 100644
--- a/gcc/config/darwin-d.cc
+++ b/gcc/config/darwin-d.cc
@@ -66,10 +66,4 @@ darwin_d_register_target_info (void)
 #undef TARGET_D_MINFO_SECTION
 #define TARGET_D_MINFO_SECTION "__DATA,__minfodata"
 
-#undef TARGET_D_MINFO_START_NAME
-#define TARGET_D_MINFO_START_NAME "*section$start$__DATA$__minfodata"
-
-#undef TARGET_D_MINFO_END_NAME
-#define TARGET_D_MINFO_END_NAME "*section$end$__DATA$__min

[PATCH] loongarch: fix mulsidi3_64bit instruction

2022-07-08 Thread Xi Ruoyao via Gcc-patches
I think this should be obvious.  Ok for trunk and gcc-12 branch?

(Note: this bug really amazed me.  It's just a simple typo and all of us
failed to spot it reviewing the LoongArch port.  Incredibly, it can be
reproduced with such a simple test case (in the patch) but did not blow
the entire system up.  I didn't see anything abnormal until it blown up
two UBSan test cases when I tried to port UBSan for LoongArch.)

-- >8 --

(mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)) should be
"mulw.d.w", not "mul.d".

gcc/ChangeLog:

* config/loongarch/loongarch.md (mulsidi3_64bit): Use mulw.d.w
instead of mul.d.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/mul-1.c: New test.
* gcc.target/loongarch/mul-2.c: New test.
---
 gcc/config/loongarch/loongarch.md  |  2 +-
 gcc/testsuite/gcc.target/loongarch/mul-1.c | 20 
 gcc/testsuite/gcc.target/loongarch/mul-2.c | 10 ++
 3 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-2.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index d3c809e25f3..8f8412fba84 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -621,7 +621,7 @@ (define_insn "mulsidi3_64bit"
(mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
 (sign_extend:DI (match_operand:SI 2 "register_operand" "r"]
   "TARGET_64BIT"
-  "mul.d\t%0,%1,%2"
+  "mulw.d.w\t%0,%1,%2"
   [(set_attr "type" "imul")
(set_attr "mode" "DI")])
 
diff --git a/gcc/testsuite/gcc.target/loongarch/mul-1.c 
b/gcc/testsuite/gcc.target/loongarch/mul-1.c
new file mode 100644
index 000..8b6800804fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/mul-1.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+
+typedef __INT64_TYPE__ int64_t;
+typedef __INT32_TYPE__ int32_t;
+
+/* f() was misoptimized to a single "mul.d" instruction on LA64.  */
+__attribute__((noipa, noinline)) int64_t
+f(int64_t a, int64_t b)
+{
+  return (int64_t)(int32_t)a * (int64_t)(int32_t)b;
+}
+
+int
+main()
+{
+  int64_t a = 0x11451401;
+  int64_t b = 0x19198101;
+  if (f(a, b) != 1)
+__builtin_abort();
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/mul-2.c 
b/gcc/testsuite/gcc.target/loongarch/mul-2.c
new file mode 100644
index 000..a9a713210df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/mul-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "mulw.d.w\t\\\$r4,\\\$r5,\\\$r4" } } */
+
+/* This should be optimized to mulw.d.w for LA64.  */
+__attribute__((noipa, noinline)) long
+f(long a, long b)
+{
+  return (long)(int)a * (long)(int)b;
+}
-- 
2.37.0




Re: [PATCH v2] loongarch: fix mulsidi3_64bit instruction

2022-07-08 Thread Xi Ruoyao via Gcc-patches
v2: Move one portable test to gcc.c-torture so it will be tested with
all optimization levels.  And it might be helpful if the engineer of the
next GCC port makes a similar typo :).

-- >8 --

(mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)) should be
"mulw.d.w", not "mul.d".

gcc/ChangeLog:

* config/loongarch/loongarch.md (mulsidi3_64bit): Use mulw.d.w
instead of mul.d.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/mulw_d_w.c: New test.
* gcc.c-torture/execute/mul-sext.c: New test.
---
 gcc/config/loongarch/loongarch.md |  2 +-
 .../gcc.c-torture/execute/mul-sext.c  | 20 +++
 gcc/testsuite/gcc.target/loongarch/mulw_d_w.c | 10 ++
 3 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/mul-sext.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/mulw_d_w.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index d3c809e25f3..8f8412fba84 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -621,7 +621,7 @@ (define_insn "mulsidi3_64bit"
(mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
 (sign_extend:DI (match_operand:SI 2 "register_operand" "r"]
   "TARGET_64BIT"
-  "mul.d\t%0,%1,%2"
+  "mulw.d.w\t%0,%1,%2"
   [(set_attr "type" "imul")
(set_attr "mode" "DI")])
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/mul-sext.c 
b/gcc/testsuite/gcc.c-torture/execute/mul-sext.c
new file mode 100644
index 000..8b6800804fb
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/mul-sext.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+
+typedef __INT64_TYPE__ int64_t;
+typedef __INT32_TYPE__ int32_t;
+
+/* f() was misoptimized to a single "mul.d" instruction on LA64.  */
+__attribute__((noipa, noinline)) int64_t
+f(int64_t a, int64_t b)
+{
+  return (int64_t)(int32_t)a * (int64_t)(int32_t)b;
+}
+
+int
+main()
+{
+  int64_t a = 0x11451401;
+  int64_t b = 0x19198101;
+  if (f(a, b) != 1)
+__builtin_abort();
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/mulw_d_w.c 
b/gcc/testsuite/gcc.target/loongarch/mulw_d_w.c
new file mode 100644
index 000..a9a713210df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/mulw_d_w.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "mulw.d.w\t\\\$r4,\\\$r5,\\\$r4" } } */
+
+/* This should be optimized to mulw.d.w for LA64.  */
+__attribute__((noipa, noinline)) long
+f(long a, long b)
+{
+  return (long)(int)a * (long)(int)b;
+}
-- 
2.37.0




Re: [PATCH] Introduce hardbool attribute for C

2022-07-08 Thread Sebastian Huber

On 08.07.22 08:58, Richard Biener via Gcc-patches wrote:

On Thu, Jul 7, 2022 at 10:00 PM Alexandre Oliva via Gcc-patches
  wrote:


This patch introduces hardened booleans in C.  The hardbool attribute,
when attached to an integral type, turns it into an enumerate type
with boolean semantics, using the named or implied constants as
representations for false and true.

Expressions of such types decay to _Bool, trapping if the value is
neither true nor false, and _Bool can convert implicitly back to them.
Other conversions go through _Bool first.

Regstrapped on x86_64-linux-gnu.  Ok to install?

Does this follow some other compilers / language?  Is such feature used
in existing code?  Why is it useful to allow arbitrary values for true/false?
Why is the default 0 and ~0 rather than 0 and 1 as for _Bool?


Maybe this helps to catch errors caused by radiation which resulted in a 
bit flip in a processor register or other memory. If you use a single 
bit for true/false you can't detect such an error without special hardware.


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH] diagnostics: Make line-ending logic consistent with libcpp [PR91733]

2022-07-08 Thread David Malcolm via Gcc-patches
On Thu, 2022-07-07 at 21:56 -0400, Lewis Hyatt wrote:
> Hello-
> 
> The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91733) points
> out that,
> while libcpp recognizes a lone '\r' as a valid line-ending character,
> the
> infrastructure that obtains source lines to be printed in diagnostics
> does
> not, and hence diagnostics do not output the intended portion of a
> source
> file that uses such line endings. The PR's author suggests that
> libcpp
> should stop accepting '\r' line endings, but that seems rather
> controversial
> and not likely to change. Fixing the diagnostics is easy enough
> though, and
> that's done by the attached patch. Please let me know if it looks OK,
> thanks! bootstrap + regtest all languages looks good, with just new
> PASSes
> for the new testcase.
> 
> FAIL 103 103
> PASS 543592 543627
> UNSUPPORTED 15298 15298
> UNTESTED 136 136
> XFAIL 4130 4130
> XPASS 20 20
> 

The patch looks good to me.

Thanks
Dave



Re: [PATCH] Add condition coverage profiling

2022-07-08 Thread Sebastian Huber

Hello Jørgen,

some time passed. It would be nice if you could give a status update. I 
am quite interested in your work.


Best regards,
Sebastian

--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH] Introduce hardbool attribute for C

2022-07-08 Thread Alexandre Oliva via Gcc-patches
On Jul  8, 2022, Richard Biener  wrote:

> The documentation should probably be explicit about this case.

Please consider, for purposes of review, the following incremental
patchlet as if integrated with yesterday's submission.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index a63a94158341a..a1dcd581dd8ad 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8483,7 +8483,22 @@ followed by a mapping from @code{false} and @code{true} 
to
 typedef char __attribute__ ((__hardbool__ (0x5a))) hbool;
 hbool first = 0;   /* False, stored as (char)0x5a.  */
 hbool second = !first; /* True, stored as ~(char)0x5a.  */
-@end smallexample
+
+static hbool zeroinit; /* False, stored as (char)0x5a.  */
+auto hbool uninit; /* Undefined, may trap.  */
+@end smallexample
+
+When zero-initializing a variable or field of hardened boolean type
+(presumably held in static storage) the implied zero initializer gets
+converted to @code{_Bool}, and then to the hardened boolean type, so
+that the initial value is the hardened representation for @code{false}.
+Using that value is well defined.  This is @emph{not} the case when
+variables and fields of such types are uninitialized (presumably held in
+automatic or dynamic storage): their values are indeterminate, and using
+them invokes undefined behavior.  Using them may trap or not, depending
+on the bits held in the storage (re)used for the variable, if any, and
+on optimizations the compiler may perform on the grounds that using
+uninitialized values invokes undefined behavior.
 
 
 @item may_alias


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] Introduce hardbool attribute for C

2022-07-08 Thread Alexandre Oliva via Gcc-patches
On Jul  8, 2022, Richard Biener  wrote:

> Does this follow some other compilers / language?

It is analogous to Ada's booleans with representation clauses and
runtime validation checking at use points.

> Is such feature used in existing code?

Not that I know.  The attribute name was my choice.

That said, we have already delivered the experimental implementation to
the customer who requested it (GCC was in stage3, thus the delayed
submission), so by now they may already have some code using it.

> Why is it useful to allow arbitrary values for true/false?

Increasing the hamming distance between legitimate values is desirable
to catch hardware-based attacks, but booleans are probably the only
builtin type that has room for that.

> Why is the default 0 and ~0 rather than 0 and 1 as for _Bool?

My understanding is that the goal is to maximize the hamming distance
between the legitimate values, so as to increase the sensibility to
errors.

There was no requirement for defaults to be these values, however.  The
examples often used 0x5a and 0xa5, but those seemed too arbitrary to be
defaults.  I found ~1 and 1 to be too nasty, so I went for 0 and ~0,
that are still recognizable as false and true values, respectively,
though I'm not sure whether this is advantageous.

>> +@smallexample
>> +typedef char __attribute__ ((__hardbool__ (0x5a))) hbool;
>> +hbool first = 0;   /* False, stored as (char)0x5a.  */
>> +hbool second = !first; /* True, stored as ~(char)0x5a.  */
> hbool thrid;

> what's the initial value of 'third'?

If it's an automatic variable, it's uninitialized, as expected for C.
It might by chance happen to hold one of the legitimate values, but odds
are it doesn't, and if so, accessing it will trap.

If it's a static-storage variable, it will be zero-initialized as
expected, but the zero will be mapped to the representation for false.

> The documentation should probably be explicit about this case.

Agreed, thanks, will do.

> If the underlying representation is an Enum, why not have
> hardened_enum instead?

In Ada, Booleans are enumerator types with various conventions and
builtin operations, with or without a representation clause, that sets
the representation values for the enumerators.  Other enumerations
aren't subject to such conventions as automatic conversions between
Boolean types with different representations, that enable all of them to
be used interchangeably (source-wise) in logical expressions.

It would nevertheless be feasible to implement hardened enumerator
types, that, like Ada, perform runtime validation checking that the
stored value corresponds to one of the enumerators.  This would not fit
some common use cases of enumerator types, e.g. those in which
enumerators define bits or masks, and different enumerators are combined
into a single variable.  This was not the feature that we were asked to
implement, though.

> A hardened _Bool might want to have a special NaT value as well I
> guess?

That might sound appealing, but ISTM that it would instead break the
symmetry of the maximal hamming distance between the representation
values for true and false.  OTOH, NaB, if so desired, would be just any
other value; the challenge would be to get such a value stored in a
variable, given that actual booleans can only hold true (nonzero) or
false (zero), and neither would convert to NaB.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] Control flow redundancy hardening

2022-07-08 Thread Alexandre Oliva via Gcc-patches
On Jul  8, 2022, Richard Biener  wrote:

> I'm possibly missing the importance of 'redundancy' in -fharden-control-flow

I took "Control Flow Redundancy" as a term of the art and never
questioned it.  I think the "redundancy" has to do with the fact that
control flow is generally affected by tests and conditionals, and the
checks that an expected path was seemingly taken is redundant with
those.

> but how can you, from a set of visited blocks local to a function,
> determine whether the control flow through the function is "expected"

Hmm, maybe the definition should be in the negated form: what the check
catches is *unexpected* execution flows, e.g. when a block that
shouldn't have been reached (because none of its predecessors was)
somehow was.  This unexpected circumstance indicates some kind of fault
or attack, which is what IIUC this check is about.

Whether the fault was that the hardware took a wrong turn because it was
power deprived, or some software exploit returned to an artifact at the
end of a function to get it to serve an alternate purpose, the check at
the end of the function would catch the unexpected execution of a block
that couldn't be reached under normal circumstances, and flag the error
before further damage occurs.

> Can you elaborate on what kind of "derailed" control flow this catches
> (example?) and what cases it does not?

As in the comments for the pass: for each visited block, check that at
least one predecessor and at least one successor were also visited.  


> I'm also curious as of how this compares to hardware
> mitigations like x86 indirect branch tracking and shadow stack

I'm not expert in the field, but my understanding is that these are
complementary.

Indirect branch tracking constrains the set of available artifacts one
might indirectly branch to, but if you reach one of them, you'd be no
wiser that something fishy was going on without checking that you got
there from some of the predecessor blocks.  (we don't really check
precisely that, nor do we check at that precise time, but we check at
the end of the function that at least one of the predecessor blocks was
run.)  Constraining the available indirect branch targets helps avoid
bypassing the code that sets the bit corresponding to that block, which
might enable an attacker to use an artifact without detection., if
there's no subsequent block that would be inexplicably reached.

Shadow stacks avoid corruption of return addresses, so you're less
likely to reach an unexpected block by means of buffer overruns that
corrupt the stack and overwrite the return address.  Other means to land
in the middle of a function, such as corrupting memory or logical units
through power deprivation remain, and this pass helps guard against
those too.

> and how this relates to the LLVM control flow hardening (ISTR such
> thing exists).

I've never heard of it.  I've just tried to learn about it, but I
couldn't find anything pertinent.

Are you by any chance thinking of
https://clang.llvm.org/docs/ControlFlowIntegrity.html
?

This appears to be entirely unrelated: the control flow nodes it's
concerned with are functions/methods/subprograms in a program, rather
than basic blocks within a function.


Thanks a lot for these questions.  They're going to help me be better
prepared for a presentation about various hardening features (*) that
I've submitted and am preparing for the upcoming Cauldron.

(*) 
https://docs.adacore.com/live/wave/gnat_rm/html/gnat_rm/gnat_rm/security_hardening_features.html

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


libbacktrace patch committed: Check for sys/link.h

2022-07-08 Thread Ian Lance Taylor via Gcc-patches
Apparently QNX declares dl_iterate_phdr and friends in sys/link.h
rather than link.h.  This patch updates libbacktrace to check there.
This fixes https://github.com/ianlancetaylor/libbacktrace/issues/86.
Bootstrapped and ran libbacktrace testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian

* configure.ac: Check for sys/link.h.  Use either link.h or
sys/link.h when checking for dl_iterate_phdr.
* elf.c: Include sys/link.h if available.
* configure, config.h.in: Regenerate.
bab8b6e52fb0b48b5d9d1af5f93e5c8fb20d6240
diff --git a/libbacktrace/configure.ac b/libbacktrace/configure.ac
index 857987a2859..1daaa2f62d2 100644
--- a/libbacktrace/configure.ac
+++ b/libbacktrace/configure.ac
@@ -335,13 +335,17 @@ fi
 AC_SUBST(BACKTRACE_USES_MALLOC)
 
 # Check for dl_iterate_phdr.
-AC_CHECK_HEADERS(link.h)
-if test "$ac_cv_header_link_h" = "no"; then
+AC_CHECK_HEADERS(link.h sys/link.h)
+if test "$ac_cv_header_link_h" = "no" -a "$ac_cv_header_sys_link_h" = "no"; 
then
   have_dl_iterate_phdr=no
 else
   if test -n "${with_target_subdir}"; then
+link_h=link.h
+if test "$ac_cv_header_link_h" = "no"; then
+   link_h=sys/link.h
+fi
 # When built as a GCC target library, we can't do a link test.
-AC_EGREP_HEADER([dl_iterate_phdr], [link.h], [have_dl_iterate_phdr=yes],
+AC_EGREP_HEADER([dl_iterate_phdr], [$link_h], [have_dl_iterate_phdr=yes],
[have_dl_iterate_phdr=no])
   else
 AC_CHECK_FUNC([dl_iterate_phdr], [have_dl_iterate_phdr=yes],
diff --git a/libbacktrace/elf.c b/libbacktrace/elf.c
index 8b82dd45875..181d195fe35 100644
--- a/libbacktrace/elf.c
+++ b/libbacktrace/elf.c
@@ -40,7 +40,12 @@ POSSIBILITY OF SUCH DAMAGE.  */
 #include 
 
 #ifdef HAVE_DL_ITERATE_PHDR
-#include 
+ #ifdef HAVE_LINK_H
+  #include 
+ #endif
+ #ifdef HAVE_SYS_LINK_H
+  #include 
+ #endif
 #endif
 
 #include "backtrace.h"


[PATCH] c++: Add __reference_con{struc, ver}ts_from_temporary [PR104477]

2022-07-08 Thread Marek Polacek via Gcc-patches
This patch implements C++23 P2255R2, which adds two new type traits to
detect reference binding to a temporary.  They can be used to detect code
like

  std::tuple t("meow");

which is incorrect because it always creates a dangling reference, because
the std::string temporary is created inside the selected constructor of
std::tuple, and not outside it.

There are two new compiler builtins, __reference_constructs_from_temporary
and __reference_converts_from_temporary.  The former is used to simulate
direct- and the latter copy-initialization context.  But I had a hard time
finding a test where there's actually a difference.  Under DR 2267, both
of these are invalid:

  struct A { } a;
  struct B { explicit B(const A&); };
  const B &b1{a};
  const B &b2(a);

so I had to peruse [over.match.ref], and eventually realized that the
difference can be seen here:

  struct G {
operator int(); // #1
explicit operator int&&(); // #2
  };

int&& r1(G{}); // use #2 (no temporary)
int&& r2 = G{}; // use #1 (a temporary is created to be bound to int&&)

The implementation itself was rather straightforward because we already
have conv_binds_ref_to_prvalue.  The main function here is
reference_from_temporary.  The renaming to ref_conv_binds_to_temporary_p
is because previously the function didn't distinguish between an invalid
conversion and one that binds to a prvalue.

The patch also adds the relevant class and variable templates to .

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/104477

gcc/c-family/ChangeLog:

* c-common.cc (c_common_reswords): Add
__reference_constructs_from_temporary and
__reference_converts_from_temporary.
* c-common.h (enum rid): Add RID_REF_CONSTRUCTS_FROM_TEMPORARY and
RID_REF_CONVERTS_FROM_TEMPORARY.

gcc/cp/ChangeLog:

* call.cc (ref_conv_binds_directly_p): Rename to ...
(ref_conv_binds_to_temporary_p): ... this.  Add a new bool
parameter.  Return true only if the conversion is valid and
conv_binds_ref_to_prvalue returns true.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
* cp-tree.h (enum cp_trait_kind): Add CPTK_REF_CONSTRUCTS_FROM_TEMPORARY
and CPTK_REF_CONVERTS_FROM_TEMPORARY.
(ref_conv_binds_directly_p): Rename to ...
(ref_conv_binds_to_temporary_p): ... this.
(reference_from_temporary): Declare.
* cxx-pretty-print.cc (pp_cxx_trait_expression): Handle
CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
* method.cc (reference_from_temporary): New.
* parser.cc (cp_parser_primary_expression): Handle
RID_REF_CONSTRUCTS_FROM_TEMPORARY and RID_REF_CONVERTS_FROM_TEMPORARY.
(cp_parser_trait_expr): Likewise.
(warn_for_range_copy): Adjust to call ref_conv_binds_to_temporary_p.
* semantics.cc (trait_expr_value): Handle
CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
(finish_trait_expr): Likewise.

libstdc++-v3/ChangeLog:

* include/std/type_traits (reference_constructs_from_temporary,
reference_converts_from_temporary): New class templates.
(reference_constructs_from_temporary_v,
reference_converts_from_temporary_v): New variable templates.
(__cpp_lib_reference_from_temporary): Define for C++23.
* include/std/version (__cpp_lib_reference_from_temporary): Define for
C++23.
* testsuite/20_util/variable_templates_for_traits.cc: Test
reference_constructs_from_temporary_v and
reference_converts_from_temporary_v.
* testsuite/20_util/reference_from_temporary/value.cc: New test.
* testsuite/20_util/reference_from_temporary/value2.cc: New test.
* testsuite/20_util/reference_from_temporary/version.cc: New test.

gcc/testsuite/ChangeLog:

* g++.dg/ext/reference_constructs_from_temporary1.C: New test.
* g++.dg/ext/reference_converts_from_temporary1.C: New test.
---
 gcc/c-family/c-common.cc  |   4 +
 gcc/c-family/c-common.h   |   2 +
 gcc/cp/call.cc|  14 +-
 gcc/cp/constraint.cc  |   8 +
 gcc/cp/cp-tree.h  |   7 +-
 gcc/cp/cxx-pretty-print.cc|   6 +
 gcc/cp/method.cc  |  28 +++
 gcc/cp/parser.cc  |  14 +-
 gcc/cp/semantics.cc   |   8 +
 .../reference_constructs_from_temporary1.C| 214 ++
 .../ext/reference_converts_from_temporary1.C  | 214 ++
 libstdc++-v3/include/std/type_traits  |  39 
 libstdc++-v3/include/std/version  |   5 +-
 .../20_util/reference_from_temporary/value.cc | 110 +
 .../reference_from_temporary/value2.cc|  28 +++
 .../ref

[r13-1573 Regression] FAIL: gcc.dg/pr106063.c (test for excess errors) on Linux/x86_64

2022-07-08 Thread skpandey--- via Gcc-patches
On Linux/x86_64,

f7854e2faf7640230062dec3596e71773ca500ed is the first bad commit
commit f7854e2faf7640230062dec3596e71773ca500ed
Author: Tamar Christina 
Date:   Fri Jul 8 08:30:22 2022 +0100

middle-end: don't lower past veclower [PR106063]

caused

FAIL: gcc.dg/pr106063.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r13-1573/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr106063.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr106063.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH] btf: emit linkage information in BTF_KIND_FUNC entries

2022-07-08 Thread Jose E. Marchesi via Gcc-patches



The kernel bpftool expects BTF_KIND_FUNC entries in BTF to include an
annotation reflecting the linkage of functions (static, global).  For
whatever reason they (ab)use the `vlen' field of the BTF_KIND_FUNC entry
instead of adding a variable-part to the record like it is done with
other entry kinds.

This patch makes GCC to include this linkage info in BTF_KIND_FUNC
entries.

Tested in bpf-unknown-none target.

gcc/ChangeLog:

* ctfc.h (struct ctf_itype): Add field ctti_linkage.
* ctfc.cc (ctf_add_function): Set ctti_linkage.
* dwarf2ctf.cc (gen_ctf_function_type): Pass a linkage for
function types and subprograms.
* btfout.cc (btf_asm_func_type): Emit linkage information for the
function.
---
 gcc/btfout.cc| 3 ++-
 gcc/ctfc.cc  | 3 ++-
 gcc/ctfc.h   | 3 ++-
 gcc/dwarf2ctf.cc | 4 +++-
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 31af50521da..417d87cf519 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -740,7 +740,8 @@ static void
 btf_asm_func_type (ctf_dtdef_ref dtd)
 {
   dw2_asm_output_data (4, dtd->dtd_data.ctti_name, "btt_name");
-  dw2_asm_output_data (4, BTF_TYPE_INFO (BTF_KIND_FUNC, 0, 0), "btt_info");
+  dw2_asm_output_data (4, BTF_TYPE_INFO (BTF_KIND_FUNC, 0,
+ dtd->dtd_data.ctti_linkage), 
"btt_info");
   dw2_asm_output_data (4, get_btf_id (dtd->dtd_data.ctti_type), "btt_type");
 }
 
diff --git a/gcc/ctfc.cc b/gcc/ctfc.cc
index f24e7bff948..ad7f8bb8e86 100644
--- a/gcc/ctfc.cc
+++ b/gcc/ctfc.cc
@@ -777,7 +777,7 @@ ctf_add_function_arg (ctf_container_ref ctfc, dw_die_ref 
func,
 ctf_id_t
 ctf_add_function (ctf_container_ref ctfc, uint32_t flag, const char * name,
  const ctf_funcinfo_t * ctc, dw_die_ref die,
- bool from_global_func)
+ bool from_global_func, int linkage)
 {
   ctf_dtdef_ref dtd;
   ctf_id_t type;
@@ -792,6 +792,7 @@ ctf_add_function (ctf_container_ref ctfc, uint32_t flag, 
const char * name,
 
   dtd->from_global_func = from_global_func;
   dtd->dtd_data.ctti_info = CTF_TYPE_INFO (CTF_K_FUNCTION, flag, vlen);
+  dtd->dtd_data.ctti_linkage = linkage;
   /* Caller must make sure CTF types for ctc->ctc_return are already added.  */
   dtd->dtd_data.ctti_type = (uint32_t) ctc->ctc_return;
   /* Caller must make sure CTF types for function arguments are already added
diff --git a/gcc/ctfc.h b/gcc/ctfc.h
index 001e544ef08..273997a2302 100644
--- a/gcc/ctfc.h
+++ b/gcc/ctfc.h
@@ -116,6 +116,7 @@ typedef struct GTY (()) ctf_itype
   } _u;
   uint32_t ctti_lsizehi;   /* High 32 bits of type size in bytes.  */
   uint32_t ctti_lsizelo;   /* Low 32 bits of type size in bytes.  */
+  uint32_t ctti_linkage;   /* Linkage info for function types.  */
 } ctf_itype_t;
 
 #define ctti_size _u._size
@@ -423,7 +424,7 @@ extern ctf_id_t ctf_add_forward (ctf_container_ref, 
uint32_t, const char *,
 extern ctf_id_t ctf_add_typedef (ctf_container_ref, uint32_t, const char *,
 ctf_id_t, dw_die_ref);
 extern ctf_id_t ctf_add_function (ctf_container_ref, uint32_t, const char *,
- const ctf_funcinfo_t *, dw_die_ref, bool);
+ const ctf_funcinfo_t *, dw_die_ref, bool, 
int);
 extern ctf_id_t ctf_add_sou (ctf_container_ref, uint32_t, const char *,
 uint32_t, size_t, dw_die_ref);
 
diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc
index a6329ab6ee4..39714c2 100644
--- a/gcc/dwarf2ctf.cc
+++ b/gcc/dwarf2ctf.cc
@@ -644,6 +644,7 @@ gen_ctf_function_type (ctf_container_ref ctfc, dw_die_ref 
function,
 
   ctf_funcinfo_t func_info;
   uint32_t num_args = 0;
+  int linkage = get_AT_flag (function, DW_AT_external);
 
   ctf_id_t return_type_id;
   ctf_id_t function_type_id;
@@ -687,7 +688,8 @@ gen_ctf_function_type (ctf_container_ref ctfc, dw_die_ref 
function,
   function_name,
   (const ctf_funcinfo_t *)&func_info,
   function,
-  from_global_func);
+  from_global_func,
+   linkage);
 
   /* Second pass on formals: generate the CTF types corresponding to
  them and add them as CTF function args.  */
-- 
2.11.0



Re: [PATCH 00/12] RFC: Replay of serialized diagnostics

2022-07-08 Thread David Malcolm via Gcc-patches
On Wed, 2022-06-22 at 18:34 -0400, David Malcolm wrote:
> We currently have a couple of formats into which our diagnostics can
> be serialized: (a) gcc's own json format and (b) SARIF, via:
> 
>   -fdiagnostics-format=json-stderr (and -fdiagnostics-format=json)
>   -fdiagnostics-format=json-file
>   -fdiagnostics-format=sarif-stderr
>   -fdiagnostics-format=sarif-file
> 
> This experimental patch kit implements the ability for GCC to
> replay these serialized diagnostics - and thus to use GCC's diagnostic
> printing machinery to print the output of any tool that can write to
> these formats.

[...snip...]

This kit should refer to PR other/96032, fwiw:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96032

Dave



Re: [PATCH/RFC] combine_completed global variable.

2022-07-08 Thread Segher Boessenkool
Hi!

On Thu, Jul 07, 2022 at 08:40:38PM +0100, Roger Sayle wrote:
> Although I've not been able to reproduce your ICE (using gcc135
> on the compile farm), I completely agree with Segher's analysis
> that the Achilles heel with my approach/patch is that there's
> currently no way for the backend/recog to know that we're in a
> pass after combine.

To know combine has been run at least once already, yes.  It currently
won't run more than one of course, but the only thing that really stops
running it more than once was historically that combine was one of the
more expensive passes.  Now otoh it is really quite cheap.

Currently combine is one of the last passes before RA.  It likely will
be nice to run combine quite early as well (or a limited version of it
that is), just like other optimisation passes already are, including its
closest relative fwprop.

> Rather than give up on this optimization (and a similar one for
> I386.md where test;sete can be replaced by xor $1 when combine
> knows that nonzero_bits is 1, but loses that information afterwards),
> I thought I'd post this "strawman" proposal to add a combine_completed
> global variable, matching the reload_completed and regstack_completed
> global variables already used (to track progress) by the middle-end.

It should be called differently, given the above.

> I was wondering if I could ask you could test the attached patch
> in combination with my previous rs6000.md patch (with the obvious
> change of reload_completed to combine_completed) to confirm
> that it fixes the problems you were seeing.
> 
> Segher/Richard, would this sort of patch be considered acceptable?
> Or is there a better approach/solution?

There is the thing I have been working on for about a year now, that
makes combine's nonzero_bits superfluous, by making the generic version
more capable than combine's version.  That is a better way forward imo,
it solves many other problems as well.  But it isn't ready yet.

Your patch is fine with a name change (and documentation change)
indicating it means that combine has run at least once.

Thanks!


Segher


Re: [x86 PATCH] Fun with flags: Adding stc/clc instructions to i386.md.

2022-07-08 Thread Segher Boessenkool
Hi!

On Fri, Jul 08, 2022 at 08:14:58AM +0100, Roger Sayle wrote:
> This patch adds support for x86's single-byte encoded stc (set carry flag)
> and clc (clear carry flag) instructions to i386.md.

Maybe add a test for clc as well?  Because:

> +(define_insn "x86_clc"
> +  [(set (reg:CCC FLAGS_REG) (const_int 0))]
> +  ""
> +  "stc"
> +  [(set_attr "length" "1")
> +   (set_attr "length_immediate" "0")
> +   (set_attr "modrm" "0")])

Spot the problem :-)

> +/* { dg-final { scan-assembler "stc" } } */

This checks if the substring "stc" occurs anywhere in the generated
assembler code.  More robust is to use scan-assembler-times, and to use
\mstc\M (same as \ in some other languages, or \bstc\b in Perl).


Segher


[COMMITTED] Fix tree-opt/PR106087: ICE with inline-asm with multiple output and assigned only static vars

2022-07-08 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The problem here is that when we mark the ssa name that was referenced in the 
now removed
dead store (to a write only static variable), the inline-asm would also be 
removed
even though it was defining another ssa name. This fixes the problem by checking
to make sure that the statement was only defining one ssa name.

Committed as approved after a bootstrapped and tested on x86_64 with no 
regressions.

PR tree-optimization/106087

gcc/ChangeLog:

* tree-ssa-dce.cc (simple_dce_from_worklist): Check
to make sure the statement is only defining one operand.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/inline-asm-1.c: New test.
---
 gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c | 14 ++
 gcc/tree-ssa-dce.cc|  7 +++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c

diff --git a/gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c 
b/gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c
new file mode 100644
index 000..0044cb761b6
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c
@@ -0,0 +1,14 @@
+/* PR tree-opt/106087,
+   simple_dce_from_worklist would delete the
+   inline-asm when it was still being referenced
+   by the other ssa name. */
+
+static int t;
+
+int f(void)
+{
+  int tt, tt1;
+  asm("":"=r"(tt), "=r"(tt1));
+  t = tt1;
+  return tt;
+}
diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index bc533582673..daf0782b0e1 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -2061,6 +2061,13 @@ simple_dce_from_worklist (bitmap worklist)
   if (gimple_has_side_effects (t))
continue;
 
+  /* The defining statement needs to be defining only this name.
+ASM is the only statement that can define more than one
+(non-virtual) name. */
+  if (is_a(t)
+ && !single_ssa_def_operand (t, SSA_OP_DEF))
+   continue;
+
   /* Don't remove statements that are needed for non-call
 eh to work.  */
   if (stmt_unremovable_because_of_non_call_eh_p (cfun, t))
-- 
2.17.1



Re: Mips: Fix kernel_stat structure size

2022-07-08 Thread Hans-Peter Nilsson
On Fri, 1 Jul 2022, Dimitrije Milosevic wrote:

> Fix kernel_stat structure size for non-Android 32-bit Mips.
> LLVM currently has this value for the kernel_stat structure size,
> as per compiler-rt/lib/sanitizer-common/sanitizer_platform_limits_posix.h.
> This also resolves one of the build issues for non-Android 32-bit Mips.

I insist that PR105614 comment #7 is the way to go, i.e. fix
the merge error, avoiding the faulty include that it
reintroduced.  Was this tested on O32?

>
> libsanitizer/ChangeLog:
>
> * sanitizer_common/sanitizer_platform_limits_posix.h: Fix
> kernel_stat structure size for non-Android 32-bit Mips.
>
> ---
>
>  libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h 
> b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
> index 89772a7e5c0..62a99035db3 100644
> --- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
> +++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
> @@ -83,7 +83,7 @@ const unsigned struct_kernel_stat64_sz = 104;
>  #elif defined(__mips__)
>  const unsigned struct_kernel_stat_sz = SANITIZER_ANDROID
> ? FIRST_32_SECOND_64(104, 128)
> -   : FIRST_32_SECOND_64(144, 216);
> +   : FIRST_32_SECOND_64(160, 216);
>  const unsigned struct_kernel_stat64_sz = 104;
>  #elif defined(__s390__) && !defined(__s390x__)
>  const unsigned struct_kernel_stat_sz = 64;
>
> ---


Re: Mips: Fix kernel_stat structure size

2022-07-08 Thread Xi Ruoyao via Gcc-patches
On Fri, 2022-07-08 at 21:42 -0400, Hans-Peter Nilsson wrote:
> On Fri, 1 Jul 2022, Dimitrije Milosevic wrote:
> 
> > Fix kernel_stat structure size for non-Android 32-bit Mips.
> > LLVM currently has this value for the kernel_stat structure size,
> > as per compiler-rt/lib/sanitizer-
> > common/sanitizer_platform_limits_posix.h.
> > This also resolves one of the build issues for non-Android 32-bit
> > Mips.
> 
> I insist that PR105614 comment #7 is the way to go, i.e. fix
> the merge error, avoiding the faulty include that it
> reintroduced.  Was this tested on O32?

I'm pretty sure it is *not* the way to go.

Sanitizer does not really intercept system call.  It intercepts libc
stat() or lstat() etc. calls.  So you need to keep struct_kernel_stat_sz
same as the size of struct stat in libc, i. e. "the size of buffer which
*libc* stat()-like functions writing into".

The "kernel_" in the name is just misleading.

And, if you still think it should be the way to go, let's submit the
change to LLVM and get it reviewed properly.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH][s390]: Fix the usage of store_bit_field in the backend.

2022-07-08 Thread Jeff Law via Gcc-patches




On 7/8/2022 6:10 AM, Tamar Christina via Gcc-patches wrote:

Subject:
[PATCH][s390]: Fix the usage of store_bit_field in the backend.
From:
Tamar Christina via Gcc-patches 
Date:
7/8/2022, 6:10 AM

To:
gcc-patches@gcc.gnu.org
CC:
n...@arm.com, uweig...@de.ibm.com


Hi All,

I seem to have broken the s390 bootstrap because I added a new parameter to the
store_bit_field function to indicate whether the value the field of is being set
is currently undefined.

If it's undefined we use a subreg instead.  In this case the value of false
restores the old behavior.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/s390/s390.cc (s390_expand_atomic): Pass false to 
store_bit_field to
 indicate that the value is not undefined.

OK.  And I went ahead and pushed it to restore s390 to a buildable state.

jeff


[PATCH v3] loongarch: fix mulsidi3_64bit instruction

2022-07-08 Thread Xi Ruoyao via Gcc-patches
v3: Relax scan-assembler pattern in test case mulw_d_w.c.  It's because
multiplication is Abelian and the compiler may switch the order of
operands in the future. 

-- >8 --

(mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)) should be
"mulw.d.w", not "mul.d".

gcc/ChangeLog:

* config/loongarch/loongarch.md (mulsidi3_64bit): Use mulw.d.w
instead of mul.d.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/mulw_d_w.c: New test.
* gcc.c-torture/execute/mul-sext.c: New test.
---
 gcc/config/loongarch/loongarch.md |  2 +-
 .../gcc.c-torture/execute/mul-sext.c  | 20 +++
 gcc/testsuite/gcc.target/loongarch/mulw_d_w.c | 10 ++
 3 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/mul-sext.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/mulw_d_w.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index d3c809e25f3..8f8412fba84 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -621,7 +621,7 @@ (define_insn "mulsidi3_64bit"
(mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
 (sign_extend:DI (match_operand:SI 2 "register_operand" "r"]
   "TARGET_64BIT"
-  "mul.d\t%0,%1,%2"
+  "mulw.d.w\t%0,%1,%2"
   [(set_attr "type" "imul")
(set_attr "mode" "DI")])
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/mul-sext.c 
b/gcc/testsuite/gcc.c-torture/execute/mul-sext.c
new file mode 100644
index 000..8b6800804fb
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/mul-sext.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+
+typedef __INT64_TYPE__ int64_t;
+typedef __INT32_TYPE__ int32_t;
+
+/* f() was misoptimized to a single "mul.d" instruction on LA64.  */
+__attribute__((noipa, noinline)) int64_t
+f(int64_t a, int64_t b)
+{
+  return (int64_t)(int32_t)a * (int64_t)(int32_t)b;
+}
+
+int
+main()
+{
+  int64_t a = 0x11451401;
+  int64_t b = 0x19198101;
+  if (f(a, b) != 1)
+__builtin_abort();
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/mulw_d_w.c 
b/gcc/testsuite/gcc.target/loongarch/mulw_d_w.c
new file mode 100644
index 000..4ab7df8836b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/mulw_d_w.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "mulw.d.w" } } */
+
+/* This should be optimized to mulw.d.w for LA64.  */
+__attribute__((noipa, noinline)) long
+f(long a, long b)
+{
+  return (long)(int)a * (long)(int)b;
+}
-- 
2.37.0